Re: Is executor computing time affected by network latency?
On Fri, Sep 23, 2016 at 1:31 PM, Mich Talebzadeh wrote:
> The best network results are achieved when Spark nodes share the same
> hosts as Hadoop or they happen to be on the same subnet.

That's only true for those portions of a Spark execution pipeline that are actually reading from HDFS. If you're re-using an RDD for which the needed shuffle files are already available on Executor nodes, or are looking at stages of a Spark SQL query execution later than those reading from HDFS, then data locality and network utilization concerns don't really have anything to do with co-location of Executors and HDFS data nodes.
Re: Is executor computing time affected by network latency?
Does this assume that Spark is running on the same hosts as HDFS? If so, does increasing the latency also affect the network latency on the Hadoop nodes in your tests?

The best network results are achieved when Spark nodes share the same hosts as Hadoop or they happen to be on the same subnet.

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 22 September 2016 at 14:54, gusiri wrote:
> When I increase the network latency among spark nodes,
> I see compute time (=executor computing time in Spark Web UI) also
> increases.
Re: Is executor computing time affected by network latency?
See the reference on shuffles <http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/programming-guide.html#shuffle-operations>:

"Spark’s mechanism for re-distributing data so that it’s grouped differently across partitions. This typically involves copying data across executors and machines, making the shuffle a complex and costly operation."

On Thu, Sep 22, 2016 at 4:14 PM, Soumitra Johri <soumitra.siddha...@gmail.com> wrote:
> If your job involves a shuffle then the compute for the entire batch will
> increase with network latency. What would be interesting is to see how much
> time each task/job/stage takes.
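To get a rough feel for why "copying data across executors and machines" gets more expensive as latency grows, here is a back-of-envelope sketch. It is only a toy model, not how Spark actually schedules shuffle fetches: the function name, the idea of fetching in fixed "rounds", and every parameter are assumptions made for illustration.

```python
import math


def estimated_fetch_seconds(num_requests, max_in_flight, rtt_seconds,
                            total_bytes, bandwidth_bytes_per_s):
    """Toy estimate of how long one task waits to pull its shuffle input.

    Assumption (hypothetical model): remote requests are issued in rounds of
    at most `max_in_flight`, each round costs one network round trip, and the
    payload transfer time is simply total_bytes / bandwidth.
    """
    rounds = math.ceil(num_requests / max_in_flight)
    latency_cost = rounds * rtt_seconds
    transfer_cost = total_bytes / bandwidth_bytes_per_s
    return latency_cost + transfer_cost
```

In this model, with everything else held fixed, going from a 1 ms to a 500 ms round trip adds `rounds * 0.499` seconds to every task that reads shuffle data, however little data is actually moved. Real behaviour will differ, so treat this as a way to reason about the trend rather than a prediction.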
Re: Is executor computing time affected by network latency?
If your job involves a shuffle, then the compute time for the entire batch will increase with network latency. What would be interesting is to see how much time each task/job/stage takes.

On Thu, Sep 22, 2016 at 5:11 PM Peter Figliozzi wrote:
> It seems to me they must communicate for joins, sorts, grouping, and so
> forth, where the original data partitioning needs to change.
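One way to look at per-stage times is Spark's monitoring REST API, which the driver UI serves under `/api/v1`. The sketch below is a minimal example under assumptions: the base URL and application id are placeholders you would supply, the helper names are made up, and the field names (`stageId`, `name`, `executorRunTime`, `shuffleReadBytes`) are the ones I believe the stages endpoint returns, so check them against the monitoring docs for your Spark version.

```python
import json
from urllib.request import urlopen


def fetch_stages(ui_base_url, app_id):
    """Fetch per-stage data from the Spark monitoring REST API.

    ui_base_url is a placeholder such as "http://<driver-host>:4040".
    """
    url = f"{ui_base_url}/api/v1/applications/{app_id}/stages"
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_stages(stages):
    """Return one row per stage, slowest first by executor run time."""
    rows = []
    for stage in stages:
        shuffle_read = stage.get("shuffleReadBytes", 0)
        rows.append({
            "stageId": stage.get("stageId"),
            "name": stage.get("name"),
            "executorRunTimeMs": stage.get("executorRunTime", 0),
            "shuffleReadBytes": shuffle_read,
            # Stages that read shuffle data are the ones expected to be
            # sensitive to network latency.
            "readsShuffle": shuffle_read > 0,
        })
    return sorted(rows, key=lambda r: r["executorRunTimeMs"], reverse=True)
```

Running `summarize_stages(fetch_stages(...))` once at each latency setting and comparing the rows should show whether the extra time sits in stages that read shuffle data or in stages that do not.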
Re: Is executor computing time affected by network latency?
It seems to me they must communicate for joins, sorts, grouping, and so forth, where the original data partitioning needs to change. You could repeat your experiment for different code snippets. I'll bet it depends on what you do.

On Thu, Sep 22, 2016 at 8:54 AM, gusiri wrote:
> Is there any communication between worker and driver/master even 'during'
> executor computing? or any idea on this result?
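To illustrate the point about the partitioning needing to change, here is a small pure-Python sketch (not Spark code, and all function names are made up for the example). It treats each inner list as one partition. A per-record map leaves every record where it is, while a group-by-key has to send records to whichever partition owns their key.

```python
import zlib
from collections import defaultdict


def target_partition(key, num_partitions):
    # Stable hash so results are reproducible across runs (Python's built-in
    # hash() is randomized for strings).
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def map_values(partitions, fn):
    """Narrow operation: each output partition depends on one input partition,
    so no record has to leave the partition it started in."""
    return [[(k, fn(v)) for k, v in part] for part in partitions]


def group_by_key(partitions):
    """Wide operation: records are re-partitioned by key.

    Returns (new_partitions, moved), where moved counts the records whose
    target partition differs from the partition they started in.
    """
    n = len(partitions)
    grouped = [defaultdict(list) for _ in range(n)]
    moved = 0
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = target_partition(key, n)
            if dst != src:
                moved += 1
            grouped[dst][key].append(value)
    return [dict(g) for g in grouped], moved
```

In a real cluster, partitions that sit on different machines turn each "moved" record into network traffic, which is why a snippet built only from map-style steps and one containing a grouping, join, or sort may react very differently to added latency.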
Is executor computing time affected by network latency?
Hi,

When I increase the network latency among Spark nodes, I see the compute time (= executor computing time in the Spark Web UI) also increase.

In the attached graph, left = latency 1ms vs. right = latency 500ms.

Is there any communication between worker and driver/master even 'during' executor computing? Or does anyone have an idea about this result?

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27779/Screen_Shot_2016-09-21_at_5.png>

Thank you very much in advance.

//gusiri