Note that with AWS, you can use Placement Groups <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html> and EC2 instances with Enhanced Networking <http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html> to lower network latency and increase network throughput within the same AZ (data center).
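As a sketch of what that looks like in code (using boto3; the placement group name, AMI ID, and instance type below are illustrative placeholders, not from this thread):

```python
# Sketch: launch EC2 instances into a 'cluster' placement group so Spark
# executors sit close together on the network. Instance types like
# c4.8xlarge support enhanced networking (SR-IOV) when launched from an
# HVM AMI. All names/IDs here are hypothetical examples.

def launch_params(group_name, ami, count, instance_type="c4.8xlarge"):
    """Build run_instances kwargs that pin all nodes into one placement group."""
    return {
        "ImageId": ami,
        "MinCount": count,
        "MaxCount": count,
        "InstanceType": instance_type,
        "Placement": {"GroupName": group_name},
    }

def launch_cluster(group_name, ami, count):
    """Create the placement group and launch the nodes (needs AWS credentials)."""
    import boto3  # assumed available; not part of the standard library
    ec2 = boto3.client("ec2")
    # 'cluster' strategy packs instances into a low-latency group in one AZ
    ec2.create_placement_group(GroupName=group_name, Strategy="cluster")
    return ec2.run_instances(**launch_params(group_name, ami, count))

# Usage (hypothetical):
#   launch_cluster("spark-pg", "ami-12345678", 4)
```

Note that all instances in a `cluster` placement group must land in the same Availability Zone, which is exactly the "within the same AZ" constraint mentioned above.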
On Tue, Dec 22, 2015 at 12:11 AM, Eran Witkon <eranwit...@gmail.com> wrote:

> I'll check it out.
>
> On Tue, 22 Dec 2015 at 00:30 Michal Klos <michal.klo...@gmail.com> wrote:
>
>> If you are running on Amazon, then it's always a crapshoot as well.
>>
>> M
>>
>> On Dec 21, 2015, at 4:41 PM, Josh Rosen <joshro...@databricks.com> wrote:
>>
>> @Eran, are Server 1 and Server 2 both part of the same cluster / do they
>> have similar positions in the network topology w.r.t. the Spark executors?
>> If Server 1 had fast network access to the executors but Server 2 was
>> across a WAN, then I'd expect the job to run slower from Server 2 due to
>> the extra network latency / reduced bandwidth. This assumes that you're
>> running the driver in non-cluster deploy mode (so the driver process runs
>> on the machine which submitted the job).
>>
>> On Mon, Dec 21, 2015 at 1:30 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>>
>>> Look for differences: package versions, CPU/network/memory diffs, etc.
>>>
>>> On 21 December 2015 at 14:53, Eran Witkon <eranwit...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I know it is a wide question, but can you think of reasons why a pyspark
>>>> job which runs from server 1 using user 1 will run faster than the same
>>>> job when running on server 2 with user 1?
>>>> Eran

--
*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
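One concrete way to act on the advice in this thread (compare network characteristics of the two submission servers before blaming Spark itself) is a quick round-trip-time check from each server to the same executor host. The hostname and port below are placeholders; in client (non-cluster) deploy mode the driver runs on the submitting machine, so a slow link from that machine shows up as a slow job:

```python
# Diagnostic sketch: measure median TCP connect round-trip time from the
# current machine to a given host/port. Run it from Server 1 and Server 2
# against the same executor node and compare the numbers.
import socket
import time

def tcp_rtt_ms(host, port, samples=5):
    """Return the median TCP connect round-trip time in milliseconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # open and immediately close a TCP connection, timing the handshake
        with socket.create_connection((host, port), timeout=5):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[len(times) // 2]

# Usage (hypothetical host/port; run once per submission server):
#   print(tcp_rtt_ms("executor-host.example.com", 7337))
```

A large gap between the two servers' numbers would point at network topology (e.g. one server across a WAN), matching Josh's explanation above; similar numbers would suggest looking at package versions, CPU, or memory instead, as Igor suggests.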