[ https://issues.apache.org/jira/browse/IMPALA-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827317#comment-16827317 ]
Ruslan Dautkhanov commented on IMPALA-6088: ------------------------------------------- Would be also interesting to look if something like Torrent protocol helps here too. This Torrent protocol can take into account rack location, if available. Spark uses this to avoid -query coordinator- Spark Driver to be a network bottleneck https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala {quote} * The driver divides the serialized object into small chunks and * stores those chunks in the BlockManager of the driver. * On each executor, the executor first attempts to fetch the object from its BlockManager. If * it does not exist, it then uses remote fetches to fetch the small chunks from the driver and/or * other executors if available. Once it gets the chunks, it puts the chunks in its own * BlockManager, ready for other executors to fetch from. * This prevents the driver from being the bottleneck in sending out multiple copies of the * broadcast data (one per executor). {quote} > Rack aware broadcast operator > ----------------------------- > > Key: IMPALA-6088 > URL: https://issues.apache.org/jira/browse/IMPALA-6088 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec > Reporter: Mostafa Mokhtar > Priority: Major > > When conducting large scale experiments on a 6 rack cluster with aggregator > core network topology overall cluster bandwidth utilization was limited. > With aggregator core networks nodes and racks are not equidistant, which > means a broadcast operation can be inefficient as the broadcasting node needs > to send the same data N times to each node on a remote rack. > Ideally Rowbatches should be sent once per remote rack then a node on each > remote rack would broadcast within its rack. > Table below represent rack to rack latency for the 90% of operations, ration > between best and worst case is 7.3x > | || va|| vc|| vd1|| vd3|| ve| > |va| 4,238| 4,290| 9,692| 8,897| 8,208| > |vc| 9,290| 4,396| 30,952| 13,529| 14,578| > |vd1| 9,131| 29,066| 4,346| 17,265| 16,849| > |vd3| 7,409| 15,517| 17,265| 4,370| 4,687| > |ve| 4,914| 16,894| 16,430| 4,713| 4,472| -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org