Re: YARN Shuffle service and its compatibility

Marcin Tustin Mon, 18 Apr 2016 13:55:26 -0700

I'm good with option B at least until it blocks something utterly wonderful
(like shuffles are 10x faster).


On Mon, Apr 18, 2016 at 4:51 PM, Mark Grover <m...@apache.org> wrote:

> Hi all,
> If you don't use Spark on YARN, you probably don't need to read further.
>
> Here's the *user scenario*:
> There are going to be folks who may be interested in running two versions
> of Spark (say Spark 1.6.x and Spark 2.x) on the same YARN cluster.
>
> And, here's the *problem*:
> That's all fine, should work well. However, there's one problem that
> relates to the YARN shuffle service
> <https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java>.
> This service is run by the YARN Node Managers on all nodes of the cluster
> that have YARN NMs as an auxillary service
> <https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html>
> .
>
> The key question here is -
> Option A:  Should the user be running 2 shuffle services - one for Spark
> 1.6.x and one for Spark 2.x?
> OR
> Option B: Should the user be running only 1 shuffle service that services
> both the Spark 1.6.x and Spark 2.x installs? This will likely have to be
> the Spark 1.6.x shuffle service (while ensuring it's forward compatible
> with Spark 2.x).
>
> *Discussion of above options:*
> A few things to note about the shuffle service:
> 1. Looking at the commit history, there aren't a whole of lot of changes
> that go into the shuffle service, rarely ones that are incompatible.
> There's only one incompatible change
> <https://issues.apache.org/jira/browse/SPARK-12130> that's been made to
> the shuffle service, as far as I can tell, and that too, seems fairly
> cosmetic.
> 2. Shuffle services for 1.6.x and 2.x serve very similar purpose (to
> provide shuffle blocks) and can easily be just one service that does it,
> even on a YARN cluster that runs both Spark 1.x and Spark 2.x.
> 3. The shuffle service is not version-spaced. This means that, the way the
> code is currently, if we were to drop the jars for Spark1 and Spark2's
> shuffle service in YARN NM's classpath, YARN NM won't be able to start both
> services. It would arbitrarily pick one service to start (based on what
> appears on the classpath first). Also, the service name is hardcoded
> <https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java#L96>
> in Spark code and that name is also not version-spaced.
>
> Option A is arguably cleaner but it's more operational overhead and some
> code relocation/shading/version-spacing/name-spacing to make it work (due
> to #3 above), potentially to not a whole lot of value (given #2 above).
>
> Option B is simpler, lean and more operationally efficient. However, that
> requires that we as a community, keep Spark 1's shuffle service forward
> compatible with Spark 2 i.e. don't break compatibility between Spark1's and
> Spark2's shuffle service. We could even add a test (mima?) to assert that
> during the life time of Spark2. If we do go down that way, we should revert
> SPARK-12130 <https://issues.apache.org/jira/browse/SPARK-12130> - the
> only backwards incompatible change made to Spark2 shuffle service so far.
>
> My personal vote goes towards Option B and I think reverting SPARK-12130
> is ok. What do others think?
>
> Thanks!
> Mark
>
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
 led 
by Fidelity

Re: YARN Shuffle service and its compatibility

Reply via email to