Hi Keyong,

Unlike the Spark/Flink clients, we had to modify the MR3 runtime code directly to support Celeborn, so no new code is added to Celeborn itself. We release the MR3 runtime code on GitHub, where it can serve as an example of using Celeborn.

The API is clean and the code is clearly structured and easy to follow, but a few questions have come up as we test the MR3-Celeborn extension, especially about dealing with exceptions (e.g., Premature EOF from inputStream). I hope this mailing list is the right place to ask such questions.
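
For concreteness, the fetch path in question looks roughly like the sketch below. This is illustrative only: ShuffleReadClient stands in for the Celeborn client, and shuffleId, partitionId, and attemptNumber are placeholders for the readPartition arguments elided later in this thread.

  import java.io.IOException;
  import java.io.InputStream;

  // Sketch only: ShuffleReadClient and the leading readPartition arguments
  // are placeholders, not the exact Celeborn API.
  class FetchWithErrorHandling {
      interface ShuffleReadClient {
          InputStream readPartition(int shuffleId, int partitionId, int attemptNumber,
                                    int startMapIndex, int endMapIndex) throws IOException;
      }

      // Drain one mapper's slice of the partition. An IOException such as
      // "Premature EOF from inputStream" propagates to the caller, which must
      // decide between retrying the fetch and failing the task attempt.
      static long fetch(ShuffleReadClient client, int shuffleId, int partitionId,
                        int attemptNumber, int mapIndex) throws IOException {
          long totalBytes = 0;
          try (InputStream in = client.readPartition(shuffleId, partitionId,
                  attemptNumber, mapIndex, mapIndex + 1)) {
              byte[] buffer = new byte[64 * 1024];
              int n;
              while ((n = in.read(buffer)) != -1) {
                  totalBytes += n;  // real code would feed the bytes to the merger
              }
          }
          return totalBytes;
      }
  }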

Best,

--- Sungwoo

On Mon, 17 Jul 2023, Keyong Zhou wrote:

Hi Sungwoo,

It's really great to hear that! To be honest, we never expected such things would happen.

Just curious, would it be possible for you to contribute the MR3 integration to the Celeborn community?
It would be a great feature for Celeborn, and the community could then work together to better support MR3 (and Hive).

Thanks,
Keyong Zhou



On Sun, 16 Jul 2023 at 22:33, <o...@pl.postech.ac.kr> wrote:

We have extended the implementation of MR3 so that all partition
inputs can be fetched with a single call, e.g.:

   rssShuffleClient.readPartition(..., 0, 100)
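
In sketch form, the reducer-side read now looks roughly like the following. The names here are placeholders: ShuffleReadClient stands in for the Celeborn client, and shuffleId, partitionId, and attemptNumber stand for the arguments elided above.

  import java.io.IOException;
  import java.io.InputStream;

  // Sketch only: ShuffleReadClient and the leading readPartition arguments
  // are placeholders for the ones elided above.
  class SingleCallRead {
      interface ShuffleReadClient {
          InputStream readPartition(int shuffleId, int partitionId, int attemptNumber,
                                    int startMapIndex, int endMapIndex) throws IOException;
      }

      // One call covers the map-index range [0, numMappers), so the reducer
      // consumes the outputs of all mappers through a single InputStream.
      static long readAllMapOutputs(ShuffleReadClient client, int shuffleId,
                                    int partitionId, int attemptNumber,
                                    int numMappers) throws IOException {
          long totalBytes = 0;
          try (InputStream in = client.readPartition(shuffleId, partitionId,
                  attemptNumber, 0, numMappers)) {
              byte[] buffer = new byte[64 * 1024];
              int n;
              while ((n = in.read(buffer)) != -1) {
                  totalBytes += n;  // feed the bytes to the reducer's merge path
              }
          }
          return totalBytes;
      }
  }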

Now, Hive-MR3 with Celeborn runs as fast as Hive-MR3 with its own shuffle
handlers when tested with the 10TB TPC-DS benchmark. For some queries, it is
even noticeably faster.

Thanks,

--- Sungwoo

On Thu, 13 Jul 2023, o...@pl.postech.ac.kr wrote:

Hi Team,

I have a question on how a reducer should fetch the output of mappers.
As an example, consider this standard scenario:

1. There are 100 mappers and 50 reducers.
2. Each mapper creates 50 partitions, each of which is to be fetched by the
corresponding reducer.
3. Each reducer is responsible for a single partition and tries to fetch 100
partitions (one from each mapper).

In our current implementation, a reducer calls
rssShuffleClient.readPartition() 100 times (once for each mapper):

 rssShuffleClient.readPartition(..., mapIndex, mapIndex + 1)
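
In sketch form, the loop is roughly the following (illustrative only: ShuffleReadClient and the leading readPartition arguments are placeholders for the ones elided above):

  import java.io.IOException;
  import java.io.InputStream;

  // Sketch only: ShuffleReadClient and the leading readPartition arguments
  // are placeholders for the ones elided above.
  class PerMapperRead {
      interface ShuffleReadClient {
          InputStream readPartition(int shuffleId, int partitionId, int attemptNumber,
                                    int startMapIndex, int endMapIndex) throws IOException;
      }

      static void readAll(ShuffleReadClient client, int shuffleId, int partitionId,
                          int attemptNumber, int numMappers) throws IOException {
          // 100 calls in the scenario above: one per mapper, each covering
          // the map-index range [mapIndex, mapIndex + 1).
          for (int mapIndex = 0; mapIndex < numMappers; mapIndex++) {
              try (InputStream in = client.readPartition(shuffleId, partitionId,
                      attemptNumber, mapIndex, mapIndex + 1)) {
                  byte[] buffer = new byte[64 * 1024];
                  while (in.read(buffer) != -1) {
                      // real code would hand each mapper's bytes to the merger
                  }
              }
          }
      }
  }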

My question is: if reducers start after the completion of all mappers, can we
call (or should we try to call) rssShuffleClient.readPartition() only once,
as in the following?

 rssShuffleClient.readPartition(..., 0, 100)

My understanding of a remote shuffle service (like Magnet for Spark) is that
all the partitions destined to the same reducer are automatically merged by
the shuffle service, so we thought that just a single call might be enough.

Thanks,

--- Sungwoo Park
