Hi,

I have a question about when data points get sorted in a simple MapReduce-style job.

I have the following code:

data = readFromSource()

data.map(....).groupBy(0).reduce(...)

This code will be translated into the following execution plan:

map -> combiner -> hash partitioning and sorting on 0 -> reduce.


If I am right, the combiner first sorts the data, then applies the combine function, and finally partitions the result.
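To make this assumption concrete, here is a minimal plain-Python sketch of the combiner side as I understand it. It is not Flink code: the function name combine_and_partition, the sum-based combine function, and the modulo-hash partitioner are all made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def combine_and_partition(records, num_reducers):
    """Sort by key, pre-aggregate per key, then hash-partition the result.

    Hypothetical illustration only; this does not mirror Flink internals.
    """
    # Step 1: sort the local records on field 0 (the grouping key).
    sorted_records = sorted(records, key=itemgetter(0))

    # Step 2: apply the combine function per key (assumed here: sum of field 1).
    combined = [
        (key, sum(value for _, value in group))
        for key, group in groupby(sorted_records, key=itemgetter(0))
    ]

    # Step 3: hash-partition the combined records. Because they are appended
    # in sorted order, each partition is itself sorted by key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in combined:
        partitions[hash(key) % num_reducers].append((key, value))
    return partitions


partitions = combine_and_partition([(3, 1), (1, 2), (3, 4), (2, 5), (1, 1)], 2)
```

Under these assumptions, every partition leaves the combiner sorted by key, which is why I would expect the data of a single sender to arrive in order.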

Now the partitions are consumed by the reducers. Each reducer has one input gateway per mapper/combiner machine. For example, if the mappers and combiners run on 10 machines, each reducer has 10 input gateways. The reducer consumes the data via a MutableObjectIterator, which first consumes data from one input gateway, then from the next, and so on. Is the data from a single input gateway already sorted, given that the combiner has sorted it before sending? Is the order of the data points preserved after they are sent over the network?
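The reason I ask can be shown with a small plain-Python sketch (again not Flink code; gateway_a and gateway_b are hypothetical stand-ins for two input gateways). Even if each gateway delivers its records sorted by key, reading one gateway after the other does not give a globally sorted stream, so the reducer would need at least a merge step, or a full re-sort if the per-gateway order is not guaranteed.

```python
import heapq
from itertools import chain

# Assumption: each gateway delivers (key, value) records already sorted by key.
gateway_a = [(1, 3), (3, 5), (7, 2)]
gateway_b = [(1, 4), (5, 1), (7, 9)]

# Consuming one gateway after the other keeps each run sorted,
# but the combined stream is not sorted by key.
concatenated = list(chain(gateway_a, gateway_b))

# A k-way merge over the sorted runs restores a global order by key,
# so equal keys end up adjacent and can be handed to the reduce function.
merged = list(heapq.merge(gateway_a, gateway_b, key=lambda record: record[0]))
```

If the per-gateway order were not preserved over the network, the merge alone would not be enough and the reducer would have to sort its complete input.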

In my code, the MutableObjectIterator instances are subclasses of NormalizedKeySorter. Does this mean that the data from an input gateway is sorted first, before it is handed over to the reduce function? Is this because the order of the data points is not maintained after they are sent through the network?


It would be nice if someone could answer my questions. If my assumptions are wrong, please correct me :)


BR,

Hilmi




--
==================================================================
Hilmi Yildirim, M.Sc.
Researcher

DFKI GmbH
Intelligente Analytik für Massendaten
DFKI Projektbüro Berlin
Alt-Moabit 91c
D-10559 Berlin
Phone: +49 30 23895 1814

E-Mail: hilmi.yildi...@dfki.de

