The analog to PairRDD is a GroupedDataset (created by calling groupBy),
which offers similar functionality, but doesn't require you to construct
new objects in the form of key/value pairs. It doesn't matter if
they are complex objects, as long as you can create an encoder for them.
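As a rough illustration of that difference, here is a plain-Python model (not Spark code). A PairRDD-style grouping needs each record reshaped into an explicit `(key, value)` tuple first, while a groupBy-style grouping takes a key function and groups the original objects as they are. The `Student` class and its `region` field are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Hypothetical record type; any complex object works the same way.
    name: str
    region: int


def group_pairs(pairs):
    """PairRDD style: the input must already be (key, value) tuples."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_by(records, key_fn):
    """groupBy style: group whole objects by a key function."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    return dict(groups)


students = [Student("a", 1), Student("b", 2), Student("c", 1)]

# PairRDD style needs an explicit reshaping step before grouping.
by_pairs = group_pairs((s.region, s) for s in students)
# groupBy style keeps the objects unchanged and only supplies the key.
by_key = group_by(students, lambda s: s.region)
```

In Spark 1.6 the key function is what you pass to `Dataset.groupBy` to get a `GroupedDataset`; in Spark 2.x the equivalent method is `groupByKey`, which returns a `KeyValueGroupedDataset`. In both cases an encoder is needed for the key type.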
Thanks - this helps a lot, except for the issue of looking at schools in
neighboring regions.
On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust wrote:
> The analog to PairRDD is a GroupedDataset (created by calling groupBy),
> which offers similar functionality, but
Yeah, that's tough. Perhaps you could do something like a flatMap and emit
one virtual copy of each student for each region that neighbors
their actual region.
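A minimal plain-Python model of that suggestion, under loudly stated assumptions: items are hypothetical `(name, position)` tuples, regions are 10-unit cells on a line, a region's neighbors are the cells on either side, and `distance` is a cheap stand-in for the expensive measure. Each student is emitted once for its home region and once per neighboring region (the flatMap step), schools are grouped by their actual region, the distance is evaluated only for student/school pairs that share a region, and the results are reduced to one closest school per student.

```python
from collections import defaultdict


def region_of(item):
    # Hypothetical: an item is (name, position); regions are 10-unit cells.
    return int(item[1] // 10)


def neighbors(region):
    # Hypothetical topology: the regions immediately on either side.
    return [region - 1, region + 1]


def distance(a, b):
    # Stand-in for the expensive distance measure.
    return abs(a[1] - b[1])


def closest_schools(students, schools):
    # flatMap step: one virtual copy of each student for its own region
    # and for every neighboring region.
    students_by_region = defaultdict(list)
    for st in students:
        home = region_of(st)
        for r in [home] + neighbors(home):
            students_by_region[r].append(st)

    # Group schools by their actual region.
    schools_by_region = defaultdict(list)
    for sc in schools:
        schools_by_region[region_of(sc)].append(sc)

    # Per region, evaluate the distance only for local pairs, then reduce
    # per student to the best candidate seen across all of its copies.
    best = {}
    for r, local_students in students_by_region.items():
        for st in local_students:
            for sc in schools_by_region.get(r, []):
                d = distance(st, sc)
                if st[0] not in best or d < best[st[0]][0]:
                    best[st[0]] = (d, sc[0])
    return {name: school for name, (_, school) in best.items()}


students = [("ann", 12.0), ("bob", 19.0), ("cy", 41.0)]
schools = [("north", 21.0), ("south", 5.0)]
result = closest_schools(students, schools)
```

The design trade-off shows up in `cy`: a student with no school in its own or neighboring regions gets no answer at all, so this is only exact if the region size guarantees that the closest school always lies within that neighborhood. In Spark the same shape would be a `flatMap` over the students followed by grouping or joining on the region key.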
On Wed, Jan 20, 2016 at 10:50 AM, Steve Lewis wrote:
> Thanks - this helps a lot except for the issue
We have been working on a large search problem, which we have been solving in
the following ways.
We have two sets of objects, say children and schools. The objective is to
find the closest school to each child. There is a distance measure, but it
is relatively expensive and would be very costly to apply