Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
The analog to PairRDD is a GroupedDataset (created by calling groupBy), which offers similar functionality but doesn't require you to construct new objects in the form of key/value pairs. It doesn't matter if they are complex objects, as long as you can create an encoder for them.
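
To illustrate the point about grouping without building key/value pairs first, here is a minimal sketch in plain Python. It is not the Spark API: `Student`, `group_by`, and the sample data are hypothetical names chosen for illustration. The only idea it shows is that the key comes from a function applied to each object, so the objects themselves stay unchanged.

```python
from collections import defaultdict
from typing import NamedTuple


class Student(NamedTuple):
    # Hypothetical record type; any object works as long as key_fn can read it.
    name: str
    region: str


def group_by(items, key_fn):
    """Group items by key_fn(item) without wrapping them in (key, value) pairs."""
    groups = defaultdict(list)
    for item in items:
        groups[key_fn(item)].append(item)
    return dict(groups)


students = [
    Student("ann", "north"),
    Student("bob", "south"),
    Student("cy", "north"),
]

# The key is derived on the fly; the Student objects are stored as-is.
by_region = group_by(students, lambda s: s.region)
```

In the pair-based style the caller would first map each student to a `(region, student)` tuple; here that step disappears because the grouping function is passed in directly.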

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
Thanks - this helps a lot except for the issue of looking at schools in neighboring regions

On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust wrote:
> The analog to PairRDD is a GroupedDataset (created by calling groupBy),
> which offers similar functionality, but

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
Yeah, that's tough. Perhaps you could do something like a flatMap and emit multiple virtual copies of each student, one for each region neighboring their actual region.

On Wed, Jan 20, 2016 at 10:50 AM, Steve Lewis wrote:
> Thanks - this helps a lot except for the issue
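
The flatMap suggestion above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not Spark code and not the posters' actual solution: regions are assumed to be square grid cells of width `CELL`, distance is assumed to be Euclidean, and `region_of`, `neighbors`, and `closest_schools` are hypothetical helper names. Each student is emitted once for every cell in the 3x3 block around their own cell, schools are grouped by their actual cell only, and the expensive distance is computed just for student/school pairs that meet in the same cell.

```python
from collections import defaultdict
from math import hypot

CELL = 1.0  # assumed width of one square region


def region_of(x, y):
    """Map a point to the grid cell (region) that contains it."""
    return (int(x // CELL), int(y // CELL))


def neighbors(region):
    """The region itself plus the eight cells surrounding it."""
    rx, ry = region
    return [(rx + dx, ry + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def closest_schools(students, schools):
    """students and schools are lists of (name, x, y) tuples.

    Returns {student_name: (school_name, distance)} for every student that
    has at least one school in its own or a neighboring region.
    """
    # flatMap step: one virtual copy of each student per neighboring region.
    copies = defaultdict(list)
    for student in students:
        for region in neighbors(region_of(student[1], student[2])):
            copies[region].append(student)

    # Schools are keyed by their actual region only.
    schools_by_region = defaultdict(list)
    for school in schools:
        schools_by_region[region_of(school[1], school[2])].append(school)

    # Join per region and keep the nearest school seen for each student.
    best = {}
    for region, region_students in copies.items():
        for name, sx, sy in region_students:
            for school_name, cx, cy in schools_by_region.get(region, []):
                d = hypot(sx - cx, sy - cy)
                if name not in best or d < best[name][1]:
                    best[name] = (school_name, d)
    return best
```

One limitation of this sketch: a student whose nearest school lies more than one region away gets no match, so the region size would need to be chosen with that in mind.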

I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Steve Lewis
We have been working on a large search problem which we have been solving in the following ways. We have two sets of objects, say children and schools. The object is to find the closest school to each child. There is a distance measure but it is relatively expensive and would be very costly to apply