+1 for attempting this, but beware: DistributedRowMatrix uses map-side
joins, and I'm not sure those are supported in the 0.20+ API.  In fact, I
have specifically run into problems because of this when I tried it in the
past.

Now, some methods can just, well, get slower by applying a two-pass approach
(reduce-side join plus a second pass) to problems that are solvable in one
pass, but a second pass over the data is a pretty bitter pill to swallow.
Finding a way to do a map-side join in 0.20 would be nicer, if possible.
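
To make the trade-off concrete, here is a rough sketch in plain Python (not
Hadoop code, and not how DistributedRowMatrix is implemented). It assumes
each input is a list of (row_key, vector) pairs with unique keys, and that
the map-side inputs are already sorted by key; the function names are made up
for illustration. The map-side version pairs rows in one streaming pass; the
reduce-side version has to tag and regroup every record first, which stands
in for the shuffle.

```python
from collections import defaultdict


def map_side_join(rows_a, rows_b):
    """Merge two inputs that are already sorted by row key, in one pass."""
    it_a, it_b = iter(rows_a), iter(rows_b)
    a, b = next(it_a, None), next(it_b, None)
    joined = []
    while a is not None and b is not None:
        if a[0] == b[0]:
            # Same row key in both inputs: emit the pair and advance both.
            joined.append((a[0], a[1], b[1]))
            a, b = next(it_a, None), next(it_b, None)
        elif a[0] < b[0]:
            a = next(it_a, None)
        else:
            b = next(it_b, None)
    return joined


def reduce_side_join(rows_a, rows_b):
    """Tag records by source, group them by key, then pair them up."""
    grouped = defaultdict(dict)  # stands in for the shuffle/sort phase
    for key, value in rows_a:
        grouped[key]["a"] = value
    for key, value in rows_b:
        grouped[key]["b"] = value
    return [
        (key, sides["a"], sides["b"])
        for key, sides in sorted(grouped.items())
        if "a" in sides and "b" in sides
    ]


rows_a = [(0, [1, 2]), (1, [3, 4]), (3, [5, 6])]
rows_b = [(0, [7]), (1, [8]), (2, [9])]
map_result = map_side_join(rows_a, rows_b)
reduce_result = reduce_side_join(rows_a, rows_b)
```

Both produce the same pairs; the difference is that the reduce-side version
must move and regroup all of the data before any real work can start, and the
actual computation then needs its own pass over the joined output.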

  -jake

On Sat, Sep 4, 2010 at 8:02 AM, Jeff Eastman <[email protected]> wrote:

>  +1 A user mandate, a motivated developer, perfect. You have my support
> Shannon, let me know if you run into problems.
>
>
> On 9/3/10 12:17 PM, Shannon Quinn wrote:
>
>> Apologies for missing this; I was actually very interested in doing the
>> DRM porting to 20.2, considering how much my GSoC project relies on it.
>>
>> Unless someone has already volunteered...in which case I'd love to help :)
>>
>> Shannon
>>
>> Apologies for the brevity, this was sent from my iPhone
>>
>> On Sep 3, 2010, at 15:11, Sebastian Schelter<[email protected]>  wrote:
>>
>>> I'd like to see it ported, so RowSimilarityJob can become a method of
>>> DistributedRowMatrix.
>>>
>>> On 03.09.2010 20:48, Jeff Eastman wrote:
>>>
>>>> Is anybody working on this? Has anybody else looked at it? It seems
>>>> to have a few unported dependencies like some of the classifiers.
>>>>
>>>
>
