This was precisely the issue I ran into toward the end of my GSoC project. There's a commit from one or two months ago to the Hadoop mapreduce package that adds a 0.20.2-compatible CompositeInputFormat, but it's not in an official release yet. In my case, I wrote a small workaround that should last until the next release, but it's not suitable for the DistributedRowMatrix...so I'm working on something else :) Suggestions are certainly welcome in the meantime!

On 9/5/2010 1:23 PM, Jeff Eastman wrote:
CompositeInputFormat needs to be ported, or an alternative developed, using the 0.20.2 API. Perhaps a good sub-project?

On 9/4/10 3:41 PM, Jake Mannix wrote:
+1 for attempting this, but beware: DistributedRowMatrix uses map-side
joins, and I'm not sure those are supported in the 0.20+ API. In fact, I
have specifically run into problems because of this when I tried it in the
past.

Now, some methods can just, well, get slower by applying two-pass approaches
(reduce-side join plus a second pass) to one-pass solvable problems, but a
second pass over the data is a pretty bitter pill to swallow. Finding a way
to do a map-side join in 0.20 would be nicer, if possible.

   -jake

On Sat, Sep 4, 2010 at 8:02 AM, Jeff Eastman <[email protected]> wrote:

+1 A user mandate, a motivated developer, perfect. You have my support,
Shannon; let me know if you run into problems.


On 9/3/10 12:17 PM, Shannon Quinn wrote:

Apologies for missing this; I was actually very interested in porting the DRM to 0.20.2, considering how much my GSoC project relies on it.

Unless someone has already volunteered...in which case I'd love to help :)

Shannon

Apologies for the brevity, this was sent from my iPhone

On Sep 3, 2010, at 15:11, Sebastian Schelter <[email protected]> wrote:

I'd like to see it ported, so RowSimilarityJob can become a method of
DistributedRowMatrix.

On 03.09.2010 20:48, Jeff Eastman wrote:

Is anybody working on this? Has anybody else looked at it? It seems
to have a few unported dependencies like some of the classifiers.


