Hey Grant-

I believe scaling Mean-Shift clustering using M/R will be pretty
straightforward. I'm not as sure about K-Means using KD-Trees, since I
haven't personally implemented that algorithm, but since it follows K-Means
fairly closely I imagine it is possible.
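
To make the M/R claim a bit more concrete, here is a minimal sketch of one way
Mean-Shift could be split into map and reduce steps. It is only an
illustration: the Gaussian kernel, the fixed bandwidth, and the function names
(map_partition, reduce_modes, mean_shift) are all assumptions of mine, and this
is plain Python rather than Hadoop code.

```python
import math

def map_partition(modes, partition, bandwidth):
    """Map step: for every current mode estimate, emit the kernel-weighted
    partial sum (and total weight) contributed by one partition of the data."""
    for idx, mode in enumerate(modes):
        weight_sum = 0.0
        weighted = [0.0] * len(mode)
        for x in partition:
            d2 = sum((a - b) ** 2 for a, b in zip(mode, x))
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # assumed Gaussian kernel
            weight_sum += w
            for k, xk in enumerate(x):
                weighted[k] += w * xk
        yield idx, (weighted, weight_sum)

def reduce_modes(emitted, n_modes, dim):
    """Reduce step: add up the partial sums per mode index, then normalize
    to get the shifted position of each mode."""
    sums = [[0.0] * dim for _ in range(n_modes)]
    weights = [0.0] * n_modes
    for idx, (weighted, weight_sum) in emitted:
        weights[idx] += weight_sum
        for k in range(dim):
            sums[idx][k] += weighted[k]
    return [[s / weights[i] for s in sums[i]] for i in range(n_modes)]

def mean_shift(points, bandwidth, partitions=2, iterations=30):
    """Run a fixed number of map/reduce rounds, starting one mode at each point."""
    modes = [list(p) for p in points]
    chunks = [points[i::partitions] for i in range(partitions)]
    for _ in range(iterations):
        emitted = [pair for chunk in chunks
                   for pair in map_partition(modes, chunk, bandwidth)]
        modes = reduce_modes(emitted, len(modes), len(points[0]))
    return modes

points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
modes = mean_shift(points, bandwidth=1.0)
```

Each round only needs per-partition sums, which is why the update seems to fit
the map/reduce pattern; a real job would also need a convergence test and a
final pass that merges nearby modes into clusters.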

I'll get to work on a proposal with some of my ideas, and hopefully get some
feedback from you guys during the process.

Thanks for all the responses so far.

Matt

On Thu, Mar 6, 2008 at 3:25 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> I haven't read the papers, but the big question is do you think they
> can scale using M/R or some other distributed techniques?
>
> If so, feel free to write up a bit of a proposal using the info at:
> http://wiki.apache.org/general/SummerOfCode2008
> If you are unsure, that is fine too.  We could start with a simpler
> implementation, and then look to distribute it.
>
>
> On Mar 6, 2008, at 2:45 PM, Matthew Riley wrote:
>
> > Hey Jeff-
> >
> > I'm certainly willing to put some energy into developing implementations
> > of these algorithms, and it's good to hear that you may be interested in
> > guiding us in the right direction.
> >
> > Here are the references I learned the algorithms from- some are more
> > detailed than others:
> >
> > Mean-Shift clustering was introduced here and this paper is a thorough
> > reference:
> > Mean-Shift: A Robust Approach to Feature Space Analysis
> > http://courses.csail.mit.edu/6.869/handouts/PAMIMeanshift.pdf
> >
> > And here's a PDF with just the guts of the algorithm outlined:
> > homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf
> >
> > It looks like there isn't a definitive reference for the k-means
> > approximation with randomized k-d trees, but there are promising results
> > introduced here:
> >
> > Object retrieval with large vocabularies and fast spatial matching:
> > http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin07.pdf
> >
> > And a deeper explanation of the technique here:
> >
> > Randomized KD-Trees for Real-Time Keypoint Detection:
> > ieeexplore.ieee.org/iel5/9901/31473/01467521.pdf?arnumber=1467521
> >
> > Let me know what you think.
> >
> > Matt
> >
> > On Thu, Mar 6, 2008 at 11:45 AM, Jeff Eastman <[EMAIL PROTECTED]>
> > wrote:
> >
> >> Hi Matthew,
> >>
> >> As with most open source projects, "interest" is mainly a function of
> >> the willingness of somebody to contribute their energy. Clustering is
> >> certainly within the scope of the project. I'd be interested in
> >> exploring additional clustering algorithms with you and your colleague.
> >> I'm a complete noob in this area and it is always enlightening to work
> >> with students who have more current theoretical exposures.
> >>
> >> Do you have some links on these approaches that you find particularly
> >> helpful?
> >>
> >> Jeff
> >>
> >> -----Original Message-----
> >> From: Matthew Riley [mailto:[EMAIL PROTECTED]
> >> Sent: Wednesday, March 05, 2008 11:11 PM
> >> To: mahout-dev@lucene.apache.org; [EMAIL PROTECTED]
> >> Subject: Re: Google Summer of Code
> >>
> >> Hey everyone-
> >>
> >> I've been watching the mailing list for a little while now, hoping to
> >> contribute once I became more familiar, but I wanted to jump in here now
> >> and express my interest in the Summer of Code project. I'm currently a
> >> graduate student in electrical engineering at UT-Austin working in
> >> computer vision, which is closely tied to many of the problems Mahout is
> >> addressing (especially in my area of content-based retrieval).
> >>
> >> What can I do to help out?
> >>
> >> I've discussed some potential Mahout projects with another student
> >> recently- mostly focused around approximate k-means algorithms (since
> >> that's a problem I've been working on lately). It sounds like you guys
> >> are already implementing canopy clustering for k-means- Is there any
> >> interest in developing another approximation algorithm based on
> >> randomized kd-trees for high dimensional data? What about mean-shift
> >> clustering?
> >>
> >> Again, I would be glad to help in any way I can.
> >>
> >> Matt
> >>
> >> On Thu, Mar 6, 2008 at 12:56 AM, Isabel Drost <[EMAIL PROTECTED]
> >> drost.de>
> >> wrote:
> >>
> >>> On Saturday 01 March 2008, Grant Ingersoll wrote:
> >>>> Also, any thoughts on what we might want someone to do?  I think it
> >>>> would be great to have someone implement one of the algorithms on our
> >>>> wiki.
> >>>
> >>> Just as a general note, the deadline for applications:
> >>>
> >>> March 12: Mentoring organization application deadline (12 noon
> >>> PDT/19:00 UTC).
> >>>
> >>> I suppose we should identify interesting tasks by that deadline. As a
> >>> general guideline for mentors and for project proposals:
> >>>
> >>> http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors
> >>>
> >>> Isabel
> >>>
> >>> --
> >>> Better late than never.         -- Titus Livius (Livy)
> >>>  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
> >>> /,`.-'`'    -.  ;-;;,_
> >>> |,4-  ) )-,_..;\ (  `'-'
> >>> '---''(_/--'  `-'\_) (fL)  IM:  <xmpp://[EMAIL PROTECTED]>
> >>>
> >>
>
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
