Re: Google Summer of Code: Bring out your projects

2010-03-12 Thread Grant Ingersoll

On Mar 12, 2010, at 1:22 AM, Robin Anil wrote:

 Shall I go and put some of the ideas up? I will do it as a whole for the
 project. Later we can re-assign things, maybe? How does that sound? Unlike
 other projects, we can't really go and put up a proposal like "Implement
 back-propagation" and expect a student to take it up and reduce things to
 map/reduce.
 
 Some of the ideas (I am going to be really ambitious/vague here, but write
 clear expectations or guidelines on what is an ideal proposal)
 
 1) Implement a cool classifier over map/reduce
 2) Implement a cool clustering algorithm on map/reduce
 3) Implement a meta-learner to plug in to various classifiers in Mahout and
 have bagging, boosting support.
 4) Continuous performance benchmarking/dashboard, maybe wrappers over EC2
 5) Create matrix implementations of MySQL and NoSQL (HBase, Cassandra)
 access for all the algorithms to use.
 6) Implement some of the ideas from the Netflix top 5 to boost the
 recommendations package
 7) Visualization tool for clustering, classification or recommendation.
 Ability to explain (optional)
 8) Improve the mahout-math package

9. Implement M/R Tika integration to take rich documents on HDFS and output 
Vectors.   Likely not a full Summer of Work there, but could be part of some 
larger Utils capabilities focused on making it easier to consume Mahout.  
Also included: Finish ARFF compatibility.  
10. Benchmark.  Break the record?

I think we should still solicit ideas on list here that we can put up on JIRA.

 
 
 Who is free to mentor this year, i.e. giving 5-6 hours weekly to a student,
 hearing them crib (sorry Ian and Isabel :P) and giving words of encouragement?
 And yes, code reviews.

I'm in.

Re: Google Summer of Code Proposal Submission

2009-03-27 Thread Philip Ramsey
Grant,

Thank you very much for the feedback! I'll make those changes and
elaborations to my proposal very soon.
Our thinking with the bi-grams is that, if we can maintain a relatively low
error-rate in computing sets of similar words at a grassroots level, then we
can have a powerful base case for inferring grammars on n-gram strings. But
we haven't started thinking in great detail about how we'll graduate to a
top-down parse. A lot of our current work is building off of research done
by Lillian Lee over at Cornell. Their research, in a restricted test space
of transitive verb-object noun pairs, looked at the benefits and
limitations of a number of different similarity/distance measures. Based on
their results, we've generalized the test space to an entire raw text
corpus, and have been tweaking their measures to get better scores and be
optimal over a cluster.

Again, thanks for the feedback,
Philip

On Thu, Mar 26, 2009 at 1:11 AM, Grant Ingersoll gsing...@apache.org wrote:

 Hi Philip,

 Thanks for the proposal.  Sounds interesting.  For the proposal that you
 submit, you should make sure to add references, details on how you plan to
 implement, etc.  Of course, no need to do that in great depth on the wiki.

 Also, have you looked at going beyond just bi-grams?  Not sure if it makes
 sense or not, but was just curious.   Also, you should have a look at the
 Watchmaker stuff that is in Mahout already and maybe be able to address how
 what you are proposing relates.

 -Grant


 On Mar 25, 2009, at 7:37 PM, Philip Ramsey wrote:

  Hello Folks,

 I'm a student at The Evergreen State College and yesterday I submitted a
 proposal for the GSoC project to the wiki. I'm sending a link to my
 submission, with hopes that some of you might have feedback or
 questions or advice:


 http://wiki.apache.org/general/SoC2009/PhilipRamsey-Mahout-AlgorithmsProposal

 Thanks a lot,
 Philip Ramsey
 goal.oriented.des...@gmail.com





Re: Google Summer of Code Proposal Submission

2009-03-26 Thread Grant Ingersoll

Hi Philip,

Thanks for the proposal.  Sounds interesting.  For the proposal that  
you submit, you should make sure to add references, details on how you  
plan to implement, etc.  Of course, no need to do that in great depth  
on the wiki.


Also, have you looked at going beyond just bi-grams?  Not sure if it  
makes sense or not, but was just curious.   Also, you should have a  
look at the Watchmaker stuff that is in Mahout already and maybe be  
able to address how what you are proposing relates.


-Grant

On Mar 25, 2009, at 7:37 PM, Philip Ramsey wrote:


Hello Folks,

I'm a student at The Evergreen State College and yesterday I submitted a
proposal for the GSoC project to the wiki. I'm sending a link to my
submission, with hopes that some of you might have feedback or
questions or advice:

http://wiki.apache.org/general/SoC2009/PhilipRamsey-Mahout-AlgorithmsProposal

Thanks a lot,
Philip Ramsey
goal.oriented.des...@gmail.com




Re: Google Summer of Code

2008-04-22 Thread Isabel Drost
On Tuesday 22 April 2008, deneche abdelhakim wrote:
 So we are four students, that's cool. I wish us good work and great fun in
 this summer.

I am really happy that we received a few more slots than expected.  Welcome to 
the Mahout project to both of you, and congratulations on the successful GSoC 
application. I wish you a lot of fun working on your proposed topics and 
hope that all students can finish their work successfully. I think not only 
your individual mentors will help you but, as usual in Apache land, the whole 
community will be happy to work with you on the mailing list.

There were quite a few applications that unfortunately were not accepted. I 
would like to invite those who did not get selected to stick around and 
contribute. As Mahout is still young, it is especially easy to make a 
difference. So if you are interested in machine learning, we would be happy 
to welcome you here, even if Google does not sponsor your summer.

Isabel

-- 
Imbalance of power corrupts and monopoly of power corrupts absolutely.  
-- 
Genji
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-04-22 Thread Grant Ingersoll
Also, have a look at: http://www.apache.org/dev/ for more info.  It  
would be helpful if all people (esp. GSOCers) who plan on contributing  
code file a CLA (http://www.apache.org/licenses/#clas) although it is  
not explicitly required, just makes things a bit nicer for us on the  
legal side.


-Grant

On Apr 22, 2008, at 7:58 AM, Grant Ingersoll wrote:


Welcome aboard!

We had a lot of very nice proposals, including a couple that were,
unfortunately, just below the cutoff.  We (the ASF) had originally
hoped to get more slots from Google, but they had an even bigger
response from other projects as well.  As it was, Mahout alone had
something like 15 applicants, most of which were high quality and
well thought out.  For those who didn't get selected, please do feel
welcome here with the rest of us volunteers  :-).


To those accepted, do try to keep in mind that we should keep  
project discussions on the list.  I think it is fine to ask mentors  
questions in private related to administrative stuff, but if you  
have questions about how to code something, etc. those are best  
handled on this list, as it creates a history and allows others to  
understand design decisions, etc.


Cheers,
Grant


On Apr 21, 2008, at 10:28 PM, Robin Anil wrote:


Hi Everyone,
This is one of those days where I wake up and see that I
have got accepted to GSoC with Mahout (:32-all-out:). I am really excited
to kick-start the work. I know I have a lot to understand in terms of coding
practices and the whole workflow/process. And I would like to congratulate and
say hi to my fellow GSoC'ers Farid, Yun and Abdel, hi to my mentor Ian
Holsman, and to the rest of the community.

I am usually online on Google Talk; if you use it, do add me:
[EMAIL PROTECTED]

Cheers and Good Day
Robin








RE : Google Summer of Code

2008-04-21 Thread deneche abdelhakim
Hi Robin, 

I am very happy that I've been accepted, thanks to the Mahout Community that 
kindly commented on my draft.

So we are four students, that's cool. I wish us good work and great fun in this 
summer.

Hakim


Robin Anil [EMAIL PROTECTED] wrote:
Hi Everyone,
This is one of those days where I wake up and see that I
have got accepted to GSoC with Mahout (:32-all-out:). I am really excited
to kick-start the work. I know I have a lot to understand in terms of coding
practices and the whole workflow/process. And I would like to congratulate and
say hi to my fellow GSoC'ers Farid, Yun and Abdel, hi to my mentor Ian
Holsman, and to the rest of the community.

I am usually online on Google Talk; if you use it, do add me:
[EMAIL PROTECTED]

Cheers and Good Day
Robin



Re: Google Summer of Code

2008-03-25 Thread Isabel Drost
On Tuesday 25 March 2008, Josh Harguess wrote:
 I have completed an application for Google Summer of Code for the
 implementation of the PCA algorithm in Mahout.  My research is directly
 related to the use of PCA, so I am very familiar with that algorithm.

Great!


 However, since I work in the area of pattern recognition and machine
 learning, I am also familiar with the other algorithms listed on your
 site. Since there was not a ranking of desired algorithms, I chose PCA, 
 but if there is a more immediate need for a different algorithm, I can
 most likely help with that instead / as well.

I think it is fine to choose the algorithm you are most familiar with. 
Currently we are happy to have someone who takes care of any of the 
algorithms. As you have experience with any of the algorithms, you could also 
contribute by taking part in the discussions on the mailing lists.

Isabel


-- 
Only God can make random selections.
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-25 Thread Isabel Drost
On Tuesday 25 March 2008, Marko Novakovic wrote:
 Other components will be classifier, crawler and
 indexer.

So it will be the typical setup: Crawl web pages, classify them as positive or 
negative and in the end index them correctly? I would be especially 
interested in how the classifier will be built - as far as you can share any 
such knowledge on a public mailing list before September '08.


 I have an idea about an architecture in which all 
 components will run on each machine.

I think the system architecture was pretty clear from the slides you sent. It 
would be nice if you could briefly sketch them on list as the slides have not 
survived being sent to a mailing list :)


 My idea for clustering would be to define relevance by
 properties, like repetition of keywords on the page, relevant
 tags, keyword in the subject, etc. Each property will
 be allocated one axis, and in the resulting n-dimensional space
 the clustering machine will group pages using a suitable
 algorithm, in my case k-Means.

If I understood the task correctly the goal is to build a system that is 
capable of separating posts that express some opinion from objective ones and 
afterwards to group positive vs. negative postings, right?

I do not yet see, how the clustering algorithm k-means helps you achieve this 
task.


 If you want I will be able to describe detailed
 relevance for clustering with proper examples
 tomorrow.

Sounds good.

Isabel


-- 
Life sucks, but death doesn't put out at all  -- Thomas J. 
Kopp
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-25 Thread Isabel Drost
On Tuesday 25 March 2008, Marko Novakovic wrote:
 I attached a beta version of the presentation.
 I must consult with my mentor from my college to determine
 exactly what the role of clustering is in this system.

Hmm, one of the slides talks about using the clustering algorithm to identify 
new topics. I guess I still do not get the full picture.

Did you happen to have a chance to look at the k-Means code in the repository 
yet?

Isabel

-- 
I don't mind arguing with myself.  It's when I lose that it bothers me. 
-- 
Richard Powers
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google summer of code mahout-machine-learning

2008-03-24 Thread Isabel Drost
On Wednesday 19 March 2008, Frédéric wrote:
 Hello,

 I am a french student, currently studying distributed systems in Finland.

Sounds interesting. What are you working on?


 To be honest I don't know all the algorithms listed in the paper.

I think it is sufficient to know at least one of them well enough to work on 
a scalable, at best parallel, version of it. Another option that I consider 
interesting is to look for some real-world problem one would like to solve 
with machine learning and to work on the solution.


 Unfortunately, I have some exams to take this week and I'm sorry for
 not having enough time to give you more details. But I will give you
 more information about my ideas and my skills related to this project
 as soon as possible.

Looking forward to reading more about your ideas.

Isabel

-- 
The two things that can get you into trouble quicker than anything else are 
fast women and slow horses.
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-24 Thread Marko Novakovic
The clustering will be one component of a search engine.
Other components will be a classifier, a crawler and
an indexer. I have an idea for an architecture in which all
components run on each machine.
Web pages will be sent to CPUs by a hash function,
which will vary as working CPUs are added,
removed or damaged.
Between the crawler and the rest of the system there will
be a queue, from which pages will be scheduled by hash.
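
One way to read the hash-based dispatch described above is rendezvous (highest-random-weight) hashing, sketched below. This is an assumption for illustration only: the message does not name a scheme, and the `score` and `assign` helpers and the worker and page names are hypothetical. The point it shows is that when a worker disappears, only the pages that were assigned to that worker move.

```python
import hashlib


def score(worker, url):
    """Deterministic pseudo-random score for a (worker, url) pair."""
    digest = hashlib.sha256(f"{worker}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(url, workers):
    """Send the page to the live worker with the highest score for it."""
    return max(workers, key=lambda worker: score(worker, url))


workers = ["cpu-1", "cpu-2", "cpu-3", "cpu-4"]
pages = ["http://example.org/page%d" % i for i in range(200)]
before = {page: assign(page, workers) for page in pages}

# Simulate one worker failing: pages on the other workers keep their place.
remaining = [w for w in workers if w != "cpu-3"]
after = {page: assign(page, remaining) for page in pages}
moved = [page for page in pages if before[page] != after[page]]
```

Adding a worker behaves symmetrically: only the pages for which the new worker scores highest are taken over by it.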

My idea for clustering would be to define relevance by
properties, like repetition of keywords on the page, relevant
tags, keyword in the subject, etc. Each property will
be allocated one axis, and in the resulting n-dimensional space
the clustering machine will group pages using a suitable
algorithm, in my case k-Means.
If you want, I will be able to describe the detailed
relevance for clustering with proper examples
tomorrow.
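
The "one axis per property" idea can be sketched as follows. This is a minimal illustration, not Mahout's k-Means code: the page dictionary layout (`body`, `tags`, `subject`), the `page_features` helper and the plain in-memory `k_means` loop are all assumptions made for the example.

```python
import math


def page_features(page, keyword):
    """Map a page to one axis per property (hypothetical page layout)."""
    words = page["body"].lower().split()
    return (
        float(words.count(keyword)),                                  # keyword repetitions
        float(sum(1 for tag in page["tags"] if keyword in tag.lower())),  # relevant tags
        1.0 if keyword in page["subject"].lower() else 0.0,          # keyword in subject
    )


def k_means(vectors, initial_centers, iterations=20):
    """Plain Lloyd iterations; returns the centers and the last assignment."""
    centers = [tuple(c) for c in initial_centers]
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in vectors:
            nearest = min(range(len(centers)), key=lambda i: math.dist(v, centers[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(group) for col in zip(*group)) if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups
```

Each page becomes a point in a 3-dimensional space here; adding a property adds an axis. Whether such groups separate opinionated from objective posts is a separate question, as raised in the reply.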

Greetings

--- Isabel Drost [EMAIL PROTECTED]
wrote:

 On Monday 24 March 2008, Marko Novakovic wrote:
  and I am interested in implementing this clustering
  algorithm on the Hadoop platform.
 
 So you would like to get a distributed clustering
 algorithm for grouping 
 search results? It would be nice to hear more about
 your approach to this 
 problem. 
 
 There are a few guys here who have been working on
 clustering search results 
 already. I guess they might be able to provide some
 help as well.
 
 We already have a k-Means implementation, but so far
 it has not been 
 integrated into a search result clustering context.
 
 Isabel
 
 -- 
 Science is what happens when preconception meets
 verification.
   |\  _,,,---,,_   Web:  
 http://www.isabel-drost.de
   /,`.-'`'-.  ;-;;,_
  |,4-  ) )-,_..;\ (  `'-'
 '---''(_/--'  `-'\_) (fL)  IM: 
 xmpp://[EMAIL PROTECTED]
 




RE: Google Summer of Code[esp. More Clustering]

2008-03-11 Thread Jeff Eastman
Hi Matthew,

I've implemented a minimal, non-MR version of the algorithm below to see how
it would behave. The operative code is in TestMeanShift.testMeanShift() and
MeanShiftCanopy.mergeCanopy(). The rest of the MR classes are stuff I copied
from Canopy, so you can ignore them.

The TestMeanShift.setUp() method builds a 100x3 point matrix that represents
a 10x10 image with the diagonal intensified (i.e. a '\' character mask).
Then testMeanShift() creates an initial set of 100 canopies from it and
iterates over the canopies, merging their centroids into a new set of
canopies until the canopy list size does not change any more. Finally it
prints out the canopies that were found for each cell in the original image.

Every time two canopies come within T2 distance of each other they merge,
reducing the number of canopies. The original points that were bound to each
canopy are also merged so that, at the end of the iteration, the original
points are available in their respective canopies.

Depending upon the values chosen for T1 and T2, the process either converges
quickly or slowly. The loop terminates before actual convergence is
achieved, but it does seem to cluster the input coherently.

I hesitate to call this MeanShift but it is something similar that follows
the same general algorithm, as I understand it at least. I hope you find it
interesting.
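
The loop described in this message can be sketched roughly as below. This is not the code in TestMeanShift or MeanShiftCanopy; it is a small in-memory illustration under stated assumptions: T1 is taken as the radius within which canopies influence each other, T2 as the merge distance, and canopy centers are weighted by the number of original points bound to them. The function names are hypothetical.

```python
import math


def weighted_mean(centers, weights):
    total = sum(weights)
    dims = len(centers[0])
    return tuple(
        sum(c[i] * w for c, w in zip(centers, weights)) / total for i in range(dims)
    )


def shift(canopies, t1):
    """Move each canopy center to the weighted mean of all centers within t1."""
    shifted = []
    for center, bound in canopies:
        near = [(c, len(b)) for c, b in canopies if math.dist(center, c) < t1]
        new_center = weighted_mean([c for c, _ in near], [w for _, w in near])
        shifted.append((new_center, bound))
    return shifted


def merge(canopies, t2):
    """Merge canopies whose centers are within t2, pooling their bound points."""
    merged = []
    for center, bound in canopies:
        for i, (kept_center, kept_bound) in enumerate(merged):
            if math.dist(center, kept_center) < t2:
                new_center = weighted_mean(
                    [kept_center, center], [len(kept_bound), len(bound)]
                )
                merged[i] = (new_center, kept_bound + bound)
                break
        else:
            merged.append((center, list(bound)))
    return merged


def canopy_mean_shift(points, t1, t2, max_iterations=100):
    """Iterate shift + merge until the number of canopies stops changing."""
    canopies = [(tuple(p), [tuple(p)]) for p in points]
    for _ in range(max_iterations):
        merged = merge(shift(canopies, t1), t2)
        if len(merged) == len(canopies):
            return merged
        canopies = merged
    return canopies


# A 10x10 "image" with the diagonal intensified, as (row, column, intensity).
image = [(float(r), float(c), 9.0 if r == c else 0.0) for r in range(10) for c in range(10)]
clusters = canopy_mean_shift(image, t1=3.0, t2=1.0)
```

As in the message, the stopping rule here is only "the canopy count did not change", so the loop can end before the centers have actually converged. Every original point stays bound to exactly one canopy throughout.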

Jeff


 -Original Message-
 From: Jeff Eastman [mailto:[EMAIL PROTECTED]
 Sent: Monday, March 10, 2008 9:09 PM
 To: mahout-dev@lucene.apache.org
 Subject: RE: Google Summer of Code[esp. More Clustering]
 
 Hi Matthew,
 
 I'd like to pursue that canopy thought a little further and mix it in with
 your sub sampling idea. Optimizing can come later, once we figure out how
 to
 do mean-shift in M/R at all. How about this?
 
 1. Each mean-shift iteration consists of a canopy clustering of all the
 points, with T1 set to the desired sampling resolution (h?) and T2 set to
 1.
 This will create one canopy centered on each point in the input set which
 contains all of its neighbors that are close enough to influence its next
 position in its trajectory (the window?).
 
 2. We then calculate the centroid of each canopy (that's actually done
 already by the canopy cluster reducer). Is this centroid not also the
 weighted average you desire for the next location of the point at its
 center?
 
 3. As the computation proceeds, the canopies will collapse together as
 their
 various centroids move inside the T2=1 radius. At the point when all
 points
 have converged, the remaining canopies will be the mean-shift clusters
 (modes?) of the dataset and their contents will be the migrated points in
 each cluster.
 
 4. If each original point is duplicated as its own payload, then the
 iterations will produce clusters of migrated points whose payloads are the
 final contents of each cluster.
 
 Can you wrap your mind around this enough to validate my assumptions?
 
 Jeff
 
  -Original Message-
  From: Matthew Riley [mailto:[EMAIL PROTECTED]
  Sent: Monday, March 10, 2008 5:58 PM
  To: mahout-dev@lucene.apache.org
  Subject: Re: Google Summer of Code[esp. More Clustering]
 
  Hi Jeff-
 
  I think your basin of attraction understanding is right on. I also
 like
  your ideas for distributing the mean-shift iterations by following a
  canopy-style method. My intuition was a little different, and I would
 like
  to hear your ideas on it:
 
  Just to make sure we're on the same page
  Say we have 1 million point in our original dataset, and we want to
  cluster
  by mean-shift. At each iteration of mean-shift we subsample (say) 10,000
  points from the original dataset and follow the gradient of those points
  to
  the region of highest density (and as we saw from the paper, rather than
  calculate the gradient itself we can equivalently compute the weighted
  average of our subsampled points and move the centroid to that point).
  This
  part seems fairly straightforward to distribute - we just send a
 different
  subsampled set to each processor and each processor returns the final
  centroid for that set.
 
  The problem I see is that 10,000 points (or whatever value we choose),
 may
  be too much for a single processor if we have to compute the distance to
  every single point when we compute the weighted mean. My thought here
 was
  to
  exploit the fact that we're using a kernel function (gaussian, uniform,
  etc.) in the weighted mean calculation and that kernel will have a set
  radius. Because the radius is static, it may be easy to (quickly)
 identify
  the points that we must consider in the calculation (i.e. those within
 the
  radius) by using a locality sensitive hashing scheme, tuned to that
  particular radius. Of course, the degree of advantage we get from this
  method will depend on the data itself, but intuitively I think we will
  usually see a dramatic improvement.
 
  Honestly, I should do more background work developing this idea, and
  possibly try

Re: Google Summer of Code

2008-03-10 Thread Anush Shetty
On Mon, Mar 10, 2008 at 4:50 PM, Grant Ingersoll [EMAIL PROTECTED]
wrote:

 Wow, maybe w/ all of our mentors we could get 2 students...


neat ++ :)



-- 
((Anush Shetty)) ((mail AT anushshetty DOT com))


RE: Google Summer of Code[esp. More Clustering]

2008-03-10 Thread Jeff Eastman
Hi Matthew,

I'd like to pursue that canopy thought a little further and mix it in with
your sub sampling idea. Optimizing can come later, once we figure out how to
do mean-shift in M/R at all. How about this?

1. Each mean-shift iteration consists of a canopy clustering of all the
points, with T1 set to the desired sampling resolution (h?) and T2 set to 1.
This will create one canopy centered on each point in the input set which
contains all of its neighbors that are close enough to influence its next
position in its trajectory (the window?).

2. We then calculate the centroid of each canopy (that's actually done
already by the canopy cluster reducer). Is this centroid not also the
weighted average you desire for the next location of the point at its
center?

3. As the computation proceeds, the canopies will collapse together as their
various centroids move inside the T2=1 radius. At the point when all points
have converged, the remaining canopies will be the mean-shift clusters
(modes?) of the dataset and their contents will be the migrated points in
each cluster.

4. If each original point is duplicated as its own payload, then the
iterations will produce clusters of migrated points whose payloads are the
final contents of each cluster.

Can you wrap your mind around this enough to validate my assumptions? 

Jeff

 -Original Message-
 From: Matthew Riley [mailto:[EMAIL PROTECTED]
 Sent: Monday, March 10, 2008 5:58 PM
 To: mahout-dev@lucene.apache.org
 Subject: Re: Google Summer of Code[esp. More Clustering]
 
 Hi Jeff-
 
 I think your basin of attraction understanding is right on. I also like
 your ideas for distributing the mean-shift iterations by following a
 canopy-style method. My intuition was a little different, and I would like
 to hear your ideas on it:
 
 Just to make sure we're on the same page
 Say we have 1 million points in our original dataset, and we want to
 cluster
 by mean-shift. At each iteration of mean-shift we subsample (say) 10,000
 points from the original dataset and follow the gradient of those points
 to
 the region of highest density (and as we saw from the paper, rather than
 calculate the gradient itself we can equivalently compute the weighted
 average of our subsampled points and move the centroid to that point).
 This
 part seems fairly straightforward to distribute - we just send a different
 subsampled set to each processor and each processor returns the final
 centroid for that set.
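
One possible reading of the subsampled mean-shift step described above is sketched here. The Gaussian kernel, the `bandwidth` parameter, the fixed random seed and the function names are assumptions for illustration; the thread does not fix any of them.

```python
import math
import random


def gaussian_kernel(dist, bandwidth):
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def mean_shift_step(position, sample, bandwidth):
    """Return the kernel-weighted average of the sample, seen from position."""
    weights = [gaussian_kernel(math.dist(position, p), bandwidth) for p in sample]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: nothing nearby, so stay put.
        return tuple(position)
    dims = len(position)
    return tuple(
        sum(w * p[i] for w, p in zip(weights, sample)) / total for i in range(dims)
    )


def seek_mode(start, points, bandwidth, sample_size, iterations=50, tol=1e-6, seed=0):
    """Follow the weighted averages, drawing a fresh subsample each iteration."""
    rng = random.Random(seed)
    position = tuple(start)
    for _ in range(iterations):
        sample = rng.sample(points, min(sample_size, len(points)))
        new_position = mean_shift_step(position, sample, bandwidth)
        if math.dist(position, new_position) < tol:
            return new_position
        position = new_position
    return position
```

Moving to the weighted average stands in for computing the density gradient explicitly. In a distributed setting each processor could run `seek_mode` on its own subsample and return the final position.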
 
 The problem I see is that 10,000 points (or whatever value we choose), may
 be too much for a single processor if we have to compute the distance to
 every single point when we compute the weighted mean. My thought here was
 to
 exploit the fact that we're using a kernel function (gaussian, uniform,
 etc.) in the weighted mean calculation and that kernel will have a set
 radius. Because the radius is static, it may be easy to (quickly) identify
 the points that we must consider in the calculation (i.e. those within the
 radius) by using a locality sensitive hashing scheme, tuned to that
 particular radius. Of course, the degree of advantage we get from this
 method will depend on the data itself, but intuitively I think we will
 usually see a dramatic improvement.
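
The fixed-radius lookup can be illustrated with a uniform grid hash, shown below. This is a simpler stand-in chosen for the example, not locality sensitive hashing proper: cells have side length equal to the kernel radius, so every point within the radius of a query lies in the query's cell or an adjacent one. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(point, radius):
    return tuple(math.floor(x / radius) for x in point)


def build_grid(points, radius):
    """Bucket points into cells whose side length equals the kernel radius."""
    grid = defaultdict(list)
    for p in points:
        grid[cell_of(p, radius)].append(p)
    return grid


def neighbors_within(grid, query, radius):
    """Check only the query's cell and its adjacent cells, then filter exactly."""
    cell = cell_of(query, radius)
    found = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        candidate = tuple(c + o for c, o in zip(cell, offset))
        for p in grid.get(candidate, ()):
            if math.dist(p, query) <= radius:
                found.append(p)
    return found
```

The number of cells visited grows as 3^d with the dimension d, so this particular scheme only pays off in low dimensions; that is one reason a hashing scheme tuned to the radius is attractive for higher-dimensional data.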
 
 Honestly, I should do more background work developing this idea, and
 possibly try a matlab implementation to test the feasibility. This sounds
 more like a research paper than something we should dive into immediately,
 but I wanted to share the idea and get some feedback if anyone has
 thoughts...
 
 Matt
 
 
 On Mon, Mar 10, 2008 at 11:29 AM, Jeff Eastman [EMAIL PROTECTED]
 wrote:
 
  Hi Matthew,
 
  I've been looking over the mean-shift papers for the last several days.
  While the details of the math are still sinking in, it looks like the
  basic algorithm might be summarized thusly:
 
  Points in an n-d feature space are migrated iteratively in the direction
  of maxima in their local density functions. Points within a basin of
  attraction all converge to the same maxima and thus belong to the same
  cluster.
 
  A physical analogy might be(?):
 
  Gas particles in 3-space, operating with gravitational attraction but
  without momentum, would tend to cluster similarly.
 
  The algorithm seems to require that each point be compared with every
  other point. This might be taken to require each mapper to see all of
  the points, thus frustrating scalability. OTOH, Canopy clustering avoids
  this by clustering the clusters produced by the subsets of points seen
  by each mapper. K-means has the requirement that each point needs to be
  compared with all of the cluster centers, not points. It has a similar
  iterative structure over clusters (a much smaller, constant number) that
  might be employed.
 
  There is a lot of locality in the local density function window, and
  this could perhaps be exploited. If points could be pre-clustered (as
  canopy is often used to prime the k-means iterations

Re: Google Summer of Code

2008-03-09 Thread Ian Holsman

Hi Grant.
I'll be happy to mentor someone for this project.

regards
Ian



| A person or group responsible for review and ranking of student
| applications,

I'd be happy to help out here. Anyone else?


Cool





Re: Google Summer of Code

2008-03-08 Thread Grant Ingersoll

Note, the deadline for project proposals is March 12.

I put an item up for us at: http://wiki.apache.org/general/SummerOfCode2008 
   I think it is probably general enough to cover all of the bases  
discussed here.  Please feel free to add your name to the list of  
mentors if you can.  Perhaps we can share duties.


-Grant



On Mar 7, 2008, at 1:43 PM, Isabel Drost wrote:


On Friday 07 March 2008, Grant Ingersoll wrote:

Sounds good.  I should also note that all mentoring (barring
personal conversation) should take place on the dev list.  That is,
decisions and discussions on what to do should be done on the list so
that we all benefit from the understanding.  Not that you were
suggesting otherwise!


Sure, after all, GSoC is about integrating students into free software
projects - and making decisions offline certainly is not the way Apache
projects work. Thanks for pointing that out.

Isabel




Re: Google Summer of Code

2008-03-07 Thread Isabel Drost
On Thursday 06 March 2008, Matthew Riley wrote:
 I would basically be interested in doing anything that fits in well with
 the overall goals of the Mahout project. Whether that is implementing well
 known algorithms within the Hadoop framework or working on some novel idea
 is up to the mentors, I presume. 

I would be happy with both options: Working on well known algorithms within 
the Hadoop framework certainly is one of our main goals. But I, at least, 
am personally also interested in providing space for novel ideas. I consider 
it really important for researchers to not only publish the data they 
experimented on but also the implementation used. If working on the latter 
within Mahout helps to maybe focus a little more than usual on scalability 
and maintainability - great.

So if you have an idea that fits well with your day to day work as well as 
with the overall goals of Mahout that would be fine. I would guess, this 
makes it easier to find some spare time to work on the project ;)

Isabel 

-- 
Each new user of a new system uncovers a new class of bugs. -- 
Kernighan
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-07 Thread Dawid Weiss


What about encouraging your students to submit their work at Mahout? Just a 
naive thought of mine.


Those students I'm in charge of have their area of interest defined already -- 
too late to change it. Good idea for the future, I have been thinking about it, 
actually.


D.


Re: Google Summer of Code

2008-03-07 Thread Isabel Drost
On Thursday 06 March 2008, Grant Ingersoll wrote:
 I think we can split the duties a bit, too. 

I think the Apache FAQ also said that - in accordance with the usual Apache way of 
doing things - it would be ok if the GSoC students would receive help from 
all community members. So the actual time spent for one mentor could very 
well drop to about 3h per week.

Still I would not rely on that when accepting the duty to become a mentor - 
after all, at least officially it is the mentor who is responsible for 
encouraging the student.

Isabel



-- 
The bug stops here.
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-07 Thread Grant Ingersoll


On Mar 7, 2008, at 3:08 AM, Isabel Drost wrote:


On Thursday 06 March 2008, Grant Ingersoll wrote:

I think we can split the duties a bit, too.


I think the Apache FAQ also said that - in accordance with the usual
Apache way of doing things - it would be ok if the GSoC students would
receive help from all community members. So the actual time spent for
one mentor could very well drop to about 3h per week.

Still I would not rely on that when accepting the duty to become a
mentor - after all, at least officially it is the mentor who is
responsible for encouraging the student.


Sounds good.  I should also note that all mentoring (barring
personal conversation) should take place on the dev list.  That is,
decisions and discussions on what to do should be done on the list so
that we all benefit from the understanding.  Not that you were
suggesting otherwise!


-Grant



Re: Google Summer of Code

2008-03-07 Thread Isabel Drost
On Friday 07 March 2008, Grant Ingersoll wrote:
 Sounds good.  I should also note that all mentoring (barring
 personal conversation) should take place on the dev list.  That is,
 decisions and discussions on what to do should be done on the list so
 that we all benefit from the understanding.  Not that you were
 suggesting otherwise!

Sure, after all, GSoC is about integrating students into free software 
projects - and making decisions offline certainly is not the way Apache 
projects work. Thanks for pointing that out.

Isabel


-- 
Never pay a compliment as if expecting a receipt.
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-06 Thread Grant Ingersoll
I think the Mentoring Org is already set up.  After March 3, mentors
can register.  See http://wiki.apache.org/general/SummerOfCode2008.
I'm willing to mentor, but would like to share the load a bit too.


-Grant


On Mar 6, 2008, at 1:56 AM, Isabel Drost wrote:


On Saturday 01 March 2008, Grant Ingersoll wrote:

Also, any thoughts on what we might want someone to do?  I think it
would be great to have someone implement one of the algorithms on our
wiki.


Just as a general note, the deadline for applications:

March 12: Mentoring organization application deadline (12 noon PDT/19:00 UTC).

I suppose we should identify interesting tasks before that deadline. As a
general guideline for mentors and for project proposals:

http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors

Isabel

--
Better late than never. -- Titus Livius (Livy)
 |\  _,,,---,,_   Web:   http://www.isabel-drost.de
 /,`.-'`'-.  ;-;;,_
|,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]





Re: Google Summer of Code[esp. More Clustering]

2008-03-06 Thread Grant Ingersoll
I haven't read the papers, but the big question is do you think they  
can scale using M/R or some other distributed techniques?


If so, feel free to write up a bit of a proposal using the info at:
http://wiki.apache.org/general/SummerOfCode2008
If you are unsure, that is fine too.  We could start with a simpler
implementation, and then look to distribute it.



On Mar 6, 2008, at 2:45 PM, Matthew Riley wrote:


Hey Jeff-

I'm certainly willing to put some energy into developing implementations of
these algorithms, and it's good to hear that you may be interested in
guiding us in the right direction.

Here are the references I learned the algorithms from- some are more
detailed than others:

Mean-Shift clustering was introduced here and this paper is a thorough
reference:
Mean Shift: A Robust Approach Toward Feature Space Analysis
http://courses.csail.mit.edu/6.869/handouts/PAMIMeanshift.pdf

And here's a PDF with just the guts of the algorithm outlined:
homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/TUZEL1/MeanShift.pdf

It looks like there isn't a definitive reference for the k-means
approximation with randomized k-d trees, but there are promising results
introduced here:

Object retrieval with large vocabularies and fast spatial matching:
http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin07.pdf

And a deeper explanation of the technique here:

Randomized KD-Trees for Real-Time Keypoint Detection:
ieeexplore.ieee.org/iel5/9901/31473/01467521.pdf?arnumber=1467521

Let me know what you think.

Matt

On Thu, Mar 6, 2008 at 11:45 AM, Jeff Eastman [EMAIL PROTECTED] wrote:



Hi Matthew,

As with most open source projects, interest is mainly a function of
the willingness of somebody to contribute their energy. Clustering is
certainly within the scope of the project. I'd be interested in
exploring additional clustering algorithms with you and your colleague.
I'm a complete noob in this area and it is always enlightening to work
with students who have more current theoretical exposures.

Do you have some links on these approaches that you find particularly
helpful?

Jeff






--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







Re: Google Summer of Code

2008-03-06 Thread Matthew Riley
Hey Dawid,

 Is it information retrieval from visual data you're working on? We have
 recently had a presentation about a guy who implemented motion detection
 on GPUs with very impressive speedups (orders of magnitude compared to
 normal CPUs). I'm wondering if your expertise here could be used to
 implement map-reduce distributed jobs for running multiple GPUs in
 parallel. I know this sounds a bit crazy, but I've heard of
 bio-engineering companies doing just that -- running a cluster of GPUs to
 speed up their computations. Just a wild thought. Back to your proposal
 though.


Yes, it is basically information retrieval that I'm performing on sets of
images- in fact, a lot of the best algorithms employed today for object
detection, object retrieval, etc. are adaptations of basic text-retrieval
approaches (e.g. tfidf-weighted vector space models). I've personally never
worked with GPUs for image processing, but I imagine the vector processing
abilities would be useful at almost every stage of the indexing and
retrieval processes. I would be interested in looking into those
possibilities in more detail.
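
For readers unfamiliar with the tfidf-weighted vector space model mentioned
above, here is a minimal sketch in Python. It is only an illustration under
stated assumptions, not code from Mahout or from any retrieval system
discussed in this thread: the function names are hypothetical, the "documents"
are assumed to be already tokenized (for images, the tokens would be quantized
"visual words"), and the weighting is the plain tf * log(N/df) variant, which
is one of several in common use.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (term count / doc length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy "images" described by made-up visual words.
docs = [["wheel", "wheel", "window"], ["wheel", "door"], ["leaf", "leaf", "bark"]]
vectors = tfidf_vectors(docs)
# The first two share a term, so they score higher than the first and third.
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))
```

Retrieval then amounts to ranking indexed items by their cosine similarity to
the query vector.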


  mostly focused around approximate k-means algorithms (since that's a
  problem I've been working on lately). It sounds like you guys are already
  implementing canopy clustering for k-means- Is there any interest in
  developing another approximation algorithm based on randomized kd-trees
  for high dimensional data? What about mean-shift clustering?

 From my experience the largest challenge in data clustering is not
 figuring out a new clustering methodology, but finding the right existing
 one to tackle a particular problem. Isabel mentioned the web spam
 detection challenge -- this is a good example of a multi-feature
 classification problem and I know people have tried clustering the host
 graph to come up with more coarse-grained features for hosts. From my own
 interest, a very interesting challenge is doing something like Google News
 does (event aggregation). This is less trivial than you might think at
 first -- most news items are very similar to each other (copy/paste and
 editing changes), so it's trivial to find small clusters of near-clones.
 Then the problem becomes more difficult because all the news items talk
 about pretty much the same people/events (take the presidential election
 in the U.S.). I think the problems you could state here are:

 1) approximating optimal clustering granularity (call it the number of
 clusters if you wish, although I think clustering should be driven by
 other factors rather than just the number of clusters),

 2) coming up with clusters of news items based on something _other_ than
 keyword-based similarity. One example here is grouping news by region
 (geolocation), sentiment (positive/negative news), people-related news,
 etc.

 3) multilingual news matching and clustering.

 All the above issues are on the border of different domains -- NLP,
 clustering, classification. The tricky part is being able to put them
 together. What would be of interest to you?


These are all interesting problems, actually. I've done some research into
sentiment analysis, as you mentioned in (2), and I think it's still a wide
open problem. Oren Etzioni at UWash does some interesting related work:
www.cs.washington.edu/homes/etzioni/.

I would basically be interested in doing anything that fits in well with the
overall goals of the Mahout project. Whether that is implementing well known
algorithms within the Hadoop framework or working on some novel idea is up
to the mentors, I presume. Personally, if I'm going to be working on
something novel, I would like to relate it to my current research work...
and I'm happy to discuss that with anyone on the list who is interested.

Matt




 D.

 
 
 



Re: Google Summer of Code[esp. More Clustering]

2008-03-06 Thread Matthew Riley
Hey Grant-

I believe scaling Mean-Shift clustering using M/R will be pretty
straightforward. I'm not as sure about K-Means using KD-Trees, since I
haven't personally implemented that algorithm, but since it follows K-Means
fairly closely I imagine it is possible.
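
One possible reading of why mean-shift might map onto M/R: in each iteration,
every point is shifted using only the input data, independently of how the
other points are shifted, so the per-point update resembles a map step. The
single-machine Python sketch below illustrates just that structure. It is an
assumption-laden illustration, not Mahout code: the function names are
hypothetical, it uses a flat (uniform) kernel with a fixed bandwidth, and it
runs a fixed number of iterations instead of testing for convergence. It also
sidesteps the hard part of a distributed version, which is getting each worker
the neighbours it needs.

```python
import math


def shift_point(p, points, bandwidth):
    """Move p to the mean of all data points within `bandwidth` of it."""
    neighbours = [q for q in points if math.dist(p, q) <= bandwidth]
    if not neighbours:  # defensive guard against floating-point edge cases
        return p
    return tuple(
        sum(q[d] for q in neighbours) / len(neighbours) for d in range(len(p))
    )


def mean_shift(points, bandwidth, iterations=20):
    """Return the mode each input point drifts to (flat-kernel mean shift)."""
    modes = list(points)
    for _ in range(iterations):
        # Each shift is independent of the others: this is the "map"-like step.
        modes = [shift_point(m, points, bandwidth) for m in modes]
    return modes


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
modes = mean_shift(points, bandwidth=1.0)
# Points whose modes coincide belong to the same cluster.
print(modes)
```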

I'll get to work on a proposal with some of my ideas, and hopefully get some
feedback from you guys during the process.

Thanks for all the responses so far.

Matt

On Thu, Mar 6, 2008 at 3:25 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:

 I haven't read the papers, but the big question is do you think they
 can scale using M/R or some other distributed techniques?

 If so, feel free to write up a bit of a proposal using the info at:
 http://wiki.apache.org/general/SummerOfCode2008
   If you are unsure, that is fine too.  We could start with a simpler
 implementation, and then look to distribute it.



Re: Google Summer of Code

2008-03-05 Thread Isabel Drost
On Saturday 01 March 2008, Grant Ingersoll wrote:
  Any of the other committers willing to mentor? 

Could you please clarify - or point to a page that does so - about what it
means to become a Mentor? Anyone have any experience being a mentor? I would
be happy to help - but I would rather learn a bit more about the mentor side
of GSoC.

 Also, any thoughts on what we might want someone to do?  I think it
 would be great to have someone implement one of the algorithms on our
 wiki.

I think just implementing one of the algorithms might help Mahout but it might 
be a bit hard to attract students to do that without some real task at hand.

What about putting up tasks that solve problems e.g. from this year's KDD Cup
or the web spam challenge? Then the benefit for participants would be
twofold - first they would help Mahout and second they could compete with
others in the field.

Isabel

-- 
A man's best friend is his dogma.
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-05 Thread Simon Willnauer
On Wed, Mar 5, 2008 at 8:52 PM, Isabel Drost
[EMAIL PROTECTED] wrote:
 On Saturday 01 March 2008, Grant Ingersoll wrote:
Any of the other committers willing to mentor?

  Could you please clarify - or point to a page that does so - about what it
  means to become a Mentor? Anyone have any experience being a mentor? I would
  be happy to help - but I would rather learn a bit more about the mentor side
  of GSoC

You could have a look at the FAQ or the GSoC pages
http://code.google.com/soc/2008/ and
http://code.google.com/soc/2008/faqs.html respectively.
Or join the #gsoc IRC channel on freenode.

You could also contact some of the Google folks; they are very helpful
if you have questions beyond the FAQ. (watch out for lh in the IRC
channel)

best regards,

simon


   Also, any thoughts on what we might want someone to do?  I think it
   would be great to have someone implement one of the algorithms on our
   wiki.

  I think just implementing one of the algorithms might help Mahout but it
  might be a bit hard to attract students to do that without some real task
  at hand.

  What about putting up tasks that solve problems e.g. from this year's KDD
  Cup or the web spam challenge? Then the benefit for participants would be
  twofold - first they would help Mahout and second they could compete with
  others in the field.

  Isabel

  --
  A man's best friend is his dogma.
   |\  _,,,---,,_   Web:   http://www.isabel-drost.de
   /,`.-'`'-.  ;-;;,_
   |,4-  ) )-,_..;\ (  `'-'
  '---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]



Re: Google Summer of Code

2008-03-05 Thread Isabel Drost
On Wednesday 05 March 2008, Simon Willnauer wrote:
 You could have a look at the FAQ or the GSoC pages
 http://code.google.com/soc/2008/ and
 http://code.google.com/soc/2008/faqs.html respectively.

Hmm, there is little about what mentors are expected to do apart from the
following rather general question, is there?

| 2. What is the role of a mentoring organization?

If we want to take part in GSoC, from that question, I guess we need a little 
more than only mentors:

| A pool of project ideas for students to choose from.

Grant already asked for ideas.

| An organization administrator to act as the project's main point of contact
| for Google;  

Any volunteers?

| A person or group responsible for review and ranking of student
| applications,

I'd be happy to help out here. Anyone else?


| A person or group of people responsible for monitoring the progress of each
| accepted student and to mentor her/him as the project progresses;  + backup

That would be the mentors Grant already mentioned.


| A written evaluation of each student participant, including how s/he worked
| with the group, whether s/he should be invited back should we do another
| Google Summer of Code, etc.   

I guess this could be done by each member but should be reviewed by more than 
one person, as it looks like the evaluations are going to be highly 
subjective.


 Or join the #gsoc IRC channel on freenode.

Sorry, but since I work on Mahout only after work, I usually do not have the
time to follow IRC channels :(

Is there anyone here who has already taken part in GSoC and could give us a
little summary of her experiences? Is it possible to do the mentoring job in
the free time after work, or should one plan for more time than that?

Isabel


-- 
Remember kids, if there's a loaded gun in the room, be sure that you're the 
one holding it -- Captain Combat
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]




Re: Google Summer of Code

2008-03-05 Thread Matthew Riley
Hey everyone-

I've been watching the mailing list for a little while now, hoping to
contribute once I became more familiar, but I wanted to jump in here now and
express my interest in the Summer of Code project. I'm currently a graduate
student in electrical engineering at UT-Austin working in computer vision,
which is closely tied to many of the problems Mahout is addressing
(especially in my area of content-based retrieval).

What can I do to help out?

I've discussed some potential Mahout projects with another student recently-
mostly focused around approximate k-means algorithms (since that's a problem
I've been working on lately). It sounds like you guys are already
implementing canopy clustering for k-means- Is there any interest in
developing another approximation algorithm based on randomized kd-trees for
high dimensional data? What about mean-shift clustering?
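
As background on the canopy clustering mentioned above, here is a minimal
Python sketch of the commonly described two-threshold procedure: pick a point,
put everything within a loose distance T1 of it into a canopy, and drop
everything within a tight distance T2 from further consideration as a centre.
This is only an illustration under assumptions and says nothing about how
Mahout implements it: the function name is hypothetical, and plain Euclidean
distance stands in for the cheap approximate distance the technique normally
relies on.

```python
import math


def canopies(points, t1, t2):
    """Group points into (possibly overlapping) canopies; requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("loose threshold t1 must exceed tight threshold t2")
    remaining = list(points)
    result = []
    while remaining:
        center = remaining[0]
        # Everything still in play within the loose threshold joins the canopy.
        members = [p for p in remaining if math.dist(center, p) <= t1]
        result.append((center, members))
        # Points within the tight threshold can no longer start or join canopies.
        remaining = [p for p in remaining if math.dist(center, p) > t2]
    return result


for center, members in canopies([(0, 0), (1, 0), (4, 0), (10, 0)], t1=5, t2=2):
    print(center, members)
```

The resulting canopies can then be used to restrict which centroid distances a
k-means pass has to compute, which is where the approximation comes from.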

Again, I would be glad to help in any way I can.

Matt

On Thu, Mar 6, 2008 at 12:56 AM, Isabel Drost [EMAIL PROTECTED]
wrote:

 On Saturday 01 March 2008, Grant Ingersoll wrote:
  Also, any thoughts on what we might want someone to do?  I think it
  would be great to have someone implement one of the algorithms on our
  wiki.

 Just as a general note, the deadline for applications:

 March 12: Mentoring organization application deadline (12 noon PDT/19:00
 UTC).

 I suppose we should identify interesting tasks before that deadline. As a
 general guideline for mentors and for project proposals:

 http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors

 Isabel

 --
 Better late than never. -- Titus Livius (Livy)
   |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_
  |,4-  ) )-,_..;\ (  `'-'
 '---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]



Re: Google Summer of Code

2008-03-01 Thread Grant Ingersoll
Well, here's your chance.  Make a proposal of something you would like  
to work on that fits with what we are doing and we'll discuss it and  
possibly put it up as a project.


I think it would be great if anyone took on something like M/R SVM  
implementation, or one of the other ones that is not already under way.


-Grant

On Mar 1, 2008, at 2:08 AM, [EMAIL PROTECTED] wrote:


Hi Gang,

I think we should put in for this:
http://wiki.apache.org/general/SummerOfCode2008

I would bet there are some students interested in doing ML on Hadoop.

Yes. I would be happy to work :) Didn't know that Mahout is also
participating in SoC.

Any of the other committers willing to mentor?  I am, but would also
like some others to help out if you have the time.  See
http://wiki.apache.org/general/SummerOfCodeMentor


Thanks,
Grant













Re: Google Summer of Code

2008-02-29 Thread Grant Ingersoll
Also, any thoughts on what we might want someone to do?  I think it  
would be great to have someone implement one of the algorithms on our  
wiki.


-Grant

On Feb 29, 2008, at 9:33 PM, Grant Ingersoll wrote:


Hi Gang,

I think we should put in for this: 
http://wiki.apache.org/general/SummerOfCode2008

I would bet there are some students interested in doing ML on
Hadoop.  Any of the other committers willing to mentor?  I am, but
would also like some others to help out if you have the time.  See
http://wiki.apache.org/general/SummerOfCodeMentor



Thanks,
Grant







Re: Google Summer of Code

2008-02-29 Thread jaideep
 Hi Gang,

 I think we should put in for this:
 http://wiki.apache.org/general/SummerOfCode2008

 I would bet there are some students interested in doing ML on Hadoop.
Yes. I would be happy to work :) Didn't know that Mahout is also
participating in SoC.
 Any of the other committers willing to mentor?  I am, but would also
 like some others to help out if you have the time.  See
 http://wiki.apache.org/general/SummerOfCodeMentor


 Thanks,
 Grant