On Mar 24, 2008, at 11:07 AM, Robin Anil wrote:

Hi Admins,
I went through the Google Summer of Code wiki and found out about the
mahout-machine-learning project. I wish to participate in implementing the
papers. I am currently working on my B.Tech thesis, which is to extract
opinionated sentences from blogs and is also part of the Text Retrieval
Conference (TREC) 2008 Blog Track
<http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/#head-9dd52f8791e8d7ba62f3bdd63932e0ec04e83ac8>,
under the guidance of Prof. Sudeshna Sarkar
<http://www.facweb.iitkgp.ernet.in/%7Esudeshna>. In implementing my TREC
system, I have experimented with classifiers (NB, SVM, decision trees) and
clustering algorithms (k-means and Gaussian mixtures). For the project I
used the C# version of Lucene (Lucene.NET) to index and retrieve documents
in the Blog06 collection
<http://ir.dcs.gla.ac.uk/test_collections/blog06info.html> (160GB).
I believe working on this project would help me further improve the
performance and efficiency of the system I am working on, as well as ease
me into working with the open source community.

I am a 4th-year CS student at IIT Kharagpur working towards a Dual Degree
(B.Tech + M.Tech), and this would be my first time working with an
open-source project. Could you suggest what I should get comfortable with
to implement this, as well as the level of detail you require in the
implementation proposal?


I'd have a look at the wiki and the NIPS paper listed there, and also search the archives for GSoC discussions. I'd also start looking into Hadoop and the existing code we have. Then, just go ahead and make a proposal. I'm particularly interested in classifiers, but I know there is a good deal of interest in clustering too (we already have a k-means impl). For classifiers, I am slowly but surely working on a Naive Bayes implementation (time is always a question for me), so implementing decision trees or SVM would be really cool.

Cheers,
Grant
