Re: Regarding Google Summer of Code Lucene Mahout Project

Grant Ingersoll Mon, 24 Mar 2008 10:29:25 -0700


On Mar 24, 2008, at 11:07 AM, Robin Anil wrote:

Hi Admins,
I went through the Google Summer of Code wiki and found out
about the mahout-machine-learning project. I wish to participate in
implementing the papers. I am currently working on my B.Tech thesis, which is
to extract opinionated sentences from blogs and which is also part of the Text
Retrieval Conference (TREC) 2008 Blog
Track <http://ir.dcs.gla.ac.uk/wiki/TREC-BLOG/#head-9dd52f8791e8d7ba62f3bdd63932e0ec04e83ac8>, under
the guidance of Prof.
Sudeshna Sarkar <http://www.facweb.iitkgp.ernet.in/%7Esudeshna>. For
implementing my TREC system, I have experimented with classifiers (NB,
SVM, decision trees) and clustering algorithms (k-means and Gaussian
mixtures). For the project I used the C# version of Lucene (Lucene.NET) to
index and retrieve documents in the Blog06
collection <http://ir.dcs.gla.ac.uk/test_collections/blog06info.html> (160GB).
I believe working on this project would help me further improve the
performance and efficiency of the system I am working on, as well as ease
me into working with the open source community.
I am a 4th year CS student at IIT Kharagpur working towards a Dual Degree
(B.Tech + M.Tech), and this would be my first time working with an
open-source project. Could you suggest the things I should get
comfortable with for implementing this, as well as the detail you require in
the proposal for implementation?

I'd have a look at the wiki and the NIPS paper listed there, and also search the archives for GSOC discussions. I'd also start looking into Hadoop and the existing code we have. Then, just go ahead and make a proposal. I'm particularly interested in classifiers, but I know there is a good deal of interest in clustering too (we already have a k-means impl). For classifiers, I am slowly but surely working on a Naive Bayes implementation (time is always a question for me), so implementing decision trees or SVM would be really cool.


Cheers,
Grant
