Re: GSOC (SVM algorithm)

Marko Novakovic Mon, 31 Mar 2008 12:35:04 -0700

> For everyone applying for GSOC, not just Marko:
>
> We have a good number of applications and will probably only get 1 or
> 2 students (I'm not even sure 1 is guaranteed but it is likely), even
> though we have 4 willing mentors since the ASF is slotted a certain
> number of students for the whole group.


If I understand correctly, only 1 or 2 students will
be chosen for the Mahout project.

> I would encourage everyone to make sure their proposals are as strong
> as you can possibly make them.  This means timelines, bios, supporting
> materials, recommendations, references, etc.  Basic job interview
> stuff, I guess.
>
> I can't speak to the other evaluators, but I know at least part of my
> criteria will be based on the level of details provided, etc.  In my
> mind, it is not enough to simply say you are going to work on some ML
> algorithm, or, as some have said, claim to implement all 10 algorithms
> in the NIPS paper on the wiki in 3 months, or pick one once the
> project starts.  I'd also make some effort to show you have done your
> background research and include any references that you have that
> discuss the problem or would be helpful in us understanding them better.
>
> Cheers,
> Grant
>

This is my proposal. Please give me feedback.

Implementation of the Support Vector Machine Algorithm
on the Hadoop Platform

Abstract

I have been researching search engine functionality,
such as ranking and presenting relevant pages to users.
I noted that the SVM algorithm is a good solution for
classifying crawled Web pages in search engines.
After reading and studying the article
[Joachims, 2007],
I decided to implement an SVM optimized for processing
text data and relevance feedback.
Because SVM is a very complex algorithm that involves
a lot of operations,
I chose the Hadoop map-reduce platform.

[Joachims, 2007] T. Joachims, F. Radlinski, "Search
Engines that Learn from Implicit Feedback," IEEE
Computer, August 2007, pp 38

Detailed Description

Dear Google and Apache,

Project: Lucene Mahout

My Idea:

My idea is to implement a model and solution for
ranking and retrieving relevant Web pages according to
the user's recent behavior.
Because search engines (SEs) hold a large number of
crawled Web pages, the machine learning algorithms they
use must be distributed or parallelized if we want
to obtain real-time results and keep the retrieved
database fresh.
I want to implement the Support Vector Machine (SVM)
formulation for optimizing multivariate performance
measures described in [Joachims, 2005]. Furthermore,
I would implement the alternative structural
formulation of the SVM optimization problem for
conventional binary classification with error rate and
ordinal regression described in [Joachims, 2006].
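For orientation, here is a rough sketch of the two binary-classification formulations as I read them in [Joachims, 2006]. The notation (n training examples x_i with labels y_i in {-1, +1}, weight vector w, regularization constant C) is my own summary and should be checked against the paper.

```latex
% Conventional soft-margin formulation: one slack variable per example
\[
\min_{w,\ \xi_i \ge 0} \ \frac{1}{2} w^\top w + \frac{C}{n} \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad \forall i:\ y_i \, (w^\top x_i) \ge 1 - \xi_i
\]

% Structural formulation: a single slack variable shared by all constraints
\[
\min_{w,\ \xi \ge 0} \ \frac{1}{2} w^\top w + C \xi
\quad \text{s.t.} \quad \forall c \in \{0,1\}^n:\
\frac{1}{n} w^\top \sum_{i=1}^{n} c_i y_i x_i \ \ge\ \frac{1}{n} \sum_{i=1}^{n} c_i - \xi
\]
```

The second form has exponentially many constraints but only one slack variable, which is what makes a cutting-plane style of training possible.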
A large parallel cluster is usually not needed for
processing relevance feedback, because there are only
a few training examples in these cases.
Because SVM training cost grows steeply with
the size of the problem (quadratic complexity), I want
to apply this solution to the first 100 pages for each
combination of user and query.
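A minimal sketch of the cap described above, written in plain Java rather than against the Hadoop API so that it runs on its own. All names here (FeedbackGrouping, Shown, topPerUserQuery, pairCount) are hypothetical and are not part of Hadoop or Mahout. The grouping key plays the role of a map output key, and the per-key truncation plays the role of a reduce step. The pair count is only one illustration of quadratic growth, assuming training works on pairs of pages.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from Hadoop or Mahout.
class FeedbackGrouping {

    // One page shown to a user for a query, at a given rank (1 = top result).
    static final class Shown {
        final String user;
        final String query;
        final String url;
        final int rank;

        Shown(String user, String query, String url, int rank) {
            this.user = user;
            this.query = query;
            this.url = url;
            this.rank = rank;
        }
    }

    // "Map" step: derive the grouping key (user, query) for a record.
    static String key(Shown s) {
        return s.user + "\t" + s.query;
    }

    // "Reduce" step: for each key, keep only the `limit` best-ranked pages.
    static Map<String, List<Shown>> topPerUserQuery(List<Shown> records, int limit) {
        Map<String, List<Shown>> groups = new TreeMap<>();
        for (Shown s : records) {
            groups.computeIfAbsent(key(s), k -> new ArrayList<>()).add(s);
        }
        Map<String, List<Shown>> result = new TreeMap<>();
        for (Map.Entry<String, List<Shown>> e : groups.entrySet()) {
            List<Shown> pages = e.getValue();
            pages.sort(Comparator.comparingInt((Shown p) -> p.rank));
            int keep = Math.min(limit, pages.size());
            result.put(e.getKey(), new ArrayList<>(pages.subList(0, keep)));
        }
        return result;
    }

    // Number of unordered page pairs among n pages; grows quadratically in n.
    static long pairCount(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        List<Shown> records = new ArrayList<>();
        for (int rank = 1; rank <= 250; rank++) {
            records.add(new Shown("u1", "svm", "http://example.org/" + rank, rank));
        }
        Map<String, List<Shown>> top = topPerUserQuery(records, 100);
        System.out.println(top.get("u1\tsvm").size() + " pages kept");
        System.out.println(pairCount(100) + " pairs vs " + pairCount(250));
    }
}
```

Capping each group at 100 pages keeps the per-group problem small: 100 pages give 4950 pairs, while 250 pages would already give 31125.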
I also chose the SVM algorithm because I see it as
a big challenge for me and because it will be useful to
professors at my college.
I will use my work on this project to write a new
article about deploying SVM algorithm optimization
in SEs.
I have prepared for this project by reading these articles:
[1] C. Burges, "A Tutorial on Support Vector Machines
for Pattern Recognition," Kluwer Academic Publishers,
Boston, 1998
[2] R.E. Fan, P.H. Chen, C.J. Lin, "Working Set
Selection Using Second Order Information for Training
Support Vector Machines," Journal of Machine Learning
Research 6 (2005), pp 1889-1918
I have also read the Hadoop documentation and examined
your implementation of the k-means algorithm on this
platform.

Development Methodologies:

- Test Driven Development
- Deployment: Ant and JUnit
- SDK: Eclipse
- SVN System for Versioning
- Javadoc

About Me:

You can see my resume at
http://atisha34.googlepages.com/.
I also participate in some academic projects at my
college:
- Working on a topic-based search engine called Grain,
which is under construction at my faculty.
- A tutorial about SEs, mentored by Professor Veljko
Milutinovic: "The New Avenues in Search Engines"
presentation:
http://atisha34.googlepages.com/Searchengines.ppt
abstract:
http://atisha34.googlepages.com/TheNewAvenuesinWebSearch.docx
I expect to publish an article based on this presentation
in IPSI Magazine.
- The other projects in which I participate aren't related
to machine learning or search engines.

My Interests:
- Search Engines
- Software Engineering and Test Driven Development
- Machine Learning
- Database Modeling and OO Design
- ERP and Business Processes

Sincerely Yours,
Marko Novakovic
 
[Joachims, 2006] T. Joachims, "Training Linear SVMs in
Linear Time," Proceedings of the ACM Conference on
Knowledge Discovery and Data Mining (KDD), 2006.
[Joachims, 2005] T. Joachims, "A Support Vector Method
for Multivariate Performance Measures," Proceedings of
the International Conference on Machine Learning
(ICML), 2005.




      
