> For everyone applying for GSOC, not just Marko:
>
> We have a good number of applications and will probably only get 1 or
> 2 students (I'm not even sure 1 is guaranteed but it is likely), even
> though we have 4 willing mentors since the ASF is slotted a certain
> number of students for the whole group.

If I understand correctly, only 1 or 2 students will be chosen for the Mahout project.

> I would encourage everyone to make sure their proposals are as strong
> as you can possibly make them. This means timelines, bios, supporting
> materials, recommendations, references, etc. Basic job interview
> stuff, I guess.
>
> I can't speak to the other evaluators, but I know at least part of my
> criteria will be based on the level of details provided, etc. In my
> mind, it is not enough to simply say you are going to work on some ML
> algorithm, or, as some have said, claim to implement all 10 algorithms
> in the NIPS paper on the wiki in 3 months, or pick one once the
> project starts. I'd also make some effort to show you have done your
> background research and include any references that you have that
> discuss the problem or would be helpful in us understanding them better.
>
> Cheers,
> Grant

This is my proposal. Please give me feedback.

Implementation of the Support Vector Machine Algorithm on the Hadoop Platform

Abstract

I have been researching search engine functionality, such as ranking and presenting relevant pages to users. I noted that the SVM algorithm is a good solution for classifying crawled Web pages in search engines. After reading and studying the article [Joachims, 2007], I decided to implement an SVM optimized for processing text data and retrieving relevance feedback. Because SVM is a very complex algorithm with a lot of operations, I chose the map-reduce Hadoop platform.

[Joachims, 2007] T. Joachims, F. Radlinski: "Search Engines that Learn from Implicit Feedback," IEEE Computer, August 2007, pp 38

Detailed Description

Dear Google and Apache,

Project: Lucene Mahout

My Idea:

I have an idea to implement a model and solution for ranking retrieved Web pages by relevance, according to the user's recent behavior.
Because search engines hold a lot of crawled Web pages, the machine learning algorithms they use must be distributed or parallelized if we want to obtain real-time results and keep the retrieved database fresh. I want to implement the Support Vector Machine (SVM) formulation for optimizing multivariate performance measures described in [Joachims, 2005]. Furthermore, I would implement the alternative structural formulation of the SVM optimization problem for conventional binary classification with error rate and ordinal regression described in [Joachims, 2006].

It is usually not important to use a large parallel cluster for processing relevance feedback, because there are only a few training examples in these cases. Because SVM training cost grows steeply with the size of the problem (quadratic complexity), I want to apply this solution to the first 100 pages for each combination of user and query.

I also chose the SVM algorithm because I see it as a big challenge for me, and it will be useful for professors at my college. I will use my work on this project to write a new article about deploying SVM optimization in search engines.

I have prepared for this project by reading these articles:

[1] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Kluwer Academic Publishers, Boston, 1998
[2] R.-E. Fan, P.-H. Chen, C.-J. Lin, "Working Set Selection Using Second Order Information for Training Support Vector Machines," Journal of Machine Learning Research 6 (2005), pp. 1889-1918

I have also read the Hadoop documentation and examined your implementation of the k-means algorithm on this platform.

Methodologies of Development:
- Test Driven Development
- Deployment: Ant and JUnit
- SDK: Eclipse
- SVN System for Versioning
- Javadoc

About Me:

My resume is available at http://atisha34.googlepages.com/.
I also participate in some academic projects at my college:
- Working on a topic-based search engine, called Grain, which is under construction at my faculty.
- Tutorial about search engines, mentored by professor Veljko Milutinovic: "The New Avenues in Search Engines"
  presentation: http://atisha34.googlepages.com/Searchengines.ppt
  abstract: http://atisha34.googlepages.com/TheNewAvenuesinWebSearch.docx
  I should publish an article based on this presentation in IPSI Magazine.
- Other projects in which I participate aren't related to machine learning and search engines.

My Interests:
- Search Engines
- Software Engineering and Test Driven Development
- Machine Learning
- Database Modeling and OO Design
- ERP and Business Processes

Sincerely Yours,
Marko Novakovic

[Joachims, 2006] T. Joachims, Training Linear SVMs in Linear Time, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2006.
[Joachims, 2005] T. Joachims, A Support Vector Method for Multivariate Performance Measures, Proceedings of the International Conference on Machine Learning (ICML), 2005.
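
Appendix (illustrative only): the proposal talks about training an SVM on a map-reduce platform, so a small sketch of how such a split could look may help the discussion. This is not the cutting-plane method of [Joachims, 2006] and not Hadoop code. It is plain batch subgradient descent on the hinge loss for a linear SVM without a bias term, written in Python for brevity, where each "map" call handles one shard of training examples and the "reduce" step sums the partial results. All names (map_subgradient, reduce_sum, train, predict) and parameters (lam, lr, epochs) are hypothetical.

```python
from functools import reduce


def map_subgradient(w, shard):
    """Map step (hypothetical): sum hinge-loss subgradients over one shard.

    Each example is (x, y) with x a list of floats and y in {-1, +1}.
    """
    grad = [0.0] * len(w)
    for x, y in shard:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:  # example lies inside the margin or is misclassified
            for i, xi in enumerate(x):
                grad[i] -= y * xi
    return grad


def reduce_sum(a, b):
    """Reduce step (hypothetical): add two partial gradients element-wise."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(shards, dim, lam=0.01, lr=0.1, epochs=100):
    """Batch subgradient descent on (lam/2)*|w|^2 + mean hinge loss."""
    n = sum(len(shard) for shard in shards)
    w = [0.0] * dim
    for _ in range(epochs):
        # On a cluster, each shard would be processed by a separate mapper.
        partials = [map_subgradient(w, shard) for shard in shards]
        total = reduce(reduce_sum, partials)
        w = [wi - lr * (lam * wi + gi / n) for wi, gi in zip(w, total)]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


if __name__ == "__main__":
    # Two toy shards standing in for two blocks of (user, query) feedback.
    shards = [
        [([2.0, 1.0], 1), ([1.5, 2.0], 1)],
        [([-1.0, -2.0], -1), ([-2.0, -1.5], -1)],
    ]
    w = train(shards, dim=2)
    print([predict(w, x) for shard in shards for x, _ in shard])
```

The point of the split is that the weight vector is the only thing broadcast to the mappers each round, and each mapper returns a single vector of the same size, so the per-round communication does not grow with the number of examples in a shard.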