Thank you, Grant! Your advice is very helpful. Browsing the history on the GSoC index page, I found that the ideas in Mahout, and indeed all the ideas from the Apache Software Foundation, are very competitive, which shows that Mahout, and Apache as a whole, is a project well worth working on. That is also one of the reasons I chose it. First, I will treat my proposal seriously and complete it well, with a reasonable plan, within the next few days; if it is accepted, I will give the project enough time to ensure its quality.
I am now writing my proposal and will finish it within the next few days. If I run into questions, I would be grateful for your advice.

Best wishes!

On Mon, Mar 29, 2010 at 7:10 PM, Grant Ingersoll <gsing...@apache.org> wrote:

>
> On Mar 29, 2010, at 2:54 AM, 杨杰 wrote:
>
> > Thank you, Grant!
> >
> > I'm now reading the information on the website you recommended. I will try my
> > best to finish the study of the Solr plugin and go further with Mahout to
> > prepare for my proposal as quickly as possible, and my proposal will be
> > submitted within 5 days.
> >
> > I have another question. When submitting a proposal, should I wait until a
> > mentor has signed up? I found there are currently many projects without a
> > signed-up mentor.
> >
>
> You will need to apply on the GSOC website (it's good to add it up on JIRA
> too). Then, us mentors will go in and rate the proposals. Part of this
> rating is a mentor indicating they are willing to be a mentor for that
> particular project. In the past, Mahout has been _VERY_ competitive (i.e.
> lots of good proposals and only 2-3 mentors to go around). Not to mention
> that the Apache Soft. Foundation only gets a certain number of slots, so you
> are competing against other ASF projects as well. Thus, it is very
> important that you provide a well thought out and well referenced plan. It
> is also important that you have a reasonable plan for implementing,
> including what summer obligations you have other than GSOC (which are fine
> to have, just be up front about them) and that you are not taking on too
> much.
>
> >
> > I humbly await your reply.
> >
> > Best wishes.
> >
> > On Sun, Mar 28, 2010 at 8:00 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >>
> >> On Mar 28, 2010, at 12:52 AM, 杨杰 wrote:
> >>
> >>> Dear Mahout Developers,
> >>>
> >>> I'm Yang Jie, an MSc student in Computer Science from China.
> >>> I am eager to
> >>> apply for the project Implement Integration of Mahout Clustering or
> >>> Classification with Apache Solr [Mahout-343].
> >>>
> >>> I am very interested in large-scale machine learning, which is also one of
> >>> the directions of my group, and in indexing for information retrieval. That
> >>> is the reason why I chose large-scale topical partitional indexing as my
> >>> graduate dissertation. As a result, when I found this project, I was quite
> >>> attracted! It is related to my work, so I can devote enough time to this
> >>> project. If I get this honor, I will try my best to make it as good as I can.
> >>>
> >>> My main purpose in this project is to add a classification algorithm to
> >>> the indexing module of Solr, if I have understood the description correctly.
> >>> The main target of the plugin in my plan is Solr's indexing module. That
> >>> means tests of my plugin will be on this module first. I have now read the
> >>> Lucene code, tested Mahout and Lucene indexing on Map/Reduce, and gained a
> >>> preliminary understanding of Solr. What I am doing now is gathering
> >>> information on Solr's data structures and plugins.
> >>>
> >>> Currently, there are still some questions in my mind:
> >>>
> >>> 1.
> >>>
> >>> Should I implement a plugin for Solr which could handle any of the
> >>> classification algorithms in Mahout based on the data schema, or is it a
> >>> plugin for only one of the classification algorithms? This is what I didn't
> >>> understand from the name of the project (sorry).
> >>
> >> I think a general solution is better, but it will likely make sense in your
> >> planning to pick one for the first phase of your project and get it working
> >> and then work w/ the Solr/Mahout community to generalize it.
> >>
> >>
> >>> 2.
> >>>
> >>> I've now run some algorithms in Mahout on the Map/Reduce cluster, and
> >>> tried Solr, but I still lack further information about this project.
> >>> How should I get started with it?
> >>
> >>
> >> Here's the rough details I have in my head:
> >>
> >> 1. Training of the classifier is handled off line
> >> 2. Once trained, an UpdateProcessor is hooked into Solr such that as the
> >> document comes in, the update processor will take in the field from the
> >> document and come up w/ the appropriate classification(s) and then add those
> >> labels to a new, configurable field to be indexed
> >>
> >> Adding a request handler that could kick off training, manage the model,
> >> etc. would also be useful. Think about what you can get done in the time
> >> frame given.
> >>
> >>
> >>
> >>>
> >>> I am now going on with the introduction to Solr plugins. With your help,
> >>> I will be quite encouraged. The project is a meaningful experience for me,
> >>> and it motivates me to put my energy into it. I will try my best to
> >>> complete it.
> >>>
> >>> Best wishes!
> >>>
> >>>
> >>> --
> >>> Yang Jie(杨杰)
> >>> hi.baidu.com/thinkdifferent
> >>>
> >>> Group of CLOUD, Xi'an Jiaotong University
> >>> Department of Computer Science and Technology, Xi’an Jiaotong University
> >>>
> >>> PHONE: 86 1346888 3723
> >>> TEL: 86 29 82665263 EXT. 608
> >>> MSN: xtyangjie2...@yahoo.com.cn
> >>>
> >>> once i didn't know software is not free; then i knew it days later; now i
> >>> find it indeed free.
> >>
> >> --------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com/
> >>
> >> Search the Lucene ecosystem using Solr/Lucene:
> >> http://www.lucidimagination.com/search
> >>
> >>
> >
> >
> > --
> > Yang Jie(杨杰)
> > hi.baidu.com/thinkdifferent
> >
> > Group of CLOUD, Xi'an Jiaotong University
> > Department of Computer Science and Technology, Xi’an Jiaotong University
> >
> > PHONE: 86 1346888 3723
> > TEL: 86 29 82665263 EXT. 608
> > MSN: xtyangjie2...@yahoo.com.cn
> >
> > once i didn't know software is not free; then i knew it days later; now i
> > find it indeed free.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>

--
Yang Jie(杨杰)
hi.baidu.com/thinkdifferent

Group of CLOUD, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University

PHONE: 86 1346888 3723
TEL: 86 29 82665263 EXT. 608
MSN: xtyangjie2...@yahoo.com.cn

once i didn't know software is not free; then i knew it days later; now i find it indeed free.