On Mar 29, 2010, at 2:54 AM, 杨杰 wrote:

> Thank you, Grant!
> 
> I'm now reading the information on the website you recommended. I will try my
> best to finish studying Solr plugins and go further with Mahout to
> prepare my proposal as quickly as possible, and my proposal will be
> submitted within 5 days.
> 
> I have another question. When submitting a proposal, should I wait until a
> mentor has signed up? I found there are currently many projects without a
> signed-up mentor.
> 

You will need to apply on the GSOC website (it's good to add the proposal on
JIRA too).  Then, we mentors will go in and rate the proposals.  Part of this
rating is a mentor indicating they are willing to be a mentor for that
particular project.  In the past, Mahout has been _VERY_ competitive (i.e. lots
of good proposals and only 2-3 mentors to go around).  Not to mention that the
Apache Software Foundation only gets a certain number of slots, so you are
competing against other ASF projects as well.  Thus, it is very important that
you provide a well-thought-out and well-referenced plan.  It is also important
that you have a reasonable plan for implementing, including what summer
obligations you have other than GSOC (which are fine to have, just be up front
about them), and that you are not taking on too much.



> I humbly await your reply.
> 
> 
> Best wishes.
> 
> On Sun, Mar 28, 2010 at 8:00 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> 
>> 
>> On Mar 28, 2010, at 12:52 AM, 杨杰 wrote:
>> 
>>> Dear Mahout Developers,
>>> 
>>> I'm Yang Jie, an MSc student in Computer Science from China. I am eager to
>>> apply for the project "Implement Integration of Mahout Clustering or
>>> Classification with Apache Solr" [Mahout-343].
>>> 
>>> I am very interested in large-scale machine learning, which is also one of
>>> the research directions of my group, and in indexing for information
>>> retrieval. That is the reason why I chose large-scale topical partitional
>>> indexing as the subject of my graduate dissertation. As a result, when I
>>> found this project, I was quite attracted to it! It is related to my work,
>>> so I can devote enough time to it. If I get this honor, I will try my best
>>> to make it as good as I can.
>>> 
>>> My main goal in this project is to add a classification algorithm to the
>>> indexing module of Solr, if I have understood the description correctly.
>>> My plan will focus on Solr's indexing module as the main target for the
>>> plugin, which means the plugin will be tested on this module first. I have
>>> now read the Lucene code, tested Mahout and Lucene indexing on Map/Reduce,
>>> and gained a preliminary understanding of Solr. What I am doing now is
>>> gathering information about Solr's data structures and plugins.
>>> 
>>> Currently, there are still some questions in my mind:
>>> 
>>>  1.
>>> 
>>>  Should I implement a plugin for Solr that can handle any of the
>>>  classification algorithms in Mahout based on the data schema, or is it a
>>>  plugin for only one of the classification algorithms? This is what I
>>>  didn't understand from the name of the project (sorry).
>> 
>> I think a general solution is better, but it will likely make sense in your
>> planning to pick one for the first phase of your project and get it working
>> and then work w/ the Solr/Mahout community to generalize it.
>> 
>> 
>>>  2.
>>> 
>>>  I've now run some algorithms in Mahout on the Map/Reduce cluster and
>>>  tried Solr, but I still lack further information about this project.
>>>  How could I get started with it?
>> 
>> 
>> Here are the rough details I have in my head:
>> 
>> 1. Training of the classifier is handled offline
>> 2. Once trained, an UpdateProcessor is hooked into Solr such that as the
>> document comes in, the update processor will take in the field from the
>> document and come up w/ the appropriate classification(s) and then add those
>> labels to a new, configurable field to be indexed
>> 
>> Adding a request handler that could kick off training, manage the model,
>> etc. would also be useful.  Think about what you can get done in the time
>> frame given.
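
The two steps above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not code from the thread and not the Solr or Mahout API: the class `ClassifyingProcessorSketch`, the `Classifier` interface, the keyword-based toy model, and the field names `body` and `category` are all hypothetical. In an actual Solr plugin this logic would presumably live in an `UpdateRequestProcessor` subclass, and the classifier would be a model trained offline with Mahout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names are Solr or Mahout APIs.
final class ClassifyingProcessorSketch {

    /** Stand-in for a model that was trained offline (step 1). */
    interface Classifier {
        List<String> classify(String text);
    }

    private final Classifier classifier;
    private final String inputField;  // field whose text is classified
    private final String labelField;  // configurable field that receives the labels

    ClassifyingProcessorSketch(Classifier classifier, String inputField, String labelField) {
        this.classifier = classifier;
        this.inputField = inputField;
        this.labelField = labelField;
    }

    /** Step 2: called for each incoming document before it is indexed. */
    Map<String, Object> process(Map<String, Object> doc) {
        Object value = doc.get(inputField);
        if (value == null) {
            return doc; // nothing to classify, pass the document through unchanged
        }
        List<String> labels = classifier.classify(value.toString());
        if (!labels.isEmpty()) {
            doc.put(labelField, new ArrayList<>(labels));
        }
        return doc;
    }

    /** Toy keyword matcher standing in for a real trained model. */
    static Classifier keywordClassifier() {
        return text -> {
            List<String> labels = new ArrayList<>();
            String lower = text.toLowerCase(Locale.ROOT);
            if (lower.contains("hadoop") || lower.contains("cluster")) {
                labels.add("distributed-computing");
            }
            if (lower.contains("index") || lower.contains("search")) {
                labels.add("search");
            }
            return labels;
        };
    }

    public static void main(String[] args) {
        ClassifyingProcessorSketch processor =
                new ClassifyingProcessorSketch(keywordClassifier(), "body", "category");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("body", "Indexing documents on a Hadoop cluster");
        System.out.println(processor.process(doc));
    }
}
```

Keeping the classifier behind a small interface is one way to start with a single algorithm and generalize later, since the processor only depends on "text in, labels out".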
>> 
>> 
>> 
>>> 
>>> I am now going through the introduction to Solr plugins. I would be quite
>>> encouraged to have your help. The project is a meaningful experience for
>>> me, and it motivates me to put my energy into it. I will try my best to
>>> complete it.
>>> 
>>> Best wishes !
>>> 
>>> 
>>> --
>>> Yang Jie(杨杰)
>>> hi.baidu.com/thinkdifferent
>>> 
>>> Group of CLOUD, Xi'an Jiaotong University
>>> Department of Computer Science and Technology, Xi’an Jiaotong University
>>> 
>>> PHONE: 86 1346888 3723
>>> TEL: 86 29 82665263 EXT. 608
>>> MSN: xtyangjie2...@yahoo.com.cn
>>> 
>>> once i didn't know software is not free; then i knew it days later; now i
>>> find it indeed free.
>> 
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>> 
> 
> 

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
