Thank you, Grant! Your reply gave me important advice.

When I browsed the history on the GSoC index page, I found that the ideas for
Mahout, and indeed all the ideas from the Apache Foundation, are competitive.
That shows that Mahout, like Apache as a whole, is a project well worth
working on, which is also one of the reasons I chose it. First, I will treat
my proposal seriously and complete it well, with a reasonable scope, over the
next few days. If it is accepted, I will give the project enough time to
ensure its quality.

I am now writing my proposal and will finish it in the next few days. If I run
into questions, I would be grateful for your advice!

Best wishes!

On Mon, Mar 29, 2010 at 7:10 PM, Grant Ingersoll <gsing...@apache.org> wrote:

>
> On Mar 29, 2010, at 2:54 AM, 杨杰 wrote:
>
> > Thank you, Grant!
> >
> > I'm now reading the information on the website you recommended. I will try
> > my best to finish studying Solr plugins and go further with Mahout to
> > prepare my proposal as quickly as possible, and I will submit the proposal
> > within 5 days.
> >
> > I have another question. When submitting a proposal, should I wait until a
> > mentor has signed up? I found that many projects currently have no mentor
> > signed up.
> >
>
> You will need to apply on the GSOC website (it's good to add it on JIRA
> too).  Then, we mentors will go in and rate the proposals.  Part of this
> rating is a mentor indicating they are willing to be a mentor for that
> particular project.  In the past, Mahout has been _VERY_ competitive (i.e.
> lots of good proposals and only 2-3 mentors to go around).  Not to mention
> that the Apache Software Foundation only gets a certain number of slots, so
> you are competing against other ASF projects as well.  Thus, it is very
> important that you provide a well-thought-out and well-referenced plan.  It
> is also important that you have a reasonable plan for implementing,
> including what summer obligations you have other than GSOC (which are fine
> to have, just be up front about them) and that you are not taking on too
> much.
>
>
>
> > I humbly await your reply.
> >
> >
> > Best wishes.
> >
> > On Sun, Mar 28, 2010 at 8:00 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> >
> >>
> >> On Mar 28, 2010, at 12:52 AM, 杨杰 wrote:
> >>
> >>> Dear Mahout Developers,
> >>>
> >>> I'm Yang Jie, an MSc student in Computer Science from China. I am eager
> >>> to apply for the project "Implement Integration of Mahout Clustering or
> >>> Classification with Apache Solr" [Mahout-343].
> >>>
> >>> I am very interested in large-scale machine learning, which is also one
> >>> of my group's research directions, and in indexing for information
> >>> retrieval. That is why I chose large-scale topical partitional indexing
> >>> as the topic of my graduate dissertation. As a result, when I found this
> >>> project, I was quite attracted to it! It is related to my work, so I can
> >>> devote enough time to it. If I am given this honor, I will try my best
> >>> to make the result as good as I can.
> >>>
> >>> My main goal for this project is to add a classification algorithm to
> >>> Solr's index module, if I have understood the description correctly.
> >>> My plan will focus on using the plugin with Solr's indexing module,
> >>> which means my plugin will be tested on this module first. I have now
> >>> read the Lucene code, tested Mahout and Lucene indexing on Map/Reduce,
> >>> and gained a preliminary understanding of Solr. What I am doing now is
> >>> gathering information on Solr's data structures and plugins.
> >>>
> >>> Currently, I still have some questions:
> >>>
> >>>  1.
> >>>
> >>>  Should I implement a Solr plugin that can handle any of the
> >>>  classification algorithms in Mahout based on the data schema, or a
> >>>  plugin for only one of the classification algorithms? This is what I
> >>>  didn't understand from the name of the project (sorry).
> >>
> >> I think a general solution is better, but it will likely make sense in
> >> your planning to pick one for the first phase of your project and get it
> >> working and then work w/ the Solr/Mahout community to generalize it.
> >>
> >>
> >>>  2.
> >>>
> >>>  I've now run some algorithms in Mahout on the Map/Reduce cluster and
> >>>  tried Solr, but I still lack further information about this project.
> >>>  How should I get started with it?
> >>
> >>
> >> Here are the rough details I have in my head:
> >>
> >> 1. Training of the classifier is handled off line
> >> 2. Once trained, an UpdateProcessor is hooked into Solr such that as the
> >> document comes in, the update processor will take in the field from the
> >> document and come up w/ the appropriate classification(s) and then add
> >> those labels to a new, configurable field to be indexed
> >>
> >> Adding a request handler that could kick off training, manage the model,
> >> etc. would also be useful.  Think about what you can get done in the
> >> time frame given.
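The flow described above (a classifier trained offline, then a hook that labels each incoming document before it is indexed) might be sketched roughly as follows. This is only an illustration in plain Python under stated assumptions: it does not use the Solr or Mahout APIs, and the names `ClassifyingProcessor`, `process_add`, `keyword_classifier`, and the field names `text` and `topic` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an indexable document: field name -> value.
Document = Dict[str, object]

# Hypothetical classifier signature: text in, list of labels out.
Classifier = Callable[[str], List[str]]


def keyword_classifier(text: str) -> List[str]:
    """Toy stand-in for a model trained offline; not a Mahout API."""
    rules = {"sports": ("match", "goal"), "tech": ("solr", "index")}
    words = text.lower().split()
    return sorted(
        label for label, keys in rules.items() if any(k in words for k in keys)
    )


@dataclass
class ClassifyingProcessor:
    """Plays the role of the update processor: runs before indexing."""

    classifier: Classifier
    source_field: str = "text"      # field whose content is classified
    label_field: str = "category"   # configurable field that receives labels

    def process_add(self, doc: Document) -> Document:
        text = str(doc.get(self.source_field, ""))
        labels = self.classifier(text)
        enriched = dict(doc)  # copy so the caller's document is not mutated
        if labels:
            enriched[self.label_field] = labels
        return enriched


processor = ClassifyingProcessor(keyword_classifier, label_field="topic")
indexed = processor.process_add({"id": "1", "text": "Solr builds an index"})
print(indexed)
```

Keeping the classifier behind a plain callable is one way to start with a single algorithm and generalize later, since the processor itself never needs to know which model produced the labels.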
> >>
> >>
> >>
> >>>
> >>> I am now working through the introduction to Solr plugins. Your help
> >>> would encourage me greatly. This project is a meaningful experience for
> >>> me, and I am eager to put my energy into it. I will try my best to
> >>> complete it.
> >>>
> >>> Best wishes !
> >>>
> >>>
> >>> --
> >>> Yang Jie(杨杰)
> >>> hi.baidu.com/thinkdifferent
> >>>
> >>> Group of CLOUD, Xi'an Jiaotong University
> >>> Department of Computer Science and Technology, Xi’an Jiaotong University
> >>>
> >>> PHONE: 86 1346888 3723
> >>> TEL: 86 29 82665263 EXT. 608
> >>> MSN: xtyangjie2...@yahoo.com.cn
> >>>
> >>> once i didn't know software is not free; then i knew it days later; now i
> >>> find it indeed free.
> >>
> >> --------------------------
> >> Grant Ingersoll
> >> http://www.lucidimagination.com/
> >>
> >> Search the Lucene ecosystem using Solr/Lucene:
> >> http://www.lucidimagination.com/search
> >>
> >>
> >
> >
> > --
> > Yang Jie(杨杰)
> > hi.baidu.com/thinkdifferent
> >
> > Group of CLOUD, Xi'an Jiaotong University
> > Department of Computer Science and Technology, Xi’an Jiaotong University
> >
> > PHONE: 86 1346888 3723
> > TEL: 86 29 82665263 EXT. 608
> > MSN: xtyangjie2...@yahoo.com.cn
> >
> > once i didn't know software is not free; then i knew it days later; now i
> > find it indeed free.
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


-- 
Yang Jie(杨杰)
hi.baidu.com/thinkdifferent

Group of CLOUD, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University

PHONE: 86 1346888 3723
TEL: 86 29 82665263 EXT. 608
MSN: xtyangjie2...@yahoo.com.cn

once i didn't know software is not free; then i knew it days later; now i
find it indeed free.
