On May 13, 2010, at 2:38 AM, Jake Mannix wrote:

> It sounds like if someone is going to run CRF-NER on a massive
> data set (wikipedia, the web, etc.), then parallelizing it on Hadoop
> makes total sense, yes. In this case, the parallelization is of
> the "trivial" sort (null Reducer), most likely, but it's still a totally
> sensible thing to do.
>
> If Mahout had an NLP subproject, this kind of thing would fit
> in there, and would certainly be a welcome contribution, but we
> don't yet.
I think we could still take the contribution, as it doesn't require a subproject/module to be set up just yet. We have collocations now, so it is heading towards critical mass.

> Maybe you should put it up on google-code or github, and if
> you Apache-license it, it could be easily incorporated (here or
> elsewhere) later.
>
> -jake
>
> 2010/5/12 张佳宝 <zhangjia...@gmail.com>
>
>> I am not hadooping the training; I am concerned with splitting the dataset
>> using hadoop. Is this useful work?
>>
>> 2010/5/12 Benson Margulies <bimargul...@gmail.com>
>>
>>> I assume that you are hadooping the training? The decoder is likely too
>>> fast to bother with.
>>>
>>> 2010/5/12 张佳宝 <zhangjia...@gmail.com>:
>>>> CRF
>>>>
>>>> 2010/5/12 Benson Margulies <bimargul...@gmail.com>
>>>>
>>>>> What sort of model are you using?
>>>>>
>>>>> On Tue, May 11, 2010 at 10:04 PM, 张佳宝 <zhangjia...@gmail.com> wrote:
>>>>>> Hi,
>>>>>> I am working on named entity recognition through the Hadoop MapReduce
>>>>>> framework using a large amount of website data. It is similar work to
>>>>>> Mahout, so I want to know whether anyone has done this before, and if
>>>>>> you are interested in it, I can contribute it when I have completely
>>>>>> finished it.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
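[Editor's note] The "trivial" parallelization Jake describes — a map-only job with a null Reducer — amounts to sharding the corpus and tagging each shard independently, with no cross-shard aggregation. A minimal sketch in plain Python (standing in for the Hadoop framework; the tagger here is a hypothetical stub, not a real CRF decoder):

```python
def split_dataset(records, num_shards):
    """Round-robin records into shards, as an InputFormat's splits would."""
    shards = [[] for _ in range(num_shards)]
    for i, record in enumerate(records):
        shards[i % num_shards].append(record)
    return shards

def tag_entities(sentence):
    """Stub tagger: marks capitalized tokens as candidate entities.
    A real mapper would invoke a trained CRF decoder here instead."""
    return [(tok, "ENT" if tok[:1].isupper() else "O")
            for tok in sentence.split()]

def map_only_job(records, num_shards=4):
    """Run the 'mapper' over each shard; the 'reduce' step is the
    identity (null Reducer), so outputs are simply concatenated."""
    output = []
    for shard in split_dataset(records, num_shards):
        # each shard is processed independently -- no shuffle needed
        output.extend(tag_entities(record) for record in shard)
    return output

if __name__ == "__main__":
    corpus = ["Jake wrote to Mahout", "hadoop splits the data"]
    for tagged in map_only_job(corpus, num_shards=2):
        print(tagged)
```

On actual Hadoop this corresponds to a Mapper that emits tagged records directly, with the number of reduce tasks set to zero, so mapper output is written straight to HDFS.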