If someone is going to run CRF-NER on a massive data set
(Wikipedia, the web, etc.), then parallelizing it on Hadoop
makes total sense, yes.  In this case the parallelization is most
likely of the "trivial" sort (a null Reducer, i.e. map-only), but it's
still a totally sensible thing to do.
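
For illustration, here is a minimal sketch of what the map side of such a
"trivial" job could look like, assuming Hadoop Streaming with a Python
mapper.  The names run_mapper and tag_sentence are hypothetical, and
tag_sentence is only a placeholder rule standing in for a real trained
CRF decoder, which this sketch does not include.

```python
import sys


def tag_sentence(sentence):
    """Hypothetical stand-in for a trained CRF decoder.

    Assumption for illustration only: capitalized tokens are labeled ENT,
    everything else O. A real job would load a CRF model and decode here.
    """
    return [(tok, "ENT" if tok[:1].isupper() else "O") for tok in sentence.split()]


def run_mapper(lines, out):
    """Tag each input line independently and write one tagged line per input line.

    No state is shared between lines, so no reduce step is needed.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tagged = tag_sentence(line)
        out.write(" ".join(f"{tok}/{label}" for tok, label in tagged) + "\n")


# As a streaming mapper script, the entry point would be:
#     run_mapper(sys.stdin, sys.stdout)
```

Because each line is tagged on its own, the job can run with the number
of reducers set to zero (for example with Hadoop Streaming's
-numReduceTasks 0 option), and the mapper output is written directly
as the job output.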

If Mahout had an NLP subproject, this kind of thing would fit
in there and would certainly be a welcome contribution, but we
don't have one yet.

Maybe you should put it up on Google Code or GitHub, and if
you Apache-license it, it could easily be incorporated (here or
elsewhere) later.

  -jake

2010/5/12 张佳宝 <[email protected]>

> I am not hadooping the training; I am concerned with splitting the dataset
> using Hadoop. Is this useful work?
>
> 2010/5/12 Benson Margulies <[email protected]>
>
> > I assume that you are hadooping the training? The decoder is likely too
> > fast to bother with.
> >
> >
> > 2010/5/12 张佳宝 <[email protected]>:
> > > CRF
> > >
> > > 2010/5/12 Benson Margulies <[email protected]>
> > >
> > >> What sort of model are you using?
> > >>
> > >> On Tue, May 11, 2010 at 10:04 PM, 张佳宝 <[email protected]> wrote:
> > >> > Hi,
> > >> > I am working on named entity recognition through the Hadoop MapReduce
> > >> > framework, using a large amount of website data. It is similar work to
> > >> > Mahout, so I want to know whether anyone has already done this. If you
> > >> > are interested in it, I can contribute it to you when I have completely
> > >> > finished it.
> > >> >
> > >>
> > >
> >
>
