Yes: if someone is going to run CRF-NER on a massive data set (Wikipedia, the web, etc.), then parallelizing it on Hadoop makes total sense. In this case the parallelization is most likely of the "trivial" sort (map-only, with a null Reducer), but it's still a perfectly sensible thing to do.
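To make the "trivial" (null Reducer) case concrete, here is a minimal sketch of a map-only tagger written as a Hadoop Streaming mapper in Python. It is an illustration under stated assumptions, not anyone's actual implementation: `tag_sentence` is a hypothetical stand-in for a real CRF decoder, and the one-sentence-per-line input and `token/LABEL` output formats are made up for the example.

```python
import sys


def tag_sentence(tokens):
    """Hypothetical stand-in for a trained CRF decoder.

    A real job would load a CRF model once per mapper and decode with it;
    here every capitalized token is simply labeled ENT and the rest O.
    """
    return [(tok, "ENT" if tok[:1].isupper() else "O") for tok in tokens]


def map_line(line):
    """Tag one input line (assumed to be one sentence) as token/LABEL pairs."""
    tokens = line.split()
    return " ".join(f"{tok}/{label}" for tok, label in tag_sentence(tokens))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Each input split is tagged independently; with no reducer,
    # the mapper output is the final job output.
    for line in stdin:
        line = line.strip()
        if line:
            stdout.write(map_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Submitted through Hadoop Streaming with the reducer count set to zero (for example via the `-numReduceTasks 0` option), each mapper would tag its own split and write the result straight to the output directory. The exact jar path and the remaining options depend on the installation.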
If Mahout had an NLP subproject, this kind of thing would fit in there and would certainly be a welcome contribution, but we don't have one yet. Maybe you should put it up on Google Code or GitHub; if you release it under the Apache License, it could easily be incorporated (here or elsewhere) later.

  -jake

2010/5/12 张佳宝 <[email protected]>

> I am not hadooping the training; I am concerned with splitting the data set
> using Hadoop. Is this useful work?
>
> 2010/5/12 Benson Margulies <[email protected]>
>
> > I assume that you are hadooping the training? The decoder is likely too
> > fast to bother with.
> >
> > 2010/5/12 张佳宝 <[email protected]>:
> > > CRF
> > >
> > > 2010/5/12 Benson Margulies <[email protected]>
> > >
> > >> What sort of model are you using?
> > >>
> > >> On Tue, May 11, 2010 at 10:04 PM, 张佳宝 <[email protected]> wrote:
> > >> > Hi,
> > >> > I am working on named entity recognition with the Hadoop MapReduce
> > >> > framework, using a large amount of website data. It is similar work
> > >> > to Mahout, so I want to know whether anyone has done this already.
> > >> > If you are interested in it, I can contribute it to you when I have
> > >> > completely finished it.
