On May 13, 2010, at 2:38 AM, Jake Mannix wrote:

> It sounds like if someone is going to run CRF-NER on a massive
> data set (wikipedia, the web, etc), then parallelizing it on Hadoop
> makes total sense, yes.  In this case, the parallelization is of
> the "trivial" sort (null Reducer), most likely, but it's still a totally
> sensible thing to do.
> 
> If Mahout had a NLP subproject, this kind of thing would fit
> in there, and would certainly be a welcome contribution, but we
> don't yet.

I think we could still take the contribution, as it doesn't require a 
subproject/module to be set up just yet.  We have collocations now, so it is 
heading towards critical mass.
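
The "null Reducer" parallelization Jake describes is just a map-only job:
load the model once per mapper, tag each record, and set the reduce count
to zero so map output is written straight to HDFS.  A minimal sketch of
that shape, in case it helps (NerTagJob and the tag() placeholder are
hypothetical; wire in your own CRF decoder, e.g. CRF++ or Mallet):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only NER tagging: one tagged record out per input line in.
public class NerTagJob {

  public static class TagMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) {
      // Load your serialized CRF model here, once per mapper (e.g. from
      // the DistributedCache), not once per record.
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      context.write(line, new Text(tag(line.toString())));
    }

    // Placeholder: swap in your CRF decoder's tag call here.
    private String tag(String sentence) {
      return sentence; // identity "tagger" so the sketch compiles as-is
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "crf-ner-tagging");
    job.setJarByClass(NerTagJob.class);
    job.setMapperClass(TagMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0); // the "null Reducer": map output goes straight to HDFS
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With zero reducers there is no sort/shuffle at all, so throughput is simply
however fast the mappers can decode.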


> 
> Maybe you should put it up on google-code or github, and if
> you Apache-license it, it could be easily incorporated (here or
> elsewhere) later.
> 
>  -jake
> 
> 2010/5/12 张佳宝 <zhangjia...@gmail.com>
> 
>> I am not hadooping the training; I am asking about splitting the dataset
>> using Hadoop. Is this useful work?
>> 
>> 2010/5/12 Benson Margulies <bimargul...@gmail.com>
>> 
>>> I assume that you are hadooping the training? The decoder is likely too
>>> fast to bother with.
>>> 
>>> 
>>> 2010/5/12 张佳宝 <zhangjia...@gmail.com>:
>>>> CRF
>>>> 
>>>> 2010/5/12 Benson Margulies <bimargul...@gmail.com>
>>>> 
>>>>> What sort of model are you using?
>>>>> 
>>>>> On Tue, May 11, 2010 at 10:04 PM, 张佳宝 <zhangjia...@gmail.com> wrote:
>>>>>> Hi,
>>>>>> I am working on named entity recognition with the Hadoop MapReduce
>>>>>> framework over a large amount of website data. It is similar work to
>>>>>> Mahout's, so I want to know whether anyone has done this already. If
>>>>>> you are interested, I can contribute it to you once I have finished.
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
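
On the dataset-splitting question above: you generally do not need a
separate splitting step.  FileInputFormat computes per-mapper input splits
from HDFS block boundaries on its own; if the default splits are too coarse
or too fine for the tagger, you can bound them with the (0.20-era)
split-size settings, something like:

Configuration conf = new Configuration();
// FileInputFormat derives split sizes from these bounds plus the HDFS
// block size; no standalone splitting job is required.
conf.setLong("mapred.min.split.size", 16L * 1024 * 1024); // at least 16 MB per map
conf.setLong("mapred.max.split.size", 64L * 1024 * 1024); // at most 64 MB per map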

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
