Re: pignlproc: a new tool to build NER models from Wikipedia / DBpedia dumps

Grant Ingersoll Sat, 22 Jan 2011 05:05:02 -0800

On Jan 13, 2011, at 10:55 AM, Jörn Kottmann wrote:

> On 1/11/11 2:21 PM, Olivier Grisel wrote:
>> 2011/1/4 Olivier Grisel<[email protected]>:
>>> I plan to give more details in a blog post soon (tm).
>> Here it is:
>> 
>> http://blogs.nuxeo.com/dev/2011/01/mining-wikipedia-with-hadoop-and-pig-for-natural-language-processing.html
>> 
>> It gives a bit more context and some additional results and clues for
>> improvements and potential new usages.
>> 
> Now I read this post too, sounds very interesting.
> 
> What is the biggest training file for the name finder you can generate
> with this method?
> 
> I think we need MapReduce training support for OpenNLP. Actually that is
> already on my todo list, but currently I am still busy with the Apache
> migration and the next release.
> Anyway I hope we can get that done at least partially for the name finder
> this year.
>


One of the things I mentioned earlier is that it might make sense to just
build on Mahout for this.  We'd love to add MaxEnt to Mahout, but we also
already have a lot of other classifiers (Naive Bayes, SGD, Random Forests).
To me, if OpenNLP were abstracted a little from the classification algorithm,
it would be easier for people to plug in and try out their own, including the
Pig-based approach Olivier is suggesting.
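A minimal sketch of the kind of abstraction described above, assuming a design where training and prediction sit behind two small interfaces so that a name-finder-style component never sees the concrete algorithm. `Event`, `Classifier`, `Trainer`, `NaiveBayesTrainer`, `trainTagger` and `demoEvents` are all hypothetical names made up for illustration; they are not OpenNLP or Mahout APIs. The toy Naive Bayes trainer merely stands in for whatever backend (MaxEnt, a Mahout classifier, a model built with Pig) someone might plug in.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a name-finder-style component that depends only on
// small training/prediction interfaces, so the classification algorithm can
// be swapped. None of these types are OpenNLP or Mahout APIs.
public class PluggableClassifierSketch {

    /** One training event: features observed at a token plus its gold label. */
    public record Event(List<String> features, String outcome) {}

    /** A trained model: maps a feature context to the best-scoring label. */
    public interface Classifier {
        String classify(List<String> features);
    }

    /** Any backend (MaxEnt, Naive Bayes, SGD, ...) plugs in by implementing this. */
    public interface Trainer {
        Classifier train(List<Event> events);
    }

    /** Toy backend: multinomial Naive Bayes with add-one (Laplace) smoothing. */
    public static class NaiveBayesTrainer implements Trainer {
        @Override
        public Classifier train(List<Event> events) {
            Map<String, Integer> outcomeCounts = new HashMap<>();
            Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
            Map<String, Integer> featureTotals = new HashMap<>();
            Set<String> vocabulary = new HashSet<>();
            for (Event e : events) {
                outcomeCounts.merge(e.outcome(), 1, Integer::sum);
                Map<String, Integer> counts =
                        featureCounts.computeIfAbsent(e.outcome(), k -> new HashMap<>());
                for (String f : e.features()) {
                    counts.merge(f, 1, Integer::sum);
                    featureTotals.merge(e.outcome(), 1, Integer::sum);
                    vocabulary.add(f);
                }
            }
            int n = events.size();
            int v = Math.max(1, vocabulary.size()); // guard against an empty vocabulary
            // Score each label by log P(label) + sum of log P(feature | label).
            return features -> {
                String best = null; // stays null if there was no training data
                double bestScore = Double.NEGATIVE_INFINITY;
                for (Map.Entry<String, Integer> oc : outcomeCounts.entrySet()) {
                    String outcome = oc.getKey();
                    Map<String, Integer> counts = featureCounts.get(outcome);
                    int total = featureTotals.getOrDefault(outcome, 0);
                    double score = Math.log(oc.getValue().doubleValue() / n);
                    for (String f : features) {
                        score += Math.log((counts.getOrDefault(f, 0) + 1.0) / (total + v));
                    }
                    if (score > bestScore) {
                        bestScore = score;
                        best = outcome;
                    }
                }
                return best;
            };
        }
    }

    /** The "name finder" side: it sees only the Trainer interface. */
    public static Classifier trainTagger(Trainer trainer, List<Event> events) {
        return trainer.train(events);
    }

    /** Tiny hand-made training set, for illustration only. */
    public static List<Event> demoEvents() {
        return List.of(
                new Event(List.of("w=Paris", "cap=true"), "LOCATION"),
                new Event(List.of("w=London", "cap=true"), "LOCATION"),
                new Event(List.of("w=Olivier", "cap=true", "next=Grisel"), "PERSON"),
                new Event(List.of("w=the", "cap=false"), "OTHER"),
                new Event(List.of("w=of", "cap=false"), "OTHER"));
    }

    public static void main(String[] args) {
        Classifier model = trainTagger(new NaiveBayesTrainer(), demoEvents());
        System.out.println(model.classify(List.of("w=Paris", "cap=true"))); // LOCATION
    }
}
```

In this sketch, swapping algorithms just means passing a different `Trainer` implementation (for example a hypothetical MaxEnt- or Mahout-backed one) to `trainTagger`; the calling code stays the same.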

-Grant
