Re: pignlproc: a new tool to build NER models from Wikipedia / DBpedia dumps

Olivier Grisel Wed, 05 Jan 2011 07:45:40 -0800

2011/1/5 Jason Baldridge <[email protected]>:
> This looks great, and it aligns with my own recent interest in large scale
> NLP with Hadoop, including working with Wikipedia. I'll look at it more
> closely later, but in principle I would be interested in having this brought
> into the OpenNLP project in some way!


Thanks for your interest. Don't hesitate to fork the repo on GitHub to
experiment with your own design ideas. OpenNLP methods often handle
String[][] and Span[] data structures, where the span start and end
indices refer either to char positions or to token indices. It might be
interesting to write some generic wrappers that convert those data
structures from / to Pig tuples, taking care not to reallocate memory
when it is not necessary.
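
To illustrate the two span conventions mentioned above (token indices
versus char positions), here is a minimal Java sketch. SimpleSpan and
SpanSketch are hypothetical stand-ins written for this example only;
they are not the OpenNLP Span class or any pignlproc / Pig API, and the
sketch assumes every token occurs verbatim in the text, in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a span type (NOT OpenNLP's Span class).
// start is inclusive, end is exclusive.
final class SimpleSpan {
    final int start;
    final int end;

    SimpleSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Hypothetical helper, for illustration only.
final class SpanSketch {

    // Compute the char offsets of each token by scanning the text from
    // left to right. Assumes tokens appear verbatim and in order.
    static List<SimpleSpan> tokenCharOffsets(String text, String[] tokens) {
        List<SimpleSpan> offsets = new ArrayList<>(tokens.length);
        int cursor = 0;
        for (String token : tokens) {
            int start = text.indexOf(token, cursor);
            if (start < 0) {
                throw new IllegalArgumentException("token not found: " + token);
            }
            int end = start + token.length();
            offsets.add(new SimpleSpan(start, end));
            cursor = end;
        }
        return offsets;
    }

    // Convert a span expressed in token indices into a span expressed
    // in char positions, using the precomputed token offsets.
    static SimpleSpan toCharSpan(SimpleSpan tokenSpan, List<SimpleSpan> offsets) {
        int charStart = offsets.get(tokenSpan.start).start;
        int charEnd = offsets.get(tokenSpan.end - 1).end;
        return new SimpleSpan(charStart, charEnd);
    }
}
```

For the text "Olivier Grisel lives in Paris" tokenized on whitespace,
the token span [0, 2) covering "Olivier Grisel" maps to the char span
[0, 14). A real wrapper for Pig tuples would additionally have to decide
which of the two conventions it stores, and reuse buffers where it can.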

Mining a medium / large scale corpus in an almost interactive way
with the Pig shell (grunt) is a great way to quickly test ideas and
prototypes, and to tap into the unreasonable effectiveness of data.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
