On 1/5/11 4:44 PM, Olivier Grisel wrote:
2011/1/5 Jason Baldridge<[email protected]>:
This looks great, and it aligns with my own recent interest in large scale
NLP with Hadoop, including working with Wikipedia. I'll look at it more
closely later, but in principle I would be interested in having this brought
into the OpenNLP project in some way!
Thanks for your interest. Don't hesitate to fork the repo on github to
experiment with your own design ideas. OpenNLP methods often handle
String[][] and Span[] data structures, where the span start and end
indices refer either to char positions or to token indices. It might
be interesting to write some generic wrappers that convert those data
structures to and from Pig tuples, taking care not to reallocate
memory when it is not necessary.
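
To make the two index conventions concrete, here is a minimal Java
sketch that maps a token-index span to a char-offset span. The class
and method names are hypothetical and are not part of OpenNLP or Pig.
It assumes each token is given as a {charStart, charEnd} pair with an
exclusive end, and it works on plain int arrays rather than on Span
objects.

```java
// Hypothetical helper for illustration only; not an OpenNLP or Pig API.
final class SpanOffsets {

    /**
     * Converts a span expressed in token indices into char offsets.
     * tokens[i] holds {charStart, charEnd} for token i, end exclusive
     * (an assumption of this sketch).
     */
    static int[] tokenSpanToCharSpan(int[][] tokens, int tokStart, int tokEnd) {
        if (tokStart < 0 || tokEnd <= tokStart || tokEnd > tokens.length) {
            throw new IllegalArgumentException(
                "bad token span [" + tokStart + ", " + tokEnd + ")");
        }
        // Start of the first covered token, end of the last covered token.
        return new int[] { tokens[tokStart][0], tokens[tokEnd - 1][1] };
    }

    public static void main(String[] args) {
        String text = "Olivier Grisel wrote";
        // Char offsets of the three whitespace-separated tokens.
        int[][] tokens = { {0, 7}, {8, 14}, {15, 20} };
        int[] chars = tokenSpanToCharSpan(tokens, 0, 2);
        System.out.println(text.substring(chars[0], chars[1]));
    }
}
```

Nothing is copied here except the two-element result array; the token
offsets and the text are only read.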


Making OpenNLP faster is always nice. I believe we should one day
move away from String and use CharSequence instead, because that usually
avoids a memory copy. It might also be easy to integrate with Pig (I have
never used Pig myself).
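
As a rough illustration of the CharSequence idea, the sketch below
shows a read-only window onto an existing CharSequence that shares the
underlying characters instead of copying them. CharSlice is a
hypothetical name, not an existing OpenNLP class, and the sketch makes
no claim about how OpenNLP would actually adopt CharSequence.

```java
// Hypothetical helper for illustration only; not part of OpenNLP.
final class CharSlice implements CharSequence {
    private final CharSequence base; // shared, never copied
    private final int start;         // inclusive offset into base
    private final int end;           // exclusive offset into base

    CharSlice(CharSequence base, int start, int end) {
        if (start < 0 || end < start || end > base.length()) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.base = base;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return base.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to < from || to > length()) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // Another view over the same base: no characters are copied.
        return new CharSlice(base, start + from, start + to);
    }

    @Override
    public String toString() {
        // Materializing a String is the only place a copy is made.
        return base.subSequence(start, end).toString();
    }
}
```

A token could then be passed around as new CharSlice(document, start,
end), and only code that really needs a String would call toString().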

Jörn
