On Wed, Jul 6, 2011 at 8:32 PM, wren ng thornton <w...@freegeek.org> wrote:
> On 7/6/11 9:27 AM, Dmitri O.Kondratiev wrote:
> > Hi,
> > Continuing my search of Haskell NLP tools and libs, I wonder if the
> > following Haskell libraries exist (googling them does not help):
> > 1) End of Sentence (EOS) Detection. Break text into a collection of
> > meaningful sentences.
>
> Depending on how you mean, this is either fairly trivial (for English) or
> an ill-defined problem. For things like determining whether the "."
> character is intended as a full stop vs part of an abbreviation; that's
> trivial.
>
> But for general sentence breaking, how do you intend to deal with
> quotations? What about when news articles quote someone uttering a few
> sentences before the end-quote marker? So far as I'm aware, there's no
> satisfactory definition of what the solution should be in all reasonable
> cases. A "sentence" isn't really very well-defined in practice.

I am looking for a Haskell implementation of a sentence tokenizer such as
the one described by Tibor Kiss and Jan Strunk in “Unsupervised
Multilingual Sentence Boundary Detection”, which is implemented in NLTK:

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.tokenize.punkt-module.html

> > 2) Part-of-Speech (POS) Tagging. Assign part-of-speech information to
> > each token.
>
> There are numerous approaches to this problem; do you care about the
> solution, or will any one of them suffice?
>
> I've been working over the last year+ on an optimized HMM-based POS
> tagger/supertagger with online tagging and anytime n-best tagging. I'm
> planning to release it this summer (i.e., by the end of August), though
> there are a few things I'd like to polish up before doing so. In
> particular, I want to make the package less monolithic. When I release it
> I'll make announcements here and on the nlp@ list.

I am looking for an already working POS tagging framework that can be
customized for different pidgin languages.

> > 3) Chunking. Analyze each tagged token within a sentence and assemble
> > compound tokens that express logical concepts. Define a custom grammar.
> >
> > 4) Extraction. Analyze each chunk and further tag the chunks as named
> > entities, such as people, organizations, locations, etc.
> >
> > Any ideas where to look for similar Haskell libraries?
>
> I don't know of any work in these areas in Haskell (though I'd love to
> hear about it). You should try asking on the nlp@ list where the other
> linguists and NLPers are more likely to see it.

I will, though n...@projects.haskell.org looks very quiet...
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe