On Fri, Jul 1, 2011 at 3:31 AM, Dmitri O.Kondratiev <doko...@gmail.com> wrote:
> Hi,
> Please advise on NLP libraries similar to Natural Language Toolkit

There is a (slowly?) growing NLP community for Haskell over at:

http://projects.haskell.org/nlp/

The nlp mailing list may be a better place to ask for details. To the
best of my knowledge, most of the NLTK / OpenNLP capabilities have yet
to be implemented in or ported to Haskell, but there are some packages
worth a look on Hackage.

> First of all I need:
> - tools to construct 'bag of words'
> (http://en.wikipedia.org/wiki/Bag_of_words_model), which is a list of words
> in the
> article.

This is trivial to implement once you have a natural language tokenizer
you're happy with. Toktok might be worth looking at:

http://hackage.haskell.org/package/toktok

but I *think* it takes a pretty simple view of tokens (assuming it is
the tokenizer I've been using within GF). Eric Kow (?) has a tokenizer
implementation, which I can't seem to find at the moment. If I recall
correctly, it is also very simple, but it would be a great place to
implement a more complex tokenizer :)

> - tools to prune common words, such as prepositions and conjunctions, as
> well as extremely rare words, such as the ones with typos.

I'm not sure what you mean by 'prune'. Are you looking for a stopword
list to remove irrelevant / confusing words from something like a
search query? (That's not hard to do with a stemmer and a set.)

> - stemming tools

There is an implementation of the Porter stemmer on Hackage:

 - http://hackage.haskell.org/package/porter

> - Naive Bayes classifier

I'm not aware of a general-purpose Bayesian classifier library for
Haskell, but it *would* be great to have :) There are probably some
general-purpose statistical packages I'm unaware of that offer a
larger set of capabilities...

> - SVM classifier

There are a few of these.
Take a look at the AI category on Hackage:

 - http://hackage.haskell.org/packages/archive/pkg-list.html#cat:ai

--Rogan

> - k-means clustering

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
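
To make the bag-of-words and pruning points above concrete, here is a
minimal sketch using only base and containers. Everything in it is an
assumption made for illustration: the tokenizer is a naive
split-on-non-letters (not what toktok does), the stopword list is a
tiny hypothetical sample, and "rare" is taken to mean "seen fewer than
minCount times". Stemming is left out; tokens could be passed through
a stemmer before counting.

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Naive tokenizer (an assumption, not toktok's behaviour): lower-case
-- the text and treat every non-letter character as a separator.
tokenize :: String -> [String]
tokenize = words . map normalize
  where
    normalize c
      | isAlpha c = toLower c
      | otherwise = ' '

-- Bag of words as a map from token to occurrence count.
bagOfWords :: [String] -> Map.Map String Int
bagOfWords = Map.fromListWith (+) . map (\w -> (w, 1))

-- Tiny illustrative stopword list; a real one would be much longer.
stopwords :: Set.Set String
stopwords =
  Set.fromList ["a", "an", "and", "in", "of", "on", "or", "the", "to"]

-- Drop stopwords, and drop words seen fewer than minCount times
-- (a crude stand-in for "extremely rare words, such as typos").
prune :: Int -> Map.Map String Int -> Map.Map String Int
prune minCount =
  Map.filterWithKey (\w n -> n >= minCount && not (Set.member w stopwords))
```

In GHCi, something like `prune 2 (bagOfWords (tokenize text))` would
give the counts that survive both filters. Note that this tokenizer
splits "don't" into "don" and "t", which is one reason a better
tokenizer would be worth having.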