It definitely belongs. Besides, a great deal of the data in large-scale machine learning looks like text: friends on LinkedIn, a history of traffic violations for insurance, the list of users who have clicked on an ad. The list goes on forever.
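As a rough sketch of how records like these can be handled the way text is, the snippet below counts which symbols appear together, either within a sliding window over one sequence or anywhere in the same record. The data is made up and the function names are hypothetical; this is plain Python for illustration, not Mahout code.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrence(seq, window=2):
    """Count unordered pairs of distinct symbols at most `window` positions apart."""
    counts = Counter()
    for i, a in enumerate(seq):
        # Look only at the next `window` symbols after position i.
        for b in seq[i + 1 : i + 1 + window]:
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts


def document_cooccurrence(docs):
    """Count unordered pairs of distinct symbols that share a record ("document")."""
    counts = Counter()
    for doc in docs:
        # Each pair is counted once per record, however often it repeats.
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


# Made-up click histories: one ordered list of ad ids per user.
clicks = [
    ["ad_a", "ad_b", "ad_c"],
    ["ad_b", "ad_c"],
    ["ad_a", "ad_c", "ad_b"],
]

doc_counts = document_cooccurrence(clicks)
win_counts = window_cooccurrence(clicks[0], window=1)
```

Here each user's click history plays the role of a document and each ad id plays the role of a word; the same two functions would apply unchanged to friend lists or violation histories.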
Basically, "text" is an ordered sequence of symbols, and you encounter that all over the place. Co-occurrence at the window and the document level is very widely applicable.

On Thu, Jan 7, 2010 at 4:03 PM, Otis Gospodnetic <[email protected]> wrote:

> NLP does fall under the Mahout umbrella, I'd say. Future subproject
> perhaps?

--
Ted Dunning, CTO
DeepDyve
