On 4/28/11 4:56 AM, Jason Baldridge wrote:
> No -- what's that?
Quite a while back I made a maxent-based document categorizer (see the doccat package)
to detect the language of medical records. Currently it only has a document-level
bag-of-words feature generator, but feature generation is extensible.

Jörn
