On 4/14/11 5:15 AM, Jason Baldridge wrote:
For many applications, it would be useful to have a universal tagset for any language you are working with. See below for details on a project that provides mappings from many standard treebanks to a coarse-grained tagset (12 tags). We might want to support these mappings to simple tags in our models (e.g., have one model that uses corpus-native tags and another that uses universal tags).
All we need to do is replace the POS tags in the training data with the universal ones, right? I guess we could add support for this to the converter tool, or write a small tool that replaces the tags. Jörn
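
A rough sketch of what such a small tag-replacing tool might look like, written in Python only for brevity. It assumes the training data uses one sentence per line with `word_TAG` tokens separated by whitespace. The `PTB_TO_UNIVERSAL` table is a hand-picked illustrative excerpt, not the project's full mapping, and `replace_tags` and the `unknown` fallback are hypothetical names, not part of any existing tool.

```python
# Illustrative excerpt of a Penn Treebank -> universal tag mapping (assumption);
# a real run would load the complete mapping published by the project.
PTB_TO_UNIVERSAL = {
    "DT": "DET",
    "NN": "NOUN",
    "NNP": "NOUN",
    "VBZ": "VERB",
    "IN": "ADP",
    "CD": "NUM",
    ".": ".",
}


def replace_tags(line, mapping, unknown="X"):
    """Rewrite one 'word_TAG word_TAG ...' line using the given tag mapping."""
    out = []
    for token in line.split():
        # Split on the last underscore so words that contain '_' stay intact.
        word, sep, tag = token.rpartition("_")
        if not sep:
            # Token carries no tag: leave it unchanged.
            out.append(token)
            continue
        # Tags missing from the mapping fall back to the catch-all tag.
        out.append(word + "_" + mapping.get(tag, unknown))
    return " ".join(out)


if __name__ == "__main__":
    print(replace_tags("The_DT dog_NN barks_VBZ ._.", PTB_TO_UNIVERSAL))
```

Mapping unknown tags to a catch-all is just one choice; failing loudly on an unmapped tag might be safer when converting a whole corpus.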
