The Nutch language classifier uses alpha2 codes. Most systems I have used in the past (albeit not NLP oriented) also use alpha2. In addition, the language names are explicitly called out when users of OpenNLP load a model, so this would be one more place existing users would have to change (consider that an API incompatibility).
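To make the compatibility concern concrete: if the codes did change, callers passing "en" would need some translation step. A minimal sketch of such a step, assuming a hypothetical helper class `LanguageCodes` (not part of OpenNLP or Nutch) and relying on the JDK's `Locale.getISO3Language()`, which returns three-letter ISO 639-2/T style codes. Whether those match the ISO 639-3 codes for every language you care about is an assumption you would need to check.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not an OpenNLP API.
public class LanguageCodes {

    // Accepts a two-letter (alpha2) or three-letter code and returns
    // a lowercase three-letter code.
    public static String toAlpha3(String code) {
        String normalized = code.trim().toLowerCase(Locale.ROOT);
        if (normalized.length() == 3) {
            // Already three letters: pass through unchanged (not validated).
            return normalized;
        }
        // The JDK looks up the three-letter code for a two-letter one and
        // throws MissingResourceException when it has no mapping.
        return Locale.forLanguageTag(normalized).getISO3Language();
    }

    public static void main(String[] args) {
        System.out.println(toAlpha3("en"));  // prints "eng"
        System.out.println(toAlpha3("eng")); // prints "eng"
    }
}
```

Even with a shim like this, existing model names and user code that hard-code "en" would still need to be audited.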
What's the net gain you're interested in by moving to alpha3?

C

On May 17, 2011, at 12:16 PM, Jason Baldridge wrote:

> Sure. So change that to be ISO 639-3.
>
> On Tue, May 17, 2011 at 1:50 PM, Benson Margulies
> <[email protected]>wrote:
>
>> -2 is pretty useless. Use -3 if you want to switch.
>>
>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov <[email protected]> wrote:
>>> My two cents, tesseract-ocr also uses ISO 639-3 and it would be great for
>>> those who builds the solutions such as openNLP + tesseract.
>>>
>>> -Oleg
>>>
>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge
>>> <[email protected]>wrote:
>>>
>>>> I think we should change to the three character convention for language
>>>> specific materials, e.g. "eng" rather than "en" for English.
>>>>
>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>>
>>>> Do others agree?
>>>>
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>> http://twitter.com/jasonbaldridge
>>>>
>>>
>>
>
> --
> Jason Baldridge
> Assistant Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
