Re: switch to ISO 639-2 codes for languages?

Chris Collins Tue, 17 May 2011 13:24:37 -0700

The Nutch language classifier uses alpha2, and most systems I have used in the
past (albeit not NLP-oriented) typically use alpha2 as well. Also, the language
names are explicitly called out when users of OpenNLP load a model, so this
would be one more place where existing users would have to change their code
(consider that an API incompatibility).
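
One way to soften that incompatibility would be a small shim that accepts
either form and normalises to alpha3. This is only a rough sketch: the helper
name to_alpha3 and the hand-written table are hypothetical, cover just a few
languages, and are not part of OpenNLP or Nutch.

```python
# Hypothetical shim, not part of OpenNLP: normalise a language code to alpha3.
# The table is hand-written for illustration and covers only a few languages.
ALPHA2_TO_ALPHA3 = {
    "da": "dan",
    "de": "deu",  # ISO 639-3 / 639-2/T form; 639-2/B uses "ger"
    "en": "eng",
    "es": "spa",
    "fr": "fra",
    "nl": "nld",
    "pt": "por",
    "sv": "swe",
}


def to_alpha3(code: str) -> str:
    """Return a three-letter code for a two- or three-letter language code."""
    normalized = code.strip().lower()
    if len(normalized) == 3 and normalized.isalpha():
        # Already three letters: passed through as-is, not validated.
        return normalized
    try:
        return ALPHA2_TO_ALPHA3[normalized]
    except KeyError:
        raise ValueError(f"unknown language code: {code!r}") from None
```

With something like this, callers that still pass "en" and callers that pass
"eng" would resolve to the same value, so the rename would not have to break
everyone at once.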


What's the net gain you're interested in by moving to alpha3?

C
On May 17, 2011, at 12:16 PM, Jason Baldridge wrote:

> Sure. So change that to be ISO 639-3.
> 
> On Tue, May 17, 2011 at 1:50 PM, Benson Margulies 
> <[email protected]> wrote:
> 
>> -2 is pretty useless. Use -3 if you want to switch.
>> 
>> On Tue, May 17, 2011 at 2:40 PM, Oleg Tikhonov <[email protected]> wrote:
>>> My two cents: tesseract-ocr also uses ISO 639-3, and it would be great for
>>> those who build solutions such as OpenNLP + tesseract.
>>> 
>>> -Oleg
>>> 
>>> On Tue, May 17, 2011 at 9:33 PM, Jason Baldridge
>>> <[email protected]> wrote:
>>> 
>>>> I think we should change to the three character convention for language
>>>> specific materials, e.g. "eng" rather than "en" for English.
>>>> 
>>>> http://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
>>>> 
>>>> Do others agree?
>>>> 
>>>> --
>>>> Jason Baldridge
>>>> Assistant Professor, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://www.jasonbaldridge.com
>>>> http://twitter.com/jasonbaldridge
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Jason Baldridge
> Assistant Professor, Department of Linguistics
> The University of Texas at Austin
> http://www.jasonbaldridge.com
> http://twitter.com/jasonbaldridge
