Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan

On Thu, 30 Nov 2000, Kenneth Whistler wrote:

> > Scots is a separate language!  If you understand anything at all
> > it's by a happy accident.  (There is of course Scots-flavored
> > English as well, which is another matter.)
> 
> I was, of course, referring to Scots (alleged) English, and not
> to Scots Gaelic.

Sae wes I.  But Scotland's a twa-leidit fowkrick (Scots an Scots Gaelic),
o three gif we coont "mim-mou'd Sudron".

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter





Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler

John Cowan replied:

> Kenneth Whistler wrote:
> 
> > To that I would add Glaswegian. When watching the
> > Scots-produced mystery shows that show up on PBS in the United
> > States on occasion, my wife and I often turn to each other
> > in bafflement and say, "Subtitles, please."
> 
> Scots is a separate language!  If you understand anything at all
> it's by a happy accident.  (There is of course Scots-flavored
> English as well, which is another matter.)

I was, of course, referring to Scots (alleged) English, and not
to Scots Gaelic.

--Ken



Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan

Kenneth Whistler wrote:

> To that I would add Glaswegian. When watching the
> Scots-produced mystery shows that show up on PBS in the United
> States on occasion, my wife and I often turn to each other
> in bafflement and say, "Subtitles, please."

Scots is a separate language!  If you understand anything at all
it's by a happy accident.  (There is of course Scots-flavored
English as well, which is another matter.)

-- 
There is / one art   || John Cowan <[EMAIL PROTECTED]>
no more / no less    || http://www.reutershealth.com
to do / all things   || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein



Re: [OT] Re: the Ethnologue

2000-11-30 Thread Kenneth Whistler

John Cowan noted:

> 
> In general, Geordie (the traditional dialect spoken around the Tyne
> River in England) is considered to be the English dialect most difficult
> for North Americans.

To that I would add Glaswegian. When watching the
Scots-produced mystery shows that show up on PBS in the United
States on occasion, my wife and I often turn to each other
in bafflement and say, "Subtitles, please."

> In countries where English is widely spoken as a second language (former
> parts of the British Empire), the varieties are often very different.
> Indians have trouble with Kenyan English and vice versa, IIRC.

And in response to Elliotte Harold's comment, when encountering
spoken versions of English, one's task of understanding is often
made easier by the fact that an interlocutor will generally try to
move their pronunciation and usage towards what they (and you)
perceive as more standard, specifically to assist in the task
of communication. And varieties heard in media also tend to be
more comprehensible regional norms, precisely because they are
aiming at a wide audience.

--Ken



Re: [OT] Re: the Ethnologue

2000-11-30 Thread John Cowan

Elliotte Rusty Harold wrote:

> I've yet to encounter a spoken
> version of English that I couldn't understand, after at most a couple
> of minutes of accustoming myself to the accent.

You live in a country where dialect differentiation is a feeble thing,
consisting mainly in pronunciation, and where dialect areas stretch for
hundreds or even thousands of miles.  Australia and Northern China
(in the Chinese-speaking region) are about the only other parts of the
Earth with this property.  English elsewhere is more diverse.

In general, Geordie (the traditional dialect spoken around the Tyne
River in England) is considered to be the English dialect most difficult
for North Americans.

In countries where English is widely spoken as a second language (former
parts of the British Empire), the varieties are often very different.
Indians have trouble with Kenyan English and vice versa, IIRC.

-- 
There is / one art   || John Cowan <[EMAIL PROTECTED]>
no more / no less    || http://www.reutershealth.com
to do / all things   || http://www.ccil.org/~cowan
with art- / lessness \\ -- Piet Hein



RE: [OT] Re: the Ethnologue

2000-11-30 Thread Doug Ewell

Elliotte Rusty Harold <[EMAIL PROTECTED]> wrote:

> At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:
>
>> Spoken language is not necessarily at all the same thing as
>> written language. There are e.g. plenty of mutually
>> incomprehensible forms of spoken English which might each deserve a
>> code in a standard for spoken languages but probably far fewer
>> mutually incomprehensible varieties of written English.
> ...
> I've yet to encounter a spoken version of English that I couldn't
> understand, after at most a couple of minutes of accustoming myself
> to the accent.

I think if you take Christopher's original statement and substitute the
word "Arabic" in place of "English," his point would be proven valid
with a better example.

But Elliotte is basically correct; the differences between dialects of
English are not generally as great as people sometimes make them out
to be.  Sure, it can be a challenge initially to understand another
dialect.  I remember being thrown for a loop by a waiter in Hemel
Hempstead, England who asked me, "Are you on holiday?"  At that point
I had three obstacles to overcome:

1. the word "holiday" used for what I would call "vacation"
2. the dropping of the "h" in "'oliday"
3. the European-style high-falling question tone instead of the
   American-style mid-falling-rising tone

But after a second or two I did understand him (and yes, I was indeed
on holiday).

Differences in accent often say more about the speaker than about the
language he is speaking; a Texan who speaks English with a Texas
accent would most likely speak French or Spanish with a Texas accent as
well.  And most of the vocabulary differences are in well-publicized
word pairs like hood/bonnet and elevator/lift.  This is really no
different from hearing a teenager use a vogue word such as "phat" that
has not yet reached the mainstream (and can easily be confused for a
mainstream homonym).

Naturally, this is all coming from a language hack, not a trained
linguist, so please be gentle as you correct my errors.

-Doug Ewell
 Fullerton, California



RE: [OT] Re: the Ethnologue

2000-11-30 Thread Elliotte Rusty Harold

At 7:18 AM -0800 11/23/00, Christopher John Fynn wrote:


>Spoken language is not necessarily at all the same
>thing as written language.
>There are e.g. plenty of mutually incomprehensible
>forms of spoken English which might each deserve a
>code in a standard for spoken languages but probably
>far fewer mutually incomprehensible varieties of written
>English.

I find myself compelled to indulge in some off-topic curiosity here.
As a native speaker of American English (suburban New Orleans 
dialect, sometimes known as "Yat") I've yet to encounter a spoken 
version of English that I couldn't understand, after at most a couple 
of minutes of accustoming myself to the accent. I've heard some 
pretty thickly accented Englishes (from my perspective) ranging from 
the Cajun bayous of Louisiana to the South Bronx to Yorkshire to New
Zealand. So far, they were all obviously English, and at least 
intelligible to me. The only times I've had real problems were with 
non-native speakers who had a very limited command of English, and 
even then I was always eventually able to make myself understood and 
vice versa. Could you give some examples that you would consider to 
be "mutually incomprehensible forms of spoken English"?
-- 

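+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED]      | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|                  The XML Bible (IDG Books, 1999)                   |
|              http://metalab.unc.edu/xml/books/bible/               |
|   http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ |
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+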



RE: [OT] Re: the Ethnologue

2000-11-23 Thread Christopher John Fynn


Peter Constable wrote:

> This is a good example of why an enumeration of "languages" 
> based only on written forms (as found in ISO 639) is 
> insufficient for all user needs.

Of course ISO 639 is insufficient for *all* user needs 
- no standard is. And is there actually a remit for 
ISO 639 to include spoken languages?

Another post mentioned that ISO 639 was started for 
bibliographic purposes. Perhaps ISO 639 should stick 
to being a standard of codes for written languages and 
a separate standard (or a new part of ISO 639) should 
be started for spoken languages. There may just be too 
many conflicts trying to encode both spoken and written 
languages in the same standard with one set of codes. 

Spoken language is not necessarily at all the same
thing as written language.
There are e.g. plenty of mutually incomprehensible 
forms of spoken English which might each deserve a 
code in a standard for spoken languages but probably 
far fewer mutually incomprehensible varieties of written 
English. And the varieties of dialects of written English
do not map neatly to the varieties of spoken English. 
The same is true for other written and spoken
languages.

- Chris



Re: [OT] Re: the Ethnologue

2000-09-25 Thread Jonathan Coxhead

   On 20 Sep 00, at 9:42, [EMAIL PROTECTED] wrote:

 | On 09/17/2000 11:39:14 AM Doug Ewell wrote:
 | 
 | >What names am I supposed to associate with codes like SHU, MKJ, and
 | >SRC in my (possibly hypothetical) application that deals with language
 | >tags?  Such associations are normally expected to be one-to-one.
 | >
 | >If Ethnologue codes are going to be regarded as a standard outside the
 | >confines of SIL, each code needs to be associated with a single,
 | >normative name.
 | 
 | A universally "politically correct" name in every case is insoluble.

   Isn't that exactly what the 3-letter code is? It can be used universally 
and unambiguously to denote any of the languages that are catalogued (or 
Ethnologued).

   It's only when you start addressing humans---in some specific language, 
in some specific locale---that a human-orientated name is needed. And once 
you've got to that point, you are clearly in the realm of human preference, 
and you can invoke whatever political, cultural or social conventions are 
desired by the particular user of the system.
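   A minimal sketch of that separation in Python, under stated assumptions: the code is the stable key, and a name is looked up only at display time for a given audience. The three names for SRC are the ones quoted elsewhere in this thread; the audience keys, their orderings, and the function names are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Tuple

# Names for SRC are the three quoted in this thread. The audience keys
# ("audience-a", "audience-b") and the ordering of names are hypothetical.
NAMES: Dict[Tuple[str, str], List[str]] = {
    ("SRC", "audience-a"): ["Serbo-Croatian", "Bosnian", "Croatian"],
    ("SRC", "audience-b"): ["Croatian", "Serbo-Croatian", "Bosnian"],
}


def display_name(code: str, audience: str) -> str:
    """Return the first-listed name for this audience, else the bare code."""
    names = NAMES.get((code.upper(), audience))
    return names[0] if names else code.upper()


def sorted_codes(codes: List[str]) -> List[str]:
    """Order entries by identifier, so no human-readable name decides the order."""
    return sorted(c.upper() for c in codes)
```

   Falling back to the bare code when no name is recorded keeps the identifier usable even where no preference has been collected.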

   Maybe the Ethnologue could have a more comprehensive "vocabulary" 
incorporated into it: for each country/language combination, it could list 
all the names of all the *other* country/language combinations---with 
alternates---and notes on the political, cultural or social weights behind 
them. (Most of the entries would be empty, I'd guess.) This would be 
potentially controversial (and a lot of work), but if approached with the 
proper scholarly impartiality, could help its clients avoid many of the 
more egregious errors.

 | Simply picking one as a default *for the purposes of implementation of the
 | system of identifiers* is reasonable, and is a problem we have to be able to
 | solve if we are going to present a view of the data that is organised first
 | by language - at the least, you have to list one name first. This is
 | certainly going to happen.

   You only have to list one name first in each country/language entry for 
the name of the language in a given country/language. The list of languages 
can readily be sorted by identifier, and so can the names of the other 
country/language combinations listed within. I'd guess there's not much 
controversy about order after that ...

/|
 o o o (_|/
/|
   (_/



Re: [OT] Re: the Ethnologue

2000-09-22 Thread Edward Cherlin

At 6:24 AM -0800 9/21/00, Marion Gunn wrote:
>Arsa Antoine Leca:
>
>>  
>>Hindi, Hindustani, Urdu could be considered co-dialects, but have important
>>sociolinguistic differences. Hindi uses the Devanagari writing system, and
>>formal vocabulary is borrowed from Sanskrit, de-Persianized, de-Arabicized.
>>Literary Hindi, or Hindi-Urdu, has four varieties: Hindi (High Hindi, Nagari
>>Hindi, Literary Hindi, Standard Hindi)...
>>
>>  from the online Ethnologue database, 13th ed.
>>http://www.sil.org/ethnologue/countries/Inda.html#HND
>>
>
>Mm. Maybe a more polite (more PC) turn of phrase might be found than "could be
>considered co-dialects", which more than implies, it postulates the existence
>of a standard language referent of which the above "could" be considered
>dialects.
>
>Someone this week, I think it might have been on this list, spoke of languages
>as being "allied" to each other. I rather like that. Would it be acceptable to
>suggest replacing "co-dialects" with "allied languages"?
>mg

As long as nobody supposes that the speakers are supposed to be 
allied. Consider

Serbia, Bosnia, Croatia (Serbo-Croatian)
India, Pakistan (Hindi/Urdu)
China, Taiwan, Singapore (Chinese)
North and South Korea
The many Arab countries and dialects
Iran, Afghanistan (Farsi, Dari, Pashto)

or the U.S. and UK in 1776 and 1812. Historical examples could be 
greatly multiplied.
-- 

Edward Cherlin
Generalist
"A knot!" exclaimed Alice. "Oh, do let me help to undo it."
Alice in Wonderland



Re: [OT] Re: the Ethnologue

2000-09-21 Thread Marion Gunn

Arsa Kevin Bracey:

>
> As far as I'm aware the co- prefix does mean an equal grouping. Examples that
> spring to mind are co-worker, co-conspirator, co-exist, coincidence and
> co-operative. I thought co-dialects was a cunningly concise way of saying
> that they could all be considered dialects of each other...
>

And so it is, but even the concept "peer" is misleading (some "co-dialects"
priding themselves on being "more equal" than others of their group, some
members of which they may abhor, and deprecate for legal use in their land),
which is also why I favour "allied languages", which neatly sidesteps the
question of hierarchical relationships, real, implied, or created for any
political purpose, so that Croatian-Bosnian-Serbian then become linguistic
allies for solid linguistic reasons, nothing more implied.
mg


>
> --
> Kevin Bracey

Marion Gunn
Everson Gunn Teoranta






Re: [OT] Re: the Ethnologue

2000-09-21 Thread Kevin Bracey

In message <[EMAIL PROTECTED]>
  Doug Ewell <[EMAIL PROTECTED]> wrote:

> Marion Gunn <[EMAIL PROTECTED]> wrote:
> >
> > Mm. Maybe a more polite (more PC) turn of phrase might be found than
> > "could be considered co-dialects", which more than implies, it
> > postulates the existence of a standard language referent of which the
> > above "could" be considered dialects.
> 
> Mmm.  I hadn't thought of it that way.  The impression I got from the
> prefix "co-" was one of equality among peers, as in "co-author" or
> "co-champion"; but now I recognize a separate, contrasting sense of
> "co-" to denote subsidiary status, as in "co-pilot."  I suspect the
> Ethnologue staff intended the former (polite?) sense, but it could be
> interpreted either way as desired.
>
> What fun language is.

As far as I'm aware the co- prefix does mean an equal grouping. Examples that
spring to mind are co-worker, co-conspirator, co-exist, coincidence and
co-operative. I thought co-dialects was a cunningly concise way of saying
that they could all be considered dialects of each other.

I suspect co-pilot was intended as a polite way of NOT saying that the co-pilot
was secondary to the pilot. But because he clearly is, it looks like a
secondary implication of subsidiarity has attached itself to the term, and so
now people start looking for a new term that doesn't imply subsidiarity.
Repeat this cycle until bored, or there are no words left :) 

What fun PC is!

-- 
Kevin Bracey, Principal Software Engineer
Pace Micro Technology plc           Tel: +44 (0) 1223 518566
645 Newmarket Road                  Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom  WWW: http://www.acorn.co.uk/



Re: [OT] Re: the Ethnologue

2000-09-21 Thread Doug Ewell

Marion Gunn <[EMAIL PROTECTED]> wrote:

>>   Hindi, Hindustani, Urdu could be considered co-dialects...
>
> Mm. Maybe a more polite (more PC) turn of phrase might be found than
> "could be considered co-dialects", which more than implies, it
> postulates the existence of a standard language referent of which the
> above "could" be considered dialects.

Mmm.  I hadn't thought of it that way.  The impression I got from the
prefix "co-" was one of equality among peers, as in "co-author" or
"co-champion"; but now I recognize a separate, contrasting sense of
"co-" to denote subsidiary status, as in "co-pilot."  I suspect the
Ethnologue staff intended the former (polite?) sense, but it could be
interpreted either way as desired.

What fun language is!

-Doug Ewell
 Fullerton, California



Re: [OT] Re: the Ethnologue

2000-09-21 Thread Marion Gunn

Arsa Antoine Leca:

> 
>   Hindi, Hindustani, Urdu could be considered co-dialects, but have important
>   sociolinguistic differences. Hindi uses the Devanagari writing system, and
>   formal vocabulary is borrowed from Sanskrit, de-Persianized, de-Arabicized.
>   Literary Hindi, or Hindi-Urdu, has four varieties: Hindi (High Hindi, Nagari
>   Hindi, Literary Hindi, Standard Hindi)...
> 
> from the online Ethnologue database, 13th ed.
> http://www.sil.org/ethnologue/countries/Inda.html#HND
>

Mm. Maybe a more polite (more PC) turn of phrase might be found than "could be
considered co-dialects", which more than implies, it postulates the existence of a
standard language referent of which the above "could" be considered dialects.

Someone this week, I think it might have been on this list, spoke of languages as
being "allied" to each other. I rather like that. Would it be acceptable to
suggest replacing "co-dialects" with "allied languages"?
mg


>
> Of course, Peter and many people here know that I am taking the worst possible
> example. Perhaps one may also file reports to make clearer that most if not all
> of these different entries are mutually intelligible (at least to the extent
> that the language I am speaking when speaking of linguistics or of Unicode is
> intelligible to the average French-speaking person).
>
> Antoine

--
Marion Gunn
Everson Gunn Teoranta






Re: [OT] Re: the Ethnologue

2000-09-21 Thread Otto Stolz

Am 2000-09-16 hat Michael Kaplan geschrieben:
> In a way, this is one of the only advantages to not giving locale tags any
> significance -- by assigning them numbers, you really are trying to stay
> out of the business of people who have very different ideas about names and
> such. In a world where countries can go to war over lesser matters than
> this, I prefer the numbers to having yet another tightrope to walk. :-(

And then, they will go to war over the order in which the numbers are
assigned :-(

Best wishes,
   Otto



Re: [OT] Re: the Ethnologue

2000-09-21 Thread Antoine Leca

Peter Constable wrote:
> 
> >> > SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
> >> > means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
> >> > 'sr' to Ethnologue 'SRC'.
> >>
> >> By Ethnologue standards of mutual intelligibility, there is only one
> >> language here.
>
> >Well, this is one that can actually get some of the speakers (or their
> >governments) pretty upset, though.
>
> As I've been saying, this amounts to differences of operational definitions
> (which may not be explicitly and consciously defined). The Ethnologue is
> attempting to consistently apply a definition based primarily on mutual
> non-intelligibility. There is no question that there are communities that
> speak the same "language" (by this definition), but that have distinct
> identities for various ethnic, social, religious or political reasons, and
> that the distinct identities get carried into their perception of language
> categories.


  Hindi, Hindustani, Urdu could be considered co-dialects, but have important
  sociolinguistic differences. Hindi uses the Devanagari writing system, and
  formal vocabulary is borrowed from Sanskrit, de-Persianized, de-Arabicized.
  Literary Hindi, or Hindi-Urdu, has four varieties: Hindi (High Hindi, Nagari
  Hindi, Literary Hindi, Standard Hindi); Urdu; Dakhini; Rekhta. [...]
  Languages and dialects in the Western Hindi group are Hindustani, Bangaru,
  Braj Bhasha, Kanauji, Bundeli; see separate entries.

from the online Ethnologue database, 13th ed.
http://www.sil.org/ethnologue/countries/Inda.html#HND

Of course, Peter and many people here know that I am taking the worst possible
example. Perhaps one may also file reports to make clearer that most if not all
of these different entries are mutually intelligible (at least to the extent
that the language I am speaking when speaking of linguistics or of Unicode is
intelligible to the average French-speaking person).


Antoine



RE: [OT] Re: the Ethnologue

2000-09-20 Thread Carl W. Brown

>From: Nick Nicholas [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, September 20, 2000 4:48 PM

>Apart from cohabiting in Anatolia for a millennium. :-) In any case, the
>Ethnologue is correct about Urum; Urum and Mariupolitan Greek are the two
>languages spoken by an ethnically Greek population, which moved to the area
>around Mariupol in the Ukraine from Crimea in the 18th century. During
>their stay in Crimea, a large part of the population was linguistically
>assimilated to Turkic; but the two language groups consider themselves the
>same ethnic group (Urum < Rumei "Romans", the mediaeval Greek autonym), and
>recently published anthologies of Mariupolitan Greek (in Cyrillic) include
>Urum texts.

>(Oh, and there's no glyphs in '80s Urum or Mariupolitan that would be out
>of place in Ukrainian, in case anyone was interested...)

Thank you.  That certainly clarifies that issue.  Going one step further,
even though the script uses the Cyrillic alphabet, how should one treat Latin
characters, notably the use of the dotless and dotted i?

Carl





RE: [OT] Re: the Ethnologue

2000-09-20 Thread Nick Nicholas

>From: "Carl W. Brown" <[EMAIL PROTECTED]>
>>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>>Sent: Wednesday, September 20, 2000 11:06 AM

>I agree.  For example, when it was brought up that other Turkic languages
>might be using the dotless i, I noticed that the SIL confirmed that
>Azerbaijan uses the Latin alphabet.  On the other hand it said that Urum was
>"Spoken by ethnic 'Greeks'".  Unless this is some kind of inside joke I
>cannot imagine any Greek having anything to do with anything Turkish.

Apart from cohabiting in Anatolia for a millennium. :-) In any case, the
Ethnologue is correct about Urum; Urum and Mariupolitan Greek are the two
languages spoken by an ethnically Greek population, which moved to the area
around Mariupol in the Ukraine from Crimea in the 18th century. During
their stay in Crimea, a large part of the population was linguistically
assimilated to Turkic; but the two language groups consider themselves the
same ethnic group (Urum < Rumei "Romans", the mediaeval Greek autonym), and
recently published anthologies of Mariupolitan Greek (in Cyrillic) include
Urum texts.

(Oh, and there's no glyphs in '80s Urum or Mariupolitan that would be out
of place in Ukrainian, in case anyone was interested...)

The Ethnologue does indeed contain inaccuracies and points of contention,
subject to improvement. And its linguistic classification scheme is not
always what meets with the broadest scholarly acceptance. (I worked as a
research assistant on a project on Papuan languages, for instance, where
the researcher had several misgivings.) As Peter Constable has pointed out,
such disagreements are unavoidable in a field in flux, like the linguistic
classification of non-literary languages. At any rate, that the Ethnologue
has the broadest coverage of any source out there, and that it is being
continually refined and improved, is not in dispute. And for the issue of
distinct language tagging, linguistic classification does not seem to me
very germane. In any case, given the nature of the SIL's work, and the
ISO's current coverage, the accuracy of its coverage of Papua New Guinea or
South America is surely more important an issue to evaluate than what it
has to say about Europe.

   Nick Nicholas, TLG, University of California, Irvine
  [EMAIL PROTECTED]         www.tlg.uci.edu/~opoudjis
"My most mighty, God-respected, God-glorified, God-promoted, God-governed,
God-magnified Holy Lord King. Health and merriment to your soul, vigour
and well-being to your divine and royal body, prosperity to the benefactions
issuing from your hand, and everything else good and salvific does my
humble self wish to your Holy Majesty on behalf of God Almighty."
--- Miklosich & Mueller I. CLXXXIV; Patriarch to Emperor.





RE: [OT] Re: the Ethnologue

2000-09-20 Thread Carl W. Brown

>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
>Sent: Wednesday, September 20, 2000 11:06 AM

>What is important here is that, where ISO doesn't provide a code, that
>users do have some other source of codes for internal and, more
>importantly, interchange purposes. Many independent agencies and
>individuals are already using Ethnologue codes in this way precisely
>because ISO provides very limited coverage.

I agree.  For example, when it was brought up that other Turkic languages
might be using the dotless i, I noticed that the SIL confirmed that
Azerbaijan uses the Latin alphabet.  On the other hand it said that Urum was
"Spoken by ethnic 'Greeks'".  Unless this is some kind of inside joke I
cannot imagine any Greek having anything to do with anything Turkish.

I was proposing using the SIL codes to supplement the ISO codes rather than
the IANA codes.

Carl




Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/19/2000 06:01:46 AM Antoine Leca wrote:

>Most of these differences are related to the spoken languages, and do not
>appear in writing. Since IT is mainly related with writing, this is a
>more minor point than it may appear at first sight.

Some domains of IT are mainly interested in writing, but that is by no
means true across the board. Examples include:

- processing that operates on speech data rather than text or text alone
- linguists interested in speech varieties (what they generally think of as
languages)
- governments and development agencies interested in all language
distinctions, including those between unwritten languages



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 11:13:36 PM John Cowan wrote:

>Exactly so.  And BTW "my proposal" is also Harald Alvestrand's proposal.

I wasn't aware of that until Harald mentioned something not too many days
ago.


- Peter




Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 10:37:42 PM Doug Ewell wrote:


>Since I have spent this whole, *very* OT discussion as the contrarian

It hasn't been all that off-topic. This has come up on numerous occasions
on this list, and I think is of interest to many of the participants, even
though it isn't strictly about Unicode.

>("devil's advocate" is too polite), I will take this opportunity to say
>that now that I understand John's proposal more clearly, I like it and
>think it makes a good deal of sense in an RFC 1766 bis environment.

John's proposal is one particular implementation of what Gary and I have
proposed in our paper. We favour the creation of a mechanism for distinct
namespaces, however.



>The mechanism for using these codes would need to be explicitly
>specified in RFC 1766 bis,

That may depend upon the exact implementation.


>and the rules would have to be the same as
>for other "i-" and "x-" codes, namely that ISO 639-1 codes must be used
>whenever possible, followed in turn by ISO 639-2 codes,

Absolutely.


>"i-sil-xxx"
>Ethnologue codes (whoops, John, that's a real code (for Keo)), other
>"i-" codes, and finally "x-" codes.  I think that's what John is
>proposing, anyway.

One issue is relative precedence of "i-sil-@@@" codes (where @ is some
ALPHA, to avoid confusion with Keo) and other "i-" codes. Again, we'd
suggest that Ethnologue codes be kept in a distinct namespace (which is one
way to view "i-sil-"), but some issues remain.
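The precedence order Doug describes can be sketched in Python as a simple ranking. This is only an illustration under stated assumptions: the function names are hypothetical, tags are classified by shape alone (two letters for ISO 639-1, three for ISO 639-2) rather than checked against the real code lists, and placing "i-sil-" ahead of other "i-" tags is exactly the point noted above as unresolved.

```python
import re
from typing import Iterable


def tag_rank(tag: str) -> int:
    """Rank a tag by the proposed order of preference (lower is preferred)."""
    t = tag.lower()
    if re.fullmatch(r"i-sil-[a-z]{3}", t):
        return 2  # Ethnologue code in its own "i-sil-" namespace
    if t.startswith("i-"):
        return 3  # other registered "i-" tags (relative order is an open issue)
    if t.startswith("x-"):
        return 4  # private-use tags come last
    primary = t.split("-")[0]
    if re.fullmatch(r"[a-z]{2}", primary):
        return 0  # shaped like an ISO 639-1 code
    if re.fullmatch(r"[a-z]{3}", primary):
        return 1  # shaped like an ISO 639-2 code
    raise ValueError(f"unrecognised language tag: {tag!r}")


def preferred_tag(candidates: Iterable[str]) -> str:
    """Pick the most preferred tag among candidates for the same language."""
    return min(candidates, key=tag_rank)
```

The candidates passed to preferred_tag are assumed to denote the same language; the ranking says nothing about whether two tags are actually equivalent.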


>My other concerns about the Ethnologue remain: I still believe there
>needs to be one normative name for each language (politically incorrect
>though it may be);

We will have to address this in some measure in order to present certain
views into the data.


>and some common sense needs to prevail regarding the
>scope of the language tag (like exactly how specific we need to be
>about the exact dialect of Chinese in a text message).

In general, this is determined by application needs together with a
consideration of distinct operational definitions. I think for most users
there will not be too much difficulty in knowing what to use, though, since
the vast majority of new identifiers are each generally of interest only to
a relatively limited set of users (though there are a number of users that
would be interested in the whole lot).


>But John's
>proposal might be a solution for those people who really need a
>standard language tag for Mukumina.

Just so.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





RE: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 08:02:20 PM John Cowan wrote:

>> Where I see using the SIL is as an extension of the ISO standard.
>
>RFC 1766 exists to allow flexible extension to the ISO standard.
>
>> If there
>> is no ISO code then use the SIL code.
>
>There are already collisions, so simply using one or the other
>gets you into trouble.  For example, ARC is the SIL code for Archi,
>a Northern Caucasian language spoken in the Russian Federation.
>But you cannot use it in an ISO 639 field, because ARC in 639
>represents Aramaic, which is differentiated by SIL into 16 languages.
>
>But under my proposal, Archi is i-sil-arc, and Aramaic is arc.  If
>you want to specify Assyrian Neo-Aramaic specifically, you can use
>i-sil-aii.

John is absolutely correct here, and I need to qualify my agreement to
Carl's statement along exactly the lines John is indicating here.
Ethnologue can supplement ISO codes, but we're not suggesting simply adding
all the Ethnologue codes to the same namespace. That would not work. On the
other hand, "i-sil-xxx" would. It is also necessary to ensure that, if the
category denoted by an instance of "i-sil-xxx" matches that of some ISO
code, then only the ISO code should be used. To deal with this, a mapping
between ISO and Ethnologue is needed, and that is being worked on. (This
mapping will also solve an existing and serious problem of ISO 639-x:
inadequate documentation.)
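A minimal Python sketch of the two points above, the separate namespace and the ISO-first rule. Only the ARC and AII facts come from this thread; the table and function names are hypothetical, and the equivalence mapping is left empty because the real ISO-to-Ethnologue mapping is still being worked on.

```python
from typing import Dict, Optional

# Illustrative tables; only these entries are taken from the thread.
ISO_639: Dict[str, str] = {"arc": "Aramaic"}
SIL: Dict[str, str] = {"ARC": "Archi", "AII": "Assyrian Neo-Aramaic"}

# Hypothetical equivalence mapping (Ethnologue code -> ISO code). Left empty:
# no equivalences are stated in the thread.
SIL_TO_ISO: Dict[str, str] = {}


def tag_for_sil(code: str, sil_to_iso: Optional[Dict[str, str]] = None) -> str:
    """Return the tag to use for an Ethnologue code, preferring an ISO match."""
    mapping = SIL_TO_ISO if sil_to_iso is None else sil_to_iso
    code = code.upper()
    if code not in SIL:
        raise KeyError(code)
    iso = mapping.get(code)
    return iso if iso else f"i-sil-{code.lower()}"


def resolve(tag: str) -> str:
    """Look a tag up in the namespace its prefix selects."""
    t = tag.lower()
    prefix = "i-sil-"
    if t.startswith(prefix):
        return SIL[t[len(prefix):].upper()]
    return ISO_639[t]
```

Because the prefix selects the table, "arc" and "i-sil-arc" never collide: the first resolves to Aramaic and the second to Archi.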



>Locales are by no means the only uses of language tagging.  My primary
>interest is in labeling the languages used in multimedia objects, including
>text, audio content, or both.

This is a good example of why an enumeration of "languages" based only on
written forms (as found in ISO 639) is insufficient for all user needs.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





RE: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 07:22:05 PM "Carl W. Brown" wrote:

>You are right the Ethnologue is not appropriate as a standard.

If we're assuming a single standard, in the sense of a single "tiling of
the plane" of languages, we're not proposing that the Ethnologue be the
standard. We are suggesting, though, that the need for alternate "tilings"
be acknowledged, and that the Ethnologue would serve well as one "tiling".


>Where I see using the SIL is as an extension of the ISO standard.

Just so.


>As far as research goes, you have to
>do your own to be able to prepare the locale.  This will eliminate 90% of
>the flaky SIL languages.  There either will be no demand or the research
>will uncover which of several encodings to use.

(Flaky? Who wants to admit their language is flaky?)

Indeed, I don't expect anybody to suddenly provide full locale data for
thousands of locales. Indeed, implementers will discover what they need to
do based on user requests, and will have to solve the problem of gathering
the necessary data just as they have to do now.

I'm not aware of any group of users requesting a populated locale database
covering thousands of locales. I am aware of several groups of users asking
for thousands of language identifiers, however.


>On the other hand if you consider that language is part of cultural
>expression and that different languages express ideas specific to the
>culture then the SIL is incomplete.  For example, Boont is an English slang
>language developed around Boonville, California.

If sufficiently-documented data is provided to indicate that Boont counts
as a distinct language, according to the operational definitions assumed,
then I would expect the editorial staff would add this to the Ethnologue.
It's not so much a question of whether there is an interest in Bible
translation into the language, but rather of what the sociolinguistic facts
about the language are. (I have no other knowledge of "Boont", so have no
idea whether it would get counted or not. If it is "slang", my guess would
be that it probably doesn't constitute a complete language. At this point,
I'm getting in over my head in terms of understanding of sociolinguistics,
so I'll stop here before I get into more trouble than I might have already
gotten into.)


>Standards are extremely important and they should be solid. They must work
>for you but in this business you can not be slaves to them.  The
>implementations should be based on standards but be flexible to accommodate
>exceptions when needed.  If I use the SIL codes I stand a good chance that
>the codes may be the same codes that ISO may adopt and I can avoid a later
>conversion.

What is important here is that, where ISO doesn't provide a code,
users do have some other source of codes for internal and, more
importantly, interchange purposes. Many independent agencies and
individuals are already using Ethnologue codes in this way precisely
because ISO provides very limited coverage.



- Peter







Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 11:39:14 AM Doug Ewell wrote:

>What names am I supposed to associate with codes like SHU, MKJ, and
>SRC in my (possibly hypothetical) application that deals with language
>tags?  Such associations are normally expected to be one-to-one.
>
>If Ethnologue codes are going to be regarded as a standard outside the
>confines of SIL, each code needs to be associated with a single,
>normative name.

Finding a universally "politically correct" name in every case is an
insoluble problem. Simply picking one as a default *for the purposes of
implementation of the system of identifiers* is reasonable, and is a problem
we have to be able to solve if we are going to present a view of the data
that is organised first by language - at the least, you have to list one name
first. This is certainly going to happen.


>But for the code
>GSW, the Ethnologue staff created separate entries for "Allemanisch,"
>"Alsatian," and "Schwyzerdütsch," which *may* appease nationalistic
>preferences but definitely *does* result in inconsistency and
>confusion.

I explained the reason for this earlier. I agree that it can result in
confusion. The solution will be to provide better views into the data,
which we intend to do.


- Peter







Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/17/2000 03:19:32 PM Doug Ewell wrote:

>Well, perhaps this is another, unintended example of a problem with
>incorporating the Ethnologue linguistic distinctions into other
>standards without serious review.  If Spaniards consider their language
>sufficiently different from the Spanish spoken by Latin Americans,
>should there be separate codes for the two, or not?

Such questions must be answered in terms of a particular operational
definition of "language" for a given namespace of identifiers.
There is no one "right" answer.


>How does this map intelligently to the existing (like it or not)
>ISO 639 standard?  Standards intended for widespread use should address
>issues like these explicitly.

And there is no way for standards to address such issues without
recognising the role of operational definitions.



- Peter







Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/16/2000 04:27:45 PM Doug Ewell wrote:

>All I am asking in this particular case is for the Ethnologue editor to
>assign *one* primary name (and spelling) to each three-letter language
>code, and to relegate the other names to alternate status in a
>consistent way.  That is the first necessary step, although maybe not
>the last, in moving the Ethnologue coding system closer to "maturity."

I hope my previous message adequately demonstrated that the information
inside the Ethnologue database already has the desired consistency in
categorization, and gave adequate assurances that the issue of
presenting the information is currently being addressed.



- Peter







Re: [OT] Re: the Ethnologue

2000-09-20 Thread Peter_Constable


On 09/16/2000 06:15:51 PM "Michael (michka) Kaplan" wrote:

>From: "John Cowan" <[EMAIL PROTECTED]>
>> On Sat, 16 Sep 2000, Doug Ewell wrote:
>> > SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
>> > means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
>> > 'sr' to Ethnologue 'SRC'.  This is likely to cause much more widespread
>> > trouble than the Hopi example mentioned earlier.
>>
>> By Ethnologue standards of mutual intelligibility, there is only one
>> language here.
>
>Well, this is one that can actually get some of the speakers (or their
>governments) pretty upset, though. And both ISO639-x and rfc1766 have to
>care about such things.

As I've been saying, this amounts to differences of operational definitions
(which may not be explicitly and consciously defined). The Ethnologue is
attempting to consistently apply a definition based primarily on mutual
non-intelligibility. There is no question that there are communities that
speak the same "language" (by this definition), but that have distinct
identities for various ethnic, social, religious or political reasons, and
that the distinct identities get carried into their perception of language
categories. Exactly the opposite is also true: e.g. that because people
share a particular written form it is perceived that they all speak the
same language, for instance, "Chinese".

What is crucial here is that there are situations in IT where more than one
way of "tiling the plane" is needed, since different users and different
applications have different requirements. The only possible outcomes are
distinct namespaces based on distinct definitions for different purposes,
chaos, or some IT needs simply being ignored. The first of these is the only
acceptable solution.




>John, a solution must be achieved, nevertheless. If a large part or even all
>of the Ethnologue is to be used as a part of any of these standards, then it
>must be done.
>
>In a way, this is one of the only advantages to not giving locale tags any
>significance -- by assigning them numbers, you really are trying to stay out
>of the business of people who have very different ideas about names and
>such. In a world where countries can go to war over lesser matters than
>this, I prefer the numbers to having yet another tightrope to walk. :-(

This is exactly one of the points Gary and I make in our paper regarding
benefits of dispensing with a requirement that tags be mnemonic.




- Peter







Re: [OT] Re: the Ethnologue

2000-09-19 Thread Antoine Leca

Doug Ewell wrote:
> 
> Michael Kaplan <[EMAIL PROTECTED]> wrote:
> 
> >> Spaniards generally refer to their national language as "castellano,"
> >> not "español,"

In fact, "castellano" is more like a compromise used to describe the
linguistic situation of Spain. When speaking with Spaniards, native
Castilian people will almost never use the word "Castilian", always
"Spanish" (or the equivalent translations "español", "espagnol", etc.)
In fact, someone who naturally uses "Castilian" instead of "Spanish"
in a conversation probably has another language as mother tongue...

From what I have seen (that is, very little), Hispanoamericans use "español",
although they know for sure what "castellano" means. Perhaps even,
the use of "castellano" may denote European Spanish when or where
it differs from their own language.


> > FWIW, I do not know of any Spaniards who object to "español" for the
> > generic language spoken by everyone around the world -- Castilian
> > they reserve for their own (pure) Spanish

I beg to differ.
 
> Well, perhaps this is another, unintended example of a problem with
> incorporating the Ethnologue linguistic distinctions into other
> standards without serious review.  If Spaniards consider their language
> sufficiently different from the Spanish spoken by Latin Americans,

They don't. On the contrary, they are proud that their language is
spoken all around the world.
Now, they are very well aware that there are differences; the main
ones are systematic differences in pronunciation (ll, y mainly).


> should there be separate codes for the two, or not?  What about similar
> concerns with French vs. Canadian French, American vs. British English,
> etc.?  How does this map intelligently to the existing (like it or not)
> ISO 639 standard?  Standards intended for widespread use should address
> issues like these explicitly.

Most of these differences are related to the spoken languages, and do not
appear in writing. Since IT is mainly concerned with writing, this is a
more minor point than it may appear at first sight.


Antoine



Re: [OT] Re: the Ethnologue

2000-09-17 Thread John Cowan

On Sun, 17 Sep 2000, Doug Ewell wrote:

> Since I have spent this whole, *very* OT discussion as the contrarian
> ("devil's advocate" is too polite), I will take this opportunity to say
> that now that I understand John's proposal more clearly, I like it and
> think it makes a good deal of sense in an RFC 1766 bis environment.

Hurrah, hurrah!

> If "i-" tags are just an RFC 1766 thing, then this can work exactly as
> John suggested.  OTOH, if they are specified by ISO 639 in any way,
> then we would have to use "x-" tags instead, since we are not at
> liberty to extend ISO 639 unilaterally.

They are an RFC 1766 thing:  "i" is short for IANA, the registration
agency associated with the RFCs.

> The mechanism for using these codes would need to be explicitly
> specified in RFC 1766 bis, and the rules would have to be the same as
> for other "i-" and "x-" codes, namely that ISO 639-1 codes must be used
> whenever possible, followed in turn by ISO 639-2 codes, "i-sil-xxx"
> Ethnologue codes (whoops, John, that's a real code (for Keo)), other
> "i-" codes, and finally "x-" codes.  I think that's what John is
> proposing, anyway.

Just so.  Of course, this rule applies to the review/registry system.
Thus, i-sil-eng would never even be registered, because en serves the
same purpose.
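
A minimal sketch of the preference order just described (ISO 639-1, then
ISO 639-2, then "i-sil-xxx", then other "i-" tags, then "x-" tags), in
Python. The function names tag_rank and preferred are hypothetical, and
telling 639-1 from 639-2 by the length of the primary subtag is an
assumption made for illustration, not something either RFC draft specifies
in this form.

```python
# Lower rank = preferred. The numeric values are an assumption; only the
# relative order comes from the discussion above.
def tag_rank(tag):
    tag = tag.lower()
    if tag.startswith("i-sil-"):
        return 2          # proposed Ethnologue namespace
    if tag.startswith("i-"):
        return 3          # other IANA-registered tags
    if tag.startswith("x-"):
        return 4          # private-use tags
    primary = tag.split("-")[0]
    # Assumption: two letters means ISO 639-1, anything else ISO 639-2.
    return 0 if len(primary) == 2 else 1


def preferred(candidates):
    """Pick the most preferred tag among equivalent candidates."""
    return min(candidates, key=tag_rank)
```

With this ordering, preferred(["i-sil-eng", "eng", "en"]) picks "en", which
is why a tag such as i-sil-eng would never need to be registered.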

> My other concerns about the Ethnologue remain: I still believe there
> needs to be one normative name for each language (politically incorrect
> though it may be);

I too agree that this would be desirable, but for the sake of 14
cases out of 7000, I wouldn't hold the whole system hostage.

> and some common sense needs to prevail regarding the
> scope of the language tag (like exactly how specific we need to be
> about the exact dialect of Chinese in a text message).

We need to be as specific as we need to be to solve the particular
problem, I guess.  If "zh" is all you need, then use it; otherwise
go to zh-guoyu or zh-yue or whatever.

> But John's
> proposal might be a solution for those people who really need a
> standard language tag for Mukumina.

Exactly so.  And BTW "my proposal" is also Harald Alvestrand's proposal.

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter





Re: [OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell

John Cowan <[EMAIL PROTECTED]> wrote:

> I am in favor of registering the tags in the Ethnologue (except for
> those which are *semantically* the same as existing 639-2 languages)
> in the RFC 1766 registry in the form i-sil-xxx.

and later:

> There are already collisions, so simply using one or the other
> gets you into trouble.  For example, ARC is the SIL code for Archi,
> a Northern Caucasian language spoken in the Russian Federation.
> But you cannot use it in an ISO 639 field, because ARC in 639
> represents Aramaic, which is differentiated by SIL into 16 languages.
>
> But under my proposal, Archi is i-sil-arc, and Aramaic is arc.  If
> you want to specify Assyrian Neo-Aramaic specifically, you can use
> i-sil-aii.

Since I have spent this whole, *very* OT discussion as the contrarian
("devil's advocate" is too polite), I will take this opportunity to say
that now that I understand John's proposal more clearly, I like it and
think it makes a good deal of sense in an RFC 1766 bis environment.

If "i-" tags are just an RFC 1766 thing, then this can work exactly as
John suggested.  OTOH, if they are specified by ISO 639 in any way,
then we would have to use "x-" tags instead, since we are not at
liberty to extend ISO 639 unilaterally.

The mechanism for using these codes would need to be explicitly
specified in RFC 1766 bis, and the rules would have to be the same as
for other "i-" and "x-" codes, namely that ISO 639-1 codes must be used
whenever possible, followed in turn by ISO 639-2 codes, "i-sil-xxx"
Ethnologue codes (whoops, John, that's a real code (for Keo)), other
"i-" codes, and finally "x-" codes.  I think that's what John is
proposing, anyway.

My other concerns about the Ethnologue remain: I still believe there
needs to be one normative name for each language (politically incorrect
though it may be); and some common sense needs to prevail regarding the
scope of the language tag (like exactly how specific we need to be
about the exact dialect of Chinese in a text message).  But John's
proposal might be a solution for those people who really need a
standard language tag for Mukumina.

[Note to Harald:  "RFC 1766 bis" was Carl W. Brown's term for your
draft successor to RFC 1766.  He cited an earlier draft, in which the
proposed guidelines for the second subtag were defined explicitly.]

-Doug Ewell
 Fullerton, California



RE: [OT] Re: the Ethnologue

2000-09-17 Thread John Cowan

On Sun, 17 Sep 2000, Carl W. Brown wrote:

> I can understand your point of view as a standards person.
> 
> You are right the Ethnologue is not appropriate as a standard.  But that
> does not make it useless.

I am not a "standards person", and I think you have my stand mixed up.
I am in favor of registering the tags in the Ethnologue (except for
those which are *semantically* the same as existing 639-2 languages)
in the RFC 1766 registry in the form i-sil-xxx.

> Where I see using the SIL is as an extension of the ISO standard.

RFC 1766 exists to allow flexible extension to the ISO standard.

> If there
> is no ISO code then use the SIL code.

There are already collisions, so simply using one or the other
gets you into trouble.  For example, ARC is the SIL code for Archi,
a Northern Caucasian language spoken in the Russian Federation.
But you cannot use it in an ISO 639 field, because ARC in 639
represents Aramaic, which is differentiated by SIL into 16 languages.

But under my proposal, Archi is i-sil-arc, and Aramaic is arc.  If
you want to specify Assyrian Neo-Aramaic specifically, you can use
i-sil-aii.
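
To make the namespace point concrete, here is a small Python sketch of how
a tag could be resolved once SIL codes live under an "i-sil-" prefix. The
two tables hold only the three entries named above, and the names ISO_639,
SIL and resolve are hypothetical; a real implementation would load full
code lists.

```python
# Illustrative tables only: the entries are the ones mentioned in the message.
ISO_639 = {"arc": "Aramaic"}
SIL = {"arc": "Archi", "aii": "Assyrian Neo-Aramaic"}

SIL_PREFIX = "i-sil-"


def resolve(tag):
    """Return (namespace, language name) for a tag; raise KeyError if unknown."""
    tag = tag.lower()
    if tag.startswith(SIL_PREFIX):
        # Prefixed tags are looked up in the Ethnologue table only.
        return ("sil", SIL[tag[len(SIL_PREFIX):]])
    # Bare codes always mean ISO 639, so "arc" can never mean Archi.
    return ("iso639", ISO_639[tag])
```

Because the prefix selects the table, the same three letters can be used
in both systems without ambiguity.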

> As far as research goes, you have to
> do your own to be able to prepare the locale.  This will eliminate 90% of
> the flaky SIL languages.  There either will be no demand or the research
> will uncover which of several encodings to use.  Yes this is not a standard
> but it is a way to implement until a standard can be developed.

Locales are by no means the only uses of language tagging.  My primary
interest is in labeling the languages used in multimedia objects, including
text, audio content, or both.

> It is
> easier to deal with the SIL codes than the i-x codes.

What i-x codes?  Currently there are only a few.

> Besides I can not
> take any standard that implements i-klingon as a human language too
> seriously.

Why not?  Human beings speak it (some more fluently than others), and
write texts in it.  Just follow the links from www.kli.org.  It is not
anybody's native language, but neither is Ladino (i-sil-spj).

> On the other hand if you consider that language is part of cultural
> expression and that different languages express ideas specific to the
> culture then the SIL is incomplete.

The notion of a complete list of languages is a phantasm.

> For example, Boont is an English slang
> language developed around Boonville, California.  This is not listed but
> then you have to remember that the list is explicitly funded for the purpose
> of translating bibles and I doubt that there is any interest in languages
> that are not primary languages.  People who speak Boont also speak English.

There are many languages listed in the Ethnologue that aren't native
languages.  As for the short ling, the kimmies at SIL were plenty bahl to
omeert it.

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter





RE: [OT] Re: the Ethnologue

2000-09-17 Thread Carl W. Brown

>John Cowan wrote:

>I see the problem: the same language (with the same code) may be preferentially
>known by one name in one country and another name in another.  Because
>the Ethnologue names languages by country, conflicts like this can appear.
>The entry on "Chadian Spoken Arabic" (in Chad) lists "Shuwa Arabic" as a
>synonym; the name "Shuwa Arabic" is the primary name in Niger, Nigeria,
>and Cameroon.

> 

>It seems clear from the detailed information that in all 14 cases,
>there is only one language, known by different names in different
>countries.  Expecting the Ethnologue to solve this problem by fiat,
>or even to openly prefer one name over another when nationalist sympathies
>decree otherwise, is IMHO not reasonable.

>John Cowan   [EMAIL PROTECTED]
>One art/there is/no less/no more/All things/to do/with sparks/galore
>   --Douglas Hofstadter

I can understand your point of view as a standards person.

You are right the Ethnologue is not appropriate as a standard.  But that
does not make it useless.  Your quote from Doug points this out.  45 years
ago he & I were into exact things such as number theory and physics.
Topology was as far as we would venture in soft sciences.  Then in 5th grade
I left for Brazil.  We just met up after 45 years.  The idea that language
is both a standard and not a standard thrills both of us because it makes
this field far more complex and intriguing than physics.

Where I see using the SIL is as an extension of the ISO standard.  If there
is no ISO code then use the SIL code.  As far as research goes, you have to
do your own to be able to prepare the locale.  This will eliminate 90% of
the flaky SIL languages.  There either will be no demand or the research
will uncover which of several encodings to use.  Yes this is not a standard
but it is a way to implement until a standard can be developed.  It is
easier to deal with the SIL codes than the i-x codes.  Besides I can not
take any standard that implements i-klingon as a human language too
seriously.

On the other hand if you consider that language is part of cultural
expression and that different languages express ideas specific to the
culture then the SIL is incomplete.  For example, Boont is an English slang
language developed around Boonville, California.  This is not listed but
then you have to remember that the list is explicitly funded for the purpose
of translating bibles and I doubt that there is any interest in languages
that are not primary languages.  People who speak Boont also speak English.

Standards are extremely important and they should be solid. They must work
for you but in this business you can not be slaves to them.  The
implementations should be based on standards but be flexible to accommodate
exceptions when needed.  If I use the SIL codes I stand a good chance that
the codes may be the same codes that ISO may adopt and I can avoid a later
conversion.  These codes fit into the 639-2 tables with no program
changes.  For me it is a win-win situation.  I just need to keep track of
them and check every time the ISO standard is updated to ensure that new ISO
codes are not using the SIL codes.  If so, then I will have to migrate the
SIL codes.
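
The check described here, run each time the ISO list is updated, could be
sketched in Python roughly as follows. The function name collisions and the
idea of passing both code lists in as plain sequences are assumptions for
illustration only.

```python
def collisions(local_sil_codes, iso_codes):
    """Return the locally used SIL codes that the ISO list also assigns.

    Comparison is case-insensitive, since SIL codes are usually written in
    upper case and ISO 639-2 codes in lower case.
    """
    iso = {code.lower() for code in iso_codes}
    local = {code.lower() for code in local_sil_codes}
    # Any code in both sets must be migrated before the new ISO list is used.
    return sorted(local & iso)
```

For example, collisions(["ARC", "AII"], ["arc", "eng"]) reports "arc", the
Archi/Aramaic clash mentioned elsewhere in this thread.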

In practice few sites will implement languages that are not covered by the
639 list.  So these exceptions should be very few and should be manageable.

Carl






Re: [OT] Re: the Ethnologue

2000-09-17 Thread Michael (michka) Kaplan

Well, to cover THAT level of variation, there is only the Ethnologue that I
have ever seen. But the specific question was about language differences
that ISO *can* cover.

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

- Original Message -
From: "Carl W. Brown" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, September 17, 2000 3:41 PM
Subject: RE: [OT] Re: the Ethnologue


> > Michka wrote :
>
> >Most seem to be okay with the addition of the country/region tag from
> >ISO-3166 for determining the difference between languages spoken in several
> >places -- this is usually what is done for English, Arabic, Portuguese,
> >French, and Chinese, as well.
>
> I don't see how one can use ISO-3166 regions.  The region tags don't follow
> linguistic breaks even if they are roughly geographic.  For example, if I
> wanted to describe the Northeastern dialects of Brazilian Portuguese that
> have traits such as pluralizing the article but not the noun and a heavy
> native American Indian influence, I would be hard pressed to find an
> ISO-3166-2 designation to use.
>
> The region might be described as a set of states where the differences in
> culture between the narrow coastal strip and high desert interior are greater
> than the differences between states.
>
> Carl
>
>




RE: [OT] Re: the Ethnologue

2000-09-17 Thread Carl W. Brown

> Michka wrote :

>Most seem to be okay with the addition of the country/region tag from
>ISO-3166 for determining the difference between languages spoken in several
>places -- this is usually what is done for English, Arabic, Portuguese,
>French, and Chinese, as well.

I don't see how one can use ISO-3166 regions.  The region tags don't follow
linguistic breaks even if they are roughly geographic.  For example, if I
wanted to describe the Northeastern dialects of Brazilian Portuguese that
have traits such as pluralizing the article but not the noun and a heavy
native American Indian influence, I would be hard pressed to find an
ISO-3166-2 designation to use.

The region might be described as a set of states where the differences in
culture between the narrow coastal strip and high desert interior are greater
than the differences between states.

Carl




Re: [OT] Re: the Ethnologue

2000-09-17 Thread Michael (michka) Kaplan

Most seem to be okay with the addition of the country/region tag from
ISO-3166 for determining the difference between languages spoken in several
places -- this is usually what is done for English, Arabic, Portuguese,
French, and Chinese, as well.

Under Windows, they just tack on a new sublanguage to create a new LCID...
Spanish, English, and Arabic seem to be duking it out for the largest number
of ones accepted from release to release -- although they are missing a lot
of languages and dialects, on purpose, as well.

I only have a few friends in Spain -- none of them were offended at those
who would refer to Spanish as español, but none of them were terribly
pleased with referring to Mexican Spanish as Castilian, either. I think it
may be similar to the French vs. Canadian French issue, just with less
emotion behind it.

michka

a new book on internationalization in VB at
http://www.i18nWithVB.com/

- Original Message -
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, September 17, 2000 1:19 PM
Subject: Re: [OT] Re: the Ethnologue


> Michael Kaplan <[EMAIL PROTECTED]> wrote:
>
> >> Spaniards generally refer to their national language as "castellano,"
> >> not "español,"
> >
> > FWIW, I do not know of any Spaniards who object to "español" for the
> > generic language spoken by everyone around the world -- Castilian
> > they reserve for their own (pure) Spanish
>
> Well, perhaps this is another, unintended example of a problem with
> incorporating the Ethnologue linguistic distinctions into other
> standards without serious review.  If Spaniards consider their language
> sufficiently different from the Spanish spoken by Latin Americans,
> should there be separate codes for the two, or not?  What about similar
> concerns with French vs. Canadian French, American vs. British English,
> etc.?  How does this map intelligently to the existing (like it or not)
> ISO 639 standard?  Standards intended for widespread use should address
> issues like these explicitly.
>
> -Doug Ewell
>  Fullerton, California
>




Re: [OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell

Michael Kaplan <[EMAIL PROTECTED]> wrote:

>> Spaniards generally refer to their national language as "castellano,"
>> not "español," 
>
> FWIW, I do not know of any Spaniards who object to "español" for the
> generic language spoken by everyone around the world -- Castilian
> they reserve for their own (pure) Spanish

Well, perhaps this is another, unintended example of a problem with
incorporating the Ethnologue linguistic distinctions into other
standards without serious review.  If Spaniards consider their language
sufficiently different from the Spanish spoken by Latin Americans,
should there be separate codes for the two, or not?  What about similar
concerns with French vs. Canadian French, American vs. British English,
etc.?  How does this map intelligently to the existing (like it or not)
ISO 639 standard?  Standards intended for widespread use should address
issues like these explicitly.

-Doug Ewell
 Fullerton, California



Re: [OT] Re: the Ethnologue

2000-09-17 Thread Doug Ewell

John Cowan <[EMAIL PROTECTED]> wrote:

> Doug wants the Ethnologue to give each of its languages (uniquely
> tagged) a single unique worldwide authoritative name.  That's not
> reasonable in all cases, though it is in 99.5%.

What names am I supposed to associate with codes like SHU, MKJ, and
SRC in my (possibly hypothetical) application that deals with language
tags?  Such associations are normally expected to be one-to-one.

If Ethnologue codes are going to be regarded as a standard outside the
confines of SIL, each code needs to be associated with a single,
normative name.  Unicode understands this concept, which is why you
have things like U+002E FULL STOP and an explanatory note that this
character is optionally called "period."  Here in the U.S. we would
never call '.' a full stop, always a period (or dot or decimal point),
but in the U.K. the opposite is true, and one normative name had to be
chosen over the other(s).

Spaniards generally refer to their national language as "castellano,"
not "español," but at some point in the ISO 639 process, a decision had
to be made that one name would be preferred over the other.  SIL
evidently felt that way too, as "Castilian" is just one of the many
alternate names given for the primary name "Spanish."  But for the code
GSW, the Ethnologue staff created separate entries for "Allemanisch,"
"Alsatian," and "Schwyzerdütsch," which *may* appease nationalistic
preferences but definitely *does* result in inconsistency and
confusion.

An inconsistent standard can be worse than no standard at all.

-Doug Ewell
 Fullerton, California



Re: [OT] Re: the Ethnologue

2000-09-16 Thread John Cowan


> From: "John Cowan" <[EMAIL PROTECTED]>
> > It seems clear from the detailed information that in all 14 cases,
> > there is only one language, known by different names in different
> > countries.  Expecting the Ethnologue to solve this problem by fiat,
> > or even to openly prefer one name over another when nationalist sympathies
> > decree otherwise, is IMHO not reasonable.
> 
> John, a solution must be achieved, nevertheless. If a large part or even all
> of the Ethnologue is to be used as a part of any of these standards, then it
> must be done.
> 
> In a way, this is one of the only advantages to not giving locale tags any
> significance -- by assigning them numbers, you really are trying to stay out
> of the business of people who have very different ideas about names and
> such. In a world where countries can go to war over lesser matters than
> this, I prefer the numbers to having yet another tightrope to walk. :-(

It does not matter in this case whether the tags are meaningful or not.
Doug wants the Ethnologue to give each of its languages (uniquely tagged)
a single unique worldwide authoritative name.  That's not reasonable
in all cases, though it is in 99.5%.

The issue is not about unique language <-> tag mapping, which we already
have.  It's about a unique language <-> name mapping.

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter

On Sat, 16 Sep 2000, Michael (michka) Kaplan wrote:




Re: [OT] Re: the Ethnologue

2000-09-16 Thread Michael (michka) Kaplan

From: "John Cowan" <[EMAIL PROTECTED]>
> On Sat, 16 Sep 2000, Doug Ewell wrote:
> > SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
> > means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
> > 'sr' to Ethnologue 'SRC'.  This is likely to cause much more widespread
> > trouble than the Hopi example mentioned earlier.
>
> By Ethnologue standards of mutual intelligibility, there is only one
> language here.

Well, this is one that can actually get some of the speakers (or their
governments) pretty upset, though. And both ISO639-x and rfc1766 have to
care about such things.

> It seems clear from the detailed information that in all 14 cases,
> there is only one language, known by different names in different
> countries.  Expecting the Ethnologue to solve this problem by fiat,
> or even to openly prefer one name over another when nationalist sympathies
> decree otherwise, is IMHO not reasonable.

John, a solution must be achieved, nevertheless. If a large part or even all
of the Ethnologue is to be used as a part of any of these standards, then it
must be done.

In a way, this is one of the only advantages to not giving locale tags any
significance -- by assigning them numbers, you really are trying to stay out
of the business of people who have very different ideas about names and
such. In a world where countries can go to war over lesser matters than
this, I prefer the numbers to having yet another tightrope to walk. :-(

michka

Michael Kaplan
a new book on internationalization in VB at
http://www.i18nWithVB.com/





Re: [OT] Re: the Ethnologue

2000-09-16 Thread John Cowan

On Sat, 16 Sep 2000, Doug Ewell wrote:

> But it gets worse.  When I stripped out the alternate-names field and
> again checked for duplicated codes, I found 14 (AVL AYL CAG CTO FUV GAX
> GSC GSW JUP MHI MHM MKJ SHU SRC).  Some of these duplicates differ only
> in spelling (CAG 'Chulupi' vs. 'Chulupí') but other differences are a
> lot more troubling.  For example, SHU is both 'Arabic, Chadian Spoken'
> and 'Arabic, Shuwa.'  As a non-expert in Arabic, how do I know these
> two names describe the same dialect of Arabic?  (These are certainly
> dialects, not discrete languages.)

I see the problem: the same language (with the same code) may be preferentially
known by one name in one country and another name in another.  Because
the Ethnologue names languages by country, conflicts like this can appear.
The entry on "Chadian Spoken Arabic" (in Chad) lists "Shuwa Arabic" as a
synonym; the name "Shuwa Arabic" is the primary name in Niger, Nigeria,
and Cameroon.
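
A minimal sketch of how such duplicates arise, assuming the data can be
read as (code, country, primary name) rows. That row layout and the
helper names are hypothetical; the values are only the ones quoted in
this thread. Grouping by code shows one code carrying several primary
names, each tied to the countries that prefer it.

```python
from collections import defaultdict

# Hypothetical (code, country, primary name) rows. The layout is an
# assumption; the values are the ones mentioned in this thread.
ROWS = [
    ("SHU", "Chad", "Arabic, Chadian Spoken"),
    ("SHU", "Niger", "Arabic, Shuwa"),
    ("SHU", "Nigeria", "Arabic, Shuwa"),
    ("SHU", "Cameroon", "Arabic, Shuwa"),
    ("MKJ", "Macedonia", "Macedonian"),
    ("MKJ", "Bulgaria", "Macedonian"),
    ("MKJ", "Albania", "Macedonian"),
    ("MKJ", "Greece", "Slavic"),
]


def names_by_code(rows):
    """Map each code to {primary name: [countries using that name]}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for code, country, name in rows:
        grouped[code][name].append(country)
    return {code: dict(names) for code, names in grouped.items()}


def multi_named(rows):
    """Return, sorted, the codes that carry more than one primary name."""
    grouped = names_by_code(rows)
    return sorted(code for code, names in grouped.items() if len(names) > 1)


if __name__ == "__main__":
    for code in multi_named(ROWS):
        print(code, names_by_code(ROWS)[code])
```

Read this way, a "duplicated" code is one language with per-country
names, not two languages sharing a tag.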

> MKJ is the Ethnologue code for both 'Macedonian' and 'Slavic'.
> Absolutely *everyone* knows there is no one 'Slavic' language; the name
> refers to an entire language family.  This is much more imprecise than
> any of the despised 'Other' codes in ISO 639.

Again, "Macedonian" is the preferred name in Macedonia, Bulgaria, and
Albania, but "Slavic" is preferred in Greece.

> SRC is the code for 'Bosnian', 'Croatian', and 'Serbo-Croatian', which
> means that there is a many-to-one mapping from ISO 639-1 'bs', 'hr',
> 'sr' to Ethnologue 'SRC'.  This is likely to cause much more widespread
> trouble than the Hopi example mentioned earlier.

By Ethnologue standards of mutual intelligibility, there is only one
language here.
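
The many-to-one mapping Doug describes can be sketched as follows,
assuming only the three ISO 639-1 tags and the single Ethnologue code
quoted above. The table and function names are illustrative, not an
official mapping. Inverting the table shows why a round trip from the
Ethnologue code back to ISO 639-1 is ambiguous.

```python
# Assumed mapping, taken from the quoted description; not an official table.
ISO_TO_ETHNOLOGUE = {"bs": "SRC", "hr": "SRC", "sr": "SRC"}


def invert(mapping):
    """Invert tag -> code into code -> sorted list of tags."""
    inverse = {}
    for iso, eth in mapping.items():
        inverse.setdefault(eth, []).append(iso)
    return {eth: sorted(isos) for eth, isos in inverse.items()}


def ambiguous(mapping):
    """Return the codes that more than one source tag maps onto."""
    return {eth: isos for eth, isos in invert(mapping).items() if len(isos) > 1}


if __name__ == "__main__":
    print(ambiguous(ISO_TO_ETHNOLOGUE))
```

Going from ISO 639-1 to the Ethnologue code loses nothing; the reverse
direction needs extra information, such as a country, to pick a tag.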

> Certainly more codes need to be added to ISO 639, and the Maintenance
> Agency needs to be sure not to present an image of unresponsiveness
> (if in fact they have been guilty of that in the past).  However, they
> have their own, existing guidelines for the level at which languages
> should be encoded (one written vs. 60 spoken variants) and this must
> be respected.

Precisely.  Unwritten languages, or languages with only a few written
works, or languages whose written form appears only on bamboo, don't
make it into 639-2, which is (like it or not) in practice a standard
for bibliographic use.

In addition, the notion of mapping spoken form A to written form B
on the basis that the speakers of A write B when they need to write
entails the notion that Dongxiang [SCE], a language of the Mongolian family,
is a "dialect" of Chinese in the same sense that Wu Chinese [WUU] is.

> And the duplicated codes in the Ethnologue list must be
> edited down to one code each, or the list will not earn the respect for
> accuracy that it perhaps deserves.

It seems clear from the detailed information that in all 14 cases,
there is only one language, known by different names in different
countries.  Expecting the Ethnologue to solve this problem by fiat,
or even to openly prefer one name over another when nationalist sympathies
decree otherwise, is IMHO not reasonable.

-- 
John Cowan   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
--Douglas Hofstadter