Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread James Kass
Martin J. Dürst wrote,

> What is right for Deseret has to be decided by
> and for Deseret users, rather than by script
> historians.

The Universal Character Set is used by everyone, including script
historians.  While modern day deployment of the script is determined
by its users, the proper encoding of the script should be determined by
character encoders based upon expert input from all interested
parties.

Best regards,

James Kass



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Otto Stolz

> Hello Michael, others,
>
> On 2017/03/23 09:03, Michael Everson wrote:
>
>> It’s the same diphthong (a sound) written with different
>> letters.

On 23.03.2017 at 06:54, Martin J. Dürst wrote:

> I think this may well be the *historically* correct analysis. And that
> may have some influence on how to encode this, but it shouldn't be
> dominant.
>
> What's most important is (past and) *current use*.


Same issue as with German sharp S: The blackletter »ß« derives from an
ſ-z ligature (thence its German name »Eszett«), whilst the Roman type
»ß« derives from an ſ-s ligature. Still, we encode both variants as
identical letters. I’ve got a print from 1739 with legends in both
German (blackletter) and French (Roman italics), comprising both types
of ligatures in one single document.

Best wishes,
  Otto



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Richard Wordingham
On Thu, 23 Mar 2017 11:23:27 +0100
Otto Stolz wrote:

> Same issue as with German sharp S: The blackletter »ß« derives from an
> ſ-z ligature (thence its German name »Eszett«), whilst the Roman type
> »ß« derives from an ſ-s ligature. Still, we encode both variants as
> identical letters. I’ve got a print from 1739 with legends in both
> German (blackletter) and French (Roman italics), comprising both types
> of ligatures in one single document.

There's another, lesser German analogy.  If I understand correctly, in
some styles the diaeresis and umlaut marks may be distinguished
visually.  While it is permissible to use CGJ to mark the difference,
the TUS claims (TUS 9.0 p833, in Section 23.2) that CGJ does not affect
rendering, except for the direct effect of blocking canonical
reordering.  (This does appear to be in contrast to its seemingly
archaic effect in inhibiting line-breaking.)
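
The reordering-blocking behaviour mentioned above can be checked with a
minimal Python sketch. It assumes only the standard unicodedata module;
the particular marks (diaeresis and dot below) are chosen for
illustration and are not taken from TUS.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER (combining class 0)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (combining class 230)
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW (combining class 220)

# Without CGJ, canonical ordering sorts adjacent marks by combining class.
plain = "a" + DIAERESIS + DOT_BELOW
reordered = unicodedata.normalize("NFD", plain)

# With CGJ between them, each mark sits in its own run and keeps its place.
blocked = "a" + DIAERESIS + CGJ + DOT_BELOW
kept = unicodedata.normalize("NFD", blocked)

# CGJ also stops u + diaeresis from composing to U+00FC under NFC, so a
# CGJ-marked sequence stays distinct from the plain one after normalization.
marked = unicodedata.normalize("NFC", "u" + CGJ + DIAERESIS)
unmarked = unicodedata.normalize("NFC", "u" + DIAERESIS)

print([f"U+{ord(c):04X}" for c in reordered])
print([f"U+{ord(c):04X}" for c in kept])
print([f"U+{ord(c):04X}" for c in marked], [f"U+{ord(c):04X}" for c in unmarked])
```

This shows only that the two spellings survive normalization as distinct
strings; whether a renderer draws them differently is a separate matter.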

However, combining marks are, by policy, unified more readily than
letters.

Richard.



Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Philippe Verdy
2017-03-23 6:54 GMT+01:00 Martin J. Dürst:

> Hello Michael, others,
>
> On 2017/03/23 09:03, Michael Everson wrote:
>
>> On 22 Mar 2017, at 21:39, David Starner wrote:
>>
>>> There's the same characters here, written in different ways.
>>
>> No, it’s not. It’s the same diphthong (a sound) written with different
>> letters.
>
> The closest to the current case that I was able to find was the German ß.
> It has roots in both an ss and an sz (to be precise, an ſs and an ſz)
> ligature (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts,
> its right part looks more like an s, and in other fonts more like a z (and
> in lower case, more often like an s, but in upper case, much more like a
> (cursive) Z). Nevertheless, there is only one character (or two if you
> count upper case) encoded, because anything else would be highly confusing
> to virtually all users.
>

This is a good case for encoding explicit variants, including for the two
German ß, to distinguish letter forms in historic (medieval?) texts where
ſs and ſz were more distinguished. This does not require disunification,
and fonts that can have both forms can choose the correct glyph to use for
each variant, and take a default form for the unified character depending
on the contextual language (if it is detected) or based on the font style
itself (if it was initially designed for a specific language, notably in
medieval styles).


> What is right for Deseret has to be decided by and for Deseret users,
> rather than by script historians.
>

In historic texts it is not clear which letter form is better than the
other, and historic Deseret was basically for a single language (but there
may have been regional variants preferring one form over the other). I
think that the distinction is in fact more recent, where some people
will want to distinguish them for new uses that rely on the distinction.
Here also a variant encoding would solve these special cases, but we should
not disunify the character (and in fact there are not many fonts, except for
fancy usages such as trying to mimic the handwritten styles of specific
authors and how they draw these shapes; I have not, however, seen any
conclusive case of distinction in typeset texts).

In fact we are in a situation similar to the case of shapes for decimal
digits like 4 (open or closed), 7 (with or without an overstriking bar), 0
(with an overstriking slash or dot, or none), or 3 (with an angular or round
top part), or letters like g (with a curled leg drawn counterclockwise, or
just a bottom foot from right to left; here a distinctive shape was encoded
for the IPA symbol).

>
> Regards,   Martin.
>


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Michael Everson

> On 23 Mar 2017, at 05:54, Martin J. Dürst wrote:
> 
> Hello Michael, others,
> 
> [Fixed script name in subject.]
> 
> On 2017/03/23 09:03, Michael Everson wrote:
>> On 22 Mar 2017, at 21:39, David Starner wrote:
> 
>>> There's the same characters here, written in different ways.
>> 
>> No, it’s not. It’s the same diphthong (a sound) written with different
>> letters.
> 
> I think this may well be the *historically* correct analysis. And that may 
> have some influence on how to encode this, but it shouldn't be dominant.

Well, Martin, maybe you’re comfortable with shifting goalposts, but we have 
used historically correct analysis to identify characters in the past and to 
continue with this precedent is consistent with good practice. 

> What's most important is (past and) *current use*. If the distinction is an 
> orthographic one (e.g. different words being written with different shapes), 
> then that's definitely a good indication for splitting.

It *is* an orthographic one. For one thing, the 1859 glyphs look NOTHING LIKE 
the 1855 glyphs. 


> On the other hand, if fonts (before/outside Unicode) only include one variant 
> at the time, if people read over the variant without much ado, if people 
> would be surprised to find both corresponding variants in one and the same 
> text (absent font variations), if there are examples where e.g. the variant 
> is adjusted in quotes from texts that used the 'old' variant inside a text 
> with the 'new' variants, and so on, then all these would be good indications 
> that this is, for actual usage purposes, just a font difference, and should 
> therefore best be handled as such.

Um, yeah. Why have Unicode at all? I mean people in Georgia were happy with 
ASCII-based font hacks. Lots of people are still using them. Sure, people put 
up with the unification of Coptic and Greek. 

Just font differences. Yeah. 

> The closest to the current case that I was able to find was the German ß. It
> has roots in both an ss and an sz (to be precise, an ſs and an ſz) ligature 
> (see https://en.wikipedia.org/wiki/ß). And indeed in some fonts, its right 
> part looks more like an s, and in other fonts more like a z (and in lower 
> case, more often like an s, but in upper case, much more like a (cursive) Z). 
> Nevertheless, there is only one character (or two if you count upper case) 
> encoded, because anything else would be highly confusing to virtually all 
> users.

The situation of the Deseret diphthong letters isn’t anything like German ß. 
Yes, you can analyse it as something like ſs and ſȥ, but THOSE LOOK VERY NEARLY 
ALIKE.

Ignoring the stroke of SHORT I which is the same for all the Deseret letters 
being discussed, we have EW represented by 𐐅 and 𐐋 (which look nothing alike) 
and OI represented by 𐐉 and 𐐃 (which look nothing alike).
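
For readers without a Deseret font, the component letters named above can
be listed by code point. This is a minimal Python sketch using the
standard unicodedata module; it assumes the glyphs shown are the capital
letters of the Deseret block, and the dictionary layout is purely
illustrative.

```python
import unicodedata

# Component letters as given above: (1855 form, 1859 form) for each diphthong.
# Assumption: these are the capital forms from the Deseret block.
components = {
    "EW": ("\U00010405", "\U0001040B"),
    "OI": ("\U00010409", "\U00010403"),
}
SHORT_I = "\U00010406"  # the stroke shared by all four ligated letters


def describe(ch):
    """Return the code point and Unicode character name of one letter."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for diphthong, (form_1855, form_1859) in components.items():
    print(diphthong, "1855:", describe(form_1855), "| 1859:", describe(form_1859))
print("shared:", describe(SHORT_I))
```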

A unification of these as “glyph variants” is perverse and not consistent with 
the way we have encoded things in the past.

> What is right for Deseret has to be decided by and for Deseret users, rather 
> than by script historians.

Odd. That view doesn’t seem to be applicable to CJK unification.

Michael


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread Michael Everson
On 23 Mar 2017, at 06:28, David Starner wrote:

>>> Does "Яussia" require a new Latin letter because the way R was written has
>>> a different origin than the normal R?
>>
>> But it doesn’t. It’s the Latin letter R turned backwards by a designer for a
>> logo. We wouldn’t encode that, because it’s a logo.
>
> What logo?

Oh, sorry. “Toys Я Us” which is what I saw when I saw your “Яussia”.

> I honestly don't know what logo you're talking about, but a quick Google 
> search confirms it's used outside of a logo. I was thinking of 
> http://www.sjgames.com/gurps/books/Russia/img/cover_lg.jpg which actually 
> doesn't use the reversed R, but uses other Cyrillic characters. 

Decorative display type and font play on book covers is a very different thing 
from the development of the Deseret alphabet we are discussing here. 

>> We don’t encode diphthongs. We encode the elements of writing systems. The 
>> “idea” here is represented by one ligature of 𐐆 + 𐐅 (1855 EW), one ligature 
>> of 𐐆 + 𐐋 (1859 EW), one ligature of 𐐉 + 𐐆 (1855 OI), and one ligature of 𐐃 + 
>> 𐐆 (1859 OI).
> 
> If they're ligatures, they should be encoded as ligatures; if they're 
> indivisible characters, then their glyph forms are of less interest.

We don’t encode ligatures. We encode letters which are historically derived 
from ligation. That’s what the existing EW and OI are, and that’s what the 1859 
revised letters were.

>> Those ligatures are not glyph variants of one another. You might as well say 
>> that Æ and Œ are glyph variants of one another.
> 
> Æ and Œ have contrasting use; they're used in the same text in distinct ways.

That happens to be the case, but the analogy has to do with the origin of the 
ligatures. 

> Note that n and v̆ are considered glyph variants of each other, because v̆ is
> used in Sütterlin in exactly the places that n is used in typewritten
> versions of the text.

It’s n and ǔ in Sütterlin, not n and v̆. 

> æ is not œ even when they are printed in fonts that make it nearly impossible 
> to tell them apart. It has nothing to do with the glyphs or how those glyphs 
> were created, it's because they're used in different ways. 

It was an analogy about the structural development of the ligated letters. 

> The example of Sütterlin strikes me as quite relevant here; characters get
> all sorts of weird shapes in handwriting. Sometimes they end up immortalized 
> in printing, and then they usually get encoded. Usually not.

Again: The source of 1855 EW and OI uses *different* letters than the 1859 EW 
and OI do. This wasn’t accidental. It’s not hard to puzzle out or to see. This 
isn’t random or even systematic natural development of handwriting styles. It 
was a principled revision done on the basis of phonetic analysis. English 
diphthongs EW and OI were first represented by ligatures representing [ɪuː] and 
[ɒɪ], and then later by ligatures representing [ɪʊ] and [ɔːɪ]. 

Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
history of the Deseret alphabet would be impoverished by treating the 1859 
letters as identical to the 1855 letters. 

Michael Everson


Re: Standardized variation sequences for the Deseret alphabet?

2017-03-23 Thread David Starner
On Thu, Mar 23, 2017 at 6:54 AM Michael Everson wrote:

> Again: The source of 1855 EW and OI uses *different* letters than the 1859
> EW and OI do. This wasn’t accidental. It’s not hard to puzzle out or to
> see. This isn’t random or even systematic natural development of
> handwriting styles. It was a principled revision done on the basis of
> phonetic analysis. English diphthongs EW and OI were first represented by
> ligatures representing [ɪuː] and [ɒɪ], and then later by ligatures
> representing [ɪʊ] and [ɔːɪ].
>

Sütterlin was created by Ludwig Sütterlin in 1915. There's lots of
principled revision going on all the time in the world's scripts that
doesn't get recorded by Unicode, and this goes double for young constructed
scripts, where people are playing around with them.


> Indeed I would say to John Jenkins and Ken Beesley that the richness of
> the history of the Deseret alphabet would be impoverished by treating the
> 1859 letters as identical to the 1855 letters.
>

And yet the richness of the history of the Latin alphabet is not
impoverished by treating
https://commons.wikimedia.org/wiki/File:I_littera_in_manuscripto.jpg (a
monocase Latin cursive) as identical to part of the modern Latin-script
alphabet, which besides casing, has split the i/j and u/v on the basis of
phonetic analysis?