subject:"Standaridized variation sequences for the Desert alphabet?"


On 2017/03/29 01:47, Philippe Verdy wrote:

2017-03-28 18:30 GMT+02:00 Asmus Freytag :


On 3/28/2017 6:56 AM, Michael Everson wrote:


An æ ligature is a ligature of a and of e. It is not some sort of pretzel.


We need a pretzel emoji.


We need a broken tooth emoji too !


I prefer soft pretzels!

Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Asmus Freytag (c)


On 3/28/2017 10:30 AM, Peter Edberg wrote:


On Mar 28, 2017, at 9:30 AM, Asmus Freytag wrote:


On 3/28/2017 6:56 AM, Michael Everson wrote:
An æ ligature is a ligature of a and of e. It is not some sort of 
pretzel.

We need a pretzel emoji.


Already in Unicode 10 / emoji 5.0:
http://www.unicode.org/emoji/charts/emoji-released.html#1f968


No, like the ae, so a half eaten one. :)



A./

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Peter Edberg


> On Mar 28, 2017, at 9:30 AM, Asmus Freytag  wrote:
> 
> On 3/28/2017 6:56 AM, Michael Everson wrote:
>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
> We need a pretzel emoji.

Already in Unicode 10 / emoji 5.0:
http://www.unicode.org/emoji/charts/emoji-released.html#1f968 


> A./
> 
>

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Philippe Verdy

2017-03-28 18:30 GMT+02:00 Asmus Freytag :

> On 3/28/2017 6:56 AM, Michael Everson wrote:
>
>> An æ ligature is a ligature of a and of e. It is not some sort of pretzel.
>>
> We need a pretzel emoji.

We need a broken tooth emoji too !

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Asmus Freytag


On 3/28/2017 6:56 AM, Michael Everson wrote:

An æ ligature is a ligature of a and of e. It is not some sort of pretzel.

We need a pretzel emoji.

A./

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Michael Everson

On 28 Mar 2017, at 11:39, Martin J. Dürst  wrote:

>> And what would the value of this be? Why should I (who have been doing this 
>> for two decades) not be able to use the word “character” when I believe it 
>> correct? Sometimes you people who have been here for a long time behave as 
>> though we had no precedent, as though every time a character were proposed 
>> for encoding it’s as though nothing had ever been encoded before.
> 
> I didn't say that you have to change words. I just said that I could agree to 
> a slightly differently worded phrase.

An æ ligature is a ligature of a and of e. It is not some sort of pretzel. What 
Deseret has is this:

10426 DESERET CAPITAL LETTER LONG OO WITH STROKE
* officially named “ew” in the code chart
* used for ew in earlier texts
10427 DESERET CAPITAL LETTER SHORT AH WITH STROKE
* officially named “oi” in the code chart
* used for oi in earlier texts
1 DESERET CAPITAL LETTER LONG AH WITH STROKE
* used for oi in later texts
1 DESERET CAPITAL LETTER SHORT OO WITH STROKE
* used for ew in later texts

Don’t go trying to tell me that LONG OO WITH STROKE and SHORT OO WITH STROKE 
are glyph variants of the same character. 

Don’t go trying to tell me that LONG AH WITH STROKE and SHORT AH WITH STROKE 
are glyph variants of the same character. 

To do so is to show no understanding of the history of writing systems at all. 
You’re smarter than that. So are Asmus and Mark and Erkki and any of the other 
sceptics who have chimed in here. 

> And as for precedent, the fact that we have encoded a lot of characters in 
> Unicode doesn't mean that we can encode more characters without checking each 
> and every single case very carefully, as we are doing in this discussion.

The UTC encodes a great many characters without checking them at all, or even 
offering documentation on them to SC2. Don’t think we haven’t observed this. 

>> The sharp s analogy wasn’t useful because whether ſs or ſz users can’t tell 
>> either and don’t care.
> 
> Sorry, but that was exactly the point of this analogy. As to "can't tell", 
> it's easy to ask somebody to look at an actual ß letter and say whether the 
> right part looks more like an s or like a z.

By “can’t tell” I mean “recognize as essentially the same letterform”. The 
streetsigns in some German cities use a very ſʒ if you look at it and know 
anything about typography. Most people probably don’t notice. They see ß and 
that’s precisely because ſs and ſʒ look very much alike. 

> On the other hand, users of Deseret may or may not ignore the difference 
> between the 1855 and 1859 shapes when they read.

The people who wrote the manuscripts are dead. Most readers and writers of 
Deseret today use the shapes that are in their fonts, which are those in the 
Unicode charts, and most texts published today don’t use the EW and OI 
ligatures at all, because that’s John Jenkins’ editorial practice. The need to 
distinguish these letters (which are distinguished because of their history as 
letterforms, not because of the diphthong) is no different from the reason we 
encoded these Ꜩ Ꜫ Ꜭ Ꜯ Ꜳ Ꜵ Ꜷ Ꜹ Ꜻ Ꜽ Ꜿ Ꝃ Ꝁ Ꝅ Ꝇ Ꝉ Ꝋ Ꝍ Ꝏ Ꝑ Ꝓ Ꝕ Ꝗ Ꝙ Ꝛ Ꝝ Ꝟ Ꝡ Ꝣ Ꝥ Ꝧ Ꝩ Ꝫ 
Ꝭ Ꝯ Ꝺ Ꝼ Ᵹ Ꝿ Ꞁ Ꞃ Ꞅ Ꞇ. Scholars required those. Manuscripts may contain them side 
by side. Or their usage may be separated by hundreds of kilometres or hundreds 
of years. There is no difference. There were pages of discussion as to WHY 
scholars needed the medievalist characters. The counter argument was “Why not 
normalize?” We had similar pages of discussion as to WHY Uralicists needed the 
great many characters we encoded for 
them. 

Why is it that you people can encode BROCCOLI on the basis of nothing but 
“people might like it” but we cannot use sound existing precedent to encode 
characters which (while similar in use to other characters) are an index of 
orthographic change in a historical script and orthography? There are plenty of 
“glyph variations” in early Deseret texts vis à vis which I’d ignore. 

This isn’t one of them. 

> Of course they will easily see different shapes, but what's important isn't 
> the shapes, it's what they associate it with. If for them, it's just two 
> shapes for one and the same 40th letter of the Deseret alphabet, then that is 
> a strong suggestion for not encoding separately, even if the shapes look 
> really different.

Martin, there is no answer to this unless you can read the minds of people who 
are dead a century or more. Therefore it is not a useful criterion, and the 
other criteria (letter origin, spelling choice) are the indices which must 
guide our understanding. The result of those criteria is that there are four 
characters here, not two. 

> No Fraktur fonts, for instance, offer a shape for U+00DF that looks like an 
> ſs. And what Antiqua fonts do, well, you get this:
>> 
>> https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg
> 
> Yes. And we are just starting to collec

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Michael Everson

On 28 Mar 2017, at 07:32, Martin J. Dürst  wrote:

> On 2017/03/28 01:03, Michael Everson wrote:
>> On 27 Mar 2017, at 16:56, John H. Jenkins  wrote:
> 
>> The 1857 St Louis punches definitely included both the 1855 EW 𐐧 and the 
>> 1859 OI <𐐃𐐆>. Ken Beesley shows them in smoke proofs in his 2004 paper on 
>> Metafont.
> 
> Good to have some actual examples. However, the example at hand does, as far 
> as I understand it, not necessarily support separate encoding.

Of course it does.

> While it mixes 1855 and 1859, it contains only one of the ligature variants 
> each.

It’s a smoke proof taken from some metal sorts. It shows that at least these 
two characters were in that font. 

> Indeed, it could be taken as support for the theory that the top and bottom 
> row ligatures in 
> https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
>  were used interchangeably, and that the 1857 St Louis punches just made one 
> particular choice of glyph selection.

"Letters to represent the same diphthong” does not mean “letters used 
interchangeably”. These letters have entirely different histories. They are not 
similar to one another. They are not “glyph variants” of one another by ANY 
measure of character identity that I have learned in two decades of this work, 
where I have examined and successfully proposed a great many characters. 
Martin, your scepticism just doesn’t convince. It seems like it’s scepticism 
for its own sake. You only have to, you know, use your EYES to see that 1855 EW 
looks NOTHING LIKE 1859 EW. Doesn’t matter if they’re used to represent the 
same sound. That doesn’t mean they’re in free variation. In fact, what it looks 
like is that early texts may use some letters, later texts may use other 
letters, and a few texts

This is a matter of SPELLING. Of the choice the author makes. It may be 
important for dating a manuscript. Representing texts as they are written is as 
important for early Deseret as it is for medieval Latin, to researchers who 
care to represent the text as it was without normalizing it to one thing or 
another. 

> What would give a strong argument would be the *concurrent* existence of 
> *corresponding* ligatures in the same font, or the concurrent (even better, 
> contrasting) use of corresponding ligatures in the same text.

Well, ain’t it just too bad that the accident of history has not left us 
complete print shops with all the fonts that were ever used for Deseret. 

The origin of these four letters as ligatures of four distinct letters with 
SHORT I is the right argument for character identity. Recognizability is also a 
strong argument. We used that when we encoded Phoenician, though some people 
argued that Semitic studies would collapse if we didn’t treat Phoenician as a 
font variant of Hebrew. 

Maybe those of you who don’t have to face the ever-moving bar of encoding 
criteria over and over again don’t remember that stuff. 

> What's interesting (weird?) is that the "1859" OI <𐐃𐐆> appears in 1857 
> punches. Time travel? Or is the label "1859" a misnomer or just a convention?

I think 1859 refers to a particular publication. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

On 2017/03/27 21:59, Michael Everson wrote:

On 27 Mar 2017, at 08:05, Martin J. Dürst wrote:

Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are
apparently really supposed to have identical glyphs, though we use an
old-fashioned style in the charts for the former. (Yes, I am of course aware
that there are other reasons for distinguishing these, but as far as glyphs go,
even our standard distinguishes them artificially.)

"apparently", maybe. Let's for a moment leave aside the radicals themselves,
which are to a large extent artificial constructs.

I do stipulate not being a CJK expert. But those are indeed different due to
their origins, however similar their shapes are.

Except for the radicals themselves, I haven't found a contrasting pair.
What I think we would need to find to influence the current
argumentation (except for general "history is important", which I think
we all agree) is a case of a character that originally existed both with
a MEAT radical and a MOON radical, but has only a single usage. Then
whether there were one or two code points would provide an analog for
the situation we have at hand.

Also note that there is a difference in meaning. The characters with
MEAT radicals mostly refer to body parts and organs. The characters with
MOON radicals are mostly time-related.

Let's look at the actual characters with these radicals (e.g. U+6709,... for MOON and
U+808A,... for MEAT), in the multi-column code charts of ISO 10646. There are some
exceptions, but in most cases, the G/J/K columns show no difference (i.e. always the ⺝
shape, with two horizontal bars), whereas the H/T/V columns show the ⺼ shape (two
downwards slanted bars) for the "MEAT" radical and the ⺝ shape for the moon
radical. So whether these radicals have identical glyphs depends on typographic
tradition/font/…
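
The code-point side of this comparison can be checked mechanically. What follows is a minimal sketch, assuming Python's standard `unicodedata` module. The helper name `describe` is only illustrative. It confirms that the two radicals are separate code points with separate names, and says nothing about how any particular font draws them.

```python
import unicodedata

# The two radicals named in the discussion above.
MEAT = "\u2EBC"
MOON = "\u2E9D"

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in (MEAT, MOON):
    print(describe(ch))

# Separate code points, whatever glyphs a given font assigns to them.
print(MEAT != MOON)
```

Whether the printed glyphs differ is a property of the font in use, which is the point made about the G/J/K and H/T/V columns.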

They are still always very similar, right?

Similarity is in the eye of the beholder (or the script).

Sometimes, a little dot or hook is irrelevant. Sometimes it's the single
difference that makes it a totally different character.

In Japan, many people may be rather unaware of the difference, whereas in
Taiwan, it may be that school children get drilled on the difference.

That’s interesting.

Not necessarily for the poor Taiwanese students, and not necessarily for
the Japanese who try to find a character in a dictionary ordered by
radical :-(.

Changing to a different font in order to change one or two glyphs is a
mechanism that we have actually rejected many times in the past. We have
encoded variant and alternate characters for many scripts.

Well, yes, rejected many times in cases where that was appropriate. But also
accepted many times, in cases that we may not even remember, because they may
not even have been made explicitly.

Do come up with examples if you have any.

I had the following in mind:

The roman/italic a/ɑ and g/ɡ distinctions (the latter code points only used to
show the distinction in plain text, which could as well be done descriptively),

Aa and Ɑɑ are used contrastively for different sounds in some languages and in the IPA. Ɡɡ
is not, to my knowledge, used contrastively with Gg (except that ɡ can only mean /ɡ/, while
orthographic g can mean /ɡ/, /dʒ/, /x/ etc.). But g vs ɡ is reasonably analogous to 𐐦 and
𐐃𐐆 being used for /juː/.

The contrastive use *in some languages or notations* (IPA) is the reason
these are separately encoded. The fact that these are not contrastively
used in most major languages is responsible for the fact that they don't
use different code points when used in these languages. It would be a
real hassle to have to change from g to ɡ when switching e.g. from Times
Roman to Times Italic.
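
The separate encoding described here is easy to see in plain text. A small sketch, assuming Python's standard `unicodedata` module (the helper `folded_by_nfkc` is a name invented for this illustration): it prints the names of the ordinary and the IPA letters and checks that compatibility normalization does not fold one into the other.

```python
import unicodedata

# Ordinary letter paired with its separately encoded counterpart.
PAIRS = [("a", "\u0251"), ("g", "\u0261")]

def folded_by_nfkc(special: str, plain: str) -> bool:
    """True if NFKC normalization turns `special` into `plain`."""
    return unicodedata.normalize("NFKC", special) == plain

for plain, special in PAIRS:
    print(f"U+{ord(plain):04X} {unicodedata.name(plain)}")
    print(f"U+{ord(special):04X} {unicodedata.name(special)}")
    print("folded by NFKC:", folded_by_nfkc(special, plain))
```

Since the pairs stay distinct under normalization, a text that needs the contrast can carry it without markup, while text in most languages simply uses the ordinary letter and leaves the shape to the font.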

In Deseret, we are still missing any contrastive usage, so that suggests
to be careful with encoding.

as well as a large number of distinctions in Han fonts, come to my mind.

It's difficult to show these distinctions, because they are NOT
separately encoded, but three-stroke and four-stroke grass radical is
the most well known.

And the same goes for the /juː/ ligatures. The word tube /tjuːb/ can be written TYŪB
𐐓𐐏𐐅𐐒 or 𐐓𐐧𐐒 or 𐐓<𐐆𐐋>𐐒. But the unligated sequences would be pronounced
differently: 𐐓𐐏𐐅𐐒 /tjuːb/ and 𐐓𐐆𐐅𐐒 /tɪuːb/ and 𐐓𐐆𐐋𐐒 /tɪʊb/.
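
To make the spellings above easier to compare, here is a hedged sketch that lists the encoded letters behind each one. It assumes Python's standard `unicodedata` module. The escape sequences are my reading of the Deseret letters quoted above, and `letter_names` is an illustrative helper, not anything from the proposal. The 1859 ligature itself has no code point, so only its component letters appear.

```python
import unicodedata

PREFIX = "DESERET CAPITAL LETTER "

def letter_names(text: str) -> list:
    """Return the short Unicode names of the Deseret letters in `text`."""
    return [unicodedata.name(ch).replace(PREFIX, "") for ch in text]

# Spellings of "tube" discussed above, written with escapes (assumed reading).
SPELLINGS = [
    ("unligated, YEE + LONG OO", "\U00010413\U0001040F\U00010405\U00010412"),
    ("1855 EW letter", "\U00010413\U00010427\U00010412"),
    ("unligated, SHORT I + LONG OO", "\U00010413\U00010406\U00010405\U00010412"),
    ("unligated, SHORT I + SHORT OO", "\U00010413\U00010406\U0001040B\U00010412"),
]

for label, text in SPELLINGS:
    print(label, "->", ", ".join(letter_names(text)))
```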

Ah, I see. So we seem to have five different ways (counting the two
ligature variants) of writing the same word, with three different
pronunciations. The important question is whether the two ligatures do
imply any difference in pronunciation (as opposed to time of writing or
author/printer preference), i.e. whether the ligated sequences 𐐓𐐧𐐒 or
𐐓<𐐆𐐋>𐐒 are pronounced differently (not by a phonologist but by an
average user).

Is the choice of variant up to the author (for which variants), or is it the
editor or printer who makes the choice (for which variants)?

In a handwritten manuscript obviously the choice is the author’s. As to
historic

Re: Standaridized variation sequences for the Desert alphabet?


On 2017/03/28 01:20, Michael Everson wrote:


Ken transcribes into modern type a letter by Shelton dated 1859, in which “boy” is written 𐐒<𐐃𐐆>, 
“few” as 𐐙<𐐆𐐋>, “truefully” [sic] as 𐐓𐐡<𐐆𐐋>𐐙𐐋𐐢𐐆, and “you” as 𐐏<𐐆𐐋>.


These are all 1859 variants, yes? That would just show that these 
variants existed (which I think nobody in this discussion has doubted), 
but not that there was contrasting use. And is that letter hand-written 
or printed?


Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?


On 2017/03/28 01:49, Michael Everson wrote:


Sorry, but typographic control of that sort is grand for typesetting, where you 
can select ranges of text and language-tag it (assuming your program accepts 
and supports all the language tags you might need (which they don’t)) and you 
can select fonts which have all the trickery baked into them (hardly any do) 
and then… can you use this in file names? In your plain-text databases? In your 
text messages?


Do you think that the 1855/1859 distinction is needed in file names? In 
text messages? It may help in some kinds of databases, but it may also 
be possible to just tag each piece of text in the database with "1855" 
or "1859" if that distinction is important (e.g. for historical 
documents). As far as I understand, we are still looking for actual 
texts that use both shapes of the same ligature concurrently.
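
The database tagging suggested above could look roughly like the following. This is only a sketch under stated assumptions: the record type `TaggedText`, its field names, and the helper `by_orthography` are hypothetical, and the sample text is the TEE, EW, BEE spelling of "tube" written with escapes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedText:
    text: str          # Deseret text using the code points already encoded
    orthography: str   # hypothetical out-of-band label: "1855" or "1859"

# The same encoded string stored twice, distinguished only by its tag.
TUBE = "\U00010413\U00010427\U00010412"
records = [
    TaggedText(TUBE, "1855"),
    TaggedText(TUBE, "1859"),
]

def by_orthography(items, label):
    """Select the records carrying a given orthography label."""
    return [r for r in items if r.orthography == label]

print(len(by_orthography(records, "1859")))
print(records[0].text == records[1].text)
```

The tag lives beside the text rather than in it, so it is lost whenever the string is copied on its own into a file name or a text message, which is the objection raised in the quoted paragraph.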


Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?


I agree with Alastair.

The list of font technology options was mostly to show that there are 
already a lot of options (some might even say too many), so font 
technology doesn't really limit our choices.


Regards,   Martin.

On 2017/03/27 23:04, Alastair Houghton wrote:

On 27 Mar 2017, at 10:14, Julian Bradfield  wrote:


I contend, therefore, that no decision about Unicode should take into
account any ephemeral considerations such as this year's electronic
font technology, and that therefore it's not even useful to mention
them.


I’d disagree with that, for two reasons:

1. Unicode has to be usable *today*; it’s no good designing for some kind of 
hyper-intelligent AI-based font technology a thousand years hence, because we 
don’t have that now.  If it isn’t usable today for any given purpose, people 
won’t use it for that, and will adopt alternative solutions (like using images 
to represent text).

2. “This year’s electronic font technology” is actually quite powerful, and is 
unlikely to be supplanted by something *less* powerful in future.  There is an 
argument about exactly how widespread support for it is (for instance, simple 
text editors are clearly lacking in support for stylistic alternates, except 
possibly on the Mac where there’s built-in support in the standard text edit 
control), but again I think it’s reasonable to expect support to grow over 
time, rather than being removed.

I don’t think it’s unreasonable, then, to point out that mechanisms like 
stylistic or contextual alternates exist, or indeed for that knowledge to 
affect a decision about whether or not a character should be encoded, *bearing 
in mind* the likely direction of travel of font and text rendering support in 
widely available operating systems.

All that said, I’d definitely defer to others on the subject of whether or not 
Unicode needs the Deseret characters being discussed here.  That’s very much 
not my field.

Kind regards,

Alastair.

--
http://alastairs-place.net

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-28 Thread Mark Davis ☕️

On Tue, Mar 28, 2017 at 12:39 PM, Martin J. Dürst 
wrote:

> No, your work wouldn't be impossible. It might be quite a bit more
> difficult, but not impossible. I have written papers about Han ideographs
> and Japanese text processing where I had to create my own fonts (8-bit,
> with mostly random assignments of characters because these were one-off
> jobs), or fake things with inline bitmap images (trying to get information
> on the final printer resolution and how many black pixels wide a stem or
> crossbar would have to be to avoid dropouts, and not being very successful).
>
> I have heard the argument that some character variant is needed because of
> research, history,... quite a few times. If a character has indeed been
> historically used in a contrasting way, this is definitely a good argument
> for encoding. But if a character just looked somewhat different a few
> (hundreds of) years ago, that doesn't make such a good argument. Otherwise,
> somebody may want to propose new codepoints for Bodoni and Helvetica,...
>

I agree with Martin.

Moreover, his last paragraphs are getting at the crux of the matter.
Unicode is not a registry of glyphs for letters, nor should try to be.
Simply because someone used a particular shape at some time to mean a
letter doesn't mean that Unicode should encode a letter for that shape. We
do not need to capture all of the shapes in
https://upload.wikimedia.org/wikipedia/commons/f/fc/Gebrochene_Schriften.png
simply because somebody is going to "publish a volume full of" those shapes.

Mark

Re: Standaridized variation sequences for the Desert alphabet?

Hello Michael, others,

On 2017/03/27 21:07, Michael Everson wrote:

On 27 Mar 2017, at 06:42, Martin J. Dürst wrote:

The characters in question have different and undisputed origins, undisputed.

If you change that to the somewhat more neutral "the shapes in question have
different and undisputed origins", then I'm with you. I actually have said as much
(in different words) in an earlier post.

And what would the value of this be? Why should I (who have been doing this for
two decades) not be able to use the word “character” when I believe it correct?
Sometimes you people who have been here for a long time behave as though we had
no precedent, as though every time a character were proposed for encoding it’s
as though nothing had ever been encoded before.

I didn't say that you have to change words. I just said that I could
agree to a slightly differently worded phrase.

And as for precedent, the fact that we have encoded a lot of characters
in Unicode doesn't mean that we can encode more characters without
checking each and every single case very carefully, as we are doing in
this discussion.

The sharp s analogy wasn’t useful because whether ſs or ſz users can’t tell
either and don’t care.

Sorry, but that was exactly the point of this analogy. As to "can't
tell", it's easy to ask somebody to look at an actual ß letter and say
whether the right part looks more like an s or like a z. On the other
hand, users of Deseret may or may not ignore the difference between the
1855 and 1859 shapes when they read. Of course they will easily see
different shapes, but what's important isn't the shapes, it's what they
associate it with. If for them, it's just two shapes for one and the
same 40th letter of the Deseret alphabet, then that is a strong
suggestion for not encoding separately, even if the shapes look really
different.

No Fraktur fonts, for instance, offer a shape for U+00DF that looks like an ſs.
And what Antiqua fonts do, well, you get this:

https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg

Yes. And we are just starting to collect evidence for Deseret fonts.

And there’s nothing unrecognizable about the ſɜ (< ſꝫ (= ſz)) ligature there.

Well, not to somebody used to it. But non-German users quite often use a
Greek β where they should use a ß, so it's no surprise people don't
distinguish the ſs and ſz derived glyphs.
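
As a small illustration of the sharp s case, the sketch below (assuming Python's standard `unicodedata` module) shows that ß is a single code point whichever ligature its glyph derives from, while the look-alike Greek β is a different character with different behaviour.

```python
import unicodedata

SHARP_S = "\u00DF"   # one code point for both the ſs-derived and ſz-derived shapes
BETA = "\u03B2"      # the Greek letter sometimes typed in its place

for ch in (SHARP_S, BETA):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Case folding treats the two very differently.
print(SHARP_S.casefold())
print(BETA.casefold() == BETA)
```

The glyph difference between ſs and ſz forms is left entirely to fonts, whereas the ß/β confusion is a real difference in the encoded text.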

The situation in Deseret is different.

The graphic difference is definitely bigger, so to an outsider, it's
definitely quite impossible to identify the pairs of shapes. But that
does in no way mean that these have to be seen as different characters
(rather than just different glyphs) by insiders (actual users).

To use another analogy, many people these days (me included) would have
difficulties identifying Fraktur letters, in particular if they show up
just as individual letters. Similar for many fantasy fonts, and for
people not very familiar with the Latin script.

Underlying ligature difference is indicative of character identity.
Particularly when two resulting ligatures are SO different from one another as
to be unrecognizable. And that is the case with EW on the left and OI on the
right here:
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg

The lower two letterforms are in no way “glyph variants” of the upper two
letterforms. Apart from the stroke of the SHORT I 𐐆 they share nothing in
common — because they come from different sources and are therefore different
characters.

The range of what can be a glyph variant is quite wide across scripts
and font styles. Just that the shapes differ widely, or that the origin
is different, doesn't make this conclusive.

Character origin is intimately related to character identity.

In most cases, yes. But it's not a given conclusion.

I don’t think that ANY user of Deseret is all that “average”. Certainly some
users of Deseret are experts interested in the script origin, dating,
variation, and so on — just as we have medievalists who do the same kind of
work. I’m about to publish a volume full of characters from Latin Extended-D.
My work would have been impossible had we not encoded those characters.

No, your work wouldn't be impossible. It might be quite a bit more
difficult, but not impossible. I have written papers about Han
ideographs and Japanese text processing where I had to create my own
fonts (8-bit, with mostly random assignments of characters because these
were one-off jobs), or fake things with inline bitmap images (trying to
get information on the final printer resolution and how many black
pixels wide a stem or crossbar would have to be to avoid dropouts, and
not being very successful).

I have heard the argument that some character variant is needed because
of research, history,... quite a few times. If a character has indeed
been historically used in a contr

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Martin J. Dürst


On 2017/03/28 01:03, Michael Everson wrote:

On 27 Mar 2017, at 16:56, John H. Jenkins  wrote:



The 1857 St Louis punches definitely included both the 1855 EW 𐐧 and the 1859 OI 
<𐐃𐐆>. Ken Beesley shows them in smoke proofs in his 2004 paper on Metafont.


Good to have some actual examples. However, the example at hand does, as 
far as I understand it, not necessarily support separate encoding.


While it mixes 1855 and 1859, it contains only one of the ligature 
variants each. Indeed, it could be taken as support for the theory that 
the top and bottom row ligatures in 
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg 
were used interchangeably, and that the 1857 St Louis punches just made 
one particular choice of glyph selection.


What would give a strong argument would be the *concurrent* existence of 
*corresponding* ligatures in the same font, or the concurrent (even 
better, contrasting) use of corresponding ligatures in the same text.


Regards,   Martin.

What's interesting (weird?) is that the "1859" OI <𐐃𐐆> appears in 1857 
punches. Time travel? Or is the label "1859" a misnomer or just a 
convention?

Re: Standaridized variation sequences for the Desert alphabet?

I’ll look into whatever you’re on about the other ‘minor’ script, but with 
regard to what you’ve said below, I’m fairly sure I encoded the missing 
characters there. I believe it was A7AE and A7B0, capital letters turned K and 
T used in that orthography. There is a problem with turned P and p in that 
orthography, though, but no one has ever chosen to look at that. But apart from 
dealing with the turned p, I do not believe it’s correct to say that that 
alphabet was “rejected”. 

Oh, there is a problem with the turned cedilla above; that seems to be missing 
too. 

> On 28 Mar 2017, at 01:04, David Starner  wrote:
> 
> When the discussion of the Hopi-English dictionary comes up, I'm reminded 
> that the Siouan alphabet for Latin, 
> https://commons.wikimedia.org/wiki/File:BAE-Siouan_Alphabet.png , was 
> rejected for encoding, at least on this list, because it was only used in one 
> set of publications that were distributed to every major library in the US, 
> unlike the Hopi dictionary that was stuck in an archive somewhere.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread David Starner

On Mon, Mar 27, 2017 at 1:34 AM Martin J. Dürst 
wrote:

> The qualification 'minor' is less important for an alphabet. In general,
> the more established and well-known an alphabet is, the wider the
> variations of glyph shapes that may be tolerated.
>

My problem with that is that a new script is likely to have wider variation
in properties. It invites people to tinker, with the possibility that any
new changes have a chance to become popular. And variants that show up in
Latin script, like http://www.gutenberg.org/files/20130/20130-h/20130-h.htm ,
don't tend to get encoded unless they have serious support.

When the discussion of the Hopi-English dictionary comes up, I'm reminded
that the Siouan alphabet for Latin,
https://commons.wikimedia.org/wiki/File:BAE-Siouan_Alphabet.png , was
rejected for encoding, at least on this list, because it was only used in
one set of publications that were distributed to every major library in the
US, unlike the Hopi dictionary that was stuck in an archive somewhere.

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 15:04, Alastair Houghton  
wrote:

> 1. Unicode has to be usable *today*; it’s no good designing for some kind of 
> hyper-intelligent AI-based font technology a thousand years hence, because we 
> don’t have that now.  If it isn’t usable today for any given purpose, people 
> won’t use it for that, and will adopt alternative solutions (like using 
> images to represent text).

Nothing’s easier than representing encoded characters. :-) 

> 2. “This year’s electronic font technology” is actually quite powerful, and 
> is unlikely to be supplanted by something *less* powerful in future.  There 
> is an argument about exactly how widespread support for it is (for instance, 
> simple text editors are clearly lacking in support for stylistic alternates, 
> except possibly on the Mac where there’s built-in support in the standard 
> text edit control), but again I think it’s reasonable to expect support to 
> grow over time, rather than being removed.

Sorry, but typographic control of that sort is grand for typesetting, where you 
can select ranges of text and language-tag it (assuming your program accepts 
and supports all the language tags you might need (which they don’t)) and you 
can select fonts which have all the trickery baked into them (hardly any do) 
and then… can you use this in file names? In your plain-text databases? In your 
text messages?

> I don’t think it’s unreasonable, then, to point out that mechanisms like 
> stylistic or contextual alternates exist, or indeed for that knowledge to 
> affect a decision about whether or not a character should be encoded, 
> *bearing in mind* the likely direction of travel of font and text rendering 
> support in widely available operating systems.

They exist. And can be useful for some things. I think that historic origin of 
the Deseret diphthong letters and the importance these options have for the 
study of Deseret orthographic choices throughout the early period of its use.

> All that said, I’d definitely defer to others on the subject of whether or 
> not Unicode needs the Deseret characters being discussed here.  That’s very 
> much not my field.

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 17:07, John H. Jenkins  wrote:

> This should teach me to double-check before posting.

The research is a lot of fun. Can’t wait till I get Ken’s book next week.

> Apparently, the earlier typeface *did* include all forty letters; it just 
> didn't use these two. I don't know what glyphs were used.

What I understood is that typefaces included the letters but there’s no *chart* 
that contains both 1859 letters. 

Ken transcribes into modern type a letter by Shelton dated 1859, in which “boy” 
is written 𐐒<𐐃𐐆>, “few” as 𐐙<𐐆𐐋>, “truefully” [sic] as 𐐓𐐡<𐐆𐐋>𐐙𐐋𐐢𐐆, and “you” as 
𐐏<𐐆𐐋>. 

Fascinating stuff. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread John H. Jenkins


> On Mar 27, 2017, at 9:56 AM, John H. Jenkins  wrote:
> 
> 
>> On Mar 27, 2017, at 2:04 AM, James Kass wrote:
>> 
>>> 
>>> If we have any historic metal types, are there
>>> examples where a font contains both ligature
>>> variants?
>> 
>> Apparently not.
>> 
>> John H. Jenkins mentioned early in this thread that these ligatures
>> weren't used in printed materials and were not part of the official
>> Deseret set.  They were only used in manuscript.
>> 
> 
> This is correct. Neither of the nineteenth century metal types included the 
> letters in question. Nor were they included in any electronic fonts that I'm 
> aware of before they were included in Unicode. 
> 

This should teach me to double-check before posting. Apparently, the earlier 
typeface *did* include all forty letters; it just didn't use these two. I don't 
know what glyphs were used.

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 16:56, John H. Jenkins  wrote:

>> John H. Jenkins mentioned early in this thread that these ligatures weren't 
>> used in printed materials and were not part of the official Deseret set.  
>> They were only used in manuscript.
> 
> This is correct. Neither of the nineteenth century metal types included the 
> letters in question. Nor were they included in any electronic fonts that I'm 
> aware of before they were included in Unicode.

The 1857 St Louis punches definitely included both the 1855 EW 𐐧 and the 1859 
OI <𐐃𐐆>. Ken Beesley shows them in smoke proofs in his 2004 paper on Metafont.

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread John H. Jenkins

> On Mar 27, 2017, at 2:04 AM, James Kass  wrote:
> 
>> 
>> If we have any historic metal types, are there
>> examples where a font contains both ligature
>> variants?
> 
> Apparently not.
> 
> John H. Jenkins mentioned early in this thread that these ligatures
> weren't used in printed materials and were not part of the official
> Deseret set.  They were only used in manuscript.
> 

This is correct. Neither of the nineteenth century metal types included the 
letters in question. Nor were they included in any electronic fonts that I'm 
aware of before they were included in Unicode.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Alastair Houghton

On 27 Mar 2017, at 14:49, Michael Everson  wrote:

>> 3) Font features (e.g. 1855 vs. 1859) to select shapes in the same font
> 
> Font trickery. Not portable. Not supported by most apps. 

I wouldn’t describe it as “trickery” or “not portable”.  Features like 
stylistic alternates are part of the OpenType specification, and actually have 
quite widespread support in Mac software (check out the Typography panel, which 
you can get to from the system Font Panel).  On Windows and Linux, support is 
more limited, though software that uses the newer DirectWrite or Pango APIs to 
render text should find it straightforward enough.

I don’t know how this bears on the discussion about Deseret (that’s outside my 
area of expertise), but as a software developer I’d certainly *prefer* to see 
font features used (rather than, say, assigning a new code point or using 
variation selectors) where the primary difference is in the rendering rather 
than the meaning.

Kind regards,

Alastair.

--
http://alastairs-place.net

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Alastair Houghton

On 27 Mar 2017, at 10:14, Julian Bradfield  wrote:
> 
> I contend, therefore, that no decision about Unicode should take into
> account any ephemeral considerations such as this year's electronic
> font technology, and that therefore it's not even useful to mention
> them.

I’d disagree with that, for two reasons:

1. Unicode has to be usable *today*; it’s no good designing for some kind of 
hyper-intelligent AI-based font technology a thousand years hence, because we 
don’t have that now.  If it isn’t usable today for any given purpose, people 
won’t use it for that, and will adopt alternative solutions (like using images 
to represent text).

2. “This year’s electronic font technology” is actually quite powerful, and is 
unlikely to be supplanted by something *less* powerful in future.  There is an 
argument about exactly how widespread support for it is (for instance, simple 
text editors are clearly lacking in support for stylistic alternates, except 
possibly on the Mac where there’s built-in support in the standard text edit 
control), but again I think it’s reasonable to expect support to grow over 
time, rather than being removed.

I don’t think it’s unreasonable, then, to point out that mechanisms like 
stylistic or contextual alternates exist, or indeed for that knowledge to 
affect a decision about whether or not a character should be encoded, *bearing 
in mind* the likely direction of travel of font and text rendering support in 
widely available operating systems.

All that said, I’d definitely defer to others on the subject of whether or not 
Unicode needs the Deseret characters being discussed here.  That’s very much 
not my field.

Kind regards,

Alastair.

--
http://alastairs-place.net

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 09:29, Martin J. Dürst  wrote:

>> He is. He transcribes texts into Deseret. I’ve published three of them 
>> (Alice, Looking-Glass, and Snark).
> 
> Great to know. Given that, I'd assume that you'd take his input a bit more 
> seriously.

I’m discussing it now, offline, with him and Ken.

> Here's what he wrote:
> 
> 
> My own take on this is "absolutely not." This is a font issue, pure and 
> simple. There is no dispute as to the identity of the characters in question, 
> just their appearance.

That begs the whole question of character identity. He’s simply saying what you 
and Asmus also said. But when you dig into it further, there’s more to the 
story, as we have found out. 

> In any event, these two letters were never part of the "standard" Deseret 
> Alphabet used in printed materials. To the extent they were used, it was in 
> hand-written material only, where you're going to see a fair amount of 
> variation anyway. There were also two recensions of the DA used in printed 
> materials which are materially different, and those would best be handled via 
> fonts.

There was indeed type cut for these. What’s not found is a full alphabet chart 
showing some of the ligated letters, but that’s a different question.

> It isn't unreasonable to suggest we change the glyphs we use in the Standard. 
> Ken Beesley and I have discussed the possibility, and we both feel that 
> it's very much on the table.
> 

Now that further research has been done, I’ll be discussing this with John and 
Ken with regard to putting together a proposal which will support the two 
ligating letterform characters as well as some other historical Deseret 
characters, some used in an important English-Hopi lexicon which was recently 
published. (I await my copy of that.)

>> I am a designer and typographer, and I’ve worked rather extensively with a 
>> variety of Deseret fonts for my publications. They have been well-received.
> 
> That's fine, and not disputed at all. That's exactly why I'm looking for 
> input from other people.

Well, all right, but I didn’t use either 𐐦 or 𐐧 in my editions apart from the 
entry in the chart in the front matter. 

> As an analogy, assume we had a famous type designer coming to this list and 
> request that we encode old-style digits separately from roman digits, e.g. 
> arguing that this might simplify the production of fonts.

I don’t see how this analogy could possibly apply. Once again the 1859 
ligature-characters look nothing at all like the 1855 one, which speaks to 
their unique identity as characters. 

Moreover, encoded digits are used by billions of people daily.

> We would understand this request, but we would still deny it because based on 
> our day-to-day use of digits, we would understand that at large (i.e. for the 
> average user) the convenience of having only one code point for a given digit 
> weighs more heavily than the convenience of separate code points for the type 
> designer.

I’m not suggesting encoding characters for “convenience”. I’m suggesting that 
there is a character-identity issue here, based both on the origin of the 
characters and on their vastly different appearance from other characters 
encoded in the standard. 

> We are looking for similar input from "average users" for Deseret.

The encoding of historic characters is for “expert users” working with 
historical material, not necessarily “average users” who might be composing 
blog entries. 

>> Actually neither of the ligature-letters are used in our Carrollian Deseret 
>> volumes.
> 
> Ok. That means that these don't provide any information on the discussion at 
> hand (whether to unify or disunify the ligature shapes).

I didn’t even know about the 1859 ligatures until this week. All this proves is 
that John didn’t use any ligatures when he transcribed the texts. 

>> You know, Martin, I *have* been doing this for the last two decades. I’m 
>> well aware of what a font is and can do.
> 
> Great. So you know that present-day font technology would allow us to handle 
> the different shapes in at least any of the following ways:
> 
> 1) Separate characters for separate shapes, both shapes in same font

We shouldn’t do that for shapes so different and with clearly different origins.

> 2) Variant selectors, one or both shapes in same font

Pseudo-encoding, useful for subtle variation but not for something as big as 
this. I am not an enemy of variation selectors. In fact I’m preparing a nice 
proposal for some standardized sequences. It would not apply here, because the 
glyph identity of the letters is too distinct. 

> 3) Font features (e.g. 1855 vs. 1859) to select shapes in the same font

Font trickery. Not portable. Not supported by most apps. 

> 4) Font selection, different fonts for different shapes

We really don’t do this just for one or two characters in a script. 
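
To make option 2 concrete in plain text, here is a minimal Python sketch. It 
assumes, purely for illustration, that VARIATION SELECTOR-1 (U+FE00) would 
request the 1859 shape of OI; that pairing is hypothetical, nothing in this 
discussion fixes which selector, if any, would be used, and the helper names 
are made up for the example.

```python
# Hypothetical: VS1 requests the 1859 shape of OI. This pairing is an
# assumption for illustration only; it shows how option 2 looks in stored text.
VS1 = "\uFE00"        # VARIATION SELECTOR-1
OI = "\U00010426"     # DESERET CAPITAL LETTER OI (1855 shape in the charts)
BEE = "\U00010412"    # DESERET CAPITAL LETTER BEE


def is_variation_selector(ch):
    """True for VARIATION SELECTOR-1 through VARIATION SELECTOR-16."""
    return "\uFE00" <= ch <= "\uFE0F"


def strip_selectors(text):
    """Return text as seen by a process that ignores variation selectors."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


boy_1855 = BEE + OI          # plain letter, default glyph
boy_1859 = BEE + OI + VS1    # same letter, variant shape requested

# The two strings differ as stored, but collapse to the same letters once
# selectors are ignored.
print(boy_1855 == boy_1859)                    # False
print(strip_selectors(boy_1859) == boy_1855)   # True
print([f"U+{ord(ch):04X}" for ch in boy_1859])
```

Under option 1 the two shapes would be different code points and no such 
folding would apply. Under options 3 and 4 the stored text would be 
`boy_1855` in both cases, and the distinction would live only in markup or 
font choice.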

> Does that knowledge in any way suggest one particular solution?

None of this discussion has conv

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 09:04, James Kass  wrote:

> John H. Jenkins mentioned early in this thread that these ligatures weren't 
> used in printed materials and were not part of the official Deseret set.  
> They were only used in manuscript.

Not quite true. Such detail will be for the proposal.

Michael

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 08:05, Martin J. Dürst  wrote:

>> Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are 
>> apparently really supposed to have identical glyphs, though we use an 
>> old-fashioned style in the charts for the former. (Yes, I am of course aware 
>> that there are other reasons for distinguishing these, but as far as glyphs 
>> go, even our standard distinguishes them artificially.)
> 
> "apparently", maybe. Let's for a moment leave aside the radicals themselves, 
> which are to a large extent artificial constructs.

I do stipulate not being a CJK expert. But those are indeed different due to 
their origins, however similar their shapes are. 

> Let's look at the actual characters with these radicals (e.g. U+6709,... for 
> MOON and U+808A,... for MEAT), in the multi-column code charts of ISO 10646. 
> There are some exceptions, but in most cases, the G/J/K columns show no 
> difference (i.e. always the ⺝ shape, with two horizontal bars), whereas the 
> H/T/V columns show the ⺼ shape (two downwards slanted bars) for the "MEAT" 
> radical and the ⺝ shape for the moon radical. So whether these radicals have 
> identical glyphs depends on typographic tradition/font/…

They are still always very similar, right?

> In Japan, many people may be rather unaware of the difference, whereas in 
> Taiwan, it may be that school children get drilled on the difference.

That’s interesting. 

>> One practical consequence of changing the chart glyphs now, for instance, 
>> would be that it would invalidate every existing Deseret font. Adding new 
>> characters would not.
> 
> Independent of whether the chart glyphs get changed, couldn't we just add a 
> note "also # in some fonts" (where # is the other variant). 

Well, no. First, ALL fonts currently use the 1855 letterforms based on 
ligatures 𐐉𐐆 and 𐐆𐐅, so a decree that those code positions would 

Second, the letterforms resulting from the ligations are just nothing alike 

> That would make sure that nobody could claim "this font is wrong" based on 
> the charts. (Even if a general claim that the chart glyphs aren't normative 
> applies to all charts anyway.)

As James Kass said: "If spelling a word with an x+y string versus a z+y string 
represents two different spellings of the same word, then hand printing the 
same word with either an x/y ligature versus a z/y ligature also represents two 
different spellings of the same word."

>> Changing to a different font in order to change one or two glyphs is a 
>> mechanism that we have actually rejected many times in the past. We have 
>> encoded variant and alternate characters for many scripts.
> 
> Well, yes, rejected many times in cases where that was appropriate. But also 
> accepted many times, in cases that we may not even remember, because they may 
> not even have been made explicitly.

Do come up with examples if you have any. 

> Because in such cases, the focus may not be on a change to one or a few 
> letter shapes, but the focus may be on a change of the overall style, which 
> induces a change of letter shape in some letters.

To be honest I really don’t follow this reasoning. 
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
 isn’t just some “glyph variation”. They are entirely different glyphs with 
entirely different origins. I can think of no instance where we have “unified” 
such wildly different glyphs. 

> The roman/italic a/ɑ and g/ɡ distinctions (the latter code points only used to 
> show the distinction in plain text, which could as well be done 
> descriptively),

Aa and Ɑɑ are used contrastively for different sounds in some languages and in 
the IPA. Ɡɡ is not, to my knowledge, used contrastively with Gg (except that ɡ 
can only mean /ɡ/, while orthographic g can mean /ɡ/, /dʒ/, /x/ etc.). But g vs 
ɡ is reasonably analogous to 𐐦 and 𐐃𐐆 being used for /ɔɪ/.

> as well as a large number of distinctions in Han fonts, come to my mind. I'm 
> quite sure other scripts have similar phenomena.

Again, spelling of all kinds varies greatly in Deseret texts. I’ll try with 
another example using some Latin glyphs. “Poison” can be written 𐐑𐐄𐐆𐐞𐐇𐐤 POIZƐN 
in Deseret, or it can be written 𐐑𐐦𐐞𐐇𐐤 PƟZƐN or it can be written 𐐑<𐐃𐐆>𐐞𐐇𐐤 
PɄZƐN. That’s three different spellings, not two. (I used O with a bar to mimic 
the bar of Deseret SHORT I 𐐆). 
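
The three spellings can be compared as code point sequences. The Python 
sketch below is illustrative only: the 1859 ligature has no code point of its 
own, so it is written here as its two component letters, LONG AH + SHORT I, 
and the dictionary labels and the `describe` helper are invented for the 
example.

```python
import unicodedata

# Three spellings of "poison". The third stands in for the 1859 ligature by
# listing its two component letters (an assumption for illustration only).
SPELLINGS = {
    "digraph O + I": "\U00010411\U00010404\U00010406\U0001041E\U00010407\U00010424",
    "1855 OI letter": "\U00010411\U00010426\U0001041E\U00010407\U00010424",
    "1859 ligature (components)": "\U00010411\U00010403\U00010406\U0001041E\U00010407\U00010424",
}


def describe(text):
    """List each character as 'U+XXXXX NAME' using the Unicode database."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, word in SPELLINGS.items():
    print(label)
    for line in describe(word):
        print("   " + line)
```

With this stand-in, the third spelling cannot be told apart from a writer who 
simply wrote LONG AH and SHORT I as two unligated letters, which is the gap 
under discussion.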

>> Character identity is not defined by any single criterion. Moreover, in 
>> Deseret, it is not the case that all texts which contain the diphthong /juː/ 
>> or /ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as Y + U 𐐏𐐋 and O + I 
>> 𐐄𐐆. So the choice is one of *spelling*, and spelling has always been a 
>> primary criterion for such decisions.
> 
> This is interesting information. You are saying that in actual practice, 
> there is a choice between writing 𐐄𐐆 (two letters for a diphthong) and 
> writing 𐐧. In the same location, is 𐐆𐐋 (the base for the historically

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 06:42, Martin J. Dürst  wrote:

>> The default position is NOT “everything is encoded unified until disunified”.
> 
> Nor is it “everything is encoded separately unless it's unified”.

These Deseret letters aren’t encoded. For my part I wasn’t made aware of them 
in 2004 when they were written about. My view is “Ah, here’s something. Is it 
encoded? No. Is it a glyph variant of something encoded? No.”

>> The characters in question have different and undisputed origins, undisputed.
> 
> If you change that to the somewhat more neutral "the shapes in question have 
> different and undisputed origins", then I'm with you. I actually have said as 
> much (in different words) in an earlier post.

And what would the value of this be? Why should I (who have been doing this for 
two decades) not be able to use the word “character” when I believe it correct? 
Sometimes you people who have been here for a long time behave as though we had 
no precedent, as though every time a character is proposed for encoding 
nothing had ever been encoded before.

>> We’ve encoded one pair; evidently this pair was deprecated and another pair 
>> was devised. The letters wynn and w are also used for the same thing. They 
>> too have different origins and are encoded separately. The letters yogh and 
>> ezh have different origins and are encoded separately. (These are not 
>> perfect analogies, but they are pertinent.)
> 
> Fine. I (and others) have also given quite a few analogies, none of them 
> perfect, but most if not all of them pertinent.

The sharp s analogy wasn’t useful because users can’t tell whether it is ſs or 
ſz and don’t care. No Fraktur fonts, for instance, offer a shape for U+00DF 
that looks like an ſs. And what Antiqua fonts do, well, you get this:

https://en.wikipedia.org/wiki/%C3%9F#/media/File:Sz_modern.svg

And there’s nothing unrecognizable about the ſɜ (< ſꝫ (= ſz)) ligature there. 
The situation in Deseret is different.

Other analogies had to do with normal shape variation, not shapes derived from 
underlying ligatures. Analogies are never perfect but I don’t think the ones 
offered were pertinent.

Underlying ligature difference is indicative of character identity. 
Particularly when two resulting ligatures are SO different from one another as 
to be unrecognizable. And that is the case with EW on the left and OI on the 
right here: 
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg

The lower two letterforms are in no way “glyph variants” of the upper two 
letterforms. Apart from the stroke of the SHORT I 𐐆 they share nothing in 
common — because they come from different sources and are therefore different 
characters. 

>>> We haven't yet heard of any contrasting uses for the letter shapes we are 
>>> discussing.
>> 
>> Contrasting use is NOT the only criterion we apply when establishing the 
>> characterhood of characters.
> 
> Sorry, but where did I say that it's the only criterion? I don't think it's 
> the only criterion. On the other hand, I also don't think that historical 
> origin is or should be the only criterion.

Neither do I, but it has been a very clear precedent for many character 
distinctions and that is useful precedent. 

> Unfortunately, much of what you wrote gave me the impression that you may 
> think that historical origin is the only criterion, or a criterion that 
> trumps all others. If you don't think so, it would be good if you could 
> confirm this. If you think so, it would be good to know why.

Character origin is intimately related to character identity. Even where 
superficial similarity is concerned; I had to prove character origin for the 
disunification of YOGH from EZH long long ago and I’ve done the same over and 
over again for many characters and even full scripts. Sometimes characters are 
used and then become disused. MOST of the Bamum characters we have encoded 
aren’t in modern use today, but they were encoded for historical concerns. 

>> Please try to remember that. (It’s a bit shocking to have to remind people 
>> of this.)
> 
> You don't have to remind me, at least. I have mentioned "usability for 
> average users in average contexts" and "contrasting use" as criteria, and I 
> have also in earlier mail acknowledged history as a (not the) criterion, and 
> have mentioned legacy/roundtrip issues. I'm sure there are others.

I don’t think that ANY user of Deseret is all that “average”. Certainly some 
users of Deseret are experts interested in the script origin, dating, 
variation, and so on — just as we have medievalists who do the same kind of 
work. I’m about to publish a volume full of characters from Latin Extended-D. 
My work would have been impossible had we not encoded those characters. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

On 27 Mar 2017, at 05:58, James Kass  wrote:
> 
> Asmus Freytag wrote,
> 
>> In the current case, you have the opposite, to wit, the text elements are 
>> unchanged, but you would like to add alternate code elements
>> to represent what are, ultimately, the same text elements. That's not 
>> disunification, but dual encoding.
> 
> If spelling a word with an x+y string versus a z+y string represents two 
> different spellings of the same word, then hand printing the same
> word with either an x/y ligature versus a z/y ligature also represents two 
> different spellings of the same word.

Asmus also changes the terms of the discussion by introducing the vague and 
undefined term “text element”. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Julian Bradfield

While I hesitate to dive in to this argument, Martin makes one comment
where I think a point of principle arises:

On 2017-03-27, Martin J. Dürst wrote:
[Michael wrote]
>> You know, Martin, I *have* been doing this for the last two decades. I’m 
>> well aware of what a font is and can do.
>
> Great. So you know that present-day font technology would allow us to 
> handle the different shapes in at least any of the following ways:
>
> 1) Separate characters for separate shapes, both shapes in same font
> 2) Variant selectors, one or both shapes in same font
> 3) Font features (e.g. 1855 vs. 1859) to select shapes in the same font
> 4) Font selection, different fonts for different shapes
>
> Does that knowledge in any way suggest one particular solution?

As I've observed before, the intention is that we are stuck with
Unicode for as long as our civilization endures, be that 5000 years or
50 years.

I contend, therefore, that no decision about Unicode should take into
account any ephemeral considerations such as this year's electronic
font technology, and that therefore it's not even useful to mention
them.

All you should need to say is "these letters are too insignificant to
merit encoding, and those who believe they need to be able to
distinguish them in plain text will just have to use other means, such
as ZWJ with the components of the ligature".
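
The ZWJ fallback mentioned here can be sketched in a few lines of Python. 
This is an illustration under assumptions, not an established convention for 
Deseret: whether any font would render LONG AH + ZWJ + SHORT I as the 1859 
ligature is not something this thread establishes, and the function names are 
made up.

```python
ZWJ = "\u200D"            # ZERO WIDTH JOINER

BEE = "\U00010412"        # DESERET CAPITAL LETTER BEE
LONG_AH = "\U00010403"    # DESERET CAPITAL LETTER LONG AH
SHORT_I = "\U00010406"    # DESERET CAPITAL LETTER SHORT I


def ligate(first, second):
    """Request a ligature of two letters by placing ZWJ between them."""
    return first + ZWJ + second


def unligate(text):
    """Drop every ZWJ, leaving the plain letter-by-letter spelling."""
    return text.replace(ZWJ, "")


# "boy" with the 1859 ligature requested, and as three separate letters.
boy_ligated = BEE + ligate(LONG_AH, SHORT_I)
boy_plain = BEE + LONG_AH + SHORT_I

print(boy_ligated == boy_plain)             # False
print(unligate(boy_ligated) == boy_plain)   # True
```

The ligated and unligated spellings stay distinguishable in plain text, and a 
process that does not care about the distinction can fold them together by 
removing the joiner.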

(I'm not saying that's my view, by the way - I'm more of a splitter
than a lumper, and on the basis of this thread, I'm probably on the
"encode" side.)

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Martin J. Dürst


On 2017/03/24 23:37, Michael Everson wrote:

On 24 Mar 2017, at 11:34, Martin J. Dürst  wrote:


On 2017/03/23 22:48, Michael Everson wrote:


Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
history of the Deseret alphabet would be impoverished by treating the 1859 
letters as identical to the 1855 letters.


Well, I might be completely wrong, but John Jenkins may be the person on this 
list closest to an actual user of Deseret (John, please correct me if I'm wrong 
one way or another).


He is. He transcribes texts into Deseret. I’ve published three of them (Alice, 
Looking-Glass, and Snark).


Great to know. Given that, I'd assume that you'd take his input a bit 
more seriously. Here's what he wrote:



My own take on this is "absolutely not." This is a font issue, pure and 
simple. There is no dispute as to the identity of the characters in 
question, just their appearance.


In any event, these two letters were never part of the "standard" 
Deseret Alphabet used in printed materials. To the extent they were 
used, it was in hand-written material only, where you're going to see a 
fair amount of variation anyway. There were also two recensions of the 
DA used in printed materials which are materially different, and those 
would best be handled via fonts.


It isn't unreasonable to suggest we change the glyphs we use in the 
Standard. Ken Beesley and I have discussed the possibility, and we 
both feel that it's very much on the table.





It may be that actual users of Deseret read these character variants the same 
way most of us would read serif vs. sans-serif variants: I.e. unless we are 
designers or typographers, we don't actually consciously notice the difference.


I am a designer and typographer, and I’ve worked rather extensively with a 
variety of Deseret fonts for my publications. They have been well-received.


That's fine, and not disputed at all. That's exactly why I'm looking for 
input from other people.


As an analogy, assume we had a famous type designer coming to this list 
and request that we encode old-style digits separately from roman 
digits, e.g. arguing that this might simplify the production of fonts.


We would understand this request, but we would still deny it because 
based on our day-to-day use of digits, we would understand that at large 
(i.e. for the average user) the convenience of having only one code 
point for a given digit weighs more heavily than the convenience of 
separate code points for the type designer.


We are looking for similar input from "average users" for Deseret.



If that's the case, it would be utterly annoying to these actual users to have 
to make a distinction between two characters where there actually is none.


Actually neither of the ligature-letters are used in our Carrollian Deseret 
volumes.


Ok. That means that these don't provide any information on the 
discussion at hand (whether to unify or disunify the ligature shapes).




The richness of the history of the Deseret alphabet can still be preserved e.g. 
with different fonts the same way we have thousands of different fonts for 
Latin and many other scripts that show a lot of rich history.


You know, Martin, I *have* been doing this for the last two decades. I’m well 
aware of what a font is and can do.


Great. So you know that present-day font technology would allow us to 
handle the different shapes in at least any of the following ways:


1) Separate characters for separate shapes, both shapes in same font
2) Variant selectors, one or both shapes in same font
3) Font features (e.g. 1855 vs. 1859) to select shapes in the same font
4) Font selection, different fonts for different shapes

Does that knowledge in any way suggest one particular solution?



I’m also aware of what principles we have used for determining character 
identity.


Which, as we have been working out in other mails, are indeed a 
collection of principles, one of which is history of shape derivation.




I saw your note about CJK. Unification there typically has something to do with 
character origin and similarity. The Deseret diphthong letters are clearly 
based on ligatures of *different* characters.


One of the principles of CJK unification is that minor differences are 
ignored if they are not semantically relevant. For CJK, 'minor' is 
important, because otherwise, many users wouldn't be able to recognize 
the shapes as having the same semantics/usage.


The qualification 'minor' is less important for an alphabet. In general, 
the more established and well-known an alphabet is, the wider the 
variations of glyph shapes that may be tolerated. The question I'm 
trying to get an answer to for Deseret is whether current actual script 
users see the shape variation as just substitutable glyphs of the same 
letter, or inherently different letters.


The answer to this question is not the *only* criterion for deciding 
whether to encode further Deseret letters, but I t

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread James Kass

Martin J. Dürst responded to Michael Everson,

> Unfortunately, much of what you wrote gave me the
> impression that you may think that historical origin
> is the only criterion, or a criterion that trumps all
> others. If you don't think so, it would be good if you
> could confirm this. If you think so, it would be good
> to know why.

Historical origin is always a good starting point.

The importance of history cannot be overstated.  Without it, the other
criteria would not exist.

Historical origin wouldn't override evidence of contrasting use in
this case because such evidence would be "icing on the cake".

> ... I have mentioned "usability for average users in
> average contexts" and "contrasting use" as criteria,
> and I have also in earlier mail acknowledged history
> as a (not the) criterion, and have mentioned legacy/
> roundtrip issues. I'm sure there are others.

Adding a few historic letters should seldom have any effect on
"usability for average users in average contexts".  Whether it does in
this case remains to be seen.

Legacy and roundtrip issues are important because
backwards-compatibility supports history.  Concerns in this case
appear to be hypothetical.

Best regards,

James Kass

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread James Kass

Martin J. Dürst responded to Michael Everson,

> Overall, we may have up to four variants, of which
> three are currently explicitly supported in Unicode.

Yes.

> Are all of these used as spelling variants?

Is there another possible use?

> Is the choice of variant up to the author (for which
> variants), or is it the editor or printer who makes
> the choice (for which variants)?

The author, see below.

> And what informs this choice?

Personal preference and/or spelling reform as well as whether the
material was machine printed or hand written.

> If we have any historic metal types, are there
> examples where a font contains both ligature
> variants?

Apparently not.

John H. Jenkins mentioned early in this thread that these ligatures
weren't used in printed materials and were not part of the official
Deseret set.  They were only used in manuscript.

Best regards,

James Kass

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-27 Thread Martin J. Dürst

On 2017/03/27 01:20, Michael Everson wrote:

On 26 Mar 2017, at 16:45, Asmus Freytag wrote:

"apparently", maybe. Let's for a moment leave aside the radicals
themselves, which are to a large extent artificial constructs. Let's
look at the actual characters with these radicals (e.g. U+6709,... for
MOON and U+808A,... for MEAT), in the multi-column code charts of ISO
10646. There are some exceptions, but in most cases, the G/J/K columns
show no difference (i.e. always the ⺝ shape, with two horizontal bars),
whereas the H/T/V columns show the ⺼ shape (two downwards slanted bars)
for the "MEAT" radical and the ⺝ shape for the moon radical. So whether
these radicals have identical glyphs depends on typographic
tradition/font/... In Japan, many people may be rather unaware of the
difference, whereas in Taiwan, it may be that school children get
drilled on the difference.

One practical consequence of changing the chart glyphs now, for instance, would
be that it would invalidate every existing Deseret font. Adding new characters
would not.

Independent of whether the chart glyphs get changed, couldn't we just
add a note "also # in some fonts" (where # is the other variant). That
would make sure that nobody could claim "this font is wrong" based on
the charts. (Even if a general claim that the chart glyphs aren't
normative applies to all charts anyway.)

In fact, it would seem that if a Deseret text was encoded in one of the two
systems, changing to a different font would have the attractive property of
preserving the content of the text (while not preserving the appearance).

Well, yes, rejected many times in cases where that was appropriate. But
also accepted many times, in cases that we may not even remember,
because they may not even have been made explicitly. Because in such
cases, the focus may not be on a change to one or a few letter shapes,
but the focus may be on a change of the overall style, which induces a
change of letter shape in some letters. The roman/italic a/ɑ and g/ɡ
distinctions (the latter code points only used to show the distinction in
plain text, which could as well be done descriptively), as well as a
large number of distinctions in Han fonts, come to my mind. I'm quite
sure other scripts have similar phenomena.

This, in a nutshell, is the criterion for making something a font difference
vs. an encoding distinction.

Character identity is not defined by any single criterion. Moreover, in
Deseret, it is not the case that all texts which contain the diphthong /juː/ or
/ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as Y + U 𐐏𐐋 and O + I 𐐄𐐆. So
the choice is one of *spelling*, and spelling has always been a primary
criterion for such decisions.

This is interesting information. You are saying that in actual practice,
there is a choice between writing 𐐄𐐆 (two letters for a diphthong) and
writing 𐐧. In the same location, is 𐐆𐐋 (the base for the historically
later shape variant of 𐐧; please note that this may actually be written
𐐋𐐆; there's some inconsistency in order between the above cited
sentence and the text below copied from an earlier mail) also used as a
spelling variant? Overall, we may have up to four variants, of which
three are currently explicitly supported in Unicode. Are all of these
used as spelling variants? Is the choice of variant up to the author
(for which variants), or is it the editor or printer who makes the
choice (for which variants)? And what informs this choice? If we have
any historic metal types, are there examples where a font contains both
ligature variants?

(Please note that because 𐐄, 𐐆, and 𐐋 are available as individual
letters, it's very difficult to think about the two-letter sequences as
anything else than spellings, but that doesn't necessarily carry over to
the ligatures.)

And then the same questions, with parallel (or not parallel) answers,
for ɒɪ/ɔɪ/𐐦.

Regards,   Martin.

Text copied from earlier mail by Michael:

1. The 1855 glyph for 𐐧 EW is evidently a ligature of the diagonal
stroke of the glyph for 𐐆 SHORT I [ɪ] and the glyph for 𐐅 LONG OO [uː],
that is, [ɪ] + [uː] = [ɪuː], that is, [ju].

2. The 1855 glyph for 𐐦 OI is evidently a ligature of the glyph for 𐐉
SHORT AH [ɒ] and the diagonal stroke of the glyph for 𐐆 SHORT I [ɪ],
that is, [ɒ] + [ɪ] = [ɒɪ], that is, [ɔɪ].

That’s encoded. Now evidently, the glyphs f

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst


On 2017/03/26 22:15, Michael Everson wrote:



On 26 Mar 2017, at 09:12, Martin J. Dürst  wrote:


That's a good point: any disunification requires showing examples of
contrasting uses.


Fully agreed.


The default position is NOT “everything is encoded unified until disunified”.


Nor is it "everything is encoded separately unless it's unified".



The characters in question have different and undisputed origins, undisputed.


If you change that to the somewhat more neutral "the shapes in question 
have different and undisputed origins", then I'm with you. I actually 
have said as much (in different words) in an earlier post.




We’ve encoded one pair; evidently this pair was deprecated and another pair was 
devised. The letters wynn and w are also used for the same thing. They too have 
different origins and are encoded separately. The letters yogh and ezh have 
different origins and are encoded separately. (These are not perfect analogies, 
but they are pertinent.)


Fine. I (and others) have also given quite a few analogies, none of them 
perfect, but most if not all of them pertinent.




We haven't yet heard of any contrasting uses for the letter shapes we are 
discussing.


Contrasting use is NOT the only criterion we apply when establishing the 
characterhood of characters.


Sorry, but where did I say that it's the only criterion? I don't think 
it's the only criterion. On the other hand, I also don't think that 
historical origin is or should be the only criterion.


Unfortunately, much of what you wrote gave me the impression that you 
may think that historical origin is the only criterion, or a criterion 
that trumps all others. If you don't think so, it would be good if you 
could confirm this. If you think so, it would be good to know why.




Please try to remember that. (It’s a bit shocking to have to remind people of 
this.)


You don't have to remind me, at least. I have mentioned "usability for 
average users in average contexts" and "contrasting use" as criteria, 
and I have also in earlier mail acknowledged history as a (not the) 
criterion, and have mentioned legacy/roundtrip issues. I'm sure there 
are others.



Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread James Kass

Asmus Freytag wrote,

> In the current case, you have the opposite,
> to wit, the text elements are unchanged, but
> you would like to add alternate code elements
> to represent what are, ultimately, the same
> text elements. That's not disunification, but
> dual encoding.

If spelling a word with an x+y string versus a z+y string represents
two different spellings of the same word, then hand printing the same
word with either an x/y ligature versus a z/y ligature also represents
two different spellings of the same word.

Best regards,

James Kass
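For readers who want to see the spellings under discussion at the code point level, here is a minimal Python sketch. It assumes the capital-letter code points from the Deseret block (U+10427 EW, U+10426 OI, and the two-letter sequences YEE + SHORT OO and LONG O + SHORT I, as cited elsewhere in this thread); the helper names `describe` and `same_spelling` are made up for illustration. It only shows that the ligature-letter and the two-letter spelling are different strings that normalization does not merge; it says nothing about how the 1859 shapes should be encoded.

```python
import unicodedata

# Assumed code points (capital letters of the Deseret block).
EW = "\U00010427"              # single ligature-letter for the /juː/ diphthong
OI = "\U00010426"              # single ligature-letter for the /ɔɪ/ diphthong
Y_U = "\U0001040F\U0001040B"   # YEE + SHORT OO, the two-letter spelling
O_I = "\U00010404\U00010406"   # LONG O + SHORT I, the two-letter spelling


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_spelling(a, b):
    """Hypothetical helper: two strings count as the same spelling
    only if they are identical after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, spelling in [("EW", EW), ("Y+U", Y_U), ("OI", OI), ("O+I", O_I)]:
        print(label, describe(spelling))
    # The ligature-letter and the two-letter sequence stay distinct.
    print(same_spelling(EW, Y_U))
```

Because the two spellings differ as strings, a search for one will not find the other unless an application adds its own equivalence on top.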

Re: Standaridized variation sequences for the Desert alphabet?


On 3/26/2017 9:23 AM, Michael Everson wrote:

On 26 Mar 2017, at 17:02, Asmus Freytag  wrote:

On 3/26/2017 6:18 AM, Michael Everson wrote:


In any case it’s not a disunification. Some characters are encoded; they were 
used to write diphthongs in 1855. These characters were abandoned by 1859, and 
other characters were devised.

Calling them "characters" is pre-judging the issue, don't you think?

No, I don’t think so.


I really think it is.



We know that these are different shapes, but that they stand for the same text 
elements.

No, they don’t. Those diphthongs can also be represented in other ways in 
Deseret.


Having alternative ways to represent these doesn't invalidate or affect 
my argument.


I’ve never accepted the view that “everything is already encoded and everything 
new is a disunification” which seems to be a pretty common view.


I would not say I aspire to the view you quote.

If you encode a certain shape, it may get used for a range of text 
elements. This would (de facto) encode these text elements via that 
shape. If it is later felt that the given shape should not be used for 
the full range of text elements, then you could say that the "implicit" 
unification based on the usage (or, if you will, "fallback usage") was 
mistaken and should be better handled by two (or more) shapes. This 
represents a "de-facto" disunification.


However, where I part from your description is the "everything is 
already encoded". That would not be the case anywhere a range of text 
elements cannot be represented at all. Your statement also implies a 
"correctly encoded" or "successfully encoded" which is different from 
"there's an encoding that some people use as a fallback", which, if 
disunification should prove proper later on, would be a better way of 
describing what was the original situation.


Perhaps the point is subtle, but it is important.

In the current case, you have the opposite, to wit, the text elements 
are unchanged, but you would like to add alternate code elements to 
represent what are, ultimately, the same text elements. That's not 
disunification, but dual encoding.


A./

Re: Standaridized variation sequences for the Desert alphabet?


  
  
On 3/26/2017 1:51 PM, Michael Everson wrote:


  
Finally, if this was in major, modern use, adding these code points would have grave consequences for security.

  
  Why? They’re not visually similar to the existing characters. So spoofing wouldn’t be an issue. 

Spoofing would absolutely be an issue,
because if there are free alternates users will mis-remember
which one was used for a given label. Goes for the whole
simplified / traditional issue in the Han script.
Issues are not limited to visual similarity.
A./
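The point about free alternates in labels can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the `VARIANTS` table, the function names, and the choice of two simplified/traditional Han pairs are hypothetical and do not reproduce any registry's actual variant rules. The sketch shows why a registry might treat two visually different code points as conflicting: a user who remembers "the word" may type either form.

```python
# Hypothetical variant table: each alternate code point maps to the form
# that this toy registry treats as primary. Illustrative entries only.
VARIANTS = {
    "\u570B": "\u56FD",  # 國 (traditional) -> 国 (simplified)
    "\u5B78": "\u5B66",  # 學 (traditional) -> 学 (simplified)
}


def fold_label(label):
    """Replace every alternate with its primary form."""
    return "".join(VARIANTS.get(ch, ch) for ch in label)


def conflicts(requested, registered):
    """True if the requested label folds to the same string as any
    already registered label, even though the glyphs look different."""
    folded = fold_label(requested)
    return any(fold_label(existing) == folded for existing in registered)


if __name__ == "__main__":
    registered = {"\u4E2D\u56FD"}                 # label using the simplified form
    print(conflicts("\u4E2D\u570B", registered))  # same label, traditional form
    print(conflicts("\u4E2D\u5B66", registered))  # a different label
```

If dual-encoded letters were in wide modern use, a table of this kind would have to list them as well, which is the maintenance and security cost being pointed at.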

Re: Standaridized variation sequences for the Desert alphabet?


On 3/26/2017 9:20 AM, Michael Everson wrote:

On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:

The priority in encoding has to be with allowing distinctions in modern texts, 
or distinctions that matter to modern users of historic writing systems. Beyond 
that, theoretical analysis of typographical evolution can give some interesting 
insight, but I would be in the camp that does not accord them a status as 
primary rationale for encoding decisions.

Our rationales are NOT ranked in the way you suggest. A variety of criteria are 
applied.


And the way you weigh the criteria?



Thus, critical need for contrasting use of the glyph distinctions would have to 
be established before it makes sense to discuss this further.

Precedent for such needs is well-established. Consider the Latin Extended-D 
block. Sometimes it is editorial preference, and that’s not even always 
universal.


I think the Latin Extended-D block may have its own problems.

However, Latin as a script caters to so many varied levels of users, 
from ordinary text to scholarly notations that it really cannot be used 
to settle this issue.



I see no principled objection to having a font choice result in a noticeable or 
structural glyph variation for only a few elements of an alphabet. We have 
handle-a vs. bowl-a as well as hook-g vs. loop-g in Latin, and fonts routinely 
select one or the other.

Well, Asmus, we encode a and ɑ as well as g and ɡ and ᵹ.
And we do that for reasons that are very different from preserving the 
early and possibly transient history of a minor script.

And we do not consider ɑ and ɡ and ᵹ to be things that ought to be 
distinguished by variation selectors. (I am of course well aware of IPA usage.)
Yes, and the absence of such usage in the current example makes all the 
difference.

Whole-font switching is well understood. But character origin has always been 
taken into account. Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL 
MOON which are apparently really supposed to have identical glyphs, though we 
use an old-fashioned style in the charts for the former. (Yes, I am of course 
aware that there are other reasons for distinguishing these, but as far as 
glyphs go, even our standard distinguishes them artificially.)
Apparently not only in the standard, because they show as different in 
the plaintext view of this message.



(It is only for usage outside normal text that the distinction between these 
forms matters).

What’s “normal” text? “Normal” text in Latin probably doesn’t use the 
characters from the Latin Extended-D block.

"ordinary" text, if you like, reflecting standard orthographies.

As opposed to notational systems.



While the Deseret forms are motivated by their pronunciation, I'm not 
necessarily convinced that the distinction has any practical significance that 
is in any way different than similar differences in derivation (e.g. for long 
s-s or long-s-z for German esszett).

One practical consequence of changing the chart glyphs now, for instance, would 
be that it would invalidate every existing Deseret font. Adding new characters 
would not.
No, if we state that both glyphs are alternates for the same character 
*and if we decide, to _not_ add variation selectors* the choice is where 
it belongs: with the font maker.



In fact, it would seem that if a Deseret text was encoded in one of the two 
systems, changing to a different font would have the attractive property of 
preserving the content of the text (while not preserving the appearance).

Changing to a different font in order to change one or two glyphs is a 
mechanism that we have actually rejected many times in the past. We have 
encoded variant and alternate characters for many scripts.
If the underlying text element is the same, font switching can be the 
correct choice.



This, in a nutshell, is the criterion for making something a font difference 
vs. an encoding distinction.

Character identity is not defined by any single criterion.

Make it the "primary" criterion then.

  Moreover, in Deseret, it is not the case that all texts which contain the 
diphthong /juː/ or /ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as Y + U 
𐐏𐐋 and O + I 𐐄𐐆. So the choice is one of *spelling*, and spelling has always 
been a primary criterion for such decisions.

Yes, and those other spellings are not affected.



This is complicated by combining characters mostly identified by glyph, and the 
fact that while ä and aͤ may be the same character across time, there are 
people wanting to distinguish them in the same text today, and in both cases
 the theoretical falls to the practical. In this case, there are no 
combining character issues and there's nobody needing to use the two forms in 
the same text.

huh?

He’s wrong there, as I pointed out. A text in German may write an older 
Clavieruͤbung in a citation alongside the normal spelling Klavierübung. The 
choice of spelling is key.
That would have to be a very special

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 21:48, Richard Wordingham  
wrote:

>> Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs 
>> are only informative, so why don’t we use an OO ligature instead?
> 
> A script-style font might legitimately use a glyph that looks like a small 
> omega for U+0077 LATIN SMALL LETTER W.

As I said to Asmus, my analogy was about ligatures made from underlying 
letters. Yours doesn’t apply because it’s just talking about glyph shapes. 

> Small omega, of course, is an οο ligature.

True. :-) Isn’t history wonderful?

> More to the point, a font may legitimately use the same glyphs for U+0067 
> LATIN SMALL LETTER G and U+0261 LATIN SMALL LETTER SCRIPT G.

A good font will still find a way to distinguish them. :-) 

> A more serious issue is the multiple forms of U+014A LATIN CAPITAL LETTER 
> ENG, for which the underlying unity comes from their being the capital form 
> of U+014B LATIN SMALL LETTER ENG.

We could have, and should have, solved this problem *long ago* by encoding 
LATIN CAPITAL LETTER AFRICAN ENG and LATIN SMALL LETTER AFRICAN ENG. 

> Are there not serious divergences with the shapes of the Syriac letters?

That is analogous to Roman/Gaelic/Fraktur. That analogy doesn’t apply to these 
Deseret characters; it’s not a whole-script gestalt. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 21:39, Asmus Freytag  wrote:

>> Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs 
>> are only informative, so why don’t we use an OO ligature instead?
> 
> If there was a tradition of writing W like omega, then switching the chart 
> glyphs to that alternative tradition would be something that is at least not 
> inconceivable -- even if perhaps not advisable.

You know, Asmus, no analogy is perfect. But mine was a discussion of letters 
derived from ligatures, and yours is just a random note about shape. 

> For letters, their primary identity is not given by their shape, but their 
> position / function in the alphabet.

This isn’t really something you can turn into an axiom, much as you would like 
to. Position in the alphabet may vary WIDELY from language to language. As can 
function. The Latin letter c can mean /k s tʃ ts ʔ ʃ θ/… 

> That's why making Gaelic style and Fraktur a font switch works at all, even 
> if that is not perfect (viz, ligatures in Fraktur).

Font style isn’t the same thing in this context. The historical letters used to 
make the 1855 ligatures are *different* letters than those used for the 1859 
ligatures. 

> In the Deseret case, making this alternation a font choice would tend to 
> preserve the content of all documents.

No, since it’s a question of *spelling*. Some documents use a ligature-letter 
for the diphthong /juː/. Some documents use two separate letters for the same 
diphthong. So there’s no “standardized” spelling that works for all text that 
would be affected here. (Spelling for English wasn’t standardized anyway in 
historical Deseret texts and there is much variety.)

> Making this an encoding difference would indeed invalidate some documents.

Right now the 1859 characters aren’t representable. Deciding to change the 
chart glyphs to 1859 glyphs would just destabilize EVERY current Deseret font. 
That’s not something we should do. 

> Finally, if this was in major, modern use, adding these code points would 
> have grave consequences for security.

Why? They’re not visually similar to the existing characters. So spoofing 
wouldn’t be an issue. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Richard Wordingham

On Sun, 26 Mar 2017 18:33:00 +0100
Michael Everson  wrote:

> On 26 Mar 2017, at 18:20, Doug Ewell  wrote:

> > Michael Everson wrote:

> >> One practical consequence of changing the chart glyphs now, for
> >> instance, would be that it would invalidate every existing Deseret
> >> font. Adding new characters would not.  

> > I thought the chart glyphs were not normative.  

> Come on, Doug. The letter W is a ligature of V and V. But sure, the
> glyphs are only informative, so why don’t we use an OO ligature
> instead?

A script-style font might legitimately use a glyph that looks like a
small omega for U+0077 LATIN SMALL LETTER W.  Small omega, of course,
is an οο ligature.  More to the point, a font may legitimately use the
same glyphs for U+0067 LATIN SMALL LETTER G and U+0261 LATIN SMALL
LETTER SCRIPT G.

A more serious issue is the multiple forms of U+014A LATIN CAPITAL
LETTER ENG, for which the underlying unity comes from their being the
capital form of U+014B LATIN SMALL LETTER ENG.

Are there not serious divergences with the shapes of the Syriac letters?

Richard.
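The Latin pairs mentioned in this exchange can be checked at the code point level with a short Python sketch. It assumes nothing beyond the standard `unicodedata` module; the helper name `folds_together` is made up for illustration. It shows that a/ɑ and g/ɡ/ᵹ are separately encoded and are not merged even by compatibility normalization, while the several glyph traditions for capital ENG share the single code point U+014A, which lowercases to U+014B.

```python
import unicodedata

# (plain letter, separately encoded counterpart)
PAIRS = [
    ("a", "\u0251"),  # LATIN SMALL LETTER ALPHA
    ("g", "\u0261"),  # LATIN SMALL LETTER SCRIPT G
    ("g", "\u1D79"),  # LATIN SMALL LETTER INSULAR G
]


def folds_together(x, y):
    """Hypothetical helper: True if NFKC maps both strings to the same text."""
    return unicodedata.normalize("NFKC", x) == unicodedata.normalize("NFKC", y)


if __name__ == "__main__":
    for plain, other in PAIRS:
        print(plain, f"U+{ord(other):04X}", unicodedata.name(other),
              folds_together(plain, other))
    # Capital ENG: one code point, whatever glyph a font draws for it.
    print("\u014A".lower() == "\u014B")
```

A font may still draw U+0067 and U+0261 identically; the sketch only concerns what is distinguished in the encoded text.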

Re: Standaridized variation sequences for the Desert alphabet?


On 3/26/2017 10:33 AM, Michael Everson wrote:

On 26 Mar 2017, at 18:20, Doug Ewell  wrote:

Michael Everson wrote:


One practical consequence of changing the chart glyphs now, for instance, would 
be that it would invalidate every existing Deseret font. Adding new characters 
would not.

I thought the chart glyphs were not normative.

Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs are 
only informative, so why don’t we use an OO ligature instead?


If there was a tradition of writing W like omega, then switching the 
chart glyphs to that alternative tradition would be something that is at 
least not inconceivable -- even if perhaps not advisable.


For letters, their primary identity is not given by their shape, but 
their position / function in the alphabet.


That's why making Gaelic style and Fraktur a font switch works at all, 
even if that is not perfect (viz, ligatures in Fraktur).


In the Deseret case, making this alternation a font choice would tend to 
preserve the content of all documents. Making this an encoding 
difference would indeed invalidate some documents.


Finally, if this was in major, modern use, adding these code points 
would have grave consequences for security.


A./

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 18:20, Doug Ewell  wrote:
> 
> Michael Everson wrote:
> 
>> One practical consequence of changing the chart glyphs now, for instance, 
>> would be that it would invalidate every existing Deseret font. Adding new 
>> characters would not.
> 
> I thought the chart glyphs were not normative.

Come on, Doug. The letter W is a ligature of V and V. But sure, the glyphs are 
only informative, so why don’t we use an OO ligature instead?

Michael.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Doug Ewell


Michael Everson wrote:


One practical consequence of changing the chart glyphs now, for
instance, would be that it would invalidate every existing Deseret
font. Adding new characters would not.


I thought the chart glyphs were not normative.

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-26 Thread Doug Ewell


Philippe Verdy wrote:


Or maybe, only for historic texts, we could add a combining lowercase
e as an alternative to the existing diaeresis.


Something like U+0364 COMBINING LATIN SMALL LETTER E, maybe?

--
Doug Ewell | Thornton, CO, US | ewellic.org
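As a quick check of how the existing encoding treats these forms, here is a minimal Python sketch using only the standard `unicodedata` module; the constant and function names are chosen for illustration. It shows that a + U+0308 COMBINING DIAERESIS is canonically equivalent to precomposed ä, whereas a + U+0364 COMBINING LATIN SMALL LETTER E is a separate sequence that normalization leaves alone.

```python
import unicodedata

A_DIAERESIS = "\u00E4"        # precomposed ä
A_PLUS_DIAERESIS = "a\u0308"  # a + COMBINING DIAERESIS
A_PLUS_SMALL_E = "a\u0364"    # a + COMBINING LATIN SMALL LETTER E


def nfc(text):
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(unicodedata.name("\u0364"))
    print(nfc(A_PLUS_DIAERESIS) == A_DIAERESIS)  # True: canonically equivalent
    print(nfc(A_PLUS_SMALL_E) == A_DIAERESIS)    # False: kept distinct
```

So a text can already contrast Clavieruͤbung with Klavierübung in plain text without any font switch.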

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 17:02, Asmus Freytag  wrote:
> 
> On 3/26/2017 6:18 AM, Michael Everson wrote:
> 
>> In any case it’s not a disunification. Some characters are encoded; they 
>> were used to write diphthongs in 1855. These characters were abandoned by 
>> 1859, and other characters were devised.
> 
> Calling them "characters" is pre-judging the issue, don't you think?

No, I don’t think so.

> We know that these are different shapes, but that they stand for the same 
> text elements.

No, they don’t. Those diphthongs can also be represented in other ways in 
Deseret.

I’ve never accepted the view that “everything is already encoded and everything 
new is a disunification” which seems to be a pretty common view. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?


> On 26 Mar 2017, at 16:59, Asmus Freytag  wrote:
> 
> On 3/26/2017 8:47 AM, Michael Everson wrote:
>>> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
>>> 
>>> The latter is patent nonsense, because ä and aͤ are even less related to 
>>> each other than "i" and "j"; never mind the fact that their forms are both 
>>> based on the letter "a". Encoding and font choice should be seen as 
>>> separate.
>> He refers to the shape of the diacritical marks.
> 
> I see the issue: the font selected on my end made the "e" look like an "o", 
> which completely changed my understanding of what he tried to communicate.

Ah, yes.

M

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
> 
> The priority in encoding has to be with allowing distinctions in modern 
> texts, or distinctions that matter to modern users of historic writing 
> systems. Beyond that, theoretical analysis of typographical evolution can 
> give some interesting insight, but I would be in the camp that does not 
> accord them a status as primary rationale for encoding decisions.

Our rationales are NOT ranked in the way you suggest. A variety of criteria are 
applied. 

> Thus, critical need for contrasting use of the glyph distinctions would have 
> to be established before it makes sense to discuss this further.

Precedent for such needs is well-established. Consider the Latin Extended-D 
block. Sometimes it is editorial preference, and that’s not even always 
universal. 

> I see no principled objection to having a font choice result in a noticeable 
> or structural glyph variation for only a few elements of an alphabet. We have 
> handle-a vs. bowl-a as well as hook-g vs. loop-g in Latin, and fonts 
> routinely select one or the other.

Well, Asmus, we encode a and ɑ as well as g and ɡ and ᵹ. And we do not consider 
ɑ and ɡ and ᵹ to be things that ought to be distinguished by variation 
selectors. (I am of course well aware of IPA usage.) Whole-font switching is 
well understood. But character origin has always been taken into account. 
Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are 
apparently really supposed to have identical glyphs, though we use an 
old-fashioned style in the charts for the former. (Yes, I am of course aware 
that there are other reasons for distinguishing these, but as far as glyphs go, 
even our standard distinguishes them artificially.)

> (It is only for usage outside normal text that the distinction between these 
> forms matters). 

What’s “normal” text? “Normal” text in Latin probably doesn’t use the 
characters from the Latin Extended-D block. 

> While the Deseret forms are motivated by their pronunciation, I'm not 
> necessarily convinced that the distinction has any practical significance 
> that is in any way different than similar differences in derivation (e.g. for 
> long s-s or long-s-z for German esszett).

One practical consequence of changing the chart glyphs now, for instance, would 
be that it would invalidate every existing Deseret font. Adding new characters 
would not. 

> In fact, it would seem that if a Deseret text was encoded in one of the two 
> systems, changing to a different font would have the attractive property of 
> preserving the content of the text (while not preserving the appearance). 

Changing to a different font in order to change one or two glyphs is a 
mechanism that we have actually rejected many times in the past. We have 
encoded variant and alternate characters for many scripts. 

> This, in a nutshell, is the criterion for making something a font difference 
> vs. an encoding distinction.

Character identity is not defined by any single criterion. Moreover, in 
Deseret, it is not the case that all texts which contain the diphthong /juː/ or 
/ɔɪ/ write it using EW 𐐧 or OI 𐐦. Many write them as Y + U 𐐏𐐋 and O + I 𐐄𐐆. So 
the choice is one of *spelling*, and spelling has always been a primary 
criterion for such decisions. 

>> This is complicated by combining characters mostly identified by glyph, and 
>> the fact that while ä and aͤ may be the same character across time, there 
>> are people wanting to distinguish them in the same text today, and in both 
>> cases the theoretical falls to the practical. In this case, 
>> there are no combining character issues and there's nobody needing to use 
>> the two forms in the same text. 
> 
> huh?

He’s wrong there, as I pointed out. A text in German may write an older 
Clavieruͤbung in a citation alongside the normal spelling Klavierübung. The 
choice of spelling is key.

Michael Everson
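Since the thread subject asks about variation sequences, a small Python sketch of what such a sequence would look like in text may help. The pairing of U+10427 with U+FE00 below is purely hypothetical: it is an assumption made for illustration, not a registered standardized variation sequence, and the helper name `strip_variation_selectors` is likewise made up. The sketch shows the mechanics only: the selector follows the base character, and removing it gives back the plain base character.

```python
import unicodedata

VS1 = "\uFE00"        # VARIATION SELECTOR-1
EW = "\U00010427"     # DESERET CAPITAL LETTER EW

# Hypothetical sequence, NOT registered: base letter followed by a selector
# that a font could use to pick an alternate (e.g. 1859-style) glyph.
HYPOTHETICAL_EW_VARIANT = EW + VS1


def strip_variation_selectors(text):
    """Drop VS1..VS16 (U+FE00..U+FE0F), leaving the base characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))


if __name__ == "__main__":
    print([f"U+{ord(ch):04X} {unicodedata.name(ch)}"
           for ch in HYPOTHETICAL_EW_VARIANT])
    print(strip_variation_selectors(HYPOTHETICAL_EW_VARIANT) == EW)
```

Under this approach, searching and comparison can ignore the selector, whereas separately encoded characters would need an explicit equivalence to match.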

Re: Standaridized variation sequences for the Desert alphabet?


On 3/26/2017 6:18 AM, Michael Everson wrote:

On 26 Mar 2017, at 10:07, Erkki I Kolehmainen  wrote:

I tend to agree with Martin, Philippe and others in questioning the 
disunification.

You may, but you give no evidence or discussion about it, so...

In any case it’s not a disunification. Some characters are encoded; they were 
used to write diphthongs in 1855. These characters were abandoned by 1859, and 
other characters were devised.


Calling them "characters" is pre-judging the issue, don't you think?

We know that these are different shapes, but that they stand for the 
same text elements.


A./


The origin of all of the characters as ligatures of other characters isn’t 
questioned. The right thing to do is to add the missing characters, not to 
invalidate any font that uses the 1855 characters by claiming that the 1855 and 
1859 characters are “the same”.

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?


On 3/26/2017 8:47 AM, Michael Everson wrote:

On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:

The latter is patent nonsense, because ä and aͤ are even less related to each other than "i" and 
"j"; never mind the fact that their forms are both based on the letter "a". Encoding and 
font choice should be seen as separate.

He refers to the shape of the diacritical marks.


I see the issue: the font selected on my end made the "e" look like an 
"o", which completely changed my understanding of what he tried to 
communicate.


A./


Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?


> On 26 Mar 2017, at 16:45, Asmus Freytag  wrote:
> 
> The latter is patent nonsense, because ä and aͤ are even less related to each 
> other than "i" and "j"; never mind the fact that their forms are both based 
> on the letter "a". Encoding and font choice should be seen as separate.

He refers to the shape of the diacritical marks. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?


  
  
On 3/25/2017 3:15 PM, David Starner wrote:


  

On Fri, Mar 24, 2017 at 9:17 AM Michael Everson wrote:
  
  
And we *can* distinguish i and j in that Latin text, because
we have separate characters encoded for it. And we *have*
encoded many other Latin ligature-based letters and sigla of
various kinds for the representation of medieval European
texts. Indeed, that’s just a stronger argument for
distinguishing the ligature-based letters for Deseret, I
think.
  
  
  
  And I'd argue that a good theoretical model of the Latin
script makes ä, ꞛ and aͤ the same character, distinguished
only by the font. 

  


The latter is patent nonsense, because ä and aͤ are even less
related to each other than "i" and "j"; never mind the fact that
their forms are both based on the letter "a". Encoding and font
choice should be seen as separate.

The priority in encoding has to be with allowing distinctions in
modern texts, or distinctions that matter to modern users of
historic writing systems. Beyond that, theoretical analysis of
typographical evolution can give some interesting insight, but I
would be in the camp that does not accord them a status as primary
rationale for encoding decisions.

Thus, critical need for contrasting use of the glyph distinctions
would have to be established before it makes sense to discuss this
further. 

I see no principled objection to having a font choice result in a
noticeable or structural glyph variation for only a few elements of
an alphabet. We have handle-a vs. bowl-a as well as hook-g vs.
loop-g in Latin, and fonts routinely select one or the other. (It is
only for usage outside normal text that the distinction between
these forms matters). 

While the Deseret forms are motivated by their pronunciation, I'm
not necessarily convinced that the distinction has any practical
significance that is in any way different than similar differences
in derivation (e.g. for long s-s or long-s-z for German esszett). 

In fact, it would seem that if a Deseret text was encoded in one of
the two systems, changing to a different font would have the
attractive property of preserving the content of the text (while not
preserving the appearance). This, in a nutshell, is the criterion
for making something a font difference vs. an encoding distinction.

A./


PS:

  

  This is complicated by combining characters mostly
identified by glyph, and the fact that while ä and aͤ may be
the same character across time, there are people wanting to
distinguish them in the same text today, and in both cases
the theoretical falls to the practical. In this case, there
are no combining character issues and there's nobody needing
to use the two forms in the same text. 
  

  

huh?

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 14:32, David Starner  wrote:

>>> And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ 
>>> and aͤ the same character, distinguished only by the font.
>> 
>> Fortunately for the users of our standard, we don’t do this.
> 
> You've yet to come up with users to whom these Deseret letters are relevant.

You might imagine it takes time to identify problems and address them. 

>> I’m fairly sure that a person citing a medieval document using aͤ may very 
>> well also need to write this alongside Swedish or German using ä.
> 
> I'm fairly sure that a person citing an early 20th century German document 
> may well feel the need to cite it in Fraktur.

Fraktur is a whole-font substitution (modulo the ligatures). This is not the 
same thing as an editor choosing w or ƿ. Imagine if we had unified those two. 
After all, they both represent the same sound, right?

(Shudder.)

> In both cases, I believe that's going above and beyond the identity of the 
> characters involved, but in your case, people do contrast the aͤ with ä, and 
> the user case has been made. Show me the users who want to use these Deseret 
> letters contrastingly.

Do try to be less dismissive. Firstly, *I* have published entire books in 
Deseret and so I myself have a legitimate interest. Secondly, I am in fact 
beginning discussions with relevant experts.

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread David Starner

On Sun, Mar 26, 2017 at 6:12 AM Michael Everson 
wrote:

> On 25 Mar 2017, at 22:15, David Starner  wrote:
> >
> > And I'd argue that a good theoretical model of the Latin script makes ä,
> > ꞛ and aͤ the same character, distinguished only by the font.
>
> Fortunately for the users of our standard, we don’t do this.
>

You've yet to come up with users to whom these Deseret letters are relevant.

> I’m fairly sure that a person citing a medieval document using aͤ may very
> well also need to write this alongside Swedish or German using ä.
>

I'm fairly sure that a person citing an early 20th century German document
may well feel the need to cite it in Fraktur. In both cases, I believe
that's going above and beyond the identity of the characters involved, but
in your case, people do contrast the aͤ with ä, and the user case has been
made. Show me the users who want to use these Deseret letters contrastingly.

Re: Standaridized variation sequences for the Desert alphabet?

On 26 Mar 2017, at 10:07, Erkki I Kolehmainen  wrote:
> 
> I tend to agree with Martin, Philippe and others in questioning the 
> disunification.

You may, but you give no evidence or discussion about it, so...

In any case it’s not a disunification. Some characters are encoded; they were 
used to write diphthongs in 1855. These characters were abandoned by 1859, and 
other characters were devised. The origin of all of the characters as ligatures 
of other characters isn’t questioned. The right thing to do is to add the 
missing characters, not to invalidate any font that uses the 1855 characters by 
claiming that the 1855 and 1859 characters are “the same”. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

> On 26 Mar 2017, at 09:12, Martin J. Dürst  wrote:
> 
>> That's a good point: any disunification requires showing examples of
>> contrasting uses.
> 
> Fully agreed.

The default position is NOT “everything is encoded unified until disunified”. 
The characters in question have different and undisputed origins. 
We’ve encoded one pair; evidently this pair was deprecated and another pair was 
devised. The letters wynn and w are also used for the same thing. They too have 
different origins and are encoded separately. The letters yogh and ezh have 
different origins and are encoded separately. (These are not perfect analogies, 
but they are pertinent.)

> We haven't yet heard of any contrasting uses for the letter shapes we are 
> discussing.

Contrasting use is NOT the only criterion we apply when establishing the 
characterhood of characters. Please try to remember that. (It’s a bit shocking 
to have to remind people of this.)

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

On 25 Mar 2017, at 22:15, David Starner  wrote:
> 
> And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ 
> and aͤ the same character, distinguished only by the font. 

Fortunately for the users of our standard, we don’t do this. 

> This is complicated by combining characters mostly identified by glyph, and 
> the fact that while ä and aͤ may be the same character across time, there are 
> people wanting to distinguish them in the same text today, and in both cases 
> the theoretical falls to the practical. In this case, there are no combining 
> character issues and there's nobody needing to use the two forms in the same 
> text. 

I’m fairly sure that a person citing a medieval document using aͤ may very well 
also need to write this alongside Swedish or German using ä. 

Michael Everson

Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-26 Thread Martin J. Dürst


On 2017/03/25 03:33, Doug Ewell wrote:

Philippe Verdy wrote:


But Unicode just preferred to keep the roundtrip compatibility with
earlier 8-bit encodings (including existing ISO 8859 and DIN
standards) so that "ü" in German and French also have the same
canonical decomposition even if the diacritic is a diaeresis in French
and an umlaut in German, with different semantics and origins.


Was this only about compatibility, or perhaps also that the two signs
look identical and that disunifying them would have caused endless
confusion and misuse among users?


I'm not sure to what extent this was explicitly discussed when Unicode 
was created. The fact that the first 256 code points are identical to 
those in ISO-8859-1 was used as a big selling point when Unicode was 
first introduced. It may well have been that for Unicode, there was no 
discussion at all in this area, because ISO-8859-1 was already so well 
established.
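
The "first 256 code points" claim can be checked mechanically. A minimal sketch, assuming Python 3 and treating its built-in "latin-1" codec as a stand-in for ISO-8859-1 (the codec also maps the C0/C1 control ranges, which is an assumption of this example rather than something stated above):

```python
# Decode every possible byte value with Python's "latin-1" codec and
# compare the resulting code points with the byte values themselves.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte value N is expected to come out as code point U+00NN.
code_points = [ord(ch) for ch in decoded]
identical = code_points == list(range(256))
print("byte values equal code points:", identical)

# Example: byte 0xFC is "ü" (U+00FC) in both.
print(hex(ord(bytes([0xFC]).decode("latin-1"))))
```

If the two lists are equal, transcoding such text to the UCS is a pure widening of each byte, with no lookup table needed.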


And for ISO-8859-1, space was an important concern. Ideally, both 
Icelandic and Turkish (and the letters missing for French) would have been 
covered, but that wasn't possible. Disunifying diaeresis and umlaut 
would have been an unaffordable luxury.


The above reasons mask any inherent reasons for why diaeresis and umlaut 
would have been unified or not if the decision had been argued purely 
"on the merit". But having used both German and French, and e.g. looking 
at the situation in Switzerland, where it was important to be able to 
write both French and German on the same typewriter, I would definitely 
argue that disunifying them would have caused endless confusion and errors 
among users.

Also, it was argued a few mails ago that diaeresis and umlaut don't look 
exactly the same. I remember well that when Apple introduced its first 
laser printers, there were widespread complaints that the fonts (was it 
Helvetica, Times Roman, and Palatino?) unified away the traditional 
differences in the cuts of these typefaces for different languages.


So to quite some extent, in the relevant period (i.e. the 1970s/80s), 
the differences between diaeresis and umlaut may be due to design 
differences in the cuts for different languages (e.g. French and 
German). Nobody would have disunified some basic letters because they 
may have looked slightly different in cuts for different languages, and 
so people may also have been just fine with unifying diaeresis and 
umlaut. (German fonts e.g. may have contained a 'ë' for use e.g. with 
"Citroën", but the dots on that 'ë' will have been the same shape as 
'ä', 'ö', and 'ü' umlauts for design consistency, and the other way 
round for French).


Regards,   Martin.

VS: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Erkki I Kolehmainen

I tend to agree with Martin, Philippe and others in questioning the 
disunification.

Sincerely,
Erkki I. Kolehmainen

-Original Message-
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Martin J. Dürst
Sent: 26 March 2017 11:12
To: verd...@wanadoo.fr; David Starner
Cc: Michael Everson; unicode Unicode Discussion
Subject: Re: Standaridized variation sequences for the Desert alphabet?

On 2017/03/26 11:24, Philippe Verdy wrote:

> That's a good point: any disunification requires showing examples of 
> contrasting uses.

Fully agreed. We haven't yet heard of any contrasting uses for the letter 
shapes we are discussing.

> Now depending on individual publications, authors would use one 
> character or the other according to their choice, and the encoding 
> will respect it. If we need further unification for matching texts in 
> the same language across periods of time or authors, collation (UCA) 
> can provide help: this is already what it does in modern German with 
> the digram "ae" and the letter "ä" which are orthographic variants not 
> distinguished by the language but by authors' preference.

Well, in most cases, but not e.g. for names. Goethe is not spelled Göthe.

Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Werner LEMBERG


> Well, in most cases, but not e.g. for names. Goethe is not spelled
> Göthe.

Have a look into `Grimmsches Wörterbuch' to see the opposite :-)


Werner

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-26 Thread Martin J. Dürst


On 2017/03/26 11:24, Philippe Verdy wrote:


That's a good point: any disunification requires showing examples of
contrasting uses.


Fully agreed. We haven't yet heard of any contrasting uses for the 
letter shapes we are discussing.



Now depending on individual publications, authors would
use one character or the other according to their choice, and the encoding
will respect it. If we need further unification for matching texts in the
same language across periods of time or authors, collation (UCA) can
provide help: this is already what it does in modern German with the digram
"ae" and the letter "ä" which are orthographic variants not distinguished
by the language but by authors' preference.


Well, in most cases, but not e.g. for names. Goethe is not spelled Göthe.

Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-25 Thread Philippe Verdy

2017-03-25 23:15 GMT+01:00 David Starner :

> On Fri, Mar 24, 2017 at 9:17 AM Michael Everson 
> wrote:
>
>> And we *can* distinguish i and j in that Latin text, because we have
>> separate characters encoded for it. And we *have* encoded many other Latin
>> ligature-based letters and sigla of various kinds for the representation of
>> medieval European texts. Indeed, that’s just a stronger argument for
>> distinguishing the ligature-based letters for Deseret, I think.
>>
>
> And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ
> and aͤ the same character, distinguished only by the font. This is
> complicated by combining characters mostly identified by glyph, and the
> fact that while ä and aͤ may be the same character across time, there are
> people wanting to distinguish them in the same text today, and in both
> cases the theoretical falls to the practical. In this case, there are no
> combining character issues and there's nobody needing to use the two forms
> in the same text.
>

That's a good point: any disunification requires showing examples of
contrasting uses. Now depending on individual publications, authors would
use one character or the other according to their choice, and the encoding
will respect it. If we need further unification for matching texts in the
same language across periods of time or authors, collation (UCA) can
provide help: this is already what it does in modern German with the digram
"ae" and the letter "ä" which are orthographic variants not distinguished
by the language but by authors' preference.
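
To make the matching idea concrete, here is a rough sketch in Python 3. It is NOT the UCA and uses no collation library: the folding table and the match_key function are hypothetical names invented for this illustration, and the only assumption is that a tailored collation would treat "ä" and "ae" as equal at some comparison level.

```python
import unicodedata

# Hypothetical folding table (an assumption for illustration, not UCA data):
# each umlauted vowel is folded to its digram spelling.
UMLAUT_TO_DIGRAM = {"ä": "ae", "ö": "oe", "ü": "ue"}

def match_key(text):
    """Return a key under which 'ä' and 'ae' (etc.) compare equal."""
    # Normalize first so that a + U+0308 and precomposed ä fold alike,
    # then case-fold so the comparison ignores case.
    folded = unicodedata.normalize("NFC", text).casefold()
    return "".join(UMLAUT_TO_DIGRAM.get(ch, ch) for ch in folded)

print(match_key("Jäger") == match_key("Jaeger"))
print(match_key("Göthe") == match_key("Goethe"))
```

Both comparisons print True. The second one shows the cost of such folding: it matches spellings of a name that a careful editor would want to keep apart, so a fold of this kind is suited to searching, not to deciding identity.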

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-25 Thread David Starner

On Fri, Mar 24, 2017 at 9:17 AM Michael Everson 
wrote:

> And we *can* distinguish i and j in that Latin text, because we have
> separate characters encoded for it. And we *have* encoded many other Latin
> ligature-based letters and sigla of various kinds for the representation of
> medieval European texts. Indeed, that’s just a stronger argument for
> distinguishing the ligature-based letters for Deseret, I think.
>

And I'd argue that a good theoretical model of the Latin script makes ä, ꞛ
and aͤ the same character, distinguished only by the font. This is
complicated by combining characters mostly identified by glyph, and the
fact that while ä and aͤ may be the same character across time, there are
people wanting to distinguish them in the same text today, and in both
cases the theoretical falls to the practical. In this case, there are no
combining character issues and there's nobody needing to use the two forms
in the same text.

Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-24 Thread Philippe Verdy

Given the history of characters and the initial desire to be backward
compatible with previous ISO standards, I am convinced that there was no
other choice than preserving the unification; otherwise it would have been
impossible to reliably remap the zillions of documents, databases and
applications that were using ISO 8859 and the related Windows, MacOS and
IBM code pages (OEM or EBCDIC). Several factors pointed the same way: the
development of the Internet; the desire in both Unicode and ISO 10646 to
keep the first page of code points in the UCS compatible with ISO 8859-1,
code for code; the fact that no variant of ISO 8859-1 was standardized for
Germany, Switzerland, Austria, Belgium and Luxembourg, which did not
request one (causing nightmares notably in the last three countries); and
the large amount of legacy software on Windows and MacOS needing such a
bijective mapping. Finally, Unicode was initially developed separately
from the ISO standard and merged with it later; at that time, Microsoft
and IBM were the most active members and did not want to introduce
incompatibilities or cause trouble for other vendors.
Later there was a clear statement to keep the basic character properties
stable, and it became impossible to change the canonical equivalences
(after the bad experience of merging the Unicode and ISO efforts, notably
for encoding Hangul, and strong initial resistance by China, which wanted
to develop its own GB standard).
Encoding stability is now a rule that will be extremely hard to break.

Note: umlauts and diaeresis have not always looked the same; confusion
between the two started late, during the middle of the 20th century and the
early development of computing. It would have been impossible to reach a
large adoption of the UCS without such compromises. It took additional
years after both projects joined their efforts before ISO finally closed
its working group on legacy 8-bit character sets and stopped accepting any
new variants; ISO 8859-15 was one of the last failed attempts to standardize
a new 8-bit encoding, which in the end almost nobody really used as they no
longer needed it. China relented as well and finalized the roundtrip
mapping of its competing GB 18030 encoding with the UCS, so mappings for GB
18030 no longer need updates: any new character encoded in the UCS is
immediately encoded as well in GB without modifying any line of code or
data, and any software or document compatible with the UCS should be
immediately compatible with the GB 18030 standard required in PR China. I
don't know if the Hong Kong authorities made the same statement for their
HKSCS standard before Hong Kong reunified with China, or if Taiwan made a
similar decision. However, Japan is adding new encodings to its JIS
standard, pushed by national vendors, and the UCS is still slow to accept
these additions, not all of which are accepted; but in this area there is
a local subcommittee constantly negotiating with Asian vendors and
reporting its efforts to Unicode and ISO.
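
The roundtrip property described for GB 18030 can be spot-checked. A minimal sketch, assuming Python 3 and its built-in "gb18030" codec (which may lag the current edition of the standard; the sample strings are chosen here for illustration only):

```python
# Encode a few strings to GB 18030 and decode them again; a lossless
# roundtrip means the decoded text equals the original.
samples = [
    "\u00fc",        # ü, a Latin-1 letter
    "a\u0364",       # a + combining small letter e, as used in this thread
    "\U00010426",    # a Deseret letter from the supplementary planes
    "\u6f22\u5b57",  # two Han characters
]

roundtrips = {}
for text in samples:
    encoded = text.encode("gb18030")
    roundtrips[text] = encoded.decode("gb18030") == text
    print(len(encoded), "bytes, roundtrip ok:", roundtrips[text])
```

The check only exercises this one codec implementation; it says nothing about HKSCS or the JIS standards mentioned above.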

About umlauts and diaeresis, I'm not sure they always looked the same.
If we try to encode old German, Hungarian or Czech texts, we may find some
discrepancies or ambiguities (but there's still no mechanism to distinguish
when an umlaut is really desired and when a diaeresis is desired instead, if
they don't look the same in historic script variants). We cannot encode
these using "variants", but we could possibly use a combining control
such as CGJ (encoded after the precombined letter or after the base
letter+diaeresis; because of canonical equivalences it cannot be in the
middle). Or maybe, only for historic texts, we could add a combining
lowercase e as an alternative to the existing diaeresis.
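
How normalization treats the two CGJ placements can be observed directly. A small sketch, assuming Python 3 and its standard unicodedata module; the variable names are illustrative only:

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

after = "a" + DIAERESIS + CGJ   # CGJ after base letter + diaeresis
middle = "a" + CGJ + DIAERESIS  # CGJ between base letter and diaeresis

nfc_after = unicodedata.normalize("NFC", after)
nfc_middle = unicodedata.normalize("NFC", middle)

# With CGJ at the end, a + diaeresis still composes to precomposed ä.
print([f"U+{ord(c):04X}" for c in nfc_after])
# With CGJ in the middle, no composition takes place, so this sequence
# is not canonically equivalent to the precomposed letter.
print([f"U+{ord(c):04X}" for c in nfc_middle])

# The combining small letter e used elsewhere in this thread (aͤ).
print(unicodedata.name("\u0364"))
```

Whether a sequence with CGJ in the middle is acceptable in practice is a separate question that this check does not settle; it only shows what the normalization forms do with each placement.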

2017-03-24 19:33 GMT+01:00 Doug Ewell :

> Philippe Verdy wrote:
>
> > But Unicode just preferred to keep the roundtrip compatibility with
> > earlier 8-bit encodings (including existing ISO 8859 and DIN
> > standards) so that "ü" in German and French also have the same
> > canonical decomposition even if the diacritic is a diaeresis in French
> > and an umlaut in German, with different semantics and origins.
>
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>

Re: Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-24 Thread Hans Åberg


> On 24 Mar 2017, at 19:33, Doug Ewell  wrote:
> 
> Philippe Verdy wrote:
> 
>> But Unicode just preferred to keep the roundtrip compatibility with
>> earlier 8-bit encodings (including existing ISO 8859 and DIN
>> standards) so that "ü" in German and French also have the same
>> canonical decomposition even if the diacritic is a diaeresis in French
>> and an umlaut in German, with different semantics and origins.
> 
> Was this only about compatibility, or perhaps also that the two signs
> look identical and that disunifying them would have caused endless
> confusion and misuse among users?

The Swedish letters ÅÄÖ are simplified ligatures, not diacritic marks. In 
handwritten script style, the mark on ÄÖ is a tilde, the same as on Spanish Ñ, 
which is also a simplified ligature.

Diaeresis vs. umlaut (was: Re: Standaridized variation sequences for the Desert alphabet?)

2017-03-24 Thread Doug Ewell

Philippe Verdy wrote:

> But Unicode just preferred to keep the roundtrip compatibility with
> earlier 8-bit encodings (including existing ISO 8859 and DIN
> standards) so that "ü" in German and French also have the same
> canonical decomposition even if the diacritic is a diaeresis in French
> and an umlaut in German, with different semantics and origins.

Was this only about compatibility, or perhaps also that the two signs
look identical and that disunifying them would have caused endless
confusion and misuse among users?

--
Doug Ewell | Thornton, CO, US | ewellic.org

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-24 Thread Philippe Verdy

2017-03-24 17:11 GMT+01:00 Michael Everson :

> On 23 Mar 2017, at 22:03, David Starner  wrote:
> > On Thu, Mar 23, 2017 at 6:54 AM Michael Everson 
> wrote:
> >> Again: The source of 1855 EW and OI uses *different* letters than the
> 1859 EW and OI do. This wasn’t accidental. It’s not hard to puzzle out or
> to see. This isn’t random or even systematic natural development of
> handwriting styles. It was a principled revision done on the basis of
> phonetic analysis. English diphthongs EW and OI were first represented by
> ligatures representing [ɪuː] and [ɒɪ], and then later by ligatures
> representing [ɪʊ] and [ɔːɪ].
> >
> > Sütterlin was created by Ludwig Sütterlin in 1915. There's lots of
> principled revision going on all the time in the world's scripts that
> doesn't get recorded by Unicode, and this goes double for young constructed
> scripts, where people are playing around with them.
>
> What’s your point? Sütterlin didn’t invent new letters. Both n and u look
> a lot alike, and so the latter was marked with a breve, but in the
> 15th-century Cornish manuscript I was working with at the British Library
> last week both n and u look a lot alike. This has nothing to do with the
> origin or identity of two sets of letters used for diphthongs in Deseret.
>

There's a counterexample in the precedent of the German umlaut, which was
unfortunately unified with the diaeresis, even though its origin (and still
its current semantics) is that of a combining letter e, and it does not
play the phonetic role of a diaeresis (i.e. the separation of two vowels to
avoid reading a pair of letters as a digram for a single phoneme).

So "ä" in German is cognate to the "ae" digram, similar to the "ai" digram
used in French (or to the "æ" ligature used in other languages, sometimes as
a distinct letter of their basic alphabet); it contains no phonetic
diaeresis, as there's a single phoneme and no diphthong (like "aï" in
French, where this is a true diaeresis that breaks the interpretation as
the digram "ai").
The same goes for "ö" in German, cognate to the digram "oe" (or the
ligatured letter "œ" in other languages, or the variant "ø" in Nordic
languages), and for "ü", cognate to "ue".

But Unicode just preferred to keep the roundtrip compatibility with earlier
8-bit encodings (including existing ISO 8859 and DIN standards) so that "ü"
in German and French also have the same canonical decomposition even if the
diacritic is a diaeresis in French and an umlaut in German, with different
semantics and origins.
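
The shared canonical decomposition is easy to inspect. A minimal sketch, assuming Python 3 and its standard unicodedata module:

```python
import unicodedata

# Precomposed ü (U+00FC): the same code point whether the text is German
# (umlaut) or French (diaeresis).
u_precomposed = "\u00fc"

# Canonical decomposition (NFD) splits it into base letter + combining mark.
decomposed = unicodedata.normalize("NFD", u_precomposed)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Recomposing (NFC) gives back the original precomposed letter.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == u_precomposed)
```

The combining mark carries the name of the diaeresis only; nothing in the encoded text records whether the writer meant an umlaut or a diaeresis.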

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-24 Thread Michael Everson

On 23 Mar 2017, at 22:03, David Starner  wrote:
> On Thu, Mar 23, 2017 at 6:54 AM Michael Everson  wrote:
>> Again: The source of 1855 EW and OI uses *different* letters than the 1859 
>> EW and OI do. This wasn’t accidental. It’s not hard to puzzle out or to see. 
>> This isn’t random or even systematic natural development of handwriting 
>> styles. It was a principled revision done on the basis of phonetic analysis. 
>> English diphthongs EW and OI were first represented by ligatures 
>> representing [ɪuː] and [ɒɪ], and then later by ligatures representing [ɪʊ] 
>> and [ɔːɪ].
> 
> Sütterlin was created by Ludwig Sütterlin in 1915. There's lots of principled 
> revision going on all the time in the world's scripts that doesn't get 
> recorded by Unicode, and this goes double for young constructed scripts, 
> where people are playing around with them.

What’s your point? Sütterlin didn’t invent new letters. Both n and u look a lot 
alike, and so the latter was marked with a breve, but in the 15th-century 
Cornish manuscript I was working with at the British Library last week both n 
and u look a lot alike. This has nothing to do with the origin or identity of 
two sets of letters used for diphthongs in Deseret. 

>> Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
>> history of the Deseret alphabet would be impoverished by treating the 1859 
>> letters as identical to the 1855 letters.
> 
> And yet the richness of the history of the Latin alphabet is not impoverished 
> by treating 
> https://commons.wikimedia.org/wiki/File:I_littera_in_manuscripto.jpg (a 
> monocase Latin cursive) as identical to part of the modern Latin-script 
> alphabet, which besides casing, has split the i/j and u/v on the basis of 
> phonetic analysis?

Your question has, again, nothing to do with the matter in hand. While it is 
true that the shapes of the Latin letters in that manuscript differ from the 
shapes which we use today, their identity as letters (and their Old Italic and 
Phoenician forerunners) is not in question. Inscriptional Latin from that same 
period is still quite familiar to us. That i and j are distinguished in that 
handwritten text isn’t surprising. Centuries later in Europe the j graph was 
extremely common in numbers (as in xiij ’13’). It’s true that it wasn’t until 
1524 that i and j were specifically distinguished *as* separate letters in 
Italy; this distinction was formally made in English in 1633. But this isn’t 
analogous to the ligature-based letters used for diphthongs in Deseret.

And we *can* distinguish i and j in that Latin text, because we have separate 
characters encoded for it. And we *have* encoded many other Latin 
ligature-based letters and sigla of various kinds for the representation of 
medieval European texts. Indeed, that’s just a stronger argument for 
distinguishing the ligature-based letters for Deseret, I think.

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-24 Thread Michael Everson

On 24 Mar 2017, at 11:34, Martin J. Dürst  wrote:
> 
> On 2017/03/23 22:48, Michael Everson wrote:
> 
>> Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
>> history of the Deseret alphabet would be impoverished by treating the 1859 
>> letters as identical to the 1855 letters.
> 
> Well, I might be completely wrong, but John Jenkins may be the person on this 
> list closest to an actual user of Deseret (John, please correct me if I'm 
> wrong one way or another).

He is. He transcribes texts into Deseret. I’ve published three of them (Alice, 
Looking-Glass, and Snark).

> It may be that actual users of Deseret read these character variants the same 
> way most of us would read serif vs. sans-serif variants: I.e. unless we are 
> designers or typographers, we don't actually consciously notice the 
> difference.

I am a designer and typographer, and I’ve worked rather extensively with a 
variety of Deseret fonts for my publications. They have been well-received. 

> If that's the case, it would be utterly annoying to these actual users to 
> have to make a distinction between two characters where there actually is 
> none.

Actually neither of the ligature-letters are used in our Carrollian Deseret 
volumes. 

> The richness of the history of the Deseret alphabet can still be preserved 
> e.g. with different fonts the same way we have thousands of different fonts 
> for Latin and many other scripts that show a lot of rich history.

You know, Martin, I *have* been doing this for the last two decades. I’m well 
aware of what a font is and can do. I’m also aware of what principles we have 
used for determining character identity.

I saw your note about CJK. Unification there typically has something to do with 
character origin and similarity. The Deseret diphthong letters are clearly 
based on ligatures of *different* characters. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-24 Thread Martin J. Dürst


On 2017/03/23 22:48, Michael Everson wrote:


Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
history of the Deseret alphabet would be impoverished by treating the 1859 
letters as identical to the 1855 letters.


Well, I might be completely wrong, but John Jenkins may be the person on 
this list closest to an actual user of Deseret (John, please correct me 
if I'm wrong one way or another).


It may be that actual users of Deseret read these character variants the 
same way most of us would read serif vs. sans-serif variants: I.e. 
unless we are designers or typographers, we don't actually consciously 
notice the difference. If that's the case, it would be utterly annoying 
to these actual users to have to make a distinction between two 
characters where there actually is none.


The richness of the history of the Deseret alphabet can still be 
preserved e.g. with different fonts the same way we have thousands of 
different fonts for Latin and many other scripts that show a lot of rich 
history.


Regards,   Martin.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-23 Thread David Starner

On Thu, Mar 23, 2017 at 6:54 AM Michael Everson 
wrote:

> Again: The source of 1855 EW and OI uses *different* letters than the 1859
> EW and OI do. This wasn’t accidental. It’s not hard to puzzle out or to
> see. This isn’t random or even systematic natural development of
> handwriting styles. It was a principled revision done on the basis of
> phonetic analysis. English diphthongs EW and OI were first represented by
> ligatures representing [ɪuː] and [ɒɪ], and then later by ligatures
> representing [ɪʊ] and [ɔːɪ].
>

Sütterlin was created by Ludwig Sütterlin in 1915. There's lots of
principled revision going on all the time in the world's scripts that
doesn't get recorded by Unicode, and this goes double for young constructed
scripts, where people are playing around with them.

> Indeed I would say to John Jenkins and Ken Beesley that the richness of
> the history of the Deseret alphabet would be impoverished by treating the
> 1859 letters as identical to the 1855 letters.
>

And yet the richness of the history of the Latin alphabet is not
impoverished by treating
https://commons.wikimedia.org/wiki/File:I_littera_in_manuscripto.jpg (a
monocase Latin cursive) as identical to part of the modern Latin-script
alphabet, which besides casing, has split the i/j and u/v on the basis of
phonetic analysis?

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-23 Thread Michael Everson

On 23 Mar 2017, at 06:28, David Starner  wrote:

> > Does "Яussia" require a new Latin letter because the way R was written has 
> > a different origin than the normal R?
> 
> But it doesn’t. It’s the Latin letter R turned backwards by a designer for a 
> logo. We wouldn’t encode that, because it’s a logo.
> 
> What logo?

Oh, sorry. “Toys Я Us” which is what I saw when I saw your “Яussia”.

> I honestly don't know what logo you're talking about, but a quick Google 
> search confirms it's used outside of a logo. I was thinking of 
> http://www.sjgames.com/gurps/books/Russia/img/cover_lg.jpg which actually 
> doesn't use the reversed R, but uses other Cyrillic characters. 

Decorative display type and font play on book covers is a very different thing 
from the development of the Deseret alphabet we are discussing here. 

>> We don’t encode diphthongs. We encode the elements of writing systems. The 
>> “idea” here is represented by one ligature of 𐐆 + 𐐅 (1855 EW), one ligature 
>> of 𐐆 + 𐐋 (1859 EW), one ligature of 𐐉 + 𐐆 (1855 OI), and one ligature of 𐐃 + 
>> 𐐆 (1859 OI).
> 
> If they're ligatures, they should be encoded as ligatures; if they're 
> indivisible characters, then their glyph forms are of less interest.

We don’t encode ligatures. We encode letters which are historically derived 
from ligation. That’s what the existing EW and OI are, and that’s what the 1859 
revised letters were.
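
For readers without a Deseret font, the component letters named above can be listed by code point. A sketch assuming Python 3; the mapping of the glyphs in this message to the code points below is this example's reading of the text, and the LIGATURE_SOURCES table and describe function are hypothetical names:

```python
import unicodedata

# Component letters of each ligature-derived letter, as listed in the
# message above (escapes are used so no Deseret font is required).
LIGATURE_SOURCES = {
    "1855 EW": ("\U00010406", "\U00010405"),
    "1859 EW": ("\U00010406", "\U0001040B"),
    "1855 OI": ("\U00010409", "\U00010406"),
    "1859 OI": ("\U00010403", "\U00010406"),
}

def describe(chars):
    """Format characters as 'U+XXXXX NAME' joined with ' + '."""
    return " + ".join(f"U+{ord(c):05X} {unicodedata.name(c)}" for c in chars)

for label, parts in LIGATURE_SOURCES.items():
    print(f"{label}: {describe(parts)}")

# The EW and OI letters that are already encoded.
print(describe("\U00010427\U00010426"))
```

The listing only documents which source letters differ between the 1855 and 1859 forms; it takes no position on whether that difference warrants separate encoding.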

>> Those ligatures are not glyph variants of one another. You might as well say 
>> that Æ and Œ are glyph variants of one another.
> 
> Æ and Œ have contrasting use; they're used in the same text in distinct ways.

That happens to be the case, but the analogy has to do with the origin of the 
ligatures. 

> Note that n and v̆ are considered glyph variants of each other, because v̆ is 
> used in Sütterlin in exactly the places that n is used in typewritten 
> versions of the text.

It’s n and ǔ in Sütterlin, not n and v̆. 

> æ is not œ even when they are printed in fonts that make it nearly impossible 
> to tell them apart. It has nothing to do with the glyphs or how those glyphs 
> were created, it's because they're used in different ways. 

It was an analogy about the structural development of the ligated letters. 

> The example of Sütterlin strikes me as quite relevant here; characters get 
> all sorts of weird shapes in handwriting. Sometimes they end up immortalized 
> in printing, and then they usually get encoded. Usually not.

Again: The source of 1855 EW and OI uses *different* letters than the 1859 EW 
and OI do. This wasn’t accidental. It’s not hard to puzzle out or to see. This 
isn’t random or even systematic natural development of handwriting styles. It 
was a principled revision done on the basis of phonetic analysis. English 
diphthongs EW and OI were first represented by ligatures representing [ɪuː] and 
[ɒɪ], and then later by ligatures representing [ɪʊ] and [ɔːɪ]. 

Indeed I would say to John Jenkins and Ken Beesley that the richness of the 
history of the Deseret alphabet would be impoverished by treating the 1859 
letters as identical to the 1855 letters. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-22 Thread David Starner

On Wed, Mar 22, 2017 at 5:09 PM Michael Everson 
wrote:

> On 22 Mar 2017, at 21:39, David Starner  wrote:
> >
> > Does "Яussia" require a new Latin letter because the way R was written
> > has a different origin than the normal R?
>
> But it doesn’t. It’s the Latin letter R turned backwards by a designer for
> a logo. We wouldn’t encode that, because it’s a logo.
>

What logo? I honestly don't know what logo you're talking about, but a
quick Google search confirms it's used outside of a logo. I was thinking of
http://www.sjgames.com/gurps/books/Russia/img/cover_lg.jpg which actually
doesn't use the reversed R, but uses other Cyrillic characters.

> We don’t encode diphthongs. We encode the elements of writing systems. The
> “idea” here is represented by one ligature of 𐐆 + 𐐅 (1855 EW), one
> ligature of 𐐆 + 𐐋 (1859 EW), one ligature of 𐐉 + 𐐆 (1855 OI), and one
> ligature of 𐐃 + 𐐆 (1859 OI).
>

If they're ligatures, they should be encoded as ligatures; if they're
indivisible characters, then their glyph forms are of less interest.

> Those ligatures are not glyph variants of one another. You might as well
> say that Æ and Œ are glyph variants of one another.
>

Æ and Œ have contrasting use; they're used in the same text in distinct
ways. Note that n and v̆ are considered glyph variants of each other,
because v̆ is used in Sutterlin in exactly the places that n is used in
typewritten versions of the text.

> Æ is not Œ.
>

æ is not œ even when they are printed in fonts that make it nearly
impossible to tell them apart. It has nothing to do with the glyphs or how
those glyphs were created, it's because they're used in different ways.

The example of Sutterlin strikes me as quite relevant here; characters get
all sorts of weird shapes in handwriting. Sometimes they end up
immortalized in printing, and then they usually get encoded. Usually not.

Re: Standaridized variation sequences for the Desert alphabet?

On 22 Mar 2017, at 21:39, David Starner  wrote:
> 
> Does "Яussia" require a new Latin letter because the way R was written has a 
> different origin than the normal R? 

But it doesn’t. It’s the Latin letter R turned backwards by a designer for a 
logo. We wouldn’t encode that, because it’s a logo. 

> There's huge variation in Latin script including all sorts of different 
> glyphs, and I suspect Яussia is way more common than any use of the Deseret 
> script.

In order to represent that logo, people use the Cyrillic letter Я, as you know. 

> There's the same characters here, written in different ways.

No, it’s not. It’s the same diphthong (a sound) written with different letters. 

> The glyphs may come from a different origin, but it's encoding the same idea.

We don’t encode diphthongs. We encode the elements of writing systems. The 
“idea” here is represented by one ligature of 𐐆 + 𐐅 (1855 EW), one ligature of 
𐐆 + 𐐋 (1859 EW), one ligature of 𐐉 + 𐐆 (1855 OI), and one ligature of 𐐃 + 𐐆 
(1859 OI).

Those ligatures are not glyph variants of one another. You might as well say 
that Æ and Œ are glyph variants of one another. 

> If a user community considers them separate, then they should be separated, 
> but I don't see that happening, and from an idealistic perspective, I think 
> they're platonically the same.

I do not agree with that analysis. The ligatures and their constituent parts 
are distinct and distinctive. In fact, it might have been that the choice for 
revision was to improve the underlying phonology. In any case, there’s no way 
that the bottom pair in 
https://en.wikipedia.org/wiki/Deseret_alphabet#/media/File:Deseret_glyphs_ew_and_oi_transformation_from_1855_to_1859.svg
 can be considered to be “glyph variants” of the top pair. Usage is one thing. 
Character identity is another. Æ is not Œ. A ligature of 𐐆 + 𐐅 is not a 
ligature of 𐐆 + 𐐋. 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-22 Thread David Starner

On Wed, Mar 22, 2017 at 8:54 AM Michael Everson 
wrote:

> If there is evidence outside of the Wikipedia for the 1859 letters, they
> should be encoded as new letters, because their design shows them to be
> ligatures of different base characters. That means they’re not glyph
> variants of the currently encoded letters.
>

Does "Яussia" require a new Latin letter because the way R was written has
a different origin than the normal R? There's huge variation in Latin
script including all sorts of different glyphs, and I suspect Яussia is way
more common than any use of the Deseret script.

There's the same characters here, written in different ways. The glyphs may
come from a different origin, but it's encoding the same idea. If a user
community considers them separate, then they should be separated, but I
don't see that happening, and from an idealistic perspective, I think
they're platonically the same.

Re: Standaridized variation sequences for the Desert alphabet?

On 22 Mar 2017, at 20:26, James Kass  wrote:
> Michael Everson wrote,
> 
>> The old EW and OI and the new EW and OI are clearly *different* letters.
> 
> "Different" versus "variant"?

Yes, different. All of them share the SHORT I [ɪ] stroke but the base 
characters are 𐐅 𐐉 (1855) and 𐐋 𐐃 (1859). 

> Michael's analysis seems correct.  If Deseret was not already in the 
> Standard, a new proposal for its encoding including eight characters covering 
> the two diphthongs would not be amiss, would it?

Capital and small 𐐦 𐑎 𐐧 𐑏 are already encoded. If the other four are required, 
nothing prevents them from being proposed and added. 
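
For reference, the four letters mentioned above can be looked up with a minimal Python sketch along the following lines. It assumes only the standard-library unicodedata module; the helper name describe and the list name ENCODED are illustrative and not part of any proposal.

```python
import unicodedata

# The four diphthong letters currently encoded in the Deseret block
# (capital OI, small OI, capital EW, small EW).
ENCODED = ["\U00010426", "\U0001044E", "\U00010427", "\U0001044F"]

def describe(ch):
    """Return (code point label, character name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in ENCODED:
    print(*describe(ch))
```

Each letter is a separate code point with its own name, and the capital and small forms are linked by ordinary case mapping rather than by any ligature mechanism.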

> An alternative would be to use the ZWJ mechanism to indicate a preference for 
> the desired letters.

Joining what? We encoded 𐐦 𐑎 𐐧 𐑏 explicitly, not as ligatures, though they are 
in origin ligatures. 

> My opinion that variation selectors would be the right approach was based 
> upon concerns about existing data getting "broken".  But, if there isn't any 
> existing data…

If 𐐦 is in origin a ligature of 𐐆𐐉 and the 1859 one is in origin a ligature of 
𐐆𐐃 then the 1855 and 1859 letters are **NOT** “variants” of one another. They 
are *different* letters in origin, regardless of their intended use. 

The choice to use 1855 EW or 1859 EW is a matter of *spelling*, not glyph 
substitution. If the later letters are really required, they should be added to 
the standard. We should not abandon the good precedent we have for character 
identification just for expedience. That’d be a way to turn the UCS into a 
glyph registry. :-( 

Michael Everson

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-22 Thread James Kass

Michael Everson wrote,

> The old EW and OI and the new EW and OI are
> clearly *different* letters.

"Different" versus "variant"?

Michael's analysis seems correct.  If Deseret was not already in the
Standard, a new proposal for its encoding including eight characters
covering the two diphthongs would not be amiss, would it?  An
alternative would be to use the ZWJ mechanism to indicate a preference
for the desired letters.

My opinion that variation selectors would be the right approach was
based upon concerns about existing data getting "broken".  But, if
there isn't any existing data...

Best regards,

James Kass

Re: Standaridized variation sequences for the Desert alphabet?

On 22 Mar 2017, at 16:50, John H. Jenkins  wrote:
> 
> My own take on this is "absolutely not." This is a font issue, pure and 
> simple. There is no dispute as to the identity of the characters in question, 
> just their appearance. 

There’s identity in terms of intended usage (two diphthongs), and identity in 
terms of the origin of the characters (ligatures from different sources). That 
kind of etymology is indeed something that we take into account when encoding 
characters.

> In any event, these two letters were never part of the "standard" Deseret 
> Alphabet used in printed materials. To the extent they were used, it was in 
> hand-written material only, where you're going to see a fair amount of 
> variation anyway.

I think I have to stand by my glyph analysis.

> There were also two recensions of the DA used in printed materials which are 
> materially different, and those would best be handled via fonts.

Dunno what you are referring to here. 

> It isn't unreasonable to suggest we change the glyphs we use in the Standard. 
> Ken Beesley and I have discussed the possibility, and we both feel that 
> it's very much on the table.

I would oppose such a change given the origin of the four characters we have 
discussed. The old EW and OI and the new EW and OI are clearly *different* 
letters.

Michael

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-22 Thread John H. Jenkins

My own take on this is "absolutely not." This is a font issue, pure and simple. 
There is no dispute as to the identity of the characters in question, just 
their appearance. 

In any event, these two letters were never part of the "standard" Deseret 
Alphabet used in printed materials. To the extent they were used, it was in 
hand-written material only, where you're going to see a fair amount of 
variation anyway. There were also two recensions of the DA used in printed 
materials which are materially different, and those would best be handled via 
fonts.

It isn't unreasonable to suggest we change the glyphs we use in the Standard. 
Ken Beesley and I have discussed the possibility, and we both feel that 
it's very much on the table.

Re: Standaridized variation sequences for the Desert alphabet?

2017-03-22 Thread William_J_G Overington

>> If the user community needs to preserve the distinction in plain-text, then 
>> variation selection is the right approach.

> True. However, the user community is tiny, and I suspect that those variation 
> selectors would never get used.

I do not use Deseret myself.

I opine that encoding the variation selector sequences would be good.

My reason for that opinion is that Unicode should provide for such 
situations where they are known to exist, even if the usage of the 
encoding may be very rare.

Am I correct in thinking that making use of such a variation selector encoding 
would be a font issue rather than an operating system issue?
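
To make the question concrete, here is a rough Python sketch of what such a sequence would look like in plain text. The pairing of DESERET CAPITAL LETTER EW with VARIATION SELECTOR-1 is purely hypothetical: no such sequence is standardized, and the names ew_1859 and strip_variation_selectors are invented for illustration.

```python
import unicodedata

VS1 = "\uFE00"        # VARIATION SELECTOR-1
EW = "\U00010427"     # DESERET CAPITAL LETTER EW

# Hypothetical sequence only: this is NOT a standardized variation sequence.
ew_1859 = EW + VS1

def strip_variation_selectors(text):
    """Drop VS1..VS16 (U+FE00..U+FE0F), leaving only the base characters."""
    return "".join(c for c in text if not "\uFE00" <= c <= "\uFE0F")

print(unicodedata.name(VS1))
print(len(ew_1859))                              # base letter plus selector
print(strip_variation_selectors(ew_1859) == EW)  # same underlying letter
```

In this model the stored text keeps the base letter, and the selector is only a request for a particular glyph. Whether that request is honoured would depend on the font and rendering software, which are expected to fall back to the default glyph when they do not support the sequence.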

Unicode is intended to be a long-lasting standardized system, so hopefully 
adding the variation selector sequences into The Unicode Standard now would 
provide support for a very long time.

Am I correct in thinking that the cost of adding the variation selector 
sequences into The Unicode Standard would be very small?

William Overington

Wednesday 22 March 2017

Re: Standaridized variation sequences for the Desert alphabet?