AW: Proposal for German capital letter "ß"

2015-12-09 Thread Dreiheller, Albrecht
Just have a look at
U+1E9E  LATIN CAPITAL LETTER SHARP S
in the block Latin Extended Additional
http://www.unicode.org/charts/PDF/U1E00.pdf

Kind regards

From: Unicode [mailto:unicode-boun...@unicode.org] On behalf of Hans Meiser
Sent: Wednesday, 9 December 2015 13:26
To: unicode@unicode.org
Subject: Proposal for German capital letter "ß"


Currently there is a vast problem trying to determine the lower case equivalent 
of a capitalized German word like "MASSE".


This is due to the fact that an orthographic rule exists to convert lower case 
letter "ß" to upper case letters "SS". So after converting a word from lower 
case to upper case one cannot unequivocally determine the original lower case 
word because the conversion is only surjective.


This issue exists because the letter "ß" originally was but a ligature of the 
small letter "sz" (using a legacy German font) which over time became a 
ligature of "ss".


After the German spelling reform in 1996, "ß" then became a letter of its own, 
and words containing the letter "ß" are no longer equivalent to words 
containing an "ss" combination instead of the "ß". So, for instance, "Maße" and 
"Masse" are not equal. In fact, "Maße" translates to "measurements" while 
"Masse" translates to "weight".


This is a particular problem in electronic data processing - like, for 
instance, SQL data queries. Given above rule, "Maße" will become "MASSE", just 
like "Masse" becomes "MASSE" when converting a word to uppercase. But there is 
no way back to distinguish one from the other.


I read that the UNICODE group is already striving for a solution to this 
problem and that they are searching for a capital letter equivalent of "ß".


My proposal is to introduce a capital letter equivalent of "ß" that's 
resembling two capital "S" letters: "SS".


So the capital letter equivalent of "ß" would look like "SS" but was in fact a 
separate code point. Converting words from lower case to upper case and back 
will then become bijective, auto correction will become easier and the (false) 
ANSI SQL stopgap of declaring "ß" and "ss" to be equal can be dropped.

Your feedback is appreciated.

Axel Dahmen - Germany


Proposal for German capital letter "ß"

2015-12-09 Thread Hans Meiser
Currently there is a vast problem trying to determine the lower case equivalent 
of a capitalized German word like "MASSE".


This is due to the fact that an orthographic rule exists to convert lower case 
letter "ß" to upper case letters "SS". So after converting a word from lower 
case to upper case one cannot unequivocally determine the original lower case 
word because the conversion is not injective.


This issue exists because the letter "ß" originally was but a ligature of the 
small letter "sz" (using a legacy German font) which over time became a 
ligature of "ss".


After the German spelling reform in 1996, "ß" then became a letter of its own, 
and words containing the letter "ß" are no longer equivalent to words 
containing an "ss" combination instead of the "ß". So, for instance, "Maße" and 
"Masse" are not equal. In fact, "Maße" translates to "measurements" while 
"Masse" translates to "weight".


This is a particular problem in electronic data processing - like, for 
instance, SQL data queries. Given above rule, "Maße" will become "MASSE", just 
like "Masse" becomes "MASSE" when converting a word to uppercase. But there is 
no way back to distinguish one from the other.
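
The collision described above can be reproduced in a few lines. This is a minimal sketch using Python's built-in str.upper(), which applies the default full case mapping; it is an illustration only and says nothing about how a particular SQL engine behaves.

```python
# Python's str.upper() uses the default full case mapping, which turns "ß" into "SS".
words = ["Maße", "Masse"]
upper = {w: w.upper() for w in words}
print(upper)

# Both words collapse to the same upper case string, so lowercasing
# the result can only ever give back one of them.
print("MASSE".lower())
```

Once the text has been uppercased this way, the information needed to recover "Maße" is gone unless it is stored somewhere else.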


I read that the UNICODE group is already striving for a solution to this 
problem and that they are searching for a capital letter equivalent of "ß".


My proposal is to introduce a capital letter equivalent of "ß" that 
resembles two capital "S" letters: "SS".


So the capital letter equivalent of "ß" would look like "SS" but would in fact be a 
separate code point. Converting words from lower case to upper case and back 
would then become bijective, auto correction would become easier and the (false) 
ANSI SQL stopgap of declaring "ß" and "ss" to be equal could be dropped.
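
The effect of treating "ß" and "ss" as equal can be imitated with Unicode case folding. The sketch below uses Python's str.casefold() as a stand-in; that it matches what any given SQL collation does is an assumption, and the helper name same_ignoring_case is made up for this example.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after full Unicode case folding ("ß" folds to "ss")."""
    return a.casefold() == b.casefold()


# Under case folding the two words are indistinguishable ...
print(same_ignoring_case("Maße", "Masse"))

# ... while a comparison after plain lowercasing keeps them apart.
print("Maße".lower() == "Masse".lower())
```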

Your feedback is appreciated.

Axel Dahmen - Germany


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Gerrit Ansmann

My proposal is to introduce a capital letter equivalent of "ß" that's resembling two capital 
"S" letters: "SS".


Actually, the capital ß is already included in Unicode (ẞ) because it was and 
is used as a separate letter (not looking like SS), though only rarely. It is 
now realised as a proper distinguishable letter in many fonts, which is 
arguably the best solution. I have a keyboard with this letter. Moreover, the 
German authority on spelling (Rat für Rechtschreibung) stated that it will 
acknowledge an individual letter if it gets established in use.
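
For readers who want to check the encoded character themselves, here is a small sketch using Python's unicodedata module. It only inspects the character data that ships with the interpreter.

```python
import unicodedata

capital = "\u1e9e"  # ẞ
small = "\u00df"    # ß

# The character has its own name and code point.
print(unicodedata.name(capital))

# Lowercasing the capital letter gives the small sharp s ...
print(capital.lower() == small)

# ... but default uppercasing of the small letter still gives "SS".
print(small.upper())

# Code points of the word "MAẞE".
code_points = [f"U+{ord(c):04X}" for c in "MA\u1e9eE"]
print(code_points)
```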


Further reading:

• http://www.versaleszett.de/
• http://german.stackexchange.com/a/8960/2594
• http://j.mp/versaleszett
• http://www.typografie.info/3/page/wiki.html/_/fachbegriffe/grosses-eszett



After the German spelling reform in 1996, "ß" then became a letter of its own, and words containing the letter "ß" are no longer equivalent to words containing 
an "ss" combination instead of the "ß". So, for instance, "Maße" and "Masse" are not equal. In fact, "Maße" translates to 
"measurements" while "Masse" translates to "weight".


Actually, you had the very same problem with “Masse” and “Maße” before the 
spelling reform.


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Asmus Freytag (t)

  
  
On 12/9/2015 9:52 AM, Gerrit Ansmann wrote:

>> After the German spelling reform in 1996, "ß" then became a letter of
>> its own, and words containing the letter "ß" are no longer
>> equivalent to words containing an "ss" combination instead of
>> the "ß". So, for instance, "Maße" and "Masse" are not equal. In
>> fact, "Maße" translates to "measurements" while "Masse"
>> translates to "weight".
>
> Actually, you had the very same problem with “Masse” and “Maße”
> before the spelling reform.


The true difference after the spelling reform is that the pronunciation
of the two is now systematically different, with the former having a
short vowel and the latter a long vowel. Before the reform, the choice of
spelling depended on other factors, but now a fairly systematic
correspondence exists.

Because of that correspondence, the use of SS as a capital form might
begin to "sound wrong", so to speak, to people who grew up with the new
spelling. We will have to see whether that suspected effect translates
into an actual tendency to avoid the "SS" style uppercase, whether by
deciding to avoid ALL CAPS, by using the capital sharp s, or by simply
not uppercasing the sharp s even in an ALL CAPS context. The first would
be hard to observe, but examples of the other two strategies were
reasonably common and many were documented in the run-up to the encoding
of the capital sharp s.

A./
  
  

  



Re: Proposal for German capital letter "ß"

2015-12-09 Thread Richard Wordingham
On Wed, 9 Dec 2015 19:55:24 +
Hans Meiser  wrote:

> I see.
> 
> Yet, the u+1E9E doesn't quite look like two capital "S". So any
> program implementing a conversion conforming to Unicode will
> currently display/print in a wrong result: "MAßE" instead of the
> correctly converted result "MASSE".

While the default simple uppercasing of "maße" will yield "MAßE", the
default full uppercasing will yield "MASSE".
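
The two results can be compared side by side. Python's str.upper() performs the full mapping; the standard library does not, as far as I know, expose the simple mapping directly, so the function approx_simple_upper below is a rough stand-in that keeps a character unchanged whenever its full mapping expands to more than one character. That matches the simple mapping for "ß" but is only an approximation for some other characters.

```python
def approx_simple_upper(text: str) -> str:
    """Approximate simple uppercasing: never let one character expand into several.

    Assumption: characters whose full upper case mapping is longer than one
    character are left as they are. This holds for U+00DF but is not a
    faithful copy of the simple mappings in the Unicode Character Database.
    """
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


print(approx_simple_upper("maße"))  # one-to-one mapping keeps the ß
print("maße".upper())               # full mapping expands ß to SS
```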

I am not aware of a useful definition of 'conforming to Unicode' that
applies to either transformation.

> Both would be correctly encoded
> as u+004D u+0041 u+1E9E u+0045. Yet, AFAIK, the current glyph would
> currently be considered an error.
> 
> Proposal: Shouldn't the glyph be amended to match the natural
> language?

No, the glyph corresponds to *a* natural form of German, as opposed to
Standard German - which some would argue was not a natural language!
Now, it may be argued that U+00DF has the same glyph as U+1E9E when
next to a capital letter, but that is a font decision, not a Unicode
decision.

One could therefore define an uppercasing transformation that was a
conformant Unicode process, and agreed with default uppercasing on NFD
strings except for U+00DF, but differed by mapping U+00DF to U+1E9E.
One might not notice any error in the printed output of this process,
any more than one would notice U+006F LATIN SMALL LETTER O being
transformed to U+041E CYRILLIC CAPITAL LETTER O.
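
Such a transformation is easy to sketch. The function name below is made up for illustration; it maps U+00DF to U+1E9E first and leaves everything else to Python's default uppercasing.

```python
def upper_with_capital_sharp_s(text: str) -> str:
    """Uppercase text, but map ß (U+00DF) to ẞ (U+1E9E) instead of "SS"."""
    return text.replace("\u00df", "\u1e9e").upper()


for word in ("Maße", "Masse"):
    shouted = upper_with_capital_sharp_s(word)
    # Lowercasing the result keeps the two words distinct.
    print(word, shouted, shouted.lower())
```

With this mapping the two words stay distinct in upper case, and default lowercasing takes each back to its own lower case form.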

Richard.



Re: Proposal for German capital letter "ß"

2015-12-09 Thread Gerrit Ansmann

On Wed, 09 Dec 2015 20:55:24 +0100, Hans Meiser  wrote:


Yet, AFAIK, the current glyph would currently be considered an error.


See it like this: The point of spelling rules is to ease reading. However, the 
use of SS for capital ß is rather obtrusive, as it is not exactly frequent in 
everyday texts and if it is used, even professional designers and typesetters 
get it wrong more often than right and produce something like FUßBALL. On the 
other hand, a well-designed capital ß is not even noticed by many readers.

Finally, as I already said, the institution that decides about right and wrong 
in German orthography implicitly encourages you to use the capital ß if you 
prefer it.


Proposal: Shouldn't the glyph be amended to match the natural language?


Nothing of this is really natural. If you go by what most people do, you would 
have to write FUßBALL. Also, I hypothesise that languages which passed a 
certain level of alphabetisation do not exhibit natural spelling changes beyond 
the single-word level anymore, as spelling dogmatists get too dominant – just 
look at the English orthography. After this point, you can only have 
centralised changes like the spelling reforms.


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Philippe Verdy
2015-12-09 22:45 GMT+01:00 Richard Wordingham <
richard.wording...@ntlworld.com>:

> On Wed, 9 Dec 2015 19:55:24 +
> Hans Meiser  wrote:
>
> > I see.
> >
> > Yet, the u+1E9E doesn't quite look like two capital "S". So any
> > program implementing a conversion conforming to Unicode will
> > currently display/print in a wrong result: "MAßE" instead of the
> > correctly converted result "MASSE".
>
> While the default simple uppercasing of "maße" will yield "MAßE", the
> default full uppercasing will yield "MASSE".
>

Full uppercasing rules are normally locale-sensitive, and thus there should
exist a specific rule for German not yielding this result (see for example
the rules for Turkish dotless i vs dotted i).

I don't think these locale-sensitive rules are irrevocably stable as more
locales can be added at any time for some languages needing specific pairs.
The stabilized properties are for locale-neutral mappings only, in generic
contexts where the language is not known (including for standard
normalizations, or for the locale-neutral "root" collations and the
associated DUCET).

Even for the same language, these rules cannot be hardcoded in a stable
way, because orthographies evolve over time. The exception is a locale
identifier that pins down the orthographic rule precisely: the associated
rulesets are then checked and corrected until they reach a stable consensus,
and any evolution or variant gets another locale identifier. This also
requires that the specific orthography be entirely known, which is difficult
for historic orthographies or when no recognized language academy or national
institution fixes the rule for a country or region. Even these institutions
work within their own time and limit their scope to some applications; they
will not rewrite history.

> I am not aware of a useful definition of 'conforming to Unicode' that
> applies to either transformation.


So if you look for an example look at how this is made for Turkish.
Basically this is just a matter of tailoring for specific locales.
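
As a sketch of what tailoring could look like in application code: the table and function below are purely illustrative. The Turkish entries reflect the well-known dotted/dotless i pairs; the key "de-x-capital-sharp-s" is a hypothetical identifier invented for this example and is not taken from CLDR or any other registry.

```python
# Per-locale exceptions applied before the default mapping.
TAILORINGS = {
    "tr": {"i": "\u0130", "\u0131": "I"},          # i -> İ, ı -> I
    "de-x-capital-sharp-s": {"\u00df": "\u1e9e"},  # hypothetical: ß -> ẞ
}


def upper_for_locale(text: str, locale: str) -> str:
    """Uppercase text, preferring the locale's tailored mappings when present."""
    table = TAILORINGS.get(locale, {})
    return "".join(table.get(ch, ch.upper()) for ch in text)


print(upper_for_locale("istanbul", "tr"))
print(upper_for_locale("maße", "de"))
print(upper_for_locale("maße", "de-x-capital-sharp-s"))
```

Unknown locales fall back to the default full mapping, so only software that opts into the tailored identifier sees the different result.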


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Michael Everson
On 9 Dec 2015, at 20:57, Gerrit Ansmann  wrote:

>> Proposal: Shouldn't the glyph be amended to match the natural language?
> 
> Nothing of this is really natural. If you go by what most people do, you 
> would have to write FUßBALL.

In my new edition of the first German translation of “Alice’s Adventures in 
Wonderland”, the editor and I made sure that the cakes said “Iẞ MICH!” and not 
“Iß MICH!”. :-) 

Michael Everson * http://www.evertype.com/




Re: Proposal for German capital letter "ß"

2015-12-09 Thread Michael Everson
On 9 Dec 2015, at 22:57, Asmus Freytag (t)  wrote:
> 
>> In my new edition of the first German translation of “Alice’s Adventures in 
>> Wonderland”, the editor and I made sure that the cakes said “Iẞ MICH!” and 
>> not “Iß MICH!”. :-) 
> 
> And the correct spelling (modern) would have been "Iss mich" (or capitalized 
> version as in your case).

Well, we were updating from the 1869 Fraktur orthography to one suitable for 
the modern era. We did not use the Schlechtschreibung, in terms of our 
dissatisfaction with it, and in consideration of the timelessness of the 
Victorian text. 

Our choice of “Iẞ MICH!” as opposed to “Iß MICH!” or “ISS MICH!” was based on 
good orthographic practice often found in Germany, regardless of whether it is 
official or not. Please note that “official” and “correct” are not the same 
things. 

It is OBVIOUS that if Maße and Masse are distinguished in lower-case then it is 
advantageous to users and their data if they upper-case to MAẞE and MASSE.

Michael Everson * http://www.evertype.com/




Re: Proposal for German capital letter "ß"

2015-12-09 Thread Mark E. Shoulson

On 12/09/2015 06:49 PM, Hans Meiser wrote:

Yes, they do it wrong because (1) they don't know better and (2) they let their 
software convert lower case text into upper case (a feature nearly every 
typographic software provides).

Yet, if we let the majority of illiterate people decide what's right and what's 
wrong we could as easily decide to have 2 + 2 = 5.

Here's an official text of the correct today's rules on how to write a capital 
"ß" (it's in German):

http://www.duden.de/sprachwissen/rechtschreibregeln/doppel-s-und-scharfes-s


I remember when we went through all this the first time around, encoding 
ẞ in the first place.  People were saying "But the Duden says no!!!"  
And someone then pointed out, "Please close your Duden and cast your 
gaze upon ITS FRONT COVER, where you will find written in inch-high 
capitals plain as day, "DER GROẞE DUDEN" 
(http://www.typografie.info/temp/GrosseDuden.jpg)  So in terms of 
prescription vs description, the Duden pretty much torpedoes itself.


~mark


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Asmus Freytag (t)

  
  
On 12/9/2015 3:49 PM, Hans Meiser wrote:

> Yes, they do it wrong because (1) they don't know better and (2) they
> let their software convert lower case text into upper case (a feature
> nearly every typographic software provides).
>
> Yet, if we let the majority of illiterate people decide what's right
> and what's wrong we could as easily decide to have 2 + 2 = 5.
>
> Here's an official text of the correct today's rules on how to write a
> capital "ß" (it's in German):
>
> http://www.duden.de/sprachwissen/rechtschreibregeln/doppel-s-und-scharfes-s





   In Dokumenten kann bei Namen aus Gründen der Eindeutigkeit
auch bei Großbuchstaben das ß verwendet werden.
  Für den im internationalen Standard-Zeichensatz „Unicode"
(ISO/IEC 10646) verzeichneten Großbuchstaben für das ß gibt es
derzeit noch keine allgemein verwendete Schriftform. Er ist
nicht Gegenstand der amtlichen Rechtschreibregelung.
  
HEINZ GROßE
  

The last line (bullet), placed somewhat ambiguously, is intended
as an example for the first paragraph cited here and shows the small
ß being used in ALL-CAPS names, because for names one can
never predict the original spelling (for words, except in the
small number of minimal pairs, this is generally possible for the
human reader).

The translation of the second paragraph is:


  For the international standard character set "Unicode" (ISO / IEC 10646)
  registered capitals for the SS, there are currently no commonly used in
  writing. It is not part of the official spelling rules.
  --- Google Translate

or

  For the capital letter for the sharp s listed in the international
  standard character set "Unicode" (ISO / IEC 10646) there is currently
  no commonly used written form. It is not subject to the official
  spelling rules.
  -- with my edits

So the claim that this contains the "correct today's rule" on
the spelling of a capital "ß" is worded misleadingly. The fact
is that while there are rules for what to do with a "ß" in the
context of ALL-CAPS, there are, in fact no rules for dealing
with "a capital 'ß'". 

Ironically, Google decides to capitalize the example. Since that
"translator" is based on pattern matching, supposedly, one
wonders what constituted the input that drove that particular
outcome.

A./
  
  



Re: Proposal for German capital letter "ß"

2015-12-09 Thread Asmus Freytag (t)

  
  
On 12/9/2015 1:11 PM, Michael Everson wrote:

> On 9 Dec 2015, at 20:57, Gerrit Ansmann wrote:
>
> >> Proposal: Shouldn't the glyph be amended to match the natural language?
> >
> > Nothing of this is really natural. If you go by what most people do,
> > you would have to write FUßBALL.
>
> In my new edition of the first German translation of “Alice’s Adventures
> in Wonderland”, the editor and I made sure that the cakes said
> “Iẞ MICH!” and not “Iß MICH!”. :-)
>
> Michael Everson * http://www.evertype.com/

And the correct spelling (modern) would have been "Iss mich" (or
capitalized version as in your case).

A./






  



Re: Proposal for German capital letter "ß"

2015-12-09 Thread Hans Meiser
Yes, they do it wrong because (1) they don't know better and (2) they let their 
software convert lower case text into upper case (a feature nearly every 
typographic software provides).

Yet, if we let the majority of illiterate people decide what's right and what's 
wrong we could as easily decide to have 2 + 2 = 5.

Here's an official text of the correct today's rules on how to write a capital 
"ß" (it's in German):

http://www.duden.de/sprachwissen/rechtschreibregeln/doppel-s-und-scharfes-s


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Martin J. Dürst

On 2015/12/10 09:30, Mark E. Shoulson wrote:


I remember when we went through all this the first time around, encoding
ẞ in the first place.  People were saying "But the Duden says no!!!" And
someone then pointed out, "Please close your Duden and cast your gaze
upon ITS FRONT COVER, where you will find written in inch-high capitals
plain as day, "DER GROẞE DUDEN"
(http://www.typografie.info/temp/GrosseDuden.jpg)  So in terms of
prescription vs description, the Duden pretty much torpedoes itself.


This is an interesting example of a phenomenon that turns up in many 
other contexts, too. A similar example is the use of accents on 
upper-case letters in French in France where 'officially', upper-case 
letters are written without accents. When working on 
internationalization, it's always good to keep your eyes open and not 
just follow the rules.


However, the example is also somewhat misleading. The book in the 
picture is clearly quite old. The Duden that was cited is new. I checked 
with "Der Grosse Duden" on Amazon, but all the books I found had the 
officially correct spelling. On the other hand, I remember that when the 
upper-case sharp s came up for discussion in Unicode, source material 
showed that it was somewhat popular quite some time ago (possibly close 
in age with the old Duden picture). So we would have to go back and 
check the book in the picture to see what it says about ß to be able to 
claim that Duden was (at some point in time) inconsistent with itself.


Regards,   Martin.




Re: Proposal for German capital letter "ß"

2015-12-09 Thread Marc Blanchet



On 9 Dec 2015, at 23:32, Martin J. Dürst wrote:


On 2015/12/10 09:30, Mark E. Shoulson wrote:

I remember when we went through all this the first time around, encoding
ẞ in the first place.  People were saying "But the Duden says no!!!" And
someone then pointed out, "Please close your Duden and cast your gaze
upon ITS FRONT COVER, where you will find written in inch-high capitals
plain as day, "DER GROẞE DUDEN"
(http://www.typografie.info/temp/GrosseDuden.jpg)  So in terms of
prescription vs description, the Duden pretty much torpedoes itself.


This is an interesting example of a phenomenon that turns up in many 
other contexts, too. A similar example is the use of accents on 
upper-case letters in French in France where 'officially', upper-case 
letters are written without accents.


while in Québec, upper-case letters are written _with_ accents. l10n…

Marc.

When working on internationalization, it's always good to keep eyes 
open and not just only follow the rules.


However, the example is also somewhat misleading. The book in the 
picture is clearly quite old. The Duden that was cited is new. I 
checked with "Der Grosse Duden" on Amazon, but all the books I found 
had the officially correct spelling. On the other hand, I remember 
that when the upper-case sharp s came up for discussion in Unicode, 
source material showed that it was somewhat popular quite some time 
ago (possibly close in age with the old Duden picture). So we would 
have to go back and check the book in the picture to see what it says 
about ß to be able to claim that Duden was (at some point in time) 
inconsistent with itself.


Regards,   Martin.


Re: AW: Proposal for German capital letter "ß"

2015-12-09 Thread Khaled Hosny
On Wed, Dec 09, 2015 at 06:16:35PM +0100, Frédéric Grosshans wrote:
> * use your own casing rule and add a ZWNJ (zero width non joiner character)
> such that ss↔SS and ß↔S+ZWNJ + S.

Wouldn’t ZWJ be a more logical choice given that he wants to “join” both
S’s into a single character.

Regards,
Khaled


Re: Proposal for German capital letter "ß"

2015-12-09 Thread Hans Meiser
I see.

Yet, the u+1E9E doesn't quite look like two capital "S". So any program 
implementing a conversion conforming to Unicode will currently display/print 
a wrong result: "MAßE" instead of the correctly converted result "MASSE". Both 
would be correctly encoded as u+004D u+0041 u+1E9E u+0045. Yet, AFAIK, the 
current glyph would currently be considered an error.

Proposal: Shouldn't the glyph be amended to match the natural language?

Cheers,
Axel



From: Dreiheller, Albrecht 
Sent: Wednesday, December 9, 2015 4:59 PM
To: Hans Meiser; unicode@unicode.org
Subject: AW: Proposal for German capital letter "ß"


Just have a look at

U+1E9E  LATIN CAPITAL LETTER SHARP S

in the block Latin Extended Additional

http://www.unicode.org/charts/PDF/U1E00.pdf





Kind regards





Re: AW: Proposal for German capital letter "ß"

2015-12-09 Thread Frédéric Grosshans
For more information on the capital sharp s (ẞ) (converting Maße to 
MAẞE), you can also look at Wikipedia 
https://en.wikipedia.org/wiki/Capital_%E1%BA%9E (more details in the 
German version https://en.wikipedia.org/wiki/Capital_%E1%BA%9E ) and 
Andreas Stötzner's 2004 proposal to Unicode 
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2888.pdf


Your proposal to have a character which looks exactly like SS is 
problematic on many grounds, and it could only have been introduced in 
Unicode as a legacy character if it had existed in character sets before the 
1990s. Introducing it now would cause many more problems than it solves 
(e.g. allowing spoofing, making the encoding ambiguous, violating 
stability of the casing rules, etc.). If you want to have reversible 
casing distinguishing ss↔SS and ß↔SS   using ẞ, you can (in your 
software) bend the Unicode standard in one of the following ways:

* make a font where ẞ looks like SS (I’m not sure it is Unicode conformant)
* use your own casing rule and add a ZWNJ (zero width non joiner 
character) such that ss↔SS and ß↔S+ZWNJ+S. Both capital versions should 
look the same. But doing so, you violate Unicode casing, and you may 
have problems when ZWNJ is also used in German typography to prevent 
wrong ligatures (see https://en.wikipedia.org/wiki/Zero-width_non-joiner).
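
The ZWNJ variant can be sketched as a pair of private casing functions. The names upper_reversible and lower_reversible are invented for this example, and the scheme is exactly the non-standard workaround described above, with the same caveat: text that already contains S+ZWNJ+S for typographic reasons would be misread on the way back.

```python
ZWNJ = "\u200c"              # ZERO WIDTH NON-JOINER
MARKED_SS = "S" + ZWNJ + "S"  # private upper case stand-in for ß


def upper_reversible(text: str) -> str:
    """Uppercase text, writing ß as S+ZWNJ+S so it stays distinguishable from SS."""
    return text.replace("\u00df", MARKED_SS).upper()


def lower_reversible(text: str) -> str:
    """Inverse of upper_reversible: S+ZWNJ+S becomes ß, everything else is lowercased."""
    return text.replace(MARKED_SS, "\u00df").lower()


for word in ("Maße", "Masse"):
    shouted = upper_reversible(word)
    print(repr(shouted), lower_reversible(shouted))
```

The two upper case strings render alike but differ by one invisible code point, which is what makes the round trip possible and also what makes the approach fragile.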


  Fred

On 09/12/2015 16:59, Dreiheller, Albrecht wrote:


Just have a look at

U+1E9E  LATIN CAPITAL LETTER SHARP S

in the block Latin Extended Additional

http://www.unicode.org/charts/PDF/U1E00.pdf

Kind regards






Hentaigana proposal

2015-12-09 Thread Nicolas Tranter
I comment as a western Japanologist who teaches and researches using
hentaigana. I have published with hentaigana using image files (resulting
in two publisher errors) and will publish next year with hentaigana using
the Koin Hentaigana font (Koin変体仮名外字明朝.tte), and anticipate typesetting
problems. I refer to the 2015 proposal L2/15-239 to include hentaigana,
including the appended paper by Takada Tomokazu, Yada Tsutomu and Saito
Tatsuya ('The past, present and future of Hentaigana Standardization for
Information Interchange'). I also refer to Yada Tsutomu's support of the
proposal ('About the inclusion of standardized codepoints for Hentaigana',
L2/15-318). As the naming and numbering of proposed characters is an issue I
deal with below, I also refer to individual hentaigana in the proposal by
their MJ-codes as used in the proposers' own websites (e.g.
http://mojikiban.ipa.go.jp/xb164/).



SELECTION: The selection is good, consisting of 286 forms, although this
would be realised as 299 characters. The earlier 2009 proposal referred to
was based on the Mojikyo M113.ttf font, which has 213 hentaigana characters
and includes a few major basic gaps. The Koin Hentaigana font has 549
characters, which excluding separate forms with voicing and 'half-voicing'
diacritics consists of 330 hentaigana, but includes some very rare forms,
including ones that do not occur in late period texts.



The selection of 'academic' hentaigana is appropriate and lacks major gaps.
On the other hand, the Ministry of Justice hentaigana requirements are ones
that have been decided by the Ministry of Justice in 2004 for name
registration purposes, and so, although one could argue easily with their
2004 decision (and I would), the fact that they are already official means
it is pointless to argue with their inclusion in Unicode.



It's been noted that a few hentaigana are almost identical to normal
hiragana, especially *e* HENTAIGANA LETTER E VARIANT 4 = MJ090017 (cf. え),
*shi* HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA LETTER SI し)
and *nu* HENTAIGANA LETTER NU VARIANT 2 = MJ090149 (cf. HIRAGANA LETTER NU ぬ):
their differences are solely that the 'brush' is removed from the paper on
a downward rather than a rightward flourish, reflecting vertical
handwriting. Ordinarily I would argue against including them, but since the
MoJ has recognised them as official variants they need to be included.



The decision to propose in most cases one codepoint for the hentaigana
derived from a single Chinese character is sensible, as also is the
decision to allow multiple codepoints in certain cases where manuscripts
use side-by-side significantly distinct forms derived from the same Chinese
character and with the same value. An example of the latter is HENTAIGANA
LETTER KA VARIANT 3 = MJ090025 and KA VARIANT 4 = MJ090026, both pronounced
*ka* and both derived from the Chinese character 可, but which are routinely
both found in the same manuscript by the same hand as if they were separate
graphemes from the Heian to the Meiji periods.



POLYPHONY. Several hentaigana are truly polyphonous (e.g. the 子-derived
hentaigana = *ne* MJ090151 or MJ090059 *ko*, or the 馬-derived hentaigana =
*me* MJ090222 or *ma* MJ090205). In particular, those hentaigana derived
from 无 (also the source of HIRAGANA LETTER N ん) and historically associated
with *n* (MJ090298, MJ090299) are also used for *mu* (MJ090214, MJ090215)
and *mo* (MJ090224, MJ090223). Diachronically, *n* in native Japanese words
is usually derived from an earlier *mu*. Takada et al. include a list of
10 kanji sources that this applies to in the proposed repertoire.
(Strictly, this affects 11 hentaigana, because the proposal has two forms
for 无-derived characters.) The proposal's solution is to assign different
identifiers, e.g. 子 = HENTAIGANA LETTER NE VARIANT 1 and HENTAIGANA LETTER
KO VARIANT 2, 馬 = HENTAIGANA LETTER ME VARIANT 3 and HENTAIGANA LETTER MA
VARIANT 7, and the two derived from 无 = HENTAIGANA LETTER N VARIANT 1, N
VARIANT 2, MU VARIANT 1, MU VARIANT 2, MO VARIANT 1 and MO VARIANT 2. This
means that there would be characters that are given more than one codepoint
and identifier but are formally and etymologically identical, adding 13
unnecessary repetitions to the character set. I would favour Yada's naming
system, where the polyphonous characters are given a single codepoint and
identifier, e.g. 子 = HENTAIGANA LETTER NE-KO, 馬 = HENTAIGANA ME-MA, and two
无-derived forms = HENTAIGANA LETTER N-MU-MO 1 and N-MU-MO 2.


STANDARD VARIATION: The suggestion that hentaigana be standard variation
characters means that in the absence of appropriate font support they would
be rendered as hiragana with the same value. (This appears to underlie the
decision to propose different codepoints and names for the polyphonous
hentaigana.) I do not support this. The two main uses of hentaigana are
academic and by the MoJ. Academics will only use hentaigana if they
specifically need them to be