Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Mahesh T. Pai
Mahesh T. Pai said on Mon, Jun 28, 2010 at 10:57:53AM +0530,:
 
 > On a serious note -
 > 
 > 1. Would a change of glyph / glyph shape be considered?
 > 
 > 2. What are the origins of that character?

I feel that an answer is important, because the code chart
specifically mentions this as the Indian rupee sign.

http://unicode.org/charts/PDF/U20A0.pdf

-- 
Mahesh T. Pai   ||  http://[paivakil|fizzard].blogspot.com
Half knowledge is worse than ignorance.
--Thomas B. Macaulay



Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Mahesh T. Pai
Leo Broukhis said on Sun, Jun 27, 2010 at 03:45:43PM -0700,:
 
 > Another question nobody had asked so far in this thread is what will happen 
 > to
 > 
 > U+20A8 RUPEE SIGN



That is a Latin character, hence cannot denote the *Indian* rupee. 

See http://www.decodeunicode.org/u+20A8
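
For anyone who wants to check the "Latin character" point locally, here is a minimal sketch using Python's standard unicodedata module. Nothing in it comes from the thread itself, and the helper name describe is only illustrative. It prints how U+20A8 is named and shows that compatibility normalization folds it into the plain Latin letters "Rs".

```python
import unicodedata

RUPEE_SIGN = "\u20a8"  # U+20A8, the character discussed above


def describe(ch: str) -> dict:
    """Return the name, general category and raw decomposition of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


info = describe(RUPEE_SIGN)
# Compatibility normalization replaces the sign with ordinary Latin letters.
folded = unicodedata.normalize("NFKC", RUPEE_SIGN)

if __name__ == "__main__":
    print(info)
    print(repr(folded))
```

The compatibility mapping only says the sign is treated as equivalent to the letter pair "Rs"; it does not settle what a newly designed Indian rupee symbol should be mapped to.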



On a serious note -

1. Would a change of glyph / glyph shape be considered?

2. What are the origins of that character?

-- 
Mahesh T. Pai   ||  http://[paivakil|fizzard].blogspot.com
Funny how people infringing commercial software licences are called
"pirates", while huge companies infringing the GPL are called "users"



Re: Generic Base Letter

2010-06-27 Thread Asmus Freytag
The one argument that I find convincing is that too many implementations 
seem set to disallow generic combination, relying instead on fixed 
tables of known/permissible combinations.


In that situation, a formally adopted character with the clearly stated 
semantic of "is expected to actually render with ANY combining mark from 
ANY script" would have an advantage. List-based implementations would 
then know that this character is expected to be added to the rendering 
tables for all marks of all scripts.
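
To make the contrast concrete, here is a toy sketch of the two policies. It is not how any real rendering engine works: the table KNOWN_PAIRS and both function names are hypothetical, chosen only to illustrate why a list-based implementation rejects a base it has never been told about.

```python
import unicodedata

# Hypothetical allow-list of (base, mark) pairs, standing in for the
# "fixed tables of known/permissible combinations" described above.
KNOWN_PAIRS = {
    ("\u05e9", "\u05b8"),  # HEBREW LETTER SHIN + HEBREW POINT QAMATS
    ("o", "\u0308"),       # LATIN SMALL LETTER O + COMBINING DIAERESIS
}


def is_mark(ch: str) -> bool:
    """True if ch is a combining mark (general category Mn, Mc or Me)."""
    return unicodedata.category(ch).startswith("M")


def table_based_allows(base: str, mark: str) -> bool:
    """List-based policy: only pairs present in the table may combine."""
    return (base, mark) in KNOWN_PAIRS


def generic_allows(base: str, mark: str) -> bool:
    """Generic policy: any non-mark base may carry any combining mark."""
    return is_mark(mark) and not is_mark(base)


DOTTED_CIRCLE = "\u25cc"
QAMATS = "\u05b8"

if __name__ == "__main__":
    print(table_based_allows(DOTTED_CIRCLE, QAMATS))  # pair is not in the table
    print(generic_allows(DOTTED_CIRCLE, QAMATS))
```

Under the table-based policy a new generic base only starts working once every vendor adds rows for it, which is the buy-in problem raised in this message.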


Until and unless that is done, it couldn't be used successfully in those 
environments, but if the proposers could get buy-in from a critical mass 
of vendors of such implementations, this problem could be overcome.


Without such a buy-in, by the way, I would be extremely wary of such a 
proposal, because the table-based nature of these implementations would 
prohibit the use of this new character in the intended way.


A./



RE: Generic Base Letter

2010-06-27 Thread Vincent Setterholm
It's interesting that you're getting a better display in IE8 than I am. I'm 
running IE8 version 8.0.7600.16385 on the 64-bit Windows 7 with all the updates 
installed and I'm not seeing those characters combine.

I had not previously seen the 'invisible letter' proposal until Michael Everson 
kindly forwarded it to me. That is pretty much what I'm looking for with one 
caveat: if you need to be able to see the difference between sets like 059C and 
059D, a visible letter (like the dotted circle) rather than an invisible one 
makes more sense. But perhaps that difference could be handled at the font 
level.

I see the documentation in the minutes 
(http://unicode.org/consortium/utc-minutes/UTC-101-200411.html) where this 
proposal was rejected (by Microsoft, Apple, IBM, HP, Adobe and RLG) but I 
cannot find any discussion of why it was rejected. Without knowing why it was 
shot down, it is hard to write a better proposal to get this discussion going 
again. Is that information available anywhere? Maybe Asmus Freytag knows, since 
he was tasked to "Oppose the encoding of invisible letter at WG2 meetings"?



From: unicode-bou...@unicode.org [unicode-bou...@unicode.org] On Behalf Of CE 
Whitehead [cewcat...@hotmail.com]
Sent: Sunday, June 27, 2010 4:30 PM
To: unicode@unicode.org
Subject: RE: Generic Base Letter

Hi.

I am not objecting to the 'invisible letter' proposal at:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2822.pdf
It seems sound.
However the dotted circle seems to be the convention.
Also the dotted circle 25CC seems to be working for me in IE8:
when I displayed Vincent Setterholm's html code in my IE8 browser, the 
combining marks 05B8 and 05BC did combine appropriately with 25CC.
But Vincent is right -- the combining mark characters in his email do not 
combine with 25CC, only with each other, but there are no extra circles for the 
email in my browser!
(I have only the one browser too for the moment.)

As for documenting the use of the dotted circle 25CC, perhaps a change should 
be made to the note about it,
saying that nevertheless (in spite of its size) this character can be used in 
combination with diacritics/combining marks.
This would be done at http://www.unicode.org/charts/PDF/U25A0.pdf
Would this be possible?

(I personally do not see why both a dotted circle and an invisible character 
could not be used to display with combining marks.  Which one should be used in 
a particular case would depend on stylistic preferences.  Whether or not the 
current dotted circle's size is o.k., I have no opinion.  Hope this is helpful.)


Best,
-- C. E. Whitehead
cewcat...@hotmail.com


RE: Generic Base Letter

2010-06-27 Thread CE Whitehead

Hi.


I am not objecting to the 'invisible letter' proposal at:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2822.pdf

It seems sound.
However the dotted circle seems to be the convention.
Also the dotted circle 25CC seems to be working for me in IE8:
when I displayed Vincent Setterholm's html code in my IE8 browser, the 
combining marks 05B8 and 05BC did combine appropriately with 25CC.
But Vincent is right -- the combining mark characters in his email do not 
combine with 25CC, only with each other, but there are no extra circles for the 
email in my browser!

(I have only the one browser too for the moment.)


As for documenting the use of the dotted circle 25CC, perhaps a change should 
be made to the note about it, 

saying that nevertheless (in spite of its size) this character can be used in 
combination with diacritics/combining marks.
This would be done at http://www.unicode.org/charts/PDF/U25A0.pdf 

Would this be possible?

 

(I personally do not see why both a dotted circle and an invisible character 
could not be used to display with combining marks.  Which one should be used in 
a particular case would depend on stylistic preferences.  Whether or not the 
current dotted circle's size is o.k., I have no opinion.  Hope this is helpful.)

 


Best,
-- C. E. Whitehead

cewcat...@hotmail.com
  

Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Leo Broukhis
On Sun, Jun 27, 2010 at 3:09 PM, Tulasi  wrote:
> Did anyone from this group take part in the competition?
>
>> http://www.pluggd.in/indian-rupee-symbol-297/
>> I don't know if this is a joke or not, but none of those five is any good.
>
> Exactly!
>
>> Just so they don't choose one of the ones that look dangerously like "Rx" 
>> signs.
>
> They all look close to a sort of handwritten "Rx".
> But what does "Rx" mean?

http://www.fileformat.info/info/unicode/char/211e/index.htm

PRESCRIPTION TAKE
recipe
cross ratio

Another question nobody had asked so far in this thread is what will happen to

U+20A8 RUPEE SIGN

A country cannot reasonably expect that every modification of its
currency symbol will be accommodated by a new BMP character, can it?

Leo



Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Christopher Miller
On 2010-06-27, at 6:09 PM, Tulasi wrote:

> They all look close to a sort of handwritten "Rx".
> But what does "Rx" mean?

http://en.wikipedia.org/wiki/Medical_prescription

Christopher Miller
Montreal QC  Canada




Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Tulasi
Did anyone from this group take part in the competition?

> http://www.pluggd.in/indian-rupee-symbol-297/
> I don't know if this is a joke or not, but none of those five is any good.

Exactly!

> Just so they don't choose one of the ones that look dangerously like "Rx" 
> signs.

They all look close to a sort of handwritten "Rx".
But what does "Rx" mean?

Tulasi


From: John W Kennedy 
Date: Thu, 24 Jun 2010 15:13:14 -0400
Subject: Re: Indian Rupee Sign to be chosen today
To: Michael Everson 
Cc: unicode Unicode Discussion 

On Jun 24, 2010, at 2:07 PM, Michael Everson wrote:
> http://www.pluggd.in/indian-rupee-symbol-297/
>
> I don't know if this is a joke or not, but none of those five is any good.

It's no joke; even I, a Yank with no connections on the subcontinent,
have been aware of the competition for some time.

> Evidently there is a desire to merge Latin R and Devanagari RA and then add 
> stripes.

Just so they don't choose one of the ones that look dangerously like "Rx" signs.

-- 
John W Kennedy
"The bright critics assembled in this volume will doubtless show, in
their sophisticated and ingenious new ways, that, just as /Pooh/ is
suffused with humanism, our humanism itself, at this late date, has
become full of /Pooh./"
  -- Frederick Crews.  "Postmodern Pooh", Preface



Re: VS: Euro Sign in 8859-15 (was: Re: Indian Rupee Sign to be chosen today)

2010-06-27 Thread Doug Ewell

"Philippe Verdy"  wrote:

When the Euro was added, there was no real need to modify the 8859 
pages and this was not done.


Traditionally the ISO/IEC 8859 code tables have not been modified. 
Instead, new parts were added, and this is what was done with 8859-15.
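
As an illustration of what "a new part" means in practice, the following sketch compares Python's built-in codecs for 8859-1 and 8859-15 byte by byte. It assumes only the standard library codecs; the function name differing_positions is made up for this example.

```python
def differing_positions(codec_a: str, codec_b: str) -> dict:
    """Map each byte value to the pair of characters it decodes to,
    for every byte where the two single-byte codecs disagree."""
    diffs = {}
    for value in range(0x100):
        raw = bytes([value])
        a = raw.decode(codec_a)
        b = raw.decode(codec_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


diffs = differing_positions("iso8859-1", "iso8859-15")

if __name__ == "__main__":
    for value, (old, new) in sorted(diffs.items()):
        print(f"0x{value:02X}: {old!r} -> {new!r}")
```

Only a handful of positions differ, among them 0xA4, where 8859-1 has the currency sign and 8859-15 has the euro sign. The older part itself stays untouched.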


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




Re: Latin Script

2010-06-27 Thread Doug Ewell

"Tulasi"  wrote:

U+00AA FEMININE ORDINAL INDICATOR (which does not contain "LATIN") is 
considered part of the Latin script, while U+271D LATIN CROSS (which 
does) is considered common to all scripts.


Can you post both symbols please, thanks?


I can point you to http://www.unicode.org/charts/PDF/U0080.pdf , which 
includes a glyph for U+00AA, and 
http://www.unicode.org/charts/PDF/U2700.pdf , which includes a glyph for 
U+271D.  I don't think it's necessary to post these glyphs to the public 
list.


Trying to know who among ISO and Unicode first created the names' list 
for Latin-script is not an indication of obsession :-')


So among Unicode and ISO/IEC, who first created ISO/IEC 8859-1 & 
ISO/IEC 8859-2 letters/symbols names with each name with LATIN in it?


Most of the characters in the various parts of ISO 8859 were originally 
standardized before Unicode or ISO 10646, so the names were probably 
either created by the ISO/IEC subcommittees responsible for those parts, 
or found in earlier standards and adopted as-is.


The merger between Unicode and ISO 10646 caused a few character names in 
Unicode to be changed to match the 10646 names.


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




VS: Indian Rupee Sign to be chosen today

2010-06-27 Thread Erkki I Kolehmainen
As it has been pointed out, the unreliability of the euro sign is with the 
8859-15 encoding, whereas it works extremely reliably with UCS/Unicode. And so 
would any other sign.
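
One plausible mechanism for the unreliability, sketched below, is a charset label getting lost or changed in transit. The thread does not confirm that this is what happened in the quoted case, so treat it as an assumption. The helper try_encode is illustrative only.

```python
EURO = "\u20ac"


def try_encode(text: str, codec: str):
    """Encode text, returning None if the codec cannot represent it."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


as_latin9 = try_encode(EURO, "iso8859-15")   # euro exists in 8859-15
as_latin1 = try_encode(EURO, "iso8859-1")    # no euro in 8859-1

# The same byte, read by software that assumes 8859-1, becomes a different sign.
mislabelled = as_latin9.decode("iso8859-1")

# With a Unicode encoding form there is no such dependence on the legacy label.
utf8_round_trip = EURO.encode("utf-8").decode("utf-8")

if __name__ == "__main__":
    print(as_latin9, as_latin1, repr(mislabelled), repr(utf8_round_trip))
```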

Erkki I. Kolehmainen

-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
On Behalf Of Tulasi
Sent: 27 June 2010 23:46
To: Unicode Discussion
Cc: Mark Davis ☕
Subject: Re: Indian Rupee Sign to be chosen today

> Even in the year 2010, the euro sign (¤) doesn't work reliably.
He calls it brain-dead :-')

http://groups.google.co.uk/group/de.test/browse_thread/thread/929f8f60b1f29ee8/e027e91e7ef17f62?#e027e91e7ef17f62

-- Forwarded message --
From: Andreas Prilop 
Date: 25 June, 07:54
Subject: Indian Rupee Sign to be chosen today
To: de.test


On Fri, 25 Jun 2010, I wrote

> Even in the year 2010, the euro sign (¤) doesn't work reliably.

in both the Unicode list and in the newsgroup de.test.

unicode.org shows a euro sign:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html

groups.google.com shows a currency sign:
http://groups.google.co.uk/group/de.test/msg/e027e91e7ef17f62

Mark Davis called this an "algorithm" in
http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0273.html

I call it brain-dead.








Re: Generic Base Letter

2010-06-27 Thread Michael Everson
On 27 Jun 2010, at 21:45, Vincent Setterholm wrote:

> That's not terribly helpful, Doug. Do the Principles and Procedures specify 
> that 25CC is the right character to use as a generic base for this type of 
> very common need?

No, because that is not what the Principles and Procedures document is for.

> If the answer is yes, show me where, and I'll take that back to Microsoft and 
> show them that they're not following the Unicode Standard. If this use of 
> 25CC is not documented, how can one hope that future font designers and 
> software companies will embrace this method? If 25CC is not the official 
> solution to this problem, then should we be thinking about creating a 
> character that has letter-like semantics or should we just declare that 25CC 
> is the right answer and document that in the Standard?

Personally I still believe that this is a sound proposal; the NBSP "hack" that 
the UTC favours is troublesome in practice, in my view, as NBSP is "sticky" on 
both sides. 

http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2822.pdf

Michael Everson * http://www.evertype.com/





RE: Generic Base Letter

2010-06-27 Thread Vincent Setterholm
That's not terribly helpful, Doug. Do the Principles and Procedures specify 
that 25CC is the right character to use as a generic base for this type of very 
common need? If the answer is yes, show me where, and I'll take that back to 
Microsoft and show them that they're not following the Unicode Standard. If 
this use of 25CC is not documented, how can one hope that future font designers 
and software companies will embrace this method? If 25CC is not the official 
solution to this problem, then should we be thinking about creating a character 
that has letter-like semantics or should we just declare that 25CC is the right 
answer and document that in the Standard?


From: unicode-bou...@unicode.org [unicode-bou...@unicode.org] On Behalf Of Doug 
Ewell [d...@ewellic.org]
Sent: Sunday, June 27, 2010 8:11 AM
To: Unicode Mailing List
Subject: Re: Generic Base Letter

As far as I know, at least from what the Principles and Procedures
document said, the inability of a particular version of a particular
product from a particular vendor to display a given glyph or glyph
sequence optimally is not justification to add a new character.  I could
be wrong.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s



Re: Indian Rupee Sign to be chosen today

2010-06-27 Thread Tulasi
> Even in the year 2010, the euro sign (¤) doesn't work reliably.
He calls it brain-dead :-')

http://groups.google.co.uk/group/de.test/browse_thread/thread/929f8f60b1f29ee8/e027e91e7ef17f62?#e027e91e7ef17f62

-- Forwarded message --
From: Andreas Prilop 
Date: 25 June, 07:54
Subject: Indian Rupee Sign to be chosen today
To: de.test


On Fri, 25 Jun 2010, I wrote

> Even in the year 2010, the euro sign (¤) doesn't work reliably.

in both the Unicode list and in the newsgroup de.test.

unicode.org shows a euro sign:
http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html

groups.google.com shows a currency sign:
http://groups.google.co.uk/group/de.test/msg/e027e91e7ef17f62

Mark Davis called this an "algorithm" in
http://www.unicode.org/mail-arch/unicode-ml/y2010-m01/0273.html

I call it brain-dead.






Re: Latin Script

2010-06-27 Thread Tulasi
"ISO/IEC 8859-1 ISO/IEC 8859-2" -> ok to use & in between, but
part-one part-two are clearly indicated using "-1" "-2".

> U+00AA FEMININE ORDINAL INDICATOR (which does
> not contain "LATIN") is considered part of the Latin script, while
> U+271D LATIN CROSS (which does) is considered common to all scripts.

Can you post both symbols please, thanks?

Trying to know who among ISO and Unicode first created the names' list
for Latin-script is not an indication of obsession :-')

So among Unicode and ISO/IEC, who first created ISO/IEC 8859-1 &
ISO/IEC 8859-2 letters/symbols names with each name with LATIN in it?

Tulasi


From: Doug Ewell 
Date: Sat, 26 Jun 2010 10:32:57 -0600
Subject: Re: Latin Script
To: Unicode Mailing List 
Cc: Tulasi 

"Tulasi"  wrote:

> Looks like code-numbers for Unicode are not same as corresponding
> code-numbers of ISO/IEC 8859-1 ISO/IEC 8859-2. But names are same both
> in Unicode and ISO/IEC 8859-1 ISO/IEC 8859-2.

There's something very basic here that is not being understood, one way
or the other.

ISO/IEC 8859-1 and ISO/IEC 8859-2 are different parts of the same
standard.  Not all of the characters assigned to code points in 8859-1
are the same as the characters assigned to those same code points in
8859-2.  If they were, there would not be two different parts.

The first 256 characters of Unicode (code points 0 to 255) are the same
as the 256 characters of ISO/IEC 8859-1.  The names of these 256
characters are also the same, by design, except for the control
characters from 0 to 31, and except for parenthetical notes like
"(German)" in the 8859-1 names.

When a given character exists in both 8859-1 and 8859-2, its name is the
same in both parts, except possibly for the parenthetical notes.
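
The two statements above (Unicode's first 256 code points match 8859-1, while 8859-1 and 8859-2 assign different characters to some of the same code points) can be checked with a short sketch. It relies only on Python's built-in codecs; the variable names are illustrative.

```python
ALL_BYTES = bytes(range(256))

# Decode every byte value under each part of the standard.
latin1_text = ALL_BYTES.decode("iso8859-1")
latin2_text = ALL_BYTES.decode("iso8859-2")

# The first 256 Unicode code points, in order.
first_256 = "".join(chr(cp) for cp in range(256))
same_as_unicode = latin1_text == first_256

# Byte values whose character differs between 8859-1 and 8859-2.
mismatches = [b for b in range(256) if latin1_text[b] != latin2_text[b]]

if __name__ == "__main__":
    print(same_as_unicode)
    print(f"0xA1: {latin1_text[0xA1]!r} in 8859-1, {latin2_text[0xA1]!r} in 8859-2")
    print(len(mismatches), "positions differ")
```

This compares code point assignments only; it says nothing about character names, which Python's codecs do not carry.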

Referring to "ISO/IEC 8859-1 ISO/IEC 8859-2" is ambiguous and confusing.
Do you mean "and" or "or"?  8859-1 and 8859-2 are two different things.

> Among both Unicode and ISO/IEC standards bodies, who can be credited
> for creating any name for any letter/symbol that has LATIN in it?

Stop obsessing over whether any given character name has LATIN in it.
It does not matter.

If you want to know what characters "belong to the Latin script" in the
Unicode sense, use UAX #24 and the Scripts.txt file.  There you will
find that, for example, U+00AA FEMININE ORDINAL INDICATOR (which does
not contain "LATIN") is considered part of the Latin script, while
U+271D LATIN CROSS (which does) is considered common to all scripts.
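
A small sketch of that lookup follows. Python's standard library does not expose the Script property, so the example parses a hand-written excerpt laid out like Scripts.txt ("range ; Script # comment"). The excerpt is illustrative, not a verbatim copy of the data file, and parse_scripts and script_of are hypothetical helper names.

```python
import unicodedata

# Hand-written excerpt in the "range ; Script # comment" layout of Scripts.txt.
# Illustrative only: not copied verbatim from the real data file.
SAMPLE = """\
0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Latin # FEMININE ORDINAL INDICATOR
271D          ; Common # LATIN CROSS
"""


def parse_scripts(text: str) -> list:
    """Parse lines into (first, last, script) tuples, skipping comments and blanks."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code_range, script = (field.strip() for field in data.split(";"))
        first, _, last = code_range.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))
    return ranges


def script_of(ch: str, ranges: list) -> str:
    """Look up the script of ch; 'Unknown' if no range covers it."""
    cp = ord(ch)
    for first, last, script in ranges:
        if first <= cp <= last:
            return script
    return "Unknown"


RANGES = parse_scripts(SAMPLE)

if __name__ == "__main__":
    for ch in ("\u00aa", "\u271d"):
        print(unicodedata.name(ch), "->", script_of(ch, RANGES))
```

The point of the example is that the script comes from the property data, not from whether the word LATIN happens to appear in the character name.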

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




Re: Generic Base Letter

2010-06-27 Thread Doug Ewell
As far as I know, at least from what the Principles and Procedures 
document said, the inability of a particular version of a particular 
product from a particular vendor to display a given glyph or glyph 
sequence optimally is not justification to add a new character.  I could 
be wrong.


--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s




RE: Generic Base Letter

2010-06-27 Thread Vincent Setterholm
I tried sending this once with a small attachment showing what I'm seeing, but 
it doesn't look like it got forwarded, so I'll include some HTML at the end of 
this email you can paste into a .html file so you can see the behavior I'm 
talking about. This same behavior occurs in IE8, Word 2007 and WPF applications 
(even built with .NET 4).

You can also see what I'm talking about in plain text below as well if you're 
using Outlook or an IE-based web mail (the order of marks will flip if you set 
the paragraph direction to RTL, but basically it's the same problem - extra 
circles, nothing combining properly):

◌ָּ

If you can make those three code points (25CC 05BC 05B8) combine in IE8, you're 
my hero (though I really need this working in WPF as well, since that is the 
display technology we're using).
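
For anyone trying to reproduce this, a minimal sketch of the three code points using Python's standard unicodedata module may help. It only inspects the character data and does not touch any rendering engine, so it cannot show the extra circles themselves. The function name describe is illustrative.

```python
import unicodedata

SEQUENCE = "\u25cc\u05bc\u05b8"  # 25CC 05BC 05B8, in the order given above


def describe(text: str) -> list:
    """List (code point, name, canonical combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


rows = describe(SEQUENCE)

# Canonical ordering sorts adjacent marks by combining class,
# so NFD places the lower-class mark first.
nfd = unicodedata.normalize("NFD", SEQUENCE)

if __name__ == "__main__":
    for row in rows:
        print(row)
    print([f"U+{ord(ch):04X}" for ch in nfd])
```

Because the two Hebrew points have different non-zero combining classes, either order of the marks is canonically equivalent, so the order in which they are typed should not by itself explain a display difference.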
 
So if Microsoft allows some combining marks to combine with 25CC, they 
certainly aren't permitting Hebrew vowels to do so (I did do an experiment with 
0308 and the display looked crummy, but at least there Microsoft wasn't 
inserting an extra dotted circle, so font design work might be able to resolve 
that, but this is not the case for characters on the Hebrew code page). As I 
stated previously, I can't just pick a regular Hebrew letter, as I need to show 
combinations that include prefixes, suffixes and infixes along with the vowel 
pattern, so to introduce extra consonants would defeat the purpose.

HTML snip:



Internet Explorer 8 display snafu demo

◌ָּ


 


From: unicode-bou...@unicode.org [unicode-bou...@unicode.org] On Behalf Of 
Philippe Verdy [verd...@wanadoo.fr]
Sent: Sunday, June 27, 2010 1:54 AM
To: Vincent Setterholm; Otto Stolz
Cc: 'unicode@unicode.org'
Subject: RE: Generic Base Letter

I don't know what Microsoft does, but at least, combining 25CC with a
combining diacritic DOES work in current versions of Internet
Explorer.

But as it is known that this could cause a problem, for example when
rendering charts on the web, a simple solution generally adopted
involves the use of a more natural arbitrary base character, and some
other presentation style (such as colored backgrounds).

See examples like these (diacritics are shown with a natural base
character, but a consistent blue background for all tables):

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0300
(it uses the Latin letter 'o' for diacritics used with the Latin script)

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0590
(it uses the Hebrew letter SHIN for all Hebrew diacritics)

- http://fr.wikipedia.org/wiki/Table_des_caractères_Unicode/U0600
(another Arabic letter is used for all Arabic diacritics)

And so on...

Additionally, the controls are shown with a red background, and format
controls are within a box with a dashed border. Unallocated codepoints
are shown with a grey background. There's no risk of confusion with a
true dotted circle symbol.

But the Unicode and ISO/IEC 10646 charts (in PDFs or printed books)
need to be monochrome, so instead of using distinctive color
background, it's normal that they use a symbol that cannot be exactly
similar to an encoded character.

Philippe.

"Vincent Setterholm"  wrote:
>
> I've tried using 25CC. The problem I'm running into is that the font designer 
> can make marks combine with 25CC just fine but then Microsoft simply ignores 
> the look-up tables that shape these combinations and inserts their own dotted 
> circle (or circles - one per combining mark) anyway.
>
> I don't know what effect using a 'symbol' for a letter has on indexing or 
> searching or line/word breaking because I haven't even gotten so far as to 
> get the display to look right, but I'm guessing there'd also be an advantage 
> to such a character having letter semantics.
>
> This need to display marks, well-formed on a generic base, is a really common 
> phenomenon. Countless grammars and other philology and linguistics 
> books/articles/etc. have to represent these types of patterns. I think there 
> needs to be an official solution for placing marks on a generic base that 
> behaves like a letter - something documented so that future font designers 
> can support this and so that the technology providers like Microsoft, ICU, 
> etc. have clear directions on how to support this.
>
> If using 25CC really is the answer, then let's publish that solution as part 
> of the Unicode Standard so that all font designers can follow this convention 
> and so that we can have some hope of companies like Microsoft supporting the 
> standard.
>
> 
> From: Otto Stolz [otto.st...@uni-konstanz.de]
> Sent: Saturday, June 26, 2010 8:03 AM
> To: Vincent Setterholm
> Cc: 'unicode@unicode.org'
> Subject: Re: Generic Base Letter
>
> Hi Vincent Setterholm,
>
> you have been asking:
> > What I'd like to see is a code point for a generic base character
>
> You could try U+25CC DOTTED CIRCLE, though the reference gly

RE: Generic Base Letter

2010-06-27 Thread Vincent Setterholm

I've attached a png file of exactly what I see in IE8, Word 2007 and a WPF 
application built with .NET 4. They all show the same thing. Indeed, if you're 
using Outlook, you can see what I'm talking about in plain text below as well 
(the order of marks will flip if you set the paragraph direction to RTL, but 
basically it's the same problem - extra circles, nothing combining properly):

◌ָּ

If you can make those three code points (25CC 05BC 05B8) combine in IE8, you're 
my hero (though I really need this working in WPF as well, since that is the 
display technology we're using).



So if Microsoft allows some combining marks to combine with 25CC, they 
certainly aren't permitting Hebrew vowels to do so (I did do an experiment with 
0308 and the display looked crummy, but at least there Microsoft wasn't 
inserting an extra dotted circle, so font design work might be able to resolve 
that, but this is not the case for characters on the Hebrew code page). As I 
stated previously, I can't just pick a regular Hebrew letter, as I need to show 
combinations that include prefixes, suffixes and infixes along with the vowel 
pattern, so to introduce extra consonants would defeat the purpose.




From: unicode-bou...@unicode.org [unicode-bou...@unicode.org] On Behalf Of 
Philippe Verdy [verd...@wanadoo.fr]
Sent: Sunday, June 27, 2010 1:54 AM
To: Vincent Setterholm; Otto Stolz
Cc: 'unicode@unicode.org'
Subject: RE: Generic Base Letter

I don't know what Microsoft does, but at least, combining 25CC with a
combining diacritic DOES work in current versions of Internet
Explorer.

But as it is known that this could cause a problem, for example when
rendering charts on the web, a simple solution generally adopted
involves the use of a more natural arbitrary base character, and some
other presentation style (such as colored backgrounds).

See examples like these (diacritics are shown with a natural base
character, but a consistent blue background for all tables):

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0300
(it uses the Latin letter 'o' for diacritics used with the Latin script)

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0590
(it uses the Hebrew letter SHIN for all Hebrew diacritics)

- http://fr.wikipedia.org/wiki/Table_des_caractères_Unicode/U0600
(another Arabic letter is used for all Arabic diacritics)

And so on...

Additionally, the controls are shown with a red background, and format
controls are within a box with a dashed border. Unallocated codepoints
are shown with a grey background. There's no risk of confusion with a
true dotted circle symbol.

But the Unicode and ISO/IEC 10646 charts (in PDFs or printed books)
need to be monochrome, so instead of using distinctive color
background, it's normal that they use a symbol that cannot be exactly
similar to an encoded character.

Philippe.

"Vincent Setterholm"  wrote:
>
> I've tried using 25CC. The problem I'm running into is that the font designer 
> can make marks combine with 25CC just fine but then Microsoft simply ignores 
> the look-up tables that shape these combinations and inserts their own dotted 
> circle (or circles - one per combining mark) anyway.
>
> I don't know what effect using a 'symbol' for a letter has on indexing or 
> searching or line/word breaking because I haven't even gotten so far as to 
> get the display to look right, but I'm guessing there'd also be an advantage 
> to such a character having letter semantics.
>
> This need to display marks, well-formed on a generic base, is a really common 
> phenomenon. Countless grammars and other philology and linguistics 
> books/articles/etc. have to represent these types of patterns. I think there 
> needs to be an official solution for placing marks on a generic base that 
> behaves like a letter - something documented so that future font designers 
> can support this and so that the technology providers like Microsoft, ICU, 
> etc. have clear directions on how to support this.
>
> If using 25CC really is the answer, then let's publish that solution as part 
> of the Unicode Standard so that all font designers can follow this convention 
> and so that we can have some hope of companies like Microsoft supporting the 
> standard.
>
> 
> From: Otto Stolz [otto.st...@uni-konstanz.de]
> Sent: Saturday, June 26, 2010 8:03 AM
> To: Vincent Setterholm
> Cc: 'unicode@unicode.org'
> Subject: Re: Generic Base Letter
>
> Hi Vincent Setterholm,
>
> you have been asking:
> > What I'd like to see is a code point for a generic base character
>
> You could try U+25CC DOTTED CIRCLE, though the reference glyph
> for this character is larger than the dotted circles used to
> attach the various combining marks, in their respective reference
> glyphs.
>
> Best wishes,
>Otto Stolz
>
>
>

RE: Generic Base Letter

2010-06-27 Thread Philippe Verdy
I don't know what Microsoft does, but at least, combining 25CC with a
combining diacritic DOES work in current versions of Internet
Explorer.

But as it is known that this could cause a problem, for example when
rendering charts on the web, a simple solution generally adopted
involves the use of a more natural arbitrary base character, and some
other presentation style (such as colored backgrounds).

See examples like these (diacritics are shown with a natural base
character, but a consistent blue background for all tables):

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0300
(it uses the Latin letter 'o' for diacritics used with the Latin script)

- http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0590
(it uses the Hebrew letter SHIN for all Hebrew diacritics)

- http://fr.wikipedia.org/wiki/Table_des_caractères_Unicode/U0600
(another Arabic letter is used for all Arabic diacritics)

And so on...

Additionally, the controls are shown with a red background, and format
controls are within a box with a dashed border. Unallocated codepoints
are shown with a grey background. There's no risk of confusion with a
true dotted circle symbol.

But the Unicode and ISO/IEC 10646 charts (in PDFs or printed books)
need to be monochrome, so instead of using distinctive color
background, it's normal that they use a symbol that cannot be exactly
similar to an encoded character.

Philippe.

"Vincent Setterholm"  wrote:
>
> I've tried using 25CC. The problem I'm running into is that the font designer 
> can make marks combine with 25CC just fine but then Microsoft simply ignores 
> the look-up tables that shape these combinations and inserts their own dotted 
> circle (or circles - one per combining mark) anyway.
>
> I don't know what effect using a 'symbol' for a letter has on indexing or 
> searching or line/word breaking because I haven't even gotten so far as to 
> get the display to look right, but I'm guessing there'd also be an advantage 
> to such a character having letter semantics.
>
> This need to display marks, well-formed on a generic base, is a really common 
> phenomenon. Countless grammars and other philology and linguistics 
> books/articles/etc. have to represent these types of patterns. I think there 
> needs to be an official solution for placing marks on a generic base that 
> behaves like a letter - something documented so that future font designers 
> can support this and so that the technology providers like Microsoft, ICU, 
> etc. have clear directions on how to support this.
>
> If using 25CC really is the answer, then let's publish that solution as part 
> of the Unicode Standard so that all font designers can follow this convention 
> and so that we can have some hope of companies like Microsoft supporting the 
> standard.
>
> 
> From: Otto Stolz [otto.st...@uni-konstanz.de]
> Sent: Saturday, June 26, 2010 8:03 AM
> To: Vincent Setterholm
> Cc: 'unicode@unicode.org'
> Subject: Re: Generic Base Letter
>
> Hi Vincent Setterholm,
>
> you have been asking:
> > What I'd like to see is a code point for a generic base character
>
> You could try U+25CC DOTTED CIRCLE, though the reference glyph
> for this character is larger than the dotted circles used to
> attach the various combining marks, in their respective reference
> glyphs.
>
> Best wishes,
>Otto Stolz
>
>
>




re: VS: Euro Sign in 8859-15 (was: Re: Indian Rupee Sign to be chosen today)

2010-06-27 Thread Philippe Verdy
Everything said so far about ISO 8859 is true, but if the Euro symbol
has had the success it has (and it works remarkably well), it is
because Windows is used on a lot of PCs:
Microsoft modified all its Windows code pages used in Europe
(informally named "ANSI" after the legacy Win16 APIs, which were also
ported to Win32) to include the Euro symbol in position 0x80 (which
was not used in those code pages).
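
The effect of that choice can be seen by decoding the same byte under different encodings. A minimal sketch, assuming Python 3 and its bundled codecs; the helper decode_byte() and the particular list of code pages are only illustrative:

```python
EURO = "\u20ac"  # U+20AC EURO SIGN


def decode_byte(value, encoding):
    """Decode a single byte value under the named 8-bit encoding."""
    return bytes([value]).decode(encoding)


# A few of the European Windows "ANSI" code pages: 0x80 is the euro sign.
windows_pages = ["cp1250", "cp1252", "cp1253", "cp1254"]
euro_at_0x80 = {page: decode_byte(0x80, page) == EURO for page in windows_pages}

# In ISO 8859-1 the same byte is a C1 control character (U+0080).
latin1_0x80 = decode_byte(0x80, "latin-1")

# ISO 8859-15 instead put the euro at 0xA4, where ISO 8859-1 has the
# generic currency sign (U+00A4).
latin9_0xa4 = decode_byte(0xA4, "iso8859-15")
latin1_0xa4 = decode_byte(0xA4, "latin-1")
```

The last two lines are the mismatch discussed further down this thread: a euro sign sent as 8859-15 but read as 8859-1 shows up as a currency sign.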

There are still unused positions in Windows codepages, but most of
them were built on top of ISO 8859, by dropping all C1 controls (0x80
to 0x9F, not needed for Windows and not even for DOS compatibility),
freeing 32 positions for some commonly used punctuation signs, then
the euro.

Microsoft could still decide to do the same for the codepages used in
India. But even there, Windows displays the Indic scripts using
Unicode (and not the ISCII standard).

Microsoft will certainly update its ISCII-to-Unicode mapping if a
position is allocated in the ISCII standard, and other vendors will
follow as well.

When the Euro was added, there was no real need to modify the 8859
pages and this was not done. Microsoft decided to modify its European
Windows "ANSI" codepages only because, at that time, it was still
supporting older systems that needed compatibility with DOS and did
not yet use Unicode internally (notably Win16 and Win32s systems like
Windows 3.1x, and Windows 95/98/ME, which still did not really have a
true Unicode-enabled kernel and did not even support the NTFS
filesystem used on NT and the newer Windows 2000).

IBM also had to adapt its many codepages used on various systems (but
these systems were already becoming very marginalized). This caused
a lot of havoc (not least because there were so many variants of
EBCDIC...)

Apple decided to go in the completely opposite direction from IBM and
change nothing, given that its legacy Mac codepages were already being
deprecated (Apple adopted the OS-level use of Unicode probably much
faster than Microsoft, which initially reserved it for its
"professional" NT systems at a time when Apple had already decided to
stop maintaining or adding new 8-bit codepages).

But for the Indian rupee, there's no need to change anything: all
systems needed for India are already Unicode-enabled (and older
ISCII-based systems are now almost all extinct, so I doubt there will
ever be any need to change them: these systems will continue to use
the existing usual abbreviations). The Indian government just has to
sponsor its encoding in Unicode.

Let's not repeat the IBM tragedy... India certainly has better places
to put its public (and private) money than reviving and adapting old
and dying national 8-bit encoding standards (which will reach the end
of their life anyway, with or without the new symbol, if they don't
support Unicode).

Today the world is connected to the Internet for almost everything,
and the Internet uses Unicode more than all other encodings combined.

Philippe.

"Erkki I Kolehmainen" wrote:
> At the time I was the European project team leader for the standardization
> of the euro, and as such I was strongly pushing for the addition of the euro
> sign to Latin-1, which could not be done without adding a new part, which
> then had to be done for the visibility. I fully agree with Ken (as he quite
> well knows, I trust) that no new character encoding standardization should
> have been done for quite a while on anything but the 10646/Unicode. As is,
> the use of any of the 8859 parts can no longer really be justified for
> any purpose, and with 10646/Unicode the euro sign works extremely reliably.
>
> Sincerely, Erkki
> 
> Kenneth Whistler wrote:
> > On Fri, 25 Jun 2010, I wrote
> >
> > > Even in the year 2010, the euro sign (¤) doesn't work reliably.
> >
> > in both the Unicode list and in the newsgroup de.test.
> >
> > unicode.org shows a euro sign:
> > http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html
> >
> > groups.google.com shows a currency sign:
> > http://groups.google.co.uk/group/de.test/msg/e027e91e7ef17f62
>
> And as the snark seems to be spreading about this, let's step
> into the Wayback Machine for a moment...
>
> When 8859-15 was originally proposed in 1997 (see SC2/WG3 N388R, for
> those of you with deep document archives), primarily to add the euro
> sign to an 8-bit character set (but also to "fix" 8859-1 for
> French and Finnish), the U.S. NB voted against the subdivision
> of work, claiming in the strongest of terms that the proposal
> was inherently flawed and simply would not work to solve the
> problem(s) it was addressed at.
>
> I'll quote at length from the U.S. NB comments in SC2 N2994,
> dated 1997-11-21, "Summary of Voting on SC 2 N 2910, Proposal for
> Project Subdivision of project JTC 1.02.20: a new part of ISO/IEC
> 8859 for Latin Zero covering the EURO Symbol and Full Support for
> the French and Finnish Language":
>
> ===