Re: Clones (was RE: Hexadecimal)

2003-08-22 Thread Gerd Schumacher
On Roman number signs


Jill Ramonsky scripsit:

> I confess, I hadn't read ch14.pdf, and I probably should have done. My
> fault. But I still believe that there should be something in the
> machine-readable code charts themselves that says, of the Roman numerals,
> "Don't use these characters - use the normal Latin letters instead". If
> they really are there _SOLELY_ for round trip compliance with East Asian
> standards, then, if I wish to put the year MMIII in a web page, I should
> _NOT_ use the Roman letters. Furthermore, if I write software to interpret
> Roman Numbers, I only need to interpret the Basic Latin letters, not the
> Roman ones. My life as a webmaster and programmer is made so much SIMPLER
> by not having to use the Roman letters. I would really like it if these, and
> every single other character which is "only there for reasons of round
> ...

In quality printing (German, and I think not only German), the Roman
numerals and the corresponding letters are usually not identical. At the
very least, the numerals have a reduced advance width. Metal fonts usually
had no separate punches for Roman numerals, but the typesetters filed the
punches a bit slimmer. The I, the V, and the X may also have connecting top
and bottom bars, the latter not necessarily at the baseline. So you cannot
say that they are simply cloned letters.

OK, this might be a matter for smart font technologies, hopefully available
one day in standard PC applications. But as there are code points defined
for these numerals, they are used, and certainly will be used, in Latin
script for a perfectly understandable reason. Is there another solution for
non-smart fonts?

In my opinion, the advice not to use these code points will not solve the
problem. There actually are fonts containing very clearly distinct Roman
numerals, for example the Titus Cyberbit font of the Titus project at the
Frankfurt (Main) university.

Gerd Schumacher





Re: Clones (was RE: Hexadecimal)

2003-08-19 Thread James H. Cloos Jr.
> "John" == John Jenkins <[EMAIL PROTECTED]> writes:

John> (Apple's LastResort font [contains every Unicode character],
John> of course, but by virtue of rampant reuse of glyphs.)

Does this generate glyphs like the following ASCII and UTF-8 art?

+--+┌──┐
|AB|│AB│
|CD|│CD│
+--+└──┘

(Both included for the benefit of the utf8-impaired.)

I find it interesting, if so, that Apple uses a font to achieve that
rather than a bit of code in the rendering libs.  I believe that
pango (Παν語) does it in the lib.

-JimC




Re: Clones (was RE: Hexadecimal)

2003-08-19 Thread John Jenkins
On Tuesday, August 19, 2003, at 9:18 AM, Jim Allan wrote (rhetorically):

> Must every font contain every Unicode character?

FWIW, it's no longer possible for a TrueType/OpenType font to contain
every Unicode character with a distinct glyph.   (Apple's LastResort
font does it, of course, but by virtue of rampant reuse of glyphs.)


John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jhjenkins/



RE: Clones (was RE: Hexadecimal)

2003-08-19 Thread Jim Allan
Jill Ramonsky posted on the minus sign:

> Yeah, I know. But like I said, who uses this?

Books are normally produced today using computer typesetting. Look in
any mathematics text or any well-printed book for minus signs. Hyphens
and minus signs are distinct (except when showing computer programming
in a fixed-width font). Hyphen and minus sign have always been different
characters.

TeX and SGML and other pre-Unicode legacy typographical systems support 
this difference which has always existed.

On common computer systems like the Macintosh and Windows which didn't 
support the difference globally in their standard character sets in 
pre-Unicode days it was customary to use the en-dash instead of a minus 
sign in formatted text. Or you switched to special math-symbol fonts 
when entering mathematical signs and other symbols.

Style sheets and books of tips for word processing and desktop 
publishing almost always go into some detail about the various kinds of 
dashes and the minus sign. So does the Unicode manual in its section on 
punctuation.

> And I also have to ask ... if I am actually WRITING a C++ compiler, should I
> allow the use of MINUS SIGN to mean minus sign? (Actually, that question may
> be answered by the specification of C++, so let's push it a bit further. If
> I am inventing some successor language to C++, and am free to invent my own
> specification, should I _then_ allow the use of MINUS SIGN?)

The symbols to be used for any computer language are part of the
definition of that computer language. Currently you can't legally use
U+2212 in any computer language I know of.

However I will be surprised if computer languages do not start to take
advantage of the additional characters that are universally available
through Unicode.

> I only ask that the charts make clear what each
> character is FOR, in sufficient detail that the answer to questions like the
> above becomes obvious.

Currently the manual assumes that a user who wants to use a character
will mostly already know what it is FOR or the user wouldn't want to use
it. That's a reasonable assumption to make to avoid expanding the manual
to five or six volumes at least. A small amount of typographical and
usage information on some characters is provided for the convenience of
font makers.

I would personally love to see an expanded version of the Unicode 
manual, a sort of multi-volume encyclopedia of characters and their
history and uses.

Meanwhile Unicode tells us that a particular glyph is a normal glyph for 
MINUS SIGN. That really should be enough. Most people know that math 
symbols are generally not (yet?) implemented to actually DO their 
function on computers. And it is hardly necessary for the purpose of the
manual that, for example, under % we should be told about its use for
modulus or introducing a comment in some computer languages.

You don't complain that the charts don't tell you what U+00D7
MULTIPLICATION SIGN is for or U+00F7 DIVISION SIGN or U+0026 AMPERSAND.

As to supporting all of Unicode, see 2.12 in 
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.

Must a cell phone, for example, support all of Unicode?

Must every font contain every Unicode character?

Partial support is quite conformant provided that what is supported is 
supported according to the standard and data is not corrupted.

That doesn't mean that full support and impeccable rendering are not
desirable. They are, in the long run. But a laptop user who generally uses
only English may not wish to have disk space taken up by East Asian fonts
or top-of-the-line publishing software that handles East Indian scripts
impeccably.

Government software for various governments may purposely support only a 
particular subset of the Unicode character set.

Jim Allan





RE: Clones (was RE: Hexadecimal)

2003-08-19 Thread Asmus Freytag
Compatibility characters:

The recommendations for compatibility characters are necessarily vague, 
since their use in legacy data (and legacy environments) is strongly 
dependent on what is (or was) customary in a given environment.

If a process merely warehouses text data (or parses only a very small 
subset of characters for special purpose, such as an HTML parser) then 
merely preserving legacy characters is often the best strategy. However, 
take the opposite example, of a process that actually scans the text for 
roman numerals. In that case, ignoring the compatibility characters would 
be a mistake, since legacy data of the kind for which these compatibility 
characters were added would *only* contain roman numerals in this form. 
They would *not* use the ASCII characters.
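
To illustrate the point about scanning for Roman numerals, here is a minimal
Python sketch. It assumes the standard unicodedata module and relies on the
compatibility decompositions of the Number Forms characters U+2160..U+217F,
so that NFKC normalization folds them to Basic Latin letters before parsing.
The name roman_to_int is hypothetical, and the parser does not validate the
order of the numerals.

```python
import unicodedata

# Values of the Basic Latin letters used as Roman numerals.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Parse a Roman numeral written with ASCII letters, with the
    compatibility characters U+2160..U+217F, or with a mix of both."""
    # NFKC replaces e.g. U+216B ROMAN NUMERAL TWELVE with "XII".
    folded = unicodedata.normalize("NFKC", text).upper()
    total = 0
    previous = 0
    # Walk right to left: a smaller value before a larger one is subtracted.
    for ch in reversed(folded):
        value = VALUES[ch]  # raises KeyError on anything that is not a numeral
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total
```

A scanner built this way treats legacy data and ASCII data alike, which is
the behaviour argued for above.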

Processes that modify legacy data for re-export to a legacy system 
obviously need to be intimately familiar with the legacy conventions, in a 
way that could not possibly be documented in the Unicode Standard in all 
details for every character and every legacy system.

Documentation in the code charts:

I agree with several of the comments that "hiding" the information about 
special characters in running text makes it unnecessarily difficult to work 
with the information. On the other hand, not everything can be succinctly 
expressed in machine readable tables (some characters have complicated 
usages), and even annotations in the name list have limits. They are 
definitely not the place for lengthier discussions.

For Unicode 4.0 we have attempted to improve the situation by systematically
extracting the line-breaking related information into UAX#14, which at 
least allows task-focused access. Information about mathematical usage of 
characters is now collected in one place in UTR#25, partially duplicating, 
and partially extending the information in the text of the standard, but 
providing a single place of access. Further improvements are possible. 
Personally I'd be in favor of some icon in the character names list that 
simply indicates that a character is more fully discussed elsewhere - that 
would make the code charts more useful as an index into the description of 
the characters.

Mathematical operators:

Future extensions of programming languages should allow not only the MINUS 
sign as operator, but many other characters, for example LOGICAL AND and
LOGICAL OR, and as many other operators as appropriate for the language.

Input of the operators doesn't have to necessarily be done via a special 
purpose keyboard. The use of input macros, editor substitution or similar 
input technologies (e.g. turning && into LOGICAL AND) would be more 
flexible. Some editors already support the display of highly formatted 
program source code even though the underlying text backbone uses the 
standard ASCII conventions of current programming languages. Just one 
example is Source Insight from www.sourceinsight.com, which not only 
represents >= etc. by single symbols, but can also correctly increase the
size of outer parentheses for nested expressions.
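
The editor-substitution idea can be sketched roughly as follows. This is
only an illustration in Python: the table and the names to_display and
to_ascii are hypothetical, the replacement is a naive substring substitution
that ignores string literals and comments, and it is not a description of
how Source Insight or any other editor actually works.

```python
# Hypothetical display table: ASCII spelling -> single Unicode operator.
ASCII_TO_SYMBOL = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    "!=": "\u2260",  # NOT EQUAL TO
}
SYMBOL_TO_ASCII = {symbol: ascii_op for ascii_op, symbol in ASCII_TO_SYMBOL.items()}


def to_display(source: str) -> str:
    """Show ASCII operators as single symbols (the stored text stays ASCII)."""
    for ascii_op, symbol in ASCII_TO_SYMBOL.items():
        source = source.replace(ascii_op, symbol)
    return source


def to_ascii(display: str) -> str:
    """Map the displayed symbols back to the ASCII conventions."""
    for symbol, ascii_op in SYMBOL_TO_ASCII.items():
        display = display.replace(symbol, ascii_op)
    return display
```

Keeping the stored text in ASCII means existing compilers are unaffected;
only the presentation changes.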

A./




RE: Clones (was RE: Hexadecimal)

2003-08-19 Thread Jill . Ramonsky

Well that just proves my point then.
There are indeed some things that DO need to support the whole of Unicode
(more or less).

Jill


-Original Message-
From: Peter Kirk [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 19, 2003 10:30 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Clones (was RE: Hexadecimal)


On 19/08/2003 01:58, [EMAIL PROTECTED] wrote:

>I disagree.
>
>A post-Windows, post-Linux, Operating System for the 21st century intended
>for global use, should ideally support the whole of Unicode.
>
>There are, in fact, people working on such projects.
>Jill
>
>
>  
>
Well, whatever might be new about this OS, it is not its Unicode 
support. Windows XP and Linux already support the whole of Unicode, more 
or less.

-- 
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Clones (was RE: Hexadecimal)

2003-08-19 Thread Peter Kirk
On 19/08/2003 01:58, [EMAIL PROTECTED] wrote:

> I disagree.
>
> A post-Windows, post-Linux, Operating System for the 21st century intended
> for global use, should ideally support the whole of Unicode.
>
> There are, in fact, people working on such projects.
> Jill

Well, whatever might be new about this OS, it is not its Unicode 
support. Windows XP and Linux already support the whole of Unicode, more 
or less.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: Clones (was RE: Hexadecimal)

2003-08-19 Thread Jill . Ramonsky

I disagree.

A post-Windows, post-Linux, Operating System for the 21st century intended
for global use, should ideally support the whole of Unicode.

There are, in fact, people working on such projects.
Jill


-Original Message-
From: Jim Allan [mailto:[EMAIL PROTECTED]
Sent: Monday, August 18, 2003 11:41 PM
To: [EMAIL PROTECTED]
Subject: Re: Clones (was RE: Hexadecimal)


No system has to support all of Unicode.




RE: Clones (was RE: Hexadecimal)

2003-08-19 Thread Jill . Ramonsky

Yeah, I know. But like I said, who uses this?

I have a QWERTY keyboard in front of me. I use a standard en-GB key mapping.
Now I _could_ customise my keymap such that Right-Alt + HYPHEN MINUS yielded
MINUS SIGN. Wouldn't that be great? Then I could write things like "x = -5;"
unambiguously. But it would completely screw my C++ compiler.

And I also have to ask ... if I am actually WRITING a C++ compiler, should I
allow the use of MINUS SIGN to mean minus sign? (Actually, that question may
be answered by the specification of C++, so let's push it a bit further. If
I am inventing some successor language to C++, and am free to invent my own
specification, should I _then_ allow the use of MINUS SIGN?)

I'm not being Devil's advocate. I don't necessarily even expect anyone to
have a definitive answer. I only ask that the charts make clear what each
character is FOR, in sufficient detail that the answer to questions like the
above becomes obvious.

Jill

-Original Message-
From: John Cowan [mailto:[EMAIL PROTECTED]
Sent: Monday, August 18, 2003 4:39 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Clones (was RE: Hexadecimal)


>   U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
> uses this?

The ASCII characters, because they have had to do double or triple
duty over the years when we had a very limited 7-bit character set,
often have several near-equivalents in Unicode that disambiguate their
*typographically* different purposes.  Thus hyphen, minus sign, en dash,
and em dash have separate Unicode representations, though in ASCII they
are often written -, -, -- or -, and --- or -- respectively.



Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Jim Allan
Peter Kirk posted:

> Well, that's what was puzzling me about the recommendations not to use
> these characters. In my opinion, there needs to be a clear statement
> with each character definition (not somewhere in the text not linked to
> it) of its status in such respects. Is it for compatibility use only? Is
> it a presentation form not for use in general information interchange?
> Is it a formatting variant of another character, which should be used if
> that special formatting is to be indicated although the two might be
> collated together?

Perhaps a cross-reference to areas in the main text where that
particular character or kind of character is discussed when there is
some special mention in the main text.

Otherwise the various indications of distinction and compatibility
decomposition and canonical decomposition usually indicate a lot, if
the reader looks at them and learns to understand them.

But indeed the standard is somewhat inconsistent in sometimes coming
close to recommending not using compatibility characters at all and in
other cases recommending particular ones.

> For example, if I want a superscript 2 to indicate "squared" (which
> someone used on this list recently), am I supposed to use U+00B2, or
> should I avoid using it and instead use a higher level markup (which
> implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
> but it certainly doesn't in the code chart.

Well, if you are using unformatted text and want to use a superscript 2
then you don't have much choice. I suppose I could have sent "E=mc^2" or
"E=mc{squared}" or "E=mc2" or something, but why would I when I have
Unicode? :-)

Actually superscript 2 is also in the Latin-1 character set. :-)

In http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf it states:

<< Therefore, the preferred means to encode superscripted letters or
digits, such as “1^st” or “DC00_16”, is by style or markup in rich text. >>

I would think that statement obvious since in technical writing and 
mathematical writing it is theoretically possible for any displayable 
character in Unicode to be superscripted or subscripted, and even 
superscripted or subscripted to an already superscript or subscript 
character, and so on.

Also in the code chart (http://www.unicode.org/charts/PDF/U0080.pdf)
U+00B2 SUPERSCRIPT TWO is given a compatibility decomposition to
"<super> 0032 2". Similarly with other superscript characters.

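The decomposition shown in the code chart can also be read from the
Unicode Character Database programmatically. A small sketch in Python,
assuming the standard unicodedata module (which reflects whatever Unicode
version ships with the interpreter, not necessarily 4.0); the helper name
describe is hypothetical:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return code point, character name and decomposition mapping."""
    mapping = unicodedata.decomposition(ch) or "(none)"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mapping}"


def fold_compat(text: str) -> str:
    """Apply compatibility normalization (NFKC), which loses the superscript."""
    return unicodedata.normalize("NFKC", text)
```

Note that folding "E=mc" followed by U+00B2 yields plain "E=mc2", which is
one reason applications are warned not to unify compatibility characters
hastily.
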
But beyond all recommendations in the Unicode standard what is done 
depends on what the user wants to do for a particular purpose in a 
particular environment with particular fonts. There is no one correct 
way that fits all users at all places and times, nor should there be.

If I am printing out a document on a particular system with particular 
software and fonts in which plain text superscripts look to me better 
than superscripts created by formatting regular numbers by the word 
processor I am using then I will naturally in that time and place use 
Unicode plain text superscripts.

That Unicode gives me the choice is a benefit I should take advantage of 
without worrying that formatting regular numbers as superscript is 
theoretically better than using compatibility characters.

Unicode is messy and complex mostly because character usage is messy and 
complex and display technology is messy and complex and there are always 
edge-cases and things that don't fit well.

But Unicode's keeping deprecated individual character encodings while 
allowing applications to freely throw away non-deprecated canonical 
decomposable encodings (which supposedly only exist because they should 
not be thrown away) confuses me also.

> I thought even deprecated ones were supposed to be usable, in that a
> system should process them correctly.

It depends on what is meant by "usable" and the "system" and
"correctly". No system has to support all of Unicode. Accordingly I
would not expect systems to support deprecated control characters or
fonts to go out of their way to support deprecated characters.

A system that does not support deprecated control codes (and even some
of the non-deprecated control codes) and does not support particular
characters (perhaps only because there are no fonts on the system with
those characters) can still be conformant to Unicode in what it supports.

A text editor that supports only fixed width fonts will probably not 
support the special-width spaces properly but may still be Unicode 
conformant.

Jim Allan




Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Rick McGowan
Someone suggested...

> It would be much simpler if each such character were clearly labelled in 
> the code charts etc. DO NOT USE!, and with its glyph presented on a grey 
> background or in some other way to indicate its special status.

Well, sure, I agree that it might be nice to document some of the
discouraged and deprecated characters somewhere in a way that people could
find easily, but putting gray boxes in the charts isn't the way.

Perhaps we should also put in blinking bold neon letters the disclaimer  
that is posted at the top of every chart PDF file:

> Disclaimer
> These charts are provided as the on-line reference to the character
> contents of the Unicode Standard, Version 4.0 but do not provide all
> the information needed to fully support individual scripts using the
> Unicode Standard. For a complete understanding of the use of the
> characters contained in this excerpt file, please consult the
> appropriate sections of The Unicode Standard, Version 4.0
> (ISBN 0-321-18578-1), as well as Unicode Standard Annexes #9,
> #11, #14, #15, #24 and #29, the other Unicode Technical Reports
> and the Unicode Character Database, which are available on-line.

Before using things in the standard, people really should check out what  
they are using! There are lοts of things that look really similar but have  
wildly different semantics and оne might n০t want t੦ use things  
indiscriminately based s๐lely ᅌn what's in the charts...
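
For anyone who wants to see which look-alikes are hiding in a piece of text
such as the paragraph above, here is a minimal Python sketch. It assumes the
standard unicodedata module; the function name non_ascii_report is
hypothetical, and listing every non-ASCII character is only a crude first
check, not a real confusables detector.

```python
import unicodedata


def non_ascii_report(text: str) -> list[tuple[str, str]]:
    """List (code point, character name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F
    ]
```

Running it over a suspicious word shows at once whether an apparent Latin
"o" is really a Greek or Cyrillic letter or a digit from another script.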

Rick




Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Peter Kirk
On 18/08/2003 11:32, Jim Allan wrote:

> Peter Kirk posted:
>
>> It would be much simpler if each such character were clearly labelled in
>> the code charts etc. DO NOT USE!, and with its glyph presented on a grey
>> background or in some other way to indicate its special status.
>
> I don't think people should be told so directly to NOT use an official
> Unicode character unless the character is actually deprecated.

OK, DO NOT USE! is too strong, but something like NOT RECOMMENDED! could
be used instead.

> Over the years recommendations about particular characters in the
> standard have sometimes changed and no-one can see all possible uses
> for characters or all ways that applications might use some of them.

Well, such things need not be frozen from version to version. And a note
could read NOT RECOMMENDED except in the case of...

> But greying the chart area for deprecated characters and singleton
> canonical decomposable characters seems to me a good idea.
>
> As to compatibility characters, remember some of them, for example
> spaces with varying widths, make essential differences in formatting.
> The standard warns applications not to be hasty in unifying
> compatibility characters for presentation.

Well, that's what was puzzling me about the recommendations not to use
these characters. In my opinion, there needs to be a clear statement
with each character definition (not somewhere in the text not linked to
it) of its status in such respects. Is it for compatibility use only? Is
it a presentation form not for use in general information interchange?
Is it a formatting variant of another character, which should be used if
that special formatting is to be indicated although the two might be
collated together?

For example, if I want a superscript 2 to indicate "squared" (which
someone used on this list recently), am I supposed to use U+00B2, or
should I avoid using it and instead use a higher level markup (which
implies I need to use HTML e-mail)? Maybe the text tells me somewhere,
but it certainly doesn't in the code chart.

> If it is not deprecated a character should be usable.

I thought even deprecated ones were supposed to be usable, in that a
system should process them correctly.

> But some more obvious graphic indication would be nice, to signal
> that perhaps a user should think carefully about using that
> particular encoded character.

Agreed.

> Jim Allan

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Jim Allan
Peter Kirk posted:

> It would be much simpler if each such character were clearly labelled in
> the code charts etc. DO NOT USE!, and with its glyph presented on a grey
> background or in some other way to indicate its special status.

I don't think people should be told so directly to NOT use an official
Unicode character unless the character is actually deprecated.

Over the years recommendations about particular characters in the 
standard have sometimes changed and no-one can see all possible uses for 
characters or all ways that applications might use some of them.

But greying the chart area for deprecated characters and singleton 
canonical decomposable characters seems to me a good idea.

As to compatibility characters, remember some of them, for example 
spaces with varying widths, make essential differences in formatting. 
The standard warns applications not to be hasty in unifying compatibility
characters for presentation.

If it is not deprecated a character should be usable.

But some more obvious graphic indication would be nice, to signal that
perhaps a user should think carefully about using that particular
encoded character.

Jim Allan




Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Peter Kirk
On 18/08/2003 09:06, Jim Allan wrote:

> Jill Ramonsky posted:
>
>> I would really like it if these, and
>> every single other character which is "only there for reasons of
>> round trip compatibility" with something else, were explicitly marked in
>> the machine-readable charts with something meaning "Don't introduce this
>> character, at all, ever. Don't try to interpret it. Just preserve it, in
>> case it ever gets turned back to its original character set".
>
> That would probably be too strong.
>
> If characters are available then some people will use them. :-(
>
> See section 2.3 at http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf
>
> Unicode 3.0 contained under section D21 on compatibility characters:
>
> << Their use is discouraged other than for legacy data. >>
>
> I don't know whether this statement was intentionally removed or was
> accidentally dropped in the changes in 4.0 which distinguish
> "compatibility character" from "compatibility composite character".
>
> In any case people can't be prevented from doing things that are
> officially discouraged, especially as for some particular use it might
> be wrong to discourage them. So if you are handling Roman numerals in
> an application and wish your handling to be complete then
> unfortunately you do have to take the compatibility Roman numerals
> into account.

Yes, but people can be clearly discouraged from using them, and that is
not currently happening. It seems that currently if you come across a 
character by browsing through the charts and want to discover if use of 
it is officially discouraged you have to wade through huge databases and 
hundreds of pages of text to find out if a particular set of properties 
implies that use is discouraged. Well, even that won't tell me 
definitively, for I read, "The compatibility decomposable characters are 
precisely defined in the Unicode Character Database, whereas the 
compatibility characters in the more inclusive sense are not." (from 
section 2.3) - and it is the latter whose use is discouraged. But is it 
in fact safe to assume that the list of such characters includes, but is 
not limited to, those which have defined compatibility mappings?
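
As a partial answer to the "wade through huge databases" complaint, the
decomposition field of the Unicode Character Database can be queried
directly. The following Python sketch assumes the standard unicodedata
module (whose data matches the interpreter's Unicode version, not
necessarily 4.0); the function name classify and its category labels are
hypothetical. It only identifies compatibility decomposable characters; it
cannot identify compatibility characters in the more inclusive sense, which
are not precisely defined in the database.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify a character by its decomposition mapping in the UCD."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    if mapping.startswith("<"):
        # A tag such as <compat>, <super> or <wide> marks a compatibility mapping.
        return "compatibility"
    if len(mapping.split()) == 1:
        # Canonical mapping to a single other character.
        return "canonical singleton"
    return "canonical"
```

For example, U+2160 ROMAN NUMERAL ONE is reported as "compatibility" and
U+212B ANGSTROM SIGN as "canonical singleton".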

It would be much simpler if each such character were clearly labelled in 
the code charts etc. DO NOT USE!, and with its glyph presented on a grey 
background or in some other way to indicate its special status.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread Jim Allan
Jill Ramonsky posted:

> I would really like it if these, and
> every single other character which is "only there for reasons of round trip
> compatibility" with something else, were explicitly marked in the
> machine-readable charts with something meaning "Don't introduce this
> character, at all, ever. Don't try to interpret it. Just preserve it, in
> case it ever gets turned back to its original character set".

That would probably be too strong.

If characters are available then some people will use them. :-(

See section 2.3 at http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf

Unicode 3.0 contained under section D21 on compatibility characters:

<< Their use is discouraged other than for legacy data. >>

I don't know whether this statement was intentionally removed or was
accidentally dropped in the changes in 4.0 which distinguish
"compatibility character" from "compatibility composite character".

In any case people can't be prevented from doing things that are
officially discouraged, especially as for some particular use it might
be wrong to discourage them. So if you are handling Roman numerals in an
application and wish your handling to be complete then unfortunately you
do have to take the compatibility Roman numerals into account.

> U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
> uses this?

People concerned with proper appearance of the symbol in proportional
fonts. Almost all proportional fonts use a narrow hyphen dash rather
than a minus-width dash for the hyphen-minus character. In some
older-style fonts it is even a slanting character.

See http://www.unicode.org/versions/Unicode4.0.0/ch06.pdf in 6.2 for a 
detailed discussion of the various dash characters.

> U+2217 (asterisk operator) - an equally obvious clone of U+002A
> (asterisk)

They look much the same in a typewriter-style font. They don't do so in
proportional fonts where the regular asterisk tends to appear somewhat
like a superscript.

Unicode provides support both for good typographical usage and for
traditional data-processing typographical usage based on
typewriter technology.

> U+223C (tilde operator) - a clone of U+007E (tilde)

See http://www.unicode.org/versions/Unicode4.0.0/ch07.pdf and look for
"Spacing Clones of Diacritics".

The ASCII tilde was originally intended to be a non-spacing diacritic 
tilde to be applied to other characters by backspace. In part because of 
the low resolution of many early data-processing printers it was often 
realized in a tilde operator form. That has now become its most normal 
form in fonts.

But for good typography you do want to distinguish them, and the
overloading of tilde as ASCII 7E means that a font may render a
mathematical full-character tilde when you want to show a diacritic or
render a spacing diacritic when you wanted a mathematical operator.

Unicode is intended for typesetting applications as well as entering 
computer code in a traditional typewriter style character set with 
typewriter limitations.

> and then there's
> U+2223 (divides) - hell, that looks to me remarkably like U+007C
> (vertical line)

They do look close. But U+007C usually extends below the baseline and
U+2223 usually doesn't.

> For example:
> U+2264 (less than or equal to) - compare with U+2A7D (less than or
> slanted equal to)

I have no idea. You will probably have to ask the MathML people about
that one. See http://www.w3.org/TR/2001/REC-MathML2-20010221.
Mathematicians seem to think they need to distinguish the two.

As a non-mathematician I find many of these distinctions bewildering and 
seemingly only typographical. But if mathematicians in some field make 
fine distinctions based on such differences then it is important that 
Unicode allow such distinctions to be maintained in plain text.

> In defence of this argument, I point out that the
> complementary relation, NOT equal to, has codepoint U+2270, and this is
> represented in the code charts as having a slanted equal to, so it OUGHT to
> be the complement of U+2A7D. (Unless I've missed it, there appears to be no
> "not equal to with horizontal equals" character).

The chart at http://www.unicode.org/charts/PDF/U2200.pdf does not show a
slanted equals.

For some discussion of the math symbols see also 
http://www.unicode.org/unicode/reports/tr25/tr25-5.html.

Part of the problem is that differences that are in most environments
only typographical style differences may indicate semantic differences
in particular disciplines. It is impossible to establish a firm line as
to how important or common what would normally be a stylistic variation
must be before it should be encoded in Unicode for plain text distinctions.

For example open-loop _g_ is distinguished from close-loop _g_ in the 
International Phonetic Alphabet and so Unicode encodes it separately at 
U+0261.

A normal Latin Letter font would probably not have U+0261 in it at all 
and might display U+0067 with either closed or open l

Re: Clones (was RE: Hexadecimal)

2003-08-18 Thread John Cowan
[EMAIL PROTECTED] scripsit:

> "Don't use these characters - use the normal Latin letters instead".

That's essentially the implication of being a compatibility character.

> Secondly, I believe that the code charts SHOULD provide machine-readable
> information about the hexadecimal values of the letters "A" to "F".

0030;0
0031;1
0032;2
0033;3
0034;4
0035;5
0036;6
0037;7
0038;8
0039;9
0041;10
0042;11
0043;12
0044;13
0045;14
0046;15
0061;10
0062;11
0063;12
0064;13
0065;14
0066;15
FF10;0
FF11;1
FF12;2
FF13;3
FF14;4
FF15;5
FF16;6
FF17;7
FF18;8
FF19;9
FF21;10
FF22;11
FF23;12
FF24;13
FF25;14
FF26;15
FF41;10
FF42;11
FF43;12
FF44;13
FF45;14
FF46;15
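
The list above follows a simple pattern: the ASCII digits and the letters
A-F and a-f, plus their fullwidth counterparts, which sit at a constant
offset of FEE0 from the ASCII code points. A minimal Python sketch that
regenerates the same 44 entries and uses them to parse a hexadecimal string;
the names build_hex_table, format_table and parse_hex are hypothetical:

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF10 - U+0030, U+FF21 - U+0041, U+FF41 - U+0061


def build_hex_table() -> dict[int, int]:
    """Map code points of hexadecimal digits to their numeric values."""
    table = {}
    for offset in (0, FULLWIDTH_OFFSET):
        for i in range(10):
            table[0x30 + offset + i] = i       # digits 0-9
        for i in range(6):
            table[0x41 + offset + i] = 10 + i  # letters A-F
            table[0x61 + offset + i] = 10 + i  # letters a-f
    return table


def format_table(table: dict[int, int]) -> str:
    """Render the table in the 'codepoint;value' form used above."""
    return "\n".join(f"{cp:04X};{value}" for cp, value in sorted(table.items()))


def parse_hex(text: str) -> int:
    """Parse hexadecimal written with ASCII or fullwidth digits."""
    table = build_hex_table()
    result = 0
    for ch in text:
        result = result * 16 + table[ord(ch)]  # KeyError for non-hex characters
    return result
```
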

Thuryago.

>   U+2212 (minus sign) - an obvious clone of U+002D (hyphen-minus). Who
> uses this?

The ASCII characters, because they have had to do double or triple
duty over the years when we had a very limited 7-bit character set,
often have several near-equivalents in Unicode that disambiguate their
*typographically* different purposes.  Thus hyphen, minus sign, en dash,
and em dash have separate Unicode representations, though in ASCII they
are often written -, -, -- or -, and --- or -- respectively.
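
A quick way to see that these are separately encoded characters, sketched
in Python with the standard unicodedata module (the names DASH_LIKE,
dash_names and survives_nfkc are hypothetical). None of the five has a
decomposition mapping, so even compatibility normalization leaves them
distinct; software that wants to treat MINUS SIGN like HYPHEN-MINUS has to
say so explicitly.

```python
import unicodedata

# HYPHEN-MINUS, HYPHEN, MINUS SIGN, EN DASH, EM DASH
DASH_LIKE = ["\u002D", "\u2010", "\u2212", "\u2013", "\u2014"]


def dash_names() -> list[tuple[str, str]]:
    """Return (code point, character name) for each dash-like character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in DASH_LIKE]


def survives_nfkc(ch: str) -> bool:
    """True if compatibility normalization leaves the character unchanged."""
    return unicodedata.normalize("NFKC", ch) == ch
```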

> Conversely, there are also things that look different, but mean the same.
> For example:
>   U+2264 (less than or equal to) - compare with U+2A7D (less than or
> slanted equal to)

It turns out that in some math contexts one or the other is strongly enough
preferred that it's worth having two characters so as to avoid getting the
"wrong" glyph.

> So, yes, I agree with Jim. Let's not have too many duplicates. But I still
> have to ask why there are so many already?

"History there is, and no history."
--The High Inquest

"Every character has its story."
--various Unicode tribal elders

-- 
John Cowan  <[EMAIL PROTECTED]>
http://www.reutershealth.com    http://www.ccil.org/~cowan
.e'osai ko sarji la lojban.
Please support Lojban!  http://www.lojban.org