date:20030531

Re: Fw: Unicode filename problems

2003-05-31 Thread Raymond Mercier

This question of non-ASCII filenames is a real problem: hardly any
software out there can cope with it.
I did not know of RAR, but have given it a try. Even here there is a
serious problem: if the filename is non-ASCII, the name of the
compressed file comes out as _.rar, with as many underscores as there
were characters in the original name. In fact it is a bit less predictable
than that: if the name is Greek, for example, you get Latin letters; if it is
Cyrillic, just the underscores.
That makes it useless if you have a number of filenames that all have the
same number of characters, since they collapse to the same archive name.
Certainly more work is needed on RAR (at least on the Win 2000 version).

I know about this problem, since I made my Fontlist 5 work properly with
arbitrary non-ASCII names:
http://ourworld.compuserve.com/homepages/RaymondM/fontlist5.htm .

Raymond Mercier

At 22:58 30/05/2003 -0500, you wrote:
> I wonder if anyone here has ideas on these matters.
>
> Peter
>
> ----- Forwarded by Peter Constable/IntlAdmin/WCT on 05/30/2003 10:56 PM -----
>
> I have 3 LinguaLinks lexicons that I have converted into HTML pages - one
> for each entry. The languages use non-ANSI characters, so I also did a
> Unicode conversion at the same time.
>
> [snip]
>
> Everything works very well except that I cannot burn the files onto a CD
> because of the Unicode values in the filenames. Roxio and Nero CD-burners
> don't accept some of the higher values found in the file names (using
> Joliet, ISO 9660 and UDF). Anyone have any ideas how to deal with this?
> For example, a filename containing Unicode value U+026B, a lowercase l
> with middle tilde, causes problems.
>
> In the meantime, to get it onto CD, I decided to try and zip all the
> files. Turns out almost all the zippers out there DO NOT support Unicode
> filenames. Doug Rintoul found WinRAR
> (http://www.rarlab.com/rar_archiver.htm) which does the trick in the RAR
> format only. There is a RAR expander for Macintosh and Linux systems as
> well (all of these are $29 USD). So far, have not found a freeware
> solution that meets Unicode filename needs. Have any of you run into this
> yet?
>
> I could try to determine what Unicode values are causing problems on the
> CD burner and do an unacceptable-to-acceptable character translation in
> the filenames and the links to those filenames ... but that seems like a
> huge compromise. Also, it will be difficult to come up with a generic
> solution ... that is to say, I don't know what RANGE of values are
> unacceptable for characters in a CD filename. Joliet is supposed to allow
> Unicode filenames according to the documentation I have seen.
>
> Larry

Re: Fw: Unicode filename problems

2003-05-31 Thread Karl Pentzlin

On Saturday, 31 May 2003 at 05:58, [EMAIL PROTECTED] wrote:

Pso> ...
Pso> Everything works very well except that I cannot burn the files onto a CD
Pso> because of the unicode values in the filenames. Roxio and Nero CD-burners
Pso> don't accept some of the higher values found in the file names ...

I tried in vain to contact Nero support (and the support of other CD burning
software vendors) about the same problem.

When PC magazines here review or compare CD burning programs, they
never say a word about the missing ability to correctly handle filenames
that contain non-Windows-1252 characters. This is even true for
the more renowned magazines like "c't" (who will receive a forwarded copy of
this message).

It seems that using the correct diacritics for languages other than
French is simply "uncool" here (in Germany).

- Karl

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Karl Pentzlin

Investigating some fonts, I found in a version of Adobe Garamond Pro
the U+2024 ONE DOT LEADER glyph being a dot symmetrically preceded
and followed by a tiny space.

In the same font, the U+2026 HORIZONTAL ELLIPSIS glyph has a tiny
space (smaller than in the U+2024 glyph) before each of the three
dots and after the last one. Additionally there is a considerable
space between the single dots. Thus, that glyph is wider than
three consecutive U+002E glyphs of the same font. Microsoft's Times
Roman behaves similarly for U+2026/U+002E (but contains no U+2024).

The U+2025 TWO DOT LEADER in Adobe Garamond Pro has no space between
the two dots. The pair is symmetrically enclosed by a tiny space
smaller than at U+2024.

The glyphs seem to be optimized for special typesetting needs rather
than for general punctuation (confirming other information given in
this thread).

See the attached GIF.
- Karl

leaders:  l․l․l‥l‥l…l…l
dots:  l.l.l..l..l...l...l
leaders doubled:  l․․l‥‥l……l

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Philippe Verdy

From: "Kenneth Whistler" <[EMAIL PROTECTED]>
> Philippe Verdy continued:
> 
> > What surprizes me the most in the Unicode spec is that it 
> > both says that its purpose is to create arbitrary length 
> > of leaders 
> 
> As in plain text, as can be seen in Table of Content listings
> in many RFCs, for example. (Which, however, use ASCII 0x2E for the
> same purpose.)

RFCs are not written using Unicode. They are limited to ASCII, even if they are
originally created in an XML-based rich text format.
That's why they are converted using the standard fallback conversions to sequences of
full stops.

I don't see the relevance of this argument, given that ASCII has no leader characters.
It is reasonable in the RFCs to write them with full stops because there's no other
choice, given the constraints of the ASCII format and a typewriter style with a fixed
number of columns.

RFCs are plain text, but for historical reasons this format is not flowed text (even
the page layout is fixed!). However, most RFCs are now also available in a flowed rich
text format that allows them to be presented in a much more readable form for
printing. At some point in the future, when all past RFCs have been rewritten in the
new rich text format, they will be available as XML+XSL text, HTML text, preformatted
PDF files, ...

It is even possible that the RFC Editor will abandon the ASCII format at some point,
because of the difficulty of integrating schemas, graphics, and tables, and the
difficulty of interpreting some notations that are needed only because of the absence
of a richer character set.

At that time the XML format may become the only supported normative format, with all
other formats derived from this source (note that new RFC submissions MUST now be
created in this new format with a normative DTD, before discussion and approval of
their content, which will be converted to preformatted ASCII text later). The main
reason is the difficulty of maintaining the preformatted text while it is being
discussed (it requires too much manual editing).

Re: Fw: Unicode filename problems

2003-05-31 Thread David Starner

On Fri, May 30, 2003 at 10:58:53PM -0500, [EMAIL PROTECTED] wrote:
> In the meantime, to get it onto CD, I decided to try and zip all the
> files. Turns out almost all the zippers out there DO NOT support Unicode
> filenames. Doug Rintoul found WinRAR
> (http://www.rarlab.com/rar_archiver.htm) which does the trick in the RAR
> format only. There is a RAR expander for Macintosh and Linux systems as
> well (all of these are $29 USD). So far, have not found a freeware
> solution that meets unicode filename needs. Have any of you run into this
> yet?

You could try tar; at least on Unix, it handles Unicode (UTF-8) just
fine.

-- 
David Starner - [EMAIL PROTECTED]
Ic sæt me on anum leahtrice, ða com heo and bát me!
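
A minimal sketch of the tar suggestion above, using Python's standard tarfile module rather than the tar command line. The member names are made up for illustration, and the archive is written to memory so that the example does not depend on the local file system. The PAX format is requested explicitly because it stores non-ASCII member names as UTF-8.

```python
import io
import tarfile

def roundtrip_names(names):
    """Write empty members with the given names to an in-memory tar
    archive, then read the member names back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)
            info.size = 0                      # empty member, no payload
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()

# Hypothetical entry names: Latin with U+026B, Greek, Cyrillic.
names = ["entry_\u026b.html", "\u03b1\u03b2\u03b3.html", "\u0436\u0437.html"]
print(roundtrip_names(names) == names)
```

Whether the names survive extraction onto a particular file system or CD format is a separate question that this sketch does not address.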

Fw: Unicode filename problems

2003-05-31 Thread Peter_Constable

I wonder if anyone here has ideas on these matters.

Peter

----- Forwarded by Peter Constable/IntlAdmin/WCT on 05/30/2003 10:56 PM -----


I have 3 LinguaLinks lexicons that I have converted into HTML pages - one
for each entry. The languages use non-ANSI characters, so I also did a
Unicode conversion at the same time.

[snip]

Everything works very well except that I cannot burn the files onto a CD
because of the Unicode values in the filenames. Roxio and Nero CD-burners
don't accept some of the higher values found in the file names (using
Joliet, ISO 9660 and UDF). Anyone have any ideas how to deal with this?
For example, a filename containing Unicode value U+026B, a lowercase l
with middle tilde, causes problems.

In the meantime, to get it onto CD, I decided to try and zip all the
files. Turns out almost all the zippers out there DO NOT support Unicode
filenames. Doug Rintoul found WinRAR
(http://www.rarlab.com/rar_archiver.htm) which does the trick in the RAR
format only. There is a RAR expander for Macintosh and Linux systems as
well (all of these are $29 USD). So far, have not found a freeware
solution that meets unicode filename needs. Have any of you run into this
yet?

I could try to determine what Unicode values are causing problems on the
CD burner and do an unacceptable-to-acceptable character translation in
the filenames and the links to those filenames ... but that seems like a
huge compromise. Also, it will be difficult to come up with a generic
solution ... that is to say, I don't know what RANGE of values are
unacceptable for characters in a CD filename. Joliet is supposed to allow
Unicode filenames according to the documentation I have seen.

Larry
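
The "unacceptable-to-acceptable" translation described above could be sketched roughly as follows. Since the range of rejected characters is unknown, this sketch assumes the most conservative rule, escaping every non-ASCII character. The `_uXXXX` escape scheme and the function names `safe_name` and `rewrite_links` are hypothetical, and the link rewriting only handles double-quoted `href` attributes.

```python
import re

def safe_name(name):
    """Replace every non-ASCII character with a _uXXXX escape
    (a hypothetical scheme chosen for illustration)."""
    return "".join(ch if ord(ch) < 128 else "_u%04X" % ord(ch) for ch in name)

def rewrite_links(html, names):
    """Point href targets that name a renamed file at its escaped name."""
    mapping = {n: safe_name(n) for n in names}

    def fix(match):
        target = match.group(2)
        return match.group(1) + mapping.get(target, target) + match.group(3)

    return re.sub(r'(href=")([^"]*)(")', fix, html)

page = '<a href="a\u026b.html">entry</a>'
print(safe_name("a\u026b.html"))
print(rewrite_links(page, ["a\u026b.html"]))
```

Two caveats: an original name that already contains text of the form `_u026B` could collide with an escaped one, and the escaped names lose readability, which is the compromise the message worries about.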

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULLSTOP?

2003-05-31 Thread Jim Allan

Ken Whistler posted:

> U+2025 TWO DOT LEADER is also an XCCS compatibility character.
> It corresponds to XCCS 356B/243B (0xEEA3) "Leader, two-dot
> on an en body" *and* to 041B/105B (0x2145) "Leader, two-dot
> on an em body". The difference in width was considered
> a formatting distinction and was unified away in creating
> the U+2025 encoded character, as preserving that distinction
> in plain text was considered unnecessary by the Xerox
> representative to the committee at the time.
>
> U+2026 HORIZONTAL ELLIPSIS maps to the ellipsis seen in a
> number of legacy character encodings, including the Macintosh
> character sets, but also maps to an XCCS character: 041B/104B
> (0x2144) "Leader, three-dot on an em body".

Interesting.

The intent of the first two characters in the Unicode standard is
rather vague.

The name TWO DOT LEADER might mean two baseline dots taking up twice the
space of a one dot leader, presumably to allow a leader to be
constructed with fewer keystrokes in the days of manual typography or
fewer sorts in the days of manual typesetting.

Or TWO DOT LEADER might mean two dots which take up the same space as 
the one dot in the ONE DOT LEADER, to provide a denser leader (possibly 
with finer dots?)

The second seems close to what was intended.

Perhaps a note should be attached to these characters indicating that 
the ONE DOT LEADER was originally intended for leaders with one dot per 
en and the TWO DOT LEADER was intended for leaders with two dots per en 
or per em, to give some guidance to font creators and users.

Though probably of little practical use, the presence of either U+2024 or
U+2025 might be used as a hint for any application explicitly rendering
from plain text to an internal fancy text format that these are leaders.

This seems to be what Philippe is suggesting, and it seems to me a 
reasonable thing for an application to do, but *only* if done explicitly 
and openly as a user-requested reinterpretation of the text.

(I would not expect or want HTML or XML or text readers to do anything 
with these characters except show them as they find them.)

A wordprocessing or desktop publishing application could use the forms 
and sizes of the dots in these characters in the current font as the 
basis for creating its own leaders (going instead to the full stop if 
these characters are empty).

Jim Allan

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Kenneth Whistler

Philippe Verdy continued:

> What surprises me the most in the Unicode spec is that it
> both says that its purpose is to create leaders of
> arbitrary length

As in plain text, as can be seen in Table of Content listings
in many RFCs, for example. (Which, however, use ASCII 0x2E for the
same purpose.)

> (you say that the spacing statement in the Xerox name was 
> not considered important by Xerox, so how many leaders would 
> be needed to fit a en space with the Unicode designation?).

If you mean how many leader *dots* would it take to fit an en
space, that would depend on the font in Unicode, as for so
much else. My guess would be that the correct answer is
approximately the same as the number of angels that can stand
on the dot.

Very few characters in Unicode have any specified widths. That 
is by design.

> Why then do you insist that it represents one dot ? 

Because that was the intent of the Unicode Technical Committee
when it encoded the character, and is the clear intent of the
standard as currently specified.

> You also seem to insist on the "compatibility" decomposition
> which normally removes an important semantic (else it
> would be canonical).

I'm simply restating the specification in the standard. Read it
yourself.

> All this seems like creating contradictions.
> 
> Also it would be the only punctuation sign whose number of
> occurrences is not relevant

False. See the discussion of Tibetan justifying tseks in:

http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf

> (in dotted lines used as leaders), 

Or, for that matter, in plain text visual line separations
also created by stringing together ASCII punctuation:
****************************************
like that. Such legacy use of punctuation characters is no
different than legacy use of a sequence of periods to create
leader lines in plain text.

> as the final presentation of the text will need to compensate
> for font metrics differences in order to produce the correct
> effect (also because the size of the dots was removed from
> the Unicode designation.)

So? That is irrelevant to the question at hand. People who do
stuff like this, as in plain text RFCs, display text in
monospace fonts and don't expect dynamic reflowing of text.

People who do leader lines correctly for fine typography do
them with internal data abstractions, and those data abstractions
aren't based on interpreting U+2024 as a format control character.

> I do not agree with your argument that says that it is like a
> full dot to be used in limited applications

You can disagree with my argument all you like. But if you insist
on coming on the unicode list and spouting nonsense about
particular characters in the standard, suggesting that people
implement them in ways that would be nonconformant with the
standard, then expect people to respond to the nonsense.

> (if Unicode wanted to remove the spacing, it was to generalize
> its use as an abstract character, not to reinforce its mapping
> to an approximate full dot.)

That claim is errant nonsense.

> I never heard about the Xerox CCS before, but there's a large 
> legacy usage of the ellipsis as a single unbreakable character 

Correct. And U+2026 is encoded precisely for that legacy practice.

> (and the two dots for the notation of interval bounds are also 
> unbreakable).

True, but this kind of behavior falls automatically out of most
implementations' treatment of U+002E characters in sequence.
Check UAX #14, which discusses the line break behavior of both
the leader dot characters and U+002E FULL STOP. U+002E is lb class
IS, and since class IS prohibits a break before, a sequence of
two periods in a row, as in [0..1] does not have a break
opportunity in the middle of the sequence.

> The single dot leader looks like a way to fill the gap,
> only because the two-dot and three-dot ellipses did not allow,
> in most fonts and applications, the creation of a regular leader
> using smaller dots than the one used for the regular full stop
> punctuation.

You are mixing up glyphs and characters here.

In "most fonts and applications" leader dots are *glyphs* used
to express a measured leader line, not characters at all.

> The fact that it was unified with XCCS (with some
> compromises accepted by Xerox) clearly demonstrates that
> the Xerox design was not the main focus:

In the case of encoding of the ONE DOT LEADER, you don't know what you
are talking about.

> - Who knows XCCS and uses it? Very few people.

Today, yes. But it was a key source of character repertoire for
Unicode 1.0, and choices made in the XCCS often guided thinking
about character/glyph distinctions for Unicode.

> - Who uses leaders? Every publisher and author of long documents
> who does not want to see irregularly spaced leaders, or a dotted
> grid instead of a true dotted horizontal line.

This is irrelevant to the claims you have been making about U+2024.

> 
> Leaders are visual helpers for the eye of

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Philippe Verdy

From: "Kenneth Whistler" <[EMAIL PROTECTED]>
> That last fact should be taken as a hint that for most
> purposes, manual leaders should just be sequences of FULL STOP
> characters (as you will see, for instance in the plain text
> representations of Internet Drafts or RFCs, for example).
> But in any rich text format, leaders are styled formatting objects
> (somewhat similar to tabulations, as Philippe suggested), but
> that does *not* make U+2024 a format character (LEADER
> PLACEHOLDER, or whatever). It is exactly what it claims to
> be: a compatibility character, punctuation, with a single
> baseline dot as its glyph.

What surprises me the most in the Unicode spec is that it both says that its purpose
is to create leaders of arbitrary length (you say that the spacing statement in the
Xerox name was not considered important by Xerox, so how many leaders would be needed
to fit an en space with the Unicode designation?). Why then do you insist that it
represents one dot? You also seem to insist on the "compatibility" decomposition, which
normally removes an important semantic (else it would be canonical).
All this seems to create contradictions.

Also, it would be the only punctuation sign whose number of occurrences is not relevant
(in dotted lines used as leaders), as the final presentation of the text will need to
compensate for font metrics differences in order to produce the correct effect (also
because the size of the dots was removed from the Unicode designation.)

I do not agree with your argument that it is like a full dot to be used in
limited applications (if Unicode wanted to remove the spacing, it was to generalize its
use as an abstract character, not to reinforce its mapping to an approximate full dot.)

Compatibility decompositions are not intended to represent exactly the same semantics
in the "composed" character and in the base characters of the decomposition. I
think that compatibility decompositions are only acceptable fallbacks when the initial
character is not supported, but they do not represent the same abstract characters. At
least that was true before the decomposition stability "pact", but it is less clear now
that round-trip convertibility with some encodings is favored over exact character
abstraction.

I had never heard of the Xerox CCS before, but there's a large legacy usage of the
ellipsis as a single unbreakable character (and the two dots for the notation of
interval bounds are also unbreakable). The single dot leader looks like a way to fill
the gap, only because the two-dot and three-dot ellipses did not allow, in most fonts
and applications, the creation of a regular leader using smaller dots than the one used
for the regular full stop punctuation.

The fact that it was unified with XCCS (with some compromises accepted by Xerox)
clearly demonstrates that the Xerox design was not the main focus:
- Who knows XCCS and uses it? Very few people.
- Who uses leaders? Every publisher and author of long documents who does not want to
see irregularly spaced leaders, or a dotted grid instead of a true dotted horizontal
line.

Leaders are visual helpers for the reader's eye; they have absolutely no punctuation
or symbolic semantic (unlike the two-dot symbol or the ellipsis). The fact that it
was categorized as punctuation is probably an initial error that can't be corrected
and that comes from the classification of its approximate fallback "compatibility
decomposition".

I do not see it as a compatibility character needed for round-trip conversions with
legacy sets (even if XCCS was mapped this way after some compromises). Pure round-trip
conversions respect the initial design of the legacy set from which a character is
mapped.

So you seem to mix up the very distinct concepts of compatibility characters and
compatibility decompositions:

- compatibility characters are for the initial mapping from an important legacy
encoding with full round-trip, and the exact semantic is preserved in this mapping to
Unicode. The usage of these Unicode code points is discouraged outside this legacy usage.

- for characters that have compatibility decompositions, the decompositions are
intended as guides to acceptable fallback characters that will not create too confusing
an interpretation for readers, but the exact semantic is not preserved by the
compatibility decomposition. The usage of these characters is not discouraged but
instead favored by Unicode, which adds important semantics in the "composed" character.

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Kenneth Whistler

Michael,

> As a typesetter on Mac OS X, I see no reason to abandon the use of 
> the three-dotted horizontal ellipsis character, Ken.

Nor do I. It is fine for ellipses...

And it was encoded for that. But in encodings which don't have
an ellipsis character, it is roughly comparable to a sequence
of three periods, as above. And for sure you wouldn't want to
create leader lines by using a sequence of ellipses.

--Ken

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Ben Dougall

On Friday, May 30, 2003, at 03:07  pm, John Cowan wrote:

> Ben Dougall scripsit:
>
> > why is it not categorised as white space then? or is it? doesn't look
> > like it is to me, but i'm not sure how to actually find out for sure.
>
> Well, um, it's not white: there is a dot in it.

i was just querying what philippe had said, and wondering why unicode's
categorising didn't match up with what he was saying:

> There seems to be a difference: leaders are expected to be written in
> sequences (sometimes long) to create a dotted line. The complete
> sequence of leaders then seems to be a form of tabulation (sort of
> whitespacing), and default ignorable, unlike the full stop which is a
> non ignorable punctuation (or a non ignorable decimal separator).
> ...
> So one-dot leaders can safely be replaced by spaces without affecting
> the semantic, unlike the full stop or the ellipsis.

that led me to ask, why it isn't categorised as white space.
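
One way to "find out for sure" is to query the Unicode Character Database. The sketch below uses Python's standard unicodedata module. `describe` is a hypothetical helper name, and `str.isspace` is used only as a stand-in for the White_Space property (the two are close but not defined identically).

```python
import unicodedata

def describe(ch):
    """Return (name, General Category, str.isspace result) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

# ONE DOT LEADER, FULL STOP, THIN SPACE, SPACE
for ch in "\u2024\u002e\u2009\u0020":
    print("U+%04X" % ord(ch), *describe(ch))
```

U+2024 is reported with General Category Po (Other Punctuation), like U+002E, while the two space characters are Zs (Space Separator).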

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULLSTOP?

2003-05-31 Thread Michael Everson

As a typesetter on Mac OS X, I see no reason to abandon the use of 
the three-dotted horizontal ellipsis character, Ken.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Kenneth Whistler

Philippe Verdy vamped:

> > > For example I would not be shocked if a text using it was rendered with 
> > > a monospaced font, where the base line of the character cell shows
> > > multiple tiny dots, that create a contiguous dotted line when multiple
> > > U+2024 characters (one per display cell) are used to indent the text in 
> > > columns.
> > > 
> > > Of course with proportional fonts this character would display at least 
> > > (and preferably) a single dot. Any use of this character that assumes
> > > it is a symbol consisting in a single dot aligned on the baseline seems 
> > > to abuse the semantic of this character, which is not a punctuation,
> > > but really a styling character used instead of an "invisible" thin
> > > space.
> > 

And Jim Allan asked:

> > Where is this behavior indicated by Unicode specifications?
> > 
> > Such behavior appears to me to be a non-standard extension to Unicode,
> > interpreting what Unicode classes as a General Punctuation character as
> > instead a Formatting Character.

> > But I don't see how conforming applications could assume this semantic
> > for the character when reading in plain text Unicode or writing plain
> > text Unicode.
> > 
> > What then is U+2025 TWO DOT LEADER?

And then Philippe Verdy continued to improvise:

> For me this one is a punctuation, commonly used to designate 
> a separator between bounds of intervals like [0..1] (it is 
> generally surrounded by a thin space on both sides with strict 
> typography). It should not be used to create leaders of
> arbitrary length.

What he is talking about here is generally represented by
the sequence <U+002E, U+002E>, in other words, just two
full stops, as in the example given "[0..1]". Typographical
rules then deal with any issues of spacing around or between
the dots.

> 
> The three dot leader is also a punctuation (normally not 
> prefixed by any space, but followed by a large space like 
> for the full dot). It should not be used to create leaders
> of arbitrary length.

This is a reference to U+2026 HORIZONTAL ELLIPSIS, and Philippe
is correct that that should not be used to create arbitrary
leaders.

> The one-dot leader should have no other purpose than to be 
> used in sequences of arbitrary length. 

This statement is only very accidentally true. Explanation
below.

> The whole sequence of single-dot leaders like this forms a
> single token with the semantic of a word separator, where the
> number of displayed dots is not really relevant for the reader
> of text whatever its rendering style or fonts.

But this is absolutely false, as Jim Allan suggested.
U+2024 ONE DOT LEADER is a graphic character, whose glyph
consists of a small baseline dot, and whose General Category
is Po (Other Punctuation). It cannot be used conformantly as
if it were a formatting control standing in for a rich text
representation of a leader object (e.g. in a generated
Table of Contents in a Word or FrameMaker document).


> I just think that this 1-dot leader is used as a way to transcode
> within a single string what was initially a tabulation decorated 
> by some markup system, 

False.

Now, here is the true story of U+2024.

It is a compatibility character, introduced for compatibility
with XCCS (Xerox Character Code Standard) 1980, where it
was mapped to the coded character 356B/242B (0xEEA2),
described as "Leader, one-dot on an en body".

Its use in XCCS would have been to create leaders manually,
by lining up a sequence of "one-dot on an en body" to create
a sufficiently long leader. Its rationale in Unicode would be
to either map to data created in XCCS or to manually lay
out text using a comparable mechanism, but for which one wished to
distinguish the "dots" thus used from U+002E FULL STOP.

U+2025 TWO DOT LEADER is also an XCCS compatibility character.
It corresponds to XCCS 356B/243B (0xEEA3) "Leader, two-dot
on an en body" *and* to 041B/105B (0x2145) "Leader, two-dot
on an em body". The difference in width was considered
a formatting distinction and was unified away in creating
the U+2025 encoded character, as preserving that distinction
in plain text was considered unnecessary by the Xerox
representative to the committee at the time.

U+2026 HORIZONTAL ELLIPSIS maps to the ellipsis seen in a
number of legacy character encodings, including the Macintosh
character sets, but also maps to an XCCS character: 041B/104B
(0x2144) "Leader, three-dot on an em body".

All *three* of these characters should be considered
compatibility characters. Indeed, they formally *are*
"compatibility decomposable characters" (Chapter 3, Definition
D21), since they each have compatibility decompositions
to one or more U+002E FULL STOP characters.
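
The compatibility decompositions mentioned above can be inspected directly. This is a small sketch using Python's standard unicodedata module; `compat_info` is a hypothetical helper name.

```python
import unicodedata

LEADERS = "\u2024\u2025\u2026"   # ONE DOT LEADER, TWO DOT LEADER, HORIZONTAL ELLIPSIS

def compat_info(ch):
    """Return the raw decomposition field and the NFKD form of a character."""
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKD", ch)

for ch in LEADERS:
    field, nfkd = compat_info(ch)
    print("U+%04X" % ord(ch), unicodedata.name(ch), "|", field, "|", nfkd)
```

Each decomposition field carries the `<compat>` tag followed by one, two, or three occurrences of 002E, and compatibility normalization (NFKD or NFKC) replaces each character with that many full stops.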

That last fact should be taken as a hint that for most
purposes, manual leaders should just be sequences of FULL STOP
characters (as you will see, for instance in the plain text
representations of Internet Drafts or RFCs, for example).
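But in any rich text format, leaders are styled formatting objects
(somewhat similar to tabulations, as Philippe suggested), but
that does *not* make U+2024 a format character (LEADER
PLACEHOLDER, or whatever). It is exactly what it claims to
be: a compatibility character, punctuation, with a single
baseline dot as its glyph.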

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Philippe Verdy

From: "Jim Allan" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, May 30, 2003 8:05 PM
Subject: Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

> John Cowan posted:
> 
> > Not really, in many applications it will translate in one or more dots
> > just to create a dotted line (notably within layout processors for
> > publishing). This looks more like a "styled" thin whitespace, and
> > semantically it really has this value (the number of dots is not really 
> > relevant).
> > 
> > For example I would not be shocked if a text using it was rendered with 
> > a monospaced font, where the base line of the character cell shows
> > multiple tiny dots, that create a contiguous dotted line when multiple
> > U+2024 characters (one per display cell) are used to indent the text in 
> > columns.
> > 
> > Of course with proportional fonts this character would display at least 
> > (and preferably) a single dot. Any use of this character that assumes
> > it is a symbol consisting in a single dot aligned on the baseline seems 
> > to abuse the semantic of this character, which is not a punctuation,
> > but really a styling character used instead of an "invisible" thin
> > space.
> 
> Where is this behavior indicated by Unicode specifications?
> 
> Such behavior appears to me to be a non-standard extension to Unicode,
> interpreting what Unicode classes as a General Punctuation character as
> instead a Formatting Character.
> 
> Individual applications can do such things as they wish as part of a 
> higher protocol.
> 
> But I don't see how conforming applications could assume this semantic
> for the character when reading in plain text Unicode or writing plain
> text Unicode.
> 
> What then is U+2025 TWO DOT LEADER?

For me this one is a punctuation mark, commonly used to designate a separator between
the bounds of intervals like [0..1] (it is generally surrounded by a thin space on both
sides in strict typography). It should not be used to create leaders of arbitrary
length.

The three dot leader is also a punctuation mark (normally not prefixed by any space, but
followed by a wide space as for the full stop). It should not be used to create
leaders of arbitrary length.

But the "one dot" leader should not be confused with a full stop or a decimal separator,
or an abbreviation symbol, or an ideographic full stop, or a bullet, or a
multiplication operator, or other one-dot symbols which have their own encodings...

The one-dot leader should have no other purpose than to be used in sequences of
arbitrary length. The whole sequence of single-dot leaders like this forms a single
token with the semantic of a word separator, where the number of displayed dots is not
really relevant for the reader of the text, whatever its rendering style or fonts.

I just think that this one-dot leader is used as a way to transcode within a single
string what was initially a tabulation decorated by some markup system, but it has no
other application, because unlike other punctuation the number of occurrences of this
character is not meaningful, or could have varied depending on conversion constraints
at the interface between rich text and plain text.

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Christopher John Fynn

"Carl W. Brown" <[EMAIL PROTECTED]> wrote:

> If nothing else we need to discourage people from using the
> Latin-1 code page and a special font to create a code page hack.

Yes, I think that sort of thing should be *explicitly forbidden*
on pages where the "Unicode Savvy" logo is present (unless they
are glyphs for unencoded characters and mapped to the PUA).
After all, pages using a font/code page hack could be in UTF-8
and still validate.

- Chris

RE: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Carl W. Brown

Chris,

> > I think that if you have a Klingon web site that uses UTF-8 and the PUA with
> > your own font is very Unicode savvy.
> >
> > Carl
>
> It's certainly a lot more savvy than using Latin-1 characters to
> encode Klingon.

If nothing else we need to discourage people from using the Latin-1 code
page and a special font to create a code page hack.

Carl

Specifying the character encoding (was: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Otto Stolz

William Overington wrote:

1.  I tried out the validation procedure on the following page.
http://www.users.globalnet.co.uk/~ngo/font7007.htm
It will
not validate.  It is not clear to me what I need to add to the page to get
it to validate.


RTFM:

and .
Cheers,
  Otto Stolz

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULLSTOP?

2003-05-31 Thread Jim Allan

John Cowan posted:

Not really, in many applications it will translate in one or more dots
just to create a dotted line (notably within layout processors for
publishing). This looks more like a "styled" thin whitespace, and
semantically it really has this value (the number of dots is not really 
relevant).

For example I would not be shocked if a text using it was rendered with 
a monospaced font, where the base line of the character cell shows
multiple tiny dots, that create a contiguous dotted line when multiple
U+2024 characters (one per display cell) are used to indent the text in 
columns.

Of course with proportional fonts this character would display at least 
(and preferably) a single dot. Any use of this character that assumes
it is a symbol consisting in a single dot aligned on the baseline seems 
to abuse the semantic of this character, which is not a punctuation,
but really a styling character used instead of an "invisible" thin
space.

Where is this behavior indicated by Unicode specifications?

Such behavior appears to me to be a non-standard extension of Unicode, 
interpreting what Unicode classes as a General Punctuation character as 
a Formatting Character instead.

Individual applications can do such things as they wish as part of a 
higher protocol.

But I don't see how conforming applications could assume this semantic 
for the character when reading in plain text Unicode or writing plain 
text Unicode.

What then is U+2025 TWO DOT LEADER?

Are there any other characters in Unicode that are *expected* to stretch 
in size and produce multiple images?
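
One way to see how the character database itself relates the dot leaders to the full stop is to look up their names and compatibility decompositions. This is a minimal sketch, assuming Python's standard `unicodedata` module; the data it reports comes from whichever Unicode version the Python build ships with, not necessarily Unicode 4.0.

```python
import unicodedata

# FULL STOP for comparison, then the one-, two- and three-dot characters.
for ch in "\u002e\u2024\u2025\u2026":
    # NFKD applies compatibility decompositions, if the character has one.
    decomposed = unicodedata.normalize("NFKD", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> NFKD {decomposed!r}")
```

The leaders each decompose to a fixed number of full stops, which suggests that the database treats them as presentation variants of ordinary dots rather than as stretchable formatting.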

Jim Allan

Re: Rare extinct latin letters

2003-05-31 Thread Patrick Andries


- Original message -
From: "Philippe Verdy" <[EMAIL PROTECTED]>

> From: <[EMAIL PROTECTED]>
> > Patrick Andries on 05/29/2003 06:15:10 PM:
> >
> > > Could letters like « l molle »
> > > (http://pages.infinit.net/hapax/abcmeigret.jpg) or long-tailed A
> > > (between O and P in Baïf's alphabet,
> > > http://pages.infinit.net/hapax/abcbaif.jpg), letters which I believe
> > > cannot be composed from other existing Unicode characters, be
> > > considered for Unicode encoding
> >
> > If there is a user need, then probably yes.

[PA] Well, maybe.  My enquiry was motivated by a question from a typographer
asking for support for texts (although not quoting Meigret and Baïf) where it
seems important to have the original letters preserved in a digital plain-text
format (beside and above a scanned image), in order to put such texts on the
Web and be able to search the original forms. I have asked for further
information to see what the real need will be.

> I'd like to see some real historic publication that is not an attempt to
> reform French orthography, using an invented "new" alphabet used by only
> one author.

[PA] I understand.  But these texts are still studied today (I have a recent
popular book on the history of the French language with pages dedicated to
these authors) and regularly printed as facsimiles for study (Honoré
Champion of Geneva publishes facsimile Renaissance books; here are the two
Meigret books they currently publish:
http://www.champion.ch/cgi/run?wwfrset+3+739515205+1+2+cccdegtl1+N+1+19030912 ).
I'm not sure that this is very different from Deseret or other
alphabets having very few source documents. We are also speaking here about
a few additional letters needed to represent these texts, not a complete
new alphabet.


> Such text is probably interesting to study as it gives hints on how French
> was *spoken* when it was written (i.e. interesting for phonetic studies),
> but I doubt it has a real language value until there is some real usage of
> these modified alphabets created only as a proposal for a future reform of
> the orthography that was never applied.

[PA] I believe the need for an encoding may be pragmatically ascertained; I
don't know about the « real linguistic value » of an alphabet. I have, by
the way, no problem if someone says: « Sorry, too idiosyncratic and
eccentric! Use the Private Use Area if you need such characters. » This
may well be the case.

> Also the letter forms are not quite clear, because the metal fonts use
> characters that were apparently manually manufactured individually. So the
> glyphs are similar, but cannot be distinguished well enough in these small
> scanned images. This would require a more complete analysis of the text, if
> such text exists.

[PA] I'm not sure I understand : I have facsimiles of these books at home (I
am currently travelling so I can't scan additional pages), these texts do
exist (usually several hundred pages of them per author).

P. A.

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Christopher John Fynn


"Carl W. Brown" <[EMAIL PROTECTED]> wrote:

> I think that if you have a Klingon web site that uses UTF-8 and the PUA with
> your own font is very Unicode savvy.
>
> Carl

It's certainly a lot more savvy than using Latin-1 characters to
encode Klingon.

- Chris

BabelMap

2003-05-31 Thread Edward C. D. Hopkins




I don't remember seeing mention on this list that BabelMap now supports
Unicode 4.0 (in Planes 0, 1, 2, 14, 15 and 16).
Here's the link: http://tinyurl.com/d26d
 
Cheers,
 
Chris Hopkins

Re: [Maybe OT] localized names of the Unicode Control characters

2003-05-31 Thread Patrick Andries


- Original message -
From: "Patrick Andries" <[EMAIL PROTECTED]>

> Even « literal » translations may help disambiguate English forms by the
> introduction of prepositions (e.g. "variant selector" may be misinterpreted
> by a translator unaware of its role as a slightly different selector
> (variant as an adjective, variante de sélecteur) rather than a selector of
> variants (variant as a noun, « sélecteur de variants »))

Let me correct the French typo and add that there are actually two ways of
translating "variant selector" when variant is a noun: « sélecteur de
variante » (select one variant at a time) and « sélecteur de variantes »
(select one or several variants at a time). The singular form is the
official ISO 10646 (F) name.

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Curtis Clark

William Overington wrote:
> 2. What is the situation if a page is encoded entirely properly as far as,
> say, using UTF-8 goes, yet also uses Private Use Area characters?

UTF-8 includes the PUA. It specifies nothing, however, about its contents.
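
The point that UTF-8 carries PUA code points like any others can be checked directly. A minimal sketch in Python follows; the sample code points are arbitrary picks from the three Private Use ranges, chosen only for illustration.

```python
# One or two sample code points from each Private Use range
# (BMP PUA, Plane 15, Plane 16).
pua_samples = ["\ue000", "\uf8ff", "\U000f0000", "\U0010fffd"]

for ch in pua_samples:
    encoded = ch.encode("utf-8")
    # The round trip succeeds: the encoding form neither rejects the
    # code point nor attaches any meaning to it.
    assert encoded.decode("utf-8") == ch
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Nothing in the byte sequence says what the character is supposed to be; that remains a matter of private agreement.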

--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Curtis Clark

Philippe Verdy wrote:
> Maybe the PUA allocated spaces could be divided in normative
> categories, for example by assigning LTR or RTL base letters in some
> areas, diacritics in another large area split in 255 subspaces for
> combining characters, and symbols or ideographs in another large
> area.

Um, then it wouldn't be private. I seem to remember a recent discussion 
of how Microsoft doing something similar was causing all kinds of 
difficulty.

--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/

PUA usage (was RE: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Marco Cimarosti

Philippe Verdy wrote:
> This contrasts a lot with the Unicode codepoints assigned to 
> abstract characters, that are processable out of any 
> contextual stylesheet, font or markup system, where its only 
> semantic is in that case "private use" with no linguistic 
> semantic and no abstract character evidence, and all with the 
> same default character properties (including shamely the bidi 
> properties needed to render and layout the fonted text, 

In HTML, the default directionality of characters can be overridden with the
BDO tag. E.g.:

<BDO dir="rtl">hi!</BDO>

This should display as an RTL string, with "!" on the left side and "h" on
the right side.

The same can be achieved also in plain-text Unicode, using RLO, LRO and PDF:

?hi!?

(U+202E U+0068 U+0069 U+0021 U+202C)

Ciao.
Marco

PUA usage (was RE: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Marco Cimarosti

[OOOPS! This works better if I set the proper MIME encoding... Sorry]

Philippe Verdy wrote:
> This contrasts a lot with the Unicode codepoints assigned to 
> abstract characters, that are processable out of any 
> contextual stylesheet, font or markup system, where its only 
> semantic is in that case "private use" with no linguistic 
> semantic and no abstract character evidence, and all with the 
> same default character properties (including shamely the bidi 
> properties needed to render and layout the fonted text, 

In HTML, the default directionality of characters can be overridden with the
BDO tag. E.g.:

<BDO dir="rtl">hi!</BDO>

This should display as an RTL string, with "!" on the left side and "h" on
the right side.

The same can be achieved also in plain-text Unicode, using RLO, LRO and PDF:

‮hi!‬

(U+202E U+0068 U+0069 U+0021 U+202C)
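
The same sequence can be built programmatically. The following is a minimal sketch in Python, assuming only the standard `unicodedata` module; the constant names `RLO` and `PDF` are just labels chosen here.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

wrapped = RLO + "hi!" + PDF

# The stored (logical) order is unchanged; only a bidi-aware renderer
# displays the enclosed characters from right to left.
print(" ".join(f"U+{ord(c):04X}" for c in wrapped))

# The bidirectional class of each control character, as recorded in the
# character database.
print(unicodedata.bidirectional(RLO), unicodedata.bidirectional(PDF))
```

Whether the override is honoured still depends on the application displaying the text.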

Ciao.
Marco

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Tom Gewecke


>> and clearly
>> not designed to be used on the web.
>> Their use in a page to display text clearly does not qualify, as it
>> requires proprietary fonts to display them.
>
>I think that is overly restrictive. (And if the requirements for the
>"savvy" logo are changed to rule out use of PUA, then I could imagine
>wanting to join WO in requesting a Unicode-with-PUA logo. But I'd rather
>not have to go there.)

I agree.  What is meant by "proprietary" fonts?  There are certain Unicode
ranges that are in any case currently only available via expensive
proprietary fonts or fonts that may only work correctly on a particular OS.
UTF-8 pages using PUA codepoints validate at W3C just as well as pages of
NCRs with 8-bit charsets (which still seem dubious to me).
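
To make the comparison concrete, here is a minimal sketch in Python of the same PUA code point carried either directly in UTF-8 or as a numeric character reference (NCR) in an 8-bit charset. The choice of U+E000 and of Latin-1 as the 8-bit charset are assumptions made for illustration.

```python
import html

ch = "\ue000"  # first code point of the BMP Private Use Area

as_utf8 = ch.encode("utf-8")
# Latin-1 cannot represent U+E000 directly, so the encoder falls back
# to a decimal numeric character reference.
as_ncr = ch.encode("latin-1", errors="xmlcharrefreplace")
print(as_utf8, as_ncr)

# Both forms resolve to the same code point once decoded and unescaped.
assert as_utf8.decode("utf-8") == ch
assert html.unescape(as_ncr.decode("latin-1")) == ch
```

Either way the page refers to the same private-use code point, so the choice of encoding does not change what a validator can know about its meaning.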

Use of Savvy logo with PUA characters (was: Re: Announcement: New Unicode Savvy Logo)

2003-05-31 Thread Doug Ewell

 wrote:

>> and clearly
>> not designed to be used on the web.
>> Their use in a page to display text clearly does not qualify, as it
>> requires proprietary fonts to display them.
>
> I think that is overly restrictive. (And if the requirements for the
> "savvy" logo are changed to rule out use of PUA, then I could imagine
> wanting to join WO in requesting a Unicode-with-PUA logo. But I'd
> rather not have to go there.)

I attached the Savvy logo to my pages that contain PUA characters (and
little else) without a second thought.  Indeed, Unicode PUA characters
*must* be encoded in Unicode, and so would seem especially Savvy!

For them what cares: to view the pages, start at
 and follow the links
just below the text "Three sample texts are now available."  The "prior
agreement" is that the user must have James Kass's Code2000 font
installed.  WARNING: don't do this if you don't care about "constructed"
or "invented" scripts.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Re: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Philippe Verdy

From: "Carl W. Brown" <[EMAIL PROTECTED]>
> > Private Use Areas are by definition not interoperable and clearly
> > not designed to be used on the web.
> > Their use in a page to display text clearly does not qualify, as
> > it requires proprietary fonts to display them.
> 
> People use special fonts all the time.  They are more efficient to obtain a
> special look and feel.  In fact some UTF-8 pages may want to use special
> fonts when  they display characters that a user is not likely to have fonts
> installed.  For example a travel site may want to display the native names
> of sights.  It may use a script that the user does not have a font to cover.
> Even if the user does not read the language they may be able to recognize
> the name.
> 
> From one of my sites:
> 
> 
> I think that if you have a Klingon web site that uses UTF-8 and the PUA with
> your own font is very Unicode savvy.

I would just say that this uses the current top technology, but I don't know if the 
.eot distribution format for the referenced font is widely interoperable for now.

All I can say is that the PUA characters used in the page cannot be processed in 
isolation from the XML markup and CSS stylesheet. So the effective encoding in this 
case is the pair consisting of the PUA codepoint and the specified font located at a 
specific URL.

This contrasts sharply with the Unicode codepoints assigned to abstract characters, 
which can be processed independently of any stylesheet, font or markup system. Outside 
such a context, the only semantic of a PUA codepoint is "private use", with no 
linguistic semantic and no evidence of an abstract character, and all PUA codepoints 
have the same default character properties (including, regrettably, the bidi 
properties needed to render and lay out the text, because there is no guideline for 
the allocation of PUA characters).

Maybe the PUA could be divided into normative categories, for example by assigning 
LTR or RTL base letters to some areas, diacritics to another large area split into 
255 subspaces for combining characters, and symbols or ideographs to another large 
area.
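
The uniform default properties mentioned above can be inspected with a short script. This is a minimal sketch, assuming Python's standard `unicodedata` module (its data reflects the Unicode version bundled with the Python build); the sample code points are arbitrary.

```python
import unicodedata

# Sample code points from the BMP PUA and from Planes 15 and 16.
for cp in (0xE000, 0xF8FF, 0xF0000, 0x100000):
    ch = chr(cp)
    print(f"U+{cp:04X}: "
          f"category={unicodedata.category(ch)}, "
          f"bidi={unicodedata.bidirectional(ch)}, "
          f"combining={unicodedata.combining(ch)}")
```

Every sample reports the same category, bidi class and combining class, so a private RTL letter or combining mark cannot be told apart from a private LTR base letter using the default data alone.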

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread Philippe Verdy

From: "John Cowan" <[EMAIL PROTECTED]>
> Ben Dougall scripsit:
> 
> > why is it not categorised as white space then? or is it? doesn't look 
> > like it is to me, but i'm not sure how to actually find out for sure.
> 
> Well, um, it's not white: there is a dot in it.

Not really: in many applications it translates into one or more dots, just to create 
a dotted line (notably within layout processors for publishing). This looks more like 
a "styled" thin whitespace, and semantically it really has this value (the number of 
dots is not really relevant).

For example I would not be shocked if a text using it was rendered with a monospaced 
font, where the base line of the character cell shows multiple tiny dots, that create 
a contiguous dotted line when multiple U+2024 characters (one per display cell) are 
used to indent the text in columns.

Of course with proportional fonts this character would display at least (and 
preferably) a single dot. Any use of this character that assumes it is a symbol 
consisting of a single dot aligned on the baseline seems to abuse the semantic of this 
character, which is not a punctuation mark, but really a styling character used 
instead of an "invisible" thin space.

In the case of full justification, the number of dots in the leader is not relevant 
either...

The name may be confusing; I would have preferred ONE-DOT LEADERS.

Re: Aramaic Roadmap (was: Persian or Farsi?)

2003-05-31 Thread Edward C. D. Hopkins

> At 12:18 -0400 2003-05-22, Edward C. D. Hopkins wrote:
> >But toward being back on topic: it is not clear to me if the roadmap
> >includes or rejects Arsacid Parthian/Parthian Aramaic (and other descriptive
> >names have been used). Can someone knowledgeable on inclusion of this
> >Aramaic variant script in Unicode enlighten me?
>
> Not without detail. But if it's a variant of Aramaic, it is likely to
> have been unified with Aramaic (which has not yet been encoded).

What level of detail would you like? I was unable to find a comprehensive
Aramaic proposal but did find UTC #3 (1992-1993), N1932 (1998), N2042 (1999),
N2311 (2001) and N2556 (2002).

In other UTC documents, I found Carl-Martin BUNZ, Encoding Scripts from the
Past: Conceptual and Practical Problems and Solutions (read at the 17th IUC,
San Jose, California, September 2000). Bunz classes Parthian as "Category B1,
in need of encoding".

I also recently came across an article in the Journal of Assyrian Academic
Studies that is of interest: The Aramaic Language and its Classification, by
Efrem Yildiz. http://www.jaas.org/edocs/v14n1/e8.pdf

I would like to pursue this to learn what is needed to encode Parthian Aramaic.

Cheers,

Chris Hopkins

RE: IPA Null Consonant

2003-05-31 Thread Jim Allan

Ken Whistler posted:

> And what I pointed
> out earlier is that, in *linguistic* usage, the slashed zero
> glyph is clearly an acceptable glyphic variant of the
> empty set symbol. So to claim it is "completely unrelated"
> is to manifestly ignore actual practice.

Indeed.

Donald Knuth, a mathematician and author of books on programming, 
disgusted with the continued worsening of typography for publication of 
mathematical texts, developed the TeX typesetting system and the Metafont 
font-creation system in the late 1970s and 1980s, and produced a 
number of fonts himself which are still used.

He used the slashed 0 for the empty set symbol in his cmsy10 font for 
mathematics.

See http://www.mozilla.org/projects/mathml/fonts/encoding/cmsy-ttf.gif 
for the forms of this font. It is encoded as U+2205 at 
http://www.mozilla.org/projects/mathml/fonts/encoding/cmsy-ttf-encoding.html.

See also 
http://www.usefulcontent.org/docs/manuals/REC-MathML2-20010221/isoamso.html 
for some MathML characters and their Unicode encodings.

The character "empty" is encoded as U+2205 plus the variation selector 
U+FE00 and has the description "/emptyset - zero, slash" and the alias 
"emptyset".

The following character "emptyv" is encoded simply as U+2205 and has the 
description "/varnothing, circle, slash" and the alias "varnothing".

But the glyph forms of both are given as slashed zero, though that for 
"empty" is slightly wider. (I suspect an error in the emptyv glyph shown 
 here.)

At 
http://www.usefulcontent.org/docs/manuals/REC-MathML2-20010221/byalpha.html 
the entities empty and emptyset are both assigned to U+2205 followed by 
U+FE00 while emptyv is again just U+2205.
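
For readers who want to see what such a sequence looks like in data, here is a minimal sketch in Python, assuming the standard `unicodedata` module. The variable names are hypothetical, and the code only inspects the two code points; it makes no claim about which glyph a given font will select.

```python
import unicodedata

EMPTY_SET = "\u2205"
VS1 = "\ufe00"  # the variation selector used in the MathML tables

# A variation sequence is simply the base character followed by the selector.
empty_with_selector = EMPTY_SET + VS1

print(unicodedata.name(EMPTY_SET), "+", unicodedata.name(VS1))
print("code points in sequence:", len(empty_with_selector))

# Normalization does not remove or reorder the selector.
assert unicodedata.normalize("NFC", empty_with_selector) == empty_with_selector
```

The base character stays U+2205 in both cases, which is consistent with treating the two shapes as glyphic variants of one character.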

That the differing forms are indicated by a variation selector indicates 
that the forms are seen to be "primarily glyphic variations" (see 
http://www.unicode.org/reports/tr25/#13_7_variation_selectors).

That both "emptyset" and "empty" are applied to the slashed zero while 
"emptyv" (I presume meaning "empty variant") is applied to the slashed 
circle would seem to indicate that to the creators of MathML, as well as 
to Donald Knuth, the slashed zero form is felt to be the more normal 
glyph for empty set (and for other indications of emptiness, nullity, etc.)

Jim Allan

Re: Rare extinct latin letters

2003-05-31 Thread Philippe Verdy

From: <[EMAIL PROTECTED]>
> Patrick Andries on 05/29/2003 06:15:10 PM:
> 
> > Could letters like « l molle »
> > (http://pages.infinit.net/hapax/abcmeigret.jpg) or long-tailed A
> > (between O and P in Baïf's alphabet,
> > http://pages.infinit.net/hapax/abcbaif.jpg), letters which I believe
> > cannot be composed from other existing Unicode characters, be
> > considered for Unicode encoding
> 
> If there is a user need, then probably yes.

I'd like to see some real historic publication that is not an attempt to reform 
French orthography, using an invented "new" alphabet used by only one author.

Such text is probably interesting to study as it gives hints on how French was 
*spoken* when it was written (i.e. interesting for phonetic studies), but I doubt it 
has a real language value until there is some real usage of these modified alphabets, 
which were created only as a proposal for a future reform of the orthography that was 
never applied.

Both samples demonstrate that the usage of diacritics does not match the strict 
categories applied in Unicode (for example the Unicode distinction between a cedilla 
and an ogonek is not clear in those texts, where the difference really seems to be 
considered as a form variant for the same diacritic).

Also the letter forms are not quite clear, because the metal fonts use characters that 
were apparently manually manufactured individually. So the glyphs are similar, but 
cannot be distinguished well enough in these small scanned images. This would require 
a more complete analysis of the text, if such text exists.

For now, these appear to be simple and normal form variants from other characters 
already encoded in Unicode, provided that a convention is applied to choose which 
Unicode diacritic best fits the displayed characters.

For example, my reading of the A with long tail, the G with a hook, or the E with long 
tail makes me think that these are ligatures created from a base letter and a 
following U. The already encoded "attached hook above" diacritic could correctly 
represent these ligatures, which those texts treat as individual letters in the 
tentative "reformed" orthography, based on phonetic rules rather than strict 
historical radicals.

The forms of the L molle and N molle letters look as if there were a tilde diacritic 
above N or ENG (and the same could be used above L, even if it does not strictly look 
like the same composed glyph, as a historic form variant of the LATIN LETTER L WITH 
TILDE, currently encoded in Unicode as a pair of abstract characters).

If this is not enough, maybe we could create just one new diacritic for the long leg 
attached on the right, or for the molle sign (the rotated and mirrored J sign, which 
may also be a sort of curl), normally detached above the base small letter and ligated 
on the right for the L letter or capital letters.

It's hard to decide from these two images.

RE: [Not OT] localized names of the Unicode Control characters

2003-05-31 Thread Francois Yergeau

Kenneth Whistler wrote:
> There is no reason in *principle* why the normative French
> names could not also be published on the Unicode website,
> but there is no easy way to coordinate that with the data
> files of the Unicode Character Database (which are part of
> particular versions of the Unicode Standard). Instead,
> someone would have to design some other means of posting
> them up.

Such as adding ListeDesNoms.txt to the appropriate directory as soon as it
is ready, after the release of a version of Unicode?

> In case it isn't clear, the *normative* part is the
> list of names *in* ISO/IEC 10646 (F). The ListeDesNoms.htm
> is Patrick's translation of the Unicode data file
> NameList.txt, including all the cross-references, and
> various other informational comments, making use of
> the normative French names from ISO/IEC 10646 (F).

Which is true, but hides the fact that the normative French names actually
come from this file.  The file is mechanically processed to produce the
various 10646(F) name lists, just as it is mechanically processed by Asmus
to produce the French charts.

-- 
François

RE: Announcement: New Unicode Savvy Logo

2003-05-31 Thread Carl W. Brown

Philippe,

> Private Use Areas are by definition not interoperable and clearly
> not designed to be used on the web.
> Their use in a page to display text clearly does not qualify, as
> it requires proprietary fonts to display them.

People use special fonts all the time.  They are more efficient to obtain a
special look and feel.  In fact some UTF-8 pages may want to use special
fonts when  they display characters that a user is not likely to have fonts
installed.  For example a travel site may want to display the native names
of sights.  It may use a script that the user does not have a font to cover.
Even if the user does not read the language they may be able to recognize
the name.

From one of my sites:



I think that if you have a Klingon web site that uses UTF-8 and the PUA with
your own font is very Unicode savvy.

Carl

Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

2003-05-31 Thread John Cowan

Ben Dougall scripsit:

> why is it not categorised as white space then? or is it? doesn't look 
> like it is to me, but i'm not sure how to actually find out for sure.

Well, um, it's not white: there is a dot in it.
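
As for how to find out for sure: the general category recorded in the Unicode Character Database settles it. A minimal sketch in Python, assuming the standard `unicodedata` module (whose data version depends on the Python build), compares U+2024 with two real space characters.

```python
import unicodedata

candidates = {
    "\u2024": "the character under discussion",
    "\u0020": "ordinary space, for comparison",
    "\u2009": "thin space, for comparison",
}

for ch, note in candidates.items():
    # category() returns the two-letter general category; space separators
    # are "Zs", while punctuation categories start with "P".
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)}, "
          f"isspace={ch.isspace()}  ({note})")
```

In this data U+2024 falls under punctuation rather than among the space separators, which agrees with the answer given above.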

-- 
You are a child of the universe no less      John Cowan
than the trees and all other acyclic         http://www.reutershealth.com
graphs; you have a right to be here.         http://www.ccil.org/~cowan
  --DeXiderata by Sean McGrath               [EMAIL PROTECTED]

Re: Rare extinct latin letters

2003-05-31 Thread Peter_Constable


Patrick Andries on 05/29/2003 06:15:10 PM:

> Could letters like « l molle »
> (http://pages.infinit.net/hapax/abcmeigret.jpg) or long-tailed A
> (between O and P in Baïf's alphabet,
> http://pages.infinit.net/hapax/abcbaif.jpg), letters which I believe
> cannot be composed from other existing Unicode characters, be
> considered for Unicode encoding

If there is a user need, then probably yes.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
