UTR#17 comments (was RE: Unicode Public Review Issues update)

2003-11-28 Thread Peter Constable
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 On Behalf Of Rick McGowan


 The following public review issues are new:
 
 25   Proposed Update UTR #17 Character Encoding Model  2004.01.27

I have submitted the following comments, copied here in case anyone
wishes to discuss them:

The draft text for TR17, section 5, says, "A simple character encoding
scheme is a mapping of each code unit of a CCS into a unique serialized
byte sequence." It goes on to define a compound CES. While not stated
explicitly, Unicode's CESs do not fit the definition of a compound CES,
and so the definition for simple CES must apply.

The problem is that this definition cannot accommodate all seven Unicode
CESs. Since it defines a CES as a mapping from each code unit, there are
only two possible byte-order-dependent mappings for 16- and 32-bit code
units. In other words, the distinction between UTF-16BE and UTF-16 data
that is big-endian cannot be a CES distinction because individual code
units are mapped in exactly the same way in both cases.

A definition for simple CES must, at a minimum, refer to a mapping of
*streams* of code units if it is to include details about a byte-order
mark that may or may not occur at the beginning of a stream.

I would suggest that, in order to accommodate the UTF-16 and UTF-32
CESs, an appropriate definition should actually be a level of
abstraction away from a mapping: a CES is a specification for
mappings. Any mapping is necessarily deterministic, giving a specific
output for each input. A mapping itself cannot serialize in either
big-endian or little-endian format; it must be one or the other,
unambiguously. On the other hand, a specification for how to map into
byte sequences can be ambiguous in this regard. Thus, the UTF-16 CES can
be considered a specification for mapping into byte sequences that
allows a little-endian mapping or a big-endian mapping.
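
To make the distinction concrete, here is a small sketch using Python's
built-in codecs (an illustration of mine, not part of the submitted
comments). The UTF-16 CES admits two serializations of the same code
unit sequence, distinguished only at the stream level by the BOM;
UTF-16BE admits exactly one:

# UTF-16BE is a simple CES: each code unit maps to one byte sequence.
assert "A".encode("utf-16-be") == b"\x00\x41"

# The UTF-16 CES is a specification that admits either byte order; both
# of these streams decode to the same text:
assert b"\xff\xfe\x41\x00".decode("utf-16") == "A"  # LE bytes, LE BOM
assert b"\xfe\xff\x00\x41".decode("utf-16") == "A"  # BE bytes, BE BOM

# An encoder must pick one; Python uses the platform byte order and
# prepends the matching BOM, so the serialized form is a property of the
# stream, not a deterministic function of each code unit alone.
print("A".encode("utf-16").hex())  # 'fffe4100' or 'feff0041'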




Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Ethiopic numbers (was RE: Unicode Public Review Issues update)

2003-11-28 Thread Peter Constable
 26   Update properties for Ethiopic and Tamil non-decimal digits
 2004.01.27
 Decimal numbers are those used in decimal-radix number systems. In
 particular, the sequence of the ONE character followed by the TWO
 character is interpreted as having the value of twelve. We have gotten
 feedback that this is not the case for Ethiopic or Tamil. Details are
 on the public issues page.
Comments I've submitted:

PRI#26: It is my understanding that Ethiopic numerals do not use a
decimal radix.

Most sources describing the Ethiopic script will list the characters
representing the tens, 100, and 10,000. The existence of these
characters, which are not combinations made from sequences of the digits
0-9, already indicates that this is not a decimal-radix system.

Traditionally, each syllabic character has an associated numeric value.
This is described on page 8 of
http://www.intelligirldesign.com/paper_gabriella.pdf, which shows Arabic
decimal values, and at
http://www.library.cornell.edu/africana/Writing_Systems/Numeric.html,
which shows traditional Ethiopic numerals. By comparing these two
documents, one can get an idea of how the numbers work.

The following are some other useful discussions of Ethiopic numbering:

http://www.geez.org/Numerals/

http://www.abyssiniacybergateway.net/fidel/sera-faq_4.html

http://www.ethiopic.com/ethiopic/numerals.pdf

The last of these proposes the addition of a digit 0 in order to allow
decimal-radix numbers in Ethiopic. I have no idea whether this has
caught on at all or not, but it is not the traditional system.
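
To make the traditional structure concrete, here is a minimal sketch
(my own reading of the sources above, so treat it as illustrative): it
renders 1-9999 using the dedicated tens and hundred characters, and
shows that twelve is written TEN TWO, never ONE followed by TWO:

# Traditional Ethiopic numerals (a sketch based on the sources cited
# above; limited to 1..9999, since the traditional system has no zero).
ONES = "፩፪፫፬፭፮፯፰፱"   # U+1369..U+1371, values 1..9
TENS = "፲፳፴፵፶፷፸፹፺"   # U+1372..U+137A, values 10..90
HUNDRED = "\u137B"     # ETHIOPIC NUMBER HUNDRED

def pair(v):
    """Render 1..99 as a tens character plus a ones character."""
    t, o = divmod(v, 10)
    return (TENS[t - 1] if t else "") + (ONES[o - 1] if o else "")

def to_ethiopic(n):
    if not 1 <= n <= 9999:
        raise ValueError("sketch covers 1..9999 only")
    hundreds, rest = divmod(n, 100)
    s = ""
    if hundreds:
        if hundreds > 1:      # a leading multiplier of one is just ፻
            s += pair(hundreds)
        s += HUNDRED
    return s + (pair(rest) if rest else "")

print(to_ethiopic(12))    # ፲፪ -- TEN then TWO, not ONE followed by TWO
print(to_ethiopic(1997))  # ፲፱፻፺፯ -- nineteen 'hundred' ninety-seven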


Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




Re: Ethiopic numbers (was RE: Unicode Public Review Issues update)

2003-11-28 Thread Jungshik Shin

On Fri, 28 Nov 2003, Peter Constable wrote:

  26   Update properties for Ethiopic and Tamil non-decimal digits
  2004.01.27
  Decimal numbers are those used in decimal-radix number systems. In
  particular, the sequence of the ONE character followed by the TWO
  character is interpreted as having the value of twelve. We have gotten
  feedback that this is not the case for Ethiopic or Tamil. Details

 PRI#26: It is my understanding that Ethiopic numerals do not use a
 decimal radix.

 Most sources describing the Ethiopic script will list the characters
 representing the tens, 100, and 10,000. The existence of these
 characters, which are not combinations made from sequences of the
 digits 0-9, already indicates that this is not a decimal-radix system.

Thank you for providing many useful and interesting links.

FYI, three list styles for the Ethiopic script were implemented in
Mozilla a long time ago. One of them is an Ethiopic numeric list style.
All three styles are in the CSS3 draft
(http://bugzilla.mozilla.org/show_bug.cgi?id=102252).

BTW, isn't this covered in TUS 3.0 section 11.1 as well as in
TUS 4.0 section 12.1 (pp. 323-324)?

   Jungshik



RE: Unicode Public Review Issues update (braille)

2003-10-16 Thread Asmus Freytag
I noticed that this message had not gotten a reply.

At 05:07 PM 10/7/03 +0200, Kent Karlsson wrote:
  A question about the issues already open: What is the justification
  for proposing to make Braille Lo?

 Shortly before this came up as a Public Review Issue, I suggested that
 Braille characters should not be regarded as ignorable symbols when
 collating texts, i.e. that they should have level one weights in the
 default weighting table, the reason being that they are more often
 used for letters than for other things. (However, I did not ask to make
 them Lo...)

That reasoning doesn't really apply. When data streams contain a mixture
of Braille and other character codes, one would assume that the Braille
is merely cited for the sighted, in which case it is used as a symbol.
When data streams contain exclusively Braille, they are actually being
used as intended. To sort such data would require a tailoring based on
the Braille mapping being used.

Which you recognize implicitly:

 That would be for the default ordering. Wanting a more alphabetically
 proper ordering would still require tailoring for that particular
 correspondence between ordinary letters and Braille, but would not
 require converting to the ordinary letters. Each such tailoring would
 give level 1 weights to most of the Braille characters used in that
 system of usage.

That may be true, but I still don't see what difference a change in the
default mapping would offer. For people reading the Braille, it provides
a random, almost binary ordering, which would seem to swamp all benefits
of it being level 1. [If all data to be sorted has weights only on the
same level(s), the results should be unaffected by which those levels
are. Or am I missing something here?]

For people using data with Braille embedded, e.g. instructional
material, I don't see the benefit of sorting the Braille as if it were
letters by default. If you wanted to sort a list, like a
Braille-to-English phrasebook, you would need a tailored sort anyway.
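
As a concrete illustration of such a tailoring, here is a sketch of mine
(the cell-to-letter table assumes the Grade 1 English mapping, and only
a-j are shown):

# Sorting Braille data by first mapping cells back to the letters of the
# underlying alphabet (a stand-in for a real UCA tailoring).
GRADE1 = {  # Braille cell -> Latin letter, Grade 1 English, a-j only
    "\u2801": "a", "\u2803": "b", "\u2809": "c", "\u2819": "d",
    "\u2811": "e", "\u280B": "f", "\u281B": "g", "\u2813": "h",
    "\u280A": "i", "\u281A": "j",
}

def braille_key(word):
    """Collation key: transliterate cells, then compare as plain text."""
    return "".join(GRADE1.get(cell, cell) for cell in word)

words = ["\u2819\u2801\u2803",  # "dab" (cells U+2819 U+2801 U+2803)
         "\u280A\u2809\u2811"]  # "ice" (cells U+280A U+2809 U+2811)
print(sorted(words))                   # code point order: "ice" first
print(sorted(words, key=braille_key))  # tailored order: "dab" first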
A./






Re: Unicode Public Review Issues update: BRAILLE

2003-10-07 Thread Peter Kirk
On 06/10/2003 19:08, Christopher John Fynn wrote:

 - Original Message -
 From: Jony Rosenne [EMAIL PROTECTED]

  Please note that Braille is used also for Hebrew. We use the same
  codes, but they are assigned a different meaning. The reader has to
  know or guess which language it is.

  I don't remember whether Hebrew Braille is written RTL or LTR.

  Jony

 Braille is probably used for a lot of scripts, maybe even *most*
 scripts used for modern languages - in Bhutan I know they use Braille
 for writing Dzongkha (Tibetan script).

 - Chris

Presumably it is no more difficult for a multilingual reader to know or
guess what language is being used than it is for sighted readers to
tell the difference between e.g. English and French in a Latin script
text.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: Unicode Public Review Issues update

2003-10-07 Thread Marco Cimarosti
Jony Rosenne wrote:
 I don't remember whether Hebrew Braille is written RTL or LTR.

Braille is always LTR, even for Hebrew and Arabic.

To be more precise, Braille is always LTR when you read it, but RTL when you
write it manually (because it is engraved on the back side of the paper,
using a dotted rule and a stylus).

_ Marco





RE: Unicode Public Review Issues update (braille)

2003-10-07 Thread Kent Karlsson
 A question about the issues already open: What is the justification
 for proposing to make Braille Lo?

Shortly before this came up as a Public Review Issue, I suggested that
Braille characters should not be regarded as ignorable symbols when
collating texts, i.e. that they should have level one weights in the
default weighting table, the reason being that they are more often
used for letters than for other things. (However, I did not ask to make
them Lo...) That would be for the default ordering. Wanting a more
alphabetically proper ordering would still require tailoring for that
particular correspondence between ordinary letters and Braille, but
would not require converting to the ordinary letters. Each such
tailoring would give level 1 weights to most of the Braille characters
used in that system of usage.

 Among other things it would make it part of identifiers. However,
 there's been some suggestion that this is a bad idea. Whether or not
 a braille symbol actually stands for a letter or a digit or a
 punctuation mark is entirely dependent on a higher level protocol.

I would agree with your reasoning here. I don't think Braille
should be used for identifiers.

 Also, by making them Lo, any parser that tries to collect words would
 run them together with any surrounding regular letters and digits. That
 seems odd, but perhaps it's not any more odd than mixing Devanagari and
 Han.

I.e., this is not so odd at all (quite a different case from
identifiers).
 
 The original model for these was that your text processing is done in
 non-Braille, and on the last leg to a device, you would transcode the
 regular text to a Braille sequence using a domain and language specific
 mapping. Having the codes in Unicode allows you to preserve 'final
 form' and transmit that as needed w/o having to also transmit the
 text-to-braille mapping(s) that were used to generate the Braille
 version of the text. (This assumes that the eventual human reader can
 do 'autodetection'.)

This does not apply to text that has been manually written in or
translated to Braille (for a particular language). As I have understood
it, writers/transcribers often use(d) peculiar writings, e.g.
abbreviations that would not occur in normal text, and the abbreviations
varied from scribe to scribe. I'm not familiar with the details though.
Braille can also be used for math and music notation.

B.t.w., Braille often uses state shifts, e.g. for digits. There is a
digits Braille code, followed by one or more codes for the letters a-j
(if the basic script is Latin), which then stand for 1, ..., 9, 0 (the
list is terminated by any non-a-j code; but decoding the Braille for
e.g. 12a is ambiguous, IIUC).

/kent k




Re: Unicode Public Review Issues update: BRAILLE

2003-10-07 Thread Asmus Freytag
At 10:32 AM 10/7/03 +0530, [EMAIL PROTECTED] wrote:
 The only justification mentioned so far for changing Braille from So to
 Lo is to be able to use Braille in identifiers. I'm not sure why
 someone would want to use Braille in this way; for a start, how would
 these identifiers be translated into Braille?

Braille identifiers only make sense when the whole source file has been
translated to Braille. However, the parsing semantics applied to it
should then be determined by the properties of the original characters
(before applying the Braille mapping). If one does want to work directly
with a Braille-transcoded stream, then such a system must support
*dynamic* property assignments. That's something that's outside the
scope of the Unicode Standard.

In conclusion, it seems that the correct set of *default* properties for
Braille would be determined by the needs of inserting Braille strings
into other text (for educational manuals and similar specifications).

As Marco has pointed out, that means BIDI=L, and I believe it also means
GC=So, with other properties assigned as they are for other characters
that share BIDI=L and GC=So.

A./



Re: Unicode Public Review Issues update

2003-10-07 Thread Chris Jacobs

- Original Message - 
From: Marco Cimarosti [EMAIL PROTECTED]
To: 'Jony Rosenne' [EMAIL PROTECTED]; 'Asmus Freytag'
[EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 11:47 AM
Subject: RE: Unicode Public Review Issues update


 Jony Rosenne wrote:
  I don't remember whether Hebrew Braille is written RTL or LTR.

 Braille is always LTR, even for Hebrew and Arabic.

 To be more precise, Braille is always LTR when you read it, but RTL
 when you write it manually (because it is engraved on the back side of
 the paper, using a dotted rule and a stylus).

Looks like those dots in the paper have the Mirrored property:

If I take a piece of paper and make dots, e.g. 2,3,4, with the 1,2,3,7
column to the left and the 4,5,6,8 column to the right, and then look at
the back side, I see the 1,2,3,7 column displayed to the right.
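
In Unicode terms that mirroring is easy to compute, since Braille
patterns are encoded as U+2800 plus a bitmask of the raised dots; a
small sketch of mine:

# Mirroring a Braille cell left-right, as when writing on the back of
# the paper. Unicode Braille patterns are U+2800 + bitmask (dot1=0x01,
# dot2=0x02, dot3=0x04, dot4=0x08, dot5=0x10, dot6=0x20, dot7=0x40,
# dot8=0x80), so mirroring swaps dots 1<->4, 2<->5, 3<->6 and 7<->8.
def mirror(cell):
    v = ord(cell) - 0x2800
    m = ((v & 0x07) << 3) | ((v & 0x38) >> 3) \
        | ((v & 0x40) << 1) | ((v & 0x80) >> 1)
    return chr(0x2800 + m)

# The dots 2,3,4 of the example come back as dots 1,5,6:
print(hex(ord(mirror("\u280E"))))  # 0x2831, BRAILLE PATTERN DOTS-156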


 _ Marco








Re: Unicode Public Review Issues update (braille)

2003-10-07 Thread Mark E. Shoulson
Kent Karlsson wrote:

  The original model for these was that your text processing is done in
  non-Braille, and on the last leg to a device, you would transcode the
  regular text to a Braille sequence using a domain and language
  specific mapping. Having the codes in Unicode allows you to preserve
  'final form' and transmit that as needed w/o having to also transmit
  the text-to-braille mapping(s) that were used to generate the Braille
  version of the text. (This assumes that the eventual human reader can
  do 'autodetection'.)

 This does not apply to text that has been manually written in or
 translated to Braille (for a particular language). As I have understood
 it, writers/transcribers often use(d) peculiar writings, e.g.
 abbreviations that would not occur in normal text, and the
 abbreviations varied from scribe to scribe. I'm not familiar with the
 details though. Braille can also be used for math and music notation.
I'm not sure about variation among users.  I know that Braille as used 
for English (at least in America) has a standard set of short forms (I 
studied Grade II Braille, as it is called, a bit), including symbols for 
common letter-combinations, one-letter abbreviations for common words, 
and sort of escape symbol+letter abbreviations for common word-endings 
and suffixes.

 B.t.w., Braille often uses state shifts, e.g. for digits. There is a
 digits Braille code, followed by one or more codes for the letters a-j
 (if the basic script is Latin), which then stand for 1, ..., 9, 0 (the
 list is terminated by any non-a-j code; but decoding the Braille for
 e.g. 12a is ambiguous, IIUC).

Not so.  There is, indeed, a Braille symbol for numbers which, when
followed by one or more letters a-j, makes the following string digits
instead of letters.  There is also, however, a corresponding letter
sign that can be used to cancel the effect of the number shift, or
to disambiguate in the case of an isolated symbol that might otherwise
be confusing.  Both of these, I believe (and can look up), are also used
as escape characters in making suffix short-forms, and are unambiguous
because as letter/number shifts they appear at the beginning of a
string, and not in the middle as they would for suffix short-forms.
(Which raises the question of how to encode "a12". I presume that
lettersign a numbersign ab would work for the same reason that
numbersign ab lettersign a works for "12a".)
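
Here is a minimal decoder sketch for that mechanism (mine; Grade 1
English letter cells assumed, only a-j shown, and real Braille also ends
the number shift at any non-a-j cell, which this sketch ignores):

# The number sign is dots 3456 (U+283C), the letter sign dots 56
# (U+2830); between them, the cells for a-j double as digits 1..9,0.
NUMBER_SIGN = "\u283C"
LETTER_SIGN = "\u2830"
LETTERS = {0x01: "a", 0x03: "b", 0x09: "c", 0x19: "d", 0x11: "e",
           0x0B: "f", 0x1B: "g", 0x13: "h", 0x0A: "i", 0x1A: "j"}
DIGITS = dict(zip("abcdefghij", "1234567890"))

def decode(cells):
    out, numeric = [], False
    for cell in cells:
        if cell == NUMBER_SIGN:
            numeric = True            # a-j now mean 1..9,0
        elif cell == LETTER_SIGN:
            numeric = False           # cancel the number shift
        else:
            ch = LETTERS.get(ord(cell) - 0x2800, "?")
            out.append(DIGITS[ch] if numeric and ch in DIGITS else ch)
    return "".join(out)

# "12a" encoded as: numbersign a b lettersign a
print(decode("\u283C\u2801\u2803\u2830\u2801"))  # -> 12a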

~mark





Re: Unicode Public Review Issues update: BRAILLE

2003-10-07 Thread Kenneth Whistler
Asmus said:

 In conclusion, it seems that the correct set of *default* properties for 
 Braille would be determined by the needs of inserting Braille strings into 
 other text (for educational manuals and similar specifications).
 
 As Marco has pointed out that means BIDI = L and I believe it also means 
 GC=So, and other properties assigned as they are for other characters that 
 share BIDI=L and GC=So.

Which is a ton of them, including all the squared CJK, circled
symbols, and most of the musical symbols.

The upshot so far seems to be that there is little reason to
change gc=So -> gc=Lo, but that some feel that there is better
reason to change bc=ON -> bc=L for the Braille symbols.

--Ken




Re: Unicode Public Review Issues update

2003-10-06 Thread jon
 The Unicode Technical Committee has posted some new issues for public  
 review and comment. Details are on the following web page:
 
   http://www.unicode.org/review/

A question about the issues already open: What is the justification for
proposing to make Braille Lo?







Re: Unicode Public Review Issues update

2003-10-06 Thread Asmus Freytag
At 10:29 AM 10/6/03 +0530, [EMAIL PROTECTED] wrote:
  The Unicode Technical Committee has posted some new issues for public
  review and comment. Details are on the following web page:

    http://www.unicode.org/review/

 A question about the issues already open: What is the justification
 for proposing to make Braille Lo?
Among other things it would make it part of identifiers. However, there's 
been some suggestion that this is a bad idea. Whether or not a braille 
symbol actually stands for a letter or a digit or a punctuation mark is 
entirely dependent on a higher level protocol.

Also, by making them Lo, any parser that tries to collect words would
run them together with any surrounding regular letters and digits. That
seems odd, but perhaps it's not any more odd than mixing Devanagari and
Han.

We've given Braille a script ID, since it's used for running text, unlike a 
string of symbols.

There was a lot of discussion in the meeting which is the reason why UTC is 
asking for public input before deciding.

The original model for these was that your text processing is done in 
non-Braille, and on the last leg to a device, you would transcode the 
regular text to a Braille sequence using a domain and language specific 
mapping. Having the codes in Unicode allows you to preserve 'final form' 
and transmit that as needed w/o having to also transmit the text-to-braille 
mapping(s) that were used to generate the Braille version of the text. 
(This assumes that the eventual human reader can do 'autodetection'.)

Needless to say, conceived this way, Braille does not fit neatly into
Unicode's text handling model. The General Category, being very
simplistic, can only express a single aspect of a character's use.
Usually we can agree on what that primary aspect is, so gc is reasonably
useful as a quick cut. However, Braille is a bit resistant if put to the
question: are you a symbol or a letter?
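
For reference, the property as it stands is easy to check with Python's
unicodedata module (a trivial sketch):

import unicodedata

# The Braille patterns currently carry gc=So (Symbol, other); the PRI
# asks whether they should become Lo (Letter, other).
cell = "\u2801"  # 'a' under many alphabetic Braille mappings
print(unicodedata.name(cell))      # BRAILLE PATTERN DOTS-1
print(unicodedata.category(cell))  # 'So' today; 'Lo' under the proposal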

In reality, the Braille codes are glyph codes. We decided at some point not 
to allow any new types of gc values. If we didn't have that restriction, we 
could assign them an *Sb or *Lb (for *Symbol-Braille or *Letter-Braille). 
But that's an option we don't have.

One thing that we are hoping to learn is whether people are actually using 
these Braille codes and are using them in ways that are or are not 
compatible with the model we describe in 
http://www.unicode.org/versions/Unicode4.0.0/ch14.pdf (see section 14.9). 
In terms of the organization of the book we've clearly sorted Braille among 
the symbols, by the way.

Any comments?

A./





Re: Unicode Public Review Issues update

2003-10-06 Thread Florian Weimer
Rick McGowan wrote:

 The Unicode Technical Committee has posted some new issues for public  
 review and comment. Details are on the following web page:
 
   http://www.unicode.org/review/

Maybe I'm missing something, but I still can't find any reference that
the Unihan.txt file will be released under a license that permits
redistribution (which has been announced in other documents).



RE: Unicode Public Review Issues update

2003-10-06 Thread Jony Rosenne
Please note that Braille is used also for Hebrew. We use the same codes, but
they are assigned a different meaning. The reader has to know or guess which
language it is.

I don't remember whether Hebrew Braille is written RTL or LTR.

Jony






Re: Unicode Public Review Issues update

2003-10-06 Thread Rick McGowan
Florian Weimer asked:

  http://www.unicode.org/review/

 Maybe I'm missing something, but I still can't find any reference that
 the Unihan.txt file will be released under a license that permits
 redistribution (which has been announced in other documents).

Ah, you're right. It will have the same distribution as everything else;
just nobody's had time to update it yet since the last draft was posted.
It's already in the works. If you read the 4.0 UCD, it says the Unihan
file is intended to have the same distribution.

Rick





Re: Unicode Public Review Issues update: BRAILLE

2003-10-06 Thread Christopher John Fynn
- Original Message - 
From: Jony Rosenne [EMAIL PROTECTED]

 Please note that Braille is used also for Hebrew. We use the same
 codes, but they are assigned a different meaning. The reader has to
 know or guess which language it is.

 I don't remember whether Hebrew Braille is written RTL or LTR.

 Jony

Braille is probably used for a lot of scripts, maybe even *most* scripts
used for modern languages - in Bhutan I know they use Braille for writing
Dzongkha (Tibetan script).

 - Chris




Soft-dotted (was: RE: Unicode Public Review Issues update)

2003-06-30 Thread Kent Karlsson


Re. the ij ligature and soft-dottedness:

This is a compatibility letter, both in the sense that it has a
compatibility mapping and in that it is taken from a legacy character
encoding. It is, however, not necessarily a character that should not be
used. Even though in most cases it is sufficient to use ordinary i and j
in sequence to write the Dutch ij, in some cases it may still be best to
use the ij ligature character, for best spacing, titlecasing, and
vertical layout.

This was discussed on this list a while ago. Also discussed on this
list was the following:

The ij can in some cases be acute-accented, in which case both of the
dots should be replaced by acute accents. The representation of this is
quite straightforward if ordinary i and j are used. If, however, the ij
ligature is used, the ij ligature must first of all be soft-dotted.
Applying just an ordinary combining acute accent to produce an accent on
each part of the ligature may be a bit strange. It may be more logical
to apply a combining double acute accent to the ij ligature to get the
desired effect. A (small) typographic problem is to align the two acute
accents over the constituent letter bodies. It is not certain that a
grave accent is applicable to the ij, but if it is, the story is
similar. At least one Dutch dictionary (Ter Laan, Nieuw Groninger
Woordenboek, 1929) uses a macron over the ij. If coded as separate i and
j, one would use U+035E COMBINING DOUBLE MACRON in between. If the ij
ligature is used, a combining macron should be applied after the
ligature character to get a macron that goes over both of the dotless
constituent letter bodies.

In each of these cases (of accented ij ligature), the ij ligature must
be soft-dotted.

 Interesting issue for the Latin Small ij Ligature (U+0133):
 Normally the Soft_Dotted is supposed to make one dot disappear when
 there's an additional diacritic above, but many applications may
 keep these two dots above, fitting the diacritic in the middle.

Examples? (Of actual use of such a letter+marks combination, not of
applications that currently do what you say.)

 This proposal would mean that this becomes illegal, and it would
 promote the use of an additional intermediate dot-above diacritic if
 the dot must be kept.

 What would be the interpretation of this dot added on top of the
 ligature? Should it still be a single dot centered above the ij

Probably. Examples of use?

 digraph, requiring two dots to be encoded if both i and j must
 have their own dot above?

They would stack on top of each other (unless you want additional
special rules, which seems very uncalled for).

 Or would this require using a diaeresis instead, centered above the
 digraph?

Probably. But are there any examples of this in use (ever, not
necessarily Unicode encoded, or at all digitally encoded)? If that kind
of thing has never occurred before, it does not really matter very much,
and some coarse approximation will do fine.

 For the modifier letter j or Greek letter yot, this is less ambiguous.

 The proposal however is fine for the mathematical variants of i and j
 (including the double-struck italic, for unification reasons).

I think so too (though I don't know what you mean by 'unification'
here). But there are also cases where math diacritics go over a larger
expression than just a single-letter variable name (the span being
expressed via markup). It is probably not wise to automatically remove
the dots on i's and j's in such cases. However, for cases where a math
diacritic goes on top of just a single-letter i-like or j-like name, the
dot should automatically be removed (or, rather, an alternate dotless
glyph be used). For other cases, like more-than-a-variable expressions
getting a diacritic, or when an actual undotted i or j is desired
(compare TeX's \imath and \jmath), the dotless i and dotless j
characters should be used. (But that is another matter, though related
to the soft-dotted issue.)

/kent k


 (Note I also posted this comment in the online report form)
 
 -- Philippe.
 
 




Re: Soft-dotted (was: RE: Unicode Public Review Issues update)

2003-06-30 Thread Philippe Verdy
On Monday, June 30, 2003 1:33 PM, Kent Karlsson [EMAIL PROTECTED] wrote:

  Or would this require using a diaeresis instead, centered above the
  digraph?

 Probably. But are there any examples of this in use (ever, not
 necessarily Unicode encoded, or at all digitally encoded)? If that kind
 of thing has never occurred before, it does not really matter very
 much, and some coarse approximation will do fine.

I admit this is quite coarse. But as there does not seem to exist any
language for which a single or double dot above would be used over this
character, I think that a sequence like

ij, combining dot above

would be rendered as a dotless ij pair with a single centered dot above,
using the diaeresis to make a version with two dots.

So if one really wants to emulate the past bad behavior of some old
fonts for ij, combining acute accent, where the dots were kept, he could
now use:

ij, combining diaeresis, combining acute accent

(using the new Soft_Dotted property of ij, which removes its dots when
combining with any diacritic of combining class 230 (above), including
the diaeresis, so that this will produce exactly two dots, and not
four).

  For the modifier letter j or Greek letter yot, this is less
  ambiguous.

  The proposal however is fine for the mathematical variants of i and j
  (including the double-struck italic, for unification reasons).

 I think so too (though I don't know what you mean by 'unification'
 here).

I am speaking about the few holes in the mathematical alphanumeric
symbols block, which correspond to characters already encoded in other
blocks. So if the update is accepted for the new mathematics block, it
must also be accepted for those previously encoded characters with
which the holes were unified.
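
Those holes are easy to enumerate (a sketch of mine; it needs a Python
whose unicodedata covers the block):

import unicodedata

# The Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF) has
# unassigned holes where a letter was already encoded elsewhere, e.g.
# U+1D455 is missing because PLANCK CONSTANT (U+210E) already serves as
# the italic small h.
holes = [cp for cp in range(0x1D400, 0x1D800)
         if not unicodedata.name(chr(cp), "")]
print(len(holes), ["U+%04X" % cp for cp in holes[:5]])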

-- Philippe.



Re: Unicode Public Review Issues update

2003-06-27 Thread Philippe Verdy
On Friday, June 27, 2003 10:29 PM, Rick McGowan [EMAIL PROTECTED]
wrote:

 The Unicode Technical Committee has posted a new issue for public
 review and comment. Details are on the following web page:

 http://www.unicode.org/review/

 Briefly, the new issue is:

 Issue #11  Soft Dotted Property
 Proposal: The Unicode Standard has the principle that if an accent
 is applied to an i or j, the base character loses its dot. Such
 characters are called soft-dotted. The UTC proposes to extend
 this property to a number of characters that do not currently have
 the property. The accompanying document lists the characters...

Interesting issue for the Latin Small ij Ligature (U+0133):
Normally the Soft_Dotted is supposed to make one dot disappear when
there's an additional diacritic above, but many applications may
keep these two dots above, fitting the diacritic in the middle.

This proposal would mean that this becomes illegal, and it would promote
the use of an additional intermediate dot-above diacritic if the dot
must be kept.

What would be the interpretation of this dot added on top of the
ligature? Should it still be a single dot centered above the ij
digraph, requiring two dots to be encoded if both i and j must
have their own dot above?
Or would this require using a diaeresis instead, centered above the
digraph?

For the modifier letter j or Greek letter yot, this is less ambiguous.

The proposal however is fine for the mathematical variants of i and j
(including the double-struck italic, for unification reasons).

(Note I also posted this comment in the online report form)

-- Philippe.




Re: Unicode Public Review Issues update

2003-03-18 Thread Yung-Fong Tang
URL please?

Rick McGowan wrote:

The Unicode Public Review Issues page has been updated today.

Highlights:

   Closed issue #1 (Language tag deprecation) without any change.
   Updated some deadlines on other issues to June 1, 2003.
   Added a document for issue #7 (tailored normalizations).
   Added an issue #8 regarding properties of math digits.
Regards,
Rick McGowan
Unicode, Inc.
 






Re: Unicode Public Review Issues update

2003-03-18 Thread David Starner
On Tue, Mar 18, 2003 at 10:26:49AM -0800, Yung-Fong Tang wrote:
 url please
 
 Rick McGowan wrote:
 
 The Unicode Public Review Issues page has been updated today.

http://www.google.com

(Yes, it would have been nice to have a URL in the message, but it's
not hard to find the page.)

-- 
David Starner - [EMAIL PROTECTED]
Einstein once said that it would be hard to teach in a co-ed college since
guys were only looking on girls and not listening to the teacher. He was
objected that they would be listening to _him_ very attentively, forgetting
about any girls. But such guys won't be worth teaching, replied the great
man.