Re: Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ

2002-07-12 Thread Kenneth Whistler

Barry Caplan wrote:

> >> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
> >> >Unicode is a character set. Period. 
> >> 
> >> Each character has numerous 
> >> properties in Unicode, whereas they generally don't in legacy 
> >> character sets.
> >
> >Each character, or some characters?
> 
> 
> For all intents and purposes, each character. 
> So, each character has at least one attribute. 

Yes. The implications of the Unicode Character Database include
the determination that the UTC has normatively assigned properties
(multiple) to all Unicode encoded characters.

Actually, it is a little more subtle than that. There are some
properties which accrue to code points. The General Category and
the Bidirectional Category are good examples, since they constitute
enumerated partitions of the entire codespace, and APIs need to 
return meaningful values for any code point, including unassigned ones.

Other properties accrue more directly to characters, per se.
They attach to the abstract character, and get associated with
a code point more indirectly by virtue of the encoding of that
character. The numeric value of a character would be a good example
of this. No one expects an unassigned code point or an assigned
dingbat character or a left bracket to have a numeric value property
(except perhaps a future generation of Unicabbalists).
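
The code-point versus character distinction can be poked at with Python's
standard unicodedata module. This is only a rough sketch: the module reflects
whatever version of the character database ships with the interpreter, the
helper name "describe" is made up for illustration, and U+FFFF (a
noncharacter, General Category Cn) stands in here for an unassigned code point.

```python
import unicodedata

def describe(cp):
    """Return (General Category, numeric value or None) for a code point."""
    ch = chr(cp)
    # General Category partitions the whole codespace, so this call
    # returns a value for every code point, assigned or not.
    category = unicodedata.category(ch)
    # Numeric value belongs to particular characters; the second argument
    # is the default handed back when the character has none.
    numeric = unicodedata.numeric(ch, None)
    return category, numeric

# DIGIT FIVE, LEFT SQUARE BRACKET, AIRPLANE (a dingbat), and a noncharacter
for cp in (0x0035, 0x005B, 0x2708, 0xFFFF):
    print(f"U+{cp:04X}", describe(cp))
```

Only DIGIT FIVE comes back with a numeric value; the bracket, the dingbat and
U+FFFF each still get a General Category but no numeric value.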
 
> There are no corresponding features in other character sets usually.

Correct. Before the development of the Unicode Standard, character
encoding committees tended to leave property assignments
either up to implementations (considering them obvious) or up
to standardization committees whose charter was "character
processing" -- e.g. SC22/WG15 POSIX in the ISO context.

The development of a Universal character encoding necessitated
changing that, bringing character property development and
standardization under the same roof as character encoding.

Note that not everyone agrees about that, however. We are
still having some rather vigorous disagreements in SC22 about
who "owns" the problem of standardization of character properties.

> A common definition of "character set" is a list of characters 
> you are interested in assigned to codepoints. That fits most 
> legacy character sets pretty well, but Unicode is sooo much 
> more than that.

Roughly the distinction I was drawing between "the Unicode CCS"
and "the Unicode Standard".

> But what if we took a look at it from a different point of view, 
> that the standard is an agreed-upon set of rules and building 
> blocks for text oriented algorithms? Would people start to 
> publish algorithms that extend on the base data provided so 
> we don't have to reinvent wheels all the time?

Well the "Unicode Standard" isn't that, although it contains
both formal and informal algorithms for accomplishing various
tasks with text, and even more general "guidelines" for how to
do things.

The members of the Unicode Technical Committee are always
casting about for areas of Unicode implementation behavior
where commonly defined, public algorithms would be mutually
beneficial for everyone's implementations and would assist
general interoperability with Unicode data.

To date, it seems to me that the members, as well as other
participants in the larger effort of implementing the Unicode
Standard, have been rather generous in contributing time
and brainpower to this development of public algorithms. The
fact that ICU is an Open Source development effort is enormously
helpful in this regard.

> If I were to stand in front of a college comp sci class, 
> where the future is all ahead of the students, what proportion 
> of time would I want to invest in how much they knew about legacy 
> encodings versus how much I could inspire them to build from and 
> extend what Unicode provides them?

This problem, of Unicode in the computer science curriculum,
intrigues me -- and I don't think it has received enough attention
on this list.

One of my concerns is that even now it seems to me that CS
curricula not only don't teach enough about Unicode -- they basically
don't teach much about characters, or text handling, or anything
in the field of internationalization. It just isn't an area that
people get Ph.D.'s in or do research in, and it tends to get
overlooked in people's education until they go out, get a job
in industry and discover that in the *real* world of software
development, they have to learn about that stuff to make software
work in real products. (Just like they have to do a lot of
seat-of-the-pants learning about a lot of other topics: building,
maintaining, and bug-fixing for large, legacy systems; software
life cycle; large team cooperative development process;
backwards compatibility -- almost nothing is really built from
scratch!)

> 
> The major work ahead is no longer in the context of building 
> a character standard. Time is fast approaching to decide to keep 
> it small and apply a bit of polish, or focus on the use and 

Re: What Unicode Is (was RE: Inappropriate Proposals FAQ)

2002-07-12 Thread Barry Caplan

At 03:54 PM 7/12/2002 -0700, Kenneth Whistler wrote:
>Suzanne responded:
>
>> > Maybe Unicode is more of a shared set of rules that apply to 
>> > low level data structures surrounding text and its algorithms 
>> > than a character set.
>
>O.k., so now before asserting or denying that "Unicode ... is
>a shared set of rules", it would be helpful to pin down
>first what you are referring to. That might make the ensuing
>debate more fruitful.
Actually, it was me, not Suzanne, that called "Unicode" a shared set of rules. As 
Ferris Bueller once said "I'll take the heat for this." 

I was aware of all of the uses of Unicode that you listed. I have no quarrels with any 
of them. They do point to the fact that the word is overloaded with definitions. Which 
means that readers have to choose the appropriate one from the context. The context of 
the statement above is that the "Unicode" referred to is the Standard, and all 
associated documentation. Not Unicode the Consortium, which manages the Standard. Not 
Unicode the way of life :)

I did intend to throw open a debate about the long term future of Unicode the Standard 
and by extension Unicode the Consortium. Since Suzanne is writing "What is Unicode and 
is not Unicode FAQ", I think the answer to that is going to be very definitely colored 
by the answer to the related question "What will Unicode become?", e.g. Unicode 6.0, 
7.0, 8.0, etc. 

See my previous msg, subject line: "Hmm, this evolved into an editorial when I wasn't 
looking :) " for some thoughts on that subject.


Barry Caplan
www.i18n.com





What Unicode Is (was RE: Inappropriate Proposals FAQ)

2002-07-12 Thread Kenneth Whistler

Suzanne responded:

> > Maybe Unicode is more of a shared set of rules that apply to 
> > low level data structures surrounding text and its algorithms 
> > than a character set.
> 
> Sounds like the start of a philosophical debate. 
> 
> If Unicode is described as a set of rules, we'll be in a world of hurt.

> (On a serious note, these exceptions are exactly what make writing some
> sort of "is and isn't" FAQ pretty darned hard. 

Hmm. Since the discussion which started out trying to specify a
few examples of what kinds of entities would be inappropriate to
proffer for encoding as Unicode characters seems to be in danger
of mutating into the recurrent "What is Unicode?" question,
perhaps it's time to start a new thread for the latter.

And now for some ontological ground rules.

When trying to decide what a "thing" is, it helps not to use
an attribute nominatively, since that encourages people to
privately visualize the noun the attribute is applied to,
but to do so in different ways -- and then to argue past each
other because they are, in the end, talking about different
things.

"Unicode" is used attributively of a number of things, and
if we are going to start arguing/discussing what "it" is, it
would be better to lay out the alternative "it"s a little
more specifically first.

1. The Unicode *Consortium* is a standardization organization.
It started out with a charter to produce a single standard,
but along the way has expanded that charter, in response to
the desire of its membership. In addition to "The Unicode
Standard", it now has adopted a terminology that refers to
some of its other publications as "Unicode Technical Standards"
[UTS], of which two formally exist now: UTS #6 SCSU, and
UTS #10 Unicode Collation Algorithm [UCA].

It is important to keep this straight, because some people,
when they say "Unicode" are talking about the *organization*,
rather than the Unicode Standard per se. And when people talk
about "the standard", they are generally referring to "The
Unicode Standard", but the Unicode Consortium is actually
responsible for several standards.

2. The Unicode *Standard* itself is a very complex standard, consisting
of many pieces now. To keep track of just what something like
"The Unicode Standard, Version 3.2" means, we now have to
keep web pages enumerating all the parts exactly -- like
components in an assemble-your-own-furniture kit. See:
http://www.unicode.org/unicode/standard/versions/

In any one particular version, the Unicode Standard now consists
of a book publication, some number of web publications
(referred to as Unicode Standard Annexes [UAX]), and a
large number of contributory data files -- some normative and
some informative, some data and some documentation. These
definitions, including the exact list of contributory
data files and their versions, are themselves under tight
control by the Unicode Technical Committee, as they constitute
the very *definition* of the Unicode Standard. It is not
by accident that the version definitions start off now with
the following wording:

"The Unicode Standard, Version 3.2.0 is defined by the following
list..."

and so on for earlier versions.

3. The Unicode *Book* is a periodic publication, constituting the
central document for any given version of the Unicode *Standard*,
but is by no means the entire standard. The book, in turn,
is very complex, consisting of many chapters and parts, some
of which constitute tightly controlled, normative specification,
and some of which is informative, editorial content.

The "book" now also exists in an online version (pdf files):
http://www.unicode.org/unicode/uni2book/u2.html
which is *almost* identical to the published hardcover book,
but not quite. (The Introduction is slightly restructured,
the online glossary is restructured and has been added to,
the charts are constructed slightly differently and have
introductory pages of their own, etc.)

4. The Unicode *CCS* [coded character set] is the mapping of the
set of abstract characters contained in the Unicode repertoire
(at any given version) to a bunch of code points in the
Unicode codespace (0x0000..0x10FFFF). Technically speaking, it
is the Unicode *CCS* which is synchronized closely with
ISO/IEC 10646, rather than the Unicode *Standard*. 10646 and
the Unicode CCS have exactly the same coded characters (at
various key synchronization points in their joint publication
histories), but the *text* of the ISO/IEC 10646 standard doesn't
look anything like the *text* of the Unicode Standard, and the
Unicode Standard [sensum #2 above] contains all kinds of
material, both textual and data, that goes far beyond the scope
of 10646. 

There are other standards produced by some national
bodies that are effectively just translations of 10646
(GB 13000 in China, JIS X 0221 in Japan), but the Unicode Standard
is nothing like those.
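
The notion of a coded character set as a mapping from abstract characters
to code points can be sketched in Python. This assumes the standard
unicodedata module and uses character names as a stand-in for "abstract
characters"; code_point_of and in_codespace are names invented here, not
anything defined by the standard.

```python
import sys
import unicodedata

CODESPACE_MAX = 0x10FFFF  # last code point of the Unicode codespace

def code_point_of(name):
    """Map a character name (standing in for the abstract character)
    to the code point it is encoded at."""
    return ord(unicodedata.lookup(name))

def in_codespace(cp):
    """True if cp is a code point of the Unicode codespace."""
    return 0 <= cp <= CODESPACE_MAX

print(hex(code_point_of("LATIN SMALL LETTER A")))
print(hex(code_point_of("HIRAGANA LETTER A")))
print(sys.maxunicode == CODESPACE_MAX)  # Python strings cover the same range
print(in_codespace(0x110000))           # one past the end
```

Nothing in this mapping says anything about properties, algorithms or
rendering, which is the gap between the CCS and the Standard as a whole.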

Finally, the attribute "Unicode ..." can be applied to all
kinds of other "things" characteristic of the Unicode Sta

Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ

2002-07-12 Thread Barry Caplan

At 05:13 PM 7/12/2002 -0400, Suzanne M. Topping wrote:
>> -----Original Message-----
>> From: Barry Caplan [mailto:[EMAIL PROTECTED]]
>> 
>> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
>> >Unicode is a character set. Period. 
>> 
>> Each character has numerous 
>> properties in Unicode, whereas they generally don't in legacy 
>> character sets.
>
>Each character, or some characters?


For all intents and purposes, each character. Chapter 4.5 of my Unicode 3.0 book says 
" The Unicode Character Database on the CDROM defines a General Category for all 
Unicode characters"

So, each character has at least one attribute. One could easily say that each 
character also has an attribute for "isUpperCase" of either true or false, and so on.
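
As a rough illustration of that "isUpperCase" idea, here is a sketch in
Python using the standard unicodedata module. The function name is
hypothetical, and treating General Category Lu as "upper case" is a
simplifying assumption that ignores titlecase letters and other refinements.

```python
import unicodedata

def is_upper_case(ch):
    """Hypothetical boolean attribute derived from the General Category."""
    return unicodedata.category(ch) == "Lu"

# Latin capital, Latin small, Greek capital sigma, and a digit
for ch in ("A", "a", "\u03a3", "5"):
    print(repr(ch), unicodedata.category(ch), is_upper_case(ch))
```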

There are no corresponding features in other character sets usually.


>> Maybe Unicode is more of a shared set of rules that apply to 
>> low level data structures surrounding text and its algorithms 
>> than a character set.
>
>Sounds like the start of a philosophical debate. 

Not really. I have been giving presentations for years, and I have seen many others 
give similar presentations. A common definition of "character set" is a list of 
characters you are interested in assigned to codepoints. That fits most legacy 
character sets pretty well, but Unicode is sooo much more than that.



>If Unicode is described as a set of rules, we'll be in a world of hurt.


Yeah, one of the heaviest books I own is Unicode 3.0. I keep it on a low shelf so the 
book of rules describing Unicode doesn't fall on me for just that reason. This is 
earthquake country after all  :)


>I choose to look at this stuff as the exceptions that make the rule.


I don't really know if it is possible to break down Unicode into more fundamental 
units if you started over. Its complexity is inherent in the nature of the task. My 
own interest is more in getting things done with data and algorithms that use the type 
of material represented by the Unicode standard, more so than the arcana of the 
standard itself. So it doesn't bother me so much that there are exceptions - as long 
as we have the exceptions that everyone agrees on, that is fine by me because it means 
my data and at least some of my algorithms are likely to be preservable across systems.


>(On a serious note, these exceptions are exactly what make writing some
>sort of "is and isn't" FAQ pretty darned hard. 


Be careful what you ask for :)


>I can't very well say
>that Unicode manipulates characters given certain historical/legacy
>conditions and under duress. 

Why not? It is true.

But what if we took a look at it from a different point of view, that the standard is 
an agreed-upon set of rules and building blocks for text oriented algorithms? Would 
people start to publish algorithms that extend on the base data provided so we don't 
have to reinvent wheels all the time?

I'm just brainstorming here, this is all just coming to me now. 

If I were to stand in front of a college comp sci class, where the future is all ahead 
of the students, what proportion of time would I want to invest in how much they knew 
about legacy encodings versus how much I could inspire them to build from and extend 
what Unicode provides them?

Seriously, most of the folks on this list that I know personally, and I include myself 
in this category, are approaching or past the halfway point in our careers. What would 
we want the folks who are just starting their careers now to know about Unicode and do 
with it by the time they reach the end of theirs, long after we have stopped working?

For many applications, people are not going to specialize in i18n/l10n issues. They 
need to know what the appropriate text-based building blocks are, and how they can 
expand on them while still building whatever they are working on.

Unicode at least hints at this with the bidi algorithm. Moving forward, should other 
algorithms be codified into Unicode, or as separate standards or de facto standards? I 
am thinking of "Japanese word splitting algorithm". There are proprietary products 
that do this today with reasonable but not perfect results. Are they good enough that 
the rules can be encoded into a standard? If so, then someone would build an open 
implementation, and then there would always be this building block available for 
people to use.

I am sure everyone on this list can think of their own favorite algorithms of this 
type, based on the part of Unicode that interests you the most. My point is that the 
raw information already in Unicode *does* suggest the next level of usage, and the 
repeated newbie questions that inspired this thread suggest the need for a 
comprehensive solution at a higher level than a character set provides. Maybe part of 
this means including or at least facilitating the description of low-level text 
handling algorithms.

>If I did, people would be scurrying around
>trying to figure out how to foment the duress.)


The acc

Status update re. Inappropriate Proposals FAQ

2002-07-12 Thread Suzanne M. Topping

I'm nearly done playing catchup after vacation and hope to begin
extracting concepts for the FAQ next week. 

Thanks to all who've submitted input, as conflicting and varied as it
is/was.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 03, 2002 8:53 
> 
> 
> I would like to once again suggest that we refocus this 'FAQ' 
> 
> AWAY from a repetition of the "Principles and Procedures" 
> document maintained
> by WG2 and containing the explanation of what constitutes a 
> valid *formal*
> proposal.
 




RE: Inappropriate Proposals FAQ

2002-07-12 Thread Suzanne M. Topping

> -----Original Message-----
> From: Barry Caplan [mailto:[EMAIL PROTECTED]]
> 
> At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
> >Unicode is a character set. Period. 
> 
> Each character has numerous 
> properties in Unicode, whereas they generally don't in legacy 
> character sets.

Each character, or some characters?

> Maybe Unicode is more of a shared set of rules that apply to 
> low level data structures surrounding text and its algorithms 
> than a character set.

Sounds like the start of a philosophical debate. 

If Unicode is described as a set of rules, we'll be in a world of hurt.

> >The Unicode Consortium very wisely keeps its focus narrow. It provides
> >a mechanism for specifying characters. Not for manipulating them, not
> >for describing them, not for making them twinkle.
> 
> All true, except for some special cases (BOM, bidi issues and 
> algorithms, vertical variants, etc.). Not saying those 
> shouldn't be in there, just that they are useful only in the 
> use of algorithms that are explicit (bi-di) or assumed (upper 
> case/lower case, vertical/horizontal) etc.


Why mess up a nice clean statement simply because of a few hard facts? 


I choose to look at this stuff as the exceptions that make the rule.

(On a serious note, these exceptions are exactly what make writing some
sort of "is and isn't" FAQ pretty darned hard. I can't very well say
that Unicode manipulates characters given certain historical/legacy
conditions and under duress. If I did, people would be scurrying around
trying to figure out how to foment the duress.)




RE: Inappropriate Proposals FAQ

2002-07-12 Thread Barry Caplan

At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote:
>Unicode is a character set. Period. 


Well, maybe. But in a much broader sense than the character sets it subsumes in its 
listings. Each character has numerous properties in Unicode, whereas they generally 
don't in legacy character sets.

Maybe Unicode is more of a shared set of rules that apply to low level data structures 
surrounding text and its algorithms than a character set.

>The Unicode Consortium very wisely keeps its focus narrow. It provides
>a mechanism for specifying characters. Not for manipulating them, not
>for describing them, not for making them twinkle.

All true, except for some special cases (BOM, bidi issues and algorithms, vertical 
variants, etc.). Not saying those shouldn't be in there, just that they are useful only 
in the use of algorithms that are explicit (bi-di) or assumed (upper case/lower case, 
vertical/horizontal) etc.
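
A few of these special cases can be seen from Python's standard library.
This is a sketch only, assuming the codecs and unicodedata modules; it shows
the per-character data such algorithms consume, not the algorithms
themselves, and the sample characters are arbitrary picks.

```python
import codecs
import unicodedata

# Bidirectional class: the per-character input to the explicit bidi
# algorithm (Latin A, Hebrew alef, Arabic alef, digit one).
for ch in ("A", "\u05d0", "\u0627", "1"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))

# Case mapping: an "assumed" algorithm driven by character data; note
# that one character can map to two.
print("stra\u00dfe".upper())

# BOM: U+FEFF encoded at the start of a stream reveals the byte order.
print("\ufeff".encode("utf-16-be") == codecs.BOM_UTF16_BE)
print("\ufeff".encode("utf-16-le") == codecs.BOM_UTF16_LE)
```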

In many cases, these algorithms are not well known, even amongst the cognoscenti, or 
generally available in nice libraries. Anyone for an open source Japanese word 
splitting library? (I know not taking a look at ICU before I press send is going to 
come back to haunt me on this, but if it is in there, then substitute something that 
isn't :)

Barry Caplan
www.i18n.com





RE: Inappropriate Proposals FAQ

2002-07-11 Thread Suzanne M. Topping

Apologies for the delayed response to this thread, I've been out of
town. 

> -----Original Message-----
> From: William Overington [mailto:[EMAIL PROTECTED]]
> Sent: Friday, July 05, 2002 10:22 AM
> 
> For the avoidance of doubt I am not saying that the Unicode Technical
> Committee should necessarily accept such items as your 
> furniture idea for
> encoding, I am simply saying that any decision as to what may 
> be encoded and
> what shall and what shall not be encoded should be made by the Unicode
> Technical Committee on the basis of the scientific situation 
> at the time
> that an encoding proposal is formally considered.  I feel 
> that it would be a
> major error for the Unicode Consortium to publish a FAQ document which
> prejudices the fair consideration of characters based upon 
> new technologies
> which may arise in the future.

While your thoughts on executing the floor plan idea are truly
gobsmacking, I have to confess that I'd raised the concept precisely
because it is -not- an appropriate issue for Unicode.

Unicode is a character set. Period. 

When setting out on any endeavor, you have to be clear on what the
intent is. If you want to go to the park and have a picnic, you set out
parameters for that activity. If you allow a bunch of people to stop you
along the way to buy shoes, see a movie, visit their aunt in the
hospital, and get the oil changed in their car, you probably won't be
able to accomplish the initial goal. 

If you develop a program for creating room layouts using graphics of
furniture and architectural details, you probably shouldn't include
modules to manage the drug histories of AIDS victims. 

That doesn't make tracking drug histories of AIDS victims unimportant,
it means that they aren't a logical set of requirements to add to a room
layout program.

The Unicode Consortium very wisely keeps its focus narrow. It provides
a mechanism for specifying characters. Not for manipulating them, not
for describing them, not for making them twinkle.

You clearly have widely ranging ideas for unique text and symbol
applications. It would be great if you could channel that energy into
developing ideas for a manipulation layer that could take Unicode
characters, manipulate them, and deliver them in a cross-platform
portable way which would allow them to be displayed and used in the ways
that you envision.

As recent discussions on this list have shown, Unicode is just one piece
in the puzzle. Font and rendering issues for many languages remain
serious stumbling blocks for actual use, even though the characters
themselves are encoded. Any work you could do toward advancement of a
manipulation layer that would ease the task of rendering characters as
they are actually needed and used would be a tremendous boon. I would
imagine that you would find a reasonable level of interest from a wide
range of communities; font developers, bidi word processing developers,
accessibility experts, minority script advocates, etc. I'll bet that
some of the regular old Unicodies might even want to listen in. 

It would be sad if your energy and enthusiasm were dampened by the
repeated denials you receive through this list. The ideas you generate
are interesting, and often worth investigation. However, they are not
appropriate additions to Unicode.

I'm setting up a new group which can hopefully act as an appropriate
venue for these types of discussions. As soon as I come up with a decent
name for it, I'll send off an invitation with instructions for joining
to the Unicode list.

All the best,

Suzanne Topping
BizWonk Inc.
[EMAIL PROTECTED]
 




Re: Inappropriate Proposals FAQ

2002-07-05 Thread Michael Everson

William,

For the gods' sake rein in those hares.

Interchange protocols for architectural computer-aided design already 
exist. Character encoding does not apply to anything like that, 
because there aren't any characters. Object code has nothing to do 
with character encoding. Your caveat, that you are saying that

>any decision as to what may be encoded and what shall and what shall 
>not be encoded should be made by the Unicode Technical Committee on 
>the basis of the scientific situation at the time that an encoding 
>proposal is formally considered. I feel that it would be a major 
>error for the Unicode Consortium to publish a FAQ document which 
>prejudices the fair consideration of characters based upon new 
>technologies which may arise in the future.

is completely unnecessary. We know quite well what we are doing. We 
are hoping that with diligent study you will figure it out and get on 
board. But as Ken has said there is no scientific theory left to 
puzzle out. There may be arguments as to what specific symbols we wish 
to add (some people hate them, some people like them) and there the 
question is one of usage and the semantics of the symbols in general.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Inappropriate Proposals FAQ

2002-07-05 Thread William Overington

Suzanne M. Topping wrote as follows.

>I see the need for perhaps two entries: one which states clearly what
>Unicode is NOT, and another which lists a few examples of inappropriate
>proposals and why they would not be considered. This section would
>probably refer to the "what Unicode isn't" entry for support of the
>"why"s.
>
>I have a few ideas for fictional proposals to use as examples (my room
>layout idea, and Mark's 3-D Mr. Potato Head representation), but I could
>use another one or two if anyone feels creative. The closer to being
>believable, the better, I suppose. (An alternative would be to use
>real-life proposals, and state why they were not accepted, but I thought
>it more politic to keep it fictional...)

Well, having seen your furniture and room layout idea, presented in the
Unicode list, I figured out the method to use to enable your room layout
idea to be produced, using the technique, novel as far as I know, of
allowing a glyph to contain some software which could be obeyed by the
rendering system so as to rotate the points of the Bézier curves of the
contours of the glyphs of the items of furniture.  This seems to me to be
something of a breakthrough in the possibilities for fonts, as including
software inside a font which could be obeyed by the rendering system would
allow a rendering system to be customized from within a font.  It would seem
a pity to restrict the future development of the concept by a Unicode
Consortium issued FAQ document stating that Unicode will not encode such
symbols when it seems that it would be relatively straightforward to
implement such fonts.  The font would need to contain the software that is
to be obeyed.  This could be organized so as to be accessed when a glyph is
selected, with a central place within the font to store any subroutines
called from within the software of the individual glyphs.  If this software
were in some appropriate portable software format, then the specification of
the font format would perhaps not be that difficult, it could be part of an
advanced font format that supports both chromatic font information and
software in the fonts.  For example, the software in the font could be
specified to be written in 1456 object code.

http://www.users.globalnet.co.uk/~ngo/1456.htm

1456 object code already supports double precision floating point items,
integers, characters, strings, complex numbers and quaternions as standard
types.  Groups are also supported as a type experimentally.

Consideration of this concept of software within the font has led to
consideration of how the position and rotation angle of the individual items
of furniture could be set to an initial position from within the document
and also as to how they could be adjusted by the end user using facilities
set up from within the document and this has led to the idea of having the
document be able to open and customize a control panel, which control panel
could contain buttons and scrollbars and so on and also a polar scrollbar
for continuous rotational adjustment.  It would seem, given the fact that
1456 object code supports quaternions and also has some functions of a
quaternion variable built in as standard that this could be extended to
three-dimensional rotations quite straightforwardly for applications that
could use three-dimensional rotations.

This is the sort of computational power which I feel that multimedia should
be able to utilize, by including Unicode codes directly in a text file, so
that the rendering system produces the control panel as instructed by the
Unicode codes.  This seems to be directly permissible within the definition
of character in Annex B of the ISO document which was discussed recently,
though perhaps not within the definition of character used by the Unicode
Consortium at the present time.  I feel that such ideas should not be thrown
out by the Unicode Consortium publishing a FAQ document which would prevent
it considering for inclusion glyphs in regular Unicode which could make good
use of such technological advances.

For the avoidance of doubt I am not saying that the Unicode Technical
Committee should necessarily accept such items as your furniture idea for
encoding, I am simply saying that any decision as to what may be encoded and
what shall and what shall not be encoded should be made by the Unicode
Technical Committee on the basis of the scientific situation at the time
that an encoding proposal is formally considered.  I feel that it would be a
major error for the Unicode Consortium to publish a FAQ document which
prejudices the fair consideration of characters based upon new technologies
which may arise in the future.

William Overington

5 July 2002




RE: Inappropriate Proposals FAQ

2002-07-03 Thread asmusf

I would like to once again suggest that we refocus this 'FAQ' 

AWAY from a repetition of the "Principles and Procedures" document maintained
by WG2 and containing the explanation of what constitutes a valid *formal*
proposal.

AWAY from any attempt to cover *all* aspects that could make a proposal
inappropriate, and away from any schema for a complete classification of the
universe of possible proposals.

TOWARDS a set of a few -easily understood and not contentious- examples of
things that have been ruled out of bounds - with a clear pointer to the formal
document with its typology of scripts. (By all means, point prominently to the
roadmap as well).

Doing anything else will take a lot of work, both initially and in constantly
tweaking it; cause a lot of confusion (if it contains many items that are in
fact in a gray zone) and can weaken our understanding of which set of 'rules'
are the ones we really operate under.

A./

On Wed, 3 Jul 2002 23:24:01 +0100 Michael Everson <[EMAIL PROTECTED]>
wrote:

I would NOT like to see our committees' hands tied by taking this 
list as more than guidelines. I understand that it is for an FAQ but 
there should be text therein to emphasize that these are not binding.






Re: Inappropriate Proposals FAQ

2002-07-03 Thread Michael Everson

At 15:17 -0600 2002-07-03, John H. Jenkins wrote:
>On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote:
>
>>as something inappropriate. Question: how does one code up (presumably
>>with markup) a caret over a jk pair in a math expression? The dot on the
>>j should be missing for this case, but how does one communicate that to
>>a font if there's no code for a dotless j? It seems that dotless j is
>>needed for some mathematical purposes.
>>
>
>The glyph is; the character isn't.  There are also accented j's 
>which are based on a dotless-j.  The way we do it is include a glyph 
>called "dotlessj" in the font, and have the tables set up so that 
>whenever "j" is found with an accent, dotlessj is substituted.

This is a very good answer and should be in the FAQ.

There may be a dotless j as a character in one of the Nordic phonetic 
alphabets. But even if there were, it would be wrong to use it for a 
decomposed Esperanto J WITH CIRCUMFLEX.
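
The decomposition point can be checked mechanically. A minimal sketch, assuming Python's standard unicodedata module (nothing anyone in this thread used): U+0135 LATIN SMALL LETTER J WITH CIRCUMFLEX decomposes canonically to an ordinary j plus U+0302 COMBINING CIRCUMFLEX ACCENT, so removing the dot is left to the rendering system rather than to a separate character.

```python
import unicodedata

j_circumflex = "\u0135"  # LATIN SMALL LETTER J WITH CIRCUMFLEX
decomposed = unicodedata.normalize("NFD", j_circumflex)

# The base letter is a plain "j"; the accent is a combining mark.
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# prints:
# U+006A LATIN SMALL LETTER J
# U+0302 COMBINING CIRCUMFLEX ACCENT

# Recomposing gives back the original single code point.
recomposed = unicodedata.normalize("NFC", decomposed)
```

Both spellings are canonically equivalent, which is why the decomposed form has to use the ordinary j.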
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




RE: Inappropriate Proposals FAQ

2002-07-03 Thread Michael Everson

I would NOT like to see our committees' hands tied by taking this 
list as more than guidelines. I understand that it is for an FAQ but 
there should be text therein to emphasize that these are not binding.

At 19:10 + 2002-07-03, Timothy Partridge wrote:
>Why not just presentation glyphs in general? We seem to have queries about
>Indian conjuncts fairly frequently.
>
>Some more suggestions (some of which have been covered from other angles already)
>
>- No scripts with a limited body of text in existence. (No need to exchange
>or analyse on computer.) E.g. Phaistos disk script

If the Phaistos disk were bilingual and deciphered, it could be added 
even if there were only one document. Why not?

>- No scripts which are poorly understood and it is not clear as to what the
>characters are. E.g. Rongo-rongo.

True.

>- No symbols that are just a picture of something with no other meaning e.g.
>a dog. (These tend not to have a fixed conventional form.)

For instance, Blissymbols has a dog symbol in it. Granted, 
Blissymbols is a separate script so maybe that isn't so convincing. 
But what if a series of hotel symbols were added, with things like NO 
SMOKING, NO DOGS, GUIDE DOGS? Those do have some sort of 
real semantics even though the glyphs may vary.

>- No symbols that are only used in diagrams rather than running text. e.g.
>electrical component symbols.

Probably unassailable.

>- No personal, idiosyncratic or company logos. E.g. the artist when he was
>not known as Prince.

This IS a rule.

>- No archaic styles of existing characters. E.g. dotless j.

There are some archaic characters already encoded, and N'Ko is going 
to have two of them. Probably.

>- No control codes for fancy text. E.g. begin bold

We have BEGIN SLUR in Western Music already. Might have use for BEGIN 
and END CARTOUCHE in Egyptian -- or might not. Research continues.

>- No characters that can be obtained by using a different font with existing
>characters and have no semantic difference from the existing characters.

Such as?

>- No proposals to rename existing characters. (But a clarifying note 
>might be added.)

This IS a rule.

>- No proposals to reposition existing characters, e.g. so they sort better.

This IS a rule.

>- No proposals for a newly invented character since putting it in the
>standard would help promote its use. (Significant usage must come first.)

We did encode the GREEK KAI SYMBOL, and when I proposed it, I hoped 
that it would promote its use. Why? Because I saw a lot of 
hand-painted signage in Greece which used it, but machine-printed 
signage which used the AMPERSAND instead. I thought that was pretty 
unfortunate.

But I DIDN'T invent it. It is centuries old!

Playing devil's advocate here, just a bit.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Inappropriate Proposals FAQ

2002-07-03 Thread John H. Jenkins


On Wednesday, July 3, 2002, at 02:23 PM, Murray Sargent wrote:

> as something inappropriate. Question: how does one code up (presumably
> with markup) a caret over a jk pair in a math expression? The dot on the
> j should be missing for this case, but how does one communicate that to
> a font if there's no code for a dotless j? It seems that dotless j is
> needed for some mathematical purposes.
>

The glyph is; the character isn't.  There are also accented j's which are 
based on a dotless-j.  The way we do it is include a glyph called 
"dotlessj" in the font, and have the tables set up so that whenever "j" is 
found with an accent, dotlessj is substituted.
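
The substitution described above can be sketched in a few lines. This is only an illustration in Python, not how any particular font or shaping engine stores its tables (real fonts express this in their own substitution tables); the function name, the "uniXXXX" naming for marks, and the restriction to marks with combining class 230 (above) are assumptions made for the example. Only the glyph name "dotlessj" comes from the description.

```python
import unicodedata

# Glyph name taken from the description above; the mapping itself is a toy.
DOTLESS_FORMS = {"j": "dotlessj"}


def is_mark_above(ch):
    # Canonical combining class 230 means the mark attaches above the base.
    return unicodedata.combining(ch) == 230


def shape(text):
    """Map characters to glyph names, using the dotless glyph before an accent above."""
    glyphs = []
    for index, ch in enumerate(text):
        following = text[index + 1] if index + 1 < len(text) else ""
        if ch in DOTLESS_FORMS and following and is_mark_above(following):
            glyphs.append(DOTLESS_FORMS[ch])     # glyph swap, same character
        elif unicodedata.combining(ch):
            glyphs.append(f"uni{ord(ch):04X}")   # hypothetical name for the mark glyph
        else:
            glyphs.append(ch)
    return glyphs


print(shape("j\u0302k"))  # ['dotlessj', 'uni0302', 'k']
print(shape("jk"))        # ['j', 'k']
```

The encoded text stays "j" plus a combining accent in both cases; only the choice of glyph changes.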

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://homepage.mac.com/jenkins/





RE: Inappropriate Proposals FAQ

2002-07-03 Thread Murray Sargent

Timothy Partridge included the restriction

- No archaic styles of existing characters. E.g. dotless j.

as something inappropriate. Question: how does one code up (presumably
with markup) a caret over a jk pair in a math expression? The dot on the
j should be missing for this case, but how does one communicate that to
a font if there's no code for a dotless j? It seems that dotless j is
needed for some mathematical purposes.

Thanks
Murray




RE: Inappropriate Proposals FAQ

2002-07-03 Thread Timothy Partridge

Marco Cimarosti recently said:

> - No presentation glyphs for shapes that can already be obtained using
> regular characters in conjunction with ZWJ or ZWNJ.

Why not just presentation glyphs in general? We seem to have queries about
Indian conjuncts fairly frequently.

Some more suggestions (some of which have been covered from other angles already)

- No scripts with a limited body of text in existence. (No need to exchange
or analyse on computer.) E.g. Phaistos disk script

- No scripts which are poorly understood and it is not clear as to what the
characters are. E.g. Rongo-rongo.

- No symbols that are just a picture of something with no other meaning e.g.
a dog. (These tend not to have a fixed conventional form.)

- No symbols that are only used in diagrams rather than running text. e.g.
electrical component symbols.

- No personal, idiosyncratic or company logos. E.g. the artist when he was
not known as Prince.

- No archaic styles of existing characters. E.g. dotless j.

- No control codes for fancy text. E.g. begin bold

- No characters that can be obtained by using a different font with existing
characters and have no semantic difference from the existing characters.

- No proposals to rename existing characters. (But a clarifying note might be added.)

- No proposals to reposition existing characters, e.g. so they sort better.

- No proposals for a newly invented character since putting it in the
standard would help promote its use. (Significant usage must come first.)
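
On the repositioning item: sort order is normally produced by a collation key computed from the text, not by where a character sits in the code charts. A toy sketch in Python, assuming a deliberately simplistic key (strip combining marks, fold case); it is not the Unicode Collation Algorithm, and the function name is made up for the example.

```python
import unicodedata


def toy_sort_key(word):
    """Toy collation key: ignore accents and case, break ties on the original."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


words = ["zebra", "Zulu", "\u00e9clair", "apple"]

by_code_point = sorted(words)               # raw code point order
by_toy_key = sorted(words, key=toy_sort_key)

print(by_code_point)  # ['Zulu', 'apple', 'zebra', 'éclair']
print(by_toy_key)     # ['apple', 'éclair', 'zebra', 'Zulu']
```

The second ordering is reached without moving any character to a different code point.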

   Tim

-- 
Tim Partridge. Any opinions expressed are mine only and not those of my employer





Re: Inappropriate Proposals FAQ

2002-07-02 Thread Barry Caplan

At 10:01 AM 7/2/2002 -0400, Suzanne M. Topping wrote:
>I have a few ideas for fictional proposals to use as examples (my room
>layout idea, and Mark's 3-D Mr. Potato Head representation), but I could
>use another one or two if anyone feels creative. The closer to being
>believable, the better, I suppose. (An alternative would be to use
>real-life proposals, and state why they were not accepted, but I thought
>it more politic to keep it fictional...)


There was a discussion last year about a symbol to represent pi/2 or pi/4 or something 
like that. If you want to fictionalize that to some other fraction of a mathematical 
constant, that might work (e/2 perhaps?)

Barry Caplan
www.i18n.com





Re: Inappropriate Proposals FAQ

2002-07-02 Thread Wm Seán Glen



How about symbols from electronics and hydraulics? Schematic symbols.

Wm Seán Glen

  - Original Message - 
  From: Suzanne M. Topping 
  To: Unicode (E-mail) 
  Sent: Tuesday, 02 July, 2002 7:01
  Subject: Inappropriate Proposals FAQ

  I have a few ideas for fictional proposals to use as examples (my room
  layout idea, and Mark's 3-D Mr. Potato Head representation), but I could
  use another one or two if anyone feels creative.

  Thanks in advance for your input,

  Suzanne Topping
  BizWonk Inc.
  [EMAIL PROTECTED]


Re: Inappropriate Proposals FAQ

2002-07-02 Thread Michael Everson

At 12:38 -0400 2002-07-02, ろ〇〇〇〇 ろ〇〇〇 wrote:
>I have a few ideas:
>
>Fictional scripts that would probably be rejected, such as the 
>script of the Codex Seraphinianus

Certainly not. Tengwar and Cirth are certain to be encoded. The 
Codex script would probably not be encoded because it occurs in only 
one manuscript and is undeciphered.
-- 
Michael Everson *** Everson Typography *** http://www.evertype.com




Re: Inappropriate Proposals FAQ

2002-07-02 Thread ろ〇〇〇〇 ろ〇〇〇

I have a few ideas:

Fictional scripts that would probably be rejected, such as the script of 
the Codex Seraphinianus

A "fictional" Hanzi (specifically, a Hanzi made up of the "woman" radical 
plus the character for "walk"), which I am attaching a crude image of. The 
proposer either (1) used this character in a novel once (or has seen it 
used in a novel), or (2) he wants to use it as a symbol for the length unit 
of the new system of measurement he invented.


十一-chan, does true love not exist?



RE: Inappropriate Proposals FAQ

2002-07-02 Thread Marco Cimarosti

Suzanne M. Topping wrote:
> I have a few ideas for fictional proposals to use as examples (my room
> layout idea, and Mark's 3-D Mr. Potato Head representation), 
> but I could use another one or two if anyone feels creative.

Today I don't feel very creative, perhaps because deliberately inventing bad
ideas does not appeal too much to my creativity. :-)

But perhaps I have some suggestions for the less creative part of the FAQ,
which is: listing the existing policies for excluding some classes of
proposals.

In my understanding, a few such policies are:

- No precomposed ligatures which can be encoded using a sequence of existing
characters (possibly joined by ZWJs);

- No precomposed "accented characters" which can be composed using an
existing character and one or more existing combining diacritics;

- No clones of existing characters whose sole purpose is making a *logical*
differentiation from some existing characters (e.g., hex digits looking
identical to existing characters "0..9" and "A..F"; or a symbol for "meter"
looking identical to Latin "m");

- No clones of existing characters whose sole purpose is making a
*graphical* differentiation from some existing characters (e.g., a Serbian
letter "t", disunified from Russian on the basis that italics looks
different in the two languages);

- No presentation glyphs for shapes that can already be obtained using
regular characters in conjunction with ZWJ or ZWNJ.
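
The policy on precomposed accented characters can be illustrated with normalization. A minimal sketch, assuming Python's standard unicodedata module; the helper name is made up for the example. It asks whether a base letter plus one combining mark already composes to a single code point under NFC. Where it does not, the sequence itself is the encoding, and no new precomposed character is needed.

```python
import unicodedata

ACUTE = "\u0301"       # COMBINING ACUTE ACCENT
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT


def has_precomposed_form(base, mark):
    """True if base + combining mark composes to a single code point under NFC."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


print(has_precomposed_form("e", ACUTE))       # True: composes to U+00E9
print(has_precomposed_form("q", CIRCUMFLEX))  # False: stays as q + U+0302
```

In both cases the combining sequence is valid text, which is the reason such proposals are turned away.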

_ Marco




Re: Inappropriate Proposals FAQ

2002-07-02 Thread john . colby

But would not using rejected proposals (as well as the fictional ones) be closer to 
the truth and therefore more accurate?

John

>  from:"Suzanne M. Topping" <[EMAIL PROTECTED]>
>  date:Tue, 02 Jul 2002 15:01:16
>  to:  [EMAIL PROTECTED]
>  subject: Re: Inappropriate Proposals FAQ
> 
> (An alternative would be to use
> real-life proposals, and state why they were not accepted, but I thought
> it more politic to keep it fictional...)
>