Re: Revised N2586R

2003-06-23 Thread Michael (michka) Kaplan
(reminded of a South Park Episode... the spelling bee in "Hooked on Monkey
Phonics")

excerpt:
-
MAYOR: Here we go - "kroxldyphivc".
KYLE: What?!?
MAYOR: "kroxldyphivc".
KYLE: Definition?
MAYOR: Something which has a kroxldyph-like quality.
KYLE: Uh, could you use it in a sentence?
MAYOR: Certainly -- " 'Kroxldyphivc' is a hard word to spell."
-

Of course, with a definition and usage example like that, it's no wonder Kyle
messed up the word and lost the spelling bee. :-)

MichKa

- Original Message - 
From: <[EMAIL PROTECTED]>
To: "Michael Everson" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, June 22, 2003 11:07 PM
Subject: Re: Revised N2586R


> It seems to me the proposal would present a stronger case if samples were
> available that were something *other* than an explanation of the symbol in
> a dictionary, encyclopaedia, or other reference. It would be similar to
> these kinds of samples if I were to create a proposal using as a sample
> the Phonetic Symbol Guide, but that might not clearly show if a character
> was something that was merely proposed by someone at one time but never
> actually used -- in such a case, taking a sample from Phonetic Symbol
> Guide does not really demonstrate the need to encode as a character for
> text representation. Likewise, the sample for (e.g.) the fleur-de-lis
> doesn't really provide a case that this should be a character to
> facilitate representation in text. It wouldn't be hard to provide a
> comparable descriptive paragraph that began with an image of the Stars and
> Stripes, but I don't think we'd want to encode the US flag as a character.
>
> I'm not saying that I oppose the proposed characters; just that samples of
> a different nature would make for a stronger case.
>
>
> - Peter
>
>
> ---
> Peter Constable
>
> Non-Roman Script Initiative, SIL International
> 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
> Tel: +1 972 708 7485
>
>
>




Re: [ot] anyone know of a good "sending accessible emails" guideline page?

2003-06-23 Thread Ben Dougall
On Friday, June 20, 2003, at 01:56  pm, Frank da Cruz wrote:

> > does anyone know of a simple, explanatory web page, aimed at not too
> > technical people, based on sending *accessible* email, and if really
> > necessary attachments and the problems related to attachments
> > (specifically inaccessibility, not viruses).
> > i'm looking for a nice concise web page that i can give the address to
> > people who keep asking me about email attachments and reading email.
> > more often than not, the problem is with the sender, so i'd like to
> > find a web page that they can pass to people (who are more than likely
> > not knowledgeable about computers) in the event of unreadable email
> > and in particular unreadable attachments.
> >
> > very often an attachment isn't needed (like attaching a ms word
> > document when emails themselves are text) and i'd like to know about a
> > web page explaining that thoroughly but simply.
> > anyone know of such a magical page?
>
> You mean something like this?
>
>   http://www.columbia.edu/kermit/safe.html
>
> It includes sections on email.

thanks Frank, but not really :/

looking for something much simpler and more to-the-point and 
explanatory (a *small* page) that informs senders of the 
inaccessibility issues and how to avoid them by not attaching files 
unnecessarily. a very simple practical guide for how to go about that 
and the reasoning behind it.

for people who aren't really interested in computers as a subject in 
themselves and don't want to get bogged down in involved text about 
them. (yes, there are such people)

i've found 2 pages that are towards the sort of thing i'm after:


(the second one, to start with, i only found a pdf version of it, which 
i thought was quite funny, but then i realised there was also an html 
version)

but those still aren't quite what i'm looking for.

apparently the bbc just did a guide on accessibility issues (such as 
not using proprietary formats) and released it as a pdf (not html)! 
then also as a .doc!! :) oh dear.
(i suppose technically pdf is not a proprietary format (?), but still, 
not far off.)

Ken, on this list, suggested i write this myself. my get out: some m.s. 
software knowledge would be needed, which i don't have. :)




Re: Revised N2586R

2003-06-23 Thread Michael Everson
At 01:07 -0500 2003-06-23, [EMAIL PROTECTED] wrote:
> It seems to me the proposal would present a stronger case if samples were
> available that were something *other* than an explanation of the symbol in
> a dictionary, encyclopaedia, or other reference.

Possibly, but there is only so much time in the day, and I certainly 
did a better job than Mark Davis did with L2/02-361. >:-(

(UTC, please take this as a formal protest at the action taken to 
approve the addition of characters based on a document as flimsy as 
that one. Bad UTC. No biscuit!)

> It would be similar to these kinds of samples if I were to create a 
> proposal using as a sample the Phonetic Symbol Guide, but that might 
> not clearly show if a character was something that was merely 
> proposed by someone at one time but never actually used -- in such a 
> case, taking a sample from Phonetic Symbol Guide does not really 
> demonstrate the need to encode as a character for
> text representation.

I tend to disagree. Symbols have a very different nature than 
phonetic characters do. We have *all* seen the atom sign, and I have, 
as Liungman points out, seen it on maps, though I don't seem to have 
such a map here in the house. Similarly, the fleur-de-lis is a 
well-known named symbol which can be used to represent a number of 
things.

> Likewise, the sample for (e.g.) the fleur-de-lis doesn't really 
> provide a case that this should be a character to facilitate 
> representation in text.

Of course these can be considered to be dingbats, as many symbols 
are. When I look at the set of dingbats and symbols in the Standard, 
I find that there are some odd omissions. The gender symbols for instance 
that I proposed in N2587, and a set of religious symbols which I'm 
preparing in another document. More dictionary symbols like the 
SHAMROCK. And so on.

> It wouldn't be hard to provide a comparable descriptive paragraph 
> that began with an image of the Stars and Stripes, but I don't think 
> we'd want to encode the US flag as a character.

That would be a logo.

> I'm not saying that I oppose the proposed characters; just that samples of
> a different nature would make for a stronger case.

I do the best I can. At the end of the day my document won its case 
and the five characters were accepted.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: [ot] anyone know of a good "sending accessible emails" guideline page?

2003-06-23 Thread Christopher John Fynn
>
> Ken, on this list, suggested i write this myself. my get out: some m.s.
> software knowledge would be needed, which i don't have. :)

MS Office applications all have a File, Send To... option in
their menus - if people use this (and many do) it generates huge
email files in multiple formats.

- Chris




Re: Unicode not in Quark 6

2003-06-23 Thread Peter_Constable
John Jenkins wrote on 06/22/2003 05:25:40 PM:

> MySQL is also available for Mac OS X 
> ().  I'm not sure 
> of the status of Unicode support, but it seems to be fine if you're not 
> worrying about collating or similar services.  It's what's used at the 
> moment to host the Unihan database, for example.

It (MySQL for OS X) is also being used for the content management system 
that drives scripts.sil.org, and so far we haven't encountered any 
problems in storing text content using Unicode characters from a variety 
of ranges.


- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485




Re: Revised N2586R

2003-06-23 Thread Philippe Verdy
On Monday, June 23, 2003 2:54 PM, Michael Everson <[EMAIL PROTECTED]> wrote:
> > It wouldn't be hard to provide a comparable descriptive paragraph
> > that began with an image of the Stars and Stripes, but I don't think
> > we'd want to encode the US flag as a character.
> 
> That would be a logo.

Most probably not: such an image of a country flag without its colors is not 
meaningful, this is just a form, a contour, which, if assigned to a character with a 
representative glyph, could be colored with yellow and red stripes, and would not have 
the semantic of the same flag, or could be seen as a caricature.

All flags are meaningful only with a minimum of recognizable colors which have a 
history and meaning. Some logos too, but not all. Also a flag can be displayed by people 
mostly at will under some conditions attached to respect. A logo is copyrighted and is 
a piece of art with an owner that has some exclusive rights on it, like glyphs in a 
font (in most countries except glyphs created in the US). So a flag is really a colored 
image, not a logo, not a glyph and thus not a character either...

Even its proportions and design are well defined, unlike many glyphs associated with 
characters, which accept a lot of variation without losing their character semantics. 
By contrast, a Christian Cross or a Muslim Moon qualifies as a character, because 
a representative glyph will accept many variations without losing its meaning as a 
religious symbol. Same thing for common symbols encodable as characters like a heart 
symbol for card games, a king symbol for chess, etc... These symbols represent real 
concepts that may have corresponding words when used in a sentence, and are used in 
various languages (so they can be part of a formal "script"). On the other hand, the 
various forms of bullets or arrows in Dingbats are probably excessive, as they can be 
swapped without losing their semantics. These should have been unified as characters, 
with possible glyph variants.


-- Philippe.




Re: Revised N2586R

2003-06-23 Thread Michael Everson
At 14:03 -0400 2003-06-23, John M. Fiscella wrote:

> And don't forget the 'radura'. The radura is to the food industry as 
> the 'biohazard' is to medical industry.

Jeepers.

> Yet the comments on proposing the radura by various UTC members were 
> negative. And it isn't a logo.

http://www.extension.iastate.edu/foodsafety/rad/radura.html

http://www.organicconsumers.org/Irrad/EPA-radura.cfm

http://www.sare.org/htdocs/hypermail/html-home/40-html/0462.html

http://www.fsis.usda.gov/OPPDE/larc/Irradiation_Q_&A.htm

Documents say "mandated by the FDA" -- is it actually international? 
I don't believe I have ever seen it. Can you, heh heh, buy something 
with it on it and scan us a sample of it in use?

Does the FDA or anyone distribute a font with this symbol in it?

Radura is Italian for a 'glade' or 'clearing' for what that is worth.

Depending on answers to the above, I would certainly consider popping 
the RADURA into a bucket with the DO NOT LITTER SIGN. (It still 
irritates me that Ken vetoed that one. I see it *everywhere* on 
packaging from more and more countries.)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Revised N2586R

2003-06-23 Thread Michael Everson
As a point of interest there does not seem to be a single 
standardized HALAL SYMBOL though there is rather a lot of discussion 
about having one. I googled "halal logo".

I also looked for "pork logo". Not much turned up, though there was a 
PDF from the Irish Bord Bia (Food Board)  which mentioned some sort 
of Irish pork logo. I have never seen it. There doesn't seem to be an 
international "Warning! Contains Pork!" symbol.

There doesn't seem to be a NUT SYMBOL used to warn that products 
contain nuts, though there are many, many references to Sainsbury's 
(a British supermarket chain) labelling their peanuts "Warning: 
Contains Nuts".
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Revised N2586R

2003-06-23 Thread Rick McGowan
> And don't forget the 'radura'. The radura is to the food industry as the 
> 'biohazard' is to medical industry. Yet the comments on proposing the
> radura by various UTC members were negative. And it isn't a logo.

Interesting. I haven't noticed this symbol in use, and I do buy food. And  
none of the examples I looked at had it used in running text.

Rick



-
Warning! This warning message contains text!
-




Re: Revised N2586R

2003-06-23 Thread Philippe Verdy
On Monday, June 23, 2003 10:17 PM, Michael Everson <[EMAIL PROTECTED]> wrote:
> There doesn't seem to be a NUT SYMBOL used to warn that products
> contain nuts, though there are many, many references to Sainsbury's
> (a British supermarket chain) labelling their peanuts "Warning:
> Contains Nuts".

What about the many symbols used to signal how clothes can be cleaned, or various 
warning signs on some products to signal the presence of a potentially dangerous 
component, or some risk like electric shocks, possible exposure to dangerous 
radiation, or the many logos used to label quality products or signal their origin, or 
for content rating labels used in various countries, or to mark phone numbers for 
some value-added services?

All these should be part of logo libraries, even if they are sometimes supported by 
custom fonts, only to ease their reuse on similar products. If we continue, we will 
find requests to standardize symbols for signalization on roads, waters, or railways.
And then why not assignments for individual country codes or language codes used to 
annotate a text? why not then assignments for the many decorative bullets used in 
various publications?

It's true that Windows has such fonts: Marlett (for the GUI symbols on 
window buttons), Wingdings and Webdings. But do they need standardization when they 
only appear in isolation?

All this is not needed for plain-text, but only in rich-text formats with additional 
markup for the layout or inclusion of logos and images, or with layout construction 
libraries.


-- Philippe.



Re: Revised N2586R

2003-06-23 Thread Michael Everson
According to http://www.fas.usda.gov/GainFiles/200010/30678316.pdf, 
Indonesia requires the radura in packaging. Apparently, it also 
requires some sort of pig-logo to warn if a product contains swine 
derivatives.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Revised N2586R

2003-06-23 Thread Michael Everson
At 23:33 +0200 2003-06-23, Philippe Verdy wrote:

> What about the many symbols used to signal how clothes can be cleaned,

A well-defined semantic set that I think deserves encoding. :-)

> or various warning signs on some products to signal the presence of 
> a potentially dangerous component, or some risk like electric 
> shocks, possible exposition to dangerous radiations,

Most of these are already encoded.

> or the many logos use to label quality products or signal its 
> origin, or for content rating labels used in various countries, or 
> to markup phone numbers to some value-added services ?

I don't know what you mean.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK

2003-06-23 Thread Kenneth Whistler
Actually, there are a number of loose ends still, as it appears
that some of Rob Mount's questions were not actually answered.

> I understand what you say about word formation, and
> combining marks, and that the Alphabetic
> classification should not be limited to "L"s.  But 
> 30FC is of General Category "Lm" (which should be
> included) and, since version 3.1, is classified explicitly 
> as Alphabetic in DerivedCoreProperties.txt.
> (It appears that formal expression of the Alphabetic 
> property was moved from PropList.txt 
> to DerivedCoreProperties.txt in 3.1.)  I don't understand 
> why its exclusion from the Alphabetic 
> category in 3.0.1 was not an oversight.  But if not, 
> then either the consortium consensus on 
> the classification of this character has changed, or 
> the current classification is in error.

Here's some more background for people. I realize that all
the version information is getting bewilderingly complex, so
not everyone is going to want to research back through all the
versions, particularly when that would mean also trying to
dig back through the UTC decision trail.

From Unicode Version 2.0 to Unicode Version 3.0.1 I maintained
the PropList.txt file. During that time, it was explicitly
an *informative* file only, and was included in the UNIDATA
directory on that basis, as potentially helpful information, only.

The change to Unicode Version 3.1.0 was a major watershed.
Mark Davis started maintaining the PropList.txt file (and
a number of other files) with a different set of tools
that specified a large number of properties as derived,
via rule, from other properties -- hence the introduction
of the DerivedXXX files. At this point, the UTC reexamined
all of the character properties and changed the status
of some of them. Some of the former properties from PropList.txt
were made normative (and their content adjusted slightly),
some were left informative, some were equated to derived
properties (hence moved to other files), and some were determined
to be uninteresting, and thus were dropped altogether.
The format of PropList.txt also changed completely at this
point.

Now as regards the particular handling of U+30FC, the
treatment in PropList.txt from Unicode 2.0 to Unicode 3.0.1
was consistent:

General Category = Lm
PropList specification: [-Alphabetic] [+Diacritic] [+Extender] [+Identifier_Part]

The theory behind that was that while U+30FC was Lm, like
many other diacritic letter modifiers it wasn't formally
part of an alphabetic or syllabic set of symbols per se,
so wouldn't be given the Alphabetic property. However,
other implicit derivations for word boundaries or identifier
boundaries should include the [+Extender] characters to
get the expected results. Hence the determination, for
example, that U+30FC was [+Identifier_Part].

Starting with Unicode 3.1.0 and continuing through to Unicode
4.0.0, the treatment is still consistent, although slightly
different:

General Category = Lm
PropList specification: [-Other_Alphabetic] [+Diacritic] [+Extender]
DerivedCoreProperties:  [+Alphabetic] [+ID_Continue]

The General Category, the status as diacritic and extender,
and the derived status as part of identifiers are unchanged.
What has changed, however, is the interpretation of what
"Alphabetic", as a derived property now, means. As Mark pointed
out, it is now derived as:

# Derived Property: Alphabetic
#  Generated from: Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic

By *this* definition, all the Lm characters from Unicode 2.0
on would *also* have been "Alphabetic". And Other_Alphabetic
was consistently developed by subtracting out all the
Lu, Ll, Lt, Lm, Lo, and Nl characters from the preexisting
Alphabetic definition from PropList.txt.

So the correct answer is not that the consensus about the
behavior and properties of U+30FC has changed, but rather
that the inclusiveness of the "Alphabetic" property changed
a little when it was redefined to be a derived property.
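
The "Generated from" rule quoted above can be sketched in a few lines of
Python. This is only an illustration, not the tool used to build
DerivedCoreProperties.txt: the names ALPHABETIC_CATEGORIES,
OTHER_ALPHABETIC and is_derived_alphabetic are made up for the example, and
because the standard unicodedata module does not expose Other_Alphabetic,
that set is left empty here as a stand-in for the PropList.txt list.

```python
import unicodedata

# General categories named in the "Generated from" rule quoted above.
ALPHABETIC_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Hypothetical stand-in for the Other_Alphabetic list in PropList.txt.
# unicodedata does not expose that property, so it is empty in this sketch.
OTHER_ALPHABETIC = frozenset()


def is_derived_alphabetic(ch, other_alphabetic=OTHER_ALPHABETIC):
    """Apply the rule Lu+Ll+Lt+Lm+Lo+Nl + Other_Alphabetic to one character."""
    return (unicodedata.category(ch) in ALPHABETIC_CATEGORIES
            or ch in other_alphabetic)


PROLONGED_SOUND_MARK = "\u30fc"

# U+30FC has General Category Lm, so the derived rule includes it without
# any help from Other_Alphabetic.
print(unicodedata.category(PROLONGED_SOUND_MARK))
print(is_derived_alphabetic(PROLONGED_SOUND_MARK))
```

Under this rule U+30FC comes out as Alphabetic purely because of its Lm
General Category, which is the point made above: the character did not
change, the definition of the property did.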

Note that for the property more relevant to determination of
things like identifiers (now known as ID_Continue), there has been
*no* change to the behavior of U+30FC since Unicode 2.0.

> Here's a little more background regarding my motivation.  
> The problem occurs in a procedure
> that evaluates whether a user-supplied name can be used 
> as an identifier - for which identification 
> of alphabetic characters is important. 

Actually, as you can see from the above discussion, and
from the discussion of identifiers you mentioned in the
standard, it is ID_Start and ID_Continue which are more
relevant than "Alphabetic" per se. 
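
As a rough illustration of checking a name against identifier properties
rather than against "Alphabetic", here is a small Python sketch. It leans on
Python's own str.isidentifier(), which follows the XID_Start / XID_Continue
variants of these properties for whatever Unicode version the interpreter
ships with; that is an assumption about a different implementation, not a
description of the vendor procedure discussed here. The function name
usable_as_identifier is invented for the example.

```python
def usable_as_identifier(name):
    """Return True if name is a valid identifier under Python's rules,
    which are based on XID_Start for the first character and
    XID_Continue for the rest."""
    return name.isidentifier()


# KATAKANA LETTER KA followed by U+30FC PROLONGED SOUND MARK.
katakana_name = "\u30ab\u30fc"

print(usable_as_identifier(katakana_name))
print(usable_as_identifier("1abc"))

# For comparison, str.isalpha() is defined on the letter General Categories
# (Lu, Ll, Lt, Lm, Lo) and so also accepts U+30FC.
print("\u30fc".isalpha())
```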

> One implementation of isalpha(), purportedly based on 
> Unicode 2.1, indicates that 30FC is an alpha character.  
> The current implementation from the 
> same vendor, based on 3.0.1, classifies it as non-alpha.  
> Presumably the next one will be based 
> on 3.1 or later and will reclassify it, again, as alpha.

The vendor has done somethin

Wash Symbols and Iconography (was Re: Revised N2586R)

2003-06-23 Thread Kenneth Whistler

> At 23:33 +0200 2003-06-23, Philippe Verdy wrote:
> 
> >What about the many symbols used to signal how clothes can be cleaned,

And Michael Everson responded:
 
> A well-defined semantic set that I think deserves encoding. :-)

If what you mean is:

http://www.waschsymbole.de/en/index.html

then some of those are *already* representable using currently encoded
symbols:

U+24B6 CIRCLED LATIN CAPITAL LETTER A
= dry clean with all standard methods
U+24C5 CIRCLED LATIN CAPITAL LETTER P
= dry clean with perchloro-ethylene
U+24BB CIRCLED LATIN CAPITAL LETTER F
= dry clean with fluorine-solvent
U+29BB CIRCLE WITH SUPERIMPOSED X
= do not dry clean
U+25B3 WHITE UP-POINTING TRIANGLE
= bleaching allowed

And "delicate" is the sequence <25CB, 0332>, a large circle
with an underscore. And so on.
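
For anyone who wants to check the list above against a character database,
here is a minimal Python sketch using the standard unicodedata module. The
CARE_SYMBOLS table and its wording are just the mapping from this message
written out as data; nothing in it is an official assignment of meaning to
these characters.

```python
import unicodedata

# Care-label meanings from the list above, mapped to already-encoded
# characters. "delicate" is a two-character sequence: base + combining mark.
CARE_SYMBOLS = {
    "dry clean with all standard methods": "\u24b6",
    "dry clean with perchloro-ethylene": "\u24c5",
    "dry clean with fluorine-solvent": "\u24bb",
    "do not dry clean": "\u29bb",
    "bleaching allowed": "\u25b3",
    "delicate": "\u25cb\u0332",
}

for meaning, text in CARE_SYMBOLS.items():
    # Look up the character name of each code point in the sequence.
    names = " + ".join(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text
    )
    print(f"{names} = {meaning}")
```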

But as you can see if you visit that page, there is more than
one standard for such icons -- a European standard and a
Canadian standard. And for all we know, there might be others
as well. The Canadian standard also color-codes the icons,
which was one of Philippe's criteria for where these kinds
of things clearly go over the line of what is appropriate
for encoding as characters.

And the "sethood" of a collection of arbitrary icons is not
a sufficient criterion for "characterhood". Just because
a group of symbolphiles can investigate and come up with
a collection of these things, and just because these things
are *printed* on labels for clothing does not ipso facto make
them characters, any more than the various symbols and
logos related to food (and other) packaging.

Look again at the icons listed above at that site. Clearly, as for
many such symbologies which are supposed to communicate
*WITHOUT* language, we have interesting little pictographic
logics embedded in the symbols to convey meaning. For
instance, a pictograph of a hand iron with one, two, or
three dots inside, supposed to convey the degree of heat
of the iron. Or washtub pictographs with digits in them
to convey water temperature (in degrees Celsius), or with
a pictograph of a hand inserted to indicate "hand wash only".

Such collections of icons are, generically, part of an ongoing
process of the reintroduction of pictographs and (true)
ideographs into writing, to solve commercial and regulatory
issues of globalization. Pictographs proliferate across
Europe because "Europe" the commercial and regulatory
entity is becoming so multilingual that it is utterly
unwieldy to require warnings, labels, and other important
captions (and even instructions) in language-specific
writing.

The alternative--to force everyone to use a dominant (or a
few dominant) official languages--is not PC in Europe. Heck,
it isn't even PC in the U.S., although it is almost
official policy here.

But the implication of this ongoing development needs to be
*considered* by the character encoding committees -- not
just be catered to, by "accident", as it were, by merely
encoding as characters whatever nice little set of iconic
symbols happens to attract our attention this week. There
is a serious question here regarding what is plain text
content and what is this "other stuff" -- an ongoing
evolution of iconic and pictographic symbols that are 
intentionally, by design, disanchored from any particular
language, and are instead intended to convey *concepts*
directly.

I think we are at serious risk of "getting it wrong" if we
just keep encoding sets of icons and pictographs as
characters without clear evidence of their use *like*
characters embedded in what is otherwise clearly plain
text context.

What is obvious is that all this stuff is in rapid ferment
right now. Hundreds of agencies and organizations make these
things up for all kinds of purposes, and which ones catch
on and last and get used with text remains to be seen, in
many instances. Further, looking a little more longterm,
it is unclear where this stuff is headed over the next
century. Will such symbols remain disjunct and be very
product- or situation-specific, while turning over rapidly
as technology or products or regulatory environments
change? Will such symbols evolve towards a global,
standardized, iconography-without-words, existing as a
kind of universal visual sign language for the
communication-impaired who don't share a common language?
Will major existing writing systems evolve to incorporate
more and more such symbols (either individually or globally)
in a kind of mass reintroduction of pictographic and
ideographic principles into writing systems?

I don't know the answers to these questions. But I don't
think that we should, as character encoding specialists,
behave as if they don't matter for what we do. I don't
think it is appropriate to just take a "Gee whiz! Let's
encode that cute set of symbols!" approach to every list
of these things that comes along, without considering
more carefully what the Unicode Standard is for and
how it is going to have to interact with the

Symbols and Iconography (was Re: Revised N2586R)

2003-06-23 Thread Christopher John Fynn
And how about:

http://www.csaa.com/global/articledetail/0,8055,100300%257C2670,00.html

http://www.csaa.com/global/articledetail/0,8055,100300%257C2669,00.html

http://www.csaa.com/global/articledetail/0,8055,100300%257C2668,00.html


- Chris




Re: Revised N2586R

2003-06-23 Thread Peter_Constable
Michael Everson wrote on 06/23/2003 07:54:13 AM:

> We have *all* seen the atom sign, and I have, 
> as Liungman points out, seen it on maps, though I don't seem to have 
> such a map here in the house.

But just because a symbol appears on maps, does that mean it should be 
encoded as a character? I've seen a lot of maps that have a pointed cross 
showing four cardinal points of the compass; should we encode that?


> Similarly, the fleur-de-lis is a 
> well-known named symbol which can be used to represent a number of 
> things.

In text? I've seen it on flags, on license plates, on heraldic crests, but 
can't recall seeing it in text.


> I do the best I can. At the end of the day my document won its case 
> and the five characters were accepted.

So, this isn't a new proposal? These characters have already been 
accepted? (If so, that's fine.)



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485