Re: Patent on æ ø å

2002-11-22 Thread Barry Caplan
I met these guys at a trade show a couple of years ago and, without knowing about this 
claim to fame, ended up discussing internationalized URLs. IIRC they mentioned 
something about a patent. I just assume that whatever working groups are standardizing 
international DNS are working around it.

Barry Caplan
www.i18n.com

At 08:24 PM 11/22/2002 +0000, Michael Everson wrote:
>Can there possibly be any truth in any of this?
>
>>The following is an article in the Danish paper Information:
>>
>>http://www.information.dk/Indgang/VisArtikel.dna?pArtNo=136309
>>
>>Do you know anything about this? It is supposedly the company Walid
>>(http://www.walid.com/) that has patented the transformation of non-a-z for
>>use in URLs.
>>
>>An article in Computerworld (admittedly a year and a half old) -
>>http://www.computerworld.com/managementtopics/ebusiness/story/0,10801,59043,00.html
>>- has some references, among other things to the text of the patent.
>>
>>The Danish site Softwarepatenter.dk has it also:
>>http://www.softwarepatenter.dk/walid.html. It is quite new there. Is this
>>whole thing just a "hoax"?
>-- 
>Michael Everson * * Everson Typography *  * http://www.evertype.com





Unicode list and account quotas

2002-11-22 Thread Sarasvati
Hello Unicadettes...

This is a public service announcement.

Recently I have noted an alarming increase in list mail being rejected
by gateways such as Hotmail and Yahoo due to user accounts being over
quota.

If you wish to receive uninterrupted service, please make sure
you keep enough open space in your mailbox. If you will be going on
vacation, you can visit the Unicode mail list web page for instructions
on how to set vacation mode, or to temporarily unsubscribe.

http://www.unicode.org/unicode/consortium/distlist.html

Because we receive so many bounced messages, and because people do
sometimes change addresses without unsubscribing, we automatically
remove people from the mail list after receiving 15 undeliverable
returns over a four day period.

There is almost always traffic on this mail list each day, so if
you haven't received any Unicode mail for a few days, and your
account is near its quota, you might have been removed from the list.
Simply visit the mail list page and re-subscribe. You can also see
if you have missed any mail by visiting the mail list archives and
looking at the current archive.

http://www.unicode.org/mail-arch/

Cheery regards from your,
-- Sarasvati




Patent on æ ø å

2002-11-22 Thread Michael Everson
Can there possibly be any truth in any of this?


The following is an article in the Danish paper Information:

http://www.information.dk/Indgang/VisArtikel.dna?pArtNo=136309

Do you know anything about this? It is supposedly the company Walid
(http://www.walid.com/) that has patented the transformation of non-a-z for
use in URLs.

An article in Computerworld (admittedly a year and a half old) -
http://www.computerworld.com/managementtopics/ebusiness/story/0,10801,59043,00.html
- has some references, among other things to the text of the patent.

The Danish site Softwarepatenter.dk has it also:
http://www.softwarepatenter.dk/walid.html. It is quite new there. Is this
whole thing just a "hoax"?
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Notice of System Downtime

2002-11-22 Thread Sarasvati
Greetings. This is to let you all know of a scheduled system outage
over the next weekend. Unicode.org is scheduled to be physically 
transported to a new location and will be unavailable during the window:

November 24, 2002 22:00 Eastern Standard Time
November 25, 2002 06:00 Eastern Standard Time

This is an unavoidable downtime scheduled by our network provider
and could be extended in the event of problems.

Regards,
-- Sarasvati





RE: UTF-Morse

2002-11-22 Thread Barry Caplan
At 02:37 PM 11/22/2002 +0100, Marco Cimarosti wrote:
>Otto Stolz wrote:
>> Marco, you shall be called "Marcone", or even (granting
>> a Pluralis majestatis): "Marconi" ;-)

And each element shall be called a "Morsel"

Barry





Re: UTF-Morse

2002-11-22 Thread Simon Josefsson
Marco Cimarosti <[EMAIL PROTECTED]> writes:

> UTF-Morse - "Bringing Unicode in the telegraph age!"
...
> U+0041 ..-- .-     LATIN CAPITAL LETTER A [2]
> U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
> U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
> U+0044 ..-- -..    LATIN CAPITAL LETTER D [2]
> U+0045 ..-- .      LATIN CAPITAL LETTER E [2]
> U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
> U+0047 ..-- --.    LATIN CAPITAL LETTER G [2]

Interestingly, there is a Japanese Morse language that conflicts with
these allocations:

http://www5b.biglobe.ne.jp/~a1c/CW_J.htm (SJIS encoded)

Perhaps it would be useful to have a Morse language selection
indicator in UTF-Morse à la the successful ISO-2022-JP.

The problem could also be deferred to higher-level frameworks, such as
the flexible MIME framework:

Content-Type: text/plain; charset=utf-morse+japanese

This seems like a serious problem that could delay deployment of
UTF-Morse.





RE: XTF-Morse (was RE: UTF-Morse)

2002-11-22 Thread Kent Karlsson

Why so ASCII-biased?  ;-)
See http://www.qsl.net/dk5ke/intcode.html.

/kent k





Re: UTF-Morse

2002-11-22 Thread John Cowan
Marco Cimarosti scripsit:

> Wow! One octet less than ASCII! :-)

Well, sure.  The variable-length encoding represents a mild degree of
compression, though it works best for English, being based loosely on English
letter frequency statistics.  But compression aside, we would expect a
scheme that encodes only ~40 characters to do better than ASCII.
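
The two effects mentioned above can be illustrated with a small sketch. It assumes a repertoire of exactly 40 characters (the message only says "~40") and uses standard International Morse codes for a handful of letters; the variable names are made up for illustration.

```python
import math

# Fixed-width cost of a 40-character repertoire (assumed size), versus 7-bit ASCII.
repertoire_size = 40
bits_fixed = math.ceil(math.log2(repertoire_size))
print(bits_fixed)  # 6

# Standard International Morse for a few letters: the frequent English
# letters get short codes, the rare ones get long codes.
MORSE = {"e": ".", "t": "-", "a": ".-", "q": "--.-", "z": "--..", "j": ".---"}
for letter, code in MORSE.items():
    print(letter, len(code))
```

So even a fixed-width code for 40 characters needs one bit less than ASCII, before any gain from the variable-length assignment.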

-- 
Winter:  MIT,   John Cowan
Keio, INRIA,[EMAIL PROTECTED]
Issue lots of Drafts.   http://www.ccil.org/~cowan
So much more to understand! http://www.reutershealth.com
Might simplicity return?(A "tanka", or extended haiku)




RE: UTF-Morse

2002-11-22 Thread Marco Cimarosti
Otto Stolz wrote:
> Marco, you shall be called "Marcone", or even (granting
> a Pluralis majestatis): "Marconi" ;-)

Hey! I have a little bit of a belly, but not yet enough to justify calling
me "Marcone". :-)

BTW, your careful analysis of Morse needing four code units made me think
that there could be a "digital Morse", where each code unit takes up two
bits. E.g.:

00: letter gap
01: dot
10: dash
11: word gap

Up to four code units could stay in each octet. E.g., the word "MORSE" would
become:

Morse: -- --- .-. ... .
Bits:  10100010 10100001 10010001 01010001
Hex:   0xA2 0xA1 0x91 0x51

Wow! One octet less than ASCII! :-)
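
The packing above can be checked with a short sketch. The two-bit values come from the message; the function names, the tiny Morse table limited to the five letters of the example, and the choice to pad an incomplete final octet with letter gaps are assumptions made for illustration.

```python
# Two-bit code units as proposed in the message above.
LETTER_GAP, DOT, DASH, WORD_GAP = 0b00, 0b01, 0b10, 0b11

# Standard International Morse for the five letters used in the example.
MORSE = {"M": "--", "O": "---", "R": ".-.", "S": "...", "E": "."}


def to_code_units(text):
    """Turn text into a list of 2-bit code units, with gaps between symbols."""
    units = []
    for wi, word in enumerate(text.split(" ")):
        if wi:
            units.append(WORD_GAP)
        for li, letter in enumerate(word):
            if li:
                units.append(LETTER_GAP)
            units.extend(DOT if s == "." else DASH for s in MORSE[letter])
    return units


def pack(units):
    """Pack four code units per octet; the tail is padded with letter gaps (an assumption)."""
    padded = units + [LETTER_GAP] * (-len(units) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


encoded = pack(to_code_units("MORSE"))
print(encoded.hex())  # a2a19151
```

The result is four octets, against five for "MORSE" in ASCII.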

_ Marco




XTF-Morse (was RE: UTF-Morse)

2002-11-22 Thread Marco Cimarosti
Doug Ewell wrote:
> Yes, it's true.  Marco had sent me his UTF-Morse proposal just
> yesterday, along with a suggestion that I put together an 
> implementation for April Fool's Day.  And darned if I wasn't
> really going to do it.  As a JOKE.
> 
> But Marco, you need to check your invented sequences again.  
> The leading and trailing Morse code units for the
> (non-ASCII) multi-Morse characters conflict with some of the
> single-unit characters.  For example, U+002D -....- looks like
> a leading unit, and U+0023 .-.-.. looks like a trailing unit.

--- --- --- --- --- ..--.. ...

Sorry! Not only do I use everybody's bandwidth for April fools in advance, I
also get all the details wrong!

I attempted to simplify the wording while translating into English, and I
messed everything up. So now I have to use more bandwidth to send a
corrected version.

> (It's only a JOKE, guys.  Take a breath.)

BTW, I recalled that, some time ago, the aficionados of fictional UTFs on this
list decided to call their creations "XTFs", in order to minimize the
possibility of confusion with real UTFs.

So, everybody reading this message now or in years to come, please take
notice that XTF-Morse is *not* a UTF: just an aborted April fool! So please
don't go knocking at the Unicode Consortium asking for the latest version of
the specs for sending Unicode in Morse!

_ Marco


==
XTF-Morse [*] - "Bringing Unicode in the telegraph age!"


--
0. Terminology

In this document, the following special terms are used:

- "Morse Dot": a short Morse signal; represented with "." in this
  document.

- "Morse Dash": a long Morse signal; represented with "-" in this
  document.

- "Morse Symbol": a sequence of one or more Dots and/or Dashes, constituting
  a Morse character such as a letter or a punctuation mark.

- "Morse Pause": a short pause which separates adjacent
  Morse symbols; represented with " " (a space) in this document.

- "Morse Space": a long pause which separates words; represented
  with "/" in this document.

- "Morse Oct": a special Morse Symbol representing three bits of
  an Unicode code point.


--
1. Encoding characters in the "ASCII printable" range.

Each Unicode character in the range U+0020..U+007E is encoded as a Morse
Space, as a single Morse Symbol, or as a sequence of two Morse
Symbols, as specified in the following table:

Code:  XTF-Morse:  Character name:
------ ----------- --------------
U+0020 /           SPACE (Morse Space)
U+0021 -.          EXCLAMATION MARK [1]
U+0022 .-..-.      QUOTATION MARK
U+0023 .-.-..      NUMBER SIGN [1]
U+0024 ..-...      DOLLAR SIGN [1]
U+0025 ..-..-      PERCENT SIGN [1]
U+0026 ..-.-.      AMPERSAND [1]
U+0027 .----.      APOSTROPHE
U+0028 -.--.-      LEFT PARENTHESIS
U+0029 -.---.      RIGHT PARENTHESIS [1]
U+002A -.          ASTERISK [1]
U+002B --          PLUS SIGN [1]
U+002C --..--      COMMA
U+002D -....-      HYPHEN-MINUS
U+002E .-.-.-      FULL STOP
U+002F -..-.       SOLIDUS [1]
U+0030 -----       DIGIT ZERO
U+0031 .----       DIGIT ONE
U+0032 ..---       DIGIT TWO
U+0033 ...--       DIGIT THREE
U+0034 ....-       DIGIT FOUR
U+0035 .....       DIGIT FIVE
U+0036 -....       DIGIT SIX
U+0037 --...       DIGIT SEVEN
U+0038 ---..       DIGIT EIGHT
U+0039 ----.       DIGIT NINE
U+003A ---...      COLON
U+003B ---..-      SEMICOLON [1]
U+003C ---.-.      LESS-THAN SIGN [1]
U+003D ..          EQUALS SIGN [1]
U+003E ---.--      GREATER-THAN SIGN [1]
U+003F ..--..      QUESTION MARK
U+0040 -.-.-.      COMMERCIAL AT [1]
U+0041 ..-- .-     LATIN CAPITAL LETTER A [2]
U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
U+0044 ..-- -..    LATIN CAPITAL LETTER D [2]
U+0045 ..-- .      LATIN CAPITAL LETTER E [2]
U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
U+0047 ..-- --.    LATIN CAPITAL LETTER G [2]
U+0048 ..-- ....   LATIN CAPITAL LETTER H [2]
U+0049 ..-- ..     LATIN CAPITAL LETTER I [2]
U+004A ..-- .---   LATIN CAPITAL LETTER J [2]
U+004B ..-- -.-    LATIN CAPITAL LETTER K [2]
U+004C ..-- .-..   LATIN CAPITAL LETTER L [2]
U+004D ..-- --     LATIN CAPITAL LETTER M [2]
U+004E ..-- -.     LATIN CAPITAL LETTER N [2]
U+004F ..-- ---    LATIN CAPITAL LETTER O [2]
U+0050 ..-- .--.   LATIN CAPITAL LETTER P [2]
U+0051 ..-- --.-   LATIN CAPITAL LETTER Q [2]
U+0052 ..-- .-.    LATIN CAPITAL LETTER R [2]
U+0053 ..-- ...    LATIN CAPITAL LETTER S [2]
U+0054 ..-- -      LATIN CAPITAL LETTER T [2]
U+0055 ..-- ..-    LATIN CAPITAL LETTER U [2]
U+0056 ..-- ...-   LATIN CAPITAL LETTER V [2]
U+0057 ..-- .--    LATIN CAPITAL LETTER W [2]
U+0058 ..-- -..-   LATIN CAPITAL LETTER X [2]
U+0059 ..-- -.--   LATIN CAPITAL LETTER Y [2]
U+005A ..-- --..   LATIN CAPITAL LETTER Z [2]
U+005B ..---.      LEFT SQUARE BRACKET [1]
U+005C .-          REVERSE SOLIDUS [1]
U+005D ..          RIGHT SQ

Re: [OT] Morse code (was: Morse coded Unicode)

2002-11-22 Thread John Cowan
Otto Stolz scripsit:

> But binary? No, Sir! You cannot do without the gaps.

The gaps are just to delimit the variable-length binary encoding units.

-- 
What is the sound of Perl?  Is it not the   John Cowan
sound of a [Ww]all that people have stopped [EMAIL PROTECTED]
banging their head against?  --Larryhttp://www.ccil.org/~cowan




Re: UTF-Morse

2002-11-22 Thread Otto Stolz
Marco Cimarosti wrote:


> UTF-Morse - "Bringing Unicode in the telegraph age!"
> ...
> 1. Unicode characters U+0020..U+007E are encoded according to the
> following table:
> ...
> 2. All other Unicode characters are encoded with one of seven
> multi-Morse schemes:
> ...


Great!

Marco, you shall be called "Marcone", or even (granting
a Pluralis majestatis): "Marconi" ;-)

Ciao,
  Otto Stolz














Re: Lowercase numerals

2002-11-22 Thread John Cowan
Doug Ewell scripsit:

> Roman numerals should be encoded using the letters in the basic Latin
> alphabet (upper- or lower-case).  The only reason to use the characters
> in the range U+2160 - U+217F is to maintain compatibility with East
> Asian legacy standards.

They might also see plausible use in palaeographic text; a font tuned
for Archaic Latin should presumably display U as V and J as I, and
the Roman numerals in their archaic forms.
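
The compatibility status of the U+2160..U+217F characters discussed above can be seen from their compatibility decompositions. A minimal sketch, using Python's standard unicodedata module; the `results` dictionary is just an illustrative name:

```python
import unicodedata

# NFKC folds the compatibility Roman numerals to basic Latin letters.
results = {}
for ch in ("\u2167", "\u2177"):
    results[ch] = unicodedata.normalize("NFKC", ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {results[ch]}")
```

This prints the letter sequences VIII and viii for ROMAN NUMERAL EIGHT and SMALL ROMAN NUMERAL EIGHT respectively.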

-- 
There is / One art  John Cowan <[EMAIL PROTECTED]>
No more / No less   http://www.reutershealth.com
To do / All things  http://www.ccil.org/~cowan
With art- / Lessness -- Piet Hein




Event in Goa

2002-11-22 Thread Dutta Abhijit
Found this on the net

http://www.cfilt.iitb.ac.in/icukl2002/index.html

Regards,
Abhijit






Re: [OT] Morse code (was: Morse coded Unicode)

2002-11-22 Thread Otto Stolz
Tom Gewecke wrote:


> Isn't Morse simply the first (variable bit) binary character encoding
> standard, which was followed by the 5-bit Baudot, various 6, 7, and 8 bit
> encodings, and finally (we hope) by the 21 bit Unicode?



Morse code isn't a binary code, but rather a quaternary one:
It has four distinct code elements, viz. dot, dash, character gap,
and word gap.

Alternatively, you can think of it as a ternary code, when you
represent the word gap by three consecutive character gaps (which
will physically make no difference in audible, or flashlight,
rendering).

But binary? No, Sir! You cannot do without the gaps.
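
The quaternary and ternary views above can be sketched as follows. The notation (" " for the character gap, "/" for the word gap) and the function names are assumptions chosen for illustration, not part of any standard.

```python
# Four code elements; the characters chosen to write them down are illustrative.
DOT, DASH, CHAR_GAP, WORD_GAP = ".", "-", " ", "/"


def to_ternary(stream):
    """Rewrite a quaternary stream using only dot, dash and character gap."""
    return stream.replace(WORD_GAP, CHAR_GAP * 3)


def to_quaternary(stream):
    """Inverse mapping: three consecutive character gaps become a word gap."""
    return stream.replace(CHAR_GAP * 3, WORD_GAP)


message = "... --- .../... --- ..."  # "SOS SOS"
ternary = to_ternary(message)
print(repr(ternary))

# Dropping the gaps leaves a run of dots and dashes with no symbol boundaries,
# so the text can no longer be recovered unambiguously.
print(ternary.replace(CHAR_GAP, ""))
```

Without gaps, ".-" could be read as "a" or as "e" followed by "t", which is the point of the objection.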

Best wishes,
  Otto Stolz





Re: Anyone who can write Hindi on the Unicode List?

2002-11-22 Thread John Hudson
At 09:34 PM 11/21/2002, [EMAIL PROTECTED] wrote:


> Now, what should I do to fix this bug in my system by using the Windows 2000
> OS?


The bug is in Office, not in Windows. You either need to get your hands on 
the South Asian version of Office 2000 or upgrade to Office XP.

John Hudson


Tiro Typeworks		www.tiro.com
Vancouver, BC		[EMAIL PROTECTED]

It is necessary that by all means and cunning,
the cursed owners of books should be persuaded
to make them available to us, either by argument
or by force.  - Michael Apostolis, 1467




Re: Lowercase numerals

2002-11-22 Thread Michael Everson
At 08:40 +0100 2002-11-22, Thomas Lotze wrote:


> Then what's the way to distinguish between lining and text figures in
> plain text?


There isn't one. That's typography, not character identity.


> Can this distinction really only be achieved when typesetting the
> text, by switching between two fonts, one for each kind of numerals?

Yes, and rightly so.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: Lowercase numerals

2002-11-22 Thread Thomas Lotze
On Thu, 21 Nov 2002 21:39:45 -0800
"Doug Ewell" <[EMAIL PROTECTED]> wrote:

> > So can it be summarized that figures (both arabic and latin)
> > actually come in only one flavor (upper or lowercase), the other
> > being a variant glyph, and both kinds of roman numerals being
> > encoded for reasons other than their semantic meaning?
> 
> Both kinds of Roman numerals were encoded for one reason only:
> compatibility with existing standards, as Ken mentioned.

That's what I meant by "other than their semantic meaning" - I put it
that way because I had been interested in the relationship between their
semantics and their being Unicode encoded.

> You cannot apply a variation selector (U+FE0x, or soon U+E01xx) to the
> ASCII digits to request a different glyph, at least not until Unicode
> explicitly defines such a variant sequence.

Then what's the way to distinguish between lining and text figures in
plain text? Can this distinction really only be achieved when
typesetting the text, by switching between two fonts, one for each kind
of numerals? Or am I missing some Unicode mechanism here?

Cheers, Thomas

-- 
Thomas Lotze

[EMAIL PROTECTED]  http://www.thomas-lotze.de/





Re: UTF-Morse

2002-11-22 Thread Doug Ewell
Yes, it's true.  Marco had sent me his UTF-Morse proposal just
yesterday, along with a suggestion that I put together an implementation
for April Fool's Day.  And darned if I wasn't really going to do it.  As
a JOKE.

But Marco, you need to check your invented sequences again.  The leading
and trailing Morse code units for the (non-ASCII) multi-Morse characters
conflict with some of the single-unit characters.  For example,
U+002D -....- looks like a leading unit, and U+0023 .-.-.. looks like a
trailing unit.

(It's only a JOKE, guys.  Take a breath.)

-Doug Ewell
 Fullerton, California

- Original Message -
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: "'Carl W. Brown'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002 1:22 am
Subject: UTF-Morse (was RE: Morse coded Unicode(was: Morse code))


> Carl W. Brown wrote:
> > I think that the bigger issue might be how do you extend Morse code to
> > incorporate the Unicode character set.
> > [...]
>
> Carl, this is unfair!! You spoiled my April 1st joke in mid November!
>
> Ciao.
> Marco :-)
>
>
>
> --
> UTF-Morse - "Bringing Unicode in the telegraph age!"
>
>
> 1. Unicode characters U+0020..U+007E are encoded according to the
> following table:
>
> Code:  UTF-Morse:  Character name:
> ------ ----------- --------------
> U+0020 /           SPACE
> U+0021 -.          EXCLAMATION MARK [1]
> U+0022 .-..-.      QUOTATION MARK
> U+0023 .-.-..      NUMBER SIGN [1]
> U+0024 ..-...      DOLLAR SIGN [1]
> U+0025 ..-..-      PERCENT SIGN [1]
> U+0026 ..-.-.      AMPERSAND [1]
> U+0027 .----.      APOSTROPHE
> U+0028 -.--.-      LEFT PARENTHESIS
> U+0029 -.---.      RIGHT PARENTHESIS [1]
> U+002A -.          ASTERISK [1]
> U+002B --          PLUS SIGN [1]
> U+002C --..--      COMMA
> U+002D -....-      HYPHEN-MINUS
> U+002E .-.-.-      FULL STOP
> U+002F -..-.       SOLIDUS [1]
> U+0030 -----       DIGIT ZERO
> U+0031 .----       DIGIT ONE
> U+0032 ..---       DIGIT TWO
> U+0033 ...--       DIGIT THREE
> U+0034 ....-       DIGIT FOUR
> U+0035 .....       DIGIT FIVE
> U+0036 -....       DIGIT SIX
> U+0037 --...       DIGIT SEVEN
> U+0038 ---..       DIGIT EIGHT
> U+0039 ----.       DIGIT NINE
> U+003A ---...      COLON
> U+003B ---..-      SEMICOLON [1]
> U+003C ---.-.      LESS-THAN SIGN [1]
> U+003D ..          EQUALS SIGN [1]
> U+003E ---.--      GREATER-THAN SIGN [1]
> U+003F ..--..      QUESTION MARK
> U+0040 -.-.-.      COMMERCIAL AT [1]
> U+0041 ..-- .-     LATIN CAPITAL LETTER A [2]
> U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
> U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
> U+0044 ..-- -..    LATIN CAPITAL LETTER D [2]
> U+0045 ..-- .      LATIN CAPITAL LETTER E [2]
> U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
> U+0047 ..-- --.    LATIN CAPITAL LETTER G [2]
> U+0048 ..-- ....   LATIN CAPITAL LETTER H [2]
> U+0049 ..-- ..     LATIN CAPITAL LETTER I [2]
> U+004A ..-- .---   LATIN CAPITAL LETTER J [2]
> U+004B ..-- -.-    LATIN CAPITAL LETTER K [2]
> U+004C ..-- .-..   LATIN CAPITAL LETTER L [2]
> U+004D ..-- --     LATIN CAPITAL LETTER M [2]
> U+004E ..-- -.     LATIN CAPITAL LETTER N [2]
> U+004F ..-- ---    LATIN CAPITAL LETTER O [2]
> U+0050 ..-- .--.   LATIN CAPITAL LETTER P [2]
> U+0051 ..-- --.-   LATIN CAPITAL LETTER Q [2]
> U+0052 ..-- .-.    LATIN CAPITAL LETTER R [2]
> U+0053 ..-- ...    LATIN CAPITAL LETTER S [2]
> U+0054 ..-- -      LATIN CAPITAL LETTER T [2]
> U+0055 ..-- ..-    LATIN CAPITAL LETTER U [2]
> U+0056 ..-- ...-   LATIN CAPITAL LETTER V [2]
> U+0057 ..-- .--    LATIN CAPITAL LETTER W [2]
> U+0058 ..-- -..-   LATIN CAPITAL LETTER X [2]
> U+0059 ..-- -.--   LATIN CAPITAL LETTER Y [2]
> U+005A ..-- --..   LATIN CAPITAL LETTER Z [2]
> U+005B ..---.      LEFT SQUARE BRACKET [1]
> U+005C .-          REVERSE SOLIDUS [1]
> U+005D ..          RIGHT SQUARE BRACKET [1]
> U+005E .-...-      CIRCUMFLEX ACCENT [1]
> U+005F --          LOW LINE [1]
> U+0060 ...---      GRAVE ACCENT [1]
> U+0061 .-          LATIN SMALL LETTER A
> U+0062 -...        LATIN SMALL LETTER B
> U+0063 -.-.        LATIN SMALL LETTER C
> U+0064 -..         LATIN SMALL LETTER D
> U+0065 .           LATIN SMALL LETTER E
> U+0066 ..-.        LATIN SMALL LETTER F
> U+0067 --.         LATIN SMALL LETTER G
> U+0068 ....        LATIN SMALL LETTER H
> U+0069 ..          LATIN SMALL LETTER I
> U+006A .---        LATIN SMALL LETTER J
> U+006B -.-         LATIN SMALL LETTER K
> U+006C .-..        LATIN SMALL LETTER L
> U+006D --          LATIN SMALL LETTER M
> U+006E -.          LATIN SMALL LETTER N
> U+006F ---         LATIN SMALL LETTER O
> U+0070 .--.        LATIN SMALL LETTER P
> U+0071 --.-        LATIN SMALL LETTER Q
> U+0072 .-.         LATIN SMALL LETTER R
> U+0073 ...         LATIN SMALL LETTER S
> U+0074 -           LATIN SMALL LETTER T
> U+0075 ..-         LATIN SMALL LETTER U
> U+0076 ...-        LATIN SMALL LETTER V
> U+0077 .--         LATIN SMALL LETTER W
> U