Re: UTF-Morse

2002-11-22 Thread Doug Ewell
Yes, it's true.  Marco had sent me his UTF-Morse proposal just
yesterday, along with a suggestion that I put together an implementation
for April Fool's Day.  And darned if I wasn't really going to do it.  As
a JOKE.

But Marco, you need to check your invented sequences again.  The leading
and trailing Morse code units for the (non-ASCII) multi-Morse characters
conflict with some of the single-unit characters.  For example,
U+002D -....- looks like a leading unit, and U+0023 .-.-.. looks like a
trailing unit.

(It's only a JOKE, guys.  Take a breath.)

-Doug Ewell
 Fullerton, California

----- Original Message -----
From: Marco Cimarosti [EMAIL PROTECTED]
To: 'Carl W. Brown' [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, November 21, 2002 1:22 am
Subject: UTF-Morse (was RE: Morse coded Unicode(was: Morse code))


 Carl W. Brown wrote:
  I think that the bigger issue might be how do you extend Morse code to
  incorporate the Unicode character set.
  [...]

 Carl, this is unfair!! You spoiled my April 1st joke in mid November!

 Ciao.
 Marco :-)



 --
 UTF-Morse - Bringing Unicode in the telegraph age!


 1. Unicode characters U+0020..U+007E are encoded according to the
 following table:

 Code:   UTF-Morse:  Character name:
 ------  ----------  ---------------
 U+0020  /           SPACE
 U+0021  -.          EXCLAMATION MARK [1]
 U+0022  .-..-.      QUOTATION MARK
 U+0023  .-.-..      NUMBER SIGN [1]
 U+0024  ..-...      DOLLAR SIGN [1]
 U+0025  ..-..-      PERCENT SIGN [1]
 U+0026  ..-.-.      AMPERSAND [1]
 U+0027  .----.      APOSTROPHE
 U+0028  -.--.-      LEFT PARENTHESIS
 U+0029  -.---.      RIGHT PARENTHESIS [1]
 U+002A  -.          ASTERISK [1]
 U+002B  --          PLUS SIGN [1]
 U+002C  --..--      COMMA
 U+002D  -....-      HYPHEN-MINUS
 U+002E  .-.-.-      FULL STOP
 U+002F  -..-.       SOLIDUS [1]
 U+0030  -----       DIGIT ZERO
 U+0031  .----       DIGIT ONE
 U+0032  ..---       DIGIT TWO
 U+0033  ...--       DIGIT THREE
 U+0034  ....-       DIGIT FOUR
 U+0035  .....       DIGIT FIVE
 U+0036  -....       DIGIT SIX
 U+0037  --...       DIGIT SEVEN
 U+0038  ---..       DIGIT EIGHT
 U+0039  ----.       DIGIT NINE
 U+003A  ---...      COLON
 U+003B  ---..-      SEMICOLON [1]
 U+003C  ---.-.      LESS-THAN SIGN [1]
 U+003D  ..          EQUALS SIGN [1]
 U+003E  ---.--      GREATER-THAN SIGN [1]
 U+003F  ..--..      QUESTION MARK
 U+0040  -.-.-.      COMMERCIAL AT [1]
 U+0041  ..-- .-     LATIN CAPITAL LETTER A [2]
 U+0042  ..-- -...   LATIN CAPITAL LETTER B [2]
 U+0043  ..-- -.-.   LATIN CAPITAL LETTER C [2]
 U+0044  ..-- -..    LATIN CAPITAL LETTER D [2]
 U+0045  ..-- .      LATIN CAPITAL LETTER E [2]
 U+0046  ..-- ..-.   LATIN CAPITAL LETTER F [2]
 U+0047  ..-- --.    LATIN CAPITAL LETTER G [2]
 U+0048  ..-- ....   LATIN CAPITAL LETTER H [2]
 U+0049  ..-- ..     LATIN CAPITAL LETTER I [2]
 U+004A  ..-- .---   LATIN CAPITAL LETTER J [2]
 U+004B  ..-- -.-    LATIN CAPITAL LETTER K [2]
 U+004C  ..-- .-..   LATIN CAPITAL LETTER L [2]
 U+004D  ..-- --     LATIN CAPITAL LETTER M [2]
 U+004E  ..-- -.     LATIN CAPITAL LETTER N [2]
 U+004F  ..-- ---    LATIN CAPITAL LETTER O [2]
 U+0050  ..-- .--.   LATIN CAPITAL LETTER P [2]
 U+0051  ..-- --.-   LATIN CAPITAL LETTER Q [2]
 U+0052  ..-- .-.    LATIN CAPITAL LETTER R [2]
 U+0053  ..-- ...    LATIN CAPITAL LETTER S [2]
 U+0054  ..-- -      LATIN CAPITAL LETTER T [2]
 U+0055  ..-- ..-    LATIN CAPITAL LETTER U [2]
 U+0056  ..-- ...-   LATIN CAPITAL LETTER V [2]
 U+0057  ..-- .--    LATIN CAPITAL LETTER W [2]
 U+0058  ..-- -..-   LATIN CAPITAL LETTER X [2]
 U+0059  ..-- -.--   LATIN CAPITAL LETTER Y [2]
 U+005A  ..-- --..   LATIN CAPITAL LETTER Z [2]
 U+005B  ..---.      LEFT SQUARE BRACKET [1]
 U+005C  .-          REVERSE SOLIDUS [1]
 U+005D  ..          RIGHT SQUARE BRACKET [1]
 U+005E  .-...-      CIRCUMFLEX ACCENT [1]
 U+005F  --          LOW LINE [1]
 U+0060  ...---      GRAVE ACCENT [1]
 U+0061  .-          LATIN SMALL LETTER A
 U+0062  -...        LATIN SMALL LETTER B
 U+0063  -.-.        LATIN SMALL LETTER C
 U+0064  -..         LATIN SMALL LETTER D
 U+0065  .           LATIN SMALL LETTER E
 U+0066  ..-.        LATIN SMALL LETTER F
 U+0067  --.         LATIN SMALL LETTER G
 U+0068  ....        LATIN SMALL LETTER H
 U+0069  ..          LATIN SMALL LETTER I
 U+006A  .---        LATIN SMALL LETTER J
 U+006B  -.-         LATIN SMALL LETTER K
 U+006C  .-..        LATIN SMALL LETTER L
 U+006D  --          LATIN SMALL LETTER M
 U+006E  -.          LATIN SMALL LETTER N
 U+006F  ---         LATIN SMALL LETTER O
 U+0070  .--.        LATIN SMALL LETTER P
 U+0071  --.-        LATIN SMALL LETTER Q
 U+0072  .-.         LATIN SMALL LETTER R
 U+0073  ...         LATIN SMALL LETTER S
 U+0074  -           LATIN SMALL LETTER T
 U+0075  ..-         LATIN SMALL LETTER U
 U+0076  ...-        LATIN SMALL LETTER V
 U+0077  .--         LATIN SMALL LETTER W
 U+0078  -..-        LATIN SMALL LETTER X
 U+0079  -.--        LATIN SMALL LETTER Y
 U+007A  --..        LATIN SMALL LETTER Z
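
The letter rows of the table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lowercase codes are taken to be the standard International Morse letters, capitals get the "..--" leading symbol from the [2] rows, symbols are joined with a single space, and the names LETTERS, CAPITAL_PREFIX and encode are made up for this sketch. Digits and punctuation are left out.

```python
# Standard International Morse for a-z (the lowercase rows of the table).
LETTERS = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}
CAPITAL_PREFIX = "..--"  # leading symbol shown on the capital-letter rows


def encode(text: str) -> str:
    """Encode ASCII letters and spaces; anything else is rejected."""
    symbols = []
    for ch in text:
        if ch == " ":
            symbols.append("/")  # word separator, as in the U+0020 row
        elif "a" <= ch <= "z":
            symbols.append(LETTERS[ch])
        elif "A" <= ch <= "Z":
            # Capital = prefix symbol followed by the lowercase symbol.
            symbols.append(CAPITAL_PREFIX + " " + LETTERS[ch.lower()])
        else:
            raise ValueError(f"not covered by this sketch: {ch!r}")
    # Assumption: adjacent symbols are separated by a single space.
    return " ".join(symbols)


print(encode("Hi there"))  # ..-- .... .. / - .... . .-. .
```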
 

Re: UTF-Morse

2002-11-22 Thread Otto Stolz
Marco Cimarosti wrote:


UTF-Morse - Bringing Unicode in the telegraph age!


...


1. Unicode characters U+0020..U+007E are encoded according to the
following table:


...


2. All other Unicode characters are encoded with one of seven
multi-Morse schemes:


...


Great!

Marco, you shall be called Marcone, or even (granting
a Pluralis majestatis): Marconi ;-)

Ciao,
  Otto Stolz














XTF-Morse (was RE: UTF-Morse)

2002-11-22 Thread Marco Cimarosti
Doug Ewell wrote:
 Yes, it's true.  Marco had sent me his UTF-Morse proposal just
 yesterday, along with a suggestion that I put together an 
 implementation for April Fool's Day.  And darned if I wasn't
 really going to do it.  As a JOKE.
 
 But Marco, you need to check your invented sequences again.  
 The leading and trailing Morse code units for the
 (non-ASCII) multi-Morse characters conflict with some of the
 single-unit characters.  For example, U+002D -....- looks like
 a leading unit, and U+0023 .-.-.. looks like a trailing unit.

--- --- --- --- --- ..--.. ...

Sorry! Not only do I use everybody's bandwidth for April fools in advance, I
also get all the details wrong!

I attempted to simplify the wording while translating into English, and I
messed everything up. So now I have to use more bandwidth to send a
corrected version.

 (It's only a JOKE, guys.  Take a breath.)

BTW, I recalled that, some time ago, the aficionados of fictional UTFs on this
list decided to call their creations XTFs, in order to minimize the
possibility of confusion with real UTFs.

So, everybody reading this message now or in the next years, please take
notice that XTF-Morse is *not* a UTF: just an aborted April fool! So please
don't knock at the Unicode Consortium asking for the latest version of the
specs for sending Unicode in Morse!

_ Marco


============================================================
XTF-Morse [*] - Bringing Unicode in the telegraph age!


------------------------------------------------------------
0. Terminology

In this document, the following special terms are used:

- Morse Dot: a short Morse signal; represented with "." in this
  document.

- Morse Dash: a long Morse signal; represented with "-" in this
  document.

- Morse Symbol: a sequence of one or more Dots and/or Dashes, constituting
  a Morse character such as a letter or a punctuation mark.

- Morse Pause: a short pause which separates adjacent
  Morse Symbols; represented with " " (a space) in this document.

- Morse Space: a long pause which separates words; represented
  with "/" in this document.

- Morse Oct: a special Morse Symbol representing three bits of
  a Unicode code point.


------------------------------------------------------------
1. Encoding characters in the ASCII printable range.

Each Unicode character in the range U+0020..U+007E is encoded as a Morse
Space, as a single Morse Symbol, or as a sequence of two Morse
Symbols, as specified in the following table:

Code:   XTF-Morse:  Character name:
------  ----------  ---------------
U+0020  /           SPACE (Morse Space)
U+0021  -.          EXCLAMATION MARK [1]
U+0022  .-..-.      QUOTATION MARK
U+0023  .-.-..      NUMBER SIGN [1]
U+0024  ..-...      DOLLAR SIGN [1]
U+0025  ..-..-      PERCENT SIGN [1]
U+0026  ..-.-.      AMPERSAND [1]
U+0027  .----.      APOSTROPHE
U+0028  -.--.-      LEFT PARENTHESIS
U+0029  -.---.      RIGHT PARENTHESIS [1]
U+002A  -.          ASTERISK [1]
U+002B  --          PLUS SIGN [1]
U+002C  --..--      COMMA
U+002D  -....-      HYPHEN-MINUS
U+002E  .-.-.-      FULL STOP
U+002F  -..-.       SOLIDUS [1]
U+0030  -----       DIGIT ZERO
U+0031  .----       DIGIT ONE
U+0032  ..---       DIGIT TWO
U+0033  ...--       DIGIT THREE
U+0034  ....-       DIGIT FOUR
U+0035  .....       DIGIT FIVE
U+0036  -....       DIGIT SIX
U+0037  --...       DIGIT SEVEN
U+0038  ---..       DIGIT EIGHT
U+0039  ----.       DIGIT NINE
U+003A  ---...      COLON
U+003B  ---..-      SEMICOLON [1]
U+003C  ---.-.      LESS-THAN SIGN [1]
U+003D  ..          EQUALS SIGN [1]
U+003E  ---.--      GREATER-THAN SIGN [1]
U+003F  ..--..      QUESTION MARK
U+0040  -.-.-.      COMMERCIAL AT [1]
U+0041  ..-- .-     LATIN CAPITAL LETTER A [2]
U+0042  ..-- -...   LATIN CAPITAL LETTER B [2]
U+0043  ..-- -.-.   LATIN CAPITAL LETTER C [2]
U+0044  ..-- -..    LATIN CAPITAL LETTER D [2]
U+0045  ..-- .      LATIN CAPITAL LETTER E [2]
U+0046  ..-- ..-.   LATIN CAPITAL LETTER F [2]
U+0047  ..-- --.    LATIN CAPITAL LETTER G [2]
U+0048  ..-- ....   LATIN CAPITAL LETTER H [2]
U+0049  ..-- ..     LATIN CAPITAL LETTER I [2]
U+004A  ..-- .---   LATIN CAPITAL LETTER J [2]
U+004B  ..-- -.-    LATIN CAPITAL LETTER K [2]
U+004C  ..-- .-..   LATIN CAPITAL LETTER L [2]
U+004D  ..-- --     LATIN CAPITAL LETTER M [2]
U+004E  ..-- -.     LATIN CAPITAL LETTER N [2]
U+004F  ..-- ---    LATIN CAPITAL LETTER O [2]
U+0050  ..-- .--.   LATIN CAPITAL LETTER P [2]
U+0051  ..-- --.-   LATIN CAPITAL LETTER Q [2]
U+0052  ..-- .-.    LATIN CAPITAL LETTER R [2]
U+0053  ..-- ...    LATIN CAPITAL LETTER S [2]
U+0054  ..-- -      LATIN CAPITAL LETTER T [2]
U+0055  ..-- ..-    LATIN CAPITAL LETTER U [2]
U+0056  ..-- ...-   LATIN CAPITAL LETTER V [2]
U+0057  ..-- .--    LATIN CAPITAL LETTER W [2]
U+0058  ..-- -..-   LATIN CAPITAL LETTER X [2]
U+0059  ..-- -.--   LATIN CAPITAL LETTER Y [2]
U+005A  ..-- --..   LATIN CAPITAL LETTER Z [2]
U+005B  ..---.      LEFT SQUARE BRACKET [1]
U+005C  .-          REVERSE SOLIDUS [1]
U+005D  ..          RIGHT SQUARE BRACKET [1]
U+005E .-...-  

RE: UTF-Morse

2002-11-22 Thread Marco Cimarosti
Otto Stolz wrote:
 Marco, you shall be called Marcone, or even (granting
 a Pluralis majestatis): Marconi ;-)

Hey! I have a little bit of a belly, but not yet enough to justify calling
me Marcone. :-)

BTW, your careful analysis of Morse needing four code units made me think
that there could be a digital Morse, where each code unit takes up two
bits. E.g.:

00: letter gap
01: dot
10: dash
11: word gap

Up to four code units could stay in each octet. E.g., the word MORSE would
become:

Morse: -- --- .-. ... .
Bits:  10100010 10100001 10010001 01010001
Hex:   0xA2 0xA1 0x91 0x51

Wow! One octet less than ASCII! :-)
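
The packing above can be sketched in Python. The two-bit values come straight from the list in this message; the function name pack_morse, the use of " " and "/" as input for the letter and word gaps, and padding a partial last octet with letter gaps (00) are assumptions made for this sketch only.

```python
# Two-bit code units, as listed in the message.
LETTER_GAP, DOT, DASH, WORD_GAP = 0b00, 0b01, 0b10, 0b11

UNITS = {".": DOT, "-": DASH, " ": LETTER_GAP, "/": WORD_GAP}


def pack_morse(morse: str) -> bytes:
    """Pack a Morse string into octets, four 2-bit code units per octet."""
    units = [UNITS[ch] for ch in morse]
    # Assumption: fill an incomplete final octet with letter gaps (00).
    while len(units) % 4:
        units.append(LETTER_GAP)
    out = bytearray()
    for i in range(0, len(units), 4):
        a, b, c, d = units[i:i + 4]
        # First code unit goes into the two most significant bits.
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


packed = pack_morse("-- --- .-. ... .")  # the word MORSE
print(" ".join(f"0x{octet:02X}" for octet in packed))  # 0xA2 0xA1 0x91 0x51
```

MORSE happens to fill four octets exactly; a word whose code units do not come out as a multiple of four needs some padding rule, which the message does not specify.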

_ Marco




Re: UTF-Morse

2002-11-22 Thread John Cowan
Marco Cimarosti scripsit:

 Wow! One octet less than ASCII! :-)

Well, sure.  The variable-length encoding represents a mild degree of
compression, though it works best for English, being based loosely on English
letter frequency statistics.  But compression aside, we would expect a
scheme that encodes only ~40 characters to do better than ASCII.
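
As a rough check on that last point, here is the arithmetic for a fixed-width code, taking the "~40 characters" figure at face value (the variable names are just for illustration):

```python
import math

SYMBOLS = 40     # the "~40 characters" figure from the message
ASCII_BITS = 7   # ASCII is a 7-bit code

# Smallest fixed width that can distinguish SYMBOLS different values.
fixed_bits = math.ceil(math.log2(SYMBOLS))

print(fixed_bits)               # 6
print(ASCII_BITS - fixed_bits)  # 1 bit saved per character, before any compression
```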

-- 
Winter:  MIT,   John Cowan
Keio, INRIA,[EMAIL PROTECTED]
Issue lots of Drafts.   http://www.ccil.org/~cowan
So much more to understand! http://www.reutershealth.com
Might simplicity return?(A tanka, or extended haiku)




RE: XTF-Morse (was RE: UTF-Morse)

2002-11-22 Thread Kent Karlsson

Why so ASCII-biased?  ;-)
See http://www.qsl.net/dk5ke/intcode.html.

/kent k





Re: UTF-Morse

2002-11-22 Thread Simon Josefsson
Marco Cimarosti [EMAIL PROTECTED] writes:

 UTF-Morse - Bringing Unicode in the telegraph age!
...
 U+0041 ..-- .- LATIN CAPITAL LETTER A [2]
 U+0042 ..-- -...   LATIN CAPITAL LETTER B [2]
 U+0043 ..-- -.-.   LATIN CAPITAL LETTER C [2]
 U+0044 ..-- -..LATIN CAPITAL LETTER D [2]
 U+0045 ..-- .  LATIN CAPITAL LETTER E [2]
 U+0046 ..-- ..-.   LATIN CAPITAL LETTER F [2]
 U+0047 ..-- --.LATIN CAPITAL LETTER G [2]

Interestingly, there is a Japanese Morse language that conflicts with
these allocations:

http://www5b.biglobe.ne.jp/~a1c/CW_J.htm (SJIS encoded)

Perhaps it would be useful to have a Morse language selection
indicator in UTF-Morse à la the successful ISO-2022-JP.

The problem could also be deferred to higher-level frameworks, such as
the flexible MIME framework:

Content-Type: text/plain; charset=utf-morse+japanese

This seems like a serious problem that could delay deployment of
UTF-Morse.
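
For what it's worth, a receiver could pull such a label apart with Python's standard email package. This is a sketch only: "utf-morse+japanese" is not a registered charset, and treating the part after "+" as a language variant is a convention invented here for illustration.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "text/plain; charset=utf-morse+japanese"

# get_content_charset() returns the charset parameter in lower case.
charset = msg.get_content_charset()

# Hypothetical convention: "<encoding>+<morse language variant>".
base, _, variant = charset.partition("+")

print(base)     # utf-morse
print(variant)  # japanese
```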





RE: UTF-Morse

2002-11-22 Thread Barry Caplan
At 02:37 PM 11/22/2002 +0100, Marco Cimarosti wrote:
Otto Stolz wrote:
 Marco, you shall be called Marcone, or even (granting
 a Pluralis majestatis): Marconi ;-)

And each element shall be called a Morsel

Barry