RE: UTF-Morse
At 02:37 PM 11/22/2002 +0100, Marco Cimarosti wrote:
>Otto Stolz wrote:
>> Marco, you shall be called "Marcone", or even (granting
>> a Pluralis majestatis): "Marconi" ;-)

And each element shall be called a "Morsel"

Barry
Re: UTF-Morse
Marco Cimarosti <[EMAIL PROTECTED]> writes:

> UTF-Morse - "Bringing Unicode in the telegraph age!"
...
> U+0041   ..-- .-      LATIN CAPITAL LETTER A [2]
> U+0042   ..-- -...    LATIN CAPITAL LETTER B [2]
> U+0043   ..-- -.-.    LATIN CAPITAL LETTER C [2]
> U+0044   ..-- -..     LATIN CAPITAL LETTER D [2]
> U+0045   ..-- .       LATIN CAPITAL LETTER E [2]
> U+0046   ..-- ..-.    LATIN CAPITAL LETTER F [2]
> U+0047   ..-- --.     LATIN CAPITAL LETTER G [2]

Interestingly, there is a Japanese Morse language that conflicts with these allocations:

http://www5b.biglobe.ne.jp/~a1c/CW_J.htm (SJIS encoded)

Perhaps it would be useful to have a Morse language selection indicator in UTF-Morse, à la the successful ISO-2022-JP. The problem could also be deferred to higher-level frameworks, such as the flexible MIME framework:

Content-Type: text/plain; charset=utf-morse+japanese

This seems like a serious problem that could delay deployment of UTF-Morse.
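As a rough illustration of how a receiver might consume the MIME suggestion, the sketch below parses that header with Python's standard `email` package and splits the charset label on "+". The "+language" suffix is only the convention floated in the message, and the helper name `split_morse_charset` is made up for this example; no such charset is registered anywhere.

```python
from email import message_from_string


def split_morse_charset(raw_message):
    """Return (base_charset, morse_language) from a message's Content-Type.

    The "+language" suffix is the hypothetical convention from the message
    above, not a registered MIME charset syntax.
    """
    msg = message_from_string(raw_message)
    charset = msg.get_content_charset()  # lower-cased label, or None if absent
    if charset is None:
        return None, None
    base, _, language = charset.partition("+")
    return base, language or None


raw = "Content-Type: text/plain; charset=utf-morse+japanese\n\n-- --- .-. ... ."
print(split_morse_charset(raw))
```

A label without a suffix yields `None` for the language, so a decoder could fall back to International Morse in that case.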
RE: XTF-Morse (was RE: UTF-Morse)
Why so ASCII-biased? ;-) See http://www.qsl.net/dk5ke/intcode.html. /kent k
Re: UTF-Morse
Marco Cimarosti scripsit:

> Wow! One octet less than ASCII! :-)

Well, sure. The variable-length encoding represents a mild degree of compression, though it works best for English, being based loosely on English letter frequency statistics. But compression aside, we would expect a scheme that encodes only ~40 characters to do better than ASCII.

-- 
Winter:  MIT,                    John Cowan
Keio, INRIA,                     [EMAIL PROTECTED]
Issue lots of Drafts.            http://www.ccil.org/~cowan
So much more to understand!      http://www.reutershealth.com
Might simplicity return?         (A "tanka", or extended haiku)
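To put a number on that last point: a fixed-width code over N symbols needs ceil(log2 N) bits per character. The sketch below assumes a repertoire of exactly 40 symbols (the message only says "~40") and compares it with 7-bit ASCII; the function name is illustrative.

```python
import math


def fixed_width_bits(alphabet_size):
    """Bits per character for a fixed-width code over alphabet_size symbols."""
    return math.ceil(math.log2(alphabet_size))


morse_bits = fixed_width_bits(40)    # assumed size of the Morse repertoire
ascii_bits = fixed_width_bits(128)   # ASCII has 128 code points
print(morse_bits, ascii_bits)
```

So even before any frequency-based compression, a repertoire this small fits in fewer bits per character than ASCII needs.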
RE: UTF-Morse
Otto Stolz wrote:
> Marco, you shall be called "Marcone", or even (granting
> a Pluralis majestatis): "Marconi" ;-)

Hey! I have a little bit of a belly, but not yet enough to justify calling me "Marcone". :-)

BTW, your careful analysis of Morse needing four code units made me think that there could be a "digital Morse", where each code unit takes up two bits. E.g.:

    00: letter gap
    01: dot
    10: dash
    11: word gap

Up to four code units could fit in each octet. E.g., the word "MORSE" would become:

    Morse: -- --- .-. ... .
    Bits:  10100010 10100001 10010001 01010001
    Hex:   0xA2 0xA1 0x91 0x51

Wow! One octet less than ASCII! :-)

_ Marco
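The packing described in this message can be sketched in Python as follows. The two-bit values come straight from the message; the function names, the choice to pad an incomplete final octet with letter gaps, and the rule that a word gap replaces the letter gap between words are assumptions, since the message does not cover those cases.

```python
# Two-bit code units, as listed in the message.
LETTER_GAP, DOT, DASH, WORD_GAP = 0b00, 0b01, 0b10, 0b11

# International Morse code for the 26 basic Latin letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def to_code_units(text):
    """Turn letters and spaces into a flat list of two-bit code units."""
    units = []
    for word_index, word in enumerate(text.upper().split()):
        if word_index:
            units.append(WORD_GAP)        # assumption: replaces the letter gap
        for letter_index, letter in enumerate(word):
            if letter_index:
                units.append(LETTER_GAP)
            units.extend(DOT if mark == "." else DASH for mark in MORSE[letter])
    return units


def pack(units):
    """Pack four code units per octet, most significant pair first."""
    padded = units + [LETTER_GAP] * (-len(units) % 4)   # assumption: pad with 00
    octets = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        octets.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(octets)


print(pack(to_code_units("MORSE")).hex())  # a2a19151
```

"MORSE" happens to need exactly 16 code units, so it fills four octets with no padding and reproduces the hex values given in the message.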
XTF-Morse (was RE: UTF-Morse)
Doug Ewell wrote:
> Yes, it's true. Marco had sent me his UTF-Morse proposal just
> yesterday, along with a suggestion that I put together an
> implementation for April Fool's Day. And darned if I wasn't
> really going to do it. As a JOKE.
>
> But Marco, you need to check your invented sequences again.
> The leading and trailing Morse code units for the
> (non-ASCII) multi-Morse characters conflict with some of the
> single-unit characters. For example, U+002D -....- looks like
> a leading unit, and U+0023 .-.-.. looks like a trailing unit.

--- --- --- --- --- ..--.. ...

Sorry! Not only do I use everybody's bandwidth for April fools in advance: I also get all the details wrong! I attempted to simplify the wording while translating into English, and I messed everything up. So now I have to use more bandwidth to send a corrected version.

> (It's only a JOKE, guys. Take a breath.)

BTW I recalled that, some time ago, the aficionados of fictional UTFs on this list decided to call their creations "XTFs", in order to minimize the possibility of confusion with real UTFs. So, everybody reading this message now or in the coming years, please take notice that XTF-Morse is *not* a UTF: just an aborted April fool! So please don't come knocking at the Unicode Consortium asking for the latest version of the specs for sending Unicode in Morse!

_ Marco

==========
XTF-Morse [*] - "Bringing Unicode in the telegraph age!"
----------

0. Terminology

In this document, the following special terms are used:

- "Morse Dot": a short Morse signal; represented with "." in this document.
- "Morse Dash": a long Morse signal; represented with "-" in this document.
- "Morse Symbol": a sequence of one or more Dots and Dashes, constituting a Morse character such as a letter or a punctuation mark.
- "Morse Pause": a short pause which separates adjacent Morse Symbols; represented with " " (a space) in this document.
- "Morse Space": a long pause which separates words; represented with "/" in this document.
- "Morse Oct": a special Morse Symbol representing three bits of a Unicode code point.

----------

1. Encoding characters in the "ASCII printable" range.

Each Unicode character in the range U+0020..U+007E is encoded as a Morse Space, as a single Morse Symbol, or as a sequence of two Morse Symbols, as specified in the following table:

Code:    XTF-Morse:   Character name:
------   ----------   ---------------
U+0020   /            SPACE (Morse Space)
U+0021   -.           EXCLAMATION MARK [1]
U+0022   .-..-.       QUOTATION MARK
U+0023   .-.-..       NUMBER SIGN [1]
U+0024   ..-...       DOLLAR SIGN [1]
U+0025   ..-..-       PERCENT SIGN [1]
U+0026   ..-.-.       AMPERSAND [1]
U+0027   .----.       APOSTROPHE
U+0028   -.--.-       LEFT PARENTHESIS
U+0029   -.---.       RIGHT PARENTHESIS [1]
U+002A   -.           ASTERISK [1]
U+002B   --           PLUS SIGN [1]
U+002C   --..--       COMMA
U+002D   -....-       HYPHEN-MINUS
U+002E   .-.-.-       FULL STOP
U+002F   -..-.        SOLIDUS [1]
U+0030   -----        DIGIT ZERO
U+0031   .----        DIGIT ONE
U+0032   ..---        DIGIT TWO
U+0033   ...--        DIGIT THREE
U+0034   ....-        DIGIT FOUR
U+0035   .....        DIGIT FIVE
U+0036   -....        DIGIT SIX
U+0037   --...        DIGIT SEVEN
U+0038   ---..        DIGIT EIGHT
U+0039   ----.        DIGIT NINE
U+003A   ---...       COLON
U+003B   ---..-       SEMICOLON [1]
U+003C   ---.-.       LESS-THAN SIGN [1]
U+003D   ..           EQUALS SIGN [1]
U+003E   ---.--       GREATER-THAN SIGN [1]
U+003F   ..--..       QUESTION MARK
U+0040   -.-.-.       COMMERCIAL AT [1]
U+0041   ..-- .-      LATIN CAPITAL LETTER A [2]
U+0042   ..-- -...    LATIN CAPITAL LETTER B [2]
U+0043   ..-- -.-.    LATIN CAPITAL LETTER C [2]
U+0044   ..-- -..     LATIN CAPITAL LETTER D [2]
U+0045   ..-- .       LATIN CAPITAL LETTER E [2]
U+0046   ..-- ..-.    LATIN CAPITAL LETTER F [2]
U+0047   ..-- --.     LATIN CAPITAL LETTER G [2]
U+0048   ..-- ....    LATIN CAPITAL LETTER H [2]
U+0049   ..-- ..      LATIN CAPITAL LETTER I [2]
U+004A   ..-- .---    LATIN CAPITAL LETTER J [2]
U+004B   ..-- -.-     LATIN CAPITAL LETTER K [2]
U+004C   ..-- .-..    LATIN CAPITAL LETTER L [2]
U+004D   ..-- --      LATIN CAPITAL LETTER M [2]
U+004E   ..-- -.      LATIN CAPITAL LETTER N [2]
U+004F   ..-- ---     LATIN CAPITAL LETTER O [2]
U+0050   ..-- .--.    LATIN CAPITAL LETTER P [2]
U+0051   ..-- --.-    LATIN CAPITAL LETTER Q [2]
U+0052   ..-- .-.     LATIN CAPITAL LETTER R [2]
U+0053   ..-- ...     LATIN CAPITAL LETTER S [2]
U+0054   ..-- -       LATIN CAPITAL LETTER T [2]
U+0055   ..-- ..-     LATIN CAPITAL LETTER U [2]
U+0056   ..-- ...-    LATIN CAPITAL LETTER V [2]
U+0057   ..-- .--     LATIN CAPITAL LETTER W [2]
U+0058   ..-- -..-    LATIN CAPITAL LETTER X [2]
U+0059   ..-- -.--    LATIN CAPITAL LETTER Y [2]
U+005A   ..-- --..    LATIN CAPITAL LETTER Z [2]
U+005B   ..---.       LEFT SQUARE BRACKET [1]
U+005C   .-           REVERSE SOLIDUS [1]
U+005D   ..           RIGHT SQ
Re: UTF-Morse
Marco Cimarosti wrote:

> UTF-Morse - "Bringing Unicode in the telegraph age!"
> ...
> 1. Unicode characters U+0020..U+007E are encoded according to the following table:
> ...
> 2. All other Unicode characters are encoded with one of seven multi-Morse schemes:
> ...

Great!

Marco, you shall be called "Marcone", or even (granting a Pluralis majestatis): "Marconi" ;-)

Ciao,
Otto Stolz
Re: UTF-Morse
Yes, it's true. Marco had sent me his UTF-Morse proposal just yesterday, along with a suggestion that I put together an implementation for April Fool's Day. And darned if I wasn't really going to do it. As a JOKE.

But Marco, you need to check your invented sequences again. The leading and trailing Morse code units for the (non-ASCII) multi-Morse characters conflict with some of the single-unit characters. For example, U+002D -....- looks like a leading unit, and U+0023 .-.-.. looks like a trailing unit.

(It's only a JOKE, guys. Take a breath.)

-Doug Ewell
 Fullerton, California

----- Original Message -----
From: "Marco Cimarosti" <[EMAIL PROTECTED]>
To: "'Carl W. Brown'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, November 21, 2002 1:22 am
Subject: UTF-Morse (was RE: Morse coded Unicode(was: Morse code))

> Carl W. Brown wrote:
> > I think that the bigger issue might be how do you extend Morse code to
> > incorporate the Unicode character set.
> > [...]
>
> Carl, this is unfair!! You spoiled my April 1st joke in mid November!
>
> Ciao.
> Marco :-)
>
> ----------
> UTF-Morse - "Bringing Unicode in the telegraph age!"
>
> 1. Unicode characters U+0020..U+007E are encoded according to the
> following table:
>
> Code:    UTF-Morse:   Character name:
> ------   ----------   ---------------
> U+0020   /            SPACE
> U+0021   -.           EXCLAMATION MARK [1]
> U+0022   .-..-.       QUOTATION MARK
> U+0023   .-.-..       NUMBER SIGN [1]
> U+0024   ..-...       DOLLAR SIGN [1]
> U+0025   ..-..-       PERCENT SIGN [1]
> U+0026   ..-.-.       AMPERSAND [1]
> U+0027   .----.       APOSTROPHE
> U+0028   -.--.-       LEFT PARENTHESIS
> U+0029   -.---.       RIGHT PARENTHESIS [1]
> U+002A   -.           ASTERISK [1]
> U+002B   --           PLUS SIGN [1]
> U+002C   --..--       COMMA
> U+002D   -....-       HYPHEN-MINUS
> U+002E   .-.-.-       FULL STOP
> U+002F   -..-.        SOLIDUS [1]
> U+0030   -----        DIGIT ZERO
> U+0031   .----        DIGIT ONE
> U+0032   ..---        DIGIT TWO
> U+0033   ...--        DIGIT THREE
> U+0034   ....-        DIGIT FOUR
> U+0035   .....        DIGIT FIVE
> U+0036   -....        DIGIT SIX
> U+0037   --...        DIGIT SEVEN
> U+0038   ---..        DIGIT EIGHT
> U+0039   ----.        DIGIT NINE
> U+003A   ---...       COLON
> U+003B   ---..-       SEMICOLON [1]
> U+003C   ---.-.       LESS-THAN SIGN [1]
> U+003D   ..           EQUALS SIGN [1]
> U+003E   ---.--       GREATER-THAN SIGN [1]
> U+003F   ..--..       QUESTION MARK
> U+0040   -.-.-.       COMMERCIAL AT [1]
> U+0041   ..-- .-      LATIN CAPITAL LETTER A [2]
> U+0042   ..-- -...    LATIN CAPITAL LETTER B [2]
> U+0043   ..-- -.-.    LATIN CAPITAL LETTER C [2]
> U+0044   ..-- -..     LATIN CAPITAL LETTER D [2]
> U+0045   ..-- .       LATIN CAPITAL LETTER E [2]
> U+0046   ..-- ..-.    LATIN CAPITAL LETTER F [2]
> U+0047   ..-- --.     LATIN CAPITAL LETTER G [2]
> U+0048   ..-- ....    LATIN CAPITAL LETTER H [2]
> U+0049   ..-- ..      LATIN CAPITAL LETTER I [2]
> U+004A   ..-- .---    LATIN CAPITAL LETTER J [2]
> U+004B   ..-- -.-     LATIN CAPITAL LETTER K [2]
> U+004C   ..-- .-..    LATIN CAPITAL LETTER L [2]
> U+004D   ..-- --      LATIN CAPITAL LETTER M [2]
> U+004E   ..-- -.      LATIN CAPITAL LETTER N [2]
> U+004F   ..-- ---     LATIN CAPITAL LETTER O [2]
> U+0050   ..-- .--.    LATIN CAPITAL LETTER P [2]
> U+0051   ..-- --.-    LATIN CAPITAL LETTER Q [2]
> U+0052   ..-- .-.     LATIN CAPITAL LETTER R [2]
> U+0053   ..-- ...     LATIN CAPITAL LETTER S [2]
> U+0054   ..-- -       LATIN CAPITAL LETTER T [2]
> U+0055   ..-- ..-     LATIN CAPITAL LETTER U [2]
> U+0056   ..-- ...-    LATIN CAPITAL LETTER V [2]
> U+0057   ..-- .--     LATIN CAPITAL LETTER W [2]
> U+0058   ..-- -..-    LATIN CAPITAL LETTER X [2]
> U+0059   ..-- -.--    LATIN CAPITAL LETTER Y [2]
> U+005A   ..-- --..    LATIN CAPITAL LETTER Z [2]
> U+005B   ..---.       LEFT SQUARE BRACKET [1]
> U+005C   .-           REVERSE SOLIDUS [1]
> U+005D   ..           RIGHT SQUARE BRACKET [1]
> U+005E   .-...-       CIRCUMFLEX ACCENT [1]
> U+005F   --           LOW LINE [1]
> U+0060   ...---       GRAVE ACCENT [1]
> U+0061   .-           LATIN SMALL LETTER A
> U+0062   -...         LATIN SMALL LETTER B
> U+0063   -.-.         LATIN SMALL LETTER C
> U+0064   -..          LATIN SMALL LETTER D
> U+0065   .            LATIN SMALL LETTER E
> U+0066   ..-.         LATIN SMALL LETTER F
> U+0067   --.          LATIN SMALL LETTER G
> U+0068   ....         LATIN SMALL LETTER H
> U+0069   ..           LATIN SMALL LETTER I
> U+006A   .---         LATIN SMALL LETTER J
> U+006B   -.-          LATIN SMALL LETTER K
> U+006C   .-..         LATIN SMALL LETTER L
> U+006D   --           LATIN SMALL LETTER M
> U+006E   -.           LATIN SMALL LETTER N
> U+006F   ---          LATIN SMALL LETTER O
> U+0070   .--.         LATIN SMALL LETTER P
> U+0071   --.-         LATIN SMALL LETTER Q
> U+0072   .-.          LATIN SMALL LETTER R
> U+0073   ...          LATIN SMALL LETTER S
> U+0074   -            LATIN SMALL LETTER T
> U+0075   ..-          LATIN SMALL LETTER U
> U+0076   ...-         LATIN SMALL LETTER V
> U+0077   .--          LATIN SMALL LETTER W
> U
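The letter rows of the quoted table follow a simple pattern: a small letter is its ordinary Morse symbol, and a capital letter is the symbol "..--" followed by the symbol of the corresponding small letter. A minimal Python sketch of just that part, assuming nothing about the punctuation rows or the multi-Morse schemes (which are not fully visible here), could look like this. The function and constant names are illustrative only.

```python
# International Morse code for the 26 basic Latin letters.
MORSE_LETTERS = {
    "a": ".-", "b": "-...", "c": "-.-.", "d": "-..", "e": ".", "f": "..-.",
    "g": "--.", "h": "....", "i": "..", "j": ".---", "k": "-.-", "l": ".-..",
    "m": "--", "n": "-.", "o": "---", "p": ".--.", "q": "--.-", "r": ".-.",
    "s": "...", "t": "-", "u": "..-", "v": "...-", "w": ".--", "x": "-..-",
    "y": "-.--", "z": "--..",
}

CAPITAL_PREFIX = "..--"  # first symbol of every capital letter in the table
MORSE_SPACE = "/"        # U+0020 in the table


def encode_letters(text):
    """Return the list of Morse symbols for ASCII letters and spaces only."""
    symbols = []
    for ch in text:
        if ch == " ":
            symbols.append(MORSE_SPACE)
        elif ch.isascii() and ch.lower() in MORSE_LETTERS:
            if ch.isupper():
                symbols.append(CAPITAL_PREFIX)
            symbols.append(MORSE_LETTERS[ch.lower()])
        else:
            # Digits, punctuation and non-ASCII are outside this sketch.
            raise ValueError(f"not covered by this sketch: {ch!r}")
    return symbols


print(" ".join(encode_letters("Hi")))  # ..-- .... ..
```

Returning a list of symbols leaves the question of how pauses and Morse Spaces are keyed to the caller; joining with a single space is only a convenient way to display the result.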