Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Doug Ewell
> There he sits in wait until you switch on, and BAM, all your data > turns to squares and the little beastie is laughing his socks off. That should have been "BOM." -Doug Ewell Fullerton, California

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Tex Texin
Doug Ewell pun'ed: > > > There he sits in wait until you switch on, and BAM, all your data > > turns to squares and the little beastie is laughing his socks off. > > That should have been "BOM." Yes, and "turns to squares" should have been "turns to replacement characters".

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread ろ〇〇〇〇 ろ〇〇〇
>From: Tex Texin <[EMAIL PROTECTED]> >To: Doug Ewell <[EMAIL PROTECTED]> >CC: [EMAIL PROTECTED], [EMAIL PROTECTED] >Subject: Re: "UNICODE BOMBER STRIKES AGAIN" >Date: Mon, 22 Apr 2002 15:21:48 -0400 > >Doug Ewell pun'ed: > > > >

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Peter_Constable
On 04/22/2002 02:21:48 PM Tex Texin wrote: >Doug Ewell pun'ed: >> >> > There he sits in wait until you switch on, and BAM, all your data >> > turns to squares and the little beastie is laughing his socks off. >> >> That should have been "BOM." > >Yes, and "turns to squares" should have been "tur

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Kenneth Whistler
> Doug Ewell pun'ed: > > > > > There he sits in wait until you switch on, and BAM, all your data > > > turns to squares and the little beastie is laughing his socks off. > > > > That should have been "BOM." > > Yes, and "turns to squares" should have been "turns to replacement > characters".

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Tex Texin
Kenneth Whistler wrote: > There he sits in wait until you switch on, and BOM!, all your GIGS > turn to SQUARE-RAD/S and the little bytestie is laughing his SCSU off. There he sits in "symbol for synchronous idle" until you "symbol for start of text", and BOM!, all your GIGS "clockwise open circle

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Doug Ewell
ろ ろ〇〇〇 <[EMAIL PROTECTED]> wrote: > Why don't they just romanize the little boxes? I would rather read, > say, romanized kana than boxes. Because if the font maker is going to go to the trouble of providing glyphs for the romanization, she might as well provide real kana glyphs. > Is the Un

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-22 Thread Doug Ewell
Kenneth Whistler <[EMAIL PROTECTED]> wrote: > -- K '\0' e '\0' n '\0' Lemme see, that's 0x4B 0x00 0x65 0x00 0x6E 0x00. There's no BOM, and no external tagging as "UTF-16LE," and since this is the Internet, we don't know the endianness of the originating machine. So, based on last week's discus
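
A minimal Python sketch of the ambiguity described above, using only the six bytes listed in the message. The variable names are illustrative and not from the thread. Without a BOM or an external label, both readings are well-formed UTF-16, so the bytes alone cannot settle the byte order.

```python
# The six bytes quoted in the message: 0x4B 0x00 0x65 0x00 0x6E 0x00.
data = bytes([0x4B, 0x00, 0x65, 0x00, 0x6E, 0x00])

# Read as little-endian UTF-16, the bytes spell "Ken".
as_le = data.decode("utf-16-le")

# Read as big-endian UTF-16, the same bytes are three other
# code points: U+4B00, U+6500, U+6E00.
as_be = data.decode("utf-16-be")

print(as_le)
print([f"U+{ord(c):04X}" for c in as_be])
```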

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-23 Thread Florian Weimer
[EMAIL PROTECTED] writes: > FYI: http://linguistlist.org/issues/13/13-1106.html#3 And I thought the Unicode bomber was %u9090%u6858%ucbd3... guy!

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-23 Thread Mark Davis
32*, and could only be either: (a) UTF-16, resulting in the UTF-16 code unit sequence: <1234 0061 D800 DF00>, or (b) UTF-16BE, resulting in the UTF-16 code unit sequence: — Γνῶθι σαυτόν — Θαλῆς [For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr] http://www.macchiato.com - Ori
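
A rough Python sketch of what the code unit sequence <1234 0061 D800 DF00> quoted in (a) stands for. The message is truncated here, so this only illustrates that one sequence: it serializes the four 16-bit units in each byte order and decodes them back to code points. The use of `struct` and the variable names are assumptions made for illustration.

```python
import struct

# The four UTF-16 code units quoted in the message.
units = [0x1234, 0x0061, 0xD800, 0xDF00]

# Serialize the same units in each byte order.
be_bytes = struct.pack(">4H", *units)  # big-endian:    12 34 00 61 D8 00 DF 00
le_bytes = struct.pack("<4H", *units)  # little-endian: 34 12 61 00 00 D8 00 DF

# Decoding with the matching byte order gives the same text either way.
text = be_bytes.decode("utf-16-be")
assert le_bytes.decode("utf-16-le") == text

# D800 DF00 is a surrogate pair, so four code units yield three code points.
code_points = [ord(c) for c in text]
print([f"U+{cp:04X}" for cp in code_points])  # ['U+1234', 'U+0061', 'U+10300']
```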

RE: "UNICODE BOMBER STRIKES AGAIN"

2002-04-24 Thread Yves Arrouye
> You can determine that that particular text is not legal UTF-32*, > since there be illegal code points in any of the three forms. IF you > exclude null code points, again heuristically, that also excludes > UTF-8, and almost all non-Unicode encodings. That leaves UTF-16, 16BE, > 16LE as the onl

Re: "UNICODE BOMBER STRIKES AGAIN"

2002-04-24 Thread Mark Davis
MAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Wednesday, April 24, 2002 10:39 Subject: RE: "UNICODE BOMBER STRIKES AGAIN" > You can determine that that particular text is not legal UTF-32*, > since there be illegal code points in any of the three forms. IF you > exclude null

Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread Doug Ewell
Mark Davis <[EMAIL PROTECTED]> wrote: > You can determine that that particular text is not legal UTF-32*, > since there be illegal code points in any of the three forms. IF you > exclude null code points, again heuristically, that also excludes > UTF-8, and almost all non-Unicode encodings. That

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread Mark Davis
: "Kenneth Whistler" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, April 23, 2002 23:02 Subject: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN") > Mark Davis <[EMAIL PROTECTED]> wrote: > > > You can determine that that pa

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread Doug Ewell
Mark Davis <[EMAIL PROTECTED]> wrote: >> I must not *call* the sequence "UTF-16," since that term is officially >> reserved for BOM-marked text which can be either little- or big-endian, >> or BOMless text which must be big-endian. > > Yes, assuming the "BUT" clause applies to (b). That is, the u
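
The labeling rule quoted above (BOM-marked text may be either byte order; BOMless text labeled plain "UTF-16" must be big-endian) can be sketched in Python roughly as follows. The function name `decode_labeled_utf16` is hypothetical. The check is written out by hand because Python's built-in "utf-16" codec, as far as I know, falls back to the platform's native byte order when no BOM is present, which is not the rule being discussed.

```python
import codecs

def decode_labeled_utf16(data: bytes) -> str:
    """Decode bytes labeled plain "UTF-16" under the rule quoted above."""
    # A leading BOM selects the byte order and is not part of the text.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: under this rule the text must be read as big-endian.
    return data.decode("utf-16-be")

# BOM-marked little-endian and BOM-marked big-endian both give "Ken".
print(decode_labeled_utf16(b"\xff\xfeK\x00e\x00n\x00"))
print(decode_labeled_utf16(b"\xfe\xff\x00K\x00e\x00n"))
# BOMless text is read as big-endian.
print(decode_labeled_utf16(b"\x00K\x00e\x00n"))
```

Under this rule, the BOMless little-endian bytes from earlier in the thread would be read as big-endian and come out as U+4B00 U+6500 U+6E00, which is why such text needs the "UTF-16LE" label instead.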

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread David Starner
On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote: > The Unix and Linux world is very > opposed to the use of BOM in plain-text files, and if they feel that way > about UTF-8 they probably feel the same about UTF-16. Why? The problems with a BOM in UTF-8 have to do with it being an ASCII

RE: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread jarkko . hietaniemi
> Why? The problems with a BOM in UTF-8 have to do with it being an > ASCII-compatible encoding. Err, no. That's not the point, AFAIK. The point is that traditionally in UNIX there hasn't been any sort of "marker" or "tag" in the beginning, UNIX files being flat streams of bytes. The UNIX tool

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread David Starner
On Wed, Apr 24, 2002 at 01:37:39PM -0400, [EMAIL PROTECTED] wrote: > Err, no. That's not the point, AFAIK. The point is that traditionally > in UNIX there hasn't been any sort of "marker" or "tag" in the beginning, > UNIX files being flat streams of bytes. The UNIX toolset has been built > with

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread Jungshik Shin
On Wed, 24 Apr 2002, David Starner wrote: > On Wed, Apr 24, 2002 at 09:00:17AM -0700, Doug Ewell wrote: > > The Unix and Linux world is very > > opposed to the use of BOM in plain-text files, and if they feel that way > > about UTF-8 they probably feel the same about UTF-16. The reason we're n

Re: Variations of UTF-16 (was: Re: "UNICODE BOMBER STRIKES AGAIN")

2002-04-24 Thread John Cowan
Doug Ewell scripsit: > The Unix and Linux world is very > opposed to the use of BOM in plain-text files, and if they feel that way > about UTF-8 they probably feel the same about UTF-16. I doubt it. The trouble with BOMizing is that it makes ASCII not a subset of UTF-8, but ASCII cannot be a su
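
A small Python sketch of the subset point made above: pure ASCII text encoded as BOMless UTF-8 is byte-for-byte the same as ASCII, while a leading BOM adds three non-ASCII bytes. The sample string is an arbitrary illustration, and "utf-8-sig" is Python's codec name for UTF-8 with a leading BOM.

```python
text = "hello\n"  # arbitrary ASCII-only sample

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

# Without a BOM, the UTF-8 bytes are identical to the ASCII bytes.
same_as_ascii = plain == text.encode("ascii")

# With a BOM, the file starts with EF BB BF, which is not valid ASCII.
bom_prefix = with_bom[:3]
try:
    with_bom.decode("ascii")
    bom_is_ascii = True
except UnicodeDecodeError:
    bom_is_ascii = False

print(same_as_ascii, bom_prefix.hex(), bom_is_ascii)
```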