Compression - binary ordered

2001-06-01 Thread Mark Davis
As a by-product of our recent work on collation, we developed a method of Unicode compression that is similar to SCSU, in that small alphabets are about a byte per character and large alphabets are about two bytes per character. The main difference from SCSU is that this method preserves

Re: Compression - binary ordered

2001-06-01 Thread Rick McGowan
"The main difference from SCSU is that this method preserves binary order." Ah. And which binary order does it preserve? The right one, or the other one? ;-) Rick

RE: UTF-8S (was: Re: ISO vs Unicode UTF-8)

2001-06-01 Thread Bill Kurmey
Kenneth Whistler wrote: Plane 14 PUA usage description tags? Naaah, nobody would suggest such a bizarre thing, would they? Marco Cimarosti wrote: The three words PUA usage description are redundant, methinks. Removing them leaves a more concise and dramatic example of a weird proposal.

RE: Compression - binary ordered

2001-06-01 Thread Carl W. Brown
Mark, This sounds like a great idea. I was wondering, however, if spaces in non-plane 0 character sets will cause problems with the compression efficiency. Maybe you should consider a special case for spaces. Maybe you could use something like offsetting the displacement values to

Re: Compression - binary ordered

2001-06-01 Thread Mark Davis
It preserves code point binary order (UTF-8 / UTF-32). However, one could also easily produce a minor variant that preserves UTF-16 binary order as well. Mark
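
The distinction between the two binary orders mentioned above can be illustrated with a small sketch. It is not the compression method discussed in this thread; it only compares how one BMP character and one supplementary character sort under each encoding. The sample characters and the helper name utf16_units are chosen here for illustration.

```python
# Illustration only: compares sort orders of two sample characters.
# U+FF5E is a BMP character; U+10000 is the first supplementary character.
bmp_char = "\uFF5E"
supp_char = "\U00010000"
samples = [supp_char, bmp_char]

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# Order by code point value.
code_point_order = sorted(samples, key=lambda s: [ord(c) for c in s])
# Order by raw UTF-8 bytes and by raw UTF-32 (big-endian) bytes.
utf8_order = sorted(samples, key=lambda s: s.encode("utf-8"))
utf32_order = sorted(samples, key=lambda s: s.encode("utf-32-be"))
# Order by UTF-16 code units: the surrogate pair D800 DC00 sorts below FF5E.
utf16_order = sorted(samples, key=utf16_units)

print(code_point_order == utf8_order == utf32_order)  # code point order is shared
print(utf16_order == code_point_order)                # UTF-16 order differs here
```

Because surrogate code units (D800..DFFF) sit below the top of the BMP (E000..FFFF), supplementary characters sort before those BMP characters in UTF-16 binary order, while UTF-8 and UTF-32 keep code point order.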

Re: Compression - binary ordered

2001-06-01 Thread Mark Davis
Thanks for your comments. We aren't worried about the transition to and from space for supplementary characters, since as a fraction of all text they will be exceedingly rare (< 0.01%, our estimate). As to Korean, it might save some storage to always reset at space, but (a) I don't see an

Silliness (was RE: UTF-8S (was: Re: ISO vs Unicode UTF-8))

2001-06-01 Thread Edward Cherlin
At 4:44 AM -0600 6/1/01, Bill Kurmey wrote: Kenneth Whistler wrote: Plane 14 PUA usage description tags? Naaah, nobody would suggest such a bizarre thing, would they? Marco Cimarosti wrote: The three words PUA usage description are redundant, methinks. Removing them leaves a more concise and

RE: Compression - binary ordered

2001-06-01 Thread Carl W. Brown
Mark, Shades of UTF-8s!!! Please don't provide UTF-16 binary order. Carl

Re: [OT] bits and bytes

2001-06-01 Thread jgo
I wrote a couple of programs for a Control Data Corporation (CDC) 6600 back in the early '70s. I recall that the smallest addressable unit was a 60 bit word (though there were special instructions to pack and unpack some size of character -- was it 6 bit?) Bob Correct, except that there

Why call kanji/hanji/hanja 'ideographs' when almost none are?

2001-06-01 Thread Jon Babcock
The Asia/East Asian/CJK thread reminded me of one of my own pet peeves, the use of 'ideograph' to refer to kanji. Perhaps some of the professionals on this list can enlighten me here. I thought that an ideograph meant that the graph stood for an idea, not a sound or a zographic image. Since

Re: Why call kanji/hanji/hanja 'ideographs' when almost none are?

2001-06-01 Thread John H. Jenkins
At 4:16 PM -0600 6/1/01, Jon Babcock wrote: The Asia/East Asian/CJK thread reminded me of one of my own pet peeves, the use of 'ideograph' to refer to kanji. Perhaps some of the professionals on this list can enlighten me here. I thought that an ideograph meant that the graph stood for an

RE: Some Char. to Glyph Statistics, Pan/Single Font

2001-06-01 Thread てんどうりゅうじ
So does my Rurouni Kensin album go under R or under ru? Maybe ru is better because few words start with ru. ★じゅういっちゃん★ "AIS TSXQ QDOO TD AISC TDQMIG, HYCTDL, ZIC HIIUPLB XSHM GDOPHPISX CYTDL." "QMD XDHCDQ, AIS XDD, PX QMDCD'X LI CDHPWD. P VSXQ WSQ RMYQ P MYED KA TA YCT PL."