Re: Hexadecimal characters.
Perhaps the Ewellic forms should be used rather than risk the possibility of being perceived as ASCII-centric? http://www.evertype.com/standards/csur/ewellic.html All we'd need to do is wait for Doug Ewell to provide the glyphs for hexadecimal digits ten through fifteen and wait for CSUR to assign code points other than the former Shavian block. As for input, these could be entered the same way any other Unicode character is entered. Likewise for handling legacy conversions. As for existing ambiguity, perhaps style books should be strict with hexadecimal notation. Unicode example: (畺) Always U+757A, never U+757a (and never U+757А or other variations). Best regards, James Kass.
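A strict style-book rule like the one suggested for U+ notation can also be checked mechanically. What follows is a minimal Python sketch. It assumes the canonical form is "U+" followed by four to six uppercase ASCII hex digits, and both function names are hypothetical. The character class [0-9A-F] matches only ASCII code points, so a lookalike such as Cyrillic А (U+0410) is rejected rather than silently accepted.

```python
import re

# Assumed canonical form: "U+" plus 4-6 uppercase ASCII hex digits.
# [0-9A-F] covers ASCII code points only, so lookalike letters fail.
STRICT_U_NOTATION = re.compile(r"U\+[0-9A-F]{4,6}")

def is_strict_u_notation(s: str) -> bool:
    """True only for the canonical uppercase form, e.g. 'U+757A'."""
    return STRICT_U_NOTATION.fullmatch(s) is not None

def normalize_u_notation(s: str) -> str:
    """Uppercase an ASCII 'U+xxxx' reference; reject anything else."""
    if not s.isascii():
        raise ValueError(f"non-ASCII character in {s!r}")
    candidate = s.upper()
    if not is_strict_u_notation(candidate):
        raise ValueError(f"not a U+ code point reference: {s!r}")
    return candidate

print(normalize_u_notation("U+757a"))  # U+757A
```

Uppercasing lowercase ASCII input is harmless, but a non-ASCII lookalike is treated as an error instead of being "fixed", since it usually signals a copy-and-paste or font problem.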
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
Michael (michka) Kaplan wrote, > Not without separate tools to do the input. Something that the current > proposal fails to mention. > In fairness we should note that formal existing proposals seldom mention input method. Code points need to be assigned before input methods can be made. So, the answer to Tom Gewecke's question is "yes". Best regards, James Kass.
Does Adobe InDesign support Unicode input?
I'm wondering whether Adobe InDesign supports Unicode. It seems to accept Unicode characters pasted from Word (CF_RTFTEXT?), albeit with the wrong fonts, but not text pasted from Notepad (CF_UNICODETEXT), which ends up as question marks. Under Windows XP I cannot type Unicode characters at all, either with Windows keyboards or with Keyman. Will a future version support Unicode input in WinNT/2000/XP? TIA, Marc Durdin Tavultesoft
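The question-mark symptom is consistent with a lossy conversion from Unicode to an 8-bit Windows code page. For example, this could happen if an application only reads the ANSI text format and Windows converts the CF_UNICODETEXT data for it. That explanation is an assumption and is not confirmed for InDesign. The minimal Python sketch below reproduces the effect, using code page 1252 as the assumed system code page. The function name is hypothetical.

```python
def to_ansi_clipboard_text(text: str, codepage: str = "cp1252") -> bytes:
    """Mimic a lossy Unicode -> ANSI conversion: unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")

# Thai text has no representation in code page 1252, so every Thai letter is lost.
sample = "Unicode \u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"
print(to_ansi_clipboard_text(sample))  # b'Unicode ???????'
```

Latin-1-range characters such as é survive this round trip, which is why the problem tends to surface only with non-Western scripts.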
Think twice before submitting a proposal
This is especially in reference to those hex digits. Here is what I have to say about the matter: To discourage frivolous character proposals, the Unicode Consortium requires you to come up with the following (I am not sure this is the complete list of requirements; there might be more): 1. You gotta fill out a form. Probably not that hard, except that some ISO standards are referenced, and you might have some research to do to find out just what these standards are. If you don't know where to access these documents, you're stuck. 2. You gotta have a font including the proposed characters. I do not know what type of font, but No Font = No Complete Proposal. 3. You gotta give them instances of the characters in actual use. I think you have to send them something to prove this; I'm not sure what. SO... If you are the only guy who uses these hex digits, and you don't have a font containing the digits, you basically have less chance of success proposing these characters to Unicode than proposing marriage to Anna Kournikova. However, if you find some previously undiscovered (and illiterate) jungle tribe, and they count in base 16, and you introduce literacy to the tribe, and give them YOUR digits, and if they ACCEPT and USE your digits, then it's pretty safe to say they're in. But you still need that font. Weird. I am told that it is good that we have non-literals for digits so we can do higher math, that without them higher math would be all but impossible, and now we have come full circle to using letters for digits!! If Thinkit wants to see something interesting, he should see how numbers are expressed in Braille. And the real reason I did not propose those two digits to Unicode was that I did not know how much WORK it would be. Sheesh, where CAN you get the relevant ISO documents anyway?
Re: Google and Unicode
Google also seems to sniff locales; for instance, it feeds me Thai-language pages when I use a Thai locale in my browser. > Might be. However, try a search on "japanese" with IE. The first page is, > quite definitely, UTF-8. I'd say it's about time one of the major search > engines went over to Unicode, big time, and what we have here seems like a > big "Go Girl!" for Google.
Re: Hexadecimal characters.
At 10:12 AM 6/20/02 +0100, Avarangal wrote: >Long time ago I raised this matter in this forum. Hope you will go through >filling the proposal forms, etc... > >In addition to your reasons, hex code points need to be established but >not the character shapes. All languages may not need to use the 0-9 and a-f >shapes. But need to use the same code points. How do you enter these? Right now, if I want to write a hexadecimal number, I type it on my keyboard. In fact *all* keyboards have a way for me to enter 0-9 and A-F or a-f. *No* existing keyboard has mappings for these novel characters. Some systems let me type longer key sequences to designate a Unicode character, but that's not very convenient. Therefore, the most likely consequence of adding 16 characters would be that they would *not* be universally used, and possibly used only by a few people or a few applications. The proposal fails to address this migration issue, as well as a number of other issues others have mentioned, such as confusability with existing characters, compatibility mappings, etc. In sum, the downsides of taking such an action would have to be outweighed by other benefits, which themselves need to be clearly established (and not just taken for granted) before the proposal could reasonably be considered. I personally doubt that the benefits can be shown to outweigh the substantial negative impacts such a proposal would have, and think it very unlikely that they would be so compelling as to warrant encoding on the BMP. But this is all speculation until somebody actually writes the whole thing down, and not just as a sketch. A./
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
> IOW, brevity's wit's soul. Well-spoken, dear Polonius. But better to Adorn the soul of wit so briefly put to us. "My liege, and madam, to expostulate What majesty should be, what duty is, Why day is day, night night, and time is time, Were nothing but to waste night, day, and time. Therefore, since brevity is the soul of wit, And tediousness the limbs and outward flourishes, I will be brief. Your noble son is mad." --the Bard
Re: Hexadecimal characters.
From: "Tom Finch" <[EMAIL PROTECTED]> > Rick McGowan wrote: > > >What is the problem you are trying to solve by encoding 16 things in a > >row? > > To answer this, it is better to have 16 in a row as it makes computation of a > numeric value from the character value easier and more straightforward. A > different proposal could be made for just 6 extra digits for hexadecimal if it > is determined that space is really at a premium. But then you lose the > unambiguousness of sixteen separate characters. Or, since the proposal has already been rejected, you can just write the conversion code using the existing numbers/letters and call it a day? :-) MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/
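Taking the suggestion literally, here is a minimal Python sketch of conversion code that uses only the existing ASCII digits and letters. The name hex_value is hypothetical, and Python's built-in int(s, 16) does the same job. The only cost of the letters not sitting directly after the digits is one extra branch per range.

```python
def hex_value(s: str) -> int:
    """Parse a hexadecimal string written with ASCII 0-9, A-F or a-f."""
    value = 0
    for ch in s:
        if "0" <= ch <= "9":
            digit = ord(ch) - ord("0")
        elif "A" <= ch <= "F":
            digit = ord(ch) - ord("A") + 10
        elif "a" <= ch <= "f":
            digit = ord(ch) - ord("a") + 10
        else:
            raise ValueError(f"not a hex digit: {ch!r}")
        value = value * 16 + digit
    return value

print(hex_value("757A"), hex_value("757a"), int("757A", 16))  # 30074 30074 30074
```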
Re: Hexadecimal characters.
Hmmm. I was hoping this discussion would go away after the initial round of reasons why it won't happen. > The problem being solved is properly supporting the base sixteen system. It is already properly supported. In fact, Unicode contains far more than a mere 16 entities sufficient for hexadecimal. With Unicode, any number base up to about 94,000 can easily be represented. It should satisfy even the hippest numerologists. > Also using letters to stand for numeric data can lead to confusion-- > this is BAD. Perhaps, but it's like spelling versus spelling reform. The current representation is already engrained in the computer-literate culture, and you'll be hard-pressed to change it, especially without a compelling story. And this story isn't very compelling. > it is better to have 16 in a row as it makes computation of a numeric > value from the character value easier and more straightforward. So what? This isn't rocket science. The hex-binary conversion problem is so trivial that every beginning CS student has probably had a homework assignment to solve it. Big deal. Five lines of library code. > ...But then you lose the unambiguousness of sixteen separate characters. We have already lost it, because everybody already uses 0-9, a-f and A-F, and there's tons of software that already deals with this and mounds of existing data. The problem won't be solved; it will be augmented with yet another representation. The proposal is a non-starter. There isn't even a glimmer of serious interest here, and it's rather pointless to continue this discussion. Rick
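The point that any base can be written with existing characters generalizes neatly. The Python sketch below treats a base as nothing more than an ordered string of distinct digit symbols. The helper name is hypothetical, and so is the choice to borrow CJK Unified Ideographs as the digits for base 1000.

```python
def to_base(n: int, digits: str) -> str:
    """Write a non-negative integer in base len(digits).

    `digits` must list distinct symbols in order of value (not checked here).
    """
    base = len(digits)
    if base < 2:
        raise ValueError("need at least two digit symbols")
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return digits[0]
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(digits[r])
    return "".join(reversed(out))

HEX = "0123456789ABCDEF"
# Arbitrary illustration: 1000 consecutive CJK ideographs as base-1000 digits.
CJK_1000 = "".join(chr(0x4E00 + i) for i in range(1000))

print(to_base(0x757A, HEX))          # 757A
print(to_base(10, "01"))             # 1010
print(len(to_base(10**6, CJK_1000))) # 3
```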
Re: Hexadecimal characters.
-- On Thu, 20 Jun 2002 16:00:15 Eric Muller wrote: >For the scripts which have their own digits, are there conventions to >write hexadecimal numbers with those digits? If I read a Devanagari text >book, will I see "20A7", or "२०?७" (where "?" stands for whatever is >used for A)? > >Thanks, >Eric. These would remain decimal. Some scripts have numerals beyond 0-9 already, but those are part of the tradition associated with it. For instance Mayan or Sumerian would have numerals beyond ten (if they were to be included in Unicode). _ Communicate with others using Lycos Mail for FREE! http://mail.lycos.com/
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
On Thursday, June 20, 2002, at 03:25 PM, Kenneth Whistler wrote: > I think what a number of people on the list have been hinting -- or > openly stating -- is that prolixity is not a virtue on an email list > when trying to convey one's ideas. > IOW, brevity's wit's soul. == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://homepage.mac.com/jenkins/
Re: Hexadecimal characters.
-- On Thu, 20 Jun 2002 15:14:13 Rick McGowan wrote: >What is the problem you are trying to solve by encoding 16 things in a >row? To answer this, it is better to have 16 in a row as it makes computation of a numeric value from the character value easier and more straightforward. A different proposal could be made for just 6 extra digits for hexadecimal if it is determined that space is really at a premium. But then you lose the unambiguousness of sixteen separate characters.
Re: Hexadecimal characters.
-- On Thu, 20 Jun 2002 15:14:13 Rick McGowan wrote: >Tom Finch wrote: > >> Hexadecimal is very important and deserves to be in Plane 0. > >Hmmm, well.. In this case, importance has nothing to do with it, and going >off on a comparison of the importance of Devanagari as opposed to Hex will >not prevail in this discussion. Agreed. >Hex is already representable with characters in plane zero, as people have >been pointing out. There are the ten digits 0-9 and the letters A-F. >People have explained this, and why your proposal would be confusing and >not cater to legacy data. > >What is the problem you are trying to solve by encoding 16 things in a >row? And how would people convert their legacy data forward while avoiding >confusion, etc? And how do you propose to deal with multiple >representation problems? Legacy data? > > Rick Another example might be superscript 4 (U+2074). You can already write 2^4 for sixteen, but the new character lets you write it more easily. Further, there already are multiple representation problems: A-F as well as a-f. The problem being solved is properly supporting the base sixteen system. Multiple representation is a problem, as you say yourself, and the current way cannot avoid it (A-F or a-f). Also, using letters to stand for numeric data can lead to confusion--this is BAD. Legacy data will be dealt with by accepting the old system as long as necessary.
Re: Hexadecimal characters.
For the scripts which have their own digits, are there conventions to write hexadecimal numbers with those digits? If I read a Devanagari text book, will I see "20A7", or "२०?७" (where "?" stands for whatever is used for A)? Thanks, Eric.
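The Unicode Character Database makes the answer concrete. Devanagari digits U+0966..U+096F carry decimal-digit values 0 to 9, and decimal-digit values never go higher than 9, so there is no Devanagari "digit ten" to stand in for A. The small Python sketch below uses the standard unicodedata module. The helper name is hypothetical, and it accepts any Unicode decimal digit, not only Devanagari.

```python
import unicodedata

def decimal_digits_to_int(s: str) -> int:
    """Read a base-10 number written with Unicode decimal digits (e.g. Devanagari)."""
    value = 0
    for ch in s:
        # unicodedata.decimal raises ValueError for characters with no decimal value
        value = value * 10 + unicodedata.decimal(ch)
    return value

print(decimal_digits_to_int("\u0968\u0966\u0967"))  # Devanagari 2, 0, 1 -> 201
print(unicodedata.decimal("A", None))               # None: "A" has no digit value
```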
Re: Hexadecimal characters.
At 15:03 -0700 2002-06-20, Kenneth Whistler wrote: >In any case, I wonder if Tom could explain what is special about >hexadecimal expressed with "0".."9", "A".."F", as opposed to >any other base numeric system that might be in widespread use, >(duodecimal and vigesimal come to mind) which would lead to a >particular argument that it should be encoded with a distinct set >of characters. Actually, I heard once that there were duodecimal digits (hm, for the six-fingered, I guess) for ten and eleven out there somewhere that somebody was mulling over proposing. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Hexadecimal characters.
Kenneth Whistler scripsit: > In any case, I wonder if Tom could explain what is special about > hexadecimal expressed with "0".."9", "A".."F", as opposed to > any other base numeric system that might be in widespread use, > (duodecimal and vigesimal come to mind) which would lead to a > particular argument that it should be encoded with a distinct set > of characters. Oh, just wait. The next step will be to propose separate characters for binary 1 and 0. He will also propose these novel glyphs for hex digits A-F: [ASCII-art glyphs for A-F lost in extraction] "A fanatic is someone who can't change his mind and won't change the subject." --Winston Churchill -- John Cowan <[EMAIL PROTECTED]> http://www.reutershealth.com I amar prestar aen, han mathon ne nen, http://www.ccil.org/~cowan han mathon ne chae, a han noston ne 'wilith. --Galadriel, _LOTR:FOTR_
Re: Hexadecimal characters.
Tom Finch said: > Hmm, so representing Devanagari digits is more important > than hexadecimal, which is used almost more than decimal > on the web? I think you may be misconstruing the purpose of the character encoding here. If I want to represent the hexadecimal numbers 0x60DB 0x618A in email or in HTML hexadecimal NCRs or whatever, guess what -- I can use ASCII (or Latin-1 or Unicode) characters: "6" "0" "D" "B" "6" "1" "8" "A" -- and that is what everyone does. It is also what is *required* by the HTML and XML standards for the representation of hexadecimal NCRs on the web, by the way. If I want to represent Devanagari digits, on the other hand, I don't have an ASCII representation to hand -- those *require* separate encoding, since Devanagari characters are not the same as Latin characters or Arabic digits. So Devanagari digits were encoded in Unicode. Simple. > I know inertia is a law of the universe, but this is ridiculous. > Hexadecimal is very important and deserves to be in Plane 0. Umm. It *is* in Plane 0: U+0030..U+0039, U+0041..U+0046 (and U+0061..U+0066), to be exact. > I see a good spot in misc technical (23D--oh look hexadecimal again). Nobody has any quarrel with the notion that hexadecimal notation is very important in computer science -- and vital for character encoding discussions. The issue is whether we need any separate characters to represent hexadecimal digits, when we already have the digits everybody has been using for decades encoded. --Ken
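Here is a hexadecimal NCR built exactly as described, from ordinary ASCII hex digits. The Python sketch below uses a hypothetical helper, to_hex_ncr, which escapes every non-ASCII character (plus &, < and >) as &#xHHHH;. The result is checked with the standard library's html.unescape.

```python
import html

def to_hex_ncr(text: str) -> str:
    """Escape non-ASCII characters (and &, <, >) as hexadecimal NCRs, &#xHHHH;."""
    out = []
    for ch in text:
        if ord(ch) < 0x80 and ch not in "&<>":
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):X};")  # uppercase ASCII hex digits
    return "".join(out)

escaped = to_hex_ncr("\u60DB\u618A")
print(escaped)  # &#x60DB;&#x618A;
assert html.unescape(escaped) == "\u60DB\u618A"
```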
Re: Hexadecimal characters.
At 17:45 -0400 2002-06-20, Tom Finch wrote: >Hmm, so representing Devanagari digits is more important than >hexadecimal, which is used almost more than decimal on the web? I >know inertia is a law of the universe, but this is ridiculous. >Hexadecimal is very important and deserves to be in Plane 0. I see a >good spot in misc technical (23D--oh look hexadecimal again). Hexadecimal is represented in Plane 0, with 0123456789AaBbCcDdEeFf. I don't get it. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Hexadecimal characters.
Tom Finch scripsit: > Hmm, so representing Devanagari digits is more important than > hexadecimal, which is used almost more than decimal on the web? I know > inertia is a law of the universe, but this is ridiculous. Hexadecimal is > very important and deserves to be in Plane 0. I see a good spot in misc > technical (23D--oh look hexadecimal again). Clueless still, I see. (Yes, that's ad hominem.) -- John Cowan
Re: Hexadecimal characters.
Tom Finch wrote: > Hexadecimal is very important and deserves to be in Plane 0. Hmmm, well.. In this case, importance has nothing to do with it, and going off on a comparison of the importance of Devanagari as opposed to Hex will not prevail in this discussion. Hex is already representable with characters in plane zero, as people have been pointing out. There are the ten digits 0-9 and the letters A-F. People have explained this, and why your proposal would be confusing and not cater to legacy data. What is the problem you are trying to solve by encoding 16 things in a row? And how would people convert their legacy data forward while avoiding confusion, etc? And how do you propose to deal with multiple representation problems? Legacy data? Rick
Re: MySQL 3.23.51 and unicode
On Thu, Jun 20, 2002 at 04:29:33PM +0700, Art - Arthit Suriyawongkul wrote: > but as long as it can store ASCII-encoded text, > it can also store UTF-8-encoded text. > (just store, not understand) > > if that's true, > so, with some additional work (in the user program layer, not MySQL), > can we support it? Yes. LiveJournal.com, for example, uses UTF-8 and MySQL quite successfully. -- Evan Martin [EMAIL PROTECTED] http://neugierig.org
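The "just store, not understand" approach can be illustrated without MySQL. In the Python sketch below, the standard library's sqlite3 module stands in for MySQL, which is an assumption made only to keep the example self-contained. The application encodes to UTF-8 before storing and decodes after fetching. Because the database sees only bytes, its notion of length counts bytes rather than characters.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the column simply holds opaque bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (body BLOB)")

text = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22"  # Thai, 7 characters
conn.execute("INSERT INTO posts (body) VALUES (?)", (text.encode("utf-8"),))

# The bytes come back unchanged; decoding happens in the application layer.
(raw,) = conn.execute("SELECT body FROM posts").fetchone()
print(raw.decode("utf-8") == text)  # True

# "Store, not understand": the database measures bytes, not characters.
print(conn.execute("SELECT length(body) FROM posts").fetchone()[0])  # 21
```

Sorting, case-insensitive matching and substring functions inside the database will likewise operate on bytes, so those jobs stay in the application layer with this approach.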
Re: Hexadecimal characters.
> I wish to propose sixteen consecutive digits for the purpose > of displaying hexadecimal values. The usefulness of this is > very obvious--it would be extensively used in the unicode spec itself! > ... Plus this makes numbers with hexadecimal characters unambiguously > base sixteen. Blue-skying a bit about this, why stop with computer science and hexadecimals? The fields of astronomy and geography make widespread use of sexagesimal numerics, although for practical reasons they format each sexagesimal digit using decimal notation currently. So why stop with sixteen consecutive digits for hexadecimal? Wouldn't it be equally obviously useful to have sixty consecutive digits for sexagesimal numeric representation? The problem, of course, is that the numbers (0..59) already exist in numeric space, and the character digits ("0".."9") already exist as encoded characters in Unicode (and ASCII, and all other encoded character sets), so it is a little difficult to see what the exact utility of another set of 60 characters would be for this. In any case, I wonder if Tom could explain what is special about hexadecimal expressed with "0".."9", "A".."F", as opposed to any other base numeric system that might be in widespread use, (duodecimal and vigesimal come to mind) which would lead to a particular argument that it should be encoded with a distinct set of characters. --Ken
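Current practice, as Ken describes it, already writes each sexagesimal place with two ordinary decimal digits, and only the separators are special characters. The Python sketch below formats an angle in degree-minute-second notation. It assumes whole arcseconds as input, and the function name is hypothetical.

```python
def arcseconds_to_dms(total_arcseconds: int) -> str:
    """Format a non-negative angle in whole arcseconds as D°MM′SS″.

    Each sexagesimal place (0..59) is written with ordinary decimal digits.
    """
    if total_arcseconds < 0:
        raise ValueError("this sketch handles non-negative angles only")
    minutes, seconds = divmod(total_arcseconds, 60)
    degrees, minutes = divmod(minutes, 60)
    # U+00B0 DEGREE SIGN, U+2032 PRIME, U+2033 DOUBLE PRIME
    return f"{degrees}\u00b0{minutes:02d}\u2032{seconds:02d}\u2033"

print(arcseconds_to_dms(51 * 3600 + 30 * 60 + 26))  # 51°30′26″
```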
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
> In view of the fact that some people are unwilling to let my > ideas be discussed in this forum upon their academic merit but simply use an > ad hominem attack almost every time I post (before many people can have the > chance to sit down and, if they wish, have a serious read of my ideas), when > it seems that their objection is really about the Unicode Consortium having > included the word published in section 13.5 of chapter 13 of the Unicode > specification, ... Speaking here as an editor of the Unicode Standard, I do not find the word "published" in section 13.5 of the book. Perhaps William was thinking of the subheader "Promotion of Private-Use Characters". Since -- despite the explicit text that follows in that section -- some people seem to be getting the wrong idea about private-use character assignments as a step towards standardization, it is quite likely that the editorial committee will be rewriting that section for Unicode 4.0, to provide further clarification for users. > I feel > that the fact that I am trying to use the Unicode specification as it exists > rather than on some nudge nudge wink wink understanding of how some people > feel that it should be interpreted is at the root of the problem. If parts of the Unicode Standard are unclear and are leading to misinterpretations or incompatible interpretations of how characters should be used -- including private-use agreements for private-use characters, then airing those issues is certainly germane to this discussion list. I think what a number of people on the list have been hinting -- or openly stating -- is that prolixity is not a virtue on an email list when trying to convey one's ideas. --Ken
Re: Hexadecimal characters.
-- On Thu, 20 Jun 2002 12:56:25 Kenneth Whistler wrote: >> >> At 03:03 AM 6/20/02 -0400, Tom Finch wrote: >> >> >I wish to propose sixteen consecutive digits for the purpose of displaying >> >> >hexadecimal values. [...] Has this been considered? >> >> > >[David Starner] > >> >> I seem to recall that it has. The problem is, they're just new copies of >> >> old characters. An A used in hexadecimal notation is just an A. Besides the >> >> problem with normalization, you have the problem with all look-alike >> >> characters - people won't use them consistently. Even if this got adopted, >> >> 99% of time you looked at hexadecimal numbers, they would be in plain old >> >> ASCII, so you don't really gain anything but confusion. It's a no-go. >> >> > >[Tom Finch] > >> I looked at the code chart and there are many 16 character sequences empty. > >That is true enough -- but the more appropriate place to look is the >BMP roadmap: > >http://www.unicode.org/roadmaps/bmp-3-6.html > >where you can see that many of those empty columns are already accounted >for by roadmapped allocations for living minority scripts. The BMP is >rather tight now for allocation, and it is unlikely that the committees are >going to look kindly on miscellaneous collections of dubious stuff for >encoding there. > >Of course there is plenty of space in Plane 1 for just about everything, >but... > >That said, David Starner has this one right. There really is no good reason >to create clones of 0..9, A..F to represent hexadecimal digits. The >existing characters do that just fine, and represent an overwhelming >legacy data representation precedent that any proposal such as Tom Finch's >would have to cope with. Introducing new characters for these would just >introduce confusion and would be unlikely to be implemented in any >useful way. > >--Ken > Hmm, so representing Devanagari digits is more important than hexadecimal, which is used almost more than decimal on the web? 
I know inertia is a law of the universe, but this is ridiculous. Hexadecimal is very important and deserves to be in Plane 0. I see a good spot in misc technical (23D--oh look hexadecimal again).
RE: Rotated Glyphs
Thanks to Jungshik Shin for the solution to the problem and to Marco for his comments; a corrected page reflecting both is up: http://www.columbia.edu/kermit/glass.html (if you looked at it before, you'll need to refresh the images). I also added a bit more about BIDI, using the Hebrew University ALEPH library system as an illustration. - Frank
Re: Ethiopic chromatic fonts (Was: Chess symbols, ZWJ, Opentype and holly type ornaments.)
At 13:13 6/20/2002, [EMAIL PROTECTED] wrote: >If by "the options" you mean "what kind of mechanism would it take?", then >it would amount to a substitution rule along the lines (using some pseudo >notation) of > > gU1368 > gU1368_a [colour = red] gU1368_b [colour = black] > >or > > gU1368 > gU1368_a [colour = alt] gU1368_b [colour = default] Yes, that is the sort of thing I meant by 'options', but also more generally what are the different approaches that one might take to the problem? Creating a font mechanism that specifies chromatic variation is one approach, but as you note the likelihood of getting this implemented is very slim. I know of at least one other option, but I'm under NDA :) Regards, John Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Language must belong to the Other -- to my linguistic community as a whole -- before it can belong to me, so that the self comes to its unique articulation in a medium which is always at some level indifferent to it. - Terry Eagleton
Re: Google and Unicode
It looks like they are using UTF-8 for MSIE 5.5 and later, for all sites in fact, if you search using their input form. On Thu Jun 20 11:18:47 2002 -0700 Paul Deuter wrote: >Is this strictly true? I think there are cases where the results are >sent back ISO-8859-1. It would not surprise me if there was a more >complex algorithm which tried to determine the requesting browser. >I love UTF-8 but some older browsers do not tolerate it very well. >Sigh. >-Paul > >-Original Message- >From: Roozbeh Pournader [mailto:[EMAIL PROTECTED]] >Sent: Thursday, June 20, 2002 9:40 AM >To: Unicode List >Subject: Google and Unicode > > > >Did anyone notice that Google now uses Unicode (UTF-8) in displaying the >search results? No more of that 'This page contains Russian characters >that..' > >:) > >roozbeh > > -- ☻ Ričardas Čepas ☺
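The kind of browser-dependent choice Paul speculates about can be sketched as a simple server-side policy. The Python function below is entirely hypothetical and is not Google's actual logic. It answers in UTF-8 only when the request's Accept-Charset header lists utf-8 (or "*") with a non-zero q-value, and otherwise falls back to ISO-8859-1. HTTP/1.1 treats a missing Accept-Charset header as "any charset is acceptable", so the fallback for a missing header is a deliberately conservative choice of this sketch.

```python
from typing import Optional

def choose_charset(accept_charset: Optional[str]) -> str:
    """Hypothetical policy: UTF-8 only when the client explicitly accepts it."""
    if not accept_charset:
        return "iso-8859-1"  # conservative default chosen for this sketch
    for item in accept_charset.split(","):
        name, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        if name.strip().lower() in ("utf-8", "*") and q > 0:
            return "utf-8"
    return "iso-8859-1"

print(choose_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))  # utf-8
```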
Re: Google and Unicode
On Thu Jun 20 21:10:16 2002 +0430 Roozbeh Pournader wrote: > >Did anyone notice that Google now uses Unicode (UTF-8) in displaying the >search results? No more of that 'This page contains Russian characters >that..' > >:) > No, it still has the 'This page contains...' blurb. Does it require some special setting? Hmm, it does indeed use UTF-8 on the national page, but not on the .com page. -- ☻ Ričardas Čepas ☺
Re: Creative IDN Opportunities
I think it is somehow tied into the whole ICANN political mess. I haven't sorted it out yet but I am interested if anyone else has... Barry Caplan www.i18n.com At 02:13 PM 6/20/2002 -0400, Suzanne M. Topping wrote: >Couldn't help but cringe at the last line of this press release. > >Can anyone give me a quick update on the status of IDN standards work? >It's been a while since I checked it out...
Re: Ethiopic chromatic fonts (Was: Chess symbols, ZWJ, Opentype and hollytype ornaments.)
On 06/20/2002 01:34:34 PM John Hudson wrote: >>The question interests me because a while ago now I was amusing myself with >>the idea of being able to do this kind of thing in Graphite (another >>smart-font technology akin to OpenType) in order to emulate dual-coloured >>Ethiopic manuscripts -- specifically, I was thinking of a way to handle the >>paragraph marks that are done with four black dots interspersed with five >>red dots. >> >>Can an OpenType (or Graphite) font be programmed to do this? No. Should the >>technology be revised to accommodate this? There's not a clear enough case >>to warrant the increased complexity, I think. (But it would be possible to >>implement, and it's still amusing to imagine doing so.) > >Peter, what do you see as the options for achieving something like this? If by "the options" you mean "what kind of mechanism would it take?", then it would amount to a substitution rule along the lines (using some pseudo notation) of gU1368 > gU1368_a [colour = red] gU1368_b [colour = black] or gU1368 > gU1368_a [colour = alt] gU1368_b [colour = default] If you mean, "what likelihood do you see of anyone implementing support for something like this", that would be slim. >Some aspects of colour use in Ethiopic manuscripts can clearly be handled >using markup (e.g. the small, raised red glyphs providing chant >instructions, for which I'm wondering if existing ruby notation solutions >might be easily adapted). The paragraph marker is a tricky problem, though. Indeed, since it requires a character's shape to be divided into two differently-coloured glyphs. It probably wouldn't be hard to implement a smart-font system that could support switching between default and alternate colours (where the actual colour choices are specified somewhere else in the system), with control handled using some kind of feature or attribute system (e.g. in the Graphite Description Language, this could easily be expressed as a glyph attribute), but I don't really expect anybody to implement this. 
- Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
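The pseudo-notation can be modelled outside any font technology. The Python sketch below is not Graphite or OpenType code. It only illustrates one glyph expanding into two layers drawn at the same position, each carrying an abstract colour slot ("default" or "alt") that is resolved from a palette kept elsewhere in the system. The glyph names come from the post. The rule table, palette, helper name and the extra glyph gU1200 are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    glyph: str   # glyph name, e.g. "gU1368_a"
    colour: str  # abstract colour slot: "default" or "alt"

# Hypothetical rule table for: gU1368 > gU1368_a [colour = alt] gU1368_b [colour = default]
COLOUR_SPLIT_RULES = {
    "gU1368": (Layer("gU1368_a", "alt"), Layer("gU1368_b", "default")),
}

def apply_colour_split(glyphs, palette):
    """Expand each glyph into (glyph name, resolved colour) layers.

    All layers of one entry are meant to be drawn at the same position;
    glyphs without a rule become a single layer in the default colour.
    """
    result = []
    for g in glyphs:
        layers = COLOUR_SPLIT_RULES.get(g, (Layer(g, "default"),))
        result.append([(layer.glyph, palette[layer.colour]) for layer in layers])
    return result

palette = {"default": "black", "alt": "red"}  # actual colours chosen elsewhere
print(apply_colour_split(["gU1368", "gU1200"], palette))
# [[('gU1368_a', 'red'), ('gU1368_b', 'black')], [('gU1200', 'black')]]
```

Keeping the colours as abstract slots mirrors the second form of the rule, where the font says only "alt" or "default" and the actual colour choices live outside it.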
RE: Google and Unicode
On Thu, 20 Jun 2002, Paul Deuter wrote: >Is this strictly true? I think there are cases where the results are >sent back ISO-8859-1. Might be. However, try a search on "japanese" with IE. The first page is, quite definitely, UTF-8. I'd say it's about time one of the major search engines went over to Unicode, big time, and what we have here seems like a big "Go Girl!" for Google. Sampo Syreeni, aka decoy - mailto:[EMAIL PROTECTED], tel:+358-50-5756111 student/math+cs/helsinki university, http://www.iki.fi/~decoy/front openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Re: Hexadecimal characters.
> >> At 03:03 AM 6/20/02 -0400, Tom Finch wrote: > >> >I wish to propose sixteen consecutive digits for the purpose of displaying > >> >hexadecimal values. [...] Has this been considered? > >> [David Starner] > >> I seem to recall that it has. The problem is, they're just new copies of > >> old characters. An A used in hexadecimal notation is just an A. Besides the > >> problem with normalization, you have the problem with all look-alike > >> characters - people won't use them consistently. Even if this got adopted, > >> 99% of time you looked at hexadecimal numbers, they would be in plain old > >> ASCII, so you don't really gain anything but confusion. It's a no-go. > >> [Tom Finch] > I looked at the code chart and there are many 16 character sequences empty. That is true enough -- but the more appropriate place to look is the BMP roadmap: http://www.unicode.org/roadmaps/bmp-3-6.html where you can see that many of those empty columns are already accounted for by roadmapped allocations for living minority scripts. The BMP is rather tight now for allocation, and it is unlikely that the committees are going to look kindly on miscellaneous collections of dubious stuff for encoding there. Of course there is plenty of space in Plane 1 for just about everything, but... That said, David Starner has this one right. There really is no good reason to create clones of 0..9, A..F to represent hexadecimal digits. The existing characters do that just fine, and represent an overwhelming legacy data representation precedent that any proposal such as Tom Finch's would have to cope with. Introducing new characters for these would just introduce confusion and would be unlikely to be implemented in any useful way. --Ken
Re: Hexadecimal characters.
-- On Thu, 20 Jun 2002 9:42:12 Frank da Cruz wrote: >> At 03:03 AM 6/20/02 -0400, Tom Finch wrote: >> >I wish to propose sixteen consecutive digits for the purpose of displaying >> >hexadecimal values. [...] Has this been considered? >> >> I seem to recall that it has. The problem is, they're just new copies of >> old characters. An A used in hexadecimal notation is just an A. Besides the >> problem with normalization, you have the problem with all look-alike >> characters - people won't use them consistently. Even if this got adopted, >> 99% of time you looked at hexadecimal numbers, they would be in plain old >> ASCII, so you don't really gain anything but confusion. It's a no-go. >> >The proposal that was rejected is this one: > > ftp://kermit.columbia.edu/kermit/ucsterminal/hex.txt > >- Frank This is for full byte representation, which requires of course 256 rather than 16 characters. I looked at the code chart and there are many 16 character sequences empty. Oh and hi John Cowan, recognize me from "hexadecimal lojban"?
Ethiopic chromatic fonts (Was: Chess symbols, ZWJ, Opentype and holly type ornaments.)
At 10:32 6/20/2002, [EMAIL PROTECTED] wrote: > >> The potentially interesting question of whether an OpenType fount may be > >> programmed to produce a two colour display has not been discussed. > >Did you raise that question? That's something I might have noticed if it >had been stated in a two-line post. But I didn't notice it, and I'm >guessing it's because it was in the midst of some 500 lines. > >The question interests me because a while ago now I was amusing myself with >the idea of being able to do this kind of thing in Graphite (another >smart-font technology akin to OpenType) in order to emulate dual-coloured >Ethiopic manuscripts -- specifically, I was thinking of a way to handle the >paragraph marks that are done with four black dots interspersed with five >red dots. > >Can an OpenType (or Graphite) font be programmed to do this? No. Should the >technology be revised to accommodate this? There's not a clear enough case >to warrant the increased complexity, I think. (But it would be possible to >implement, and it's still amusing to imagine doing so.) Peter, what do you see as the options for achieving something like this? Some aspects of colour use in Ethiopic manuscripts can clearly be handled using markup (e.g. the small, raised red glyphs providing chant instructions, for which I'm wondering if existing ruby notation solutions might be easily adapted). The paragraph marker is a tricky problem, though. William will be thrilled to hear that one option would be to use a PUA codepoint for a zero-width combining character to represent the red dots, and include a variant glyph for the Ethiopic paragraph mark that contains only the black dots (or vice versa). If the user input the standard paragraph mark U+1368, the default glyph would be used, but if the user then input the PUA codepoint for the combining dots, we would contextually substitute the variant with space for the combining dots for the default paragraph mark.
In VOLT notation: uni1368 -> uni1368.black | uni1368.red This puts us in a position where we can use markup to colour the different elements. Unfortunately, this has to be done using a PUA codepoint, because markup applies to characters, not glyphs. By making the formation contextually dependent on the PUA character, we ensure that a correct paragraph sign is always shown if the PUA codepoint is not used or if a font is used that does not contain a glyph for the PUA codepoint (in which case you would get the paragraph sign followed by a .notdef glyph -- not pretty, but unambiguous). John Hudson Tiro Typeworks www.tiro.com Vancouver, BC [EMAIL PROTECTED] Language must belong to the Other -- to my linguistic community as a whole -- before it can belong to me, so that the self comes to its unique articulation in a medium which is always at some level indifferent to it. - Terry Eagleton
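The substitution logic John describes can be simulated outside a font. The Python sketch below is only a model: real fonts do this in their lookup tables, and the post names no PUA code point, so U+E000 for the combining "red dots" is a hypothetical choice. It maps U+1368 followed by the PUA mark to the black-dot variant plus the red-dot glyph, and shows the fallback when the font has no glyph for the PUA character.

```python
RED_DOTS_PUA = 0xE000    # hypothetical PUA code point for the red dots
PARAGRAPH_MARK = 0x1368  # U+1368 ETHIOPIC PARAGRAPH SEPARATOR


def shape(text: str, font_has_pua: bool = True) -> list:
    """Return glyph names for text, modelling one contextual substitution."""
    glyphs = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        next_cp = ord(text[i + 1]) if i + 1 < len(text) else None
        if font_has_pua and cp == PARAGRAPH_MARK and next_cp == RED_DOTS_PUA:
            # Black-dot variant plus separate red-dot glyph, so markup
            # applied to each character can colour the parts differently.
            glyphs += ["uni1368.black", "uni1368.red"]
            i += 2
        elif cp == RED_DOTS_PUA and not font_has_pua:
            # Font lacks the PUA glyph: default mark stays, then .notdef.
            glyphs.append(".notdef")
            i += 1
        else:
            glyphs.append(f"uni{cp:04X}")
            i += 1
    return glyphs


print(shape("\u1368"))                             # ['uni1368']
print(shape("\u1368\ue000"))                       # ['uni1368.black', 'uni1368.red']
print(shape("\u1368\ue000", font_has_pua=False))   # ['uni1368', '.notdef']
```

Making the substitution depend on the PUA character, as the post suggests, means plain U+1368 always renders as the complete default paragraph mark.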
RE: Google and Unicode
Is this strictly true? I think there are cases where the results are sent back as ISO-8859-1. It would not surprise me if there was a more complex algorithm which tried to determine the requesting browser. I love UTF-8 but some older browsers do not tolerate it very well. Sigh. -Paul -----Original Message----- From: Roozbeh Pournader [mailto:[EMAIL PROTECTED]] Sent: Thursday, June 20, 2002 9:40 AM To: Unicode List Subject: Google and Unicode Did anyone notice that Google now uses Unicode (UTF-8) in displaying the search results? No more of that 'This page contains Russian characters that..' :) roozbeh
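Paul's guess about a per-browser algorithm cannot be confirmed from this thread. One plausible way a server could make that choice is to read the HTTP Accept-Charset request header. The Python sketch below is hypothetical and is not Google's method. It applies the RFC 2616 section 14.2 rules: with no header any charset is acceptable, and ISO-8859-1 gets q=1 unless it is mentioned explicitly.

```python
from typing import Dict, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Parse e.g. 'utf-8;q=0.7, iso-8859-1' into {charset: q-value}."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[name] = q
    return prefs


def choose_charset(header: Optional[str]) -> str:
    """Prefer UTF-8 when the client accepts it, else fall back to ISO-8859-1."""
    if header is None:
        return "utf-8"  # RFC 2616: no header means any charset is acceptable
    prefs = parse_accept_charset(header)

    def quality(charset: str) -> float:
        if charset in prefs:
            return prefs[charset]
        if "*" in prefs:
            return prefs["*"]
        # RFC 2616 14.2: ISO-8859-1 gets q=1 unless explicitly mentioned.
        return 1.0 if charset == "iso-8859-1" else 0.0

    return "utf-8" if quality("utf-8") > 0 else "iso-8859-1"


print(choose_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))  # utf-8
print(choose_charset("iso-8859-1"))                      # iso-8859-1
```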
Creative IDN Opportunities
Couldn't help but cringe at the last line of this press release. Can anyone give me a quick update on the status of IDN standards work? It's been a while since I checked it out... WEB Addresses Take On New Look As Multilingual & Symbol-Based Capability Launched http://www.globalization.com/newsIndex.cfm?newsID=news57700062020024 Neteka Inc. today announced its International Domain Name Server (DNS) has gone live on the dot.BZ Registry, extending the Internet's accepted character set for Uniform Resource Locator (URL) addresses from English letters and numbers, to the thousands of symbols and letters of the world's many alphabets. Web addresses supporting the world's languages give the non-English Internet community a chance to break free from the confines of the English alphabet. In addition, the expanded character set gives rise to untapped creative opportunities to combine symbols and letters for unique Web addresses.
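For comparison with the press release: the IETF IDN work was later published as RFC 3490 (IDNA) and RFC 3492 (Punycode) in 2003. Under those RFCs a non-ASCII label is carried in the DNS as an ASCII-compatible "xn--" string. This may not be how Neteka's server works. Python's built-in `punycode` and `idna` codecs implement those RFCs, so the encoding can be shown directly:

```python
label = "b\u00fccher"  # "bücher"

print(label.encode("punycode"))                  # b'bcher-kva'
print((label + ".example").encode("idna"))       # b'xn--bcher-kva.example'
print(b"xn--bcher-kva.example".decode("idna"))   # bücher.example
```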
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
On 06/20/2002 10:48:27 AM "Doug Ewell" wrote: >> it would be best for me not to post details of my research in this >> forum. Don't despair, William. Just please recognise that many of us don't have the ability to read lots of long posts. >Also, as I have tried to convey before, many of us lead relatively busy >lives and receive a lot of e-mail, and don't always have time to read >through a post of 2,000 words or more. When it gets that long, it's >better to post it on your Web site and send us an announcement. Speaking for myself, my main frustration with William's contributions has been with the length of the posts. >> My understanding was that this forum was a place to ask questions of end >> users of the Unicode system. I have done that. In this thread I have asked >> interesting scientific questions. And some of those questions have probably gone unnoticed by some or many of the members because they have been buried in lengthy messages that didn't provide enough motivating attractions for me to take the time to read them. (Note that I've read Doug's message because my experience with his contributions tells me that I can expect a good benefit : time investment ratio.) >>My understanding is that academic freedom is about >> being able to hold unpopular ideas without personal disadvantage. I don't think anyone has given you personal disadvantage because of any of your ideas. I started paying less attention to your posts because they were consistently very long and often revisited ideas that I either didn't agree with or wasn't particularly interested in. There's nothing personal in that. >> I feel >> that the fact that I am trying to use the Unicode specification as it exists >> rather than on some nudge nudge wink wink understanding of how some people >> feel that it should be interpreted is at the root of the problem. > >See, I think it's the other way around. 
I just reread Section 13.5 and >I don't see anything about the character-glyph model or other policies >of Unicode being suspended for the PUA... My opinion differs a bit from Doug here. I don't see any problem if you want to encode ligatures or presentation forms as PUA codepoints. I'm aware that people still need to work with software that does not provide support for the kind of processing assumed by the character-glyph model. But I don't agree with devising some grand scheme for everyone to encode ligatures in the PUA in the same way. (I actually don't know if you proposed that since your discussions of ligatures were ones I wasn't interested in reading. I'm making assumptions based on earlier messages that I did read.) If you want to do it, fine. If I need to interchange documents with you at some point, I'll take an interest in your encoding of PUA characters. Until then, I'm not interested. Not for any personal reason. It's just that I don't have a need. >> The potentially interesting question of whether an OpenType fount may be >> programmed to produce a two colour display has not been discussed. Did you raise that question? That's something I might have noticed if it had been stated in a two-line post. But I didn't notice it, and I'm guessing it's because it was in the midst of some 500 lines. The question interests me because a while ago now I was amusing myself with the idea of being able to do this kind of thing in Graphite (another smart-font technology akin to OpenType) in order to emulate dual-coloured Ethiopic manuscripts -- specifically, I was thinking of a way to handle the paragraph marks that are done with four black dots interspersed with five red dots. Can an OpenType (or Graphite) font be programmed to do this? No. Should the technology be revised to accommodate this? There's not a clear enough case to warrant the increased complexity, I think. (But it would be possible to implement, and it's still amusing to imagine doing so.)
>Such a >> discussion could have either established that it could be done, or that it >> could not be done in which case perhaps some extension to OpenType could be >> produced for the future which could have that facility. If so, how would >> that facility best be produced? That question isn't relevant until there is consensus that it *should* be produced. I don't think it should. BTW, I didn't see what reasons *you* had for wanting to control colour. If it's just to handle different colours for squares and pieces on a chess board, that doesn't belong in OpenType, even in amusing imaginings: it belongs in markup. IMO. - Peter PS: As always, the opinions I express are my own and not those of the organisation for which I work. --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: Chess symbols...
Doug Ewell wrote: > (This is a moderated list, and Sarasvati could have withheld > your postings if it were appropriate to do so, but it is not > and she has not.) Actually, this is a monitored list, not moderated. Truly inappropriate material is handled by removing senders or domains from participation in the list, not by pre-empting their inappropriate postings. In other words, the rope for self-hanging is generously provided by your cheerful, -- Sarasvati
RE: Rotated Glyphs
Marco Cimarosti <[EMAIL PROTECTED]> wrote: > Frank da Cruz wrote: > > As part of the release, I made some screen shots showing text in many > > languages and writing systems on the same terminal screen: > > > > http://www.columbia.edu/kermit/glass.html > > > > The CJK examples were so crowded I didn't notice until James > > Kass pointed it > > out that they were also sideways! Windows had rotated each > > glyph 90 degrees counterclockwise. > > I am sorry I cannot help with this: I have never seen before such a thing > happening in Windows. However, I have some observations that might or might > not lead you towards a solution. > > It seems that something very strange is happening with kerning: all glyphs > seem to be equally spaced, including CJK characters (which should be twice > the width of a Western character, even in non-proportional fonts) and > Hindi, which has no "monospace" version. Also the Armenian and Georgian > glyphs look very strange (too spaced), because they are clearly designed for > a proportional font. > These are the kinds of problems you encounter in a terminal emulator, in which every character must be placed in a fixed-size cell, even CJK. The traditional solution for CJK terminal emulation is a "duospaced" font, in which the {K,H}an{j,z}i are double-width, the rest are single-width. I'm not aware of any such font for Windows, although I think there is one for Linux (X). In the future we might consider taking CJK from one font and the rest from another, and painting doublewidth cells for the CJK. Armenian does look a bit microscopic in this font, but not (if you scroll down to the bottom of the page) in Everson Mono Terminal (thanks, Michael!). Oops, the original page lacked Armenian in the EMT example, but I'll redo all the screen shots now that I know how to get upright CJK characters. > What I am assuming is that the display API, somehow, overrides the font's > metric, forcing a fixed width.
Do you know whether the TextOut*() API is > called with some special flag? Maybe that flag also turns on some strange > rotation for CJK characters. > > Besides that, I guess you already know that the program fails to display > complex scripts correctly: Arabic and Hebrew are displayed left-to-right; > Arabic has no ligatures nor contextual shaping; Hindi has no ligatures nor > handling of non-spacing marks. > This is all noted on the Web page. In the traditional terminal-to-host environment, BIDI must be handled by the host. We do plan to address some of the other issues in a future release; for now Kermit 95 handles what ISO 10646 refers to as "Conformance Level 1", which is mostly (if not completely) adequate for the terminal-to-host environment where you would probably not expect to find Indic (etc.) anyway. I suppose the irony here is that societies that were ill-served by computers prior to 1995 are now the ones that use only the latest technology, at least when they must deal with text in their own languages. This is similar to countries where telecommunications networks are first installed using fiber optics and satellites rather than building on a 125-year-old copper cable plant. But don't get me wrong, I *like* terminals and plain text :-) - Frank
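The "duospaced" cell model Frank describes can be approximated from Unicode properties. The Python sketch below assumes, in the style of the common `wcwidth` heuristic, that East Asian Wide or Fullwidth characters take two cells, characters with a nonzero canonical combining class take none, and everything else takes one. It is not Kermit 95's code, and it is too crude for Indic spacing marks, which have combining class 0.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate terminal cells for one character."""
    if unicodedata.combining(ch):
        return 0  # nonzero canonical combining class: overlays previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # East Asian Wide / Fullwidth: a double-width cell
    return 1


def text_cells(text: str) -> int:
    """Total cells needed to paint text on a duospaced terminal grid."""
    return sum(cell_width(ch) for ch in text)


print(text_cells("Kermit"))        # 6
print(text_cells("\u65e5\u672c"))  # 4  (two double-width Han characters)
print(text_cells("e\u0301"))       # 1  (e + COMBINING ACUTE ACCENT)
```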
RE: Google and Unicode
> Did anyone notice that Google now uses Unicode (UTF-8) in displaying the > search results? No more of that 'This page contains Russian characters > that..' Yes, I noticed that yesterday: I searched for something Linux-related and got lots of hits on Chinese and Japanese translations of various Linux FAQs... (the "screenshots" were left untranslated and my search matched those...)
RE: Rotated Glyphs
Frank da Cruz wrote: > As part of the release, I made some screen shots showing text in many > languages and writing systems on the same terminal screen: > > http://www.columbia.edu/kermit/glass.html > > The CJK examples were so crowded I didn't notice until James > Kass pointed it > out that they were also sideways! Windows had rotated each > glyph 90 degrees counterclockwise. I am sorry I cannot help with this: I have never seen before such a thing happening in Windows. However, I have some observations that might or might not lead you towards a solution. It seems that something very strange is happening with kerning: all glyphs seem to be equally spaced, including CJK characters (which should be twice the width of a Western character, even in non-proportional fonts) and Hindi, which has no "monospace" version. Also the Armenian and Georgian glyphs look very strange (too spaced), because they are clearly designed for a proportional font. What I am assuming is that the display API, somehow, overrides the font's metric, forcing a fixed width. Do you know whether the TextOut*() API is called with some special flag? Maybe that flag also turns on some strange rotation for CJK characters. Besides that, I guess you already know that the program fails to display complex scripts correctly: Arabic and Hebrew are displayed left-to-right; Arabic has no ligatures nor contextual shaping; Hindi has no ligatures nor handling of non-spacing marks. Marco
Re: Rotated Glyphs
On Thu, 20 Jun 2002, Frank da Cruz wrote: > As part of the release, I made some screen shots showing text in many > languages and writing systems on the same terminal screen: > > http://www.columbia.edu/kermit/glass.html > > The CJK examples were so crowded I didn't notice until James Kass pointed it > out that they were also sideways! Windows had rotated each glyph 90 degrees > counterclockwise. But when another font is used (bottom of same page) the > same glyphs (e.g. Japanese Kana) are upright. In all cases where CJK characters are rotated, note that the names of fonts begin with '@'. I don't know exactly how it works, but under MS-Windows, fonts whose names begin with '@' have their CJK characters rotated by 90 degrees (for use in vertical writing). Try to use 'Andale Mono WT J' instead of '@Andale Mono WT J' and CJK characters should be rendered upright. Jungshik Shin
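Jungshik's fix amounts to a naming rule: the '@'-prefixed face is the vertical-writing variant, and the same name without the prefix is the upright face. A terminal program that offers a font menu could apply that rule when it builds the list. The Python sketch below is plain string handling with illustrative function names. It does not call the Windows font-enumeration API.

```python
def upright_face(face_name: str) -> str:
    """Map '@Face' (vertical-writing variant) to 'Face' (upright glyphs)."""
    return face_name[1:] if face_name.startswith("@") else face_name


def horizontal_faces(face_names):
    """Keep only faces that are not '@' vertical-writing variants."""
    return [name for name in face_names if not name.startswith("@")]


fonts = ["@Andale Mono WT J", "Andale Mono WT J", "Everson Mono Terminal"]
print(upright_face("@Andale Mono WT J"))  # Andale Mono WT J
print(horizontal_faces(fonts))  # ['Andale Mono WT J', 'Everson Mono Terminal']
```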
Re: Rotated Glyphs
> On Thu, 20 Jun 2002, Frank da Cruz wrote: > > > As part of the release, I made some screen shots showing text in many > > languages and writing systems on the same terminal screen: > > > > http://www.columbia.edu/kermit/glass.html > > > > The CJK examples were so crowded I didn't notice until James Kass > > pointed it out that they were also sideways! Windows had rotated each > > glyph 90 degrees counterclockwise. But when another font is used > > (bottom of same page) the same glyphs (e.g. Japanese Kana) are upright. > > In all cases where CJK characters are rotated, note that the names of > fonts begin with '@'. I don't know exactly how it works, but under > MS-Windows, fonts whose names begin with '@' have their CJK characters > rotated by 90 degrees (for use in vertical writing). Try to use 'Andale > Mono WT J' instead of '@Andale Mono WT J' and CJK characters should be > rendered upright. > Aha, mystery solved, thanks! - Frank
Google and Unicode
Did anyone notice that Google now uses Unicode (UTF-8) in displaying the search results? No more of that 'This page contains Russian characters that..' :) roozbeh
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
Not without separate tools to do the input. Something that the current proposal fails to mention. By the time it is possible to do, no one will be using the OSes in question any more (certainly no one who uses computers and plays chess!). MichKa Michael Kaplan Trigeminal Software, Inc. -- http://www.trigeminal.com/ - Original Message - From: "Tom Gewecke" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, June 20, 2002 8:13 AM Subject: Re: Chess symbols, ZWJ, Opentype and holly type ornaments. > >Suppose that one wishes to produce a chess diagram in a Unicode compliant > >manner in a document produced using Word 97 running on either a Windows 95 > >platform or a Windows 98 platform, with a view also to save the document as > >plain text. One way to do that would be to use a chess fount which is > >mapped to my collection of code points for a chess fount. I would be > >interested to know of how, if at all, that could be done in a document > >produced using Word 97 running on either a Windows 95 platform or a Windows > >98 platform using regular Unicode or XML. > > For someone unfamiliar with Windows and Word 97, are these proposed > operations (inputting PUA character sequences, saving them as UTF-8 plain > text, opening and correctly reading the document produced) technically > possible on the systems mentioned, assuming you have an appropriate font? > > > >
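Tom's question has two parts: entering PUA characters in Word 97 on Windows 95/98, and the plain-text step. Only the plain-text step is shown here. At the encoding level, PUA code points survive a UTF-8 round trip like any other character. The Python sketch below uses U+E000, U+E001 and U+F8FF as hypothetical stand-ins for private chess symbols. It says nothing about what Word 97 itself can enter or save.

```python
import os
import tempfile

# Hypothetical PUA code points standing in for private chess symbols.
PUA_TEXT = "\ue000\ue001\uf8ff"


def roundtrip_utf8(text: str) -> str:
    """Write text to a UTF-8 plain-text file and read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


print(PUA_TEXT.encode("utf-8").hex(" "))     # ee 80 80 ee 80 81 ef a3 bf
print(roundtrip_utf8(PUA_TEXT) == PUA_TEXT)  # True
```

Each BMP PUA character takes three bytes in UTF-8, so a reader only needs a font with glyphs at the agreed code points to display the file.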
Re: SCSU and BOCU (was: Chess symbols etc.)
John Cowan <[EMAIL PROTECTED]> asked: > What's your view of BOCU? I have BOCU-1 on my list of things to implement. I seem to recall being disappointed at the lack of ASCII transparency, and feeling that the compression performance relative to SCSU might not be worth making a change, but it's an interesting technique and I am looking forward to experimenting with it. I'll probably have lots of questions when I get to that point. -Doug Ewell Fullerton, California
Re: Chess symbols, ZWJ, Opentype and holly type ornaments.
William Overington wrote: > In view of the fact that some people are unwilling to let my ideas > be discussed in this forum upon their academic merit but simply use > an ad hominem attack almost every time I post (before many people > can have the chance to sit down and, if they wish, have a serious > read of my ideas), when it seems that their objection is really > about the Unicode Consortium having included the word published in > section 13.5 of chapter 13 of the Unicode specification, and they > seem to angrily refute things which I have not said, I think that > it would be best for me not to post details of my research in this > forum. I don't recall seeing any ad hominem attacks. I do recall seeing a lot of criticism ("attacks," if you will) of some of your ideas based on their merit, none based on the fact that they are William Overington's ideas. Also, as I have tried to convey before, many of us lead relatively busy lives and receive a lot of e-mail, and don't always have time to read through a post of 2,000 words or more. When it gets that long, it's better to post it on your Web site and send us an announcement. In Section 13.5, my objection was to the word "promoted." It apparently gives the impression that characters can use the PUA as a stepping stone to "full Unicode status," when in fact all characters are considered for inclusion in Unicode without regard to PUA implementations. Someone else may have had a problem with the word "published." > There also seems to be the problem of the great tidal wave > that everybody is expected to be using the very latest equipment. I'm using a 166 MHz Pentium "classic" with 24 MB of RAM and Windows 95. So it's obvious you're not talking about me here. > My understanding was that this forum was a place to ask questions of end > users of the Unicode system. I have done that. In this thread I have asked > interesting scientific questions. 
And gotten back some answers you didn't want to hear, namely that the ideas don't fall within the intended scope of Unicode and have already been (or can easily be) solved using other technologies or mechanisms. > Ad hominem attacks have prevented those > questions being discussed properly, possibly because some people may be too > embarrassed to respond to the scientific questions when an atmosphere of ad > hominem attack prevails. My understanding is that academic freedom is about > being able to hold unpopular ideas without personal disadvantage. James Kass, for one, has responded positively to some of your inquiries. Academic freedom means everyone has a chance to listen to the ideas of others. Nobody has infringed upon your right to post your essays. (This is a moderated list, and Sarasvati could have withheld your postings if it were appropriate to do so, but it is not and she has not.) Academic freedom also means people have a right to object to or criticize the ideas of others, or at least point out where the ideas are flawed. Ask anyone in the scientific or research community whether new ideas are always met with universal approval. > I feel > that the fact that I am trying to use the Unicode specification as it exists > rather than on some nudge nudge wink wink understanding of how some people > feel that it should be interpreted is at the root of the problem. See, I think it's the other way around. I just reread Section 13.5 and I don't see anything about the character-glyph model or other policies of Unicode being suspended for the PUA. The issue of not encoding additional ligatures isn't a secret; it's been published in several documents available on the Web. I have provided a pointer to one already; I can provide more if you like. > The potentially interesting question of whether an OpenType fount may be > programmed to produce a two colour display has not been discussed.
Such a > discussion could have either established that it could be done, or that it > could not be done in which case perhaps some extension to OpenType could be > produced for the future which could have that facility. If so, how would > that facility best be produced? This is how progress is achieved. It is indeed a potentially interesting question. I'm not a font expert, so I have stayed out of that discussion. Note, however, that not all printers support color, so there would need to be an appropriate fallback mechanism for rendering the information that would have been displayed in a second color. -Doug Ewell Fullerton, California