Re: Hexadecimal never again
[EMAIL PROTECTED] wrote:
> Thanks, but not good enough. What guarantee do I have that other Unicode
> characters will not be added in the future which have the property Hex_Digit?

One solution is to join the consortium and be able to vote against such a thing happening! If it is a concern, you can still implement your algorithm to allow the hex digits to be separately or externally specifiable, perhaps using John's chart. (With perhaps a slight attendant security risk... ;-) )

From a practical standpoint, I think it is more likely that the base will change than the hex characters. After all, the digits have been constant for a long time, but the base has changed. Initially it was binary, then octal, and now hex arithmetic is common. It seems more likely to me that we might switch to another base (32? 64?) as platforms expand before we started adding redundant characters to hex arithmetic.

Somewhere, someday, some wristwatch-sized, space-deprived display device manufacturer will be complaining that he doesn't have enough room on his device to show the hex codes for the combining sequence of Unicode characters missing in his font, and so instead of hex he wants to use base64 characters, but only if the characters are defined in the standard.

(Guess I am showing my age by recalling flipping binary switches...) ;-)

All your base are belong to us!

tex
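To make the "separately or externally specifiable" idea concrete, here is a minimal Python sketch; parse_hex and ASCII_HEX are illustrative names (not from the thread), and the only point is that keeping the digit alphabet in the caller's hands insulates the parser from any future additions to Hex_Digit.

    # Minimal sketch: a hex parser whose digit alphabet is supplied by the
    # caller, so it stays pinned to ASCII 0-9/A-F/a-f no matter what other
    # characters may later acquire the Hex_Digit property.
    ASCII_HEX = {c: i % 16 for i, c in enumerate("0123456789abcdef" + "0123456789ABCDEF")}

    def parse_hex(text, digits=ASCII_HEX):
        """Parse a hex string using only the supplied digit alphabet."""
        value = 0
        for ch in text:
            if ch not in digits:
                raise ValueError("not an accepted hex digit: %r" % ch)
            value = value * 16 + digits[ch]
        return value

    assert parse_hex("FEED") == 0xFEED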
Re: UTC vs GMT (was [way OT] Beer measurement...)
On 19/08/2003 21:25, Jungshik Shin wrote:
> I have no idea whether that's the same conference, but in the early 1970s it
> was also decided that the abbreviation 'GMT' would be deprecated and 'UTC'
> should be used in its place. ...

And I thought from the subject line that the Unicode Technical Committee (as in UTC Agenda Item...) was supposed to have something against GMT. I trust not; Unicode doesn't need more enemies!

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
RE: UTC vs GMT (was [way OT] Beer measurement...)
On 19/08/2003 21:25, Jungshik Shin wrote:
> I have no idea whether that's the same conference, but in the early 1970s it
> was also decided that the abbreviation 'GMT' would be deprecated and 'UTC'
> should be used in its place. ...

And to add to the confusion, the military also calls it Zulu time, as in 1230 GMT = 1230 UTC = 1230Z. Very confusing, especially if you've ever been to South Africa... Anyone there knows that the Zulu Nation lives at GMT+2 hours!

Clive ;-)}

Clive P Hohberger, PhD
Corporate VP, Technology Development
Director of Patent Affairs
Office: +1 847 793 2740
Cellular: +1 847 910 8794
FAX: +1 847 793 5573
E-mail: [EMAIL PROTECTED]
RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)
Mark Davis wrote:
> awful. At least with inches, feet, and miles, the number of feet per mile
> doesn't vary depending on which mile one is talking about!

A Danish mile is 7 km, a Swedish mile (a fairly popular distance measure here) is 10 km, and an English mile is a mere 1.6 km (approx.). So yes, the number of feet per mile does vary depending on which mile one is talking about (even when considering that the length of a foot originally depended on whose foot was used to measure). ;-) (Sorry for being OT)

/kent k

PS: Originally the Swedish mile was marginally longer than 10 km, but via "nymil" (new mile), or myriameter, the original term mile (mil) was adopted for the metric-adapted distance.
Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)
On 20/08/2003 04:58, Kent Karlsson wrote:
> Mark Davis wrote:
>> awful. At least with inches, feet, and miles, the number of feet per mile
>> doesn't vary depending on which mile one is talking about!
>
> A Danish mile is 7 km, a Swedish mile (a fairly popular distance measure
> here) is 10 km, and an English mile is a mere 1.6 km (approx.). So yes, the
> number of feet per mile does vary depending on which mile one is talking
> about (even when considering that the length of a foot originally depended
> on whose foot was used to measure). ;-) (Sorry for being OT)
> /kent k
> PS: Originally the Swedish mile was marginally longer than 10 km, but via
> "nymil" (new mile), or myriameter, the original term mile (mil) was adopted
> for the metric-adapted distance.

Well, a Roman mile was originally a thousand (double) paces, which depended on how long your legs were and how much of a hurry you were in. It was standardised as marginally shorter than the English mile. I guess English legs tended to be longer than Roman ones. But Swedish legs... I know many Swedes are tall, but not that much taller than us! Your Swedish mile sounds more like what we call a league. From Webster's 1913 edition, at http://www.hyperdictionary.com/dictionary/league:

> 1. A measure of length or distance, varying in different countries from
> about 2.4 to 4.6 English statute miles of 5,280 feet each, and used (as a
> land measure) chiefly on the continent of Europe, and in the Spanish parts
> of America. The marine league of England and the United States is equal to
> three marine, or geographical, miles of 6080 feet each.
> Note: The English land league is equal to three English statute miles. The
> Spanish and French leagues vary in each country according to usage and the
> kind of measurement to which they are applied. The Dutch and German leagues
> contain about four geographical miles, or about 4.6 English statute miles.

Thank goodness that most of these measurements are obsolete!

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
[Still OT] RE: UTC vs GMT (was [way OT] Beer measurement...)
> I have no idea whether that's the same conference, but in the early 1970s it
> was also decided that the abbreviation 'GMT' would be deprecated and 'UTC'
> should be used in its place. ...

There are two subtly different definitions of GMT, one which is synonymous with UTC and one which differs from it, at times by nearly a second. Hence GMT is ambiguous.

> And to add to the confusion, the military also calls it Zulu time, as in
> 1230 GMT = 1230 UTC = 1230Z. Very confusing, especially if you've ever been
> to South Africa... Anyone there knows that the Zulu Nation lives at GMT+2
> hours!

The abbreviation Z is used in ISO 8601 and in standards, recommendations and specs derived from it, and also in RFC 822. Indeed, the U.S. military use 25 letters to designate time zones: A through M (skipping J) indicate time zones from +01:00 to +12:00, and N through Y indicate time zones from -01:00 to -12:00. RFC 822 attempted to copy this, but there was an error which resulted in the letters being used the wrong way around (so "A", Alpha time, means -01:00 according to RFC 822 and +01:00 according to the military convention it attempted to copy). The resulting confusion made any attempt to use this scheme in an interoperable way impractical, and hence all such codes were marked obsolete in RFC 2822, with the advice that one should treat them as indicating +00:00 unless one has out-of-band information about how they are being used. Notably there is no confusion with Z, as it means the same time zone whether treated according to the military convention, RFC 822, RFC 2822 or indeed ISO 8601.
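For the curious, the military letter scheme described above is easy to sketch. The following Python is illustrative only (the function name is mine); it follows the military convention, not the reversed RFC 822 assignment, and per RFC 2822 an unknown single-letter zone should simply be treated as +00:00.

    # Sketch of the military single-letter zone designators:
    # A..M (skipping J) = +01:00..+12:00, N..Y = -01:00..-12:00, Z = +00:00.
    def military_offset_hours(letter):
        letter = letter.upper()
        if letter == "Z":
            return 0
        if letter == "J":
            raise ValueError("J is not assigned")
        if "A" <= letter <= "M":
            # J is skipped, so K, L, M shift down by one
            return ord(letter) - ord("A") + 1 - (1 if letter > "J" else 0)
        if "N" <= letter <= "Y":
            return -(ord(letter) - ord("N") + 1)
        raise ValueError("not a zone letter")

    assert military_offset_hours("A") == 1    # Alpha = +01:00 (military; RFC 822 says -01:00)
    assert military_offset_hours("M") == 12
    assert military_offset_hours("Y") == -12
    assert military_offset_hours("Z") == 0    # unambiguous in every scheme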
RE: Hexadecimal never again
> From a practical standpoint, I think it is more likely that the base will
> change rather than the hex characters. After all, digits have been constant
> for a long time, but the base has changed. Initially it was binary, then it
> was octal, and now hex arithmetic is common.

No, first it was binary, then it was binary, and now it's binary. Different human-readable formats have been (and continue to be) used to represent this.

> It seems more likely to me that we might switch to another base (32? 64?) as
> platforms expand, before we started adding redundant characters to hex
> arithmetic.

What human-readability advantages (the only reason we use hex) would base 32 or base 64 representations have over hex? They aren't matched by a nice number of bits for most systems; the reason for using hex rather than octal is that 2 hex digits can exactly represent the range of an octet (the most common size of byte these days) and, by extension, of any word composed of an integral number of octets. The next base to have that quality is base 256, which would require us to ransack a few different alphabets and then maybe create a few symbols in order for us to represent it.
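The alignment argument can be checked on the back of an envelope; a small Python sketch, purely illustrative:

    # One octet is exactly two hex digits (8 = 2 x 4 bits), whereas base-32
    # and base-64 digits carry 5 and 6 bits, which do not divide 8, so a
    # single byte cannot be shown as a whole number of such digits.
    import math

    for base in (8, 16, 32, 64, 256):
        digits_per_octet = 8 / math.log2(base)
        aligned = digits_per_octet == int(digits_per_octet)
        print("base %3d: %.2f digits per octet, aligned=%s"
              % (base, digits_per_octet, aligned))

    # base 8 prints 2.67 (octal straddles octet boundaries), base 16 prints
    # 2.00, base 32 and 64 print 1.60 and 1.33, and base 256 prints 1.00.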
RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)
Peter Kirk wrote:
> [...] I guess English legs tended to be longer than Roman ones.

Well, if by "English" you mean those Germanic barbarians who invaded Britannia, I guess that the British mile existed way before they set their feet on the island...

_ Marco
Re: Hexadecimal never again
On 20/08/2003 06:45, Jon Hanna wrote:
> ... The next base to have that quality is base 256, which would require us
> to ransack a few different alphabets and then maybe create a few symbols in
> order for us to represent it.

No, we could just use Ethiopic. Plenty of characters there. We could even put some logic in the system, e.g. by using the vowel parts of the glyphs to indicate the lower three bits. I'm sure most people would learn quickly. And if we used Ethiopic letters to define Unicode symbols it might stop some people complaining that Unicode isn't African enough. ;-)

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
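Taking the joke slightly further, a whimsical Python sketch (the digit-selection scheme is mine, not a proposal): the Ethiopic block lays out each consonant's forms in runs of eight vowel orders, so picking 256 of its syllables gives base-256 "digits" whose low bits roughly choose the vowel.

    # Whimsical sketch only: use 256 Ethiopic syllables as base-256 digits.
    import unicodedata

    # First 256 assigned letters from the Ethiopic block (U+1200..U+137F).
    DIGITS = [chr(cp) for cp in range(0x1200, 0x1380)
              if unicodedata.category(chr(cp)) == "Lo"][:256]

    def to_base256(n):
        """Render a non-negative integer using Ethiopic 'digits'."""
        out = []
        while True:
            n, rem = divmod(n, 256)
            out.append(DIGITS[rem])
            if n == 0:
                break
        return "".join(reversed(out))

    print(to_base256(0xFEED))   # a 16-bit value needs only two "digits"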
RE: [hebrew] Re: ZWJ/ZWNJ - Are they legal to use with combining marks?
I admit I haven't been able to catch up with the flood of messages on the Hebrew list...

> On 15/08/2003 07:57, Paul Nelson (TYPOGRAPHY) wrote:
> This brings us back to the earlier question of whether it is legitimate to
> use ZWJ or ZWNJ between combining marks
> It sure better be. This is done in Khmer for controlling register shift
> (I noticed, somewhat to my surprise!) combinations for exception words. I
> have seen nothing in Unicode that states that the ZWJ/ZWNJ can only be used
> with base type characters.
> Paul
>
> Well, I understood that ZWJ/ZWNJ followed by a combining mark gives a
> defective combining sequence,

Yes, it does. I would suggest reclassifying ZWJ and ZWNJ as combining marks, i.e. Mn instead of Cf as general category. They would have to keep their combining class of 0, in order not to change any normal forms. As I've mentioned, I think class 0, with its special treatment, is problematic, and hence ZWJ and ZWNJ should not be acknowledged just anywhere in any combining sequence. There is no problem having them at the end of a combining sequence; that is what has, to all intents and purposes, been the case all along, even as Cf *just after* a combining sequence.

New cases that are not acknowledged with ZWJ/ZWNJ as Cf, but should be with ZWJ/ZWNJ as Mn:

- just after a base character (even when followed by more combining characters) [this takes care of the Khmer cases];
- between two combining marks in a combining sequence that would be in canonical order *if* the ZWJ/ZWNJs were removed from the sequence (this may take care of some Hebrew ligation cases, I'm not sure).

Other placements of ZWJ/ZWNJ should not be acknowledged, but should be deprecated. ZWJ/ZWNJ would remain default ignorable.

/kent k
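For reference, the properties being discussed can be checked directly; a small Python sketch of the status quo (nothing here is part of Kent's proposal):

    # ZWNJ/ZWJ today carry General_Category=Cf and canonical combining class 0,
    # which is why a mark following one of them forms a defective combining
    # sequence. The suggestion above is Mn instead of Cf, keeping class 0 so
    # that normalization forms are unchanged.
    import unicodedata

    for name, ch in (("ZWNJ", "\u200C"), ("ZWJ", "\u200D")):
        print(name,
              "category =", unicodedata.category(ch),        # currently 'Cf'
              "combining class =", unicodedata.combining(ch))  # 0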
Re: Hexadecimal never again
Jon Hanna wrote:
>> From a practical standpoint, I think it is more likely that the base will
>> change rather than the hex characters. After all, digits have been constant
>> for a long time, but the base has changed. Initially it was binary, then it
>> was octal, and now hex arithmetic is common.
>
> No, first it was binary, then it was binary, and now it's binary. Different
> human-readable formats have been (and continue to be) used to represent this.
>
>> It seems more likely to me that we might switch to another base (32? 64?)
>> as platforms expand, before we started adding redundant characters to hex
>> arithmetic.
>
> What human-readability advantages (the only reason we use hex) would base 32
> or base 64 representations have over hex? They aren't matched by a nice
> number of bits for most systems;

Only density. You are right, 256 would be a more convenient base. Fortunately, with Unicode, ransacking alphabets is easy!

Jon, I was mostly being tongue in cheek, contrasting that relative to needing new hex digits, a base change was more likely. However, I wasn't saying that a base change is likely.

tex

> the reason for using hex rather than octal is that 2 hex digits can exactly
> represent the range of an octet (the most common size of byte these days)
> and, by extension, of any word composed of an integral number of octets. The
> next base to have that quality is base 256, which would require us to
> ransack a few different alphabets and then maybe create a few symbols in
> order for us to represent it.

--
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
RE: Hexadecimal never again
> Jon, I was mostly being tongue in cheek, contrasting that relative to
> needing new hex digits, a base change was more likely. However, I wasn't
> saying that a base change is likely.

And I was being tongue in cheek (and ignorant of Ethiopian script) in suggesting the use of base 256. However, we both had serious points, and my serious point disagrees with yours. Hexadecimal is used for good reasons; and while I'm not convinced about Jill's point, I'm not convinced otherwise either.

What do hackers with non-Latin-based languages use for hex anyway?
Re: RE: Hexadecimal never again
> What do hackers with non-Latin-based languages use for hex anyway?

They use 0-9, A-F, and a-f. Hex is used mostly by programmers, mostly for computing, and mostly in programming languages that have the digits and Latin letters built in, and that's what compilers expect to see. Hex doesn't have an independent existence out in non-computing culture for, e.g., signs in the market place or monetary values.

Rick
Proposed Draft UTR #31 - Syntax Characters
This notice is relevant to anyone dealing with programming languages, query specifications, regular expressions, scripting languages, and similar domains.

The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed at the UTC meeting next week. Part of that document (Section 4) is a proposal for two new immutable properties, Pattern_White_Space and Pattern_Syntax. As immutable properties, these would not ever change once they are introduced into the standard, so it is important to get feedback on their contents beforehand. The UTC will not be making a final determination on these properties at this meeting, but it is important that any feedback on them is supplied as early in the process as possible so that it can be considered thoroughly.

The draft is found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as described there.

Regards,
Rick McGowan
Unicode, Inc.
Re: Hexadecimal never again
On Wednesday, August 20, 2003, at 07:03 pm, Rick McGowan wrote:
>> What do hackers with non-Latin-based languages use for hex anyway?
>
> They use 0-9, A-F, and a-f.

... which'll be whatever characters happen to be used to represent those sections of the character set on their machines: 0x30 - 0x39, 0x41 - 0x46 and 0x61 - 0x66.

> Hex is used mostly by programmers, mostly for computing, and mostly in
> programming languages that have the digits and Latin letters built in, and
> that's what compilers expect to see. Hex doesn't have an independent
> existence out in non-computing culture for, e.g., signs in the market place
> or monetary values.
>
> Rick
Re: Proposed Draft UTR #31 - Syntax Characters
On 20/08/2003 11:23, Rick McGowan wrote:
> This notice is relevant to anyone dealing with programming languages, query
> specifications, regular expressions, scripting languages, and similar
> domains.
>
> The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed
> at the UTC meeting next week. Part of that document (Section 4) is a
> proposal for two new immutable properties, Pattern_White_Space and
> Pattern_Syntax. As immutable properties, these would not ever change once
> they are introduced into the standard, so it is important to get feedback on
> their contents beforehand. The UTC will not be making a final determination
> on these properties at this meeting, but it is important that any feedback
> on them is supplied as early in the process as possible so that it can be
> considered thoroughly.
>
> The draft is found at http://www.unicode.org/reports/tr31/ and feedback can
> be submitted as described there.
>
> Regards,
> Rick McGowan
> Unicode, Inc.

I'm a little concerned at the implications of counting zero-width characters like LRM and RLM as white space. They can easily find their way unnoticed into the middle of patterns, e.g. when copying from a text which has added these characters to ensure correct directionality. I wonder if it might be better to add a new category of ignored characters, such that one of these found on its own doesn't count as a separator but is ignored, i.e. treated as part of the white space if found adjacent to white space. Of course the details of this need a little more thought, e.g. whether one of these actually counts as part of the pattern, but I hope you see what I am getting at.

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
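To illustrate the concern, here is a rough Python sketch; strip_near_whitespace is a hypothetical helper, meant only to show the "treated as part of adjacent white space" idea, and is not anything from the draft:

    # Hypothetical illustration: an invisible LRM copied into a pattern changes
    # it without the author noticing; one possible remedy is to fold LRM/RLM
    # into adjacent white space instead of treating them as separators.
    import re

    LRM, RLM = "\u200E", "\u200F"
    intended = "name = value"
    sneaky = "name =" + LRM + " value"      # looks identical on screen
    print(intended == sneaky)               # False

    def strip_near_whitespace(pattern):
        # Drop LRM/RLM when they sit next to white space (or at either end).
        return re.sub(r"(?:(?<=\s)|^)[\u200E\u200F]+|[\u200E\u200F]+(?=\s|$)",
                      "", pattern)

    print(strip_near_whitespace(sneaky) == intended)   # True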
Re: Hexadecimal never again
Ben Dougall wrote about what is used for hex characters:
> which'll be whatever characters happen to be used to represent those
> sections of the character set on their machines: 0x30 - 0x39, 0x41 - 0x46
> and 0x61 - 0x66.

Not in EBCDIC (and other older character sets) they aren't. There are a lot of mainframe systems still using EBCDIC encodings.

Jim Allan
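Jim's point is easy to demonstrate with any EBCDIC code page; a quick Python sketch using cp037, one common EBCDIC variant:

    # The same hex-digit characters land on very different byte values in
    # EBCDIC (cp037) than in ASCII.
    for ch in "09AFaf":
        print("%s  ASCII 0x%02X  EBCDIC(cp037) 0x%02X"
              % (ch, ch.encode("ascii")[0], ch.encode("cp037")[0]))
    # '0'-'9' come out as 0xF0-0xF9, 'A'-'F' as 0xC1-0xC6, 'a'-'f' as 0x81-0x86.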
Mail Message Size
Effective immediately and until further notice, Unicode.ORG will not accept any e-mail larger than 40,000 bytes. This is to counteract a growing storm of virus-laden e-mails.

Regards,
-- Sarasvati
Re: Hexadecimal never again
On 2003-08-20 11:03, Rick McGowan wrote:
> Hex doesn't have an independent existence out in non-computing culture for,
> e.g., signs in the market place or monetary values.

Caviar, 10kg, FEED

--
Curtis Clark          http://www.csupomona.edu/~jcclark/
Mockingbird Font Works          http://www.mockfont.com/