Re: Hexadecimal never again

2003-08-20 Thread Tex Texin
[EMAIL PROTECTED] wrote:
 
 Thanks, but not good enough.
 
 What guarantee do I have that other Unicode characters will not be added in
 the future which have the property Hex_Digit?

One solution is to join the consortium and be able to vote against such a
thing happening!

If it is a concern you can still implement your algorithm to allow the hex
digits to be separately or externally specifiable, perhaps using John's chart.
(With perhaps a slight attendant security risk... ;-) )
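
A minimal sketch of that idea in Python, assuming the digit table is plain data handed to the parser rather than logic baked into it. The names parse_hex, DEFAULT_HEX_DIGITS and EXTENDED are made up for illustration; the extra entries are the fullwidth forms, which also carry the Hex_Digit property.

```python
# Hypothetical helper: the digit table is data, not hard-coded logic.
DEFAULT_HEX_DIGITS = {ch: v for v, ch in enumerate("0123456789ABCDEF")}
DEFAULT_HEX_DIGITS.update({ch: v for v, ch in enumerate("abcdef", start=10)})


def parse_hex(text, digits=DEFAULT_HEX_DIGITS):
    """Parse text as hexadecimal using a caller-supplied digit table."""
    if not text:
        raise ValueError("empty string is not a hex number")
    value = 0
    for ch in text:
        if ch not in digits:
            raise ValueError(f"not a hex digit: {ch!r}")
        value = value * 16 + digits[ch]
    return value


# An externally specified table: ASCII digits plus the fullwidth forms
# U+FF10..U+FF19, U+FF21..U+FF26 and U+FF41..U+FF46.
EXTENDED = dict(DEFAULT_HEX_DIGITS)
EXTENDED.update({chr(0xFF10 + i): i for i in range(10)})
EXTENDED.update({chr(0xFF21 + i): 10 + i for i in range(6)})
EXTENDED.update({chr(0xFF41 + i): 10 + i for i in range(6)})

print(parse_hex("1F"))                      # ASCII digits, default table
print(parse_hex("\uff11\uff26", EXTENDED))  # fullwidth 1 and F, extended table
```

If new hex digits ever did appear, only the table would need to change. The security remark still applies: whoever controls the table controls what counts as a number.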

From a practical standpoint, I think it is more likely that the base will
change rather than the hex characters.
After all, digits have been constant for a long time, but the base has
changed. Initially it was binary, then it was octal, and now hex arithmetic is
common. It seems more likely to me that we might switch to another base (32?
64?) as platforms expand, before we start adding redundant characters to hex
arithmetic. Somewhere, someday, some wristwatch-sized, space-deprived display
device manufacturer will be complaining that he doesn't have enough room on
his device to show the hex codes for the combining sequence of Unicode
characters missing in his font, and so instead of hex, he wants to use base64
characters, but only if the characters are defined in the standard.


(Guess I am showing my age to be recalling flipping binary switches...) ;-)

All your base are belong to us!
tex



Re: UTC vs GMT (was [way OT] Beer measurement...)

2003-08-20 Thread Peter Kirk
On 19/08/2003 21:25, Jungshik Shin wrote:

 I have no idea whether that's the same conference, but in the early 1970s
it was also decided that the abbreviation 'GMT' would be deprecated
and 'UTC' should be used in its place. ...
And I thought from the subject line that the Unicode Technical Committee 
(as in UTC Agenda Item...) was supposed to have something against GMT. 
I trust not, Unicode doesn't need more enemies!

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: UTC vs GMT (was [way OT] Beer measurement...)

2003-08-20 Thread Hohberger, Clive


On 19/08/2003 21:25, Jungshik Shin wrote:

  I have no idea whether that's the same conference, but in the early 1970s
it was also decided that the abbreviation 'GMT' would be deprecated
and 'UTC' should be used in its place. ...


And to add to confusion, the military also calls it Zulu time, as in 
1230 GMT= 1230 UTC = 1230Z.

Very confusing, especially if you've ever been to South Africa... 
Any one there knows that the Zulu Nation lives at GMT+2 hours!

Clive ;-)}


Clive P Hohberger, PhD
Corporate VP, Technology Development
Director of Patent Affairs
Office:   +1 847 793 2740
Cellular: +1 847 910 8794
FAX:  +1 847 793 5573
E-mail:   [EMAIL PROTECTED]





RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-20 Thread Kent Karlsson

Mark Davis wrote:
 awful. At least with inches, feet, and miles, the number of
 feet per mile doesn't vary depending on which mile one is talking about!

A Danish mile is 7 km, a Swedish mile (a fairly popular
distance measure here) is 10 km, and an English mile is
a mere 1.6 km (approx.). So yes, the number of feet per
mile does vary depending on which mile one is talking
about (even when considering that the length of a foot
originally depended on whose foot was used to measure). ;-)

(Sorry for being OT)
/kent k

PS
Originally the Swedish mile was marginally longer than 10 km,
but via nymil (new mile) or myriameter, the original term
mile (mil) was adopted for the metric adapted distance.




Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-20 Thread Peter Kirk
On 20/08/2003 04:58, Kent Karlsson wrote:

Mark Davis wrote:
 

awful. At least with inches, feet, and miles, the number of
feet per mile doesn't vary depending on which mile one is talking about!
   

A Danish mile is 7 km, a Swedish mile (a fairly popular
distance measure here) is 10 km, and an English mile is
a mere 1.6 km (approx.). So yes, the number of feet per
mile does vary depending on which mile one is talking
about (even when considering that the length of a foot
originally depended on whose foot was used to measure). ;-)
(Sorry for being OT)
/kent k
PS
Originally the Swedish mile was marginally longer than 10 km,
but via nymil (new mile) or myriameter, the original term
mile (mil) was adopted for the metric adapted distance.




 

Well, a Roman mile was originally a thousand (double) paces, which 
depended on how long your legs were and how much of a hurry you were in. 
It was standardised as marginally shorter than the English mile. I guess 
English legs tended to be longer than Roman ones. But Swedish legs ... I 
know many Swedes are tall, but not that much taller than us!

Your Swedish mile sounds more like what we call a league. From Webster's 
1913 edition, at http://www.hyperdictionary.com/dictionary/league:

1. A measure of length or distance, varying in different countries 
from about 2.4 to 4.6 English statute miles of 5,280 feet each, and 
used (as a land measure) chiefly on the continent of Europe, and in 
the Spanish parts of America. The marine league of England and the 
United States is equal to three marine, or geographical, miles of 6080 
feet each.

 Note: The English land league is equal to three English statute 
miles. The Spanish and French leagues vary in each country according 
to usage and the kind of measurement to which they are applied. The 
Dutch and German leagues contain about four geographical miles, or 
about 4.6 English statute miles.

Thank goodness that most of these measurements are obsolete!

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




[Still OT] RE: UTC vs GMT (was [way OT] Beer measurement...)

2003-08-20 Thread Jon Hanna
   I have no idea whether that's the same conference, but in the early 1970s
 it was also decided that the abbreviation 'GMT' would be deprecated
 and 'UTC' should be used in its place. ...

There are two subtly different definitions of GMT, one which is synonymous with UTC 
and one which differs from it, at times by nearly a second. Hence GMT is ambiguous.

 And to add to confusion, the military also calls it Zulu time, as in 
 1230 GMT= 1230 UTC = 1230Z.
 
 Very confusing, especially if you've ever been to South Africa... 
 Any one there knows that the Zulu Nation lives at GMT+2 hours!

The abbreviation of Z is used in ISO 8601 and standards, recommendations and specs 
derived from it, and also in RFC 822.

Indeed the U.S. Military use 25 letters to designate time zones, with A through M 
skipping J to indicate timezones from +01:00 to +12:00 and N through Y to indicate 
timezones from -01:00 to -12:00. RFC 822 attempted to copy this but there was an error 
which resulted in them being used the wrong way around (so A Alpha time means -01:00 
according to RFC 822 and +01:00 according to the military convention they attempted to 
copy). The resulting confusion made any attempt to use this scheme in an interoperable 
way impractical, and hence all the codes were marked obsolete in RFC 2822, with the advice that 
one should treat them as indicating -0000 unless one has out-of-band information 
about how they are being used. Notably there is no confusion with Z as it means the 
same time zone whether treated according to the military convention, RFC 822, RFC 2822 
or indeed ISO 8601.
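
The two conflicting readings can be sketched as follows. This is only an illustration of the mapping described above, with hypothetical function names, and it handles whole-hour offsets only.

```python
# Military convention: A-M (skipping J) are +01:00..+12:00,
# N-Y are -01:00..-12:00, and Z is +00:00.
EAST = "ABCDEFGHIKLM"   # 12 letters, J deliberately absent
WEST = "NOPQRSTUVWXY"   # 12 letters


def military_offset_hours(letter):
    """Offset from UTC, in hours, under the military convention."""
    letter = letter.upper()
    if letter == "Z":
        return 0
    if len(letter) == 1 and letter in EAST:
        return EAST.index(letter) + 1
    if len(letter) == 1 and letter in WEST:
        return -(WEST.index(letter) + 1)
    raise ValueError(f"not a military time zone letter: {letter!r}")


def rfc822_offset_hours(letter):
    """Offset as RFC 822 wrote it down: same letters, signs inverted."""
    return -military_offset_hours(letter)


for letter in "AMNYZ":
    print(letter, military_offset_hours(letter), rfc822_offset_hours(letter))
```

Z is the only letter for which both functions agree, which is why it alone survived as an interoperable designator.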




RE: Hexadecimal never again

2003-08-20 Thread Jon Hanna
 From a practical standpoint, I think it is more likely that the base will
 change rather than the hex characters.
 After all, digits have been constant for a long time, but the base has
 changed. Initially it was binary, then it was octal, and now hex
 arithmetic is
 common.

No, first it was binary, then it was binary and now it's binary. Different
human-readable formats have been (and continue to be) used to represent
this.

 It seems more likely to me that we might switch to
 another base (32?
 64?) as platforms expand, before we started adding redundant
 characters to hex
 arithmetic.

What human-readability advantages (the only reason we use hex) would base 32
or base 64 representations have over hex? They aren't matched by a nice
number of bits for most systems; the reason for using hex rather than octal
is that 2 hex digits can exactly represent the range of an octet (the most
common size of bytes these days) and by extension of any word composed of an
integral number of octets. The next base to have that quality is base 256,
which would require us to ransack a few different alphabets and then maybe
create a few symbols in order for us to represent it.
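
To make the arithmetic concrete, here is a small sketch (the function name digits_per_octet is made up for illustration) that checks which power-of-two bases cover an octet with a whole number of digits.

```python
def digits_per_octet(base):
    """Digits needed to write exactly one octet in `base`, or None.

    Only power-of-two bases whose bits-per-digit divide 8 qualify.
    """
    if base < 2 or base & (base - 1):
        return None  # not a power of two
    bits_per_digit = base.bit_length() - 1
    if 8 % bits_per_digit:
        return None  # digits would straddle octet boundaries
    return 8 // bits_per_digit


for base in (2, 4, 8, 16, 32, 64, 256):
    print(base, digits_per_octet(base))
# 8, 32 and 64 give None; 16 gives 2; 256 gives 1.

# The practical payoff for hex: every octet is exactly two digits.
print(bytes([0, 127, 255]).hex())  # 007fff
```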




RE: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-20 Thread Marco Cimarosti
Peter Kirk wrote:
 [...] I guess English legs tended to be longer than Roman
 ones.

Well, if by English you mean those Germanic barbarians who invaded
Britannia, I guess that the British mile existed way before they set their
feet on the island...

_ Marco



Re: Hexadecimal never again

2003-08-20 Thread Peter Kirk
On 20/08/2003 06:45, Jon Hanna wrote:

... The next base to have that quality is base 256,
which would require us to ransack a few different alphabets and then maybe
create a few symbols in order for us to represent it.
 

No, we could just use Ethiopic. Plenty of characters there. We could 
even put some logic in the system e.g. by using the vowel parts of the 
glyphs to indicate the lower three bits. I'm sure most people would 
learn quickly. And if we used Ethiopic letters to define Unicode symbols 
it might stop some people complaining that Unicode isn't African enough. 
;-)

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




RE: [hebrew] Re: ZWJ/ZWNJ - Are they legal to use with combining marks?

2003-08-20 Thread Kent Karlsson
I admit I haven't been able to catch up with the flood of messages on the
Hebrew list...

 On 15/08/2003 07:57, Paul Nelson (TYPOGRAPHY) wrote:
 
 This brings us back to the earlier question of whether it is 
 legitimate to use ZWJ or ZWNJ between combining marks
 
 
 
 It sure better be. This is done in Khmer for controlling register shift

I noticed, somewhat to my surprise!

 combinations for exception words. I have seen nothing in Unicode that
 states that the ZWJ/ZWNJ can only be used with base type characters.
 
 Paul
 
 
   
 
 Well, I understood that ZWJ/ZWNJ followed by a combining mark gives a 
 defective combining sequence, 

Yes, it does.  I would suggest reclassifying ZWJ and ZWNJ as combining
marks, i.e. Mn instead of Cf as general category. They would have to
keep their combining class of 0, in order not to change any normal
forms. As I've mentioned, I think class 0, with its special treatment is
problematic, and hence ZWJ and ZWNJ should not be acknowledged just
anywhere in any combining sequence. There is no problem to have them at
the end of a combining sequence, that is what has to all intents and
purposes been the case all the time, even as Cf *just after* a combining
sequence. New cases, that are not acknowledged with ZWJ/ZWNJ as Cf, but
should be with ZWJ/ZWNJ as Mn: just after a base character (even when
followed by more combining characters) [this takes care of the Khmer
cases], between two combining marks in a combining sequence that would
be in canonical order *if* the ZWJ/ZWNJs were removed from the sequence
(this may take care of some Hebrew ligation cases, I'm not sure). Other
placements of ZWJ/ZWNJ should not be acknowledged, but should be
deprecated.
ZWJ/ZWNJ would remain default ignorable.
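
For anyone wanting to check the properties under discussion, a rough sketch using Python's standard unicodedata module follows. It reports whatever Unicode version the local Python ships with, so it shows the classification as published, not the reclassification suggested above. The sample strings are my own illustration.

```python
import unicodedata

ZWNJ, ZWJ = "\u200c", "\u200d"

# General category and canonical combining class of the two joiners.
for ch in (ZWNJ, ZWJ):
    print(unicodedata.name(ch), unicodedata.category(ch),
          unicodedata.combining(ch))

# Canonical reordering sorts U+0323 (class 220) before U+0301 (class 230).
plain = "a\u0301\u0323"
nfd_plain = unicodedata.normalize("NFD", plain)

# A class 0 character between the marks blocks that reordering, which is
# why the joiners' combining class matters for normal forms.
with_zwj = "a\u0301" + ZWJ + "\u0323"
nfd_zwj = unicodedata.normalize("NFD", with_zwj)

print([f"U+{ord(c):04X}" for c in nfd_plain])
print([f"U+{ord(c):04X}" for c in nfd_zwj])
```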

/kent k




Re: Hexadecimal never again

2003-08-20 Thread Tex Texin


Jon Hanna wrote:
 
  From a practical standpoint, I think it is more likely that the base will
  change rather than the hex characters.
  After all, digits have been constant for a long time, but the base has
  changed. Initially it was binary, then it was octal, and now hex
  arithmetic is
  common.
 
 No, first it was binary, then it was binary and now it's binary. Different
 human-readable formats have been (and continue to be) used to represent
 this.
 
  It seems more likely to me that we might switch to
  another base (32?
  64?) as platforms expand, before we started adding redundant
  characters to hex
  arithmetic.
 
 What human-readability advantages (the only reason we use hex) would base 32
 or base 64 representations have over hex? They aren't matched by a nice
 number of bits for most systems; 

Only density. You are right 256 would be a more convenient base.
Fortunately with Unicode ransacking alphabets is easy!

Jon I was mostly being tongue in cheek and contrasting that relative to
needing new hex digits, a base change was more likely. However, I wasn't
saying that a base change is likely.
tex

the reason for using hex rather than octal
 is that 2 hex digits can exactly represent the range of an octet (the most
 common size of bytes these days) and by extension of any word composed of an
 integral number of octets. The next base to have that quality is base 256,
 which would require us to ransack a few different alphabets and then maybe
 create a few symbols in order for us to represent it.

-- 
-
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master  http://www.i18nGuy.com
 
XenCraft    http://www.XenCraft.com
Making e-Business Work Around the World
-



RE: Hexadecimal never again

2003-08-20 Thread Jon Hanna
 Jon I was mostly being tongue in cheek and contrasting that relative to
 needing new hex digits, a base change was more likely. However, I wasn't
 saying that a base change is likely.

And I was being tongue in cheek (and ignorant of Ethiopian script) in
suggesting the use of base 256. However we both had serious points and my
serious point disagrees with yours.

Hexadecimal is used for good reasons; and while I'm not convinced about
Jill's point I'm not convinced otherwise either. What do hackers with
non-Latin-based languages use for hex anyway?




Re: RE: Hexadecimal never again

2003-08-20 Thread Rick McGowan
 What do hackers with non
 Latin-based languages use for hex anyway?

They use 0-9, A-F, and a-f.

Hex is used mostly by programmers, mostly for computing, and mostly in  
programming languages that have the digits and Latin letters built-in, and  
that's what compilers expect to see. Hex doesn't have an independent  
existence out in non-computing culture for, e.g., signs in the market place  
or monetary values.

Rick



Proposed Draft UTR #31 - Syntax Characters

2003-08-20 Thread Rick McGowan
This notice is relevant to anyone dealing with programming languages, query
specifications, regular expressions, scripting languages, and similar domains.

The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed at
the UTC meeting next week. Part of that document (Section 4) is a proposal for
two new immutable properties, Pattern_White_Space and Pattern_Syntax. As
immutable properties, these would not ever change once they are introduced into
the standard, so it is important to get feedback on their contents beforehand.

The UTC will not be making a final determination on these properties at this
meeting, but it is important that any feedback on them is supplied as early in
the process as possible so that it can be considered thoroughly. The draft is
found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as
described there.

Regards,
Rick McGowan
Unicode, Inc.



Re: Hexadecimal never again

2003-08-20 Thread Ben Dougall
On Wednesday, August 20, 2003, at 07:03  pm, Rick McGowan wrote:

What do hackers with non
Latin-based languages use for hex anyway?
They use 0-9, A-F, and a-f.

which'll be whatever characters happen to be used to represent those 
sections of the character set on their machines: 0x30 - 0x39, 0x41 - 
0x46 and 0x61 - 0x66.



Hex is used mostly by programmers, mostly for computing, and mostly in
programming languages that have the digits and Latin letters built-in,
and that's what compilers expect to see. Hex doesn't have an independent
existence out in non-computing culture for, e.g., signs in the market
place or monetary values.

	Rick





Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-20 Thread Peter Kirk
On 20/08/2003 11:23, Rick McGowan wrote:

This notice is relevant to anyone dealing with programming languages, query
specifications, regular expressions, scripting languages, and similar domains.
The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed at
the UTC meeting next week. Part of that document (Section 4) is a proposal for
two new immutable properties, Pattern_White_Space and Pattern_Syntax. As
immutable properties, these would not ever change once they are introduced into
the standard, so it is important to get feedback on their contents beforehand.
The UTC will not be making a final determination on these properties at this
meeting, but it is important that any feedback on them is supplied as early in
the process as possible so that it can be considered thoroughly. The draft is
found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as
described there.
Regards,
Rick McGowan
Unicode, Inc.


 

I'm a little concerned at the implications of counting zero width 
characters like LRM and RLM as white space. They can easily find their 
way unnoticed into the middle of patterns e.g. when copying from a text 
which has added these characters to ensure correct directionality. I 
wonder if it might be better to add a new category of ignored 
characters, such that one of these found on its own doesn't count as a 
separator but it is ignored i.e. treated as part of the white space if 
found adjacent to white space. Of course the details of this need a 
little more thought, e.g. does one of these actually count as part of 
the pattern, but I hope you see what I am getting at.
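
A rough sketch of the behaviour I have in mind, in Python. The function name split_pattern is hypothetical, and Python's \s stands in for the real white space set, which the draft defines separately. A directional mark next to white space is absorbed into the separator; one sitting alone inside a token does not split it. Whether that lone mark should then be kept or dropped is the open question, and here it is simply kept.

```python
import re

LRM, RLM = "\u200e", "\u200f"
MARKS = LRM + RLM

# A separator must contain at least one real white-space character;
# directional marks on either side of it are swallowed with it.
SEPARATOR = re.compile(rf"[{MARKS}]*\s[\s{MARKS}]*")


def split_pattern(text):
    """Split on white space, treating adjacent LRM/RLM as part of it."""
    return [token for token in SEPARATOR.split(text) if token]


print(split_pattern("foo " + LRM + " bar"))   # mark between spaces
print(split_pattern("foo" + LRM + " bar"))    # mark touching a space
print(split_pattern("fo" + LRM + "o bar"))    # lone mark inside a token
```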

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/




Re: Hexadecimal never again

2003-08-20 Thread Jim Allan
Ben Dougall wrote about what is used for hex characters:

which'll be whatever characters happen to be used to represent those
sections of the character set on their machines: 0x30 - 0x39, 0x41 -
0x46 and 0x61 - 0x66. 

Not in EBCDIC (and other older character sets) they aren't. There are a 
lot of mainframe systems still using EBCDIC encodings.
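
The difference is easy to see with Python's built-in codecs. This sketch assumes cp037 as a representative EBCDIC code page; other EBCDIC variants exist and the helper name hex_digit_bytes is made up for illustration.

```python
def hex_digit_bytes(codec):
    """Map each hex digit character to its byte value in `codec`."""
    return {ch: ch.encode(codec)[0] for ch in "0123456789ABCDEFabcdef"}


ascii_bytes = hex_digit_bytes("ascii")
ebcdic_bytes = hex_digit_bytes("cp037")  # one common EBCDIC code page

for ch in "09AFaf":
    print(f"{ch!r}: ASCII 0x{ascii_bytes[ch]:02X}, "
          f"cp037 0x{ebcdic_bytes[ch]:02X}")
```

In cp037 the digits sit at 0xF0 to 0xF9, A to F at 0xC1 to 0xC6 and a to f at 0x81 to 0x86, so a hex parser that tests raw byte ranges from ASCII will not work there.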

Jim Allan




Mail Message Size

2003-08-20 Thread Sarasvati
Effective immediately and until further notice, Unicode.ORG
will not accept any e-mail larger than 40,000 bytes. This is
to counteract a growing storm of virus-laden e-mails.
Regards,
-- Sarasvati




Re: Hexadecimal never again

2003-08-20 Thread Curtis Clark
on 2003-08-20 11:03 Rick McGowan wrote:

Hex doesn't have an independent  
existence out in non-computing culture for, e.g., signs in the market place  
or monetary values.

Caviar, 10kg, FEED

--
Curtis Clark  http://www.csupomona.edu/~jcclark/
Mockingbird Font Works  http://www.mockfont.com/