RE: Smiles, faces, etc

2002-02-16 Thread Christopher J Fynn

Patrick Andries wrote:

 << I wonder sometimes if the largest obstacle in the encoding
  of smileys as characters is not the "universal" normalization
  process itself. Had they been invented a few decades ago and 
  encoded "locally" in some kind of popular font/encoding (the 
  Netscape font for example that could have the iconic :-) :-( 
  ;-) :-P :-D :-[ :-\ found in Messenger) they might have been 
  included in Unicode without much further ado. I personally 
  see them as punctuation marks (albeit not of a "metaprosodic" 
  nature).>>

Patrick, 

There are whole scripts for contemporary languages which
are as yet unencoded in the Unicode Standard, and some 
punctuation and other characters missing from already 
encoded scripts. IMO attention needs to be paid to making 
sure all these characters are encoded before we start 
bothering with Klingon, smileys, etc. 

All the  "smiley" characters you need could perhaps be 
encoded by using one of the existing two plus one of the 
variant selector characters. If you really think they are
some sort of important modern day "punctuation" then 
document it, make a formal proposal and follow it through. 
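As it happens, Unicode later took a route much like this for emoji presentation: a base character followed by a variation selector (the VS block was added in Unicode 3.2) requests a text-style or emoji-style glyph. A sketch with a modern Python interpreter, purely for illustration:

```python
import unicodedata

smiley = "\u263A"                      # WHITE SMILING FACE, encoded since Unicode 1.0
text_style = smiley + "\uFE0E"         # + VARIATION SELECTOR-15 -> text presentation
emoji_style = smiley + "\uFE0F"        # + VARIATION SELECTOR-16 -> emoji presentation

print(unicodedata.name(smiley))        # WHITE SMILING FACE
print(unicodedata.name("\uFE0F"))      # VARIATION SELECTOR-16
# One perceived character, two code points in the backing store:
print(len(emoji_style))                # 2
```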

- Chris

--
Christopher J Fynn
DDC Dzongkha Computing Project
PO Box 122, Thimphu, Bhutan

<[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>





RE: Unicode and Security

2002-02-07 Thread Christopher J Fynn

John Hudson wrote:

> I can make an OpenType font that uses contextual substitution to
> replace the phrase 'The licensee also agrees to pay the type designer
> $10,000 every time he uses the lowercase e' with a series of invisible
> non-spacing glyphs. Of course, the backing store will contain my
> dastardly
> hidden clause and that is the text the unwitting victim will
> electronically
> sign. Hahahaha, he laughed maniacally!

How about a font that displays any number following a dollar sign as only
10% of the actual value in the backing text?

As John pointed out, this sort of thing isn't a Unicode problem. One could
just as easily employ the same kind of hidden rendering rules with ASCII
text. The only way to prevent this sort of fraud altogether would be to
throw out complex script rendering and encode glyphs not characters... I
don't think anyone seriously wants to go back down that route and anyway it
would probably take decades and a huge effort to make such a standard
properly covering all the scripts already in Unicode - and there would
undoubtedly still be other problems.

There are plenty of ways paper documents can be altered, added to or just
plain forged by someone intent on fraud - some of them extremely difficult
to detect. I don't know, but it's probably safest to assume that the
situation is similar with electronic documents - whatever security systems
are in place. That's one reason why you should always keep a duplicate copy
of any contract you sign - whether it's an electronic document you digitally
sign or a paper document you sign with a pen.
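The underlying point, that a signature covers the backing store rather than the rendering, can be sketched in modern Python (the Cyrillic homoglyph is my own example, not one from the thread):

```python
import hashlib

# Latin "bank" vs. a homoglyph spoof using CYRILLIC SMALL LETTER A (U+0430).
# Many fonts render both identically, but a signature is computed over the
# encoded code points (the backing store), so the two can never verify alike.
latin = "bank"
spoof = "b\u0430nk"

d1 = hashlib.sha256(latin.encode("utf-8")).hexdigest()
d2 = hashlib.sha256(spoof.encode("utf-8")).hexdigest()

print(latin == spoof)  # False
print(d1 == d2)        # False
```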

- Chris





RE: Unicode and Security

2002-02-07 Thread Christopher J Fynn


Gaspar Sinai wrote:
> I am thinking about electronically signed Unicode text documents
> that are rendered correctly, or believed to be rendered correctly,
> yet look different, seem to contain additional text, or seem to be
> missing some text, when viewed with different viewers, due to some
> ambiguities inherent in the standard.

This sounds like a rendering (application) issue, not a character encoding
(Unicode) issue. If the application or operating environment doesn't properly
support complex script rendering (and/or if the client doesn't have the
right fonts installed) then text in complex scripts might be rendered
incorrectly - or not at all. Chances are such text would either be
nonsensical, look like gobbledegook, or display as a string of empty boxes
indicating missing glyphs. Would you sign something like that?

Can you give an example of some text or document a person might be fooled
into signing that would mean one thing if rendered correctly and something
entirely different when rendered incorrectly?

- Chris






RE: Devanagari

2002-01-21 Thread Christopher J Fynn

Aman

Here in Bhutan the Internet connection is still much worse than in most
places I've visited in India & Nepal (and the cost per minute is several
times higher) - believe me even then UTF-8 (or UTF-16) encoded pages do not
display noticeably slower than ASCII, ISCII or 8-bit font encoded pages -
and I don't need to download any special plug-ins or fonts.
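For a rough sense of the overhead in question, here is an illustrative modern-Python comparison (my own sample string, not from the thread). An 8-bit font encoding or ISCII stores roughly one byte per Devanagari character, so UTF-8 roughly triples the Indic text itself, but any ASCII markup around it stays at one byte per character, which is why whole pages grow far less than 300%:

```python
# "namaste" in Devanagari: 6 code points, all in the U+0900..U+097F block
s = "\u0928\u092E\u0938\u094D\u0924\u0947"

print(len(s))                          # 6 characters
print(len(s.encode("utf-8")))          # 18 bytes (3 per character)
print(len(s.encode("utf-16-le")))      # 12 bytes (2 per character)
```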

- Chris

--
Christopher J Fynn
Thimphu, Bhutan

<[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of Aman Chawla
> Sent: 21 January 2002 10:57
> To: James Kass; Unicode
> Subject: Re: Devanagari
>
>
> - Original Message -
> From: "James Kass" <[EMAIL PROTECTED]>
> To: "Aman Chawla" <[EMAIL PROTECTED]>; "Unicode"
> <[EMAIL PROTECTED]>
> Sent: Monday, January 21, 2002 12:46 AM
> Subject: Re: Devanagari
>
>
> > 25% may not be 300%, but it isn't insignificant.  As you note, if the
> > mark-up were removed from both of those files, the percentage of
> > increase would be slightly higher.  But, as connection speeds continue
> > to improve, these differences are becoming almost minuscule.
>
> With regards to South Asia, where the most widely used modems are
> approx. 14
> kbps, maybe some 36 kbps and rarely 56 kbps, where broadband/DSL is mostly
> unheard of, efficiency in data transmission is of paramount importance...
> how can we convince the South Asian user to create websites in an encoding
> that would make his client's 14 kbps modem as effective (rather,
> ineffective) as a 4.6 kbps modem?
>





RE: Planning a "Unicode Only" Week

2001-11-28 Thread Christopher J Fynn

[EMAIL PROTECTED]

> I think maybe that encoding (on the Internet) does not much 
> matter. As long as my browser knows that it is looking at 
> Unicode, it knows which, say, SJIS, character to look up in the 
> font to display. Must have table lookup or something.


Encoding on the Internet *does* matter - particularly for those scripts 
which don't have any other standard encoding recognised by browsers. People tend to 
use a variety of non-standard font-based encodings for these scripts.   

- Chris




RE: Private Use Area - Building Combining Classes

2001-10-30 Thread Christopher J Fynn

This question, related to VOLT and OpenType, seems to belong on the MS VOLT users 
mailing list <[EMAIL PROTECTED]> rather than on the 
Unicode list. 
 
> Will the OS (Windows XP)- Uniscribe DLL recognise the rules given in the 
> PRIVATE USE AREA?

I think the answer to this is: NO

- Chris

Anbarasu R [mailto:[EMAIL PROTECTED]] wrote:

> Subject: Private Use Area - Building Combining Classes 
> 
> I had a problem in Glyph Substitution in using characters created in
> Private Use Area.
> 
> I tried the following in MS-VOLT:
> U+E000 + U+E001 -> KAAA_Glyph (glyph symbol)

> Under the feature I have added the above substitution table. But this
> is not working. Why?

> What do I have to do?
 
> Will the OS (Windows XP)- Uniscribe DLL recognise the rules given in the 
> PRIVATE USE AREA?
> 
> Kannan.





RE: Windows/Office XP question

2001-10-18 Thread Christopher J Fynn



 Mark Davis wrote:

> One feature that
> some systems have is composite fonts, where the "font" is actually a table
> of subfonts in some order (perhaps with specific ranges assigned to each).
> That way, someone can have the advantage of specifying a single font name,
> and get a full repertoire, without requiring a monster font. Of course,
> there may be little uniformity of style across scripts, or in mixtures of
> symbols, but at least you can get legible characters instead of boxes.
> 
> Are there any plans to do something like that in Windows?
> 
> Mark
> —

On some level at least this already seems to be implemented in Windows with system / 
GUI fonts. E.g. in Win 2K, Unicode file names etc. are displayed in the proper script in 
Windows Explorer if the system font for that script is installed. There are separate 
system / GUI fonts for each script, rather than one huge font. 

A problem with implementing something which allows you to specify a single font 
name and get a full repertoire is: which font in script x matches a given font in 
script y? If I specify "Baskerville" for Latin text and that text contains a run of 
Arabic characters, how does the system know which Arabic font best matches Baskerville? 
Sure, you could have a lookup table - but imagine getting users to maintain such a 
table with all the fonts some people accumulate these days. Font matching systems like 
Panose, which might be used to automate this kind of thing, seem to deal only with the 
characteristics of Latin and closely related scripts.  

- Chris Fynn  

--
Christopher J Fynn
DDC Dzongkha Computing Project
PO Box 122, Thimphu, Bhutan

<[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>

 




RE: [OT] What happened to the OpenType list?

2001-10-08 Thread Christopher J Fynn


> Subject: Re: [OT] What happened to the OpenType list?
> 
> 
> The OpenType list is still active, although there has not been much 
> discussion for the past couple of weeks. See the Microsoft Typography 
> website -- www.microsoft.com/typography -- for subscription information.
> 
> John Hudson


Anyone can also go to http://www.topica.com/lists/opentype
to read messages sent to that list.  





RE: Why Arabic shaping?

2001-08-13 Thread Christopher J Fynn




Marco Cimarosti wrote:
> So a devil's advocate may ask: if the Arabic shaping forms of 
> Kaaf have been
> unified in the same code point, then why Latin uppercase and lowercase K
> haven't been unified as well? And, conversely, if Latin case variants have
> been assigned to different code points, why not Arabic shape variants?

Isn't it that Arabic shape variants can be determined from context, while Latin case 
variants cannot? (Sometimes headings etc. in Latin script are written in all caps.)
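This unification is visible in the character data itself: the legacy Arabic presentation forms carry compatibility decompositions back to the single unified letter, while K and k are simply distinct characters. A quick illustrative check with modern Python's unicodedata module:

```python
import unicodedata

# ARABIC LETTER KAF FINAL FORM (U+FEDA) is a compatibility character:
# its decomposition points back to the unified KAF (U+0643).
final_kaf = "\uFEDA"
print(unicodedata.decomposition(final_kaf))                   # <final> 0643
print(unicodedata.normalize("NFKC", final_kaf) == "\u0643")   # True

# Latin case variants are distinct characters: no normalization form
# folds uppercase K to lowercase k, because case is contrastive.
print(unicodedata.normalize("NFKC", "K"))                     # K
```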





RE: some kind of virus?

2001-07-24 Thread Christopher J Fynn

Besides several mails containing an attachment with this W32.Sircam.Worm@mm  I've been 
getting an extraordinary number of mails with an attachment containing the 
TROJ_SIRCAM.A virus during the past 24 hours.

- Chris






RE: Support for Urdu & Sindhi

2001-07-17 Thread Christopher J Fynn


Mr. Liwal, 

Is your software (& fonts) Unicode-based, or does it use some other character 
encoding? I couldn't find anything about this on your website (mind you, the "Site 
search" link to http://www.liwal.net/cgi-bin/search.cgi at the top of your page 
doesn't work). I ask since Mr. Sajjad wrote to the Unicode list, so presumably 
he is looking for a Unicode-based solution.

 - Chris


N.R.Liwal wrote:

<<
Dear Mr. Sajjad;
 
We have Asiasoft Urdu Support for Windows 95, 98, ME and 2K which 
is an Add-On to Microsoft Windows. It adds Urdu processing capabilities
to Microsoft Windows and makes Microsoft Office 95, 97 and 2000's
Word, Excel, PowerPoint, Outlook Express, Corel Draw and hundreds of other 
programs work in Urdu.

Once you install Asiasoft Urdu support for Windows,
you can process Urdu, Arabic, English and many other
Roman and Non-roman languages together in Word as well as 
in all other office applications.

All your applications will just run fine, as they were running on normal Windows.
 
for further information you may visit,  www.liwal.net/asiasoft or
www.liwal.com/asiasoft there you see more information and screenshots of our
software.

If you require further information, please do not hesitate
to contact us at Tel +92-91-844974 or +92-91-40706
or email us at [EMAIL PROTECTED]

Regards
 
N.R.Liwal
Asiasoft

>>




RE: Is there Unicode mail out there?

2001-07-14 Thread Christopher J Fynn


Mark Davies wrote:

<< 
Take a look at the XML standard.

Mark
>>

The thread was discussing HTML. Are there any restrictions on numeric character 
references in the *HTML* standard?
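For what it's worth, HTML numeric character references denote ISO 10646 code positions regardless of the charset the document is served in, in both decimal and hexadecimal form. A quick illustration using modern Python's html module:

```python
import html

# &#2325; (decimal) and &#x915; (hex) both denote U+0915 DEVANAGARI LETTER KA,
# whatever charset the surrounding HTML document uses.
assert html.unescape("&#2325;") == "\u0915"
assert html.unescape("&#x915;") == "\u0915"
print(html.unescape("&#2325;&#2350;&#2352;"))  # three Devanagari letters
```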

- Chris


 




RE: Is there Unicode mail out there?

2001-07-14 Thread Christopher J Fynn

Gaute B Strokkenes wrote:

<< ...
That's the only benefit that Unicode and UTF-8 will bring to email:
the ability to mix and match characters from all scripts of all sizes
and shapes in a single message.  OTOH, for those of us who need this
it's a big advantage.
>>

There are also a number of scripts which don't have any registered 
encoding or code-page except Unicode / ISO-10646 - for users of those
scripts, whether or not they want to mix characters from other 
scripts, Unicode / UTF-8 is the only real choice (unless they want to 
use some non-standard font based encoding).

However, since many of these scripts are also complex scripts, 
clients need to be able to render them properly to be of much use
to their users. 

- Chris




Re: halp me!!!!

2000-09-28 Thread Christopher J. Fynn

 "Karambir Rohilla" <[EMAIL PROTECTED]> wrote:

> hi

> what is the mapping of Unicode fonts for Indian languages?

> regards

In Windows, the recommended way to map from Unicode characters to the glyphs
in an Indic script font is to use OpenType tables in the font file. So, if you
want to create "Unicode" fonts for Indic scripts to run under Microsoft
Windows then you might start by reading:

"Creating and supporting OpenType fonts for Indic scripts" at:
http://www.microsoft.com/typography/otspec/indicot/default.htm

and

"Converting a Devanagari font to Unicode / OTL" at:
http://www.microsoft.com/typography/developers/volt/devanagari.htm

You should also read the OpenType specification:
http://www.microsoft.com/typography/otspec/otsp125.zip

After reading these documents you will probably have a lot of questions, so
you will probably also want to join the OpenType discussion forum (see:
http://www.microsoft.com/typography/otspec/otlist.htm ) where you may be able
to get these questions answered.

-Chris




Re: Unicode on a website: ? Devanagari

2000-09-23 Thread Christopher J. Fynn


Anyone know of any Devanagari documents (Sanskrit, Hindi, Nepali) on the Web
using UTF-8 (other than the pages at
http://titus.uni-frankfurt.de/unicode/samples/rvbeispx.htm ) - especially any
using Dynamic fonts?

I am not interested in Devanagri sites using font based encodings.

- Chris




Astral planes

2000-09-12 Thread Christopher J. Fynn

I still think one plane should be delegated entirely to Mr Everson and he
should be left to get on with populating it to his heart's content :-) This
could save everybody concerned lots of time and expense.

- Chris

- Original Message -
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Tuesday, September 12, 2000 11:43 PM
Subject: Re: RXp. : Re: surrogate terminology


>
> BMP:  real characters
> Plane 1:  complex characters
> Plane 2:  irrational characters
> Plane 14: imaginary characters
>
> --Ken




Re: the Ethnologue

2000-09-12 Thread Christopher J. Fynn


> Can anyone point me to an existing list of languages that is more
> comprehensive and better researched than the Ethnologue?  
> If there is no such list, then we don't need to consider any 
> alternatives, right?

I'm not qualified to judge the merits of one list over another,
but there certainly are other comprehensive and well researched 
lists, e.g. the Linguasphere Register of the World's Languages 
and Speech Communities - see http://www.linguasphere.org/

Unfortunately their list is not available online - you have to buy 
the book, a bit like ISO/IEC 10646 and many other standards 
:-) 

I do know that the way the compilers of the Linguasphere have 
classified languages and dialects is different from the way the
compilers of the Ethnologue have - though I'm sure both could
give you well reasoned arguments why their scheme is better
or more useful than the other. 

- Chris 




Re: (iso639.184) Plane 14 redux (was: Same language, two locales)

2000-09-12 Thread Christopher J. Fynn


In principle I think making the set of ISO 639 language tags more
comprehensive is a good idea.

However there are a couple of concerns I have:

I think a clear distinction may need to be made between those languages which
are commonly written and those which are (largely) only spoken. Outside the
realm of specialised applications for linguists, most applications currently
only deal with written languages and scripts, and it is only confusing (and
storing up problems) to add codes for spoken languages and dialects to that
list of tags.

It is quite easy to envision that a set of standard codes may also be needed
for spoken languages and dialects, for things like voice recognition
applications and specialised linguistic tagging. My own feeling is that there
should be a separate set of codes for *spoken* languages and major dialects.
Obviously many languages would fall in both lists.

Looking over the Ethnologue codes for "Bodhic" languages it seems quite clear
that most of the codes listed are for distinctive spoken languages and
dialects - literate speakers of most of these languages have one
written/literary language, "Tibetan", which they share in common - though if
they tried to speak to each other they might have a great deal of difficulty
understanding each other.

Putting it simply we need one code "bo" for the written language but many
distinct codes for the spoken languages and dialects which are often *very*
different from the common spoken language.

In short, I favour inclusion of codes for written languages in the Ethnologue
list which are currently missing in ISO 639 (and the requirement for a certain
number of publications does not seem too onerous) - but do not favour the
wholesale adoption of all the languages in the Ethnologue list
at this time, as many of these appear to be only spoken languages or distinct
dialects.

I do think it would be useful to consider a separate set of codes for spoken
languages.

- chris






Re: Tamil glyphs

2000-09-12 Thread Christopher J. Fynn


"John Hudson" <[EMAIL PROTECTED]> wrote:

> we need be clear about what is being tagged -- script variants,
> orthographic variants or language specific practices --, so as to avoid the
> kind of inconsistencies that exist in ISO 639.

I'd add regional (and/or country) specific variants to this list. 
 
- Chris




Re: is there any way to change already defined character codes?

2000-08-07 Thread Christopher J. Fynn

Sandro

I'm sure someone official will give you an official answer, but I know the only
answer you are going to get to your question is NO - there is no way to change
the code point of a character (or to change a character name) once it is in
the Unicode or ISO 10646 standards. Allowing changes like this would break
existing implementations of these standards - and of course these standards
would be useless as standards if they were subject to that kind of change.

Proposals to encode new characters in the Unicode and ISO 10646 standards have
to go through a lengthy process of consideration and there is ample opportunity
to submit comments on any proposal during that process. However once characters
are finally assigned code points in the Unicode and ISO 10646 standards that's
it.

May I ask why these people from the government of Georgia want
to change the code points of some Georgian characters? There is probably another
good solution (or solutions) for whatever problem they think would be solved by
changing code points.

Regards

- Chris


"Sandro Karumidze" <[EMAIL PROTECTED]> wrote:

> There are people from the government of Georgia interested in the possibility
> of altering the Unicode standard in terms of changing codes for some Georgian
> characters.

> Does this type of thing happen in the Consortium, and if yes, under what
> circumstances?

> If not, can you specify in which rules it is defined that these types of
> changes are not allowed.

> Thanks in advance for your support,

> Best regards,

> Sandro Karumidze





Re: Unicode 3.0.1 update beta data files available

2000-08-04 Thread Christopher J. Fynn

Ken:

UnicodeData-3.0.1d2.beta.txt has:
<<
0F00;TIBETAN SYLLABLE OM;Lo;0;L;;;;;N;;;;;
...
0F0E;TIBETAN MARK NYIS SHAD;Po;0;L;;;;;N;TIBETAN DOUBLE SHAD;nyi shey;;;
0F0F;TIBETAN MARK TSHEG SHAD;Po;0;L;;;;;N;;tsek shey;;;
...
0F71;TIBETAN VOWEL SIGN AA;Mn;129;NSM;;;;;N;;;;;
0F72;TIBETAN VOWEL SIGN I;Mn;130;NSM;;;;;N;;;;;
0F73;TIBETAN VOWEL SIGN II;Mn;0;NSM;0F71 0F72;;;;N;;;;;
0F74;TIBETAN VOWEL SIGN U;Mn;132;NSM;;;;;N;;;;;
0F75;TIBETAN VOWEL SIGN UU;Mn;0;NSM;0F71 0F74;;;;N;;;;;
0F76;TIBETAN VOWEL SIGN VOCALIC R;Mn;0;NSM;0FB2 0F80;;;;N;;;;;
0F77;TIBETAN VOWEL SIGN VOCALIC RR;Mn;0;NSM;<compat> 0FB2 0F81;;;;N;;;;;
0F78;TIBETAN VOWEL SIGN VOCALIC L;Mn;0;NSM;0FB3 0F80;;;;N;;;;;
0F79;TIBETAN VOWEL SIGN VOCALIC LL;Mn;0;NSM;<compat> 0FB3 0F81;;;;N;;;;;
0F7A;TIBETAN VOWEL SIGN E;Mn;130;NSM;;;;;N;;;;;
0F7B;TIBETAN VOWEL SIGN EE;Mn;130;NSM;;;;;N;TIBETAN VOWEL SIGN AI;;;;
0F7C;TIBETAN VOWEL SIGN O;Mn;130;NSM;;;;;N;;;;;
0F7D;TIBETAN VOWEL SIGN OO;Mn;130;NSM;;;;;N;TIBETAN VOWEL SIGN AU;;;;
>>

Shouldn't 0F00 have a decomposition / equivalence to 0F6B 0F7C 0F7E?
It is essentially a ligature or precomposed form of those characters.

Shouldn't 0F7B have a decomposition / equivalence to 0F7A 0F7A
and 0F7D have a decomposition / equivalence to 0F7C 0F7C?

Are 0F77 and 0F79 marked as <compat> because, once decomposed,
the subjoined RA [0FB2] or LA [0FB3] cannot easily be identified as part of
a vowel? (Most Tibetans will enter these in decomposed form anyway.)

Shouldn't 0F0E have a decomposition to 0F0F 0F0F?
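For reference, the decomposition fields as eventually published can be inspected with modern Python's unicodedata module, which tracks the UCD: 0F73 and 0F77 kept their (canonical and compatibility) decompositions, while 0F00, 0F0E and 0F7B were published with none. An illustrative check:

```python
import unicodedata

# Raw decomposition field (column 6 of UnicodeData.txt) per character:
for cp in (0x0F00, 0x0F0E, 0x0F73, 0x0F77, 0x0F7B):
    print("U+%04X %r" % (cp, unicodedata.decomposition(chr(cp))))

# 0F77's decomposition is compatibility-only: NFD leaves it intact,
# NFKD decomposes it (recursively, since 0F81 itself decomposes):
assert unicodedata.normalize("NFD", "\u0F77") == "\u0F77"
print(unicodedata.normalize("NFKD", "\u0F77"))  # 0FB2 0F71 0F80
```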

==

0F14 TIBETAN MARK GTER TSHEG has an alternative name "TIBETAN COMMA" -
this alternative is completely misleading and should be removed if possible.
0F14 basically has the same function as 0F0F and is used in certain types
of text instead of 0F0F. Text that has this character instead of 0F0F is
"treasure text" [gter-ma] or revealed text.
==
PropList-3_0_1d2_beta.txt has:
<<
Property dump for: 0x20001000 (Punctuation)
...
0F04..0F12  (15 chars)
0F3A..0F3D  (4 chars)
0F85
...
>>

The first range should probably begin at 0F02 instead of 0F04,
as 0F02 and 0F03 are special forms of YIG MGO.

0F14 is also a punctuation character (just like 0F0F which it replaces in
certain types of text).

I don't think 0F85 TIBETAN MARK PALUTA should be counted as a "punctuation
character"

 ===

<<
Property dump for: 0x0080 (Delimiter)
...
0F0B
0F0D..0F12  (6 chars)
0F3A..0F3D  (4 chars)
>>

0F0C like 0F0B is a word delimiter - the difference is a break cannot occur
after 0F0C while it can after 0F0B.

0F14 should also be included here for the same reasons as mentioned earlier.

0F34 and 0FBE should probably also be included here - although they indicate
something like "etc." or "and so forth", they also contain an implicit 0F0B and
can occur only at the end of a phrase, not in the middle.

- Chris



- Original Message -
From: "Kenneth Whistler" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, August 05, 2000 2:24 AM
Subject: Unicode 3.0.1 update beta data files available


> The beta directory for the Unicode 3.0.1 update has been created.
>
> Due to the current problem with anonymous ftp on www.unicode.org,
> only the http version of this directory is currently available:
>
> http://www.unicode.org/Public/3.0-Update1/
>
> The updated beta files at that location for the Unicode 3.0.1 update are:
>
>5179 Jul 31 21:14 ArabicShaping-3d1.beta.txt
>   43559 Jul 31 21:14 CaseFolding-2d1.beta.txt
>5085 Jul 31 21:14 CompositionExclusions-2d1.beta.txt
>   55254 Jul 31 21:14 PropList-3.0.1d2.beta.txt
>   13841 Jul 31 21:14 SpecialCasing-3d2.beta.txt
>   48261 Jul 31 21:14 UnicodeData-3.0.1d1.beta.html
>  636269 Jul 31 21:15 UnicodeData-3.0.1d2.beta.txt
>
> These are temporary names. Once the beta review closes, the "beta" and
> the delta number on the files will be dropped for the permanent
> versioned filename, and the latest versions of the files will
> be copied into the UNIDATA directory minus the version extension.

> And comparable changes will be made in the ftp hierarchy as well, as
> soon as regular ftp service can be restored on the server.

> Before that happens, however, we would like to invite all interested
> implementers to examine the data files and report any problems you
> find in them, so that any problem can be corrected before the finalization
> of the Unicode 3.0.1 update.

> Note that UnicodeData.txt and PropList.txt now explicitly contain
> codepoint listings using the 5- or 6-digit UTF-32 notation. If
> you are using automated parsers on either of those files, be aware
> of this change in convention and make sure your code is prepared
> to handle parsing of codepoint values greater than 0xFFFF.

> We are introducing this change now with the relatively trivial
> listing of user-defined, unassigned, and not-a-character codepoints
> past U+FFFF, so people can test out their implementations before
> they get whumped with 40,000+ new characters from Planes 1,

Re: Addition of remaining two Maltese Characters to Unicode

2000-07-31 Thread Christopher J. Fynn

"Angelo Dalli" <[EMAIL PROTECTED]> wrote:

> Note that representing these two digraphs is fraught with problems,
> especially due to the context sensitive capitalisation rules. ‘gh’ at the
> start of a word is capitalised as ‘Gh’ while for an all-capitals word it is
> written as ‘GH’. Similarly ‘ie’ is capitalised as ‘Ie’ at the start of a
> word and as ‘IE’ for an all-capitals word.
...

On the big assumption that you are able to convince the powers that be that
these are unique characters, I don't think you need the two kinds of
context-sensitive capitals - this sort of thing is now more easily handled by
having a single capital letter character that may take on two glyph forms
dependent on context. Which form is actually displayed can be handled easily
by AAT or OpenType fonts. Other scripts, especially Arabic and Indic, have
many single characters with multiple context-sensitive glyph forms - and
modern font rendering systems such as those in Mac OS 9 and Windows 2000,
originally developed with exactly this sort of thing in mind, can take care
of it with ease.
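For comparison, the Latin digraphs already encoded for Serbo-Croatian took the other route: distinct code points for uppercase, titlecase and lowercase forms, wired together through the standard case mappings, so that `gh`-style word-initial capitalisation falls out of titlecasing. A modern Python illustration:

```python
# U+01C6 (dž) carries two distinct capital mappings: a titlecase form for
# word-initial position and an uppercase form for all-capitals text.
dz = "\u01C6"                  # LATIN SMALL LETTER DZ WITH CARON
print(dz.title() == "\u01C5")  # True -> titlecase form, "Dž"
print(dz.upper() == "\u01C4")  # True -> uppercase form, "DŽ"
```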

In fact I should think that proposing encoding two forms of capitals could
weaken the case you need to make in order to prove that these are discrete
characters - and encoding two forms surely complicates things like sorting
and searching.

- Chris




Cost per character?

2000-07-31 Thread Christopher J. Fynn

Leaving aside implementation costs - has anyone ever come up with a good
estimate of the cost per character for the development of the  Unicode / ISO
10646 standards in terms of man hours of experts and their long-suffering
secretaries, the office space they use, cost of attending and hosting UTC, WG2
and other meetings, cost of producing and distributing documents for proposals
etc, communications charges - and a whole host of other things?

(And maybe there should also be an estimate of all the additional time & work
many people have put into these standards without getting paid for it.)

- Chris






Re: Unicode has included the Inscriptions of Harappa/Mohenjadaro/Egypt...?

2000-07-28 Thread Christopher J. Fynn

"Padma kumar .R" <[EMAIL PROTECTED]> wrote:

>  I am very much interested 2 know about whether the old
inscriptions on
>  harappa, mohenjadaro, sumaria, egypt and like things are included in
the
>  unicode list... if so, are there any document of how to use or
pronounce
>  atleast some of the characters... and are there any fonts that
supports
>  this  If you know please let me know it... i am deeply interested
in
>  this area...


For the Harappa / Mohenjadaro seal script, the Indology list
[EMAIL PROTECTED] would be a better place to ask.
In fact there has been quite heated discussion on that list recently
concerning various interpretations of this script.

- Chris




Re: Designing a multilingual web site

2000-07-20 Thread Christopher J. Fynn

Even when  
and the rest are present, I've noticed that including the line:

in an HTML document actually *stops* some versions of Netscape recognising  UTF-8 or
&#; encoded characters.
Commenting out the line seems to fix the problem.

- Chris






Re: Subject lines in UTF-8 mssgs? [was: Proposal to make ...]

2000-07-12 Thread Christopher J. Fynn


"Jaap Pranger" <[EMAIL PROTECTED]> wrote:

> At 16:44 +0200 2000.07.12, [EMAIL PROTECTED] wrote:

> >Everybody (beginning by myself!) should probably be more careful 
> >in naming subject lines, and renaming them when a reply deviates 
> >from the subject.
 
> Marco,
 
> This wil not help very much when you send UTF-8 messages. Your 
> Subject lines in those messages show up completely "garbled", at 
> least in my non-UTF-8-aware email client. OK, that's my problem. 
> But mostly other people's UTF-8 messages show 'neat' Subject headers.  
> What's going on, why this difference? 
 
> Jaap
 
In Outlook Express, under Tools, Options, Send, International Settings 
it is possible to specify that only English (ASCII?) is used in headers, 
and under Tools, Options, Send, Plain Text Settings & Tools, Options, 
Send, HTML Settings it is possible to specify whether or not 8-bit 
characters may be used in message headers.

These settings seem to apply whatever encoding is used for the body 
of the message.
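The "garbled" subjects Jaap describes are most likely raw RFC 2047 encoded-words, the mechanism by which non-ASCII header text is carried in mail; RFC 2047-aware clients decode them, others show the raw form. An illustrative sketch with modern Python (my own example string):

```python
from email.header import Header, decode_header

# A UTF-8 Subject travels as an RFC 2047 "encoded-word"; this raw form is
# exactly what a client that doesn't implement RFC 2047 would display.
raw = "=?utf-8?b?R3LDvMOfZQ==?="

text, charset = decode_header(raw)[0]
print(text.decode(charset))            # Grüße

# Producing an encoded-word from scratch:
print(Header("Gr\u00FC\u00DFe", "utf-8").encode())
```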

- Chris




ATM light & glyphs for Unicode characters?

2000-07-12 Thread Christopher J. Fynn

Anyone know if Adobe's (free) ATM lite
http://www.adobe.com/products/atmlight/main.html
supports display of glyphs for Unicode characters when these are named
according to Adobe's document "Unicode & Glyph Names"
http://partners.adobe.com/asn/developer/typeforum/unicodegn.html

- Chris

--




Re: Acronyms

2000-07-10 Thread Christopher J. Fynn


"Antoine Leca" <[EMAIL PROTECTED]> wrote:

> Also, SMP is intended "for scripts and symbols" in English, and
> in French « pour caractères et symboles » ("for characters and
> symbols"), a slightly different thing...

Aren't *all* the planes « pour caractères » ?