Re: Dates in Japanese Era Names in Unicode Standard

2016-09-29 Thread Raymond Mercier
Philippe,
>>Is it possible that these eras start at midday instead of noon ?
I assume you mean midnight 

RM

www.raymondm.co.uk

Re: Turned Capital letter L (pointing to the left, with serifs)

2016-01-05 Thread Raymond Mercier
I have looked at both the collected works of Gauss and at the English version 
of the Theoria Motus, in order to see what a later editor made of this symbol.



In the Werke the symbol ’7’ continues to be used: C F Gauss, Werke, Vol. 7, 
ed. E J Schering, Gotha, 1871; § 77, M = N + n’7’ − Π.



In the translation the ‘7’ is replaced by the lower case tau. 



Theory of the motion of the heavenly bodies moving about the sun in conic 
sections: a translation of Gauss's "Theoria motus." With an appendix. By 
Charles Henry Davis, Boston: Little, Brown and company, 1857; § 77, 
M = N + nτ − Π.



So this seems to settle the matter of the identity, and just leaves one to 
puzzle over the German use of this sign for tau.

Raymond



Re: Turned Capital letter L (pointing to the left, with serifs)

2016-01-04 Thread Raymond Mercier
On further reflection I can well agree that it is tau. The attached images from 
R. Barbour, Greek Literary Hands, show clearly (scan 3) the large upper case 
tau in several lines, and in scan 4 in the first and other lines a hooked 
version of tau. So I withdraw my suggestion of pi.
Raymond

From: Asmus Freytag (t) 
Sent: Monday, January 04, 2016 7:58 PM
To: unicode@unicode.org 
Subject: Re: Turned Capital letter L (pointing to the left, with serifs)

On 1/4/2016 10:41 AM, Michael Everson wrote:

  Certainly it does look more like a very common variant of “tau” than “pi”


Variant of uppercase tau?

A./


Re: Turned Capital letter L (pointing to the left, with serifs)

2016-01-04 Thread Raymond Mercier
The sign described as like 7 is surely a cursive form of π. The form used by 
Gauss (Disquisitio de elementis ellipticis Palladis) is much the same as that 
shown in manuals of Greek Palaeography as a cursive π. This is given by E. M. 
Thompson in two works, An Introduction to Greek and Latin Palaeography, Oxford, 
1912, p. 83, and A Handbook of Greek and Latin Palaeography, Chicago, 1975, 
p. 95.
Raymond Mercier

Re: Unicode 7.0 Paperback Available

2015-01-17 Thread Raymond Mercier
Well why not print a good clean copy with Acrobat and a high quality printer, 
and do the rest of the volume printing as camera-ready ? I have had complex 
texts published that way.
R.
___
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode


Re: Unicode 7.0 Paperback Available

2015-01-17 Thread Raymond Mercier
Asmus,
Thanks. Indeed I am surprised that a publisher cannot get results as clean and 
reliable as I do when printing from Acrobat.
R

From: Asmus Freytag (t) 
Sent: Saturday, January 17, 2015 5:52 PM
To: Raymond Mercier ; unicode@unicode.org 
Subject: Re: Unicode 7.0 Paperback Available

Raymond,

even though the source is PDF, the nature of the fonts used for the charts 
makes this extremely challenging for the printers. Experiments run by some 
volunteers have determined that you can expect very inconsistent results, 
because the way these printing services and their contractors handle PDF is 
just not the same as when you use Acrobat or some browser plug-in to view them 
on screen.

You may find this a surprising state of affairs, but those are the facts on the 
ground. It was found that even the same service may get you different results 
for each order. And by different, I mean, with different discrepancies from the 
desired output.

These services apparently subcontract with a number of printing presses, all of 
which may have different software.

A./




Re: Unicode 7.0 Paperback Available

2015-01-17 Thread Raymond Mercier
Since the new printed volume is so expensive when shipping is included, why not 
try one of the commercial binding services, such as
https://www.doxdirect.com/products/specialist-document-printing/pdf-printing/.
The pdf files that make up Unicode 7.0 can all be downloaded from 
http://www.unicode.org/versions/Unicode7.0.0/.
It would have been easier of course if the individual pdf’s had been gathered 
together into larger groups, although one can do that easily within Acrobat.

Best of all would be a volume (or two ?) like that for Unicode 5 produced by 
Addison Wesley. When I once asked about that for Unicode 6  I was told that it 
was just too difficult to get the pages formatted suitably for book production. 
But if the charts can be presented as pdf, why is it difficult to print and 
bind them ?

Regards
Raymond Mercier


Re: Word reversal from Adobe to Word

2013-02-17 Thread Raymond Mercier
Thanks to all for your comments on the problem of copying a Hebrew phrase from 
Adobe to Word.
The Hebrew phrase is מהלך חמה הבינוני בשנים מחוברות ופרוטות וחדשים


I have had another look at the problem, with these results.

Copy from Adobe Acrobat 6 (with Select text):
    Paste into Word as RTF: words in the correct order, but characters 
    reversed within each word.
    Paste into Word using Paste Special and unformatted Unicode: word order 
    reversed, as well as characters reversed within each word.

Copy from Adobe Reader X:
    Paste into Word as RTF: word order reversed, but characters not reversed 
    within each word.
    Paste into Word using Paste Special and unformatted Unicode: everything 
    correct.

Clearly my old Adobe Acrobat 6 is not up to the job, while the latest reader 
is, provided Paste Special is used.

There are options within Word options/Advanced/Cut Copy Paste, but a choice of 
non-default options has no bearing on the issue.

Regards
Raymond Mercier
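
The four outcomes reported above can be modelled as two simple string 
operations. This is only a sketch for describing the symptoms: the function 
names are made up for illustration, words are assumed to be separated by 
single spaces, and nothing here reflects how Acrobat, Reader or Word actually 
handle the clipboard.

```python
def reverse_chars_in_words(text: str) -> str:
    """Reverse the characters inside each word, keeping the word order."""
    return " ".join(word[::-1] for word in text.split(" "))


def reverse_word_order(text: str) -> str:
    """Reverse the order of the words, keeping each word intact."""
    return " ".join(reversed(text.split(" ")))


# First three words of the Hebrew phrase quoted in the message (logical order).
phrase = "מהלך חמה הבינוני"

# Symptoms as reported, expressed with the two helpers:
acrobat6_rtf = reverse_chars_in_words(phrase)                        # chars reversed
acrobat6_plain = reverse_word_order(reverse_chars_in_words(phrase))  # both reversed
readerx_rtf = reverse_word_order(phrase)                             # words reversed
readerx_plain = phrase                                               # correct

# Reversing both levels is the same as reversing the whole string, which is
# what one would expect if the text were copied in visual rather than logical order.
assert acrobat6_plain == phrase[::-1]

# Applying the same helper again undoes the damage in each case.
assert reverse_chars_in_words(acrobat6_rtf) == phrase
assert reverse_word_order(readerx_rtf) == phrase
```

Each helper is its own inverse, so a pasted string with one of these symptoms 
can be repaired by applying the matching helper once more.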



Re: Word reversal from Adobe to Word

2013-02-07 Thread Raymond Mercier


From: "Jukka K. Korpela" 


Do you mean the commercial Adobe Acrobat software for creating PDF

documents, or the free Adobe Reader (previously called Adobe Acrobat
Reader) for viewing and printing (and commenting on) PDF documents? <<

I am using the full commercial Adobe Acrobat version 6, running on XP.

In constructing the example I realize that I had the wrong final 'm', but 
that does not affect the point.
If there is more than one word, the order of words IS correct, but the order 
of characters in each word is reversed.


Regards
Raymond 





Word reversal from Adobe to Word

2013-02-07 Thread Raymond Mercier

This problem is not precisely about Unicode - or is it?
If I have a Hebrew text displayed in Adobe Acrobat I can select part of it 
and can paste it into Word. The trouble is that while individual characters 
are correctly displayed the order is reversed. Thus if I have in Acrobat

קודמ (meaning 'prior')
when pasted into Word I get
םדוק
Every effort to put this right has failed, and yet it must have been met by 
many others.
It's not really about Word as such, since pasting into Notepad has the same 
result.

What does one do ?

Regards
Raymond Mercier




Re: Greek Astrology

2012-11-01 Thread Raymond Mercier
Joel Kalvesmaki writes:
> Byzantine authors had a great penchant for ligatures. Although I do not 
> have expertise in Greek astrology, I do have some competence in other 
> aspects of Byzantine literature (including some familiarity with 
> manuscripts and inscriptions). Based on that experience, I feel I can 
> safely say that any attempt to encode the four ligatures on the grounds 
> discussed here would be an invitation to encode a host of other Byzantine 
> Greek ligatures (for example, the standard cruciform invocative monograms: 
> V. Laurent, La collection Orghidian [Paris, 1952], pl. lxx).
>
> A formal proposal for these four ligatures would be premature. One should 
> first understand the entire culture of Byzantine ligation, then determine 
> what parts of that culture should be encoded, and which not.
>
> Sincerely,
>
> jk


I have done enough text editing in Greek to know well that there is a large 
and bewildering range of ligatures and abbreviations, and I have absolutely 
no thought of suggesting they should all be encoded.


However there are peculiarities of notation associated with individual 
disciplines, such as mathematics and music. The sign for the ascendant, at 
least, is part of the notation in astrology and astronomy, along with, for 
example, the sign used for zero in sexagesimal notation. This approach can 
be compared with the Unicode block of Byzantine musical notation.


Raymond Mercier












--
Joel Kalvesmaki
Editor in Byzantine Studies
Dumbarton Oaks
1703 32nd St. NW
Washington, DC 20007
(202) 339-6435

From: "Szelp, A. Sz." <a.sz.sz...@gmail.com>
Date: Thursday, November 1, 2012 3:56 AM
To: CE Whitehead <cewcat...@hotmail.com>
Cc: "unicode@unicode.org" <unicode@unicode.org>

Subject: Re: Greek Astrology

Is there evidence that these have been used consistently, on most charts of 
the time? These could be ad-hoc notations (as given the contemporary praxis, 
ligation per se does not make a "symbol").


--
Szelp, André Szabolcs

+43 (650) 79 22 400


On Thu, Nov 1, 2012 at 2:38 AM, CE Whitehead <cewcat...@hotmail.com> wrote:

Hi.
From: Raymond Mercier <rm459_at_cam.ac.uk>

Date: Mon, 29 Oct 2012 08:52:43 -
I think I had somehow assumed that the symbols used in Greek Horoscopes 
had already been encoded, but it seems not.
The four signs used to mark the principal corners (ascendant, etc) of the 
horoscope diagram are shown in the attachment, taken from

http://www.skyscript.co.uk/greek_horoscope.html


These four signs should be encoded along with the zodiacal signs U+2648 to 
U+2653.

Perhaps they are already in the pipeline ?
Perhaps these should be in the pipeline, as the online templates I could 
find for astrological charts do not have them; they have to be added in 
(although it would be possible to have these built into the chart template 
also, as the houses are always in the same place and the ascendant is always 
located between the 12th and the 1st, etc.); see:

http://www.skyscript.co.uk/charttemp.html

Similarly Paul Wade's copiable template is void of the symbols
http://books.google.com/books?id=WY8hjKtSaP0C&pg=PA40&lpg=PA40&dq=natal+charts+astrological+charts+templates&source=bl&ots=By-xF3UGWB&sig=KvomOKgo999CwuJPKaq1LmeoqHc&hl=fr&sa=X&ei=oMCRUK-wF5Sc8gTWi4DYAg&ved=0CDQQ6AEwAzgK#v=onepage&q=natal%20charts%20astrological%20charts%20templates&f=false

(I'll try to check an offline guide, too, but the few actual online 
templates, not sample charts, seem void of the symbols for the ascendant, 
midheaven, etc., so they seem to be separate from the actual chart of the 
houses, so go for it. Happy Halloween in any case.)




Best,

--C. E. Whitehead
cewcat...@hotmail.com

Best wishes
Raymond Mercier







Greek astrology

2012-11-01 Thread Raymond Mercier
The first sign  (ascendant or horoscope) is common enough I believe, and I 
attach a small portion of a most valuable (10th century ?) Syriac manuscript 
(photographed in UV light), a palimpsest, where the inferior text is Greek. 
There is clearly a list of 'values' for this ascendant (16,15,14...), but I 
cannot make out the rest of the lines. When such material is transcribed people 
(as Neugebauer, Greek Horoscopes) just use H, or some other conventional sign 
to indicate the horoscope, but it would be nice if the ancient sign were 
encoded. For the transcription I made a horoscope sign using Paint, and used an 
ad hoc uncial script (non Unicode) for the Greek, with sigma instead of stigma. 
All this shows the problems of making a faithful transcription of ancient texts.

Raymond Mercier
  - Original Message - 
  From: Szelp, A. Sz. 
  To: CE Whitehead 
  Cc: unicode@unicode.org 
  Sent: Thursday, November 01, 2012 7:56 AM
  Subject: Re: Greek Astrology


  Is there evidence that these have been used consistently, on most charts of 
the time? These could be ad-hoc notations (as given the contemporary praxis, 
ligation per se does not make a "symbol").


  --
  Szelp, André Szabolcs

  +43 (650) 79 22 400


Greek astrology

2012-10-29 Thread Raymond Mercier
I think I had somehow assumed that the symbols used in Greek Horoscopes had 
already been encoded, but it seems not.
The four signs used to mark the principal corners (ascendant, etc) of the 
horoscope diagram are shown in the attachment, taken from

http://www.skyscript.co.uk/greek_horoscope.html

These four signs should be encoded along with the zodiacal signs U+2648 to 
U+2653.
Perhaps they are already in the pipeline ?
Best wishes
Raymond Mercier

Re: Unicode Core

2012-06-21 Thread Raymond Mercier

Michael Everson:
Perhaps less than us character mavens would imagine. Books don't publish 
themselves, and publishing takes resources of various kinds.

Julian Bradfield:

Not much, if they use the Lulu route, as they already have an account
set up. An hour of somebody's time should do it.
And at a Lulu price, there'll be a lot more of a market than at an
Addison-Wesley price!


I haven't worked out the number of pages needed for all the charts, but even 
if it needed two volumes, what is the problem with that ?
It is not just for private libraries like mine, but this is something, 
complete with the charts, that should be in the reference section of every 
university library, and every computer library. Or do we tell the library 
user that they can always download the charts ?
Raymond Mercier 





Unicode Core

2012-06-21 Thread Raymond Mercier
Today I received from Lulu the Unicode Standard 6.1 -Core specification
http://www.lulu.com/shop/unicode-consortium/the-unicode-standard-version-61-core-specification/paperback/product-20082926.html
 .

While I am very glad to have this, I really do wonder why there was not a full 
publication of Unicode 6 or 6.1 from the corporation itself, with all the 
charts, as we have had with Unicode 1 to 5. Surely there is a market for this ?

Raymond Mercier

Re: Notice of brief Unicode.org system outage on Friday

2012-05-02 Thread Raymond Mercier


From: "Cristian Secară" 


Just wondering why the time zone reference is not given in a universal

format, like UTC±n, so one in other part of the world can calculate.

Excellent point !

Raymond Mercier




Re: Hittite cuneiform

2011-12-17 Thread Raymond Mercier

From: "Michael Everson" 

> Hittite cuneiform is a subset of http://www.unicode.org/charts/PDF/U12000.pdf

I agree that the Hittite signs are a relatively small subset of those used in 
Sumerian, Akkadian, etc., but the values so often differ that it seems to me 
that a separate listing of the Hittite usage is appropriate. 
Compare the CJK signs: the Japanese -kun and -on readings are included in 
unihan along with the Chinese readings.

Raymond Mercier


Hittite cuneiform

2011-12-17 Thread Raymond Mercier
Why has Hittite cuneiform not yet been included ? As one sees from the table in 
http://www.ancientscripts.com/hittite.html, 
it should be easy enough, just as Old Persian and Ugaritic are included under 
the general heading Cuneiform.

Best wishes
Raymond Mercier

Re: Missing old Greek ligature/letter "omicron+upsilon above"

2010-09-19 Thread Raymond Mercier
Philippe Verdy :

>>Clearly there does seem to be missing a Greek letter,

I hope there is no suggestion of encoding the huge variety of Greek 
abbreviations and ligatures. The early printed Greek texts used type designed 
to follow the manuscripts, but thank goodness that was dropped before long. 
There is a bewildering number of these signs.
See, for example
http://commons.wikimedia.org/wiki/File:Greek_alphabet_ligatures.jpg
and that's just the start of it.

Raymond Mercier

Re: TeX: insert Unicode character

2010-08-24 Thread Raymond Mercier

From: "Julian Bradfield" 

In principle, find a suitable cuneiform package, and use it.
I don't know which package has what characters in it, unfortunately,
and at a first glance, I can't see one that has that character.


We have a font, since I adapted an older experimental cuneiform TrueType 
font, essentially by changing the encoding to conform to Unicode cuneiform. 
I have not made this publicly available, but have just used it for an 
article written in Word. My colleague wants to convert that to LaTeX, but he 
ran into some problems. The publisher does not insist on LaTeX, but it has a 
loyal fan club, especially among mathematicians.


Raymond 





Re: TeX: insert Unicode character

2010-08-24 Thread Raymond Mercier

Thanks for both those suggestions, which I will pass on.

Raymond



TeX: insert Unicode character

2010-08-24 Thread Raymond Mercier
I am trying to help a colleague who is writing an article in LaTeX, and who 
needs to insert an isolated character, U+1212D, from the Unicode Cuneiform 
block. I am not very familiar with LaTeX myself, but what do I suggest to him ?


Raymond Mercier
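
One thing that may matter whatever TeX route is chosen is that U+1212D lies 
outside the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a 
surrogate pair in UTF-16. The short Python sketch below only prints those 
encoded forms so they can be checked against what an editor or input file 
contains; it says nothing about which LaTeX package or font to use, and the 
helper name is invented for the example.

```python
CODE_POINT = 0x1212D  # the cuneiform character mentioned in the message


def as_hex(data: bytes) -> str:
    """Format a byte string as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in data)


char = chr(CODE_POINT)
utf8_form = as_hex(char.encode("utf-8"))       # four bytes
utf16_form = as_hex(char.encode("utf-16-be"))  # high and low surrogate

print(f"U+{CODE_POINT:04X}")
print("UTF-8 :", utf8_form)
print("UTF-16:", utf16_form)
print("outside the BMP:", CODE_POINT > 0xFFFF)
```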




Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread Raymond Mercier

"John Dlugosz" writes

>> I can imagine supporting national representations for numbers for 
>> outputting reports, but I don't imagine anyone writing in a programming 
>> language would be compelled to type 四佰六十 instead of 560.


Especially since 四佰六十 is 460.

Raymond Mercier 
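
The arithmetic behind that correction can be sketched in a few lines of 
Python. The tables below are written out by hand for this example and cover 
only values below ten thousand; the function name is hypothetical, and a real 
parser would need to handle 萬 and larger units.

```python
# Digit and multiplier tables, including the financial forms 拾, 佰 and 仟.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
MULTIPLIERS = {"十": 10, "拾": 10, "百": 100, "佰": 100, "千": 1000, "仟": 1000}


def parse_small_chinese_number(text: str) -> int:
    """Evaluate a Chinese numeral below 10000 written with digits and multipliers."""
    total = 0
    pending = 0  # digit waiting for its multiplier
    for ch in text:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in MULTIPLIERS:
            # A bare multiplier such as 十 counts as one times the multiplier.
            total += (pending or 1) * MULTIPLIERS[ch]
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending


print(parse_small_chinese_number("四佰六十"))  # 4 x 100 + 6 x 10
print(parse_small_chinese_number("五百六十"))  # 5 x 100 + 6 x 10
```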





Re: Pronunciation of the word emoji

2010-07-06 Thread Raymond Mercier

Please, Mr Overington, enough ! enough !

Raymond Mercier



Re: Value of U+1E20

2004-09-17 Thread Raymond Mercier
> Raymond mentions Arabic ghayn, but I would expect this to be
> transliterated more commonly with U+011F or U+0121.
I can assure you that 1E20 and the l.c. companion 1E21 are very clearly used
by Wehr in his Arabic Dictionary.
As to  U+011F or U+0121, I see that Socin, in his old Arabic Grammar (1895),
uses U+011F  for jim U+062C, and the U+0121 for ghain U+063A. Wright, Arabic
Grammar, as old as Socin, also uses U+0121 for ghain U+063A. It may be that
these usages of a century ago survive in some quarters even today.

Raymond
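
For readers who want to check the code points mentioned above, the following 
Python sketch prints the character name and canonical decomposition of each, 
using the standard unicodedata module. The helper name is made up for the 
example, and the output depends on the Unicode data shipped with the Python 
version in use.

```python
import unicodedata

# Code points cited in the message: G with macron (both cases), g with breve,
# g with dot above, and the Arabic letters jim and ghain.
CODE_POINTS = [0x1E20, 0x1E21, 0x011F, 0x0121, 0x062C, 0x063A]


def describe(code_point: int):
    """Return (U+XXXX label, character name, NFD decomposition as labels)."""
    ch = chr(code_point)
    decomposed = unicodedata.normalize("NFD", ch)
    parts = [f"U+{ord(c):04X}" for c in decomposed]
    return f"U+{code_point:04X}", unicodedata.name(ch), parts


for cp in CODE_POINTS:
    label, name, parts = describe(cp)
    print(label, name, "=", " + ".join(parts))
```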




Re: Value of U+1E20

2004-09-17 Thread Raymond Mercier
> Would any one know what is the value of U+1E20 ?
> Is this (also) used in Semitic transliterations ? For which value ?
> Could it be a fricative G ?

It is used sometimes in transcribing Arabic, where it represents ghain U+063A
غ. You will see it for example in Wehr's Arabic Dictionary, even in the
English version of Cowan. In most English transcriptions of Arabic gh would
be used.
Raymond Mercier




Re: sign for anti-neutrino - greek nu with diacritical line above workaround ?

2004-08-08 Thread Raymond Mercier
Herbert,

>have you been to:
>http://titus.fkidg1.uni-frankfurt.de/database/unicode/unicself.htm
>there you can combine NU and MACRON - and they are using IE on newest
>WINDOWS…

Well that's very pretty - for me however it works only in Mozilla, not in
IE.
As far as I know IE6 is the latest.
Raymond




Re: sign for anti-neutrino - greek nu with diacritical line above workaround ?

2004-08-08 Thread Raymond Mercier




Herbert,
Sorry, no change in IE6: still nu+ empty squares. 
However it works in Mozilla, and so did the previous one.
Raymond


Re: sign for anti-neutrino - greek nu with diacritical line above workaround ?

2004-08-08 Thread Raymond Mercier
Mark:
> It is probably a really bad idea to have the base letter in one span and
> the combining mark in another. That is very likely to throw a monkey wrench
> into whatever you are trying, on most text layout systems.

If I start in Word with a clear nu+macron, and save as html, I get the
division into two spans. What I posted is derived from that by dropping all
the style padding.
Raymond




Re: sign for anti-neutrino - greek nu with diacritical line above workaround ?

2004-08-08 Thread Raymond Mercier


Herbert,
Well when I open yours in IE 6 I just get the character nu, 
followed by a blank square. 
To add to this comedy, however, when I look at your 
source in notepad I see there indeed a correct nu+macron !
There is some odd instability going on here.
 
BTW in my previous message I intended 
 
ν̄ 
 
to be all in one line - but it does not come out that way in 
the mail display, at least not in Outlook Express.
Raymond
 


Re: sign for anti-neutrino - greek nu with diacritical line above workaround ?

2004-08-08 Thread Raymond Mercier
Peter Jacobi:
>Testing with Mozilla 1.7, ν̄ displays a fine Anti-Neutrino sign.
So you say, but the following works in IE6, and Opera, but not in Mozilla
1.7. What is the problem ?


"nu.htm"


ν̄



In Mozilla only the nu shows.

And if I change
ν̄
to
ν
̄

then in IE6 the macron is shifted to the right.
What is going on ?
Raymond Mercier
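
Part of the difficulty in this thread is that nu with macron has no 
precomposed code point, so it always remains a base letter followed by a 
combining mark, and the renderer has to position the mark itself. A minimal 
Python check of that, together with the numeric character references one 
might put in an HTML test file, is sketched below. The helper name is 
invented for the example, and the sketch makes no claim about how IE6, Opera 
or Mozilla lay out the sequence.

```python
import unicodedata

NU = "\u03BD"      # GREEK SMALL LETTER NU
MACRON = "\u0304"  # COMBINING MACRON

anti_neutrino = NU + MACRON

# NFC composes a base letter and mark only when a precomposed form exists;
# here the result keeps both code points.
composed = unicodedata.normalize("NFC", anti_neutrino)
print(len(composed), [f"U+{ord(c):04X}" for c in composed])


def to_ncr(text: str) -> str:
    """Write every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


# Both references belong in the same element, with nothing between them.
print(to_ncr(anti_neutrino))
```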




Re: Morohashi in unihan

2004-08-07 Thread Raymond Mercier
From: "Allen Haaheim" <[EMAIL PROTECTED]>
> I spot-checked a few random characters from Blocks A and B, and some of
> them were in Morohashi.
So that means that the Morohashi numbers have just not been included in A or B.
Raymond Mercier




Morohashi in unihan

2004-08-05 Thread Raymond Mercier
A lot of characters in unihan.txt have Morohashi values 0 or 9. I
take it there is then no Morohashi equivalent, but what is the distinction
between these two, and what is the point of putting anything if there is no
Morohashi equivalent ?
Also the Morohashi equivalent is not given for CJK-A or B. Is that really
true, I mean are these characters really not in Morohashi ?

Raymond Mercier
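
One way to look into this question is to scan the Unihan data for characters 
that carry some fields but no kMorohashi field. The Python sketch below 
assumes the tab-separated layout "U+XXXX, field name, value" and runs on a 
tiny made-up excerpt rather than the real file; the sample lines and the 
function name are illustrative only.

```python
# Illustrative excerpt only; the real unihan.txt is far larger and its
# values should be taken from the file itself.
SAMPLE = """\
# sample lines in the assumed tab-separated layout
U+4E00\tkMorohashi\t00001
U+4E00\tkDefinition\tone; a, an; alone
U+3400\tkDefinition\t(same as 丘) hillock or mound
U+5740\tkDefinition\tsite, location, land for house
"""


def parse_unihan(lines):
    """Collect fields per code point from tab-separated Unihan-style lines."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        table.setdefault(code_point, {})[field] = value
    return table


table = parse_unihan(SAMPLE.splitlines())

# Code points that appear in the data but have no Morohashi index at all.
without_morohashi = sorted(
    cp for cp, fields in table.items() if "kMorohashi" not in fields
)
print(without_morohashi)
```

Run over the full file, the same loop would show at a glance whether a whole 
block lacks the field or only individual characters do.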




Re: Much better Latin-1 keyboard for Windows

2004-07-18 Thread Raymond Mercier
John Cowan writes
> http://www.livejournal.com/users/gwalla/39856.html is a page about
> (and a link to) a truly excellent Windows keyboard driver that
> provides full access to the Latin-1 range

Latin-1 is not everything! If you need to transcribe
Arabic/Hebrew/Sanskrit/Farsi, you will need the macrons on vowels (Latin
Extended-A) and various dot-under letters (Latin Extended Additional). I
made my own layout using the DDK.
Raymond Mercier




Re: Looking for transcription or transliteration standards latin- >arabic

2004-07-06 Thread Raymond Mercier

Peter Kirk writes
> This is more complicated than it looks. The Greek form Istimboli is
> impossible for the period as Greek had no [b] sound, for β was
> pronounced [v] except that later and perhaps already at that period μπ
> was pronounced [b] at least in foreign words. So is the Greek consonant
> cluster μβ, or μπ, or ÎÎÏ, or what? Also is the previous consonant
> cluster στ as transliterated, or σθ corresponding to "isthmus"? And then
> what are the Greek vowels?

I was only trying to grasp the sense of Gerd's throw-away remark (which I
hope he will explain), but I appreciate the difficulties you raise,
especially the point about the Greek beta as the phoneme /v/ . That
particular difficulty at least doesn't apply to the Ottoman b, if we look
for a Turkish -bul < πολις.

Raymond Mercier

http://ourworld.compuserve.com/homepages/RaymondM




Re: Looking for transcription or transliteration standards latin- >arabic

2004-07-06 Thread Raymond Mercier


Gerd Schumacher wrote
> I think, the underlying meaning of Istimboli must be
> "town at the isthmus", which makes sense, indeed.
How does that work ? Do you mean
istim < ισθμος, bol < πολις ?

Raymond Mercier


Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)

2004-05-07 Thread Raymond Mercier



James Kass writes:
> IE6 displays CJK(A) in UTF8 just fine.  It can't seem to handle
> CJK(B) in UTF-8, though.

Isn't it the other way round ?
I attach a file with three characters all in UTF8, representing CJK(A), CJK
and CJK(B). The CJK(A) displays in IE6 only if ... is
included, but it *does* handle the CJK(B) without any reference to lang.
In Mozilla all three display without the "lang=ZH"
Of course to see the CJK(B) you need the font Simsun (Founder Extended).
Raymond
 
Title: Definition Search


㖾 35BE
址 5740
𨀣 28023
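
The three test characters can be sorted into their blocks with a short Python 
sketch. The block boundaries below are typed in by hand for this example 
(Extension A at U+3400 to U+4DBF, the main CJK block at U+4E00 to U+9FFF, 
Extension B at U+20000 to U+2A6DF), and the function name is hypothetical. 
The sketch also shows that only the Extension B character needs four bytes in 
UTF-8.

```python
# (block name, first code point, last code point)
BLOCKS = [
    ("CJK Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Extension B", 0x20000, 0x2A6DF),
]


def classify(code_point: int) -> str:
    """Name the CJK block containing the code point, or 'other'."""
    for name, first, last in BLOCKS:
        if first <= code_point <= last:
            return name
    return "other"


for cp in (0x35BE, 0x5740, 0x28023):
    utf8_length = len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}  {classify(cp)}  {utf8_length} bytes in UTF-8")
```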



Re: Philippe's Management of Microsoft (was: Re: Yoruba Keyboard)

2004-05-07 Thread Raymond Mercier
Kenneth Whistler writes, replying to Philippe
> This kind of long-winded harangue about how Microsoft should manage its
> business is OT for this list and is generally insulting to the Microsoft
> participants as well. Please take it elsewhere and do not bother the
> Unicode list with your management plans for Microsoft's internal
> business.

It is all very well to mock Philippe, but IE6 fails badly if it cannot even
display CJK(A) in UTF8, something Mozilla does perfectly well. If there are
Microsoft participants in this list perhaps they could explain this failure.
Broadly speaking I am pro-Microsoft, but this behaviour in IE6 reflects
badly on them.

Raymond Mercier





Re:CJK(B) and IE6

2004-05-04 Thread Raymond Mercier
[Earlier posting lost, it seems.]

James Kass writes:
> The lack of support for supplementary characters expressed in UTF-8
> in the Internet Explorer is a bug.  As Philippe Verdy mentions, the
> Mozilla browser does not have this same bug.  Also it should be
> noted that the Opera browser handles non-BMP UTF-8 just fine.

As I said in my starting message Mozilla copes with everything, both UTF8
and NCR, over the whole CJK range.
However Opera (in my experience) cannot do Ext B in either UTF8 or NCR.
IE6 cannot cope with Ext A in UTF8, but will do so in NCR.
I attach two short files (produced by Hanfind) that include both extensions,
one in UTF8 and the other NCR (except that characters given within the text
are all NCR).


> While working with NCRs may be an ugly nightmare, there are some
> shortcuts.
BabelPad is great, but it chokes in converting all the UTF8 in unihan.txt to
NCR at one go. I wrote a dedicated program to do that.
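
The dedicated program itself is not shown in the thread. As a rough 
illustration of the conversion it performs, here is a Python sketch that 
turns every non-ASCII character into a numeric character reference, one line 
at a time so that a large file need not be held in memory. The function names 
are invented for the example and this is not the program mentioned above.

```python
def to_decimal_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal NCR such as &#13758;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_hex_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal NCR such as &#x35BE;."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text)


def convert_lines(lines):
    """Convert an iterable of lines lazily, one line at a time."""
    for line in lines:
        yield to_decimal_ncr(line)


sample = ["\u35BE 35BE", "\u5740 5740", "\U00028023 28023"]
for converted in convert_lines(sample):
    print(converted)
```

The decimal form relies on the standard "xmlcharrefreplace" error handler, 
which also covers characters outside the BMP such as those in Extension B.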


> I *think* that Windows 2000 uses Unicode always internally and uses an
> internal conversion chart if material is non-Unicode like GB-18030.
>
That at least is declared at http://www.i18nguy.com/surrogates.html.

Raymond Mercier
Title: Definition Search
㖾 35BE, E4  (same as 咢) to beat a drum; to startle, to argue; to debate; to dispute, (interchangeable 愕) to be surprised; to be amazed; to marvel, (interchangeable 鍔) the blade or edge of a sword, beams of a house
㝔 3754, YAO4  deep bottom; the southeast corner of a house
㝢 3762, YU3  (same as 宇) a house; a roof, look; appearance, space
㝪 376A, DIAN4 DING3  a slanting house, nightmare
㡯 386F, ZHAI2  (ancient form of 宅) wall of a building, a house, to keep in the house, thriving; flourishing, blazing, (ancient form of 度) legal system; laws and institutions, to think; to consider; to ponder; to contemplate
㡰 3870, YU3  (large seal type 宇) a house; a roof, appearance, space; the canopy of heaven, to cover
㡵 3875, LING2  roof of the house connected
㡸 3878, ZHA3 ZHA4  a house; an unfinished house, uneven; irregular; unsuitable; ill-matched, tenon
㡺 387A, DAN4  a cottage; a small house, a small cup
㢂 3882, YAN3  (terrains) of highly strategic; precipitious (hill, etc. 
a big mound, (same as VEA 3888) a collapse house, to hit, to catch something
㢈 3888, TUI2  a collapsed house, (same as 堆) to heap up; to pile
㢎 388E, CHA4 ZE2 ZHAI2 ZHE2  hide; conceal, a house not so high
㢑 3891, TUI2  (corrupted form of VEA 3888) a collapsed house, (same as 堆) to heap up; to pile
㢒 3892, CHA2  an almost collapsing house
㢖 3896, not available  a store house, to store
㢗 3897, QIAO4  a high house; a high building
㢚 389A, LU3  a corridor; a hallway; rooms around the hall (the middle room of a Chinese house), a nunnery; a convent, a cottage; a hut, a mansion
㢝 389D, not available  cottage; a coarse hourse, house with flat roof
㢞 389E, YI4  rooms connected, moveable house ( a yurt, a portable, tentlike dwelling used by nomadic Mongols)
㭽 3B7D, DI3  (non-classical form of 柢) root; foundation; base, eaves of a house; brim
㯪 3BEA, LING2  (same as 櫺) carved or patterned window-railings; sills, the wooden planks which join eaves with a house
㰃 3C03, MIAN2  (same as U+6AB0) a tree, the bark of which is used in medicine-- Eucommia ulmoides, an awning of the house
㰅 3C05, DI2  (same as 樀) eaves of a house; brim, part of a loom, the cross beams on the frame on which silkworms spin, a bookcase, to abandon or give up
㼟 3F1F, BAI2  a tiled house, brick wall of a well
䅊 414A, DU4  a spacious house, (corrupted form of 秺) bundle of rice plant, name of a place
䆖 4196, HONG2  a big house, (same as 宏) great; vast; wide; ample
䆧 41A7, not available  (same as 窩) a cave; a den, living quarters; a house, to hide; to harbor
䆲 41B2, not available  a spacious house, emptiness
䆵 41B5, CHENG2  an echo, a high and deep; large; big; specious house
䆸 41B8, CHENG2  spacious; capacious, sound (of the house), a picture (on silk) scroll
䗔 45D4, HOU2  a house-lizard or gecko, a kind of insect; living in the water
䦗 4997, XU4  (same as 侐) quiet (house, surrounding, etc.)
䳸 4CF8, MA2 MAI2  the wild goose, sparrow; the house-sparrow
䵇 4D47, XIAN4  to dislike; to reject; to hate, a house; a building
䵺 4D7A, TING3  (same as 圢 町)boundary between agricultural lands, (in Japan) a street; a city block, ant hill; formicary, vacant land by the side of a house; a paddock, deer trace; deer track
址 5740, ZHI3  site, location, land for house
墅 5885, SHU4  villa, country house
壁 58C1, BI4  partition wall; walls of a house
宇 5B87, YU3  house; building, structure; eaves
室 5BA4, SHI4  room, home, house, chamber
家 5BB6, JIA1 JIE5 GU1  house, home, residence; family
屋 5C4B, WU1  house; room; building, shelter
庳 5EB
Re: CJK(B) and IE6

2004-05-01 Thread Raymond Mercier
Raymond Mercier wrote:
> However, I am disappointed to find that IE6 will not display
> U+20000, etc.

See http://www.i18nguy.com/surrogates.html, may help.

-- 
François

Thanks very much. With these changes in the Registry the font Simsun
(Founder extended) displays in IE, and in my Hanfind too, since that relies
on the browser.

Hanfind: http://ourworld.compuserve.com/homepages/RaymondM

Raymond





CJK(B) and IE6

2004-05-01 Thread Raymond Mercier
Having installed the large font Simsun (Founder Extended), which covers much
of CJK(Ext B)(U+20000, etc), I find that these characters appear in MS
Word, Wordpad and Notepad.
However, I am disappointed to find that IE6 will not display U+20000, etc.
Of course in Tools/Internet Options, I have set the Asian Font display to
this new font.
The same browser that is used in IE6 can be coupled with other applications
(compiled in VC6, for example), but the result is the same.

On the other hand Mozilla will show these characters. I know that
applications can be arranged to use the Mozilla browser, but that is a whole
new programming ball game, that frankly I could do without.

Raymond Mercier









Re: GB18030 and super font

2004-04-22 Thread Raymond Mercier


Eric,
Amazin' Amazon!!  Now why didn't I think of that ?
In fact the UK Amazon.co.uk say it is discontinued, so I would have to get it 
from Amazon in the US. It is not the first time that the two Amazons fail to 
connect.
Many thanks for the tip,
Raymond

  - Original Message - 
  From: Eric Muller 
  To: [EMAIL PROTECTED] 
  Sent: Thursday, April 22, 2004 5:40 PM
  Subject: Re: GB18030 and super font

  Raymond Mercier wrote:
  > But that link to proofing tools leads nowhere. Maybe it's not so easy
  > to get the CHS version.

  Includes ~140 fonts, mostly for CJK, Arabic, Hebrew but other scripts as 
  well. Includes "Simsun (Founder Extended)" aka "宋体-方正超大字符集", with 
  65,531 glyphs!

  Eric.


GB18030 and super font

2004-04-22 Thread Raymond Mercier


Mark Shoulson writes
> their Super Font is bundled with Microsoft Office XP, and
> even Microsoft's prices haven't gotten that high!

From Microsoft,
http://www.microsoft.com/globaldev/DrIntl/columns/015/default.mspx :
"A font that contains Simplified Chinese glyphs from both CJK Extension A
and B sets is "SimSun (Founder Extended)" (SurSong.ttf in the system), or
宋体-方正超大字符集 (in Chinese). It is currently available in the Simplified
Chinese (CHS) version of Office XP, or the Microsoft Office Proofing Tools.
Click the link for more information and how to buy."

But that link to proofing tools leads nowhere. Maybe it's not so easy to
get the CHS version.
Raymond


GB18030 and super font

2004-04-22 Thread Raymond Mercier
I am intrigued by GB18030 encoding. There is a table of equivalences in
http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml
No doubt Unihan will at some stage include these 2 & 4 byte values.
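
For anyone who wants to see the 1, 2 and 4 byte forms without working through the table, here is a small sketch in Python, which ships a `gb18030` codec in its standard library. The three sample characters are my own choice for illustration (an ASCII letter, a common BMP ideograph, and the first Extension B code point); the helper name is hypothetical.

```python
def gb18030_bytes(ch):
    """Encode a single character with Python's built-in gb18030 codec."""
    return ch.encode("gb18030")


samples = ["A", "\u4e2d", "\U00020000"]

for ch in samples:
    encoded = gb18030_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} bytes)")
```

ASCII stays at one byte, the characters inherited from GB2312/GBK take two, and everything else, including the supplementary planes, takes four.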

I enquired about the 'super font' created by a Beijing foundry,
http://font.founder.com.cn/english/web/index.htm, and am fairly astonished
at the prices, as you see from the attached.

I suppose this is the only source for such a full font.
Raymond Mercier


- Original Message -
From: "GaoZhiQing (高志青)" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 22, 2004 12:09 PM
Subject: re:GB18030 super font


Hello Mercier,

The price of our GB18030 font:
20,000US$/1 font per year license.
80,000US$/perpetual license.

The price of our GB2312 bitmap font:
15,000US$/4 years license.(ONE SIZE)
(We provide China standard bitmap fonts; the price and terms are set by the
Chinese government. Your company must make an agreement with the Chinese
government. We can act as an agent.)

Best regards,

Gao Zhiqing

Beijing Founder Electronic Co., Ltd.
Network Circulation Division

Add:9,No.5Street,Shangdi Information Industry
Base,Haidian District,Beijing 100085,China
Tel:86-10-62981432
Fax:86-10-62981438
Mobile:13501204825
E-mail:[EMAIL PROTECTED]
http://font.founder.com.cn


-Original Message-
From: Raymond Mercier [mailto:[EMAIL PROTECTED]
Sent: 19 April 2004 2:08
To: Font Library Support Mailbox
Subject: GB18030 super font

It would be helpful to learn availability and cost of the full GB18030 font.
The bitmap fonts (GB2312) are also of interest.

Dr R.Mercier
St Ives,Cambs
UK





Re: Unihan.txt and the four dictionary sorting algorithm

2004-04-20 Thread Raymond Mercier
John Jenkins writes
>>Also, even though the full Unihan database is 25+ Mb in size, given the
cheapness of disk space nowadays, it's not all *that* big, surely.
<<

The problem of the size of Unihan has nothing at all to do with the cost of
storage, and everything to do with the functioning of programs that might
open and read it.
Since the lines in Unihan are separated by 0x0A alone, not 0x0D 0x0A, this
means that when opened in Notepad the lines are not separated. Notepad does
have the advantage that the UTF-8 encoding is recognized, and the characters
are displayed.
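
The line-ending point can be checked and worked around quite mechanically. The following is a sketch in Python, not a recommendation of any particular tool; both function names are invented for the purpose, and the code works on raw bytes so that the UTF-8 content is left untouched.

```python
def count_line_endings(data):
    """Count CR LF pairs and bare LF bytes in a block of raw file data."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf_only": lf_only}


def normalize_to_crlf(data):
    """Rewrite every line ending as CR LF (0x0D 0x0A) without doubling existing ones."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


sample = b"U+3400\tkTotalStrokes\t5\nU+3401\tkTotalStrokes\t6\n"
print(count_line_endings(sample))
print(count_line_endings(normalize_to_crlf(sample)))
```

Reading the whole file with `open(path, "rb").read()` and writing back the normalized bytes would give a copy that Notepad breaks into lines, at the cost of one extra byte per line.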

If opened in Wordpad the Chinese characters do not appear, perhaps the UTF-8
encoding does not function.

If I try MS Word the machine grinds to a halt - and this is a good modern
machine (XP with 120Mb HD and 512Mb RAM).

Similarly if I open in IE6, with UTF-8 encoding, the text opens up to around
U+4C00, and then grinds to a halt.

I can open it in the HexWorkshop byte editor, or in the editor in Visual C
6, but these do not recognize UTF-8 encoding, and they hardly count as
suitable readers for such a file.

I wish the people who designed this file would accept the need for a more
structured and sophisticated approach. Why not, for example, have a basic
html file, with html-links to the various sections ?

Raymond Mercier





Re: Unihan.txt and the four dictionary sorting algorithm

2004-04-20 Thread Raymond Mercier
Ernest Cline writes
>>I'm trying to pare Unihan.txt down to a less unwieldy size
for my own use by eliminating properties that are of no
interest to me <<

The sheer size of unihan creates problems, hence the need to extract
manageable subsets.
This is the basis of my Hanfind:
(http://ourworld.compuserve.com/homepages/RaymondM) which isolates Pinyin,
definitions, etc etc.
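
As a rough illustration of what extracting a subset involves: each data line of Unihan.txt has three tab-separated parts (code point, field name, value), and comment lines begin with `#`. The sketch below, in Python, keeps only the fields asked for. The function name is hypothetical, and the three sample lines are written out by hand in the Unihan layout purely for illustration; they are not copied from any particular release.

```python
import io


def extract_fields(lines, wanted):
    """Yield (code_point, field, value) for the wanted field names only."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue                      # skip anything not in the 3-column layout
        code_point, field, value = parts
        if field in wanted:
            yield code_point, field, value


# Hand-written sample in the Unihan.txt layout (illustrative values only).
sample = io.StringIO(
    "# comment line\n"
    "U+4E00\tkDefinition\tone; a, an; alone\n"
    "U+4E00\tkMandarin\tYI1\n"
    "U+4E00\tkTotalStrokes\t1\n"
)

for row in extract_fields(sample, {"kDefinition", "kMandarin"}):
    print("\t".join(row))
```

Because the function only iterates, passing it `open("Unihan.txt", encoding="utf-8")` reads the file one line at a time and never holds the whole 25+ Mb in memory.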

Andrew West once suggested that Unihan be converted to an XML file, and that
would appear to help isolate the different fields.

Raymond Mercier





Re: Web Form: Subj: Unicode conversion- Microsoft Visual C++ compiler

2004-04-19 Thread Raymond Mercier
Mino,
This is not at all clear:
the character U+0427 is Ч in the Cyrillic block, and what does this have to
do with the two characters Ð and §, which are U+00D0 and U+00A7 ?
Are you wondering how to store 0x0427 in a binary file ? Or what ?
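
One likely explanation, offered as a guess rather than a diagnosis: the UTF-8 form of U+0427 is the two bytes D0 A7, and if those bytes are displayed as Latin-1 they appear as exactly the two characters U+00D0 and U+00A7. The sketch below shows this in Python rather than in C, simply because it is short; the helper name is made up.

```python
def describe(text):
    """List the code points of a string in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


code_point = 0x0427                   # decimal 1063
char = chr(code_point)                # CYRILLIC CAPITAL LETTER CHE
utf8 = char.encode("utf-8")           # two bytes: D0 A7
misread = utf8.decode("latin-1")      # the same bytes read one byte per character

print(describe(char))
print(" ".join(f"{b:02X}" for b in utf8))
print(describe(misread))
```

So the conversion from the number to the character may already be working, and only the final display step is interpreting the bytes in the wrong encoding.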

Raymond Mercier

> > Contact:  [EMAIL PROTECTED]
> > Report Type:  Other Question, Problem, or Feedback
> > Opt Subject:  Unicode conversion
> >
> > I would like to convert a 2 byte Unicode code into its
> > corresponding Unicode character (for instance the decimal 1063 or the
> > hexadecimal 0427 into 'Ð§'). Is there a C function in order to make the
> > conversion? What file .h do I need to include in the C program? Can I
> > use the 6.0 version of the Microsoft Visual C++ compiler, or do i
> > need a newer version?
> > Thanks a lot in advance.
> > Mino Napoletano
> >
> > -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
> > (End of Report)
>





Re: CJK U+3ADA and U+66F6

2004-04-09 Thread Raymond Mercier

James Kass writes:

> Is there a difference between U+66F6 and U+3ADA?
>
> The newest UNIHAN.TXT file doesn't have a definition field for
> U+66F6.  The glyphs in the Unicode 4.0 book appear identical
> for these two characters.  One is placed with radical 72, the
> other with radical 73, although UNIHAN.TXT gives both as
> having radical 73.

Only experts with access to all the references will sort this out, but at
least note that both characters are placed under radical 73 in both Unicode
4.0 (p.1237) and the revised unihan.

Raymond Mercier





Greek zero

2004-02-04 Thread Raymond Mercier



I have made a proposal to the UTC to encode the Greek symbol for zero, as
used in astronomical texts.
An extended version of this is available on my site
http://ourworld.compuserve.com/homepages/RaymondM/.
It is a rather long pdf file.
 
Raymond Mercier


RE: Code points on Windows

2004-01-15 Thread Raymond Mercier




>>RichEdit 4.1 (used in Windows XP SP1 WordPad and later) also have the toggle

I am using Wordpad on Win2000 (SP4), and Word 2002.
I found after rebooting that Alt-X now works on Wordpad, but not the
reverse. According to Wordpad About, I am using 'Version 5 SP 4'.
In a program of mine (Hanfind, in
http://ourworld.compuserve.com/homepages/RaymondM/), where I used a RichEdit
control, the reverse also fails.

If Alt-X fails in Word, you need to check the assignment of shortcuts for
Word commands, as follows:
In Tools/Macros, select Word Commands; look for ListCommands and click Run.
This gives a list of commands and their shortcuts: look for 'Toggle
Character Code'.
The corresponding shortcut should be Alt-X, but if it has been reassigned
for any reason, it has to be reset.

Raymond Mercier

  - Original Message -
  From: Mike Ayers
  To: 'Murray Sargent' ; Raymond Mercier
  Cc: [EMAIL PROTECTED]
  Sent: Wednesday, January 14, 2004 11:19 PM
  Subject: RE: Code points on Windows

  > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Murray Sargent
  > Sent: Wednesday, January 14, 2004 3:01 PM

  > WordPad on Windows 2000 and XP support Alt+x. Win95 and Win98 WordPads
  > don't, since they used earlier RichEdit's than version 3.0. Version 3.0
  > doesn't have the toggle: Alt+x converts a hex code string to the Unicode
  > character; Alt+X does the reverse. Word 2002 added the Alt+x facility
               ^
               Alt+Shift+X
  > with the nice wrinkle of making it a toggle. Accordingly RichEdit 4.0
  > (used in Office 2002) and RichEdit 4.1 (used in Windows XP SP1 WordPad
  > and later) also have the toggle, as does RichEdit 5.0 shipped with
  > Office 2003.

      I'm sure he knew that - his fingers just forgot.  ;-)

  /|/|ike


Re: Code points on Windows

2004-01-14 Thread Raymond Mercier



In MS Word if you type the Unicode code point, followed by 
Alt-X, you get the character (if you have the font). This works in 
reverse.
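
What the Alt-X toggle does in each direction is easy to state in code. This is only a sketch in Python of the conversion itself, with invented function names; it says nothing about how Word or RichEdit implement it. U+01CE is used as the example.

```python
def hex_to_char(hex_digits):
    """Forward direction: a typed hex code point becomes the character."""
    return chr(int(hex_digits, 16))


def char_to_hex(ch):
    """Reverse direction: a character becomes its hex code point (at least 4 digits)."""
    return f"{ord(ch):04X}"


print(char_to_hex(hex_to_char("01CE")))
```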
 

Sometimes in a RichEdit control window it will work 
in the first direction, but not in reverse.
 
It does not work in Wordpad, in spite of its use of RichEdit. 
I don't know why not.
 
 
Raymond Mercier

  - Original Message -
  From: Mike Ayers
  To: '[EMAIL PROTECTED]'
  Sent: Wednesday, January 14, 2004 7:34 PM
  Subject: Code points on Windows

      On Windows, it is well known that you can generate a character from
  its code point by holding down the alt key and typing the code point in
  decimal, with a leading 0, on the numeric keypad.  I recall that there is
  also a method to do this in reverse - given a character on, say, Wordpad,
  one can get the Unicode codepoint for that character (copied to the
  clipboard, I believe).  However, I have forgotten how to do this.  Can
  anyone help me out here?

      Thanks,

  /|/|ike


Re: Chinese rod numerals

2004-01-10 Thread Raymond Mercier
Christopher,
This is an excellent suggestion. A submission can be made using
n2352-form.pdf that you can get from this site.

http://www.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html

Raymond Mercier


- Original Message - 
From: "Christopher Cullen" <[EMAIL PROTECTED]>
To: "Unicode list" <[EMAIL PROTECTED]>
Sent: Saturday, January 10, 2004 12:23 PM
Subject: Chinese rod numerals


>
> I am an academic with research interests in the history of ancient
> Chinese mathematics, and I should like to propose the encoding of
> traditional Chinese rod numerals.
>
> These represent the arrays of "counting rods" on a counting board as
> used in China for complex calculations before the invention of the
> abacus.  There are eighteen forms in all, representing the numerals one
> to nine in two forms which are basically versions of each other with a
> 90 degrees rotation.  One form is used for units, the other for
> tens, then back to the first form for hundreds, and so on.  A zero is
> represented by a gap in the array.  For pictures of these and an
> explanatory text, see:
>
>   http://www.math.sfu.ca/histmath/China/Beginning/Rod.html
>
> These forms appear in pre-modern mathematical books in China, and in
> modern books discussing ancient mathematics.  They are not to be
> confused with the related "Hangzhou numerals", which are already
> encoded at 3021-303a.   It would be a great convenience to have these
> as a standard resource rather than having to create a special private
> font in order to represent them.
>
>  From a private source, I have been told that these forms are neither in
> any current Unicode encoding initiative, nor indeed anywhere in the
> proposal pipeline.  I should therefore be grateful for any comments or
> advice that might guide me towards making a formal submission.
>
>
> Christopher Cullen
>




Re: Today is neither Thursday nor Friday

2004-01-01 Thread Raymond Mercier

>
> Michael Everson scripsit:
>
> > On 21 December 2012 the Mayan Long Count calendar will tick over from
> > 12.19.19.17.19 to 13.0.0.0.0. Isn't that cool
>

--- subject to considerable uncertainty about the alignment between the
Mayan cycles and our own calendar (I mean the "Ahau constant")
See my Kairos 3, at http://ourworld.compuserve.com/homepages/RaymondM/
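
To make the dependence on the correlation concrete, here is a sketch in Python. The Long Count is converted to a day count, the correlation constant is added to give a Julian Day Number, and that is turned into a proleptic Gregorian date. The value 584283 is an assumption (the commonly quoted figure that yields 21 December 2012); any other value can be passed in, and the function names are invented for illustration.

```python
from datetime import date

ASSUMED_CORRELATION = 584283   # assumption: the commonly quoted constant


def long_count_to_days(long_count):
    """Days elapsed since 0.0.0.0.0 for a 'baktun.katun.tun.uinal.kin' string."""
    baktun, katun, tun, uinal, kin = (int(p) for p in long_count.split("."))
    # 1 uinal = 20 kin, 1 tun = 18 uinal, 1 katun = 20 tun, 1 baktun = 20 katun
    return (((baktun * 20 + katun) * 20 + tun) * 18 + uinal) * 20 + kin


def long_count_to_gregorian(long_count, correlation=ASSUMED_CORRELATION):
    """Proleptic Gregorian date for a Long Count under the given correlation."""
    jdn = long_count_to_days(long_count) + correlation
    # Julian Day Number 1721426 corresponds to date ordinal 1 (0001-01-01).
    return date.fromordinal(jdn - 1721425)


print(long_count_to_gregorian("12.19.19.17.19"))
print(long_count_to_gregorian("13.0.0.0.0"))
print(long_count_to_gregorian("13.0.0.0.0", correlation=584285))
```

Changing the constant by two days shifts the result by two days, which is the whole of the uncertainty referred to above.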

[I know this is OT, but it is a holiday.]

Raymond Mercier




Re: MS Windows and Unicode 4.0 ?

2003-12-04 Thread Raymond Mercier
Well can we be perfectly clear about this: I read that OS X is Unicode
compliant, yet I understand you to say that Word (as part of Office) on OS X
is not.
If that is true of Word on OS X then I am surprised - even amazed, but that
seems to be what you said.
Is it really the case that characters in Word in OS X are not stored as
Unicode, even though they are so stored in Word in Windows NT (and later) on
a PC ?
If not stored as Unicode on a Mac, then how are they stored ?

Raymond Mercier

- Original Message - 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, December 04, 2003 3:43 PM
Subject: Re: MS Windows and Unicode 4.0 ?


>
> At 15:00 + 2003-12-04, Raymond Mercier wrote:
> >Arcane Jill writes
> >My next OS will be a Mac.
> >
> >Before you rush off to the nearest Mac showroom:
> >
> >Michael Everson 25/11/03 wrote
> >>Microsoft Office on OS X does not support Unicode. Quark XPress on
> >>OS X does not support Unicode. Adobe InDesign on OS X does not
> >>support Unicode inputting via  keyboard, and doesn't shape
> >>Devanagari properly. Eudora on OS X does not support Unicode.
> >>
> >>These companies have work to do if their products are to be
> >>Unicode-enabled for Mac OS X. It is frustrating.
>
> Do ***NOT*** quote me as a reason not to buy a Macintosh.
>
> Using a Macintosh is a joy. Unicode support at the OS level is strong
> and stable. That Microsoft, Quark, Adobe, and Qualcomm have work to
> do to allow their customers to take advantage of the richness Apple
> provides us is *their* challenge. And when they do, using a Macintosh
> will be even more of a pleasure than it is now.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: MS Windows and Unicode 4.0 ?

2003-12-04 Thread Raymond Mercier
Right. And they even have the nerve to charge for it.
I use OE.
Raymond

- Original Message - 
From: "Stefan Persson" <[EMAIL PROTECTED]>
To: "Raymond Mercier" <[EMAIL PROTECTED]>
Cc: "Arcane Jill" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, December 04, 2003 3:49 PM
Subject: Re: MS Windows and Unicode 4.0 ?


> 
> Raymond Mercier wrote:
> > Eudora on OS X does not support Unicode.
> 
> Eudora doesn't support Unicode on *any* OS, right?
> 
> Stefan



Re: MS Windows and Unicode 4.0 ?

2003-12-04 Thread Raymond Mercier
Arcane Jill writes
> My next OS will be a Mac.

Before you rush off to the nearest Mac showroom:

Michael Everson 25/11/03 wrote
>>
Microsoft Office on OS X does not support Unicode.
Quark XPress on OS X does not support Unicode.
Adobe InDesign on OS X does not support Unicode inputting via 
keyboard, and doesn't shape Devanagari properly.
Eudora on OS X does not support Unicode.

These companies have work to do if their products are to be 
Unicode-enabled for Mac OS X. It is frustrating.
<<

Raymond Mercier




Re: Free Fonts

2003-12-03 Thread Raymond Mercier

Philippe Verdy writes
> Simple: for now the fonts are in beta, and do not include the hinting
> instructions. This may be in development, but faces some legal issues
> with Apple patents. So until there's a patent-free hinting mechanism,
> for use in fonts, or Apple agrees with a royalty-free license on
> hinting mechanisms, hinted fonts cannot be freely distributed.
>
What is the legal position if these fonts are taken into Fontlab and
rehinted ?

Surely if I make my own hinted font in Fontlab I do not owe royalties to
Apple.

Raymond Mercier





Re: Fonts on Web Pages

2003-12-02 Thread Raymond Mercier



Of course Adobe Acrobat was designed to solve just the problem you
described, and it works well, with your embedded fonts, etc., so the
recipient sees just what you write.
 
OTOH What about using Word with your embedded fonts, and then 
saving it as mht  (Web Archive File)? 
 
Have a look at:
http://www.softcities.com/WebArchiveX/download/6912.htm 
>>The WebArchiveX Component API lets you programmatically save a 
complete Web page as a single Web Archive file (.mht) file. (Same as "Save as 
Web Archive" in Microsoft Internet Explorer). Web Archive is an Internet 
standard for keeping HTML documents within a MIME formatted message including 
graphics, scripts and style sheets as its body parts. Packing HTML files with 
the WebArchiveX COM Component helps to avoid errors when you send the Web page 
by email or publish it electronically. WebArchiveX can be used with any 
programming language and supports multi-threaded environments
<<
 
Raymond
- Original Message -
  From: Arcane Jill
  To: [EMAIL PROTECTED]
  Sent: Tuesday, December 02, 2003 12:02 PM
  Subject: RE: Fonts on Web Pages

  The use of PDF files does solve a problem, yes, but it solves a different
  problem from the one about which I had asked. I specifically want to know
  the current state-of-the-art regarding the use of fonts on web pages. I
  believe someone was working on this, but I don't know if it was the W3C or
  some other bunch.
  Jill

  -Original Message-
  From: Raymond Mercier [mailto:[EMAIL PROTECTED]]
  Sent: Tuesday, December 02, 2003 11:29 AM
  To: Arcane Jill
  Cc: [EMAIL PROTECTED]
  Subject: Re: Fonts on Web Pages

  Surely Adobe Acrobat will solve both problems ?
  The recipient only needs to have the Acrobat Reader installed, and who
  does not already have that ?
  Raymond Mercier

    Anyone know the current status on embedded fonts in web pages?

    I basically have two questions. (1) Assume the existence of a font to
    which I legally own the copyright. For example, let's say I invented it.
    Now, I design a web page which uses this font. Now, it's easy (but
    terribly inconvenient) to say on the web page "Please download and
    install this font in order to view this web page correctly", but the
    truth is I know damn well that no-one will ever do that. So, short of
    using small image files, what's the current state-of-the-art technical
    solution to this.

    Question (2) is the same as question (1), except that I don't own the
    copyright. Suppose, for example I want to use this font called Garamond.
    It's on my machine. (I don't know how it got there - I think it came
    pre-installed with the OS). But of course, I can't guarantee that it
    will be installed on someone else's machine. And since I don't own the
    copyright, and don't have explicit permission to distribute it, I don't
    think I'm even allowed to say "Please download and install this font in
    order to view this web page correctly". How do we solve this one?

    Jill


Re: Fonts on Web Pages

2003-12-02 Thread Raymond Mercier



Surely Adobe Acrobat will solve both problems ? 
The recipient only needs to have the Acrobat Reader installed, 
and who does not already have that ?
Raymond Mercier

  - Original Message -
  From: Arcane Jill
  To: [EMAIL PROTECTED]
  Sent: Tuesday, December 02, 2003 10:29 AM
  Subject: Fonts on Web Pages

  Anyone know the current status on embedded fonts in web pages?

  I basically have two questions. (1) Assume the existence of a font to
  which I legally own the copyright. For example, let's say I invented it.
  Now, I design a web page which uses this font. Now, it's easy (but
  terribly inconvenient) to say on the web page "Please download and install
  this font in order to view this web page correctly", but the truth is I
  know damn well that no-one will ever do that. So, short of using small
  image files, what's the current state-of-the-art technical solution to
  this.

  Question (2) is the same as question (1), except that I don't own the
  copyright. Suppose, for example I want to use this font called Garamond.
  It's on my machine. (I don't know how it got there - I think it came
  pre-installed with the OS). But of course, I can't guarantee that it will
  be installed on someone else's machine. And since I don't own the
  copyright, and don't have explicit permission to distribute it, I don't
  think I'm even allowed to say "Please download and install this font in
  order to view this web page correctly". How do we solve this one?

  Jill


Re: How can I have OTF for MacOS

2003-11-25 Thread Raymond Mercier
OK, I stand corrected on Mozilla !
Raymond Mercier



Re: How can I have OTF for MacOS

2003-11-25 Thread Raymond Mercier

Michael Everson writes
> Eudora on OS X does not support Unicode.
>
Eudora doesn't support Unicode anywhere, surely ? To my knowledge on a PC
the only mail handler that is Unicode compliant is Outlook Express.

Raymond Mercier




unicode@unicode.org

2003-11-25 Thread Raymond Mercier



I am not sure if this is a point that really involves Unicode 
blocks, but someone in this list might have a comment.
 
In Word 2002 there is one bug that is cleared up in Word 2003 
(at least in the Beta, which I have been playing with).
 
In Word 2002 the Style may assign one particular font for 
Latin characters, but when certain Latin characters are inserted, the font 
switches to the Asian font even when the characters are found in the Latin 
font. This is now cleared up in Word 2003.
 
For example if the Latin font is Times, and the Asian font is 
Simsun, and if the character U+01CE is inserted the font switches to Simsun, 
even though U+01CE is available in Times. (U+01CE is the letter a with the 
Chinese 3rd tone mark, a small v placed over the a). If the character is 
selected and the font is switched back to Times then the character is switched 
to the one in the Times font. 
 
The problem is of course not restricted to this character, or
the Times font, but happens in many other situations in Word 2002.
 
This bug has disappeared in Word 2003, where all Latin based 
characters are taken from the Latin font, as long as the character is really 
present in that font.
 
In Word 2002 the problem extends to Greek fonts, when accented 
characters are inserted: in that case the font switches to Arial Unicode, even 
when the accented character is in the default font such as Cardo. But all this 
is cleared up in Word 2003.
 
Raymond Mercier


Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Raymond Mercier
John Cowan wrote

> It's an XML editor (recte XMetaL), and an XML editor that can't handle
> Unicode would be a sorry specimen indeed.

A quick glance at the program's site suggests that there cannot be such a
serious problem

http://www.corel.com/servlet/Satellite?pagename=Corel/Products/productInfo&id=1042152756365&did=1042152754863&content=FAQ#top

How does Corel® XMetaL® 4 support Unicode?
Corel XMetaL 4 features UTF-8 and UTF-16 encoding in conformance with
Unicode 3.0 for the transparent display and editing of all left-to-right
reading languages. Unicode support is available in the document window, as
well as in the customizable interface elements (e.g., menu items and toolbar
names) found in Corel XMetaL 4 and the macro script-editing interface.

-and a number of other encouraging paragraphs. So what is the problem ?

Raymond Mercier



- Original Message - 
From: "John Cowan" <[EMAIL PROTECTED]>
To: "Raymond Mercier" <[EMAIL PROTECTED]>
Cc: "Patrick Andries" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, November 15, 2003 5:16 PM
Subject: Re: How can I input any Unicode character if I know its hexadecimal
code?


>
> Raymond Mercier scripsit:
>
> > You cannot complain to Unicode if there are software developers who fail to
> > make their programs Unicode compliant. What makes you think that XMetal
> > (whatever it is) can handle Unicode internally ?
>
> It's an XML editor (recte XMetaL), and an XML editor that can't handle
> Unicode would be a sorry specimen indeed.
>
> -- 
> John Cowan  <[EMAIL PROTECTED]>
> http://www.reutershealth.com  http://www.ccil.org/~cowan
> .e'osai ko sarji la lojban.
> Please support Lojban!  http://www.lojban.org




Re: How can I input any Unicode character if I know its hexadecimal code?

2003-11-15 Thread Raymond Mercier
> > Then it is a request for enhancement to address to the author of XMetal.
> > This is not an issue of Unicode.
>
> Funnily enough, I thought I wanted to input Unicode characters

You cannot complain to Unicode if there are software developers who fail to
make their programs Unicode compliant. What makes you think that XMetal
(whatever it is) can handle Unicode internally ?

Even in WIN2000 some features are not Unicode compliant - for example the
global search facility applied to folders and files. Certainly Alt-X doesn't
work, and even if you paste a non-Ascii character into the window, you will
get nonsense when you run the search.

Raymond Mercier




Re: New contribution N2676

2003-10-28 Thread Raymond Mercier



Richard,
 
 

>>Now, unless Zero shares the same glyphic range as Artabe, I’m not sure
that they can be unified.<<

My last wish is to unify artabe and zero, but the artabe symbol listed in
N2676 really is just the same as the zero used in papyri of the age.

>>If you look at e.g. ‘Siglae’ in RE 2.2 (1923) 2279-2315 you’ll see that
Bilabel lists 16 glyph variants for the Artabe. The most common variants are
the ones with a horizontal line (like a dash) with an arrangement of between
one and three dots around it, sometimes the dots are solid and sometimes
they are hollow circles.<<

I will have a look at RE, since I have so far only seen the examples in
Kenyon's photos, but it was already clear to me that the entry in N2676 will
not do, if only because it fails to represent the confused variety of forms
used in the papyri.

The question of the zero is separate, and perhaps easier. Anyway I will not
try to summarise here my as yet incomplete collection.
 
Raymond

  - Original Message -
  From: Richard Peevers
  To: [EMAIL PROTECTED]
  Sent: Monday, October 27, 2003 5:07 PM
  Subject: Re: New contribution N2676

  Raymond,

  Apropos 10186 G GREEK ARTABE SIGN

  The identity of one glyph variant of ‘zero’ and one of ‘artabe’ raises an
  interesting problem.

  For the ‘Zero’ there are, it seems to me, two main characters used for
  this: one is identical to the letter omicron and the other is a circle
  (more or less like an omicron) with a more or less elaborate bar over it.
  It’s only the second that we’d be looking to propose.
  It seems to me that here we need two characters: 1) Artabe (horizontal
  line surrounded by 1-3 dots/hollow circles) and 2) Zero (hollow circle
  with more or less elaborate line above).
   
  Richard
   
  Richard Peevers
  Research Associate
  Thesaurus Linguae Graecae
  3450 Berkeley Place
  Irvine CA 92697-5550
   
  www.tlg.uci.edu
  www.digressus.org
   


Re: New contribution N2676

2003-10-25 Thread Raymond Mercier
>Should we continue to encode this as ARTABE SIGN and just note the use of
> this shape for 'zero' in an annotation?
> Should we change it to another name and add the annotation for 'artabe'?
> Should we take any other actions?



Well I don't quite know. My real interest is in the changing shape of the
zero, but I am not yet ready with a proposal.

Besides in the papyri where Kenyon read Artabe this symbol is much of the
time coupled with another, the two written rather cursively together in the
papyri. Kenyon carefully records all the different forms, and after seeing
that I am in some doubt about what exactly should be encoded. I suspect that
the new list is based not on the many many symbols given by Kenyon in his
many volumes of transcribed papyri, but on a summary list that he published
before that.
I wish I could  be more definite.

Raymond


- Original Message - 
From: "Asmus Freytag" <[EMAIL PROTECTED]>
To: "Raymond Mercier" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Saturday, October 25, 2003 8:26 PM
Subject: Re: New contribution N2676


>
> At 05:51 PM 10/25/03 +0100, Raymond Mercier wrote:
> >  Among the new characters in N2676 there is
> >
> >  10186 G GREEK ARTABE SIGN
> >
> >  This is one of the many signs found in papyri, such as those edited by
> >Kenyon. This symbol represents apparently a measure of volume used for
> >grain. It appears as a small circle, smaller than omicron, with a long
> >overline, much longer than a macron.
> >
> >  While I have been looking for the various forms of the symbol for zero I
> >find in other papyri quite exactly the same character used for 'zero'. I make
> >this comparison after studying many photographs of papyri, those provided
> >with Kenyon's editions on the one hand, and on the other, Alexander Jones'
> >recent volume of horoscopes, Astronomical Papyri from Oxyrhynchus.
> >  The attached image is taken from Jones, part of a column of zeroes written
> >this way.
>
> This is fascinating information.
>
> However, I'm unclear what you propose.
>
> Should we continue to encode this as ARTABE SIGN and just note the use of
> this shape for 'zero' in an annotation?
>
> Should we change it to another name and add the annotation for 'artabe'?
>
> Should we take any other actions?
>
> A./




Re: New contribution N2676

2003-10-25 Thread Raymond Mercier
 Among the new characters in N2676 there is

 10186 G GREEK ARTABE SIGN

 This is one of the many signs found in papyri, such as those edited by
Kenyon. This symbol represents apparently a measure of volume used for
grain. It appears as a small circle, smaller than omicron, with a long
overline, much longer than a macron.

 While I have been looking for the various forms of the symbol for zero I
find in other papyri quite exactly the same character used for 'zero'. I make
this comparison after studying many photographs of papyri, those provided
with Kenyon's editions on the one hand, and on the other, Alexander Jones'
recent volume of horoscopes, Astronomical Papyri from Oxyrhynchus.
 The attached image is taken from Jones, part of a column of zeroes written
this way.

 Raymond Mercier

> - Original Message - 
> From: "Michael Everson" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Friday, October 24, 2003 7:36 PM
> Subject: New contribution N2676
>
>
> >
> > A new contribution:
> > N2676
> > Repertoire additions from meeting 44
> > Asmus Freytag
> > 2003-10-23
> > http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2676.pdf
> >
> > -- 
> > Michael Everson * * Everson Typography *  * http://www.evertype.com
>

Re: Web Form: Other Question, Problem, or Feedback

2003-10-24 Thread Raymond Mercier
> > -Original Message-
> > Date/Time:Tue Oct 21 07:54:01 EDT 2003
> > Contact:  [EMAIL PROTECTED]
> > Report Type:  Other Question, Problem, or Feedback
> >
> > Hello Unicode-Team,
> >
> > i'm looking for a tool or a tutorial to convert japanese
> > signs in numeric unicode signs (e.g. 留). Can you help me?
> >
> > Greetings from Germany
> > T. Nikolai
> > please mail to: [EMAIL PROTECTED]

Hello Nikolai,
You might find some features of interest in my program Hanfind, which you
can download from:

http://ourworld.compuserve.com/homepages/RaymondM/

It works however from the Pinyin reading of the characters, seen as Chinese
not Japanese.

If you are working from a text in Word you can always write a macro which
gives directly the Unicode value of any character, based on the calculation
Unicode = Hex(AscW(Selection.text)).
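
Outside Word the same calculation is a one-liner in most languages. As a sketch in Python (function names invented for illustration), the following lists the code point of each character, and also writes the text as decimal numeric character references of the form `&#30041;`, on the assumption that this is the kind of "numeric unicode sign" the question had in mind.

```python
def code_points(text):
    """Code point of each character, in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def to_ncr(text):
    """Replace every non-ASCII character by a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sample = "\u7559"            # the character quoted in the question
print(code_points(sample))
print(to_ncr(sample))
```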

Raymond Mercier




Re: Mac fonts

2003-10-01 Thread Raymond Mercier
Thanks - I have passed on your messages.
Raymond




Re: Mac fonts

2003-10-01 Thread Raymond Mercier
Thanks to all who reassure me that TitusCyberbitBasic and Code2000 as I use
them on my PC can also be used on Mac with OS X. This is really for a
colleague, who has tried without success to install the Titus font that I
passed on to her. She tells me she has OS X, and I will just have to discuss
it further with her.

Raymond Mercier






Mac fonts

2003-10-01 Thread Raymond Mercier
I am looking for Mac versions of the fonts TitusCyberbitBasic and Code2000.
Any suggestions ?
I would like a serif font like Times, with the Latin Extended Additional
block.

Raymond Mercier




Re: TLG and Beta code

2003-08-27 Thread Raymond Mercier



John,
I am glad to hear from you. I shall do what I can to get a proposal
together.
Raymond

- Original Message -
From: "John Hudson" <[EMAIL PROTECTED]>
To: "Raymond Mercier" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, August 27, 2003 7:20 PM
Subject: Re: TLG and Beta code

>
> At 05:37 AM 8/27/2003, Raymond Mercier wrote:
>
> >I know this is common in the TLG, but as you say, they assume it is just
> >omicron (an assumption repeated in a message just received from them).
> >But, I am trying to get across that that is wrong: it represents neither
> >papyri nor Byzantine MSS.
>
> ...
>
> >So is there not a good reason to treat this as a distinct character, to be
> >assigned to a Unicode codepoint ?
>
> Raymond, based on what you have said, I would agree. A variety of visual
> representations, clearly distinct from the omicron as formed in the same
> documents, suggests a separate character. Would you be able to write up a
> proposal to encode such a character, or at least an informational document
> including illustrations of different forms of the Greek zero, preferably in
> proximity to differently formed omicrons? Nothing is going to happen unless
> the UTC receive such a document, and you sound like the best person to
> prepare one.
>
> John Hudson
>
> Tiro Typeworks www.tiro.com
> Vancouver, BC [EMAIL PROTECTED]
>
> You need a good operator to make type. If it were a
> DIY affair the caster would only run for about five
> minutes before the DIYer burned his butt off.
>    - Jim Rimmer
>
- Original Message -
From: "John Hudson" 
<[EMAIL PROTECTED]>To: "Raymond Mercier" 
<[EMAIL PROTECTED]>Cc: <[EMAIL PROTECTED]>Sent: Wednesday, 
August 27, 2003 7:20 PMSubject: Re: TLG and Beta 
code>> At 05:37 AM 8/27/2003, Raymond Mercier 
wrote:>> >I know this is common in the TLG, but as you say, 
they assume it is just> >omicron (an assumption repeated in a message 
just received from them).> >But, I am trying to get across that that 
is wrong: it represents neither> >papyri nor Byzantine 
MSS.>> ...>> >So is there not a good reason to 
treat this as a distinct character, tobe> >assigned to a Unicode 
codepoint ?>> Raymond, based on what you have said, I would agree. 
A variety of visual> representations, clearly distinct from the omicron 
as formed in the same> documents, suggests a separate character. Would 
you be able to write up a> proposal to encode such a character, or at 
least an informational document> including illustrations of different 
forms of the Greek zero, preferablyin> proximity to differently 
formed omicrons? Nothing is going to happenunless> the UTC receive 
such a document, and you sound like the best person to> prepare 
one.>> John Hudson>> Tiro Typeworks www.tiro.com> Vancouver, BC [EMAIL PROTECTED]>> You need a good 
operator to make type. If it were a> DIY affair the caster would only run 
for about five> minutes before the DIYer burned his butt 
off.>    
- Jim Rimmer>


Re: TLG and Beta code

2003-08-27 Thread Raymond Mercier



- Original Message -
From: "Nick Nicholas" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, August 27, 2003 12:33 PM
Subject: Re: TLG and Beta code

>The equivalent glyph the TLG has posted for #130 is omicron,

Re: TLG and Beta code

> In a Greek text, shouldn't you be using omicron and a combining macron
> rather than Latin o with macron? If omicron plus combining macron is an
> adequate representation of the glyph, then maybe there is no need for a
> new character.
>
> --
> Peter Kirk
> [EMAIL PROTECTED] (personal)
> [EMAIL PROTECTED] (work)
> http://www.qaya.org/
>

Well, it is just simpler to use the Latin, since the combination is a single
codepoint. The real point is that it would be nice to have an appropriate
Greek form. The TLG assumption is that the Greek texts used omicron for
zero, but that is not what you find in the MSS. Against that assumption, I
have just written to the TLG as follows:

I know that you will find support in Heath, whose Greek Mathematics, Vol.1,
p. 45, is surprisingly misleading in just saying that they used omicron.
Also in his ed. of Ptolemy's Hypotheses Heiberg has rather perversely put a
macron on all the letters except omicron ! (Opera Minora 78.29, for
example).

This does not adequately represent the Byzantine MSS. I don't have Heiberg's
Syntaxis in front of me, but Halma's edition of the Syntaxis is closer to
the MSS, and uses o+macron. Elsewhere in the MSS one finds a variety of
forms, according to the age etc. In the ninth century MSS zero is
represented by a rather small o with a long overline with serifs at either
end, much bigger than our macron. In late Byzantine mathematics one finds
sometimes a form like the Cyrillic che (U+0447). Certainly the form varies
a good deal, but I have not seen a simple omicron, whatever the editors may
have put.

In the texts edited in Georges Gémiste Pléthon (by Anne Tihon and myself),
which I see you include in the TLG, we use a macron on the o, and are doing
the same in our edition of Ptolemy's Handy Tables. If we had something
closer to the forms used in ninth century MSS we would use it.
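
The difference between the two encodings discussed at the top of this message
(Latin o with macron as a single codepoint, against omicron followed by a
combining macron) can be checked with Python's standard unicodedata module.
This is a small sketch for illustration only; the variable and function names
are invented here.

```python
import unicodedata

latin_o_macron = "\u014D"        # LATIN SMALL LETTER O WITH MACRON
omicron_macron = "\u03BF\u0304"  # GREEK SMALL LETTER OMICRON + COMBINING MACRON


def describe(text):
    """List the code point and character name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# NFC composes a sequence wherever a precomposed character exists.  There is
# none for omicron with macron, so the Greek sequence stays two code points.
composed = unicodedata.normalize("NFC", omicron_macron)

# NFD splits the Latin letter into plain "o" plus the same combining macron.
decomposed = unicodedata.normalize("NFD", latin_o_macron)

if __name__ == "__main__":
    print(describe(latin_o_macron))
    print(describe(composed))
    print(describe(decomposed))
```

So the Latin form is one codepoint only because a precomposed letter happens
to exist; the Greek form remains a base letter plus a combining mark under
either normalization.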

Raymond




TLG and Beta code




David,
I am glad to see this much progress, yet, as I noticed after posting, the
zero symbol is actually missing in beta code, so your Beta code - Unicode
equivalences would not have it. I think it is fair to say that the TLG have
avoided the parts of mathematical texts where the symbol is common, as in
the various tables in Ptolemy's Almagest (where all the tables are omitted
by TLG). This symbol is in reality more common than the rarities listed in
quickbeta. In the editions I am involved with we use U+014D, ō, which is
near enough I suppose.
Raymond

- Original Message -
From: "David J. Perry" <[EMAIL PROTECTED]>
To: "'Raymond Mercier'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, August 27, 2003 1:11 AM
Subject: RE: TLG and Beta code

> Raymond,
>
> If you go to http://www.tlg.uci.edu/Uni.prop.html you will see all the
> proposals; the site indicates very clearly which ones have been accepted
> by the UTC and which are pending (only one still pending at this point).
> They must of course be voted on by WG2 before they are officially a part
> of Unicode.  The TLG folks have prepared a very useful document at
> http://www.tlg.uci.edu/quickbeta.pdf that shows the Unicode equivalent
> for each beta code character (some of these are existing Unicode
> characters, some newly proposed, and some so rare or poorly understood
> that TLG did not think them appropriate to propose for Unicode).
>
> David
>


TLG and Beta code




Last January when I asked if the Greek symbol for one-half might be included
somewhere in Unicode I was led to understand that not only that but a whole
range of Greek symbols were being proposed by the TLG people. There was for
example http://www.tlg.uci.edu/Uni.prop.html. Indeed the Beta code
(http://www.tlg.uci.edu/~tlg/BetaCode.html) used by the TLG covers a huge
range of odd symbols which are needed in Unicode if the classical texts
which they have digitised are ever to be "unicoded".

I was reminded of the need to enlarge the Greek coverage when converting
some Greek numerical texts, and saw that not even the symbol for zero was
part of the Greek block, so that I had to use U+014D, Latin l.c. 'o' +
macron, ō, which is admittedly near enough.

Yet when I search the Unicode site now for TLG or beta code I find nothing.
Are the TLG proposals somewhere in the pipeline ?

Raymond Mercier



Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

Ted Hopp writes
>
> Since we're speaking of the French (we are, aren't we?) what ever happened
> to French Revolutionary Metric Time?

The other French attempts were less successful, such as the 12 30-day
months. The French names for the months Vendémiaire, etc., were parodied in
an English version: wheezy, sneezy, freezy, slippy, etc.

One decimal system that survives is the grad (400 grad = 360 degrees), still
used at least by surveyors, but Laplace used it in astronomical
calculations.

The Americans won't have the meter now, unless it's renamed the
freedom-yard, I suppose.


Raymond




Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

At some time in the 70's when I was at a conference to mark the centenary of
the Greenwich meridian I learned that the French agreed to give up the Paris
meridian if the British agreed to go metric, and that was over a century
ago !
Maybe the U.S. could be bribed to go metric if they were allowed to have
Washington as the standard meridian.

Raymond Mercier




Re: Which ancestral links

John Clews writes:

>I've never seen a description of the Sogdian
> alphabet (i.e. I have never come across one): is there a good article
> or URL which illustrates such links?

Here is a Unicode proposal for just that:

http://wwwold.dkuug.dk/jtc1/sc2/wg2/docs/n2422.pdf

See also
http://www.gengo.l.u-tokyo.ac.jp/~hkum/pdf/SIE3.pdf

Raymond Mercier



Aramaic scripts





There are omissions in Michael Everson's chart in
http://www.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf

The chart was based on Semitic languages, although purporting to be about
scripts. After all Greek and Latin also derive from the same family of
scripts, as we all learn from page 1 of Greek grammars.

There are less obvious omissions:

1. Kharoshthi, a RtoL script much used in North West India, and regarded by
everyone as a derivative from a form of the Aramaic script used in that
region. It is found on coins, Ashokan edicts, various inscriptions and
manuscripts. It was used to write mainly prakrits, although some sanskrit
text is known. See, for example, A.H. Dani, Indian Palaeography, Oxford 1963.

2. Pahlavi, widely used to write Middle Persian. This involved a troublesome
mixture of Persian reading of Aramaic words, a subject requiring more
elaboration than is needed here.

Raymond Mercier



Re: Which ancestral links

Indeed, pardon my haste, that was a matter of an addition to the Syriac
script. For a comparison of the various scripts used for Sogdian, see

http://iranianlanguages.com/midiranian/sogdian.htm#Alphabet

Raymond


- Original Message -
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, August 08, 2003 5:43 PM
Subject: Re: Which ancestral links


>
> At 17:26 +0100 2003-08-08, Raymond Mercier wrote:
> >John Clews writes:
> >
> >>  I've never seen a description of the Sogdian
> >>  alphabet (i.e. I have never come across one): is there a good article
> >>  or URL which illustrates such links?
> >
> >Here is a Unicode proposal for just that:
> >
> >http://wwwold.dkuug.dk/jtc1/sc2/wg2/docs/n2422.pdf
>
> That is not the Sogdian script.
> --
> Michael Everson * * Everson Typography *  * http://www.evertype.com




Re: UTF-8 and HTML import into MS Word 2000

Both the html files open in Word2002 without problem, Polish & Japanese
characters included.
Raymond Mercier

- Original Message -
From: "Janusz S. Bieñ" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, July 29, 2003 9:56 AM
Subject: UTF-8 and HTML import into MS Word 2000


>
>
> I try to convert a LaTeX document into Word through UTF-8 coded HTML.
>
> When I import a small test
>
> http://www.mimuw.edu.pl/~jsbien/poufne/utf8-pjk.html
> http://www.mimuw.edu.pl/~jsbien/poufne/utf8-pjk.css
>
> into Word, I see it correctly. To be precise, sufficiently correctly
> (a single Polish letter is displayed in a strange way) as all Japanese
> characters are displayed properly.
>
> When I import the real document
>
> http://www.mimuw.edu.pl/~jsbien/poufne/JSB-EAJS03.html
> http://www.mimuw.edu.pl/~jsbien/poufne/JSB-EAJS03.css
>
> exactly in the same way, I see empty boxes instead of Japanese
> characters.
>
> Am I making some silly error? I will appreciate comments and
> suggestions.
>
> Best regards
>
> Janusz
>
> --
>  ,
> dr hab. Janusz S. Bien, prof. UW
> Prof. Janusz S. Bien, Warsaw Uniwersity
> [EMAIL PROTECTED], [EMAIL PROTECTED]
> http://www.orient.uw.edu.pl/~jsbien/
> http://www.mimuw.edu.pl/~jsbien/
>




Greek polytonique kybd for AZERT

People using an AZERTY keyboard (French or Belgian) might find it useful to
know of the excellent keyboard layout for polytonique unicode greek
available at

http://club.euronet.be/frederique.bouras/kbdhept.htm

Raymond Mercier




RE: International Font to be Used

In Babelmap the choice of unicode block is restricted to just one block at 
a time, whereas in Unicode Search you can select any number of blocks.
For example, if you want to handle Classical Greek, you need to know which 
fonts cover both the blocks Greek and Greek Extended. Such combinations are 
not available in Babelmap.

Raymond

And one free tool that will do both is BabelMap (just press F7) :

uk.geocities.com/BabelStone1357/Software/BabelMap.html

And for those without access to such a tool, a comparative table of 
Unicode 4.0
block coverage for some of the more common "pan-Unicode" fonts such as Arial
Unicode MS and Code2000 is given at :

uk.geocities.com/BabelStone1357/Unicode/fonts.html#FontsByRange

Andrew




RE: International Font to be Used

At 12:16 09/06/2003 -0400, you wrote:
One (free) tool that will allow you to investigate what blocks of Unicode
are actually covered in a font file is:
   http://pfaedit.sourceforge.net/
And to see what fonts on your disk support specified unicode blocks, 
another free tool at
http://ourworld.compuserve.com/homepages/RaymondM/unisearch.htm

Raymond Mercier




Re: Fw: Unicode filename problems

Much obliged !

Raymond

At 15:26 02/06/2003 +0200, you wrote:
Raymond Mercier wrote:

> Doesn't one have to know the binary format of a Zip file to be sure of 
that ? I suppose that is proprietary, and in any case, I don't have it.

http://www.pkware.com/products/enterprise/white_papers/appnote.html

Stefan





Re: Fw: Unicode filename problems

At 00:11 01/06/2003 +0200, you wrote:
but certainly not for the file index stored in a ZIP file where there's no 
reason why it should not contain correctly encoded and portable UTF-8 names
Doesn't one have to know the binary format of a Zip file to be sure of that 
? I suppose that is proprietary, and in any case, I don't have it.
-Raymond






Re: Fw: Unicode filename problems

Well, you would expect that, since Win9* and WinNT/2000/XP differ 
fundamentally regarding unicode compliance.
-Raymond

At 21:01 31/05/2003 +0200, you wrote:
Am Samstag, 31. Mai 2003 um 13:18 schrieb Raymond Mercier:

RM> Certainly more work is needed on RAR (at least on the Win 2000 version).

The author of WinRar wrote me in a private mail that he hopes to fix
the problem in a future version. The problem seems to be that he then
has to maintain two versions for Win9x/ME and WinNT/2000/XP.
-- Karl




Re: [OT] Unicode filename problems

Before I am accused of being altogether OT I will try to answer a couple of 
points.

>>The name of the ZIP archive is not relevant: you don't really need to 
internationalize it, and can restrict it to ASCII with a classic .zip or 
.jar extension.

Someone using the Chinese or Russian Win2000 would naturally name files 
with Chinese or Russian characters. He hasn't chosen to "internationalize". 
He just has a right to function within that part of the Unicode universe of 
filenames that he happens to occupy.

>>Zip files should have no problems to contain files with UTF-8 names.
For example a Cyrillic filename U+444.doc, becomes in 
UTF-8  ¿Ñ„.doc.  Neither is accepted by WInzip.

In writing a program like Winzip there is no barrier to the use of
file-opening routines with wide-byte characters. All the API routines are
defined so as to accept that. It is just that programmers have been rather
lazy about it.
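
As an illustration of what a Unicode-aware zipper can do, here is a short
sketch using the zipfile module from Python's standard library (not Winzip
or RAR). It writes an archive in memory with one Cyrillic member name and
reads it back; the sample name is chosen only for the example.

```python
import io
import zipfile

name = "\u0444.doc"  # Cyrillic small letter ef, then ".doc"

# Write an archive in memory with one member whose name is not ASCII.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(name, b"test")

# Read it back and look at the stored name and the general-purpose flags.
with zipfile.ZipFile(buffer) as archive:
    info = archive.infolist()[0]
    stored_name = info.filename
    utf8_flag = bool(info.flag_bits & 0x800)  # bit 11: the name is UTF-8

if __name__ == "__main__":
    print(stored_name == name, utf8_flag)
```

The module stores a non-ASCII name as UTF-8 and marks the entry with a flag
bit, so a reader that honours the flag recovers the original name.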

Raymond




Re: Fw: Unicode filename problems

This question of non-Ascii filenames is a real problem : hardly any 
software out there can cope with this.
I did not know of RAR, but have given it a try. Even here there is a 
serious problem, because if the filename is non-Ascii the name of the 
compressed file comes out as _.rar, with as many underlines as there 
were characters in the original name. In fact it is a bit less predictable 
: if the name is Greek, for example, you get Latin letters, if it is 
Cyrillic, just the underline.
This is useless then if you have a number of filenames all with the same 
number of characters.
Certainly more work is needed on RAR (at least on the Win 2000 version).

I know about that, since I made my Fontlist 5 work properly with arbitrary 
non-ascii names : 
http://ourworld.compuserve.com/homepages/RaymondM/fontlist5.htm .

Raymond Mercier

At 22:58 30/05/2003 -0500, you wrote:
I wonder if anyone here has ideas on these matters.

Peter

- Forwarded by Peter Constable/IntlAdmin/WCT on 05/30/2003 10:56 PM
-
I have 3 LinguaLinks lexicons that I have converted into HTML pages - one
for each entry. The languages use non-ANSI characters, so I also did a
Unicode conversion at the same time.
[snip]

Everything works very well except that I cannot burn the files onto a CD
because of the unicode values in the filenames. Roxio and Nero CD-burners
don't accept some of the higher values found in the file names (using
Joliet, ISO 9660 and UDF). Anyone have any ideas how to deal with this?
For example, a filename with unicode value 026B, a tilde lower case L,
causes problems.
In the meantime, to get it onto CD, I decided to try and zip all the
files. Turns out almost all the zippers out there DO NOT support Unicode
filenames. Doug Rintoul found WinRAR
(http://www.rarlab.com/rar_archiver.htm) which does the trick in the RAR
format only. There is a RAR expander for Macintosh and Linux systems as
well (all of these are $29 USD). So far, have not found a freeware
solution that meets unicode filename needs. Have any of you run into this
yet?
I could try to determine what Unicode values are causing problems on the
CD burner and do an unacceptable-to-acceptable character translation in
the filenames and the links to those filenames ... but that seems like a
huge compromise. Also, it will be difficult to come up with a generic
solution ... that is to say, I don't know what RANGE of values are
unacceptable for characters in a CD filename. Joliet is supposed to allow
Unicode filenames according to the documentation I have seen.
Larry




Re: unicode in Mac


Tom Gewecke writes

PS The FEFF could well be the BOM (Byte Order Mark)  which  NotePad puts at
the beginning of UTF-8 encoded files (even though it is not needed or
customary for other apps to do so).  It does not have any significance.


The opening bytes are FF FE (or FEFF read as a short integer), imposed 
when the file is saved in Word as plain text unicode.
If these two bytes are deleted the text still opens correctly in Notepad.
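
The relation between the bytes FF FE and the value FEFF can be shown with a
short Python sketch. It assumes that the file is UTF-16 with the low byte
first, which is what the opening bytes FF FE indicate; the sample text is
invented for the example.

```python
import codecs

text = "\u03b1\u03b2\u03b3"  # Greek alpha, beta, gamma

# Assumed file layout: byte order mark, then UTF-16 with the low byte first.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

first_two = data[:2]                            # the bytes FF FE
as_short = int.from_bytes(first_two, "little")  # read as a 16-bit integer

# The generic "utf-16" codec reads the mark, picks the byte order, drops it.
decoded = data.decode("utf-16")

# Without the mark the byte order has to be stated explicitly.
decoded_without_mark = data[2:].decode("utf-16-le")

if __name__ == "__main__":
    print(first_two.hex(), hex(as_short), decoded == decoded_without_mark)
```

The mark carries no text of its own; it only tells the reader which byte
order to use, which is why a program that already assumes that order can
still open the file once the two bytes are deleted.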

The MAC OS is OS 9; my colleague has been put off by an attempt to install 
OS X, although the CD for it came with his new machine.
I am surprised that he should expect such difficulties with OS 9.

Raymond




unicode in Mac

I have a plain text unicode file, with the opening bytes FEFF, which 
displays correctly in Notepad on a PC.
What facility is available on a Mac to make this file display correctly ?
I am trying to help a colleague, who has MAC OS IX, and I need to tell him 
what font will cover Greek and Extended Greek.

Raymond Mercier




Re: Greek fractions

Just a final note on the 'half' symbol. I was wrong to criticise TLG in 
stating that this symbol is omitted in their reproduction of the texts. 
They represent the text in Beta-code, http://www.tlg.uci.edu/BetaCode.html 
and this certainly has the 'half' in a variety of forms.
The problem is rather: when are Unicode going to include the great many 
symbols covered in Betacode so that TLG files can be converted to Unicode 
?  I understand that they hope to have this conversion in about two years.

Raymond Mercier




Re: Greek fractions

At 11:59 AM 1/22/2003 -0800, you wrote:

Does this affect Euclid at all?

Also, do you know of any source for Euclid in Greek other than the full 
TLG or
Perseus CD-ROMS? I have read a fair chunk of the Elements online, but  would
like a print copy that I can write on, or read outside with Heath and Strong
handy. Much as I admire your work, I have no need for all of the other
authors in either collection

I would be surprised if fractions occurred anywhere in Euclid, but someone 
may correct me.
I don't know of any on line version other than TLG.

Raymond




Re: Greek fractions



At 11:47 PM 1/21/2003 -0800, Doug Ewell wrote

Thanks for the link, John.  Indeed, the TLG proposal for numerals [1]
does include a GREEK HALF SIGN, although their preferred glyph does not
include the "prime" sign Raymond mentioned (it is listed as a glyph
variant, however).


I am glad to learn of the proposal, and realize that I should have checked 
the Unicode site first.
There are indeed many variants for this sign, with none, one, or two 
primes. Sometimes the sign is as I described, or at other times as a sort 
of scrunched up stigma, and so on. What troubles me a little about the 
proposal is that it may depend more on the way editors have handled it 
rather than on what is used in the manuscripts. For example, Heiberg's 
edition of Heron is quoted in the proposal.

Raymond




Greek fractions

In Classical Greek scientific texts the fraction 'one half' is represented 
very commonly by a symbol which looks a bit like 'less than', or like 
'angle' U+2220, but followed by a prime. Is there no place for this in the 
Unicode scheme of things ?
Other symbols are also found for common fractions, apart from the general 
usage where a prime is added to indicate the reciprocal.
I have been converting some TLG* files to Unicode, and I notice that even 
in the original TLG file the symbol is just replaced by a space. This makes 
a nonsense of Ptolemy's geographical coordinates.

*TLG = Thesaurus Linguae Graecae

Raymond Mercier




Re: Status of Unihan Mandarin readings?



At 08:44 AM 12/20/2002 -0700, you wrote:

That's because the file was converted to UTF-8.  Previously it had not 
been in any single encoding, which was creating problems

Well, OK, but should you not have created by now some sort of program that 
checks the file whenever you make a change - a sort of spellcheck ? It should 
not be too hard to write something that displays the effects of any changes.
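
A check of that kind might look like the following Python sketch. It rests
on assumptions that are mine, not taken from the Unihan documentation: that
each entry is a line of three tab-separated fields (code point, field name,
value) and that a Mandarin reading is a run of capital letters, with Ü
allowed, followed by a tone digit from 1 to 5. The function name
suspicious_lines is invented.

```python
import re

# Assumed shape of one reading: capital letters (Ü allowed) plus a tone digit.
READING = re.compile(r"[A-ZÜ]+[1-5]")


def suspicious_lines(lines):
    """Yield (line number, line) for kMandarin entries that look damaged."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kMandarin":
            continue  # not a kMandarin entry; ignored in this sketch
        readings = fields[2].split()
        if not readings or not all(READING.fullmatch(r) for r in readings):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    sample = [
        "U+6C49\tkMandarin\tYI4 HAN4",
        "U+6C49\tkMandarin\tL\u00d34",  # made-up damaged line, not real data
    ]
    for number, line in suspicious_lines(sample):
        print(number, line)
```

Such a check only catches readings with unexpected characters; it cannot tell
whether a well-formed reading such as YI4 belongs to the character.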

Raymond Mercier





Re: Status of Unihan Mandarin readings?


On the errors in kMandarin:

Apart from the kMandarin errors of the kind that Andrew West has noted, 
there is another corruption, namely the loss of ü, and this happened 
between "3.0b1" and "3.0b2", when the ü became the two bytes C393.

As to Han/Yi, "U+6C49 YI4 HAN4" is found not only in "3.0b1" and "3.0b2", 
but also in "2.0". The HAN4 was dropped only in 3.2.
While I admire the effort to "explain" the intrusion of YI4, I feel it is a 
bit misplaced, and that some more mechanical/clerical explanation is in 
order. After all, look at the number of times "same as U+ " is written 
as "sama as U+... " in 3.2: 6 to be precise.


Raymond Mercier











Re: CJK fonts

At 09:04 AM 12/11/2002 -0700, you wrote:


On Wednesday, December 11, 2002, at 08:27 AM, Raymond Mercier wrote:


For example, the simplified form of the character Han itself (U+6C49) is 
given the Pinyin reading Yi, while the traditional form U+6F22 is given the 
correct reading Han.

Have you reported this?



Not yet, since I have only just noticed it. I know there is an address on 
the Unicode site for such reporting.
Andrew West in a recent message mentioned a number of serious mis-readings. 
These are the tip of the iceberg. The file needs a total overhaul.


BTW, there's the official Unihan lookup Web page at 
<http://www.unicode.org/charts/unihan.html>.

==
John H. Jenkins
[EMAIL PROTECTED]
[EMAIL PROTECTED]
http://www.tejat.net/






Re: CJK fonts

The display in Hanfind uses the html browser embedded in the program. Any 
unicode reference in an html page can be written as an entity, such as 
&#15465; for U+3C69 (㱩). This displays without any problem, as long as you 
have the font.
Or have I missed your point ?
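
That such an entity stands for exactly one codepoint can be checked with the
html module in Python's standard library. The helper name reference_for is
invented for this sketch, and the code is independent of Hanfind.

```python
import html


def reference_for(code_point, hexadecimal=False):
    """Build the numeric character reference for one code point."""
    if hexadecimal:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"


decimal_form = reference_for(0x3C69)
hex_form = reference_for(0x3C69, hexadecimal=True)

# html.unescape turns either form back into the single character U+3C69.
decoded = html.unescape(decimal_form)

if __name__ == "__main__":
    print(decimal_form, hex_form, f"U+{ord(decoded):04X}")
```

The reference itself is plain ASCII, so a page written this way does not
depend on the encoding in which the file is saved.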
Raymond


At 10:08 AM 12/11/2002 -0800, you wrote:
> http://ourworld.compuserve.com/homepages/RaymondM.

I clicked on "Hanfind". Something is wrong with that page. It's HTML
encoded directly in Unicode, which as far as I know is invalid HTML.

Rick





