from:"Peter_Constable"

Re: Font creation

2003-09-28 Thread Peter_Constable

"Ananda" <[EMAIL PROTECTED]> wrote on 09/28/2003 03:26:39 AM:

> Is Adobe tools for Font Creation can be downloaded?

Yes. Look at the URL I provided:


> - Original message follows -

> You can find links to these and other font tools at 
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=fonttoollinks&highlight=
> 
> 
> 
> Peter

Re: font creation software for Unicode Hebrew proposal ?

2003-09-27 Thread Peter_Constable

> Can anyone let me know which one is the best tool to create Indic opentype 
> fonts for Windows.

You can use any of a variety of tools for creating glyphs, but I don't 
recommend Fontographer, as it has various bugs and creates fonts that have 
problems. As you create the glyphs, you should also use Microsoft's Font 
Validator tool to verify that there are no errors in the font. 

For adding OpenType layout tables, you can use either the Adobe Font 
Development Kit, or Microsoft's VOLT. There are also Perl-based tools, if 
you're a Perl fan. For a visual interface, VOLT is the only option.

Again, after getting OpenType tables added, you should run the Font 
Validator -- highly recommended. Adobe also has some interesting testing 
tools, especially if you create fonts with CFF ("Type 1") outlines.


You can find links to these and other font tools at 
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=fonttoollinks&highlight=



Peter

Re: font creation software for Unicode Hebrew proposal ?

2003-09-27 Thread Peter_Constable

Elaine:

> I am looking for inexpensive glyph creation 
> software to produce a Unicode Hebrew proposal.
> The Hebrew Unicode list recommended several
> possibilities:  Graphite, PfaEdit, 
> VOLT, and TypeTool...

Neither Graphite nor VOLT is for creating glyphs. If all you need to do is 
create glyphs, then both PfaEdit and TypeTool will do that.


Peter

Re: Internal Representation of Unicode

2003-09-26 Thread Peter_Constable

James Kass wrote on 09/26/2003 12:03:42 AM:

> Peter Constable (IIRC) reported on this list a while ago that there was
> a Latin-based writing system used for an indigenous South American
> language which stacks up to three marks above.

Good memory, James! The language is Ticuna.


Peter

Re: Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-26 Thread Peter_Constable

James Kass wrote on 09/26/2003 01:46:43 AM:

> But, this simply isn't the case with Doug Ewell's web pages.  Doug's
> pages are properly encoded using the world's standard for text
> encoding and properly tagged.  The server isn't performing any
> conversion, it's just adulterating the content of the web pages
> by adding an incorrect protocol resulting in the display of 
> mojibake.

Doug's server may be doing the wrong thing, but that isn't a 
counterargument to the general principle of whether the browser should 
believe what the server says or what the document says about the encoding. 
That was the question to which I and, I think, Jon were responding.


Peter

Re: Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-26 Thread Peter_Constable

Peter Kirk wrote on 09/26/2003 02:21:59 AM:

> >Unlike James's cup of wine, this really is a good analogy. Suppose the 
> >document is stored on the server in ISO 8859-1 and the browser requesting 
> >the page understands only EBCDIC. The server must convert it -- if it 
> >doesn't, it will appear on the client as complete garbage. As Jon 
> >mentioned, the server is the last one to touch it, and this illustrates 
> >why it is appropriate for the server to touch it.

> Is server software actually obliged to perform such conversions on 
> request? Surely, rather, browsers should be expected to support a 
> certain minimum set of encodings...

Folks, feel free to spend your time bantering on about whether something 
should or shouldn't do this or that. But while you're at it, if you want 
to know whether the HTTP encoding declaration is supposed to have 
precedence over the encoding declaration inside the HTML doc, go read the 
specs to find the definitive answer.



Peter

Re: Fun with proof by analogy, was Re: Mojibake on my Web pages

2003-09-26 Thread Peter_Constable

> > The last agent handling the document would be the mail carrier.
> > Does the mail carrier have the right to open the mailing and
> > replace your document with garbage?
> 
> No, however if I receive a letter in the post written in German I'm 
> going to ask someone to translate it rather than try to cope with a 
> language (c.f. encoding) I don't understand.

Unlike James's cup of wine, this really is a good analogy. Suppose the 
document is stored on the server in ISO 8859-1 and the browser requesting 
the page understands only EBCDIC. The server must convert it -- if it 
doesn't, it will appear on the client as complete garbage. As Jon 
mentioned, the server is the last one to touch it, and this illustrates 
why it is appropriate for the server to touch it.
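
The conversion scenario above can be sketched in a few lines of Python. This is only an illustration: the sample text is made up, and "cp500" is an assumed choice of EBCDIC code page, since the message does not name one.

```python
# Sketch: a server transcoding a stored ISO 8859-1 document for a client
# that understands only EBCDIC. The code page "cp500" and the sample
# text are assumptions made for this illustration.
text = "Grüße aus Zürich"

# Bytes as stored on the server.
stored = text.encode("iso-8859-1")

# What the server sends if it converts before serving.
converted = stored.decode("iso-8859-1").encode("cp500")

# What an EBCDIC-only client sees in each case.
seen_converted = converted.decode("cp500")
seen_unconverted = stored.decode("cp500", errors="replace")

print(seen_converted)          # readable text
print(repr(seen_unconverted))  # garbage: Latin-1 bytes read as EBCDIC
```

If the server passes the stored bytes through untouched, the client decodes them with the wrong table and gets unreadable output, which is the case for letting the server touch the document.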


Peter

Re: Character codes for Egyptian transliteration

2003-09-02 Thread Peter_Constable

Peter Kirk on 08/21/2003 09:33:27 AM:

> As for the requirement for distinct upper and lower case variants of 
> ayin, I understood that there was a similar requirement in some minor 
> Cyrillic languages, at least for apostrophe and double apostrophe. 
> Earlier this year Peter Constable was gathering information for a 
> possible proposal. But I never heard if it was proceeded with.

I was given charts reporting these things being used for various 
languages, but don't think I ever got an explanation of what the purpose 
for them was, and I didn't get any confirmation of actual use let alone 
samples from actual publications. If you can provide samples, that would 
be great.


Peter Constable

Re: Last Resort Font

2003-09-02 Thread Peter_Constable

Michael Everson wrote on 08/19/2003 02:52:55 PM:

> >p. 63 (Syloti Nagri): both top and bottom read "SILOTI NAGRI".

> I will look into all of that, and thank you for it; but note that of 
> those only Thaana can be expected to display, as none of the others 
> have been encoded. So none of those could EVER be displayed; they are 
> just extra glyphs in the current font.

Syloti Nagri has been approved by UTC and assigned to A800..A82F, though 
this is yet to be ratified by WG2 (presumably will happen in October) and 
published in a new version of Unicode (will be 4.1) or an amendment to ISO 
10646 (I don't know what timetable is in place for publishing further 
amendments).



Peter Constable

[OT]Re: Breaking free from UNICODE

2003-09-02 Thread Peter_Constable

Michael Everson wrote on 08/19/2003 03:14:47 PM:

> Golly, I was able to distinguish Latin and Georgian and Cyrillic on a 
> Mac SE 30 in 1985. Or was it 1987.(Long before Worldscript I admit.) 
> And years before that there was the Osborne with its dot-matrix 
> miracles.

IIRC, the Mac SE did not exist in 1985; I was using a relatively new Fat 
Mac in the summer of that year. 

BTW, the Osborne didn't particularly have dot-matrix miracles. That was 
the domain of printers like the Toshiba P321 and various Epson LQ models, 
and such printers could be connected to CP/M machines like the Osbornes 
and Kaypros, DOS machines like the IBM PC and Sharp PC 5000, and the Macs. 
But in 1985 I think the only dot matrix printers were the 9-pin variety, 
which weren't all that conducive to readable Latin with diacritics, let 
alone Chinese or Arabic typesetting. The P321 was one of the first 24-pin 
models, and I think it came out in 1987 or maybe late 1986.


Peter Constable

Re: [hebrew] Re: Roadmap---Mandaic, Early Aramaic, Samaritan

2003-08-14 Thread Peter_Constable


Peter Kirk wrote on 08/11/2003 01:50:17 PM:

> But suppose someone like Elaine or myself wants to offer their expertise
> on a specific script or script family as an ongoing commitment. Is there
> a way they can do so without having to receive dozens of messages every
> day on the Unicode list, the vast majority of which have nothing to do
> with their specific area of expertise?
>
> Maybe the script specific mailing lists are the answer. But is their
> existence an ongoing commitment, or just a temporary reaction to a
> temporary burst of activity?

Not an official answer, but accurate, I think: script-specific ad hoc lists
can be created as a need arises, and will continue to exist as long as a
need exists. A Hebrew-specific list did not exist until recently as there
had not been enough Hebrew-specific discussion to warrant a list until
recently.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

RE: Assume everything on this list is ignored

2003-08-14 Thread Peter_Constable


Jill:

> Isn't the very notion of "submit[ting] a FAQ question" a contradiction in
> terms?

No; *somebody* has to provide the content that gets added to an FAQ page.


> Forum FAQs are generally put together by long-term
> members of forums who are sick of having to answer the same question over
> and over again to all these damn newbies, or by other long-term members who
> simply wish to cut down the traffic on the list.

True enough. It might help to be aware of a few things, though:

- almost all of the work done in development, maintenance and promotion of
the Standard is done on a voluntary basis

- I think it is generally true of long-term members that their workload in
relation to the Standard has been increasing, not decreasing

- in the past month, during which you arrived, there have been *very*
long-winded discussions on a small set of topics, with a significant number
of posts from people that tend to have *very* lengthy messages (either
because they tend to make lengthy comments or because they tend to quote
messages in their entirety), and with nothing particularly conducive to
addition to an FAQ page

In this recent situation, but even in general, I don't think we should
expect to find long-term members particularly attentive to discussions with
a view to what they could be adding to an FAQ page: they only have so much
bandwidth to offer. If there is something that you think would be a useful
addition to an FAQ, it won't necessarily be a high priority for Mark Davis
or some other long-term member to write it up and add it; and at present,
there's a fair chance that it hasn't even occurred to them that an addition
to an FAQ page might be useful as they're probably exhausted just from the
sheer volume of discussion that's been going on. Certainly, I'm exhausted,
and recent discussion has been on topics in which I have a real interest.

In general, I think you'll find that your questions will get answered on
this list. Every now and then after discussing some question, someone will
suggest it be added to the FAQ page, and often -- but not necessarily
always -- someone will prepare the content and submit it for addition. But
it's done voluntarily, and someone has to take the initiative to do it and
submit it.


> Now, if it is true, as Mark Davis suggests, that the Frequently Asked
> Questions list at "http://www.unicode.org/faq/" is unrelated to this list,

Mark did not say or even suggest it was unrelated.



- Peter



Re: [hebrew] Re: Roadmap---Mandaic, Early Aramaic, Samaritan

2003-08-14 Thread Peter_Constable


[EMAIL PROTECTED] wrote on 08/12/2003 01:07:46 PM:

> > It does not happen if an expert sticks her/his
> > head in the door momentarily, makes a few comments and then disappears.
>
> Well, yeah, I am an irresponsible fly-by-night.
>
> After all, so far I only spent 4 years and 3 months
> on the Hebrew character set...

> But I can see why you think my attitude stinks,
> I've accomplished so little in these years.


Your remark on which I was commenting was:


During this short period, when I am on this list and receiving
the enormous volume of mail, I am stating things that I will not be
around much to state.


I wasn't suggesting that you haven't been working on relevant issues, or
not doing useful work. I was commenting on the fact that you appear, by
your remarks, to be making a temporary appearance on this list. Development
of the Unicode Standard is a community affair, and if you want to have a
significant impact on the development of the Standard you need to be an
active and on-going member of the community, at least for the duration
needed to make the changes you care about, not just someone that pops into
town for brief appearances.




- Peter



Re: [hebrew] Re: Roadmap---Mandaic, Early Aramaic, Samaritan

2003-08-11 Thread Peter_Constable


Elaine Keown wrote on 08/10/2003 06:30:44 PM:

> During this short period, when I am on this list and receiving
> the enormous volume of mail, I am stating things that I will not be
> around much to state.

Therein lies a problem in attitude: encoding of scripts, and doing it well,
takes an on-going commitment and presence from people who have expertise on
the scripts in question. It does not happen if an expert sticks her/his
head in the door momentarily, makes a few comments and then disappears.


> You've never had a Semitic script expert, that's the problem.

If that's the case, see above.



- Peter



RE: Questions on ZWNBS - for line initial holam plus alef

2003-08-09 Thread Peter_Constable


Ken Whistler wrote on 08/06/2003 03:19:34 PM:

> > Again, why should not  be canonically
> > equivalent to , when  > dot below> is canonically equivalent to ?
> > And I want a design answer, not a formal answer! (The latter I already
> > know, and is uninteresting.)
>
> The formal answer is the true and interesting answer!
>
> It shouldn't be canonically equivalent because it *isn't*
> canonically equivalent.
>
> But instead of obsessing about the particular case of the CGJ,
> admit that the same shenanigans can apply to any number of
> default ignorable characters which will not result in visually
> distinct renderings under normal assumptions about rendering.

What I think is different here, Ken, is that a suggestion has been made
that CGJ be recommended for use within a combining sequence in order to
maintain a distinction for Biblical Hebrew, which it does by virtue of its
property of blocking canonical reordering. No other default ignorable has
ever been specifically given this function. In introducing this function
for a particular character (CGJ, in this case), the issue really arises for
the first time. And I don't think it's insignificant: surely there will be
implementers out there wondering what the implications are with a
canonical-reordering blocker that can be inserted into sequences creating a
distinction where none previously existed -- and where none was ever
desired. (I think I mentioned this issue shortly after the CGJ suggestion
was first raised.)
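
The blocking behaviour under discussion can be checked with Python's standard unicodedata module. This is a minimal sketch, assuming the lamed + patah + hiriq sequence from the Yerushala(y)im discussion as the test case; the helper name show is made up for the example.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def show(s):
    """Render a string as a list of code points (illustrative helper)."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


typed = LAMED + PATAH + HIRIQ            # patah (class 17) before hiriq (class 14)
with_cgj = LAMED + PATAH + CGJ + HIRIQ   # same marks, CGJ in between

nfc_typed = unicodedata.normalize("NFC", typed)
nfc_cgj = unicodedata.normalize("NFC", with_cgj)

print(show(nfc_typed))  # U+05DC U+05B4 U+05B7  (marks swapped)
print(show(nfc_cgj))    # U+05DC U+05B7 U+034F U+05B4  (order preserved)
```

Because CGJ has combining class 0, canonical reordering never moves a mark across it, so the two strings are no longer canonically equivalent. That is exactly the new distinction implementers would have to account for.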



- Peter



Re: Hebrew Vav Holam

2003-07-31 Thread Peter_Constable


Ted Hopp wrote on 07/31/2003 12:12:34 PM:

> I'd propose something that would look like this in the UCD (with 'nn' to be
> determined, but it should be in the Hebrew block):
>
> 05nn;HEBREW VOWEL HOLAM MALE;Lo;0;R; 05D5 05B9N;

I don't understand at all why you'd want to encode a
compatibility-decomposable character. If it's the same as something else,
then this isn't needed. If it's really and truly distinct, then encode it
as a distinct character, period.

It seems that the only reason you'd have for suggesting something with a
compatibility decomposition is that you want to encode the combination vav
+ right-holam = holam male. But there's absolutely no reason why the holam
male cannot be encoded as a sequence. This happens all the time for lots of
languages. Precomposed combinations should not be added any more for Hebrew
than any other script or language.
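
As a rough illustration of how a precomposed form relates to its sequence, consider the existing presentation form U+FB4B. It is used here only as an example of the mechanics; it is not the proposed character, and it has a canonical rather than a compatibility decomposition.

```python
import unicodedata

VAV, HOLAM = "\u05D5", "\u05B9"
precomposed = "\uFB4B"  # existing Hebrew presentation form, used as an example

print(unicodedata.name(precomposed))
print(unicodedata.decomposition(precomposed))  # decomposition mapping as hex code points

# Normalization replaces the precomposed form with the sequence,
# and never recomposes the sequence into the precomposed form.
nfc_precomposed = unicodedata.normalize("NFC", precomposed)
nfc_sequence = unicodedata.normalize("NFC", VAV + HOLAM)
```

After normalization, only the vav + holam sequence remains in the text, which suggests how little a further precomposed combination would add.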

I will plan on preparing a proposal for a new right-holam character (with
some agreeable name) sometime in the next few months, unless someone else
gets to it first (I likely won't be able to do so before the August UTC
meeting).



- Peter



Re: Back to Hebrew -holem-waw vs waw-holem

2003-07-30 Thread Peter_Constable


Ted Hopp wrote on 07/30/2003 11:43:10 AM:

> One of the key points some of us are trying to make is that vav with kholam
> khaser is a different mark on the page than a kholam male. Different
> semantics AND different appearance, but no separate Unicode encoding.

In your earlier message, to which I responded, you spoke of two things
written with the same glyph; that sounds to me like one character. But now
you're talking about different appearances for combinations of certain
characters. That distinction does need to be representable in Unicode
somehow. It might involve an additional character, though it might also be done
some other way.


> Besides, what's all this that I keep reading about Unicode encodes
> characters, not glyphs?

True, but characters are not the same as phonemes. Your examples earlier of
qamats and shewa were very clearly phonemic differences and not character
differences.


- Peter



Re: Back to Hebrew -holem-waw vs waw-holem

2003-07-30 Thread Peter_Constable


Ted Hopp wrote on 07/29/2003 01:20:08 PM:


> The two vowels kholam male and shuruq have nothing to do with the consonant
> vav (HEBREW LETTER VAV) other than that they are written with the same
> glyph.

If they are written with the same glyph, then they are written with the
same character. Unicode encodes characters, not phonemes. There is probably
some language for which "x" is used to represent a vowel, or perhaps a
tone, but we don't need to encode two (or three) "x" characters. Sorry, but
I think the reasoning here is wrong.


> Hebrew characters are used for
> much more than spelling Hebrew words.

And, apparently, for more than one phoneme; but we still encode the
characters but once.



> These different uses for the same (or approximately same) glyphs

Well, are the glyphs the same, or only approximately the same?


> Other typographic curiosities: The HEBREW POINT QAMATS [05B8] is used for
> two Hebrew vowels...

> The same comment goes for HEBREW POINT SHEVA...

Same response.



- Peter



RE: Yerushala(y)im - or Biblical Hebrew

2003-07-29 Thread Peter_Constable


John Hudson wrote on 07/29/2003 12:36:01 PM:

> Perhaps you would like to expand on this? What kind of markup? How would it
> interact with fonts and rendering engines?

It seems to me it would not, unless application software were explicitly
written to support the markup conventions and use some appropriate
interface into the font/rendering engine. To keep things simple, that
interface could be in the form of font features. Alternately, it could
perhaps be done in terms of specific orderings of marks during rendering.
Either way, the application software would have to be aware of whichever
interface was established as the norm.

And that raises a big issue: can we get universal agreement on the app -
font/rendering interface, or are we going to see different vendors
supporting proprietary solutions?

And what about interchange? I believe from a previous post that when Jony
says "markup" he means either markup or rich text -- i.e. plain text plus
*something* else. Can we get universal agreement on what that something
else is?

Not to mention the most obvious concern: all this only works in apps that
have been explicitly written to support Biblical Hebrew. That's really
going to make users happy.



- Peter



RE: Back to Hebrew - Vav Holam

2003-07-29 Thread Peter_Constable


Jony Rosenne wrote on 07/29/2003 03:21:08 PM:

> The only thing established is that this artifact has been used in
> several manuscripts, one of many similar artifacts, to aid the
> understanding of the text. And the correct vehicle to convey such
> artifacts is markup.

You say this as if it's objective truth. Now, if I see Latin-script text
with a diacritic comma above in some places but also a comma above and a
little to the right, the correct vehicle to convey these "artifacts" is the
pair of distinct characters, U+0313 COMBINING COMMA ABOVE and U+0315
COMBINING COMMA ABOVE RIGHT. Apparently, in the case of Latin, it was not
considered an objective truth that the correct vehicle is markup.
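
The two Latin marks can be looked up in the character database to confirm that they are separate characters with separate combining classes. This is a small sketch using Python's unicodedata; the base letter "a" is an arbitrary choice.

```python
import unicodedata

# Print name and canonical combining class for each mark.
for cp in (0x0313, 0x0315):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.combining(ch))

# Normalization does not fold one into the other.
above = unicodedata.normalize("NFC", "a\u0313")
above_right = unicodedata.normalize("NFC", "a\u0315")
distinct = above != above_right
print(distinct)
```

The positional difference is carried by plain text alone, with no markup involved.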



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-29 Thread Peter_Constable


Peter Kirk wrote on 07/29/2003 09:22:35 AM:

> Or is markup
> being suggested as a solution of the Yerushala(y)im issue? If so I fail
> to see how it addresses the problem, as markup does not inhibit
> normalisation.

The markup-based solution would have to be something like

yerushalaim

which would normalize to

yerushaliam

(Alternate tags such as  would, of course, be
possible.) Thus, it would be the markup that determined the "a(y)i"
semantics and rendering, not the characters themselves.

A variation (assuming that canonical ordering does not occur around markup
tags), might be something like

yerushala



- Peter



RE: Yerushala(y)im - or Biblical Hebrew

2003-07-29 Thread Peter_Constable


Jony Rosenne wrote on 07/29/2003 01:37:00 AM:

> Failing that, it was suggested that an existing Unicode character, such as
> ZERO WIDTH NO-BREAK SPACE, be used for "invisible" Hebrew letters, in cases
> such as Yerushala(y)im.

ZWNBSP or any other word-boundary-causing character would give
inappropriate behaviour for certain processes.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-29 Thread Peter_Constable


Ken Whistler wrote on 07/25/2003 07:39:59 PM:

> > Of course, zwnbs is not a base character...

> There is no need for an invisible base character here.

Moreover, a space of any type would be a particularly bad thing -- it's not
two words.

- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-29 Thread Peter_Constable


Ken Whistler wrote on 07/28/2003 08:34:50 PM:

> I doubt it. I think it is much more likely that the stability of
> normalization per se will hold. And when people finally come to understand
> that Unicode normalization forms don't meet all of their
> string equivalencing needs, the pressure will grow to define other
> kinds of equivalences.

This is most likely a reliable prediction of the future: existing
normalizations are not seen as doing it all, and other equivalence
relations will be defined. The current problem with normalization and
Hebrew vowel combinations might not be such a big deal if early
normalization to NFC weren't a W3C recommendation for W3C protocols, such as
XML.



- Peter



Re: Hebrew hataf vowels (was: About CGJ)

2003-07-24 Thread Peter_Constable


Peter Kirk wrote on 07/24/2003 01:10:53 PM:

> Actually I don't need to foresee this, it is
> happening already, as there is already one Hebrew Bible text available
> which displays properly only with Ezra SIL, another which requires
> FrankRuehl, and another which has a different preference. We need to put
> an end to this kind of situation as soon as possible.

Absolutely!



- Peter



Re: Hebrew hataf vowels (was: About CGJ)

2003-07-24 Thread Peter_Constable


John Hudson wrote on 07/24/2003 12:49:11 PM:

> * Of course, this gets screwed up by Unicode normalisation, but that's just
> another example of what we've been talking about all along. Personally, I
> would rather see a 'right meteg' character encoded than use CGJ or another
> mechanism to force right positioning.

Of course, one of the nasty details in all these suggestions is that, if we
do start using CGJ in the way suggested and also get a new character RIGHT
METEG (for which we need to dream up an appropriate combining class -- pick
a number from 1 to 199!), then we need to consider what the significance
(if any) will be of the distinctions between (e.g.)

QAMETS + RIGHT METEG
QAMETS + CGJ + RIGHT METEG
RIGHT METEG + QAMETS
RIGHT METEG + CGJ + QAMETS

Of course, we'll probably just disregard RIGHT METEG + (CGJ + ) QAMETS +
(CGJ + ) METEG and variations thereof as just sequences with no linguistic
meaning (i.e. misspellings).
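
To get a feel for how such sequences fall into equivalence classes, here is a sketch that groups the four orderings by their NFD form. RIGHT METEG is not an encoded character, so the existing METEG (U+05BD, combining class 22) stands in for it. The base letter bet is also an assumption. The result for a real RIGHT METEG would depend on the combining class it was given.

```python
import unicodedata
from collections import defaultdict

BET, QAMATS, METEG, CGJ = "\u05D1", "\u05B8", "\u05BD", "\u034F"

# METEG stands in for the hypothetical RIGHT METEG.
sequences = {
    "QAMATS + METEG": BET + QAMATS + METEG,
    "QAMATS + CGJ + METEG": BET + QAMATS + CGJ + METEG,
    "METEG + QAMATS": BET + METEG + QAMATS,
    "METEG + CGJ + QAMATS": BET + METEG + CGJ + QAMATS,
}

# Sequences with the same NFD form are canonically equivalent.
classes = defaultdict(list)
for label, seq in sequences.items():
    classes[unicodedata.normalize("NFD", seq)].append(label)

for labels in classes.values():
    print(" == ".join(labels))
```

With these stand-ins the two CGJ-free orderings collapse into one class while each CGJ sequence stays distinct, so four spellings yield three distinguishable strings whose significance someone would have to define.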



- Peter



Re: About CGJ (was: Yerushala(y)im - or Biblical Hebrew)

2003-07-24 Thread Peter_Constable


Philippe Verdy wrote on 07/23/2003 10:19:09 PM:

> However, its canonical decomposition into  COMBINING ACUTE ACCENT> who are both of combining class
> 230 (Above), has an impact in renderers: they are supposed to stack
> one above the other, so the ACUTE ACCENT (oxia, tonos) should
> appear *above* the DIERESIS (Dialytika). But usage in Greek (similar
> cases occur with Vietnamese Latin letters with two above diacritics),
> show that they do not stack up, but above diacritics are really
> combined (the tonos accent is written in the middle of the two dots of
> the dialitika).
>
> So this is already a case where diacritics can (and should) ligate by
> default, and that a CGJ may be used to remove (?) this ligature of
> accents and instead use the vertical stack.

Not needed, IMO, nor would it be a good idea to use CGJ as a rendering
control. A while ago there was an idea that CGJ could be used as a
rendering control in exactly the opposite way: presence of CGJ would give
the side-by-side stacking needed for Vietnamese and Greek. That idea was
rejected, however. (Besides, the positioning for Greek and for Vietnamese
are not entirely the same.)


> If this is wrong, then
> how do you combine a macron with a dieresis?

*Macron* and diaeresis? How can these combine in any way other than
vertical stacking?



> If correct placement of diacritics must be specified, could we use the
> ideographic description characters to create those combining
> sequences with a more descriptive composition rule?

Yikes! My initial reaction is that I hope we don't go that direction.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-24 Thread Peter_Constable


One thought: Ken has suggested CGJ be used to prevent reordering of
combining marks in fixed position classes such as the Hebrew vowels, and
also suggested that users should not need to be aware of the need for CGJ
for this purpose but that software can be implemented in a way that hides
that detail. I'm not sure how that will work, but it's making me wonder if
effectively we'd be looking at some amendment to the normalization
algorithms to insert CGJ in certain enumerated contexts.


- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-24 Thread Peter_Constable


Jony Rosenne wrote on 07/23/2003 01:43:51 PM:

> With all due respect, this kind of implementation issues is of secondary
> importance. The task of Unicode is to get the encoding right.

I realise that some things that may not work now can be made to work with a
little more effort. But your comment raised a question in my mind: what
criteria do we use to decide how to get the encoding right if not
implementation issues involved in creating software that gives the desired
behaviour as perceived by the user?



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Peter_Constable


Peter Kirk wrote on 07/23/2003 09:24:12 AM:

>  From Unicode 4.0 section 3.11,
> http://www.unicode.org/book/preview/ch03.pdf: "The particular numeric
> value of the combining class does not have any special significance; the
> intent of providing the numeric values is /only/ to distinguish the
> combining classes as being different, for use in equivalence
> comparisons. ... The canonical order of character sequences does /not/
> imply any kind of linguistic correctness or linguistic preference for
> ordering of combining marks in sequences." There is therefore no reason
> for combining classes to reflect ordering.

But, the combining classes do have to reflect relative visual positions wrt
the base -- that's what determines which things do or do not interact
typographically. (Things that are above right will interact typographically
with one another because they are competing for the same visual space, but
not with things that are above, above left, below right, etc.) The classes
will define *some* order, and as the classes are visually based, that order
cannot be the logical order for both RTL and LTR text.
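
The order that the classes impose can be sketched directly. The function below is a simplified illustration of canonical reordering (a stable sort of each run of non-starters by combining class), not a full normalizer: it assumes input with no decomposable characters. The bet + dagesh + qamats sequence is an assumed example.

```python
import unicodedata


def canonical_reorder(s):
    """Stable-sort each run of combining marks by combining class.

    Simplified sketch: assumes the input contains no characters that
    themselves decompose.
    """
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


BET, DAGESH, QAMATS = "\u05D1", "\u05BC", "\u05B8"

typed = BET + DAGESH + QAMATS        # dagesh (class 21) typed before qamats (class 18)
stored = canonical_reorder(typed)    # qamats now precedes dagesh

print([f"U+{ord(c):04X}" for c in stored])
```

For this input the result matches unicodedata.normalize("NFD", typed): the stored order follows the class numbers, whatever order the user typed the marks in.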


> The problem, if there is one,
> is with rendering software which expects to receive an input stream in a
> logical order although Unicode implies that the order is arbitrary,

No, the problem I'm referring to has to do with editing, when certain
protocols recommend (and all but guarantee) that data be stored in a
particular order. While the particular numerical values of combining
classes are of no special significance and were not intended for anything
more than to distinguish between different classes for use in equivalence
comparisons, it just happens that that order will be used for storage order
of data, and that will be the visual LTR order, making it the opposite of
what users will likely think in terms of for RTL text.

So, even if rendering software is performing its own reordering, the issue
I'm trying to point out is that *editing* software may also need to do its
own reordering in order to provide the kind of editing experience that
users expect. Otherwise, users may have a difficult time knowing what order
to type things, and what will get deleted the next time they backspace.


- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Peter_Constable


Peter Kirk <[EMAIL PROTECTED]> wrote on 07/23/2003 09:55:02 AM:

> Peter C, I guess that when you wrote this you had not yet seen my
> posting pointing out that in Unicode 4.0 developers are obliged to
> "implement" CGJ, quite apart from Hebrew, as a "default ignorable
> character",

Unicode does not ever oblige developers to implement support for any given
character, including CGJ. But *if* a developer is going to implement
support for CGJ, they may not want to do so just for rendering purposes,
and they probably want to ensure that something done with Biblical Hebrew
in mind doesn't hurt what they've done for other scripts. Sometimes things
that seem to us as simple as saying "just do such-and-such" are not quite
that simple to the people actually maintaining a large code base.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Peter_Constable


Peter Kirk wrote on 07/23/2003 04:40:03 AM:

> I hope you are not suggesting that any application developers are
> prepared to implement changes to support proposals which they have put
> forward to the UTC but are not prepared to implement changes to support
> alternative fixes to the same problems which may be preferred by the UTC
> because they are acceptable to users. Well, this would be an acceptable
> position if the alternative fix is much harder to implement than the
> preferred proposal. But in this case the alternative fix, using CGJ,
> seems to be actually a very trivial matter for a rendering engine.

There's a concern that it may not be a good idea for a developer to
implement support for CGJ just in relation to Hebrew, and that the proposed
usage of CGJ for Hebrew is quite distinct from its more general uses.
Doing half a job may cost more in the end, and one has to consider whether
one's implementation, intended for Hebrew, has had any unexpected effects
on one's implementations of other scripts.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Peter_Constable


Philippe Verdy wrote on 07/22/2003 09:18:35 PM:

> If there's an agreement about what should have been the best
> combining classes...

Describing what would be the best combining classes can be tricky for RTL
scripts if the canonical ordering is intended not only for purposes of
normalization and string comparison but also as a preferred order for
storage and editing interaction. The reason is that the combining classes
are intentionally based on visual relative position wrt the base character,
not logical. Arbitrarily, a LTR ordering ... < below left < below < below
right < ... is used, meaning that combinations of marks will be sequenced
in the opposite order to the underlying line order, and so not in the
logical order in terms of which users will be thinking. As an example using
Hebrew, for a combination of (say) beth with qamats and dehi, preferred
classes according to the visual basis on which classes are defined would be

qamats = 220
dehi = 222

and so you'd get an encoded sequence of < beth, qamats, dehi >. But for the
user, the pre-positive dehi, being to the right of the qamats, would
probably be thought of as occurring before the qamats.

Now, I said above that the classes were based arbitrarily on a visual LTR
order. A RTL ordering ... < below right < below < below left < ... could
have been used, but then the same mismatch would exist for LTR scripts. So,
the problem is not with the arbitrary choice of LTR visual ordering for the
classes.
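
For readers who want to see the canonical ordering step itself, here is a minimal Python sketch. It assumes the input is already fully decomposed and simply stable-sorts each run of combining marks by combining class; the function name canonical_order is made up for this illustration. Note that the classes actually assigned in the Unicode Character Database differ from the preferred values discussed above (qamats has the fixed-position class 18, while dehi has 222), but the resulting order for this pair is the same.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Stable-sort each run of combining marks by canonical combining class.

    Assumes `s` is already fully decomposed. Characters with class 0
    (starters) are never moved and separate one run from the next.
    """
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter closes the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

BETH, QAMATS, DEHI = "\u05D1", "\u05B8", "\u05AD"

for ch in (QAMATS, DEHI):
    print(unicodedata.name(ch), unicodedata.combining(ch))

# Typed in the order the user probably thinks of: dehi first, then qamats.
typed = BETH + DEHI + QAMATS
print([unicodedata.name(c) for c in canonical_order(typed)])
```

Because Python's sorted is stable, marks with equal classes keep their relative order, which is what canonical ordering requires.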



- Peter



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-21 Thread Peter_Constable


Philippe Verdy wrote on 07/20/2003 08:37:19 AM:

> > What would be the purpose of encoding these? I can't think of any.
> > They certainly don't need to be encoded as distinct characters to use
> > in a Last Resort font.
>
> Mostly for documentation purpose

Since Unicode is not a glyph encoding standard, there's no need for it to
assign glyphs to codepoints for documentation purposes.


- Peter



Re: Karen Language Representation in Unicode

2003-07-20 Thread Peter_Constable

Michael Everson wrote on 07/20/2003 07:09:40 AM:

> I've discussed the matter with Christian and you can write to me about it.

It would be appreciated if you could please include Martin Hosken 
<[EMAIL PROTECTED]> and me in that discussion.


- Peter



Re: Karen Language Representation in Unicode

2003-07-20 Thread Peter_Constable

Heather Batterham wrote on 07/20/2003 06:46:16 AM:

> The second interest I have is in the development of word processing 
> tools that utilize the contents of unicode.  I use a Macintosh with OSX 
> installed.  The basic language packages are very good but they do not 
> have the Burmese script included.

The only working font implementation for Burmese script that I know of is
one that we have (in beta), implemented using Graphite rendering. It's
available at 
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=GraphiteFonts.

Unfortunately, Graphite is not currently available for use on the Mac, 
though I understand significant interest has been expressed in seeing a 
Mac port. A Linux port is in progress.



- Peter



Re: About the European MES-2 subset

2003-07-20 Thread Peter_Constable

> On Windows, the "cannot find a font for it" situation is the NULL glyph. The
> Last Resort font is cool but a Code2000 stab at the actual glyph is (IMHO)
> cooler than both.:-)

Then wouldn't it make sense for Arial Unicode MS to be included with 
Windows rather than just with Office?



- Peter



Re: Last Resort Glyphs (was: About the European MES-2 subset)

2003-07-20 Thread Peter_Constable

Philippe Verdy wrote on 07/19/2003 01:24:48 PM:

> Isn't this page creating the idea for a specific block of
> script-representative glyphs, that could be mapped in plane 14
> as special supplementary characters ?

What would be the purpose of encoding these? I can't think of any. They 
certainly don't need to be encoded as distinct characters to use in a Last 
Resort font.


- Peter



Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-15 Thread Peter_Constable


William Overington wrote on 07/15/2003 05:33:22 AM:

> >William, CENELEC is an international standards body. Such bodies either
> >create their own standards or use other international standards. They do
> >not use PUA codepoints.
>
> Well, the fact of the matter is that Cenelec is trying to achieve a
> consensus for the implementation of interactive television within the
> European Union

And that does not require PUA codepoints; moreover, your response does not
escape the fact I was pointing out that a standards body will not be
publishing standards that make reference to PUA codepoints.


> In view of the fact that the interactive television system (DVB-MHP, Digital
> Video Broadcasting - Multimedia Home Platform http://www.mhp.org ) uses Java
> and Java uses Unicode it is then a matter of deciding how to be able to
> signal the symbols in a Unicode text stream.

And they won't be standardizing on symbols encoded using PUA codepoints.



> In view of the fact that the process of getting regular Unicode code points
> for the symbols would take quite a time, and indeed that there is as yet no
> agreement on which symbols to use, and that the implementation of
> interactive television needs to proceed, it seems to me that putting forward
> three specific Private Use Area code points for the symbols at this time is
> helpful to the process.

Then you obviously don't understand the process.



> >Such things are *not* useful. They do not achieve consistency, not in the
> >short term, and most certainly not in the long term. If consistency is
> >needed, the standardization process is used to establish standardized
> >representations.
>
> Well, what is the alternative?

The alternative to agreeing on a standard? None, but why would you need an
alternative?



> The code points are in the Private Use Area,
> so the suggestion avoids the possibility of a non-conformant use of a
> regular Unicode code point.

That is hardly the concern. Standards are designed to be international
agreements that foster international commerce; such IT standards are
intended to be international agreements on data representation or
processing protocols on which interoperable products can be developed. In
order to ensure reliable interoperation, they will not build standards on
anything that isn't standardized. PUA codepoint usage is not standardized,
by definition.
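
As a practical aside, data can be checked for private-use code points before it is interchanged. The following Python sketch is only an illustration; the function name private_use_chars is hypothetical. It relies on the fact that private-use characters have the general category Co.

```python
import unicodedata

def private_use_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, code point label) for every private-use character in `text`."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = Other, private use
    ]

sample = "Subtitles: \uE000"
for index, label in private_use_chars(sample):
    print(f"private-use code point {label} at index {index}")
```

A check like this reports only that a private-use code point is present. What the code point means cannot be determined from the data, since that depends on a private agreement between the parties.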



> For the long term, hopefully regular Unicode
> code points will be achieved.  In the short term, my suggesting of some
> specific code points does not impede consistency and may possibly help to
> achieve consistency.

Wrong. There is no consistency if company A decides to follow one set of
PUA character assignments while company B uses a different, incompatible
set. As long as PUA codepoints are used, that is going to be at stake.



> However, publishing Private Use Area code points as an interim solution is
> an established process.

I think I'm about to set up that "default-ignorable post" rule.


> These symbols seem to be as valid as many of the symbols already in the
> Miscellaneous Symbols section and as valid as those currently going through
> the registration process.  In view of the excellent .pdf files which have
> appeared about those symbols which are presently going through the
> registration process I am rather hoping that these symbols will at some
> future time appear in such a .pdf file as part of the registration process.

Your and our time would be much better spent if you were contributing to
getting the set of symbols finalized and getting proposal documents
prepared to have them added to ISO 10646 than by proposing PUA codepoints
to members of this list.



> >And might I also suggest that you create a Yahoo discussion group or MSN
> >community for PUA use, and then carry on discussion of ways to use the PUA
> >there rather than here?
>
> Reading the information into the Unicode mail list archive is a process of
> great value to me, so that is the motivation for posting the information in
> this mailing list.

I'm suggesting that you take your PUA discussions elsewhere so that the
information value from the Unicode mail list can be maintained for the rest
of us.



> I was not carrying on a discussion of ways to use the PUA.  I was simply
> making an announcement to the people on the list.  There may well be people
> on this list who are interested in interactive television and who are not
> members of the Cenelec discussion forum

Wouldn't you have a greater likelihood of reaching your target audience on
a forum dedicated to interactive television?


> and who might like to know of this
> suggestion.  Also, the symbols might well be used in hardcopy television
> programme listing magazines, so it would be desirable to have them available
> in fonts.

Think about the workflow for such magazines and then tell me again you're
not suggesting PUA codepoints for use in interchange.




- Peter



Re: Combining diacriticals and Cyrillic

2003-07-15 Thread Peter_Constable


William Overington wrote on 07/15/2003 07:22:22 AM:

> No, the Private Use Area codes would not be used for interchange, only
> locally for producing an elegant display in such applications as chose to
> use them.  Other applications could ignore their existence.

Then why do you persist in public discussion of suggested codepoints for
such purposes? If it is for local, proprietary use internal to some
implementation, then the only one who needs to know, think or care about
these codepoints is the person creating that implementation.



> Publishing a list of Private Use Area code points would

have absolutely no purpose at all.


> mean that such
> display could be produced using a choice of fonts from various font makers
> using the same software

Now you are talking interchange. Interchange means more than just person A
sends a document to person B. It means that person A's document works with
person B's software using person C's font. (An alternate term that is often
used, interoperate, makes this clearer.)



> I feel that an important thing to remember is the dividing line between what
> is in Unicode and what is in particular advanced format font technology
> solutions

And best practice for advanced format font technologies eschews PUA
codepoints for glyph processing. You've been told that several times by
people who have expertise in advanced font technologies, an area in which
you are not deeply knowledgeable or experienced, by your own admission.


> yet they are not suitable for platforms such as Windows 95 and
> Windows 98, whereas a eutocode typography file approach would be suitable
> for those platforms and for various other platforms.

Wm, if someone wanted, they could create an advanced font technology to
work on DOS, but why bother? Who's going to create all the new software
that works with that technology, and make it to work within the limitations
of a DOS system? Your idea is at best a mental exercise, and even if you or
someone else built an implementation, what is not needed is some public
agreement on PUA codepoints for use in glyph processing.


> I am hoping that the eutocode typography file approach with display glyphs
> added into the Private Use Area will be a useful technique in many areas,
> including, yet not limited to, interactive broadcasting.

If your ideas were to get used in some area like interactive broadcasting,
the use of PUA codepoints for rendering purposes would be relevant to that
technology, and out of scope for discussion on this list.



- Peter



encoding sniffing

2003-07-14 Thread Peter_Constable

Are there any libraries out there (open-source or otherwise) that can be
used to detect the character encoding of a file or data stream?



- Peter



Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-14 Thread Peter_Constable


William Overington wrote on 07/14/2003 02:28:03 AM:

> There is presently discussion about the symbols to be used to indicate the
> availability of Audio Description, Subtitle and Signing in television
> broadcasts.
>
> This is being discussed in the Digital_TV and TV_for_All discussion forums
> at the http://www.cenelec.org webspace.
>
> I am suggesting that the following Private Use Area code points be used for
> the symbols at the present time.

Sigh...

William, CENELEC is an international standards body. Such bodies either
create their own standards or use other international standards. They do
not use PUA codepoints.


> This could lead to a useful consistency of
> encoding for use with interactive television systems.  Hopefully regular
> Unicode code points will be established at some time in the future, these
> Private Use Area code point suggestions are simply to help in achieving
> consistency in the mean time.

Such things are *not* useful. They do not achieve consistency, not in the
short term, and most certainly not in the long term. If consistency is
needed, the standardization process is used to establish standardized
representations.

Please, if you want to see things encoded as characters, then learn how to
use the established processes for doing so (but please also learn what are
and are not suitable candidates for character encoding).

And might I also suggest that you create a Yahoo discussion group or MSN
community for PUA use, and then carry on discussion of ways to use the PUA
there rather than here?




- Peter



Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

2003-07-12 Thread Peter_Constable

> Where does the fact of saying that a Grapheme Disjoiner...

The character you should be referring to is not a new character GDJ, but 
rather is the existing ZWNJ, the functions of which include prevention of 
a ligature.
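
A minimal Python sketch of that use of ZWNJ follows. The function name suppress_ligature is hypothetical, and whether a ligature is actually suppressed on screen depends on the font and rendering engine; the sketch only shows where the character goes in the encoded text.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

def suppress_ligature(text: str, pair: str) -> str:
    """Insert ZWNJ between the two letters of every occurrence of `pair`."""
    if len(pair) != 2:
        raise ValueError("pair must consist of exactly two characters")
    return text.replace(pair, pair[0] + ZWNJ + pair[1])

word = suppress_ligature("fikir", "fi")
print([f"U+{ord(c):04X}" for c in word])
```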



- Peter



Re: 24th Unicode Conference - Atlanta, GA - September 3-5, 2003

2003-07-10 Thread Peter_Constable

Peter Kirk wrote on 07/10/2003 10:52:55 AM:

> Well, Peter, I see that according to 
> http://www.unicode.org/timesens/calendar.html the next UTC is to be held
> at Pleasanton, CA, which is either a village or a not-well-known
> (internationally) suburb either in California or in Canada, not large 
> enough or well enough known to be mentioned in my world atlas, so if I 
> wanted to attend I would have to do significant research to find out 
> where it is going to be held.

UTC meetings are closed events, not industry conferences attended by the 
general public.  Those who attend the meeting are already going to be in 
close communication with the people organizing the meeting and will be 
able to get details as to where it is located.



- Peter



Re: 24th Unicode Conference - Atlanta, GA - September 3-5, 2003

2003-07-10 Thread Peter_Constable

"Tim Greenwood" <[EMAIL PROTECTED]> wrote on 07/10/2003 10:44:57 AM:

> The point is not that any potential attendee would actually travel 
> to the wrong place. It is that advertising the 24th conference as 
> Atlanta, GA but the 23rd as Prague, Czech Republic is part of  a 
> cultural arrogance in the USA.

Sure, it's better not to assume "USA" is understood, but the criticism was 
being taken too far. The original criticism was certainly valid, but 
taking it to the extent of suggesting that people will do a lot of 
research on villages and suburbs in Gabon is, IMO, absurd. If you do a 
google search on "atlanta, ga", you'd have to wade through pages and pages 
of results related to the US city in the state of Georgia before you'd 
come close to anything else, and if somebody hasn't figured out by that 
point that the conference is probably in Atlanta, Georgia, USA rather than 
a village in a country not particularly known for involvement in this 
industry, then they're probably not intelligent enough to be involved in 
this industry anyway.



- Peter



Re: 24th Unicode Conference - Atlanta, GA - September 3-5, 2003

2003-07-10 Thread Peter_Constable

Peter Kirk wrote on 07/10/2003 05:23:04 AM:

> Or they will spend a long time researching lists of villages and suburbs
> in Gabon before finding out that there is no Atlanta there, or perhaps
> finding that there actually is one - unless Tex has actually done this 
> exhaustive research and ascertained that there is none.

And the likelihood that a software industry conference will be held in a 
village is?

And the likelihood that the location of an industry conference would be
expressed in terms of a not-well-known suburb that one needs to spend a
long time researching lists to find, rather than the major-city name?



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable

Ted Hopp wrote on 07/08/2003 11:26:14 AM:

> Also, there are missing letters and there are missing letters. There are
> cases of a single text (e.g., Holzhausen Bible of 1889, Lowe and Brydone
> Bible of 1948, as documented by Yannis Haralambous) where the "missing
> letters" in some words are simply not present in the representation and the
> vowels are placed on the consonants that do appear...

I have intended for a while to propose a character ELLIPTIC LETTER for 
such situations, which occur for Hebrew but may also occur in other 
paleographic studies. I think this is the best approach to that, but just 
haven't had a chance yet to write the proposal.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable

Peter Kirk wrote on 07/08/2003 08:18:33 AM:

> A couple of off list comments have made it clear to me that this 
> proposal needs some clarification and adjustment...

> The solution for this sequence is as follows: Define a new combining 
> character something like HEBREW LIGATURE PATAH HIRIQ with a canonical 
> decomposition of hiriq - patah (yes, that way round) and a glyph with a 
> hiriq to the left of a patah... But when 
> this text is normalised into NFC, the sequence will first be reordered 
> as hiriq - patah, and then this combination will be composed into the 
> new ligature. That is correct, isn't it?

Yes, but I wouldn't call it a ligature; I'd call it a precomposed or 
digraph character (and the glyph, I'd call a composite).

> So an application which renders 
> the NFC text will see the new character and should render it according 
> to its glyph. In NFD text, the hiriq - patah sequence remains, but it 
> is, I think, customary if not required for the renderer to combine the 
> glyphs into the defined ligature before rendering.

I'm not aware of anything that presently requires a renderer to combine 
the characters into a composite glyph, or to present the sequence of 
characters < hiriq, patah > with the hiriq to the left of the patah -- 
remember, the description of Hebrew currently in Unicode assumes that such 
sequences don't occur. 

But, in order for your solution to work, this rendering would *have* to be 
required. The fixed position classes would have to be understood as fixed 
relative positions; i.e. given this combination of marks, they are always 
positioned relative to one another in a fixed way, regardless of their 
encoded order. This would assume that any other positioning will never 
occur or be required -- true for cases that we know of, but it is possible 
that there are cases we do not know of, and that such a user need could 
exist in the future. You also haven't said anything about how to deal with 
accents that occur between the two vowel marks (though you did notice the 
issue), and the alternative of that same accent occurring either to the
left or to the right of the pair of vowel marks (which offhand seems a 
likely potentiality with at least meteg -- I can't check that now since 
I'm away from the office); and these would have to be dealt with as well.

Also, if the rendering of the sequence < hiriq, patah > is required to 
have hiriq to the left of the patah, then what's the point of having the 
additional digraph character? None that I can see. So, a simpler solution 
would simply be to specify the relative ordering of certain combinations of
vowel marks, regardless of the order in which they are encoded. But we'd 
still have the other issues I mentioned in the preceding paragraph.


It is occurring to me that perhaps there is a way to address the stability
issues that are a concern for IETF while fixing the combining classes for 
other purposes. I need to think about that some more, but that is seeming 
to me like (if the details can be worked out) the best hope for finding a 
solution without having a bunch of "Yeah, but..."s to deal with.


> Of  course we could simply store the reversed order without defining a 
> new character. But renderers would then need clear instruction somewhere
> in the Unicode text that, as an exception to the normal rules for
> rendering multiple diacritics, the hiriq should be positioned to the 
> left of the patah and similarly for the other attested sequences.

As mentioned above, this would be necessary anyway for your solution to 
work.



- Peter



Re: Yerushala(y)im - or Biblical Hebrew

2003-07-08 Thread Peter_Constable

Peter Kirk wrote on 07/08/2003 04:23:59 AM:

> Would it work to define a new character, for example, for patah-hiriq 
> which has a canonical decomposition into patah plus hiriq, or even into 
> hiriq plus patah?

No, because any Unicode normalization form would decompose this, and then 
apply canonical reordering, thereby obviating the entire purpose for 
adding the digraph character.
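
The reordering step can be observed with the characters that exist today. The Python sketch below uses only the standard unicodedata module and shows what each normalization form does to an encoded < lamed, patah, hiriq > sequence. It cannot show the proposed digraph character, since no such character exists; it is included only to illustrate the canonical reordering that a decomposed sequence undergoes.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17

text = LAMED + PATAH + HIRIQ  # patah encoded before hiriq

results = {}
for form in ("NFD", "NFC", "NFKD", "NFKC"):
    results[form] = unicodedata.normalize(form, text)
    print(form, [unicodedata.name(c) for c in results[form]])
```

In every form the hiriq (class 14) ends up before the patah (class 17), so the encoded order patah, hiriq does not survive normalization.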



- Peter



Re: Documents needed for proposal

2003-07-04 Thread Peter_Constable

Rick McGowan wrote on 07/03/2003 10:59:19 AM:

> > Where can the average proposal author browse "section II, Character
> > Categories" (needed for item B.3), "clause 14, ISO/IEC 10646-1: 2000"
> > (needed for B.4)
> 
> That is section 2.2 of the WG2 Principles and Procedures document. It is
> available on-line. Go here:
> 
> http://std.dkuug.dk/JTC1/SC2/WG2/docs/principles.html

What is *not* available in that doc or anywhere else (except drafts of ISO 
10646 in the WG2 registry) is an explanation of implementation levels. 
(And even reading drafts of ISO 10646, it's not always clear how one is 
supposed to respond to the question about levels.)



- Peter



RE: Yerushala(y)im - or Biblical Hebrew

2003-07-03 Thread Peter_Constable

Jony Rosenne wrote on 07/02/2003 05:55:02 AM:

> I would like to summarize my understanding:

I agree with you on most points, but would quibble on the first, as I find 
it overgeneralizes and is not explicit enough.

> 1. The sequence Lamed Patah Hiriq is invalid for Hebrew. It is invalid in
> Hebrew to have two vowels for one letter. It may or may not be a valid
> Unicode sequence, but there are many examples of valid Unicode sequences
> that are invalid. 

We need to state more carefully *what* is invalid. The facts are that 
spellings such as lamed patah hiriq *are* attested in literature and 
encoded representations are needed for them. These spellings are invalid 
as written representations of Hebrew that are consistent with Hebrew 
phonology; but their use in literature is not assumed to be consistent 
with Hebrew phonology; they are used *in spite of the fact* that they are 
inconsistent with Hebrew phonology. It is not normal for Hebrew spelling, 
but the literature to be encoded includes abnormal spellings, and they 
have as much need to be represented as the normal spellings.

It appears to me that you are trying to establish invalidity of such 
sequences as a basis to argue that encoded representations should involve 
some character between the two vowels. I consider this reasoning flawed, 
however: the encoded representation is a representation of the *text*, not 
the phonology, and the text most certainly does include sequences such as 
lamed patah hiriq. It may be that we end up deciding to adopt an encoded 
representation for this that involves a character between the two vowels, 
but that is a technical-design choice, and not something that we are 
compelled to do because of the nature of the Hebrew language and normal 
conventions of Hebrew spelling.



- Peter



Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

2003-07-02 Thread Peter_Constable

[Inadvertently sent just to me; forwarded with Philippe's permission]

On Wednesday, July 02, 2003 7:03 AM, [EMAIL PROTECTED] 
<[EMAIL PROTECTED]> wrote:
> Philippe Verdy wrote on 06/28/2003 02:48:01 AM:
> 
> > If the user strikes the two keys  and , the input
> > method for Traditional Hebrew will generate 
> 
> That requires* an input method that is aware of the input context (or
> of what has already been input -- but awareness of context is far more
> reliable).

Not necessarily: the keyboard driver may return host-specific PUA for the 
vowels, and these will be mapped visually to render them with CGJ on the 
display interface, and the edited file can then be saved to standard 
Unicode by remapping them to the standard Unicode sequences, and an editor 
aware of this use of CGJ can also recreate these vowels by remapping 
 to a single PUA during the edition, as this facilitates 
the internal implementation of character selection and string 
search/replace operations.

Yes it requires some knowledge of this particular encoding in the editor, 
but it's not impossible. So in Traditional Hebrew mode, the vowel 
keystrokes could either be returned all with  codepoints (not 
 as it would be incorrect), or as PUA if this facilitates the 
implementation (notably for mouse selection), and unnecessary extra CGJ 
codepoints can easily be removed when saving the file.

An alternative method may also be to use a single PUA instead of CGJ in 
the edited text, if one wants to preserve CGJ codepoints present in the 
input stream. This PUA would be mapped by the editor as meaning: "don't 
reorder the following combining character when serializing the text, so 
that the following combining character will keep its relative order after 
normalization", and it could then be completely language neutral.
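
The alternative described in the last paragraph can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: U+E000 as the editor's internal marker, the function name save, and NFC as the form used when serializing. The one fact the sketch relies on is that CGJ (U+034F) has combining class 0, so canonical reordering never moves a mark across it.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0
MARKER = "\uE000"  # hypothetical private-use marker, used only inside the editor

LAMED, HIRIQ, PATAH = "\u05DC", "\u05B4", "\u05B7"

def save(buffer: str) -> str:
    """Hypothetical save step: replace the internal marker with CGJ, then normalize."""
    return unicodedata.normalize("NFC", buffer.replace(MARKER, CGJ))

protected = save(LAMED + PATAH + MARKER + HIRIQ)
unprotected = save(LAMED + PATAH + HIRIQ)

print([f"U+{ord(c):04X}" for c in protected])    # patah stays before hiriq
print([f"U+{ord(c):04X}" for c in unprotected])  # hiriq is reordered before patah
```

The private-use marker never leaves the editor in this sketch; only the standard sequence with CGJ is written out.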

Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

2003-07-01 Thread Peter_Constable

Philippe Verdy wrote on 06/28/2003 02:48:01 AM:

> If the user strikes the two keys  and , the input method
> for Traditional Hebrew will generate 

That requires* an input method that is aware of the input context (or of 
what has already been input -- but awareness of context is far more 
reliable). How many systems do you know that are capable of that? It 
requires input drivers, such as keyboard DLLs, that support
context-sensitive operations; it requires application interfaces that 
allow the input driver to find out from the app what the input context is; 
and it requires applications that support that interface. Can you name for 
me any system on which the existing keyboard driver format supports 
context-sensitive rules? Can you name an application interface that allows 
input methods (other than full-blown input method editors -- i.e. 
something with a composition window) to communicate the input context to 
the input method, and can you name one or more apps that support this 
interface?

This is all stuff I'd like to see become commonplace for a variety of 
reasons, but I doubt we'll see that happen for the sake of Biblical 
Hebrew.

*That is, unless you expect the input method to generate CGJ after *every* 
vowel (ugh!).
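
To make the requirement concrete, here is a minimal Python sketch of what a context-aware input method would have to do. The rule used (emit CGJ only when the mark being typed has a lower combining class than the mark before the cursor) and the name keystroke_output are assumptions for illustration, not a description of any existing keyboard driver.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def keystroke_output(context: str, typed: str) -> str:
    """Return the text to insert for one typed character.

    `context` is the text before the cursor, which the application
    would have to hand to the input method. Hypothetical rule: emit
    CGJ only where canonical ordering would otherwise move the typed
    mark in front of the mark that precedes it.
    """
    if context:
        prev_ccc = unicodedata.combining(context[-1])
        new_ccc = unicodedata.combining(typed)
        if new_ccc != 0 and prev_ccc > new_ccc:
            return CGJ + typed
    return typed

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
after_consonant = keystroke_output(LAMED, PATAH)      # no CGJ needed
after_vowel = keystroke_output(LAMED + PATAH, HIRIQ)  # CGJ inserted first
print([f"U+{ord(c):04X}" for c in after_vowel])
```

Without access to the context argument, the only safe fallback is the one in the footnote: emit CGJ after every vowel.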



> On Windows XP (for example), the language bar (or its
> user-selected accelerator keys) allows such immediate switch of
> input methods. I don't see why there would not be for the
> Hebrew language, two keyboard input methods

The problem isn't requiring multiple input methods for different purposes; 
it's having an input method and application that can interact with 
particular behaviours.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485

RE: SPAM: About combining classes

2003-06-27 Thread Peter_Constable

Jony Rosenne wrote on 06/27/2003 08:32:11 AM:

> I am under the impression that the existing scientific encodings of the
> Bible are encoded with the help of some kind of markup, and maybe this is
> how they should continue.

The existing eBHS texts use an encoding in which the order of characters 
is significant (there are no notions like canonical ordering and canonical 
combining classes), and several positioning distinctions (including three 
different meteg-vowel positionings) are represented using distinct 
characters.

Newer projects are using XML, but for things markup is suitable for. 
Suggesting that things like vowel-vowel or meteg-vowel orderings be 
handled by markup does not provide a solution for scores of users, such as 
publishers needing to typeset the text, content providers trying to deliver 
content using common web browsers, seminary professors creating lessons 
and students writing papers...


- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

Karljürgen Feuerherm wrote on 06/27/2003 08:23:08 AM:

> Now, Q: I take it the combining classes are linked to the script, rather
> than say to a dialect

They're linked to the character.
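
A quick way to see this, sketched with Python's standard unicodedata module: the canonical combining class is looked up per code point, with no language or dialect parameter anywhere in the call. The dictionary and helper names are only illustrative.

```python
import unicodedata

# Canonical combining class (ccc) is a property of each code point.
POINTS = {
    "HIRIQ": "\u05B4",
    "PATAH": "\u05B7",
    "QAMATS": "\u05B8",
    "METEG": "\u05BD",
}

def combining_classes(chars):
    """Map each label to the ccc of its code point."""
    return {label: unicodedata.combining(ch) for label, ch in chars.items()}

classes = combining_classes(POINTS)
for label, ccc in classes.items():
    print(f"HEBREW POINT {label}: ccc={ccc}")
```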


> --e.g. one can't define BH as a separate dialect from
> MH with its own set of rules?

No, not unless BH is encoded with a separate set of characters.


- Peter



Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]

2003-06-27 Thread Peter_Constable

John Cowan wrote on 06/27/2003 06:29:12 AM:

> Since the use of non-ASCII characters in things like XML and the DNS

I suspect the users of Biblical Hebrew would rather be told they can't use 
Hebrew vowels and accents in markup or URIs than deal with a hack to fix 
errors in the combining classes. Sigh.



- Peter



Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

2003-06-27 Thread Peter_Constable

Philippe Verdy wrote on 06/27/2003 04:46:56 AM:

> > Could this finally be the missing "killer ap" for the CGJ?
> 
> It will be perfect to allow an application like XML to encode Hebrew
> text using Unicode 4.0 rules (and before).

It is not perfect. CGJ is supposed to be significant (and kept in the 
text) for a variety of processes, such as searching and sorting. To use 
this for Biblical Hebrew, though, it should be ignored in such processes. 
It's another hack.
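
A small Python sketch of the consequence for searching, assuming the text has been stored with CGJ between the two vowels while the user's query has none. The helper strip_cgj is a hypothetical name; it stands for the special handling every search or sort routine would need.

```python
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def strip_cgj(text: str) -> str:
    """Remove every CGJ so that it is ignored for matching."""
    return text.replace(CGJ, "")

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
stored = LAMED + PATAH + CGJ + HIRIQ  # text protected against reordering
query = LAMED + PATAH + HIRIQ         # what a user would type

naive_match = query in stored                           # CGJ treated as significant
tolerant_match = strip_cgj(query) in strip_cgj(stored)  # CGJ ignored
print(naive_match, tolerant_match)
```

A process that treats CGJ as significant, as it is meant to, misses the match; only a process that has been taught to ignore it finds the word.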



- Peter



Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]

2003-06-27 Thread Peter_Constable

Michael Everson wrote on 06/27/2003 09:39:16 AM:

> But you might trot on over with a white flag to parley about a problem.
> 
> They [IETF] 're only human beings over there, just as we are over here.

Every time I have referred to IETF as "them" in his presence, Misha Wolf 
has reminded me, "WE are the IETF."



- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

John Cowan wrote on 06/27/2003 08:24:35 AM:

> The IETF has an explicit contract with Unicode: "We'll use your
> normalization algorithm if you promise NEVER, NEVER to change
> the normalization status of a single character."  Unicode has already
> broken that promise four times, so its credibility is shaky.

Yeah, but what I don't get is that IETF doesn't set anything in stone 
until there are working implementations, but Unicode's canonical combining 
classes have to be set in stone for IETF's benefit before there are 
working implementations. I just have a hard time understanding that.


> So far I have not heard any compelling objections to CGJ except that
> invisible characters are fuggly.

I just sent a message discussing this.



- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:

> But in the 10646 WG2 context... You can always come in
> with the proposal to encode BIBLICAL HEBREW POINT PATAH and
> say, even though the glyph is identical, see, the name is
> different, so the character is different. But this is a pretty
> thin disguise, and is vulnerable to simple questioning:
> What is it for? Well, to point Biblical Hebrew texts. But
> what was U+05B7 HEBREW POINT PATAH for? Well, to point Biblical
> Hebrew texts (or any Hebrew text, for that matter...). Well,
> then, what is the difference? Uh, the combining classes for
> the two are different. What is a combining class? 

If they're so unaware of combining classes, might it not seem reasonable 
to think that the dialog might continue as follows?

- [gives explanation of combining classes and the related problem for 
Hebrew] 
ISO: So, you're saying you're coming to us asking for duplicates of 
existing characters because of an error the Unicode Consortium made with 
some of those character properties they define? 
- Well, yes, that's basically it. 
ISO: Then, obviously they need to correct their errors. I mean, it's not 
like the wrong characters got encoded or something. Tell them to just fix 
the errors; that can't be difficult to do, and is obviously the right 
thing to do.




- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:

> Why is making use of the existing behavior of existing characters
> a "groanable kludge", if it has the desired effect and makes
> the required distinctions in text?

Why is it a kludge to insert some cc=0 control character into the text for 
the sole purpose of preventing reordering during canonical ordering of two 
combining marks that do interact typographically and so should but 
nevertheless do not have the same combining class; and, moreover, to do so 
using a control character that was not created for that purpose?

The answer seems so obvious, I wouldn't know how to begin responding.

And the fact that it achieves some desired effect has no bearing on being 
described as a kludge -- every kludge achieves some desired effect. If it 
were otherwise, the given practice would never have been conceived.



> But in the 10646 WG2 context, coming in with a duplicate set
> of Hebrew points is not going to make any sense... 
> You can always come in
> with the proposal to encode BIBLICAL HEBREW POINT PATAH and
> say, even though the glyph is identical, see, the name is
> different, so the character is different. But this is a pretty
> thin disguise, and is vulnerable to simple questioning:
> What is it for?

Are we saying that ISO doesn't give a rip for implementation issues? Or 
that their notion of ordering distinctions is different from Unicode's 
such that *any* differently ordering permutation of some given set of 
characters is considered a distinct representation? Are we saying that the 
voting members of WG2 are not already aware of the issue that has been 
discussed and incapable of understanding an explanation of these issues 
addressed to them?


> I'm trying to find a way, using existing characters and a
> simple set of text representational conventions, to make
> the distinctions and preserve the order relations that you
> need for decent font lookup, without the whole enterprise
> washing up on either of those two rocks.

Understood. I wasn't expecting the surf to go off in this direction since 
I was under the impression when we discussed this back in December on 
unicoRe that there was a consensus that we should pursue just exactly what 
I wrote in the proposal.

If we want to insert a control character to prevent reordering under 
canonical ordering, I think it would be preferable to create a new control 
character for just that purpose: that would give a character that could be 
used elsewhere for the very same purpose without needing to worry about 
what unanticipated and undesirable effects might result by hijacking a 
control created for some completely unrelated purpose. For instance, you 
suggested RLM. Suppose next week we discover a very similar issue in a LTR 
script; do we want to insert RLM to prevent mark reordering in that case? 
No! Do we want to be telling people to pick and choose from various 
controls, using different ones according to the directionality of the 
text? What if the base character is a neutral, or has selectable 
directionality (I'm thinking ahead to Tifinagh, which is written either 
LTR, or RTL)? Are we also going to introduce the use of PDF for this 
purpose in some contexts? How complicated do we want to make this? (Every 
time we conflate distinct functions on a single control character, we are 
inviting added complication, and are setting ourselves up for regrets. One 
might think that lesson was learned from the conflation of ZWNBSP and BOM.)



- Peter



RE: Question about Unicode Ranges in TrueType fonts

2003-06-27 Thread Peter_Constable

> but premature standardization can
> also be a problem if the wrong choices get codified too soon.
 
As in canonical combining classes? :-)


- Peter



Re: Biblical Hebrew

2003-06-27 Thread Peter_Constable

Kenneth Whistler wrote on 06/26/2003 10:15:12 PM:

> How does a user of pointed Hebrew text know whether they are
> dealing with the legacy points...

Ken, corresponding arguments apply equally to your suggestion of putting 
CGJ everywhere and letting software make it transparent to the user: how 
does the user distinguish between implementations intended for Modern 
Hebrew / Yiddish / etc. which do not have special processing for CGJ, and 
implementations intended for Biblical Hebrew that do?


> What happens if they edit text represented in one
> scheme with a tool meant for the other?

Ditto.


> What about searches
> on data with pointed Hebrew -- should it normalize the two
> sets of points or not?

The users aren't going to insert a bunch of CGJs. Should software treat 
representations with and without (or partially with) as equivalent?

Etc.

The problem lies with incorrect assumptions related to canonical combining 
classes and the requirements of Biblical Hebrew when the characters were 
added. I think *any* of the solutions we've been looking at is going to 
leave multiple parties "holding some part of the can".



- Peter



Re: Yerushala(y)im - or Biblical Hebrew (was Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

Ken Whistler wrote on 06/26/2003 05:04:55 PM:

> Another possibility to consider is U+2060 WORD JOINER, the
> version of the zero width non-breaking space unfreighted with
> the BOM confusion of U+FEFF.

It wouldn't allow line breaks, but it would indicate an unwanted word 
boundary, no? (I don't have access to UTR14 at the moment, but the text 
Ken quoted

  "...inserting a word joiner between two characters has no
  effect on their ligating and cursive joining behavior. The
  word joiner should be ignored in contexts other than word
  or line breaking."

wouldn't seem to make sense if WJ did not indicate a word boundary.)


 
- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

Kenneth Whistler wrote on 06/26/2003 08:54:08 PM:

> Actually, in casting around for the solution to the problem of
> introduction of format controls creating defective combining
> character sequences, it finally occurred to me that:
> 
> U+034F COMBINING GRAPHEME JOINER
> 
> has the requisite properties.

This seems far less problematic than either RLM or WJ, since it was 
decided that it functions between bases, but has no defined function 
between combining marks.

But John's objections to this whole approach have validity.


> I don't understand this contention. There is no reason, in principle,
> why this has to be surfaced to end users of Biblical Hebrew...

Your arguments in this regard, Ken, assume that the needs of Biblical 
Hebrew users are going to be addressed by dedicated engineering of all the 
software tools that they use with BH text. This includes rendering systems 
and fonts, but also apps, input methods, various kinds of text services... 
I think that's a bit unrealistic. Is something like Word (say) likely to 
be written to provide correct processing of CGJ (or whatever control is 
used) for BH, and do so in a way that is completely transparent to the 
user? It might be theoretically possible, but it's not terribly likely. 
(Perhaps just slightly more than is the likelihood that UTC will just 
revise the combining classes? :-)



> Nope, just insert CGJ in *all* the sequences. That blocks all reordering
> of such sequences, and you're done.

And I suppose this is considered elegant, right?
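
For concreteness, a hedged Python sketch of what the quoted suggestion amounts to in code. This variant inserts CGJ only between adjacent marks that canonical ordering would otherwise swap, rather than in literally every sequence; insert_cgj is a hypothetical helper, not part of any existing tool.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def insert_cgj(text: str) -> str:
    """Insert CGJ before any mark that canonical ordering would move
    in front of the character preceding it."""
    out = []
    prev_ccc = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and prev_ccc > ccc:
            out.append(CGJ)
        out.append(ch)
        prev_ccc = ccc
    return "".join(out)

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
protected = insert_cgj(LAMED + PATAH + HIRIQ)
survives = unicodedata.normalize("NFC", protected) == protected
print([f"U+{ord(c):04X}" for c in protected], survives)
```

The typed order now survives normalization, at the cost of an invisible character that every other process touching the text has to know about.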



> > and adds another level of complexity to using
> > what are already some of the most complicated fonts in existence (how many
> > fonts do you know that come with 18 page user manuals?).
> 
> That, of course, I am in no position to be able to judge. 

Having reviewed the doc in question and being familiar with user manuals 
for other fonts, I can assure you it's quite unusual -- surreal, almost 
-- to see a user doc with the level of technical detail this one has, and 
by necessity, not choice.

I just have a hard time believing that 50 years from now our grandchildren 
won't look back and ask, "What were they thinking? So it took them a couple of 
years to figure out canonical ordering and normalization; why on earth 
didn't they work that out first before setting things in stone, rather 
than saddling us with this hodgepodge of ad hoc workarounds? How short 
sighted." As Rick said, I know this will get shot down; don't bother 
telling me so.


- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-27 Thread Peter_Constable

John Hudson wrote on 06/26/2003 03:19:44 PM:

> >That is a potential solution, though it would have to be *two* additional
> >metegs.
> 
> Can you explain your thinking here, Peter?

I was thinking of the three-way distinction for hataf vowels, but you were 
correct in pointing out earlier that can be dealt with in other ways as 
well.



- Peter



Re: Biblical Hebrew

2003-06-27 Thread Peter_Constable

Rick McGowan wrote on 06/26/2003 05:52:32 PM:

> The *best* thing to do, in my personal opinion and I know it'll get shot
> down so don't bother telling me so, is to fix the combining classes of the
> Hebrew points.

In discussing these issues among Biblical Hebrew implementers, content 
providers and users, I have had to explain repeatedly why UTC doesn't want 
to consider this. It is completely obvious to them that this is the right 
solution. Even on explaining the impact on normalization, the response is 
that there is no impact since implementations and content using Unicode do 
not yet exist.

(And then, on the other side, I'm asked to explain why other solutions are 
kludges? Two different worlds with completely different sets of 
assumptions.)


 
> Since the combining classes can't be fixed because we have the
> normalization-stability albatross firmly down our gullets and will forever
> be choking on that, the next best thing is to use a ZWJ. Problem solved.
> Just document it.

I think it would be better to create a new character for this purpose than 
to use ZWJ in yet another way.



- Peter



Re: Revised N2586R

2003-06-26 Thread Peter_Constable

William Overington wrote on 06/26/2003 07:03:12 AM:

> yet I am suggesting that where a
> few characters added into an established block are accepted, which is what
> is claimed for these characters, there should be a faster route than having
> to wait for bulk release in Unicode 4.1.

Once both UTC and WG2 have approved the assignment of characters to 
particular codepoints, I might risk making fonts using those codepoints 
for those characters, as it's not very likely the codepoints will be 
changed at that point. There's no guarantee that would not happen, 
however, so I certainly wouldn't distribute such fonts if I were a 
commercial foundry -- too much at stake. If an amendment to ISO 10646 
gets published prior to a new version of Unicode, though, that would 
constitute a guarantee the codepoints will not change.



> If these characters have been
> accepted, why not formally warrant their use now by having Unicode 4.001
> and then having Unicode 4.002 when a few more are accepted?

That is not how versioning is done with the standard. Please read 
http://www.unicode.org/standard/versions/



> Some fontmakers can react to new
> releases more quickly than can some other fontmakers, so why should progress
> be slowed down for the benefit of those who cannot add new glyphs into fonts
> quickly?

Fontmakers don't need to wait until a new version is published before they 
start preparing fonts.


 
> For example, symbols for audio description, subtitles and signing are needed
> for broadcasting.  Will that need to have years of waiting and using the
> Private Use Area when it could be a fairly swift process and the characters
> could be implemented into read-only memories in interactive television sets
> that much sooner?

Well, if the characters haven't even been proposed for addition to the 
standard, then yes, it will take years of PUA usage.


> Why is it that it is regarded by the Unicode Consortium
> as reasonable that it takes years to get a character through the committees
> and into use?

Because there is a process that takes time. International standards aren't 
created by a few people working out of their garage. Some international 
standards take far longer than do updates to Unicode.



> Surely where a few characters are needed the Unicode
> Consortium and ISO need to take a twenty-first century attitude to getting
> the job done

It might be a good idea to become more familiar with the actual process 
and work on international standards in general before criticizing the 
people doing the work. There are a number of people working quite hard on 
this stuff, with their time being volunteered by the organizations and 
companies they represent, or from their own personal time.


- Peter



Re: Revised N2586R

2003-06-26 Thread Peter_Constable

William Overington wrote on 06/26/2003 06:24:44 AM:

> >  the name is simply a unique identifier within the std.
> 
> Well, the Standard is the authority for what is the meaning of the symbol
> when found in a file of plain text.  So if the symbol is in a plain text
> file before or after the name of a person then the Standard implies a
> meaning to the plain text file.

The only meaning that the Standard implies is that the character encoded 
at codepoint x represents the symbol of a wheelchair. It does not imply 
*anything* about how its usage in juxtaposition with the name of a person 
should be interpreted.


 
- Peter



RE: Major Defect in Combining Classes of Tibetan Vowels (Hebrew)

2003-06-26 Thread Peter_Constable

Jony Rosenne wrote on 06/26/2003 06:26:02 AM:

> It may look silly, but it is correct. What you see are letters according to
> the writing tradition, which does not include a Yod, and vowels according to
> the reading tradition which does.

I understand that. My point was, you were talking about phonology, but in 
terms of the text, it was not correct: there *are* multiple vowels on a 
single consonant.


> There are in the Bible other, more extreme
> cases. 

I'd be interested in whatever info you can provide in that regard.


 
> I don't think we need any new characters, ZERO WIDTH SPACE would do and it
> requires no new semantics.

No, that's a terrible solution: a space creates unwanted word boundaries.


> Moreover, everybody who knows his Hebrew Bible
> knows the Yod is there although it isn't written.

But the point is, how do people encode the text? The yod is not there in 
the text. How does a publisher encode text in the typesetting process? How 
do researchers encode the text they want to analyze? Saying, "everybody 
knows there's a yod there" doesn't provide a solution, particularly given 
that the researchers know in point of fact that the consonantal text 
explicitly does not include a yod.


 
> The Meteg is a completely different issue. There is a small number of places
> where the Meteg is placed differently. Since it does not behave the same as
> the regular Meteg, and is thus visually distinguishable, it should be
> possible to add a character, as long as it is clearly named.

That is a potential solution, though it would have to be *two* additional 
metegs.



- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-26 Thread Peter_Constable

Karljürgen Feuerherm wrote on 06/25/2003 08:31:41 PM:

> I was going to suggest something very similar, a ZW-pseudo-consonant of some
> kind, which would force each vowel to be associated with one consonant.

An invisible *consonant* doesn't make sense because the problem involves 
more than just multiple written vowels on one consonant; in fact, that is 
a small portion of the general problem. If we want such a character, it 
would notionally be a zero-width-canonical-ordering-inhibitor, and nothing 
more.

And I don't particularly want to think about what happens when people start 
sticking this thing into sequences other than Biblical Hebrew ("in 
unicode, any sequence is legal").



> General question: when does canonical reordering take place? At input time,
> at rendering time, at another time?

For the purpose for which canonical ordering was intended, it occurs when 
comparing two strings for "equality" or ordering. In practice, it can 
occur at *any* time, including transmission (when it is no longer under 
the control of the author). Some protocols, and notably W3C protocols, 
require that data be canonically ordered, and recommend that this happen 
at the earliest point possible, e.g. at input time.
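
As a rough sketch of the comparison case, assuming Python's standard unicodedata module (the function name canonically_equal is only illustrative): both strings are put into canonical order before they are compared. Marks with different combining classes compare equal in either order; marks that share a class do not.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after canonical decomposition and ordering."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Dot below (ccc 220) and acute (ccc 230): order is not significant.
different_classes = canonically_equal("a\u0323\u0301", "a\u0301\u0323")

# Acute and grave (both ccc 230): order is preserved and significant.
same_class = canonically_equal("a\u0301\u0300", "a\u0300\u0301")

print(different_classes, same_class)
```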



- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-26 Thread Peter_Constable

John Hudson wrote on 06/25/2003 06:47:44 PM:

> >This is not. The Unicode Standard makes no assumptions or claims
> >about what the phonological or meaning equivalence of 
> >or  is for Biblical Hebrew.
> 
> But it does make assumptions about the canonical equivalence of the mark
> orders  and , unless my understanding of
> the purpose of combining classes is completely mistaken.

Your understanding on this point is correct.


> My understanding 
> is that any ordering of two marks with different combining classes is 
> canonically equivalent; 

Yes.


> further, I understand that some normalisation forms
> will re-order marks to move marks with lower combining class values closer
> to the base character.

*Every* Unicode normalization form will apply canonical reordering.
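
This is easy to check with Python's standard unicodedata module. The sketch below uses a meteg typed before a qamats on a bet as the sample sequence (an arbitrary choice for illustration) and runs it through all four forms.

```python
import unicodedata

BET, QAMATS, METEG = "\u05D1", "\u05B8", "\u05BD"

# Meteg (ccc 22) typed before qamats (ccc 18).
typed = BET + METEG + QAMATS

normalized = {
    form: unicodedata.normalize(form, typed)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
for form, text in normalized.items():
    print(form, [f"U+{ord(c):04X}" for c in text])
```

In every form the qamats ends up ahead of the meteg, so the typed order is lost whichever form a process happens to apply.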



> * Meteg re-ordering is in some respects even more problematic than 
> multi-vowel re-ordering

And it is because of meteg-vowel ordering distinctions that the ordering 
of things like patah + hiriq should not be solved in any way other than 
the two having the same canonical combining class, because that is exactly 
what will be needed to deal with meteg-vowel ordering distinctions.



- Peter



Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-26 Thread Peter_Constable

Ken Whistler wrote on 06/25/2003 06:57:56 PM:

> People could consider, for example, representation
> of the required sequence:
> 
>   
> 
> as:
> 
>   

So, we want to introduce yet *another* distinct semantic for ZWJ? We've 
got one for Indic, another for Arabic, another for ligatures (similar to 
that for Arabic, but slightly different). Now another that is "don't 
effect any visual change, just be there to inhibit reordering under 
canonical ordering / normalization"?



> The presence of a ZWJ (cc=0) in the sequence would block
> the canonical reordering of the sequence to hiriq before
> qamets. If that is the essence of the problem needing to
> be addressed, then this is a much simpler solution which would
> impact neither the stability of normalization nor require
> mass cloning of vowels in order to give them new combining
> classes.

Yes, it would accomplish all that; and it is a groanable kludge. At least with 
having distinct vowel characters for Biblical Hebrew, we'd come to a point 
we could forget about it, and wouldn't be wincing every time we considered 
it.
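
The blocking effect itself can be confirmed with a short Python sketch using the standard unicodedata module. The base letter (lamed) is an assumption chosen for illustration; the marks are the qamats and hiriq named in the quoted text.

```python
import unicodedata

LAMED, QAMATS, HIRIQ = "\u05DC", "\u05B8", "\u05B4"
ZWJ = "\u200D"  # ZERO WIDTH JOINER, combining class 0

plain = LAMED + QAMATS + HIRIQ          # qamats (ccc 18) before hiriq (ccc 14)
blocked = LAMED + QAMATS + ZWJ + HIRIQ  # a cc=0 character between the marks

plain_nfc = unicodedata.normalize("NFC", plain)
blocked_nfc = unicodedata.normalize("NFC", blocked)

print(plain_nfc == plain)      # reordered: hiriq moves ahead of qamats
print(blocked_nfc == blocked)  # unchanged: the ZWJ stops the reordering
```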


 
> The problem of combinations of vowels with meteg could be
> amenable to a similar approach. OR, one could propose just
> one additional meteq/silluq character, to make it possible
> to distinguish (in plain text) instances of left-side and
> right-side meteq placement, for example.

And the third position of meteg with hataf vowels? Introduce *two* 
additional meteg/silluq characters?



- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels (Hebrew)

2003-06-26 Thread Peter_Constable

Jony Rosenne wrote on 06/26/2003 12:16:22 AM:

> When, in the Bible, one sees two vowels on a given consonant, it isn't so.

That's silly. When one sees two vowels on a given consonant in the Bible, 
it *is* so: the two vowels are written there. It may not correspond to 
actual phonology, i.e. what is spoken, but as has been made clear on many 
occasions, Unicode is not encoding phonology, it is encoding text. And in 
relation to text, your statement is simply wrong.


> There is one vowel for the consonant one sees, and another vowel for an
> invisible consonant. The proper way to encode it is to use some code to
> represent the invisible consonant. Then the problem mentioned below does not
> arise.

The idea of an invisible consonant would amount to encoding a phonological 
entity, which is the kind of thing that was at one time approved for Khmer 
(invisible characters representing inherent vowels), but later turned into 
an albatross, and when I proposed the same thing (invisible inherent 
vowel) for Syloti Nagri, it was made very clear to me that it would not go 
down well with UTC.

Also, the proposed solution of an invisible consonant would leave 
unresolved the problem of meteg-vowel ordering distinctions, while the 
alternate proposal of having meteg and vowels all with a class of 230 
solves both problems at once. Two ad hoc solutions (one for multi-vowel 
ordering, and another for meteg-vowel ordering) must certainly be far less 
preferable than one motivated solution (having characters with canonical 
combining classes that are appropriate for the writing behaviours 
exhibited).
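
Why a shared class solves both problems can be sketched in Python. The function below is a simplified rendering of canonical ordering (a stable sort of each run of marks by combining class), and proposed_ccc is a hypothetical table that gives the Hebrew vowel points and meteg class 230; the exact set of code points covered is an assumption made for illustration.

```python
import unicodedata

def canonical_order(text, ccc=unicodedata.combining):
    """Stable-sort each run of non-starters by combining class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            out.append(ch)
            run = []
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)

# Hypothetical: vowel points U+05B0..U+05BB plus meteg U+05BD share class 230.
VOWELS_AND_METEG = {chr(cp) for cp in range(0x05B0, 0x05BC)} | {"\u05BD"}

def proposed_ccc(ch):
    return 230 if ch in VOWELS_AND_METEG else unicodedata.combining(ch)

LAMED, PATAH, HIRIQ, METEG = "\u05DC", "\u05B7", "\u05B4", "\u05BD"
typed = LAMED + PATAH + HIRIQ
current = canonical_order(typed)                # hiriq moves ahead of patah
proposed = canonical_order(typed, proposed_ccc)  # typed order kept
print(current == typed, proposed == typed)
```

Because the sort is stable, marks that share a class keep the order in which they were typed, which covers vowel-vowel and meteg-vowel sequences alike.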

I invite people to review the discussions from the unicoRe list from last 
December, at which time everyone (including you, Jony) was concluding 
that the solution which I proposed in L2/03-195 was the best solution to 
pursue.


- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-26 Thread Peter_Constable

Michael Everson wrote on 06/25/2003 04:36:20 PM:

[ re Biblical Hebrew ]

> Write it up with glyphs and minimal pairs and people will see the 
> problem, if any. Or propose some solution. (That isn't "add duplicate 
> characters".)

The only solution that UTC is willing to consider I have already submitted 
in a proposal (L2/03-195).



- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-26 Thread Peter_Constable

John Cowan wrote on 06/25/2003 03:15:21 PM:

> I don't understand how the current implementation "breaks BH text".
> At worst, normalization may put various combining marks in a non-traditional
> order, but all alternative orders are canonically equivalent anyway, and
> no (ordinary) Unicode process should depend on any specific order.

No, John, there are distinctions in Biblical Hebrew related to ordering, 
but due to the canonical combining classes these distinctions are all 
neutralized under canonical ordering / normalization. The alternate orders 
are canonically equivalent, but should not have been so.
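
A minimal Python illustration of that neutralization, using the standard unicodedata module. The lamed base with patah and hiriq is a sample sequence chosen for illustration.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

written = LAMED + PATAH + HIRIQ  # patah (ccc 17) first, as written
swapped = LAMED + HIRIQ + PATAH  # hiriq (ccc 14) first

nfc_written = unicodedata.normalize("NFC", written)
nfc_swapped = unicodedata.normalize("NFC", swapped)

# After normalization the two spellings are the same string.
distinction_survives = nfc_written != nfc_swapped
print(distinction_survives)
```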



- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-26 Thread Peter_Constable

Ken Whistler wrote on 06/25/2003 05:29:59 PM:

> > The point is that hiriq before patah is *not* 
> > canonically equivalent to patah before hiriq,
> 
> This is true. 
> 
> > except in the erroneous 
> > assumption of the Unicode Standard: the order of vowels makes words 
> > sound different and mean different things.
> 
> This is not.

Ken, I think you're reading John differently than he intended: the Unicode 
character sequences < hiriq, patah > and < patah, hiriq > *are* 
canonically equivalent, but the requirements for Biblical Hebrew are that 
alternate visual orders would correspond to different vocalizations, and 
thus the visual ordering of these does matter semantically, and therefore 
the encoded orders should *not* be canonically equivalent.
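The neutralization described here can be reproduced with a short sketch. It assumes Python's standard `unicodedata` module and uses LAMED purely as an arbitrary base letter for illustration; the variable and function names are hypothetical. HIRIQ has canonical combining class 14 and PATAH has 17, so canonical ordering always places HIRIQ first, whichever order was typed.

```python
import unicodedata

# LAMED is an arbitrary base letter chosen for illustration only.
LAMED, HIRIQ, PATAH = "\u05DC", "\u05B4", "\u05B7"

# Canonical combining classes of the two vowel points.
classes = {unicodedata.name(c): unicodedata.combining(c) for c in (HIRIQ, PATAH)}

patah_first = LAMED + PATAH + HIRIQ
hiriq_first = LAMED + HIRIQ + PATAH

def same_after_nfd(a, b):
    """True if the two strings are identical once canonically decomposed and ordered."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(classes)
print(same_after_nfd(patah_first, hiriq_first))
```

Because the two sequences compare equal after normalization, any process that normalizes text loses the distinction between them, which is the problem under discussion.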


> The current situation is not optimal for implementations, nor
> does canonically ordered text follow traditional preferences
> for spelling order -- that we can agree on. But I think the
> claims of inadequacy for the representation or rendering
> of Biblical Hebrew text are overblown.

The serious problem is that the writing distinctions that matter cannot 
currently be reliably represented, as they are not preserved under 
canonical ordering / normalization. This is all just a rehash of 
discussions we had on this list back in December, at which time it was 
acknowledged that this was the case, and that this was a problem.



- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Peter_Constable

Andrew C. West wrote on 06/25/2003 09:31:51 AM:

> What I'm suggesting is that although "cui" <0F45, 0F74, 0F72> and "ciu"
> <0F45, 0F72, 0F74> should be rendered identically, the logical ordering of
> the codepoints representing the vowels may represent lexical differences
> that would be lost during the process of normalisation.

How can things that are visually indistinguishable be lexically different? 
We don't encode the phonological distinctions between homographs; we 
encode text.
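For reference, a minimal sketch of what normalization does to the two Tibetan sequences quoted above, assuming Python's standard `unicodedata` module (the helper name `canonical_order` is hypothetical). U+0F72 has combining class 130 and U+0F74 has 132, so both spellings end up in the same canonical order.

```python
import unicodedata

CA, VOWEL_I, VOWEL_U = "\u0F45", "\u0F72", "\u0F74"

cui = CA + VOWEL_U + VOWEL_I   # <0F45, 0F74, 0F72>
ciu = CA + VOWEL_I + VOWEL_U   # <0F45, 0F72, 0F74>

def canonical_order(text):
    """Return (code point, combining class) pairs after NFD normalization."""
    nfd = unicodedata.normalize("NFD", text)
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in nfd]

print(canonical_order(cui))
print(canonical_order(ciu))
```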


- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Peter_Constable

Michael Kaplan wrote on 06/25/2003 10:55:47 AM:

> Let me add that this was the case recently for Hebrew (to mention one
> example). So it is certainly not impossible.

The Hebrew issue is different: that involves things that *are* visually 
distinct, and that distinction cannot be represented in a reliable manner.


- Peter



Re: Revised N2586R

2003-06-25 Thread Peter_Constable

William Overington wrote on 06/25/2003 06:26:25 AM:

> Well, I realize that what I say may, at first glance, possibly appear
> extreme at times, yet please do consider what I write in an objective
> manner.  If Unicode has a WHEELCHAIR SYMBOL then that is a symbol, if
> Unicode encodes a HANDICAPPED SIGN then that is a description of someone to
> whom it is applied, a Boolean sign for all, whatever the disability may be,
> whether it is relevant to the matter in hand or not.  I do wonder whether
> the encoding of the symbol as HANDICAPPED SIGN would be consistent with
> human rights as it would be assisting automated decision making with a
> Boolean flag and providing an infrastructure for such practices.

Wm, the name is simply a unique identifier within the std. A name may be 
somewhat indicative of its function, but is not necessarily so. You could 
call it WHEELCHAIR SYMBOL, but that engineering of the standard is not 
also social engineering, and people may still use it to label individuals 
in a way that may be violating human rights -- we cannot stop that. No 
matter what we call it, end users are not very likely going to be aware of 
the name in the standard; they're just going to look for the shape, and if 
they find it, they'll use it for whatever purpose they choose to.


- Peter



Re: Revised N2586R

2003-06-24 Thread Peter_Constable

William Overington wrote on 06/24/2003 05:32:56 AM:

> In that the document proposes U+2693 for FLEUR-DE-LIS it would seem not
> unreasonable for fontmakers now to be able to produce fonts having a
> FLEUR-DE-LIS glyph at U+2693.

Bad idea. Bad William. No biscuit.

 
> However, what is the correct approach?

No, it is not. The correct approach is to first get something encoded in 
the standard, then create fonts with it at the assigned codepoint. If you 
want to put it in a font in the meantime, use a PUA codepoint, or create a 
font with a different encoding, such as a symbol font. Putting this at 
U+2693 in a Unicode-encoded font violates conformance requirements.
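
A small sketch of the PUA option, assuming Python's standard `unicodedata` module. The helper names `is_private_use` and `first_free_pua` are hypothetical, and the search is limited to the BMP Private Use Area (U+E000..U+F8FF) as a simplifying assumption.

```python
import unicodedata

def is_private_use(cp):
    """True if the code point has General_Category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"

def first_free_pua(used, start=0xE000, end=0xF8FF):
    """Return the first BMP PUA code point not already taken in `used`."""
    for cp in range(start, end + 1):
        if cp not in used and is_private_use(cp):
            return cp
    raise ValueError("no free PUA code point in the given range")

# Hypothetical font that already uses the first two PUA slots.
already_used = {0xE000, 0xE001}
print(f"U+{first_free_pua(already_used):04X}")
```

A glyph parked at such a code point makes no claim about any standard character, so it can be moved to the real code point once one is assigned.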



- Peter



Re: Revised N2586R

2003-06-24 Thread Peter_Constable

Michael Everson wrote on 06/24/2003 05:52:09 AM:

> Yes. Between the databases. For instance. Look, William, I was 
> saying that for instance, an Arizona number plate

Oh yeah, that reminds me. When are you going to propose the SAGUARO 
SYMBOL? My wife's from Arizona; I'll back that one.



- Peter



Re: Revised N2586R

2003-06-24 Thread Peter_Constable

Philippe Verdy wrote on 06/24/2003 04:54:30 AM:

> This symbol [fleur-de-lis] is commonly found and used in some printed books, 
> sometimes as a bullet-like character, but most often to terminate a 
> chapter or add "fioritures" near a title

Well, such examples are better than a sample showing a description of the 
symbol and its significance. But bullets and flourishes aren't necessarily 
candidates for encoding in the UCS. There are an endless number of 
possible flourishes.


> often used in patterns of 
> 3 symbols

If the bullet / flourish is a set of 3 f-d-l in an inverted triangular 
pattern, someone would have to be proposing that combination as a 
distinct, atomic character.


> royalists, when opposed to the later Emperor supporters which used 
> the Eagle, and the Republicans using branches of chest and olive trees).

So, I suppose these are going to be proposed, too.


 
> A similar, culturally linked symbol is the "ermine spot", shortly 
> "ermine"

And the lion, and the griffin, and the dragon, and...


> The ermine spot seems to be found and used in 
> various places, including modern book publications within text, 
> where it is not only considered "decorative" but linked to a strong 
> Breton reference.

Create a doc with samples.

 
- Peter



Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-24 Thread Peter_Constable

Christopher John Fynn wrote on 06/21/2003 08:23:17 PM:

>  Any suggestions as to how to create a standardized work around
>  for these incorrect values?

Propose new characters, and deprecate the old ones?



- Peter



Re: Revised N2586R

2003-06-23 Thread Peter_Constable

Michael Everson wrote on 06/23/2003 07:54:13 AM:

> We have *all* seen the atom sign, and I have, 
> as Liungman points out, seen it on maps, though I don't seem to have 
> such a map here in the house.

But just because a symbol appears on maps, does that mean it should be 
encoded as a character? I've seen a lot of maps that have a pointed cross 
showing four cardinal points of the compass; should we encode that?


> Similarly, the fleur-de-lis is a 
> well-known named symbol which can be used to represent a number of 
> things.

In text? I've seen it on flags, on license plates, on heraldic crests, but 
can't recall seeing it in text.


> I do the best I can. At the end of the day my document won its case 
> and the five characters were accepted.

So, this isn't a new proposal? These characters have already been 
accepted? (If so, that's fine.)



- Peter



Re: Unicode not in Quark 6

2003-06-23 Thread Peter_Constable

John Jenkins wrote on 06/22/2003 05:25:40 PM:

> MySQL is also available for Mac OS X 
> ().  I'm not sure 
> of the status of Unicode support, but it seems to be fine if you're not 
> worrying about collating or similar services.  It's what's used at the 
> moment to host the Unihan database, for example.

It (MySQL for OS X) is also being used for the content management system 
that drives scripts.sil.org, and so far we haven't encountered any 
problems in storing text content using Unicode characters from a variety 
of ranges.


- Peter



Re: Revised N2586R

2003-06-22 Thread Peter_Constable

It seems to me the proposal would present a stronger case if samples were 
available that were something *other* than an explanation of the symbol in 
a dictionary, encyclopaedia, or other reference. It would be similar to 
these kinds of samples if I were to create a proposal using as a sample 
the Phonetic Symbol Guide, but that might not clearly show if a character 
was something that was merely proposed by someone at one time but never 
actually used -- in such a case, taking a sample from Phonetic Symbol 
Guide does not really demonstrate the need to encode as a character for 
text representation. Likewise, the sample for (e.g.) the fleur-de-lis 
doesn't really provide a case that this should be a character to 
facilitate representation in text. It wouldn't be hard to provide a 
comparable descriptive paragraph that began with an image of the Stars and 
Stripes, but I don't think we'd want to encode the US flag as a character.

I'm not saying that I oppose the proposed characters; just that samples of 
a different nature would make for a stronger case.


- Peter



Re: Problem with Arial Unicode MS font for BOLD/ITALICS in PDF

2003-06-20 Thread Peter_Constable


Philippe Verdy wrote on 06/20/2003 03:29:17 PM:

> I think that Italic is to avoid for most Asian scripts, as readers are not
> used to it. For Arabic it may cause problems because of the placement
> of diacritic points.

Thai type designers are extremely creative and not afraid of doing with
Thai type most anything that gets done with Latin type, as well as some
things that I have not seen done with Latin type. Bold and italic? No prob.



- Peter



some resources online

2003-06-20 Thread Peter_Constable

A few years ago, we put together a book of background reading for a
workshop we were doing on implementing writing systems on computers using
current technologies. The readings consisted of a collection of items that
had been prepared by different authors, but there was some coordination of
topics for at least some of the articles.

Anyway, some of the articles in this book are now available online:



The NRSI Model for Implementing Writing Systems: An introduction

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter01


An introduction to keyboard design theory: What goes where?

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=KeybrdDesign


Rendering technologies overview

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter07


An Introduction to TrueType Fonts: A look inside the TTF format

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter08


Challenges in publishing with non-Roman scripts

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter09


TrueType table listing

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixC

Glossary
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=Glossary



- Peter



Re: Chinese "departing" tone marks

2003-06-20 Thread Peter_Constable


"Andrew C. West" <[EMAIL PROTECTED]> wrote on 06/20/2003
09:14:35 AM:

> It hadn't occurred to me that these contoured tone marks could be
> represented in Unicode by means of ligatures. Are there any fonts that
> currently support such ligatures ?

The bigger question is, can your software access the ligatures? We're
working on a font that supports the ligatures using either OpenType or
Graphite, hoping to have it out around August. (You can get a peek
at an alpha -- the page says beta, but against my better judgment -- at
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=DoulosSILfont;
I think this build has OT and Graphite tables, but am not positive.)



- Peter



Re: Chinese "departing" tone marks

2003-06-20 Thread Peter_Constable


Ken Whistler wrote on 06/19/2003 01:35:14 PM:

> P.S. Is somebody collecting the 'Every character has a story'
> stories?

It's a small start, and I can't guarantee how far it will get developed.

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=CatUnicodeCharacterStories



- Peter



RE: Problem with Arial Unicode MS font for BOLD/ITALICS in PDF

2003-06-20 Thread Peter_Constable


> Edward,
>thanks for the response. Is it possible to integrate glyph for
> bold and italic in arialuni.ttf or can I have one font which support all
> the languages and also have related glyph for bold and italic.

Bold and italic need to be separate font files, and these do not exist for
Arial Unicode MS -- and I wouldn't count on such appearing any time soon.


- Peter



Re: Chinese "departing" tone marks

2003-06-20 Thread Peter_Constable


Andrew C. West wrote on 06/20/2003 03:59:10 AM:

> I noticed that in Yuan Jiahua's authoritative overview of Chinese dialects,
> _Hanyu Fangyan Gaiyao_ (2nd ed., 1980), he uses left-stemmed mirrors of the
> ordinary right-stemmed tone marks to indicate tone sandhi, the unmutated tone
> having a right stem, immediately followed by the mutated tone with a left stem

Just so -- these left vs. right stems are distinct for Chinese linguists,
which is why I plan to propose five left-stemmed tone letters.

> (I can send you a scan off-list if you want). The examples he gives include
> marks that look identical to U+02EA and U+02EB, as well as many other left-
> and right-stemmed tone marks that are not currently encoded in Unicode. Are
> these the subject of your proposal by any chance ?

Hard to say without seeing them, but if they are simply contours, then
those are already supported in Unicode by means of ligatures of the five
already there. If it's something else, go ahead and send me the scan (with
bibliographic details, please); if it's just contours, then I've got
samples.
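
To make the "contours as sequences" idea concrete, here is a rough sketch that turns Chao tone digits (1 = lowest, 5 = highest) into a sequence of the five existing tone letters U+02E5..U+02E9. The function name `chao_to_tone_letters` is hypothetical, and whether the sequence actually displays as a single contour glyph depends entirely on the font and rendering software.

```python
def chao_to_tone_letters(digits):
    """Map Chao tone digits such as "51" to a sequence of tone letters.

    U+02E5 is EXTRA-HIGH (level 5) and U+02E9 is EXTRA-LOW (level 1),
    so level n corresponds to code point 0x02EA - n.
    """
    letters = []
    for d in digits:
        level = int(d)
        if not 1 <= level <= 5:
            raise ValueError(f"tone level out of range: {d!r}")
        letters.append(chr(0x02EA - level))
    return "".join(letters)

# A high-falling contour, written 51 in Chao digits.
falling = chao_to_tone_letters("51")
print([f"U+{ord(c):04X}" for c in falling])
```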



- Peter



Re: Chinese "departing" tone marks

2003-06-19 Thread Peter_Constable


Ken Whistler wrote on 06/19/2003 01:35:14 PM:

> In Bopomofo these tone marks are typeset in a separate
> vertical line to the right of the vertical line of
> Bopomofo. In horizontal typesetting, they are either
> over the top of the Bopomofo or set at the upper-right
> shoulder of the Bopomofo character representing the
> 'final' of a syllable.
>
> It is possible that they were originally designed by a
> dyslexic based on the Chao tone letters, but no, they
> are not really intended to be used as particular values
> of those or as an augmentation of those tone letters. Instead,
> as I indicated, they are used with another collection of
> tonal diacritics for extended Bopomofo.
>
> So no, the glyphs haven't been munged. But these obviously
> need better annotation for the names list.

That sounds, then, like these are *not* two of the left-stemmed tone
letters (mirrors of 02E5..02E9) that I'm going to be including in a
proposal for additional modifier characters for tone.



- Peter



Chinese "departing" tone marks

2003-06-18 Thread Peter_Constable

Can anybody enlighten me with regard to the following two characters?

U+02EA ˪ MODIFIER LETTER YIN DEPARTING TONE MARK
U+02EB ˫ MODIFIER LETTER YANG DEPARTING TONE MARK

What do they represent? Where/by whom are they used? Any pointers to 
sample text using them? 
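
As a quick way to see what the character database itself records for these two code points, a minimal sketch using Python's standard `unicodedata` module (the helper name `describe` is hypothetical). It reports only the name and general category; it says nothing about who uses the characters, which is the real question here.

```python
import unicodedata

def describe(cp):
    """Return a one-line summary of a code point from the character database."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} (category {unicodedata.category(ch)})"

for cp in (0x02EA, 0x02EB):
    print(describe(cp))
```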


- Peter



RE: International Font to be Used

2003-06-10 Thread Peter_Constable


Raymond Mercier wrote on 06/09/2003 12:04:58 PM:

> At 12:16 09/06/2003 -0400, you wrote:
> >One (free) tool that will allow you to investigate what blocks of Unicode
> >are actually covered in a font file is:
> >
> >http://pfaedit.sourceforge.net/
>
> And to see what fonts on your disk support specified unicode blocks,
> another free tool at
> http://ourworld.compuserve.com/homepages/RaymondM/unisearch.htm

Addendum: I added this one too.



- Peter



RE: International Font to be Used

2003-06-10 Thread Peter_Constable


Edward H Trager wrote on 06/09/2003 11:16:19 AM:

> One (free) tool that will allow you to investigate what blocks of Unicode
> are actually covered in a font file is:
>
>http://pfaedit.sourceforge.net/

Added to my list of font tools on the web, at

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=fonttoollinks
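
The kind of block-coverage check such tools perform can be sketched in a few lines. This is only an illustration under stated assumptions: the code points are supplied as a plain list (in practice they would be read from the font's cmap table by a font library, a step left out here), the `BLOCKS` table is a tiny hand-picked subset of the ranges in the Unicode Blocks.txt data file, and the function name `block_coverage` is hypothetical.

```python
from collections import Counter

# A few block ranges taken from the Unicode Blocks.txt data file.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0E00, 0x0E7F, "Thai"),
    (0x0F00, 0x0FFF, "Tibetan"),
]

def block_coverage(codepoints):
    """Count how many of the given code points fall in each known block."""
    counts = Counter()
    for cp in codepoints:
        for low, high, name in BLOCKS:
            if low <= cp <= high:
                counts[name] += 1
                break
        else:
            # Not in any block listed in this small sample table.
            counts["(other)"] += 1
    return dict(counts)

# Hypothetical set of code points mapped by some font.
print(block_coverage([0x41, 0x05D0, 0x05B4, 0x0F45, 0x4E00]))
```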



- Peter



Re: UNESCO standard keyboards? (Re: Tamazight/berber language : ....)

2003-06-06 Thread Peter_Constable


Philippe Verdy wrote on 06/06/2003 03:27:04 AM:

> Thanks for this information. However I don't think I stated that
> UNESCO was a standard body

I didn't interpret your statement that way, but some might have interpreted
*Don's* statement as implying SIL and UNESCO were developing ICT standards
for keyboard layouts.


- Peter



Re: Tamazight/berber language : How to send mail, write word documents ....

2003-06-06 Thread Peter_Constable


Miikka-Markus Alhonen wrote on 06/06/2003 01:31:55 AM:

> > (Epsilon? Where'd that come from?)

> Epsilon is the second character in the alphabet chart corresponding to
> Arabic 'ayn (U+0639, a voiced pharyngeal fricative).

Right. I missed that at first. The character that appears here is U+025B
LATIN SMALL LETTER OPEN E. Perhaps what they should have used for the
pharyngeal fricative, though, is U+01B9 LATIN SMALL LETTER EZH REVERSED.



- Peter



Re: Tamazight/berber language : How to send mail, write word documents ....

2003-06-05 Thread Peter_Constable


Philippe Verdy wrote on 06/05/2003 11:10:45 AM:

> However the interesting part of your question for discussion in this list is:
> - Which Unicode character should be used to encode the spacing ring?

That is U+1D52 MODIFIER LETTER SMALL O.


> - Should you use a Greek Gamma or a Latin Gamma, and a Greek epsilon?

(Epsilon? Where'd that come from?) That is U+0263 LATIN SMALL LETTER GAMMA.



- Peter


