Re: missing .GIF's for ideographs on unicode.org?

2003-07-16 Thread Richard Cook
"Ostermueller, Erik" wrote:
> 
> I apologize if you all have already discussed this.
> 
> At unicode.org, when I click this link,
> 
> http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2
> 
> I'm expecting to see a little square GIF that displays U+2.
> Instead, I see "N/A".
> 
> Shouldn't there be a link like this?
> http://www.unicode.org/cgi-bin/refglyph?24-2
> 
> What am I doing wrong here?
> 
Erik, I think you are correct. The link should be like so:

http://www.unicode.org/cgi-bin/refglyph?24-2

I'm guessing this just hasn't been implemented yet.

-Richard



missing .GIF's for ideographs on unicode.org?

2003-07-16 Thread Ostermueller, Erik
I apologize if you all have already discussed this.

At unicode.org, when I click this link,

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2

I'm expecting to see a little square GIF that displays U+2.
Instead, I see "N/A".

Shouldn't there be a link like this?
http://www.unicode.org/cgi-bin/refglyph?24-2

What am I doing wrong here?

Thanks,

Erik O.



Re: Aramaic, Samaritan, Phoenician

2003-07-16 Thread Thomas M. Widmann
Michael Everson <[EMAIL PROTECTED]> writes:

> At 20:17 +0100 2003-07-15, Thomas M. Widmann wrote:
> >
> > But if that criterion is applied, surely Georgian Xucuri/Khutsuri
> > should be separated from Georgian Mxedruli/Mkhedruli: Although
> > there roughly is a one-to-one correspondence between the two, and
> > although both are generally applied to the same language (though
> > normally to different stages of it), they definitely are not
> > mutually intelligible (and in fact knowledge of Xucuri seems to be
> > quite low in Georgia).
> 
> The UTC has agreed that we should do this. After 8 years or so of my
> whining ;-)

That's excellent news!

Well whined! ;-)

/Thomas
-- 
Thomas Widmann, MA  +44  141 419 9872   Glasgow, Scotland, EU
[EMAIL PROTECTED] http://www.widmann.uklinux.net



Hebrew with Aramaic, Phoenician etc

2003-07-16 Thread Peter Kirk
I asked the following question on the b-hebrew and biblical-languages 
lists (http://lists.ibiblio.org/mailman/listinfo/b-hebrew, 
http://lists.ibiblio.org/mailman/listinfo/biblical-languages):



Are there scholarly publications (more recent than BDB!) which quote 
inscriptional Aramaic, Phoenician, Samaritan, paleo-Hebrew etc as well 
as Hebrew? In such cases, what scripts are used for Aramaic, 
Phoenician etc? BDB (1906) quoted these and even south Arabian 
inscriptions in Hebrew script. But what is the modern practice? Are 
ancient alphabets (other than Hebrew, Arabic, Syriac etc which are in  
modern use) ever used in such publications? Are these languages ever 
transcribed in Hebrew script, or only in Latin script transliteration? 
I am interested in practice in Israeli journals in modern Hebrew as 
well as in journals in western languages.

Some responses I have received:

From a PhD student in Semitics at a major US university:

As far as I know, they are normally transcribed in Latin or Hebrew
letters. There may be some need for Samaritan as its own script, but
generally speaking the epigraphic scripts are better hand-drawn where
necessary.

From a Jewish professor at a US university:

Today, even Israeli academic (Hebrew-language) journals usually prefer
Latin transcription rather than Hebrew, though publications meant for the
lay public often use Hebrew. 

My personal feeling is that using specific scripts for any but the most
commonly-studied languages would be lost on the readership of all but the
most specialized publications. 
 

From a PhD candidate in early Judaism in Canada:

Current scholarly practice is to transcribe such texts with either the
square "Hebrew" script (e.g., Discoveries in the Judean Desert; Syrian
Semitic Inscriptions) or transliteration (e.g., Gogel's Grammar of
Epigraphic Hebrew). As for Israeli scholars, Kutscher's _The Language and
Linguistic Background of the Isaiah Scroll_ even transcribes Syriac and
Ugaritic and some Arabic (as well as Phoenician, Samaritan, Lachish,
Elephantine, Palmyrene, Mandean, Gaonic) into "Hebrew" script, although
El-Amarna words are transcribed into Latin characters, and Arabic words may
also be in Arabic script or transliteration.

Two things, however, may be worth considering for Unicode:

(1) Although it is possible to transcribe inscriptional numerals as Arabic
(i.e. Western) numerals, some (e.g., Gogel) still reproduce their
inscriptional shapes in transcription. 

(2) Clarification on how to note uncertain readings in transcription (a
circle or dot above the uncertain letter). I've been using HEBREW MARK
MASORA CIRCLE 05AF and HEBREW MARK UPPER DOT 05C4 for this purpose, but I'm
not sure if this is recommended practice.
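
As a rough illustration of point (2), here is a minimal Python sketch of the approach described: a combining mark placed directly after the letter whose reading is uncertain. The helper name mark_uncertain and the sample three-letter reading are hypothetical, and this is not a statement of recommended practice.

```python
import unicodedata

# The two marks named in the message, by code point.
MASORA_CIRCLE = "\u05AF"  # HEBREW MARK MASORA CIRCLE
UPPER_DOT = "\u05C4"      # HEBREW MARK UPPER DOT


def mark_uncertain(letter: str, mark: str = UPPER_DOT) -> str:
    """Return the letter followed by a combining mark flagging it as uncertain."""
    if len(letter) != 1:
        raise ValueError("expected a single base letter")
    return letter + mark


# Hypothetical reading: alef, bet (uncertain), gimel.
reading = "\u05D0" + mark_uncertain("\u05D1") + "\u05D2"

# List each code point with its Unicode name to show what was stored.
for ch in reading:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Both marks are nonspacing combining marks, so they stay attached to the preceding letter in the text stream; how they render depends on the font.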

I'll let you all know if I get any more relevant feedback.

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: Re: Article on Unicode in Globalization Insider

2003-07-16 Thread vladimirg
> >http://www.lisa.org/archive_domain/newsletters/2003/
> >3.2/lommel_unicode.html
>
> This link seems to be broken. I get a message *Our apologies*
> *The page you requested is not available.*

I guess you just have to combine the whole URL properly into one line.

Vladimir



Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-16 Thread Michael Everson
William.

If CENELEC wishes to standardize a set of icons, they will do so. If 
they have a need to interchange data using those icons, they will (if 
they are wise) come to us and ask to encode them. If they want to use 
the Private Use Area before they do that, they will.

Please don't tell us all about it over and over again, as you have 
done. If you want to talk to CENELEC, do so. Please stop trying to 
peddle your PUA schemes for CENELEC to us.

I maintain the ConScript Unicode Registry, which contains PUA 
assignments. I do not promulgate those on this list. (Apart from that 
fun testing of the Phaistos implementation some time ago.)

Roozbeh and I assigned two unencoded characters for Afghanistan to 
the PUA, and we encourage implementors to use them until such time as 
the characters are encoded.

We do not spend oceans of digital ink evangelizing our brilliant 
schemes to the Unicode list.

> It is essentially a matter for end users of the system, just as the
> two Private Use Area characters being suggested in another thread of
> this forum in relation to Afghanistan are a matter for end users of
> the Unicode Standard and does not affect the content of the Unicode
> Standard itself.

Then go talk about it with the users of the system.

> Code points for the symbols are needed now or in the near future.

Are they? By whom? And if they need to use the PUA, they can do so. 
It's Private.

> It remains to be seen what will be decided as the built-in font for the
> European Union implementation of the DVB-MHP specification.  It might be the
> minimum font of the DVB-MHP specification or it might be more comprehensive.
> For example, should Greek characters be included?  Should weather symbols be
> included?  These and many other issues remain to be decided.

The minimum font for any specification for Europe should be the 
MES-2. If you are talking to these people, tell them.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-16 Thread Philippe Verdy
On Wednesday, July 16, 2003 12:33 PM, William Overington <[EMAIL PROTECTED]> wrote:

> Peter Constable wrote as follows.
> 
> I have posted the suggested code points within the Cenelec hosted
> discussion some time ago.
> 
> > > and who might like to know of this
> > > suggestion.  Also, the symbols might well be used in hardcopy
> > > television programme listing magazines, so it would be desirable
> > > to have them available in fonts.
> > 
> > Think about the workflow for such magazines and then tell me again
> > you're not suggesting PUA codepoints for use in interchange.
> 
> Well, I am here suggesting Private Use Area code points for
> interchange, both in interactive broadcasting and in typesetting of
> magazines. 
> 
> Where I was specifically not suggesting interchange of Private Use
> Area code points was (in other threads) in the use of Private Use
> Area code points for precomposed characters which are display glyphs
> for sequences of Unicode characters, where such display glyphs are
> accessed using a eutocode typography file.

Given that Java already allows using resources such as icon bitmaps or
classes, and that it also fully supports the PUA. Given that the built-in
core Java engine will certainly include the appropriate minimum fonts
to support these characters. Given that it will work within the private
domain of interactive television.

Given that the navigation code will be broadcasted as compiled Java
file archives that may contain all the necessary resources as
completely embedded documents.

Do we really need to define these characters in Unicode? Your
experimentation can still start using some PUA of its choices,
and embedded fonts for the symbols you need, and it will not
require an allocation.

The definition of an open standard normally requires a prior definition,
approvals from distinct actors, regulators, a standardization body
or forum, or a community of independent users, and an effective
implementation. The initial launch of the service does not need a fixed
assignment for these symbols in Unicode. Such usage of symbols
will start using private collections of symbols in icons or fonts.

This does not restrict the required usage for documentation of these
symbols and their usage, which can use a custom font used either
in the broadcasted Java application (and can be changed at anytime
on each broadcasted program, according to editor's needs). Use of
conventional symbols that will look ugly in various countries or
cultures will start by a lot of experimentation (including meteorological
symbols whose use in plain-text seems ugly, when viewers will prefer to
see maps or will want to benefit from a rich-text layout).

Why couldn't this service use a web-like (HTML) navigation system,
with hyperlinks? When I look at the remote control for my
Teletext-enabled TV set, I already have most of the tools needed,
and I would not like to have more than a dozen supplementary
buttons. In fact there are already 4 navigation buttons (colored
Red, Green, Blue, Yellow), and the numeric keypad to specify the
page number to view.

In the existing Teletext service, which is based on the legacy
Videotex and ANSI escape sequences to control the layout
and presentation, users don't care about the encoding.

But Teletext applications are limited in their presentation by:
- the number of supported characters
- no support for bitmaps (only mosaic graphics)
- very few variations for the font sizes
- limited content of the screen, typically 24x40 characters
- few colors (8)

Adding support for Unicode and Java will allow a
richer and more interesting experience on this revised
Teletext service, which was designed 20 years ago and
has been widely available on TV sets for only 10 years (before that
you needed a separate "decoder").

Which content will be appropriate to broadcast on interactive
TV channels is still something to discuss.

But the audio description system for subtitles already exists
on most European broadcasted channels (page 888 of their
Teletext service), which are encoded in the normally not
displayed top and bottom rows of video frames (that's why
they are often removed on satellite or cable services, to
limit the necessary bandwidth for each analog channel).

Digital broadcasts with MPEG4 will change the landscape,
but there are still other competing technologies, notably within
the MPEG standard itself, which supports extensions
commonly found on DVDs... If the intent is to reduce costs
by reusing other existing standards, I can't see why the
existing technologies used in Video-DVD can't be used in
interactive digital broadcasting.

In any case, the system will not use only plain-text: it will
support many media-formats, and it will require an "envelope"
format to embed and multiplex them in the broadcasted
program. This format will be rich enough to allow specifying
non-textual data (such as Java classes or JARs) or meta-data.

Why then is there a need to encode 

Re: Article on Unicode in Globalization Insider

2003-07-16 Thread Peter Kirk
On 16/07/2003 03:19, Alex Lam wrote:

http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h
tml


 

This link seems to be broken. I get a message "*Our apologies*
*The page you requested is not available.*"
--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: Article on Unicode in Globalization Insider

2003-07-16 Thread Peter Kirk
On 16/07/2003 03:19, Alex Lam wrote:

http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h
tml


 

Ah, I see the problem is that the final "tml" has become detached from 
the URL, already in the source I received. That's the problem with URLs 
as long as that. I added the "tml" in my browser window, and now I have 
the article.
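
For anyone who wants to repair such a URL mechanically, here is a minimal Python sketch of what I did by hand. The function name rejoin_wrapped_url is hypothetical, and it assumes the caller has already picked out the lines that belong to one URL.

```python
def rejoin_wrapped_url(lines):
    """Join the pieces of a URL that a mail client wrapped across lines."""
    pieces = []
    for line in lines:
        # Drop surrounding whitespace and any leading reply-quote markers
        # such as ">" or "> >".
        piece = line.strip().lstrip("> ")
        if piece:
            pieces.append(piece)
    return "".join(pieces)


# The URL as it arrived, with the final "tml" detached.
wrapped = [
    "http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h",
    "tml",
]
print(rejoin_wrapped_url(wrapped))
```

The same function copes with the quoted form seen in replies, where each piece carries ">" markers.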

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




Re: [Private Use Area] Audio Description, Subtitle, Signing

2003-07-16 Thread William Overington
Peter Constable wrote as follows.

>William Overington wrote on 07/15/2003 05:33:22 AM:
>
>> >William, CENELEC is an international standards body. Such bodies either
>> >create their own standards or use other international standards. They do
>> >not use PUA codepoints.
>>
>> Well, the fact of the matter is that Cenelec is trying to achieve a
>> consensus for the implementation of interactive television within the
>> European Union
>
>And that does not require PUA codepoints; moreover, your response does not
>escape the fact I was pointing out that a standards body will not be
>publishing standards that make reference to PUA codepoints.

Please have a look at what Cenelec is doing in trying to achieve a consensus
for the implementation of interactive television within the European Union.
This particular project for the European Commission is trying to achieve a
consensus for the implementation of interactive television within the
European Union.  Your comments seem to relate to standards bodies generally
or as to how Cenelec proceeds generally.  This project is a particular
project trying to achieve a consensus for the implementation of interactive
television within the European Union.  The difference is that things need to
move forward promptly.  There are lots of aspects, such as how many buttons
to have on a hand-held infra-red control device for end user interaction
with a running Java program (that is, the _minimum_ twenty of the DVB-MHP
specification, or some more) and such as whether mouse events should be
accessible to end users (as the DVB-MHP specification has mouse event access
as optional in interactive televisions) and so on.

What you write in relation to most projects carried out by standards bodies
may well be true, yet I was writing specifically about one particular
project being run by Cenelec.

>> In view of the fact that the interactive television system (DVB-MHP, Digital
>> Video Broadcasting - Multimedia Home Platform http://www.mhp.org ) uses Java
>> and Java uses Unicode it is then a matter of deciding how to be able to
>> signal the symbols in a Unicode text stream.
>
>And they won't be standardizing on symbols encoded using PUA codepoints.

The "deciding" is not about something to incorporate into the DVB-MHP
standard.  It is a matter of trying to gain a consensus as to how to signal
those symbols at the present time and in the near future (that is, until (if
and when) some regular Unicode code points are achieved) within Java
programs which run upon the DVB-MHP platform and in fonts which are used
upon the DVB-MHP platform.  It is essentially a matter for end users of the
system, just as the two Private Use Area characters being suggested in
another thread of this forum in relation to Afghanistan are a matter for end
users of the Unicode Standard and does not affect the content of the Unicode
Standard itself.

>> In view of the fact that the process of getting regular Unicode code points
>> for the symbols would take quite a time, and indeed that there is as yet no
>> agreement on which symbols to use, and that the implementation of
>> interactive television needs to proceed, it seems to me that putting forward
>> three specific Private Use Area code points for the symbols at this time is
>> helpful to the process.
>
>Then you obviously don't understand the process.

Well, maybe I don't.  However, the fact of the matter is that sooner or
later some code points are needed to signal those symbols.  I have put
forward three suggested code points.  I also mentioned them in this mailing
list.  My specific suggestions are in the Private Use Area and do not clash
with various uses of the Private Use Area known to me.  So three specific
code points have been mentioned and I suggest that having those three code
points published both in the Cenelec forum and here is beneficial, as if they
are used then various potential problems are avoided which could have arisen
if some other choices (such as three unused code points in regular Unicode or
several different sets of three code points in regular Unicode) were used.
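
The two properties claimed above (that the suggested code points lie in the Private Use Area and do not clash with other known private uses) can be checked mechanically. The following is only a sketch in Python: the three code point values and the table of known uses are hypothetical placeholders, since the actual suggestions are not listed in this message.

```python
import unicodedata


def is_private_use(cp: int) -> bool:
    """True if the code point has General_Category Co (Private Use)."""
    return unicodedata.category(chr(cp)) == "Co"


def clashes(proposed, known_uses):
    """Return the proposed code points that a known private agreement already uses."""
    return [cp for cp in proposed if cp in known_uses]


# Hypothetical values for illustration only; the message does not list them.
proposed = [0xE100, 0xE101, 0xE102]
known_uses = {0xE000: "some other private agreement"}

for cp in proposed:
    print(f"U+{cp:04X} private use: {is_private_use(cp)}")
print("clashes:", clashes(proposed, known_uses))
```

A check of this kind only covers the private agreements one happens to know about, which is the limit already acknowledged above.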

>> >Such things are *not* useful. They do not achieve consistency, not in the
>> >short term, and most certainly not in the long term. If consistency is
>> >needed, the standardization process is used to establish standardized
>> >representations.
>>
>> Well, what is the alternative?
>
>The alternative to agreeing on a standard? None, but why would you need an
>alternative?

Code points for the symbols are needed now or in the near future.  The
symbol designs are not yet agreed.  Obtaining regular Unicode points, if
achievable, would take quite a time.  With my suggested code points
published, decisions on which symbol designs to use and getting them into
use with everyone using the same code points could happen within a few days.

>> The code points are in the Private Use Area,
>> so the suggestion avoids the possibility of a non-conformant use of a
>> regular Unicode code point.
>
>T

Article on Unicode in Globalization Insider

2003-07-16 Thread Alex Lam
http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h
tml




Re: Combining diacriticals and Cyrillic

2003-07-16 Thread Philippe Verdy
On Wednesday, July 16, 2003 8:55 AM, William Overington <[EMAIL PROTECTED]> wrote:

> Peter Constable wrote as follows.
> 
> > William Overington wrote on 07/15/2003 07:22:22 AM:
> > 
> > > No, the Private Use Area codes would not be used for interchange,
> > > only locally for producing an elegant display in such
> > > applications as chose to use them.  Other applications could
> > > ignore their existence. 
> > 
> > Then why do you persist in public discussion of suggested
> > codepoints for such purposes? If it is for local, proprietary use
> > internal to some implementation, then the only one who needs to
> > know, think or care about these codepoints is the person creating
> > that implementation. 
> 
> The original enquiry sought advice about how to proceed.  I posted
> some ideas of a possible way to proceed.  If the idea of using a
> eutocode typography file is taken up and software which uses it is
> produced, then it would be reasonable to have a published list of
> Private Use Area code points for the precomposed characters which are
> to be available, as in that way the output stream from the processing
> could be viewed with a number of fonts from a variety of font makers
> without needing to change the eutocode typography file if one changed
> font. 
> 
> I have not published many of my suggested code points in this forum
> precisely because a few people do not want them published here.  For
> example, there is the ViOS-like system for a three-dimensional visual
> indexing system for use in interactive broadcasting.
> 
> > > Publishing a list of Private Use Area code points would
> > 
> > have absolutely no purpose at all.
> > 
> > 
> > > mean that such
> > > display could be produced using a choice of fonts from various
> > > font makers using the same software
> > 
> > Now you are talking interchange. Interchange means more than just
> > person A sends a document to person B. It means that person A's
> > document works with person B's software using person C's font. (An
> > alternate term that is often used, interoperate, makes this
> > clearer.) 
> 
> Exactly.  This is why publishing the list of Private Use Area code
> point assignments for the precomposed characters is a good idea. 
> Person B can display the document and then wonder if it might look
> better with that font made by person D and have a try with that font.
> If the list of Private Use Area code point assignments for the
> precomposed characters has been published and both C and D have used
> the list to add the extra Cyrillic characters into their fonts, then
> the published list of Private Use Area code point assignments for the
> precomposed characters has helped to achieve interoperability.
> 
> > > I feel that an important thing to remember is the dividing line
> > > between what is in Unicode and what is in particular advanced
> > > format font technology solutions
> > 
> > And best practice for advanced format font technologies eschews PUA
> > codepoints for glyph processing.
> 
> Who decides upon what is best practice?
> 
> > You've been told that several times by
> > people who have expertise in advanced font technologies, an area in
> > which you are not deeply knowledgeable or experienced, by your own
> > admission. 
> 
> Well, it is not a matter of an "admission" as if dragged out of me
> under examination by counsel in a courtroom.  I openly stated the
> limits of my knowledge in that area, not as a retrospective defence
> yet as an up-front expression of the limitation of my knowledge when
> putting forward ideas, specifically so as not to produce any
> incorrect impression as to expertise in that area.
> 
> > > yet they are not suitable for platforms such as Windows 95 and
> > > Windows 98, whereas a eutocode typography file approach would be
> > > suitable for those platforms and for various other platforms.
> > 
> > Wm, if someone wanted, they could create an advanced font
> > technology to work on DOS, but why bother? Who's going to create
> > all the new software that works with that technology, and make it
> > to work within the limitations of a DOS system?
> 
> Yet I am not suggesting a system to work on DOS.
> 
> > Your idea is at best a mental exercise, and even if you or
> > someone else built an implementation, what is not needed is some
> > public agreement on PUA codepoints for use in glyph processing.
> 
> When you say "agreement" I am not suggesting agreement in some formal
> manner.  It is more like the authorship of a story where people may
> read it or not as they choose.  Yet if people do read the story, or
> watch a television or movie implementation of it, a common culture
> may come to exist amongst the readers which can be applied in other
> circumstances. 
> 
> For example, "it's as if on a holodeck and a character says 'arch'
> and " is something which people who have watched Star Trek The
> Next Generation may use as a cultural way of expressing something.
> 
> The original enquir

Re: Combining diacriticals and Cyrillic

2003-07-16 Thread William Overington
Peter Constable wrote as follows.

>William Overington wrote on 07/15/2003 07:22:22 AM:
>
>> No, the Private Use Area codes would not be used for interchange, only
>> locally for producing an elegant display in such applications as chose to
>> use them.  Other applications could ignore their existence.
>
>Then why do you persist in public discussion of suggested codepoints for
>such purposes? If it is for local, proprietary use internal to some
>implementation, then the only one who needs to know, think or care about
>these codepoints is the person creating that implementation.

The original enquiry sought advice about how to proceed.  I posted some
ideas of a possible way to proceed.  If the idea of using a eutocode
typography file is taken up and software which uses it is produced, then it
would be reasonable to have a published list of Private Use Area code points
for the precomposed characters which are to be available, as in that way the
output stream from the processing could be viewed with a number of fonts
from a variety of font makers without needing to change the eutocode
typography file if one changed font.

I have not published many of my suggested code points in this forum
precisely because a few people do not want them published here.  For
example, there is the ViOS-like system for a three-dimensional visual
indexing system for use in interactive broadcasting.

>> Publishing a list of Private Use Area code points would
>
>have absolutely no purpose at all.
>
>
>> mean that such
>> display could be produced using a choice of fonts from various font makers
>> using the same software
>
>Now you are talking interchange. Interchange means more than just person A
>sends a document to person B. It means that person A's document works with
>person B's software using person C's font. (An alternate term that is often
>used, interoperate, makes this clearer.)

Exactly.  This is why publishing the list of Private Use Area code point
assignments for the precomposed characters is a good idea.  Person B can
display the document and then wonder if it might look better with that font
made by person D and have a try with that font.  If the list of Private Use
Area code point assignments for the precomposed characters has been
published and both C and D have used the list to add the extra Cyrillic
characters into their fonts, then the published list of Private Use Area
code point assignments for the precomposed characters has helped to achieve
interoperability.
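
To make the scenario concrete, here is a small Python sketch of the check person B would in effect be relying on: whether the fonts from C and D each cover every code point in the published list. The list contents and the font coverage sets are hypothetical, and the function name missing_from_font is my own.

```python
def missing_from_font(published, font_coverage):
    """Return the published code points that a font does not cover, in order."""
    return sorted(set(published) - set(font_coverage))


# Hypothetical published list of four precomposed characters.
published = [0xEF00, 0xEF01, 0xEF02, 0xEF03]

# Hypothetical coverage of two fonts: C follows the whole list, D only part.
font_c = {0xEF00, 0xEF01, 0xEF02, 0xEF03}
font_d = {0xEF00, 0xEF01}

for name, coverage in (("C", font_c), ("D", font_d)):
    gaps = missing_from_font(published, coverage)
    print(name, [f"U+{cp:04X}" for cp in gaps])
```

If both fonts report no gaps, the same processed output can be viewed with either font without changing anything else.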

>> I feel that an important thing to remember is the dividing line between what
>> is in Unicode and what is in particular advanced format font technology
>> solutions
>
>And best practice for advanced format font technologies eschews PUA
>codepoints for glyph processing.

Who decides upon what is best practice?

>You've been told that several times by
>people who have expertise in advanced font technologies, an area in which
>you are not deeply knowledgeable or experienced, by your own admission.

Well, it is not a matter of an "admission" as if dragged out of me under
examination by counsel in a courtroom.  I openly stated the limits of my
knowledge in that area, not as a retrospective defence yet as an up-front
expression of the limitation of my knowledge when putting forward ideas,
specifically so as not to produce any incorrect impression as to expertise
in that area.

>> yet they are not suitable for platforms such as Windows 95 and
>> Windows 98, whereas a eutocode typography file approach would be suitable
>> for those platforms and for various other platforms.
>
>Wm, if someone wanted, they could create an advanced font technology to
>work on DOS, but why bother? Who's going to create all the new software
>that works with that technology, and make it to work within the limitations
>of a DOS system?

Yet I am not suggesting a system to work on DOS.

>Your idea is at best a mental exercise, and even if you or
>someone else built an implementation, what is not needed is some public
>agreement on PUA codepoints for use in glyph processing.

When you say "agreement" I am not suggesting agreement in some formal
manner.  It is more like the authorship of a story where people may read it
or not as they choose.  Yet if people do read the story, or watch a
television or movie implementation of it, a common culture may come to exist
amongst the readers which can be applied in other circumstances.

For example, "it's as if on a holodeck and a character says 'arch' and "
is something which people who have watched Star Trek The Next Generation may
use as a cultural way of expressing something.

The original enquiry referred as if a number of people are trying to solve
the problem.  If a list of the characters is published with Private Use Area
code points from U+EF00 upwards, then they could all, if they so choose, use
that set of code points and it might help in font interoperability,
certainly if they choose to implement a eutocode typography file sys