Re: missing .GIF's for ideographs on unicode.org?
"Ostermueller, Erik" wrote: > > I apologize if you all have already discussed this. > > At unicode.org, when I click this link, > > http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2 > > I'm expecting to see a little square GIF that displays U+2. > Instead, I see "N/A". > > Shouldn't there be a link like this? > http://www.unicode.org/cgi-bin/refglyph?24-2 > > What am I doing wrong here? > Erik, I think you are correct. The link should be like so: http://www.unicode.org/cgi-bin/refglyph?24-2 I'm guessing this just hasn't been implemented yet. -Richard
missing .GIF's for ideographs on unicode.org?
I apologize if you all have already discussed this.

At unicode.org, when I click this link,
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=2
I'm expecting to see a little square GIF that displays U+2. Instead, I see "N/A".

Shouldn't there be a link like this?
http://www.unicode.org/cgi-bin/refglyph?24-2

What am I doing wrong here?

Thanks, Erik O.
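The two CGI URL shapes under discussion are easy to generate programmatically. Here is a minimal Python sketch; the "24" prefix in the refglyph form is copied verbatim from the messages above, and its exact meaning (possibly an image size) is an assumption:

```python
def unihan_url(codepoint: int) -> str:
    # Unihan database lookup URL, as used in the message above
    return f"http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint={codepoint:X}"

def refglyph_url(codepoint: int, prefix: int = 24) -> str:
    # Reference-glyph URL in the "24-<hex>" form quoted above;
    # the meaning of the 24 prefix is an assumption, not documented here
    return f"http://www.unicode.org/cgi-bin/refglyph?{prefix}-{codepoint:X}"

print(unihan_url(0x2))    # ends with "codepoint=2"
print(refglyph_url(0x2))  # ends with "refglyph?24-2"
```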
Re: Aramaic, Samaritan, Phoenician
Michael Everson <[EMAIL PROTECTED]> writes:

> At 20:17 +0100 2003-07-15, Thomas M. Widmann wrote:
>
> > But if that criterion is applied, surely Georgian Xucuri/Khutsuri
> > should be separated from Georgian Mxedruli/Mkhedruli: Although
> > there roughly is a one-to-one correspondence between the two, and
> > although both are generally applied to the same language (though
> > normally to different stages of it), they definitely are not
> > mutually intelligible (and in fact knowledge of Xucuri seems to be
> > quite low in Georgia).
>
> The UTC has agreed that we should do this. After 8 years or so of my
> whining ;-)

That's excellent news! Well whined! ;-)

/Thomas

--
Thomas Widmann, MA +44 141 419 9872
Glasgow, Scotland, EU [EMAIL PROTECTED]
http://www.widmann.uklinux.net
Hebrew with Aramaic, Phoenician etc
I asked the following question on the b-hebrew and biblical-languages lists (http://lists.ibiblio.org/mailman/listinfo/b-hebrew, http://lists.ibiblio.org/mailman/listinfo/biblical-languages):

Are there scholarly publications (more recent than BDB!) which quote inscriptional Aramaic, Phoenician, Samaritan, paleo-Hebrew etc. as well as Hebrew? In such cases, what scripts are used for Aramaic, Phoenician etc.? BDB (1906) quoted these, and even south Arabian inscriptions, in Hebrew script. But what is the modern practice? Are ancient alphabets (other than Hebrew, Arabic, Syriac etc. which are in modern use) ever used in such publications? Are these languages ever transcribed in Hebrew script, or only in Latin script transliteration? I am interested in practice in Israeli journals in modern Hebrew as well as in journals in western languages.

Some responses I have received:

From a PhD student in Semitics at a major US university:

As far as I know, they are normally transcribed in Latin or Hebrew letters. There may be some need for Samaritan as its own script, but generally speaking the epigraphic scripts are better hand-drawn where necessary.

From a Jewish professor at a US university:

Today, even Israeli academic (Hebrew-language) journals usually prefer Latin transcription rather than Hebrew, though publications meant for the lay public often use Hebrew. My personal feeling is that using specific scripts for any but the most commonly studied languages would be lost on the readership of all but the most specialized publications.

From a PhD candidate in early Judaism in Canada:

Current scholarly practice is to transcribe such texts with either the square "Hebrew" script (e.g., Discoveries in the Judean Desert; Syrian Semitic Inscriptions) or transliteration (e.g., Gogel's Grammar of Epigraphic Hebrew). As for Israeli scholars, Kutscher's _The Language and Linguistic Background of the Isaiah Scroll_ even transcribes Syriac and Ugaritic and some Arabic (as well as Phoenician, Samaritan, Lachish, Elephantine, Palmyrene, Mandean, Gaonic) into "Hebrew" script, although El-Amarna words are transcribed into Latin characters, and Arabic words may also be in Arabic script or transliteration.

Two things, however, may be worth considering for Unicode:

(1) Although it is possible to transcribe inscriptional numerals as Arabic (i.e. Western) numerals, some (e.g., Gogel) still reproduce their inscriptional shapes in transcription.

(2) Clarification on how to note uncertain readings in transcription (a circle or dot above the uncertain letter). I've been using HEBREW MARK MASORA CIRCLE 05AF and HEBREW MARK UPPER DOT 05C4 for this purpose, but I'm not sure if this is recommended practice.

I'll let you all know if I get any more relevant feedback.

--
Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
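In plain text, the two combining marks mentioned in point (2) above are applied simply by placing them after the base letter in the code stream. A minimal Python sketch of that convention follows; whether this is recommended practice is exactly the open question raised in the message:

```python
import unicodedata

MASORA_CIRCLE = "\u05AF"  # HEBREW MARK MASORA CIRCLE
UPPER_DOT = "\u05C4"      # HEBREW MARK UPPER DOT

def mark_uncertain(letter: str, mark: str = MASORA_CIRCLE) -> str:
    # A combining mark follows its base character in the code stream
    return letter + mark

alef = "\u05D0"  # HEBREW LETTER ALEF
print(unicodedata.name(MASORA_CIRCLE))   # HEBREW MARK MASORA CIRCLE
print(len(mark_uncertain(alef)))         # 2 (two code points, one visual letter)
```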
Re: Re: Article on Unicode in Globalization Insider
> http://www.lisa.org/archive_domain/newsletters/2003/
> 3.2/lommel_unicode.html
>
> This link seems to be broken. I get a message *Our apologies*
> *The page you requested is not available.*

I guess you just have to combine the whole URL properly into one line.

Vladimir
Re: [Private Use Area] Audio Description, Subtitle, Signing
William,

If CENELEC wishes to standardize a set of icons, they will do so. If they have a need to interchange data using those icons, they will (if they are wise) come to us and ask to encode them. If they want to use the Private Use Area before they do that, they will. Please don't tell us all about it over and over again, as you have done. If you want to talk to CENELEC, do so. Please stop trying to peddle your PUA schemes for CENELEC to us.

I maintain the ConScript Unicode Registry, which contains PUA assignments. I do not promulgate those on this list. (Apart from that fun testing of the Phaistos implementation some time ago.) Roozbeh and I assigned two unencoded characters for Afghanistan to the PUA, and we encourage implementors to use them until such time as the characters are encoded. We do not spend oceans of digital ink evangelizing our brilliant schemes to the Unicode list.

> It is essentially a matter for end users of the system, just as the two
> Private Use Area characters being suggested in another thread of this
> forum in relation to Afghanistan are a matter for end users of the
> Unicode Standard and does not affect the content of the Unicode
> Standard itself.

Then go talk about it with the users of the system.

> Code points for the symbols are needed now or in the near future.

Are they? By whom? And if they need to use the PUA, they can do so. It's Private.

> It remains to be seen what will be decided as the built-in font for the
> European Union implementation of the DVB-MHP specification. It might be
> the minimum font of the DVB-MHP specification or it might be more
> comprehensive. For example, should Greek characters be included? Should
> weather symbols be included? These and many other issues remain to be
> decided.

The minimum font for any specification for Europe should be the MES-2. If you are talking to these people, tell them.

--
Michael Everson * * Everson Typography * * http://www.evertype.com
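Whether a proposed code point actually falls in the Private Use Area is mechanically checkable; the ranges below are the ones defined by the Unicode Standard. A small Python sketch:

```python
import unicodedata

# Private Use ranges defined by the Unicode Standard
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary PUA-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary PUA-B (plane 16)
)

def is_private_use(cp: int) -> bool:
    # True if the code point is reserved for private agreements
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

print(is_private_use(0xE000))             # True
print(unicodedata.category(chr(0xE000)))  # Co (Other, private use)
```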
Re: [Private Use Area] Audio Description, Subtitle, Signing
On Wednesday, July 16, 2003 12:33 PM, William Overington <[EMAIL PROTECTED]> wrote:

> Peter Constable wrote as follows.
>
> I have posted the suggested code points within the Cenelec hosted
> discussion some time ago.
>
> > > and who might like to know of this
> > > suggestion. Also, the symbols might well be used in hardcopy
> > > television programme listing magazines, so it would be desirable
> > > to have them available in fonts.
> >
> > Think about the workflow for such magazines and then tell me again
> > you're not suggesting PUA codepoints for use in interchange.
>
> Well, I am here suggesting Private Use Area code points for
> interchange, both in interactive broadcasting and in typesetting of
> magazines.
>
> Where I was specifically not suggesting interchange of Private Use
> Area code points was (in other threads) in the use of Private Use
> Area code points for precomposed characters which are display glyphs
> for sequences of Unicode characters, where such display glyphs are
> accessed using a eutocode typography file.

Given that Java already allows using resources such as icon bitmaps or classes, and that it also fully supports the PUA; given that the built-in core Java engine will certainly include the appropriate minimum fonts to support these characters; given that it will work within the private domain of interactive television; and given that the navigation code will be broadcast as compiled Java file archives that may contain all the necessary resources as completely embedded documents: do we really need to define these characters in Unicode?

Your experimentation can still start using some PUA code points of its choice, and embedded fonts for the symbols you need, and it will not require an allocation. The definition of an open standard normally requires a prior definition, approvals from distinct actors, regulators or a standardization body or forum or a community of independent users, and an effective implementation.

The initial launch of the service does not need a fixed assignment for these symbols in Unicode. Such usage of symbols will start with private collections of symbols in icons or fonts. This does not restrict the required documentation of these symbols and their usage, which can use a custom font in the broadcast Java application (and can be changed at any time on each broadcast program, according to the editor's needs). Use of conventional symbols that will look ugly in various countries or cultures will begin with a lot of experimentation (including meteorological symbols, whose use in plain text seems ugly when viewers would prefer to see maps or to benefit from a rich-text layout).

Why couldn't this service use a web-like (HTML) navigation system, with hyperlinks? When I look at the remote control for my Teletext-enabled TV set, I already have most of the tools needed, and I would not like to have more than a dozen supplementary buttons. In fact there are already 4 navigation buttons (with colors Red, Green, Blue, Yellow), and the numeric keypad to specify the page number to view.

In the existing Teletext service, which is based on the legacy Videotex and ANSI escape sequences to control the layout and presentation, users don't care about the encoding. But Teletext applications are limited in their presentation by:
- the number of supported characters
- no support for bitmaps (only mosaic graphics)
- very few variations in font sizes
- limited screen content, typically 24x40 characters
- few colors (8)

Adding support for Unicode and Java will allow a richer and more interesting experience on this revised Teletext service, which was designed 20 years ago and has been widely available on TV sets only for the last 10 years (before that you needed a separate "decoder"). Which content will be appropriate to broadcast on interactive TV channels is still something to discuss.

But the audio description system for subtitles already exists on most European broadcast channels (page 888 of their Teletext service); it is encoded in the normally undisplayed top and bottom rows of video frames (which is why those rows are often removed on satellite or cable services, to limit the necessary bandwidth for each analog channel). Digital broadcasts with MPEG-4 will change the panorama, but there are still other competing technologies, notably within the MPEG standard itself, which supports extensions commonly found on DVDs...

If the intent is to reduce costs by reusing other existing standards, I can't see why the existing technologies used in Video DVDs can't be used on interactive broadcast digital technologies. In any case, the system will not use only plain text: it will support many media formats, and it will require an "envelope" format to embed and multiplex them in the broadcast program. This format will be rich enough to allow specifying non-textual data (such as Java classes or JARs) or metadata. Why then is there a need to encode
Re: Article on Unicode in Globalization Insider
On 16/07/2003 03:19, Alex Lam wrote:

> http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h
> tml

This link seems to be broken. I get a message "*Our apologies* *The page you requested is not available.*"

--
Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: Article on Unicode in Globalization Insider
On 16/07/2003 03:19, Alex Lam wrote:

> http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h
> tml

Ah, I see the problem is that the final "tml" has become detached from the URL, already in the source I received. That's the problem with URLs as long as that. I added the "tml" in my browser window, and now I have the article.

--
Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/
Re: [Private Use Area] Audio Description, Subtitle, Signing
Peter Constable wrote as follows.

> William Overington wrote on 07/15/2003 05:33:22 AM:
>
> > > William, CENELEC is an international standards body. Such bodies
> > > either create their own standards or use other international
> > > standards. They do not use PUA codepoints.
> >
> > Well, the fact of the matter is that Cenelec is trying to achieve a
> > consensus for the implementation of interactive television within
> > the European Union
>
> And that does not require PUA codepoints; moreover, your response does
> not escape the fact I was pointing out that a standards body will not
> be publishing standards that make reference to PUA codepoints.

Please have a look at what Cenelec is doing in trying to achieve a consensus for the implementation of interactive television within the European Union. Your comments seem to relate to standards bodies generally, or to how Cenelec proceeds generally; this, however, is one particular project, run for the European Commission, which is trying to achieve that consensus. The difference is that things need to move forward promptly. There are lots of aspects, such as how many buttons to have on a hand-held infra-red control device for end user interaction with a running Java program (that is, the _minimum_ twenty of the DVB-MHP specification, or some more), and such as whether mouse events should be accessible to end users (the DVB-MHP specification has mouse event access as optional in interactive televisions), and so on. What you write in relation to most projects carried out by standards bodies may well be true, yet I was writing specifically about one particular project being run by Cenelec.

> > In view of the fact that the interactive television system (DVB-MHP,
> > Digital Video Broadcasting - Multimedia Home Platform,
> > http://www.mhp.org ) uses Java and Java uses Unicode, it is then a
> > matter of deciding how to be able to signal the symbols in a Unicode
> > text stream.
>
> And they won't be standardizing on symbols encoded using PUA
> codepoints.

The "deciding" is not about something to incorporate into the DVB-MHP standard. It is a matter of trying to gain a consensus as to how to signal those symbols at the present time and in the near future (that is, until (if and when) some regular Unicode code points are achieved) within Java programs which run upon the DVB-MHP platform and in fonts which are used upon the DVB-MHP platform. It is essentially a matter for end users of the system, just as the two Private Use Area characters being suggested in another thread of this forum in relation to Afghanistan are a matter for end users of the Unicode Standard and do not affect the content of the Unicode Standard itself.

> > In view of the fact that the process of getting regular Unicode code
> > points for the symbols would take quite a time, and indeed that
> > there is as yet no agreement on which symbols to use, and that the
> > implementation of interactive television needs to proceed, it seems
> > to me that putting forward three specific Private Use Area code
> > points for the symbols at this time is helpful to the process.
>
> Then you obviously don't understand the process.

Well, maybe I don't. However, the fact of the matter is that sooner or later some code points are needed to signal those symbols. I have put forward three suggested code points. I also mentioned them in this mailing list. My specific suggestions are in the Private Use Area and do not clash with various uses of the Private Use Area known to me.

So three specific code points have been mentioned, and I suggest that having those three code points published both in the Cenelec forum and here is beneficial: if they are used, various potential problems are avoided which could have arisen if some other choices (such as three unused code points in regular Unicode, or several different sets of three code points in regular Unicode) were used.

> > > Such things are *not* useful. They do not achieve consistency, not
> > > in the short term, and most certainly not in the long term. If
> > > consistency is needed, the standardization process is used to
> > > establish standardized representations.
> >
> > Well, what is the alternative?
>
> The alternative to agreeing on a standard? None, but why would you
> need an alternative?

Code points for the symbols are needed now or in the near future. The symbol designs are not yet agreed. Obtaining regular Unicode code points, if achievable, would take quite a time. With my suggested code points published, decisions on which symbol designs to use, and getting them into use with everyone using the same code points, could happen within a few days.

> > The code points are in the Private Use Area, so the suggestion
> > avoids the possibility of a non-conformant use of a regular Unicode
> > code point.
>
> T
Article on Unicode in Globalization Insider
http://www.lisa.org/archive_domain/newsletters/2003/3.2/lommel_unicode.h tml
Re: Combining diacriticals and Cyrillic
On Wednesday, July 16, 2003 8:55 AM, William Overington <[EMAIL PROTECTED]> wrote:

> Peter Constable wrote as follows.
>
> > William Overington wrote on 07/15/2003 07:22:22 AM:
> >
> > > No, the Private Use Area codes would not be used for interchange,
> > > only locally for producing an elegant display in such
> > > applications as chose to use them. Other applications could
> > > ignore their existence.
> >
> > Then why do you persist in public discussion of suggested
> > codepoints for such purposes? If it is for local, proprietary use
> > internal to some implementation, then the only one who needs to
> > know, think or care about these codepoints is the person creating
> > that implementation.
>
> The original enquiry sought advice about how to proceed. I posted
> some ideas of a possible way to proceed. If the idea of using a
> eutocode typography file is taken up and software which uses it is
> produced, then it would be reasonable to have a published list of
> Private Use Area code points for the precomposed characters which
> are to be available, as in that way the output stream from the
> processing could be viewed with a number of fonts from a variety of
> font makers without needing to change the eutocode typography file
> if one changed font.
>
> I have not published many of my suggested code points in this forum
> precisely because a few people do not want them published here. For
> example, there is the ViOS-like system for a three-dimensional
> visual indexing system for use in interactive broadcasting.
>
> > > Publishing a list of Private Use Area code points would
> >
> > have absolutely no purpose at all.
> >
> > > mean that such
> > > display could be produced using a choice of fonts from various
> > > font makers using the same software
> >
> > Now you are talking interchange. Interchange means more than just
> > person A sends a document to person B. It means that person A's
> > document works with person B's software using person C's font. (An
> > alternate term that is often used, interoperate, makes this
> > clearer.)
>
> Exactly. This is why publishing the list of Private Use Area code
> point assignments for the precomposed characters is a good idea.
> Person B can display the document and then wonder if it might look
> better with that font made by person D and have a try with that
> font. If the list of Private Use Area code point assignments for the
> precomposed characters has been published and both C and D have used
> the list to add the extra Cyrillic characters into their fonts, then
> the published list of Private Use Area code point assignments for
> the precomposed characters has helped to achieve interoperability.
>
> > > I feel that an important thing to remember is the dividing line
> > > between what is in Unicode and what is in particular advanced
> > > format font technology solutions
> >
> > And best practice for advanced format font technologies eschews PUA
> > codepoints for glyph processing.
>
> Who decides upon what is best practice?
>
> > You've been told that several times by
> > people who have expertise in advanced font technologies, an area in
> > which you are not deeply knowledgable or experienced, by your own
> > admission.
>
> Well, it is not a matter of an "admission" as if dragged out of me
> under examination by counsel in a courtroom. I openly stated the
> limits of my knowledge in that area, not as a retrospective defence
> yet as an up-front expression of the limitation of my knowledge when
> putting forward ideas, specifically so as not to produce any
> incorrect impression as to expertise in that area.
>
> > > yet they are not suitable for platforms such as Windows 95 and
> > > Windows 98, whereas a eutocode typography file approach would be
> > > suitable for those platforms and for various other platforms.
> >
> > Wm, if someone wanted, they could create an advanced font
> > technology to work on DOS, but why bother? Who's going to create
> > all the new software that works with that technology, and make it
> > to work within the limitations of a DOS system?
>
> Yet I am not suggesting a system to work on DOS.
>
> > Your idea is at best a mental exercise, and even if you or
> > someone else built an implementation, what is not needed is some
> > public agreement on PUA codepoints for use in glyph processing.
>
> When you say "agreement" I am not suggesting agreement in some formal
> manner. It is more like the authorship of a story where people may
> read it or not as they choose. Yet if people do read the story, or
> watch a television or movie implementation of it, a common culture
> may come to exist amongst the readers which can be applied in other
> circumstances.
>
> For example, "it's as if on a holodeck and a character says 'arch'
> and " is something which people who have watched Star Trek The
> Next Generation may use as a cultural way of expressing something.
>
> The original enquir
Re: Combining diacriticals and Cyrillic
Peter Constable wrote as follows.

> William Overington wrote on 07/15/2003 07:22:22 AM:
>
> > No, the Private Use Area codes would not be used for interchange,
> > only locally for producing an elegant display in such applications
> > as chose to use them. Other applications could ignore their
> > existence.
>
> Then why do you persist in public discussion of suggested codepoints
> for such purposes? If it is for local, proprietary use internal to
> some implementation, then the only one who needs to know, think or
> care about these codepoints is the person creating that
> implementation.

The original enquiry sought advice about how to proceed. I posted some ideas of a possible way to proceed. If the idea of using a eutocode typography file is taken up and software which uses it is produced, then it would be reasonable to have a published list of Private Use Area code points for the precomposed characters which are to be available, as in that way the output stream from the processing could be viewed with a number of fonts from a variety of font makers without needing to change the eutocode typography file if one changed font.

I have not published many of my suggested code points in this forum precisely because a few people do not want them published here. For example, there is the ViOS-like system for a three-dimensional visual indexing system for use in interactive broadcasting.

> > Publishing a list of Private Use Area code points would
>
> have absolutely no purpose at all.
>
> > mean that such
> > display could be produced using a choice of fonts from various font
> > makers using the same software
>
> Now you are talking interchange. Interchange means more than just
> person A sends a document to person B. It means that person A's
> document works with person B's software using person C's font. (An
> alternate term that is often used, interoperate, makes this clearer.)

Exactly. This is why publishing the list of Private Use Area code point assignments for the precomposed characters is a good idea. Person B can display the document and then wonder if it might look better with that font made by person D and have a try with that font. If the list has been published and both C and D have used it to add the extra Cyrillic characters into their fonts, then the published list has helped to achieve interoperability.

> > I feel that an important thing to remember is the dividing line
> > between what is in Unicode and what is in particular advanced
> > format font technology solutions
>
> And best practice for advanced format font technologies eschews PUA
> codepoints for glyph processing.

Who decides upon what is best practice?

> You've been told that several times by people who have expertise in
> advanced font technologies, an area in which you are not deeply
> knowledgeable or experienced, by your own admission.

Well, it is not a matter of an "admission" as if dragged out of me under examination by counsel in a courtroom. I openly stated the limits of my knowledge in that area, not as a retrospective defence but as an up-front expression of the limitation of my knowledge when putting forward ideas, specifically so as not to produce any incorrect impression as to expertise in that area.

> > yet they are not suitable for platforms such as Windows 95 and
> > Windows 98, whereas a eutocode typography file approach would be
> > suitable for those platforms and for various other platforms.
>
> Wm, if someone wanted, they could create an advanced font technology
> to work on DOS, but why bother? Who's going to create all the new
> software that works with that technology, and make it work within the
> limitations of a DOS system?

Yet I am not suggesting a system to work on DOS.

> Your idea is at best a mental exercise, and even if you or someone
> else built an implementation, what is not needed is some public
> agreement on PUA codepoints for use in glyph processing.

When you say "agreement" I am not suggesting agreement in some formal manner. It is more like the authorship of a story where people may read it or not as they choose. Yet if people do read the story, or watch a television or movie implementation of it, a common culture may come to exist amongst the readers which can be applied in other circumstances.

For example, "it's as if on a holodeck and a character says 'arch' and " is something which people who have watched Star Trek The Next Generation may use as a cultural way of expressing something.

The original enquiry referred as if a number of people are trying to solve the problem. If a list of the characters is published with Private Use Area code points from U+EF00 upwards, then they could all, if they so choose, use that set of code points and it might help in font interoperability, certainly if they choose to implement a eutocode typography file sys