Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/17/2002 09:29:00 AM William Overington wrote: Peter Constable wrote as follows. The standard already specifies that FFFC should not be exported from an application or interchanged. As far as I am aware that is not presently the case. If you still say that that is correct, could you please state the exact text of the standard relating to this matter and where in the standard that text can be found please? OK, it doesn't say it explicitly; nevertheless, I believe I know what the intent of the text is, and that it is not condoning interchange of FFFC. The fact that the text isn't more explicit is something that could perhaps be improved; but if you think about what the text on pp 326-7 *does* say, I think this intent can be detected. It seems clear to me that it assumes usage within the context of some higher-level protocol, such as would be imposed by a software process. For instance, the text makes reference to the object's formatting information, but Unicode / plain text does not provide representation for such information. Thus, there necessarily must be some other protocol at work within which that information is represented. FFFC, then, is something that is utilised by that higher-level protocol. Hence, this section of the Standard is *not* talking about FFFC being used in interchanged plain text. It is, rather, assuming usage internal to some processing context or other higher-level protocol. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/16/2002 04:58:58 PM William Overington wrote: The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system (details at http://www.mhp.org ) implements my telesoftware invention. A Java program which has been broadcast can read a Unicode plain text file and act upon the characters within it, and can read other file formats, such as .png files (Portable Network Graphics), and act upon the information in those files, so as to produce a display. So, a collection of files, namely a .uof file in the format that I suggested, a Unicode plain text file with one or more U+FFFC characters in it, and the appropriate graphics files in .png format, as a package of free to the end user distance education learning material being broadcast from a direct broadcasting satellite or a terrestrial transmitter, could be a very useful facility as the way to carry text with illustrations. I'd suggest that it would be far more useful to use a marked-up file format based on XML. It doesn't have to be verbose (besides which, the bandwidth requirements of embedded graphics will be far greater than any requirements for markup used to indicate their position within the text). The reason I think this would be far more advantageous is that there has been a massive interest throughout the IT industry in XML, meaning that there are lots of software implementations that support it, and it is very easy to build processes for publishing content. You could probably use any commonly-used database product out there to generate XML content suited for DVB-MHP; in fact, it would be easy to take some existing XML-based publishing process and extend it to support an XML-based file format specifically intended for DVB-MHP. In contrast, if you want to invent a new file format, then you've got to create new software implementations to go with it, and bolting that into any existing publishing process will be far more costly.
Using HTML and a browser is just not the way to proceed in that situation. HTML and a browser is a very useful technique for the web and indeed is an option for the DVB-MHP system, yet the basic software system is Java based. Markup does not have to imply HTML and a Web browser. I'm sure you'd find a lot of Java implementations that made use of XML-based file formats, and though I'm not a Java programmer, I'm certain that you can find good support for parsing or generating XML streams in Java. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
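Peter's suggestion can be made concrete with a sketch. The element and attribute names below (a `<story>` document with `<object src="..."/>` markers) are invented purely for illustration and are not any DVB-MHP or Unicode schema; the point is only that light XML markup can carry the anchor-to-object pairing in-band, with no side file, using off-the-shelf parsers:

```python
# Hypothetical XML equivalent of the .uof proposal, parsed with the
# Python standard library. Each <object src="..."/> element marks the
# spot where an illustration belongs, so the pairing of anchors with
# graphics files travels inside the one document.
import xml.etree.ElementTree as ET

DOC = """<story>
  <p>The artist kept a horse <object src="horse.gif"/> and a dog
  <object src="dog.gif"/> and painted them both on one canvas
  <object src="painting.jpg"/>.</p>
</story>"""

def object_sources(xml_text):
    """Return the src attributes of all object elements, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("src") for el in root.iter("object")]
```

Any XML-aware tool, in Java or otherwise, could produce or consume such a file, which is the interoperability argument being made.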
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Kenneth Whistler wrote as follows about my idea. It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular extension (any ideas for something like .uof for Unicode object file) ...or to pick an extension, more or less at random, say .html Well, that could produce confusion with a .html file used for Hyper Text Markup Language, HTML. I suggested .uof so that a .uof file would be known as being for this purpose. that accompanies another plain text Unicode file which has a file name extension such as .txt, or indeed other choices except .uof (or whatever is chosen after discussion), then the convention could be that the .uof file has on lines of text, in order, the name of the text file, then the names of the files which contain each object to which a U+FFFC character provides the anchor. For example, a file with a name such as story7.uof might have the following lines of text as its contents.

story7.txt
horse.gif
dog.gif
painting.jpg

This is a shaggy dog story, right? No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file).
:-) The file story7.uof could thus be used with a file named story7.txt so as to indicate which objects were intended to be used for the three uses of U+FFFC in the file story7.txt, in the order in which they are to be used. Or we could go even further, and specify that in the story7.html file, the three uses of those objects could be introduced with a very specific syntax that would not only indicate the order that they occur in, but could indicate the *exact* location one could obtain the objects -- either on one's own machine or even anywhere around the world via the Internet! And we could even include a mechanism for specifying the exact size that the object should be displayed at. For example, we could use something like: <img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width=380 height=260 border=1> or <img src="http://www.artofeurope.com/velasquez/vel2.jpg"> Now that is a good idea. In a .uof file specifically for the purpose, a line beginning with a < character could be used to indicate a web based reference, or a local reference, for the object, using exactly the same format as is used in an HTML file. If the line does not start with a < character, then it is simply a file name in the same directory as the .uof file, as I suggested originally. This would mean that where, say, a .uof file were broadcast upon a telesoftware service, the Java program (also broadcast) analysing the file names in the .uof file need not necessarily be able to decode lines starting with a < character, so that the Java program does not need to have the software for that decoding in it, yet the same .uof file specification could be used, both in a telesoftware service and on the web, where a more comprehensive method of referencing objects were needed. I can imagine that such a widely used practice might be helpful in bridging the gap between being able to use a plain text file or maybe having to use some expensive wordprocessing package.
And maybe someone will write cheaper software -- we could call it a browser -- that could even be distributed for free, so that people could make use of this convention for viewing objects correctly distributed with respect to the text they are embedded in. Indeed, except we should not call it a browser, as the name is already in widespread use for HTML browsers and might cause confusion. Analysing a .uof file would be a much smaller computational task than analysing the complete syntax of HTML files. Yes, yes, I think this is an idea which could fly. --Ken Good. It is a solution which could be very useful for people writing programs in Java, Pascal and C and so on, which programs take in plain text files and process them for such purposes as producing a desktop publishing package. Hopefully the Unicode Technical Committee will be pleased to add a .uof format file specification into the set of Unicode documents so that the U+FFFC code can be used in an effective manner. The idea could be that if a .uof file is processed then the rules of .uof files apply in that situation, so that if a .uof file is not being processed, then the rules for .uof files do not apply, therefore
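Taken at face value, the proposed convention is simple enough to sketch. The helper names below are invented for illustration; the only behaviour taken from the proposal is that the first line of the .uof file names the text file and the remaining lines name the objects anchored, in order, by successive U+FFFC characters:

```python
# Sketch of the proposed .uof convention: first line names the plain
# text file; each following line names the object anchored by the next
# U+FFFC (OBJECT REPLACEMENT CHARACTER) occurring in that text.

OBJ = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER

def read_uof(uof_lines):
    """Return (text_filename, object_filenames) from the lines of a .uof file."""
    lines = [ln.strip() for ln in uof_lines if ln.strip()]
    return lines[0], lines[1:]

def pair_anchors(text, object_names):
    """Pair each U+FFFC in the text, by position order, with an object name."""
    count = text.count(OBJ)
    if count != len(object_names):
        raise ValueError("anchor/object count mismatch")
    return list(zip(range(count), object_names))
```

For the story7.uof example above, `read_uof` would yield `("story7.txt", ["horse.gif", "dog.gif", "painting.jpg"])`, and `pair_anchors` would map the three anchors in story7.txt to those three files.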
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Yes, yes, I think this is an idea which could fly. --Ken Good. It is a solution which could be very useful for people writing programs in Java, Pascal and C and so on which programs take in plain text files and process them for such purposes as producing a desktop publishing package. Uhh, I think Ken's message was entirely sarcasm or some higher form of rhetorical humor whose obscure name slips my mind right now. The suggestion to use html as an extension was the giveaway - I was laughing out loud from that point on - his point was that the technology to do what you want already exists: it is called HTML, and it is displayed by browsers and so forth. Barry Caplan www.i18n.com
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William Overington wrote, No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file). :-) 1) It's gif file format rather than plain text.* 2) There isn't any windmill. Best regards, James Kass, * P.S. - But, it's a nice gif file. In fact, aside from the absence of the windmill, it exceeded my expectations. -JK.
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William, So let me see if I understand this correctly. Let's take 2 perfectly good standards, Unicode and HTML, and make some very minor tweaks to them, such as changing the meaning of U+FFFC and a special format for filenames in the beginning of the file and a new extension, so we have something new. Now the big benefit of this completely new thing, is that programs that do desktop publishing can use plain text files which are not quite plain text because they have some special formatting, but now they can publish them in better manner than before. For example, plain text with pictures. This is great. (It is true that it is less capable than if we had just used enough html to do the same thing, but .uof is more like plain text than html is.) Programmers will be happy because now they can support plain text with just a few tweaks. Oh I almost forgot, they also have to support Unicode, but slightly tweaked. And they can also support HTML, with some minor tweaks for .uof. Of course programmers don't mind supporting lots of variations of the same thing. Customer support personnel also don't mind. Oh, the plain text programmers will now need to support pictures and other aspects of full publishing, but at least they won't have a complex file format to work with. I guess it doesn't matter that a more complex format is also more expressive and therefore can leverage all of the publishing features. It probably doesn't matter that a desktop publishing product probably already supports more complex formats, and probably also supports html, it will be beneficial to add this slight difference from plain text. I like this very much. It is very much like when the magician slides the knot in the string and makes it disappear. I imagine that over time we will have some more wonderful inventions and add further tweaks and further improve the publishing of plain text. 
There are a few other things I would like to improve in Unicode, so I hope it will be ok to make some other suggestions. We can change the extension to know which tweaks we are talking about. .uo1, .uo2. Just a few small changes to characters and plain text format variations. Stability of the meaning of the file isn't important. However, I think my first suggestion will be to make the benefits of .uof available to XML. We can call this .uo1. I am a little disconcerted that html already can do everything that .uof does plus more, and is also supported by all of the publishers that are likely to support .uof. Also, as there are more than a million characters in Unicode, most are unused so far, so changing the meaning of just FFFC in this one context doesn't seem like a big win, considering also every line of code that might work with FFFC now needs to consider the context to determine its semantics. But every invention deserves to be implemented; we need not look at whether the invention satisfies some demand of its customers. I like the 2 birds picture and I assume it was a metaphor for the idea - one bird was html, the other unicode. I was a little disappointed that you used html instead of .uof format though. Maybe it's the lateness of the hour here. I hope the idea looks as good in the morning. Oh I almost forgot. I was having difficulty discerning when you and Ken might be joking. The mails read as very serious. I would like to suggest we make a new format .uo2. We can indicate line numbers and emotions with plain text characters that look like facial expressions. It would help me know when you both were serious and when you might be joking. Sometimes it is hard to tell. I am going to create a list of facial expressions and assign them in the PUA so we can all have a standard to follow. See my next mail with a list of facial expressions and assignments. tex William Overington wrote: Kenneth Whistler wrote as follows about my idea.
It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular extension (any ideas for something like .uof for Unicode object file) ...or to pick an extension, more or less at random, say .html Well, that could produce confusion with a .html file used for Hyper Text Markup Language, HTML. I suggested .uof so that a .uof file would be known as being for this purpose. that accompanies another plain text Unicode file which has a file name extension such as .txt, or indeed other choices except .uof (or whatever is chosen after discussion) then the convention could be that the .uof file has on lines of text, in order, the name of the text file then the names of the files which contains each object to which a U+FFFC character provides the anchor. For example, a file with a name such as story7.uof might have the following lines of text as its contents. story7.txt horse.gif dog.gif painting.jpg
Re: Furigana
On 08/14/2002 05:53:58 AM James Kass wrote: Once a meaning like INTERLINEAR ANNOTATION ANCHOR has been assigned to a code point, any application which chooses to use that code point for any other purpose would be at fault. Since it's for internal use only, nobody would ever know. Unicode conformance must always be understood in terms of what happens externally, between two processes, or between a process and a user. What goes on inside doesn't matter as long as it is conformant on the outside. If my program includes a portion of code that interprets all USVs as jelly-bean flavours but doesn't let any symptoms of that leak outside, I haven't violated any conformance requirement. In other words, if these characters are to be used internally for Japanese Ruby (furigana), etc., then they ought to be able to be used externally, as well. They simply aren't adequate for anything more than the simplest of cases. Moreover, the recommendations of TR#20 / the W3C character model clearly indicate that markup is to be preferred for applications like this. Because it seems to be an oxymoron. I think most would agree that that's clear now, but it wasn't always understood so clearly. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
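Peter's point that conformance is judged at the process boundary can be illustrated with a sketch of an export filter: a process may use noncharacters internally however it likes, provided they are removed before interchange. Stripping (rather than rejecting) is an arbitrary choice made here for illustration:

```python
# Sketch: remove code points reserved for internal use before a string
# leaves the process. Noncharacters are U+FDD0..U+FDEF plus the last
# two code points of every plane (U+xxFFFE and U+xxFFFF).
def is_noncharacter(cp):
    """True if the code point is a Unicode noncharacter."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def export_filter(s):
    """Strip noncharacters so the interchanged string is conformant."""
    return "".join(ch for ch in s if not is_noncharacter(ord(ch)))
```

Note that U+FFFC itself is not a noncharacter, which is exactly the point under dispute in this thread: the standard's text on whether it may be interchanged is less categorical than for FDD0..FDEF.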
RE: Furigana
On 08/14/2002 10:52:32 AM Michael Everson wrote: I'm saying I WANT to use these characters. They solve an apparent need of mine They only *appear* to you to solve that need, but in fact do not offer a good solution. Markup is recommended for your need. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/14/2002 02:04:50 PM William Overington wrote: As this concerns the U+FFFC character and the Unicode Technical Committee is due to meet next week, I think it might be helpful if this idea is discussed before the meeting as a straightforward idea like this might mean that the possibility to exchange U+FFFC characters at all if people want to do so is not lost. This does not solve any problems not already solved. This is not plain text; it is a form of interchange markup and a higher-level protocol. There are already higher-level markup protocols that accomplish this. The standard already specifies that FFFC should not be exported from an application or interchanged. There is no reason to change this. Everybody will welcome the new conventional, graphical-type characters and scripts that are coming with Unicode 4.0. What are those please? See the Proposed characters section of the Unicode site. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: RE: Furigana
On 08/14/2002 01:16:29 AM starner wrote: That seems to be basically what William Overington is proposing, except these characters only handle furigana, instead of all markup. Not quite. WO has proposed characters to be used in interchange. These are only intended for internal use by programmers. They are exactly like the non-characters at FDD0..FDEF except that these were named for a specific function (as was FFFC -- also an internal-use code with a specifically-named function). - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Furigana
[EMAIL PROTECTED] wrote: On 08/14/2002 12:45:22 AM Kenneth Whistler wrote: But even at the time, as the record of the deliberations would show, if we had a more perfect record, the proponents were clear that the interlinear annotation characters were to solve an internal anchor point representation problem. I recall at the UTC meeting in Jan 2000 (I think it was 2000) there was discussion of adding non-character code points for internal use by programmers, and I remember Tex suggesting that it might be better to identify the specific functions for which internal-use codepoints might be needed, as had been done in the case of things like the IA characters. In other words, at that time, it seems that they were understood by everyone present to be intended for internal use by programmers only. Peter's made the point that "for internal use" was understood, which is fine. Let me add that my concern with internal-use code points not having specific functions is that we now live in a world where software applications often use third party components (various drivers, shared libraries, OCXs, DLLs, etc.) internally. Having internal-use code points, which may not be treated with the right semantics by the third parties that have been integrated internally, is problematic. You should be careful and avoid passing these internal-use code points to third parties, but this greatly inhibits their use, or makes for an awkward and not easily extensible architecture. At the time (in the discussion), I don't think we had many examples of what the uses would be, and it wasn't clear that many were needed, since the functionality could be arrived at with higher level protocols. So to be clear, when internal-use code points are used, not only do they need to be filtered from external exchanges, you need to be very clear about your internal architecture and make sure you don't call a system function or third party function that might mistreat the internal-use code point, or worse, barf at it.
(Anyway, I think that's what I was thinking at the time. I have trouble remembering what I said yesterday, much less the last millennium.) tex -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
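Tex's worry about third-party components can be stated as code: before handing a string to a shared library or system call, the caller must verify that no internal-use code points remain. The particular set guarded against below (FDD0..FDEF plus the interlinear annotation and object replacement characters) and the helper names are assumptions for illustration:

```python
# Sketch: guard against leaking internal-use code points into a third
# party component that may mistreat them or, worse, barf at them.
INTERNAL_USE = set(range(0xFDD0, 0xFDF0)) | {0xFFF9, 0xFFFA, 0xFFFB, 0xFFFC}

def safe_for_third_party(s):
    """True if the string contains none of our internal-use code points."""
    return not any(ord(ch) in INTERNAL_USE for ch in s)

def call_third_party(api, s):
    """Invoke an external API only after checking the string is clean."""
    if not safe_for_third_party(s):
        raise ValueError("internal-use code point would leak to third party")
    return api(s)
```

The awkwardness Tex describes is visible here: every boundary between "our" code and integrated components needs such a check, which is precisely what makes function-less internal-use code points hard to use at scale.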
Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
Kenneth Whistler replied to my posting as follows. An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a graphic. Yes, permitted. Great. That may well be useful for free to the end user distance education using telesoftware upon digital television channels. A .uof file (as in the thread An idea for keeping U+FFFC usable. ) could be used with a Unicode plain text file of some learning material over the broadcast link and a Java program (also broadcast) could place the pictures with their captions in the correct place in the text. As would also be: U+FFF9 U+FFFC U+FFFC U+FFFA U+FFF9 Temperature U+FFFA a measure of hotness, related to the U+FFF9 kinetic energy U+FFFA energy of motion U+FFFB of molecules of a substance U+FFFB U+FFF9 variation U+FFFA rate of change U+FFFB with time U+FFFC . U+FFFB Where the first U+FFFC is associated with a URL with a realtime data feed, the second U+FFFC is a jar file for a 3-dimensional dynamic display algorithm, and the third U+FFFC is a banner ad for Swatch watches. Thank you for this example. I have analysed it thoroughly using Notepad by going to a new line and indenting at each occurrence of U+FFF9 and going to a new line and indenting at each occurrence of U+FFFA, and going to a new line and placing each U+FFFB beneath the corresponding U+FFFA. For each U+FFFC I went to a new line, and placed the U+FFFC beneath the most recent U+FFF9 or U+FFFA character. In addition, after each U+FFF. character, for ordinary text, I went to a new line and indented so that the next ordinary text character was beneath the U of the most recently entered U+FFF. character, except that after a U+FFFB the indentation went back two indentation levels. 
After each U+FFFC character, and on the same line, I added the details of the object within parentheses. This gave the following.

U+FFF9
    U+FFFC (URL with a realtime data feed)
    U+FFFC (jar file for a 3-dimensional dynamic display algorithm)
U+FFFA
    U+FFF9
        Temperature
    U+FFFA
        a measure of hotness, related to the
        U+FFF9
            kinetic energy
        U+FFFA
            energy of motion
        U+FFFB
    of molecules of a substance
    U+FFFB
    U+FFF9
        variation
    U+FFFA
        rate of change
    U+FFFB
with time
U+FFFC (banner ad for Swatch watches)
.
U+FFFB

This took me quite some time to figure out, and was indeed an interesting challenge. It seems to me that, if that is indeed permissible, it could potentially be a useful facility. I was referring to my original example, not to your example! :-) Permissible does not imply useful, however, in this case. That's referring to your example when you refer to this case, is it? :-) It is unlikely that you are going to have access to software that would unscramble such layering in purported plain text, even if you had agreements with your receivers. Hmm? Yet, it is not the example to which I referred. The example to which I referred has not been commented upon as to its practical feasibility, has it? However, is your example that difficult if someone set his or her mind to it? Consider for example that the software which does the unscrambling were to have its own internal list of annotation facilitating characters, so that it assigned, for each page of the final rendered text, the characters in the list of annotation facilitating characters in order for each U+FFF9 U+FFFA pairing wherever the U+FFF9 item to be annotated were other than just one or more U+FFFC characters. The list of annotation facilitating characters could be something like U+002A, U+2020, U+2021, U+2051, that is, asterisk, dagger, double dagger, two asterisks aligned vertically.
The annotation facilitating character is then placed both after the annotated item and before the annotation, wherever that may be on the page, such as in a footnote. I am not suggesting that an algorithm for such is quickly programmable, yet it does not seem on the face of it to be as unlikely to be possible as your comment might perhaps seem to imply. That is what markup and rich text formats are for. Well, maybe for your example, yet for my example a plain text file for the main text together with a .uof file to state
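For the simple, non-nested case that started this exchange (U+FFF9 annotated-text U+FFFA annotation U+FFFB), extracting the pairs is mechanical. The sketch below deliberately handles only one level, making no attempt at Ken's fully layered example; the function name is an invention for illustration:

```python
# Sketch: extract flat interlinear annotations. U+FFF9 starts the
# annotated text, U+FFFA separates it from the annotation, and
# U+FFFB terminates the whole unit. Nesting is not handled.
import re

ANNOT = re.compile("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB")

def flat_annotations(s):
    """Return (annotated_text, annotation) pairs for non-nested annotations."""
    return ANNOT.findall(s)
```

Ken's first example, where the annotated text is a lone U+FFFC and the annotation is a caption, parses cleanly; his layered example defeats this approach entirely, which is the practical-feasibility point at issue.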
Re: Furigana
Tex Texin scripsit: At the time (in the discussion), I don't think we had many examples of what the uses would be, and it wasn't clear that many were needed, since the functionality could be arrived at with higher level protocols. One application that has always seemed obvious to me is regular expressions: a compiled regular expression can be represented by a Unicode string, with non-characters representing things like any character, zero or more, beginning of string, end of string, etc. etc. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.' --Beverly Erlebacher
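John Cowan's idea can be sketched in a toy form. The opcode assignments here (U+FDD0 for "any character", U+FDD1 for "zero or more of the preceding atom") are arbitrary choices for illustration, not anything standardized; the point is that noncharacters make safe out-of-band markers inside an internally held "compiled pattern" string:

```python
# Toy matcher over "compiled" patterns stored as Unicode strings, with
# noncharacters as opcodes: U+FDD0 matches any one character, U+FDD1
# means zero or more of the preceding atom. Other code points match
# themselves literally.
ANY, STAR = "\uFDD0", "\uFDD1"

def matches(pattern, text):
    """True if the compiled pattern matches the whole text."""
    if not pattern:
        return not text
    atom = pattern[0]
    if len(pattern) > 1 and pattern[1] == STAR:
        rest = pattern[2:]
        i = 0
        while True:  # try consuming 0, 1, 2, ... occurrences of the atom
            if matches(rest, text[i:]):
                return True
            if i < len(text) and (atom == ANY or text[i] == atom):
                i += 1
            else:
                return False
    if text and (atom == ANY or text[0] == atom):
        return matches(pattern[1:], text[1:])
    return False
```

Because noncharacters can never occur in conformant interchanged text, the pattern string cannot collide with any literal it might need to match, which is exactly why they suit this internal role.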
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
James Kass wrote as follows. William Overington wrote, No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file). :-) 1) It's gif file format rather than plain text.* 2) There isn't any windmill. The picture of the birds has been in our family webspace since 1998 as an illustration for the saying Painting two birds on one canvas. That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise. The picture of the birds is referenced as a way of illustrating the saying Painting two birds on one canvas. It is not the picture in the story about which Ken asked. I may well have a go at constructing such a picture, perhaps using clip art. The reference to a windmill is meant as a humorous aside alluding to Don Quixote tilting at windmills. I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic. William Overington 16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Tex Texin wrote as follows. William, So let me see if I understand this correctly. Let's take 2 perfectly good standards, Unicode and HTML, Yes. and make some very minor tweaks to them, No. such as changing the meaning of U+FFFC and a special format for filenames in the beginning of the file and a new extension, so we have something new. I have suggested no changes whatsoever to HTML at all. The only thing which I have suggested in relation to Unicode in this thread is that, in relation to the fact that information about the object to which any particular use of U+FFFC refers is kept outside the character data stream, it could be a good idea to define a file format .uof so that details of the names of the files for which the U+FFFC codes are anchors could be provided in a known format, if and only if end users chose to use a .uof file for that purpose on that occasion and not otherwise. This was in the context of seeking to protect the use of U+FFFC as a character which could be used in interchanging of documents, following from the discussion of U+FFFC and annotation characters in the thread from which I spun off this thread, which discussion, by Ken and Doug, is repeated in the first posting of this present thread. I thought it a good idea that the Unicode Technical Committee might like to make such a .uof file format an official Unicode document so as to offer one possible way to use U+FFFC codes. That is now a matter for discussion. If the Unicode Consortium wishes to do that, then fine. If the Unicode Consortium chooses not to do that, then I can write it up myself and publish it, which is not such a good solution, yet is adequate for my own needs and might be useful for some other people if they choose to use the same format for .uof files. Hopefully I have now managed to raise the issue of protecting the fact that the U+FFFC character can be used in document interchange, and it will hopefully not become deprecated to the status of a noncharacter.
There is a practical reason for this, which is, from my own perspective, quite important. This is as follows. The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system (details at http://www.mhp.org ) implements my telesoftware invention. A Java program which has been broadcast can read a Unicode plain text file and act upon the characters within it, and can read other file formats, such as .png files (Portable Network Graphics) and act upon the information in those files, so as to produce a display. So, a collection of files, namely a .uof file in the format that I suggested, a Unicode plain text file with one or more U+FFFC characters in it and the appropriate graphics files in .png format, as a package of free-to-the-end-user distance education learning material being broadcast from a direct broadcasting satellite or a terrestrial transmitter, could be a very useful facility as the way to carry text with illustrations. Using HTML and a browser is just not the way to proceed in that situation. HTML and a browser is a very useful technique for the web and indeed is an option for the DVB-MHP system, yet the basic software system is Java based. It is as if the television set is acting as a computer which has a slow read only access disc drive in the sky from which it may gather information, including software. The system is interactive with no return information link to the central broadcasting computer, by means of the telesoftware invention. Overlays are possible, and programs bigger than the local storage are able to be run using chaining techniques. Please do not think of this as downloading as no uplink request is made! Now the big benefit of this completely new thing, Well, it's only a way of sender and receiver being able to have information in a file with the suffix .uof about what objects are being anchored by U+FFFC codes in a Unicode plain text file which it accompanies. 
is that programs that do desktop publishing can use plain text files which are not quite plain text because they have some special formatting, Well, the plain text files are only Unicode plain text which might contain one or more U+FFFC characters and some of the other Unicode control characters such as CARRIAGE RETURN. but now they can publish them in a better manner than before. Well, my thinking is that it would help to have a well-known way to express the meaning of the anchors encoded by U+FFFC in a file rather than having only a vague specification that all other information about the object is kept outside the data stream. I am saying that, yes, all other information about the object is kept outside the data stream and, if, and only if, end users choose to use a .uof file in a standard format to convey that information for some particular use of a U+FFFC code, then that format could be considered for definition and publication by the Unicode Consortium. That does not seem unreasonable to me.
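William's suggested .uof convention is concrete enough to sketch in code. The format below is purely the one proposed in this thread (the first line names the text file, each following line names the object anchored by the corresponding U+FFFC, in order); the function names and sample filenames are this sketch's own, and nothing here is part of any standard.

```python
# Sketch of a reader for the hypothetical .uof ("Unicode object file")
# convention discussed in this thread. Not a standardized format.

OBJ = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER

def read_uof(uof_lines):
    """Return (text_filename, [object filenames]) from .uof content."""
    lines = [ln.strip() for ln in uof_lines if ln.strip()]
    return lines[0], lines[1:]

def anchor_objects(text, object_names):
    """Pair each U+FFFC in `text` with its object filename, in order."""
    if text.count(OBJ) != len(object_names):
        raise ValueError("mismatch between anchors and .uof entries")
    pairs, start = [], 0
    for name in object_names:
        idx = text.index(OBJ, start)  # position of the next anchor
        pairs.append((idx, name))
        start = idx + 1
    return pairs
```

Feeding it the story7.uof example from later in the thread would associate the first U+FFFC with horse.gif, the second with dog.gif, and so on.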
Re: Furigana
John, Why would you want them to be for internal-use only? When you exchange regular expressions wouldn't you want operators such as any character to be passed as well, and standardized so that there is agreement on the meaning of the expression? It is also not clear to me that it is desirable to encode operators of regular expressions as individual characters, because then you get into the slippery slope of encoding operators for every function that someone might want, and that is what started this thread isn't it... (But a Unicode APL operator set would be nice. ;-) ) tex John Cowan wrote: Tex Texin scripsit: At the time (in the discussion), I don't think we had many examples of what the uses would be, and it wasn't clear that many were needed, since the functionality could be arrived at with higher level protocols. One application that has always seemed obvious to me is regular expressions: a compiled regular expression can be represented by a Unicode string, with non-characters representing things like any character, zero or more, one or more, beginning of string, end of string, etc. etc. -- John Cowan [EMAIL PROTECTED] http://www.ccil.org/~cowan One time I called in to the central system and started working on a big thick 'sed' and 'awk' heavy duty data bashing script. One of the geologists came by, looked over my shoulder and said 'Oh, that happens to me too. Try hanging up and phoning in again.' --Beverly Erlebacher -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
Re: Furigana
Tex Texin scripsit: Why would you want them to be for internal-use only? When you exchange regular expressions wouldn't you want operators such as any character to be passed as well, and standardized so that there is agreement on the meaning of the expression? Regular expressions are usually interchanged using (some approximation of) Posix syntax, so as abc.*\*, not abcANYSTAR*. Note the phrase "compiled form" in my posting. It is also not clear to me that it is desirable to encode operators of regular expressions as individual characters, because then you get into the slippery slope of encoding operators for every function that someone might want, and that is what started this thread isn't it... Ah, but for internal use you can do what you want with the 66 non-characters and the 4 pseudo-non-characters. (But a Unicode APL operator set would be nice. ;-) ) Um, we have one of those, don't we? -- John Cowan [EMAIL PROTECTED] I am a member of a civilization. --David Brin
Re: Furigana
John Cowan wrote: Tex Texin scripsit: Why would you want them to be for internal-use only? When you exchange regular expressions wouldn't you want operators such as any character to be passed as well, and standardized so that there is agreement on the meaning of the expression? Regular expressions are usually interchanged using (some approximation of) Posix syntax, so as abc.*\*, not abcANYSTAR*. Note the phrase "compiled form" in my posting. Seems like a very minor optimization then. (I am not saying undesirable, just it is a small benefit.) It is also not clear to me that it is desirable to encode operators of regular expressions as individual characters, because then you get into the slippery slope of encoding operators for every function that someone might want, and that is what started this thread isn't it... Ah, but for internal use you can do what you want with the 66 non-characters and the 4 pseudo-non-characters. Yes. Same thing is true for higher level protocols. (But a Unicode APL operator set would be nice. ;-) ) Um, we have one of those, don't we? Sorry, I was unclear. I meant this in the context of encoding a set of APL-like operators for working on Unicode text to manipulate them in regular expressions, going way beyond the any character, 0 or more character operators. tex -- John Cowan [EMAIL PROTECTED] I am a member of a civilization. --David Brin -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
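John's "compiled form" idea can be illustrated with a toy matcher. Here U+FDD0 stands for "any character" and U+FDD1 for "zero or more of the preceding atom"; those assignments are invented for this sketch, which is exactly the point: the meanings of noncharacters are private to the program and such strings must never be interchanged.

```python
# Toy matcher over a "compiled" regular expression held as a Unicode
# string, with noncharacters standing in for operators (internal use
# only; the operator assignments below are this example's convention).

ANY = "\ufdd0"   # matches any single character
STAR = "\ufdd1"  # postfix: zero or more of the preceding atom

def match(pattern, text):
    """Backtracking match of the whole text against the compiled pattern."""
    def m(p, t):
        if p == len(pattern):
            return t == len(text)
        atom = pattern[p]
        if p + 1 < len(pattern) and pattern[p + 1] == STAR:
            # try zero occurrences first, then consume one and retry
            if m(p + 2, t):
                return True
            if t < len(text) and (atom == ANY or text[t] == atom):
                return m(p, t + 1)
            return False
        if t < len(text) and (atom == ANY or text[t] == atom):
            return m(p + 1, t + 1)
        return False
    return m(0, 0)
```

The compiled form of Posix abc.* would then be the string "abc" + ANY + STAR, which never leaves the program's own memory.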
Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
On 08/15/2002 06:41:59 AM William Overington wrote: In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with the non-characters. That is not what the specification says. William, John knows what he is talking about, and is exactly correct: in essence, though not formally, FFF9..FFFC are non-characters. No, the Standard doesn't say that; that's why he said, not formally. The use intended by the Standard is, however, exactly comparable to the non-characters at FDD0..FDEF. If they had been defined in the Standard as non-characters, the world would not be different in any meaningful way. It appears to me that the use of the annotation characters in document interchange is never forbidden and is strongly discouraged only where there is no prior agreement between the sender and the receiver, and that that strong discouragement is because the content may be misinterpreted otherwise. So, if there is a prior agreement, then there is no problem about using them in interchanged documents. There appears to be nothing that suggests that U+FFFC cannot be used in an interchanged document. Well, you've missed the intent of the authors of the Standard, and appear not to grasp the mindset. When it says interchange of IA characters may be OK given prior agreement, what's really in mind is that e.g. I've written code library A that handles some aspects of interlinear annotation, you've written code library B that handles different aspects of interlinear annotation, and we agree on certain interfaces so that my library can call yours or vice versa, and agree that strings passed by those interfaces can contain IA characters. That's the kind of thing that's in mind. It does *not* imply that anyone should consider creating a document containing IA characters. 
I know little about Bliss symbols, though I have seen a few of them and have read a brief introduction to them, yet it seems to me that annotating Bliss symbols with English or Swedish is entirely within the specification absolutely and would be no more than strongly discouraged even if there is no prior agreement between the sender and the receiver. Of course the Standard doesn't discourage anyone from annotating Bliss symbols with English or Swedish; it only discourages the use of IA characters as markup in documents. Further, it seems to me from the published rules that these annotation characters could possibly be used to provide a footnote annotation facility within a plain text file. That would not be a proposal worth pursuing; in fact, I'd say it's a very bad idea. The reason you DO NOT want to use IA characters in a document is that you do not know what someone's software will do with them. The characters have always been intended for use by software programmers, not by content authors. (Ditto for the object replacement character.) An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents... It seems to me that if that is indeed permissible that it could potentially be a useful facility. On the whole, it would be very unwise to use these characters in documents for reasons I explained above. If two people agree to do this, nobody's going to send the Unicode police to stop them. But very few of us on this list are particularly interested in what is hypothetically possible for some pair of us to do. We're far more interested in how widely-used implementations should and do work, and in such implementations, FFF9..FFFC are assumed not to be used in content. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On Wed, 14 Aug 2002, James Kass wrote: One, the use of *.html clearly violates the standard file naming convention of eight uppercase ASCII letters followed by a period followed by a *three* letter uppercase ASCII file name extension. I was wondering if the capitalization, ASCII, is for emphasis... ;) roozbeh
Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a graphic. Yes, permitted. As would also be: U+FFF9 U+FFFC U+FFFC U+FFFA U+FFF9 Temperature U+FFFA a measure of hotness, related to the U+FFF9 kinetic energy U+FFFA energy of motion U+FFFB of molecules of a substance U+FFFB U+FFF9 variation U+FFFA rate of change U+FFFB with time U+FFFC . U+FFFB Where the first U+FFFC is associated with a URL with a realtime data feed, the second U+FFFC is a jar file for a 3-dimensional dynamic display algorithm, and the third U+FFFC is a banner ad for Swatch watches. It seems to me that if that is indeed permissible that it could potentially be a useful facility. Permissible does not imply useful, however, in this case. It is unlikely that you are going to have access to software that would unscramble such layering in purported plain text, even if you had agreements with your receivers. That is what markup and rich text formats are for. Note that it is also *permissible* in Unicode to spell permissible as purrmisuhbal. That doesn't mean that it would be a good idea to do so, but the standard does not preclude you from doing so. You could even write a rendering algorithm which would display the sequence of Unicode characters p,u,r,r,m,i,s,u,h,b,a,l with the glyphs {permissible} if you so choose. --Ken
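For a pair of parties who really did have such a prior agreement, the layered sequences Ken shows could be unscrambled with a small recursive parser. This is only a sketch of one possible private interpretation, not anything the Standard specifies: U+FFF9 opens an annotated span, U+FFFA separates the annotated text from each annotation, U+FFFB closes the span, and the tuple representation is this example's own invention.

```python
# Recursive-descent reader for well-formed interlinear annotation
# sequences, under a hypothetical private agreement. Parts are plain
# string runs or ("annot", annotated_parts, [annotation_parts, ...]).

ANCHOR, SEPARATOR, TERMINATOR = "\ufff9", "\ufffa", "\ufffb"

def parse(s, i=0, stop=""):
    """Return (list of parts, index where scanning stopped)."""
    parts, buf = [], []
    def flush():
        if buf:
            parts.append("".join(buf))
            buf.clear()
    while i < len(s):
        ch = s[i]
        if ch in stop:
            break
        if ch == ANCHOR:
            flush()
            annotated, i = parse(s, i + 1, stop=SEPARATOR + TERMINATOR)
            annotations = []
            while i < len(s) and s[i] == SEPARATOR:
                ann, i = parse(s, i + 1, stop=SEPARATOR + TERMINATOR)
                annotations.append(ann)
            i += 1  # consume the U+FFFB (assumes well-formed input)
            parts.append(("annot", annotated, annotations))
        else:
            buf.append(ch)
            i += 1
    flush()
    return parts, i
```

On a simple ruby-style string such as U+FFF9 kanji U+FFFA kana U+FFFB this yields one annotated span; on Ken's nested example it yields annotations within annotations, which is precisely the layering he doubts any real plain-text software would unscramble.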
Re: Furigana
Tex Texin tex at i18nguy dot com wrote: http://www.unicode.org/unicode/uni2book/ch13.pdf As I read that material, I take it to be saying that senders should remove the I.A. characters. What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide that, instead of just throwing the annotation characters away, I should attempt to display them directly above (and smaller than) the normal text, the way furigana are displayed above kanji. This would work not only for typical Japanese ruby, but also for Michael's English-or-Swedish-over-Bliss scenario. It might even be useful in assisting beleaguered Azerbaijanis, for example, by annotating Latin-script text with its Cyrillic equivalent. (Just a thought.) Would this be conformant? -Doug Ewell Fullerton, California
Re: Furigana
The text says: except for private agreement. So if con-senting a-d-u-l-t-s want to exchange interlinear annotated text, that is fine. (I hyphenated the words because some of my previous emails were rejected by Doug's filters..) tex Doug Ewell wrote: Tex Texin tex at i18nguy dot com wrote: http://www.unicode.org/unicode/uni2book/ch13.pdf As I read that material, I take it to be saying that senders should remove the I.A. characters. What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide that, instead of just throwing the annotation characters away, I should attempt to display them directly above (and smaller than) the normal text, the way furigana are displayed above kanji. This would work not only for typical Japanese ruby, but also for Michael's English-or-Swedish-over-Bliss scenario. It might even be useful in assisting beleaguered Azerbaijanis, for example, by annotating Latin-script text with its Cyrillic equivalent. (Just a thought.) Would this be conformant? -Doug Ewell Fullerton, California -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World -
RE: Furigana
At 16:35 -0700 2002-08-13, Murray Sargent wrote: Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. I'm not the kind of implementor you are. I do implement things. :-) When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting properties) to. Such internal code points should not be exported or imported. Excuse me, this makes no sense whatsoever. If your company, for instance, needed INTERNAL code points to attach to higher level protocols, why did you not use the Private Use Area? Have I got this wrong? You're saying your company did want to use them but wanted them in the non-PUA BMP so they could -- am I getting it right -- be INTERCHANGED. OK, that's fine, but is it the case that these are ONLY allowed to be used by your company? From your point of view perhaps, they shouldn't have been encoded. But from an implementation point of view, they're very handy. Unicode needs to serve both purposes. For what use would Unicode be if you couldn't implement it effectively? I'm saying I WANT to use these characters. They solve an apparent need of mine -- they would be very handy indeed, as I said in the Beijing meeting where they were discussed. I am mystified as to why people are telling me that I shouldn't because lots of applications may strip them out. I am deeply confused. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: RE: Furigana
At 17:59 -0700 2002-08-13, Kenneth Whistler wrote: And Microsoft has others of such beasties hiding internally as anchors for you-don't-wanna-know-what -- also not interchanged. I am ***NOT*** bashing MS here, but what is everyone saying? That these characters should be annotated in the Unicode Standard as for Microsoft's use only? Or is it to be for the use of anyone except Microsoft who does something else? Or what!? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Furigana
James Kass scripsit: Once a meaning like INTERLINEAR ANNOTATION ANCHOR has been assigned to a code point, any application which chooses to use that code point for any other purpose would be at fault. But a purely nominal one, since any use of these three codepoints should be behind the firewall of the application. I understand that having common internal use code points might be considered handy from an implementer's point of view, but suggest that such conventions should be shared among implementers only, and should not be enshrined in a character encoding standard. I doubt you will see any more such things. BTW, note that FFFC has the same internal-only property. Because it seems to be an oxymoron. If it has a specific semantic meaning, then it should be possible to store and exchange it without any loss of meaning. For what seemed to them good and sufficient reasons, the UTC did not do this: they allocated the points but proscribed them from use in interchange. Had they thought of the permanent non-character block at the time, they probably would not have done this. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] Please leave your values at the front desk. --sign in Paris hotel | Check your assumptions. In fact, check your assumptions at the door. --Miles Vorkosigan
Re: Furigana
Michael Everson scripsit: Excuse me, this makes no sense whatsoever. If your company, for instance, needed INTERNAL code points to attach to higher level protocols, why did you not use the Private Use Area? Well, suppose I wanted to use a codepoint internally to a program for some purpose or other -- for example, to indicate the point at which a graphic was to be inserted in the final HTML output. If I allocated U+E000 to that purpose, then that program could not be used to process CSUR Tengwar text. Thus it is useful to have non-character codepoints, which are not meant to be interchanged, as well as PUA codepoints, which are meant to be interchanged under private agreements. In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with the non-characters. -- John Cowan http://www.ccil.org/~cowan [EMAIL PROTECTED] To say that Bilbo's breath was taken away is no description at all. There are no words left to express his staggerment, since Men changed the language that they learned of elves in the days when all the world was wonderful. --The Hobbit
Re: Furigana
At 20:09 -0700 2002-08-12, Doug Ewell wrote: Everybody will welcome the new conventional, graphical-type characters and scripts that are coming with Unicode 4.0. But maybe before standardizing another COMBINING GRAPHEME JOINER or other control-type character, it would be prudent to study the angles even more thoroughly and carefully, and make *damn* sure the character is going to be usable and not discouraged or even deprecated at birth. I tried to get some proper discussion papers on the CGJ from its advocates but they never really appeared. I still have misgivings about this beastly thing. -- Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Furigana
Doug Ewell wrote: I'll have to check with Adelphia and see who or what is trying to protect me from myself. Those automatic b*llsh*ts! A few years ago I was temporarily assigned to the central national office of my previous employer. It was when the Unicode list was discussing something about the *Mongolian* script: each time I tried to access some website having information about the script, the server replied Access to sites advocating abortion is banned on this system! _ Marco
Re: Furigana
Doug (and Michael also): What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide that, instead of just throwing the annotation characters away, I should attempt to display them directly above (and smaller than) the normal text, the way furigana are displayed above kanji. This would work not only for typical Japanese ruby, but also for Michael's English-or-Swedish-over-Bliss scenario. It might even be useful in assisting beleaguered Azerbaijanis, for example, by annotating Latin-script text with its Cyrillic equivalent. (Just a thought.) Would this be conformant? Well, technically conformant, but not wise. If commonly available display and rendering mechanisms are not rendering them as interlinear annotations, then you aren't really providing much assistance here by using a mechanism designed for internal anchors and trying to turn it into something it isn't really up to snuff for. Frankly, you would be much better off making use of the Ruby annotation schemes available in markup languages, which will give you better scoping and attribute mechanisms. Stop worrying a moment about "Why are these characters standardized, and why the hedoublehockeysticks can't I use them?!" and think about the problem that furigana or any other interlinear annotation rendering system has to address: a. How are the annotations adjusted? Left-adjusted, centered, something else? And what point(s) are they synched on? b. If the annotated text or the annotation itself consists of multiple units, are there subalignments? E.g. note note note note text text textextextext text or note note note note text text textextextext text c. Can an annotation itself be stacked into a multiline form? note note note nononononote text d. Can the text of the annotation itself in turn be annotated? e. Can the text have two or more coequal annotations? And if so, how are they aligned? f. If the annotation is in a distinct style from the text it annotates, how is that indicated and controlled? g. How is line-break controlled on a line which also has an annotation? And so on. This is all the kind of stuff that clearly smacks to me of document formatting concerns and rich text. Why anyone would consider such things to be plain text rather escapes me. --Ken
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William Overington teased us all unmercifully with: It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular extension (any ideas for something like .uof for Unicode object file) ...or to pick an extension, more or less at random, say .html that accompanies another plain text Unicode file which has a file name extension such as .txt, or indeed other choices except .uof (or whatever is chosen after discussion) then the convention could be that the .uof file has on lines of text, in order, the name of the text file then the names of the files which contain each object to which a U+FFFC character provides the anchor. For example, a file with a name such as story7.uof might have the following lines of text as its contents. story7.txt horse.gif dog.gif painting.jpg This is a shaggy dog story, right? The file story7.uof could thus be used with a file named story7.txt so as to indicate which objects were intended to be used for three uses of U+FFFC in the file story7.txt, in the order in which they are to be used. Or we could go even further, and specify that in the story7.html file, the three uses of those objects could be introduced with a very specific syntax that would not only indicate the order that they occur in, but could indicate the *exact* location one could obtain the objects -- either on one's own machine or even anywhere around the world via the Internet! And we could even include a mechanism for specifying the exact size at which the object should be displayed. 
For example, we could use something like: <img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width=380 height=260 border=1> or <img src="http://www.artofeurope.com/velasquez/vel2.jpg"> I can imagine that such a widely used practice might be helpful in bridging the gap between being able to use a plain text file or maybe having to use some expensive wordprocessing package. And maybe someone will write cheaper software -- we could call it a browser -- that could even be distributed for free, so that people could make use of this convention for viewing objects correctly distributed with respect to the text they are embedded in. Yes, yes, I think this is an idea which could fly. --Ken
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Kenneth Whistler wrote in response to William Overington, ...or to pick an extension, more or less at random, say .html The file story7.uof could thus be used with a file named story7.txt so as to indicate which objects were intended to be used for three uses of U+FFFC in the file story7.txt, in the order in which they are to be used. Or we could go even further, and specify that in the story7.html file, the three uses of those objects could be introduced with a very specific syntax that would not only indicate the order that they occur in, but could indicate the *exact* location one could obtain the objects -- either on one's own machine or even anywhere around the world via the Internet! And we could even include a mechanism for specifying the exact size at which the object should be displayed. For example, we could use something like: <img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width=380 height=260 border=1> And maybe someone will write cheaper software -- we could call it a browser -- that could even be distributed for free, so that people could make use of this convention for viewing objects correctly distributed with respect to the text they are embedded in. Yes, yes, I think this is an idea which could fly. Well, there might be some serious objections to such a proposal. One, the use of *.html clearly violates the standard file naming convention of eight uppercase ASCII letters followed by a period followed by a *three* letter uppercase ASCII file name extension. Secondly, the use of the greater-than and less-than ASCII characters to denote the mark-up sure appears to be a misuse of those characters. This may well cause too much confusion in parsing. 3<superscript>rd</superscript>, the cost of development of these hypothetical browsers would be quite high, and we couldn't really expect any such expensive software to be literally given away. There would have to be some catch to it all, wouldn't there? Best regards, James Kass, (P.S. 
- The point of this response is that maybe we shouldn't hastily reject new concepts just because they seem to fly in the face of existing practices. - JK)
The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)
John Cowan wrote as follows. In essence, though not formally, U+FFF9..U+FFFC are non-characters as well, and the Unicode semantics just tells what programs *may* find them useful for. Unicode 4.0 editors: it might be a good idea to emphasize the close relationship of this small repertoire with the non-characters. That is not what the specification says. Something can only be emphasised if it is true in the first place! If it is desired to make U+FFF9 through to U+FFFC noncharacters then that needs to be done explicitly with a fair opportunity for people to object and make representations before a decision is made. A saying of my own is as follows. When goalposts are moved, aromatic herbs should be scattered around. It seems to me, not having known about annotation characters previously, yet now, due to this thread, having read the published rules in Chapter 13, that these are not noncharacters. It appears to me that the use of the annotation characters in document interchange is never forbidden and is strongly discouraged only where there is no prior agreement between the sender and the receiver, and that that strong discouragement is because the content may be misinterpreted otherwise. So, if there is a prior agreement, then there is no problem about using them in interchanged documents. There appears to be nothing that suggests that U+FFFC cannot be used in an interchanged document. I know little about Bliss symbols, though I have seen a few of them and have read a brief introduction to them, yet it seems to me that annotating Bliss symbols with English or Swedish is entirely within the specification absolutely and would be no more than strongly discouraged even if there is no prior agreement between the sender and the receiver. 
Further, it seems to me from the published rules that these annotation characters could possibly be used to provide a footnote annotation facility within a plain text file, so that, if a plain text file is being printed out in book format, then a footnote about a word or phrase could be encoded using this technique so that the rendering software could place the footnote on the same page as the word or phrase which is being annotated, regardless of whether that word or phrase occurs near the start, middle or end of that page. It seems to me that the statement of the meaning of U+FFFA means that the renderings shown in Figure 13-3 of the specification are just examples, though as the word exact is used, perhaps they are guiding examples and the use in footnotes is perhaps stretching the variation from the examples in the diagram. An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a graphic. It seems to me that if that is indeed permissible that it could potentially be a useful facility. On balance, it seems to me that if both sender and receiver are clear as to what is meant, then the use of annotation characters for Bliss symbols and for footnotes and for captions for illustrations harms no one, for a person skilled in the art seeking to use the file without knowledge of the interpretation agreement which should ideally exist between sender and receiver and who has only the Unicode specification to go on would probably be unlikely to get a wrong interpretation of the intended meaning, even if the actual graphical layout were imprecise, as the Unicode standard locks together the two parts of the annotation sequence and shows that one of the parts is the annotation for the other part. William Overington 15 August 2002
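The character-level shape of the sequence William asks about is easy to show. A hypothetical helper (not anything defined by the Standard) that wraps an object replacement character and its caption in an annotation sequence would look like this:

```python
# Build the sequence William describes: U+FFF9 anchor, U+FFFC as the
# annotated "text" (standing for a graphic), U+FFFA separator, the
# caption as the annotation, U+FFFB terminator. Illustrative only;
# whether any receiver would render this usefully is the open question.

def captioned_object(caption):
    return "\ufff9\ufffc\ufffa" + caption + "\ufffb"

seq = captioned_object("Temperature variation with time.")
```

Only four characters beyond the caption itself are involved, which is why the proposal looks so lightweight on paper, whatever its other merits.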
Re: Furigana
At 12:11 -0700 2002-08-08, Kenneth Whistler wrote: Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points. What does this mean? That if I have a text all nice and marked up with furigana in Quark I can't export it to Word and reimport it in InDesign and expect my nice marked up text to still be marked up? Surely all Unicode/10646 characters are expected to be preserved in interchange. What have I got wrong, Ken? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Furigana
At 19:59 +0900 2002-08-08, Dan Kogai wrote: On Thursday, August 8, 2002, at 04:17 , Michael Everson wrote: Where do I start looking for information about implementing furigana? Can you have more than one gloss attached to a word? We are considering implementing this for Blissymbols. What do you mean by implementing? Or to what extent do you want furigana implemented? I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. -- Michael Everson *** Everson Typography *** http://www.evertype.com
RE: Furigana
As Ken says, the Unicode interlinear annotation characters are for internal use only. Specifically, their meanings can be different for different programs. If you have your nice marked up text in memory and want to export it for use by some program, you need to use a higher-level protocol that translates the interlinear annotation characters to a standardized external format, such as HTML. In addition to U+FFF9 - U+FFFB, there are other characters for internal use only, namely U+FDD0 - U+FDEF. The meanings of these characters also can (and do) differ for different programs. Originally it was hoped that the interlinear annotation characters might be able to describe ruby adequately, but it became clear that additional information is necessary to express ruby unambiguously. Hence the UTC adopted them for internal use only, with associated information presumably stored elsewhere to resolve the ambiguities. Frankly, IMHO, the best thing for a program to do on reading such characters is to delete them. This isn't quite what one might think from the Standard, since they unfortunately aren't labeled as noncharacters. But if a program uses them internally with a well-defined meaning, getting them in from an external source can violate the internal usage. To actually roundtrip these rogue characters would require some extra internal protocol to ignore them when they've been read in. So my edit engine (RichEdit), which uses them for table row delimiters, simply deletes them on input and only exports them for RichEdit-specific contexts. Murray -Original Message- From: Michael Everson [mailto:[EMAIL PROTECTED]] Sent: Tuesday, August 13, 2002 7:52 AM To: [EMAIL PROTECTED] Cc: Ken Whistler Subject: Re: Furigana At 12:11 -0700 2002-08-08, Kenneth Whistler wrote: Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points. 
What does this mean? That if I have a text all nice and marked up with furigana in Quark I can't export it to Word and reimport it in InDesign and expect my nice marked up text to still be marked up? Surely all Unicode/10646 characters are expected to be preserved in interchange. What have I got wrong, Ken? -- Michael Everson *** Everson Typography *** http://www.evertype.com
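The delete-on-input policy Murray describes can be sketched in a few lines. This is an illustrative sketch, not RichEdit's actual API; the function name is invented:

```python
# Sketch of the input policy described above: a program that assigns
# its own internal meaning to U+FFF9..U+FFFB cannot trust those code
# points arriving from an external source, so it deletes them on input.
# (sanitize_input is an illustrative name, not a real RichEdit call.)
INTERNAL_ANCHORS = {"\uFFF9", "\uFFFA", "\uFFFB"}

def sanitize_input(text: str) -> str:
    return "".join(ch for ch in text if ch not in INTERNAL_ANCHORS)

assert sanitize_input("\uFFF9kanji\uFFFAkana\uFFFB") == "kanjikana"
```

Note that this removes only the anchor code points; the annotation text runs together with the base text, which is one reason the standard's own guidance (discussed later in the thread) suggests removing the annotating text as well.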
Re: Furigana
I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup. --Ken
Re: Furigana
Hi Michael, ME> I want to be able to send a Blissymbol string with a gloss in ME> English or Swedish attached. Do you need this in plain text? If I understand Blissymbols correctly, this is just to give an explanation of the Blissymbol string, much like giving the Pinyin pronunciation to a Han ideograph or giving IPA for a native orthography in linguistics textbooks. Philipp mailto:[EMAIL PROTECTED]
Re: Furigana
At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup. When this was discussed in WG2 in Japan before they went in, I asked specifically, could I use this method to put Anglo-Saxon glosses on Latin text. The answer was positive, so it received my support. Were these always pre-deprecated? Why are they in the standard if no one is going to be allowed to use them? -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Furigana
At 23:50 +0200 2002-08-13, Philipp Reichmuth wrote: Hi Michael, ME> I want to be able to send a Blissymbol string with a gloss in ME> English or Swedish attached. Do you need this in plain text? We are exploring what to do. If I understand Blissymbols correctly, this is just to give an explanation of the Blissymbol string, much like giving the Pinyin pronunciation to a Han ideograph or giving IPA for a native orthography in linguistics textbooks. But Blissymbols are most often transmitted (in gifs for instance) with glosses which help people not literate in Blissymbols but able to read other languages to understand what is being said. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Furigana
Michael, At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup. When this was discussed in WG2 in Japan before they went in, I asked specifically, could I use this method to put Anglo-Saxon glosses on Latin text. The answer was positive, so it received my support. Were these always pre-deprecated? Why are they in the standard if no one is going to be allowed to use them? Read the discussion which has been published in the Unicode Standard ever since these things were available. TUS 3.0, pp. 325 - 326. The annotation characters are used in internal processing when out-of-band information is associated with a character stream, very similarly to the usage of the U+FFFC OBJECT REPLACEMENT CHARACTER... Usage of the annotation characters in plain text interchange is strongly discouraged without prior agreement between the sender and the receiver because the content may be misinterpreted otherwise... When an output for plain text usage is desired and when the receiver is unknown to the sender, these interlinear annotation characters should be removed... The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other similar annotations). --Ken
Re: Furigana
At 16:00 -0700 2002-08-13, Kenneth Whistler wrote The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other similar annotations). Well then they oughtn't to have been encoded. -- Michael Everson *** Everson Typography *** http://www.evertype.com
Re: Furigana
Michael Everson (in training as a curmudgeon) harrumpfed ;-) The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other similar annotations). Well then they oughtn't to have been encoded. Yes, we agree that hindsight is a wonderful skill. This function would better be served by noncharacter code points, but nobody had quite figured out how to articulate that yet. But even at the time, as the record of the deliberations would show, if we had a more perfect record, the proponents were clear that the interlinear annotation characters were to solve an internal anchor point representation problem. Nobody (well, maybe somebody) expected them to serve as a substitute for a general markup mechanism for indication of annotation, and in particular, interlinear annotations. I recall at the time I pointed out that as a linguist I had routinely made use of 4-line interlinear annotation formats, and that this simple anchoring scheme couldn't even begin to represent such complexities in a usable fashion. --Ken
RE: Furigana
Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting properties) to. Such internal code points should not be exported or imported. From your point of view perhaps, they shouldn't have been encoded. But from an implementation point of view, they're very handy. Unicode needs to serve both purposes. For what use would Unicode be if you couldn't implement it effectively? Murray
Re: RE: Furigana
Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting properties) to. That seems to be basically what William Overington is proposing, except these characters only handle furigana, instead of all markup.
Re: Furigana
Murray, It's true implementers need some place to attach higher level protocols, but they don't need specific points for specific implementations of internal protocols. If they weren't good enough to be used for exchange, then simply having some unpurposed code points available for internal use accomplishes the same thing and is available for other purposes as well. But at the time the annotation characters were introduced, we were unclear about this. tex Murray Sargent wrote: Michael Everson said Well then they [interlinear annotation characters] oughtn't to have been encoded. Michael, you aren't an implementer. When you implement things unambiguously, you may need internal code points in your plain-text stream to attach higher-level protocols (such as formatting properties) to. Such internal code points should not be exported or imported. From your point of view perhaps, they shouldn't have been encoded. But from an implementation point of view, they're very handy. Unicode needs to serve both purposes. For what use would Unicode be if you couldn't implement it effectively? Murray -- Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World
Re: Furigana
Ken, http://www.unicode.org/unicode/uni2book/ch13.pdf As I read that material, I take it to be saying that senders should remove the I.A. characters. Does the standard discuss anywhere filtering the characters on the receiver side? Clearly Murray has good justification for removing the I.A. characters, as it interferes with his use of the code points internally. (Although I am sure that could also be designed in ways to preserve them if absolutely needed.) But does the standard address their removal by receivers (or intermediaries), and does removing them include removing the contained annotation? I can imagine an application that doesn't support I.A. deciding the annotation is out of band and can't be preserved in its plain text output, and so justifiably strips it as well. Does the standard say what to do with "for internal use only" characters? I would have thought the rule was to ignore and pass along. Kenneth Whistler wrote: Michael, At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: Read the discussion which has been published in the Unicode Standard ever since these things were available. TUS 3.0, pp. 325 - 326. The annotation characters are used in internal processing when out-of-band information is associated with a character stream, very similarly to the usage of the U+FFFC OBJECT REPLACEMENT CHARACTER... Usage of the annotation characters in plain text interchange is strongly discouraged without prior agreement between the sender and the receiver because the content may be misinterpreted otherwise... When an output for plain text usage is desired and when the receiver is unknown to the sender, these interlinear annotation characters should be removed... The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other similar annotations). 
--Ken -- Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World
Re: Furigana
Tex asked: But does the standard address their removal by receivers (or intermediaries), and does removing them include removing the contained annotation? Yes and yes. p. 326: On input, a plain text receiver should either preserve all characters or remove the interlinear annotation characters as well as the annotating text... I can imagine an application that doesn't support I.A. deciding the annotation is out of band and can't be preserved in its plain text output, and so justifiably strips it as well. Does the standard say what to do with "for internal use only" characters? Yes. Unicode 3.1: D7b: Noncharacter: a code point that is permanently reserved for internal use, and that should never be interchanged. C10: A process shall make no change in a valid coded character representation other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points, if that process purports not to modify the interpretation of that coded character sequence. The interlinear annotation characters fall in a gray zone, since they are not noncharacters, but by rights ought to have been. Since they are standard characters though, the standard has to provide some guidelines -- and it is simply safer, if you encounter and delete them, to also delete the annotation. You would be changing the interpretation of the text, but in a knowing, intended manner. I would have thought the rule was to ignore and pass along. In general, yes, as for everything else, including unassigned code points. If your role in life is as a database, for example, or some other kind of data source or data pipe, then minimal meddling with the bytes is safest. But other kinds of processes will do graduated manipulations, depending on what they are aiming for. --Ken
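The two receiver options Ken quotes from p. 326 can be sketched directly. This is an illustrative sketch, not text from the standard; it assumes well-formed, non-nested FFF9..FFFA..FFFB triples:

```python
import re

# Sketch of the two receiver options from TUS 3.0 p. 326: preserve all
# characters untouched, or remove the three anchors together with the
# annotating text, keeping only the annotated base text.
_ANNOT = re.compile("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA[^\uFFF9-\uFFFB]*\uFFFB")

def receive(text: str, strip_annotations: bool) -> str:
    if not strip_annotations:
        return text                    # option 1: preserve everything
    return _ANNOT.sub(r"\1", text)     # option 2: keep only the base text

assert receive("\uFFF9kanji\uFFFAkana\uFFFB", True) == "kanji"
```

Note that option 2 deletes the annotation along with the anchors, which matches Ken's point that it is safer to drop both than to leave annotation text running together with the base text.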
RE: Furigana
I agree. The current thinking is that U+FFF9 - U+FFFB have no external meaning and shouldn't appear externally, i.e., they are noncharacters in every way except in the spec (sigh). They can be used for whatever an implementer wants internally. I mentioned earlier that the RichEdit edit engine uses them for table-row delimiters, which have nothing to do with Furigana. Instead, RichEdit 5.0 uses codes from the U+FDD0 - U+FDEF block for Furigana and various 2D math objects. Thanks Murray -Original Message- From: Tex Texin [mailto:[EMAIL PROTECTED]] Sent: Tuesday, August 13, 2002 6:11 PM To: Murray Sargent Cc: Michael Everson; [EMAIL PROTECTED] Subject: Re: Furigana Murray, It's true implementers need some place to attach higher level protocols, but they don't need specific points for specific implementations of internal protocols. If they weren't good enough to be used for exchange, then simply having some unpurposed code points available for internal use accomplishes the same thing and is available for other purposes as well. But at the time the annotation characters were introduced, we were unclear about this. tex
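For reference, the noncharacter ranges discussed here (U+FDD0..U+FDEF, plus the last two code points of every plane) can be tested with a one-line predicate. This is an illustrative sketch based on the Unicode definition, not code from any mail in the thread:

```python
def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane 0..16
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

assert is_noncharacter(0xFDD0) and is_noncharacter(0xFFFE)
assert is_noncharacter(0x10FFFF)
assert not is_noncharacter(0xFFF9)  # the interlinear anchors are NOT noncharacters
```

The last assertion is the crux of Murray's "sigh": U+FFF9..U+FFFB behave like noncharacters in practice but do not satisfy the formal definition.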
Re: Furigana
Thanks Ken. I don't know how I missed the text on 326 when I scanned it before I mailed. tex Kenneth Whistler wrote: Tex asked: But does the standard address their removal by receivers (or intermediaries), and does removing them include removing the contained annotation? Yes and yes. p. 326: On input, a plain text receiver should either preserve all characters or remove the interlinear annotation characters as well as the annotating text... I can imagine an application that doesn't support I.A. deciding the annotation is out of band and can't be preserved in its plain text output, and so justifiably strips it as well. Does the standard say what to do with "for internal use only" characters? Yes. Unicode 3.1: D7b: Noncharacter: a code point that is permanently reserved for internal use, and that should never be interchanged. C10: A process shall make no change in a valid coded character representation other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points, if that process purports not to modify the interpretation of that coded character sequence. The interlinear annotation characters fall in a gray zone, since they are not noncharacters, but by rights ought to have been. Since they are standard characters though, the standard has to provide some guidelines -- and it is simply safer, if you encounter and delete them, to also delete the annotation. You would be changing the interpretation of the text, but in a knowing, intended manner. I would have thought the rule was to ignore and pass along. In general, yes, as for everything else, including unassigned code points. If your role in life is as a database, for example, or some other kind of data source or data pipe, then minimal meddling with the bytes is safest. But other kinds of processes will do graduated manipulations, depending on what they are aiming for. 
--Ken -- Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCraft http://www.XenCraft.com Making e-Business Work Around the World
Re: Furigana
Kenneth Whistler wrote, The interlinear annotation characters fall in a gray zone, since they are not noncharacters, but by rights ought to have been. Since they are standard characters though, the standard has to provide some guidelines -- and it is simply safer, if you encounter and delete them, to also delete the annotation. You would be changing the interpretation of the text, but in a knowing, intended manner. Should a character encoding standard ever encode a non-character? Is there such a thing as a non-character with a specific semantic meaning? Can't apps needing internal processing code points which are only going to be deleted before export simply use the PUA? If the PUA isn't acceptable, and the existing range of code points reserved for undefined non-characters isn't large enough, wouldn't it be better to assign a range of undefined non-characters in one of the higher planes for these internal processing needs? No application should delete anything without first asking the user's permission. Imagine spending considerable time and effort getting a text to look just as desired, only to have some application arbitrarily decide to delete half of it without your permission or knowledge. Best regards, James Kass.
Re: Furigana
James Kass scripsit: Should a character encoding standard ever encode a non-character? Non-characters aren't encoded, they're reserved either for specific purposes or for any desired purpose. Is there such a thing as a non-character with a specific semantic meaning? Why not? Can't apps needing internal processing code points which are only going to be deleted before export simply use the PUA? No, because they may need the PUA to represent characters interchanged under a private agreement. -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com In computer science, we stand on each other's feet. --Brian K. Reid
Re: Furigana
John Cowan wrote, Non-characters aren't encoded, they're reserved either for specific purposes or for any desired purpose. If it's a specific purpose, it seems like it should either fall under character or mark-up. I can understand reserving code points for any desired purpose, such as control characters or escape sequences. These may well differ from application to application. Once a meaning like INTERLINEAR ANNOTATION ANCHOR has been assigned to a code point, any application which chooses to use that code point for any other purpose would be at fault. In other words, if these characters are to be used internally for Japanese Ruby (furigana), etc., then they ought to be able to be used externally, as well. I understand that having common internal use code points might be considered handy from an implementer's point of view, but suggest that such conventions should be shared among implementers only, and should not be enshrined in a character encoding standard. Is there such a thing as a non-character with a specific semantic meaning? Why not? Because it seems to be an oxymoron. If it has a specific semantic meaning, then it should be possible to store and exchange it without any loss of meaning. In other words, it's a character and should be so encoded. (Logos and such notwithstanding.) Best regards, James Kass.
Re: Furigana
Michael asked: At 12:11 -0700 2002-08-08, Kenneth Whistler wrote: Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points. What does this mean? That if I have a text all nice and marked up with furigana in Quark I can't export it to Word and reimport it in InDesign and expect my nice marked up text to still be marked up? Yes, among other things. Surely all Unicode/10646 characters are expected to be preserved in interchange. What have I got wrong, Ken? Your expectation that this stuff will actually work that way. Yes, the characters will be preserved in interchange. But the most likely result you will get is: <anchor1>text<anchor2>annotation<anchor3> where the anchors will just be blorts. You should not expect that the whole annotation *framework* will be implemented, and certainly not that these three characters will suffice for nice[ly] marked up... furigana. These animals are more like U+FFFC -- they are internal anchors that should not be exported, as there is no general expectation that once exported to plain text, a receiver will have sufficient context for making sense of them in the way the originator was dealing with them internally. By rights, this whole problem of synchronizing the internal anchor points for various ruby schemes should have been handled by noncharacters -- but that mechanism was not really understood and expanded sufficiently until after the interlinear annotation characters were standardized. --Ken
Re: Furigana
Kenneth Whistler kenw at sybase dot com wrote: Surely all Unicode/10646 characters are expected to be preserved in interchange. What have I got wrong, Ken? Your expectation that this stuff will actually work that way. Yes, the characters will be preserved in interchange. But the most likely result you will get is: <anchor1>text<anchor2>annotation<anchor3> where the anchors will just be blorts. You should not expect that the whole annotation *framework* will be implemented, and certainly not that these three characters will suffice for nice[ly] marked up... furigana. I don't have any problem with the idea that many, or even all, of today's applications lack meaningful support for the interlinear annotation characters, and will display them as blorts, and I doubt that Michael expects widespread support for them either. What worries me is what Ken says next: These animals are more like U+FFFC -- they are internal anchors that should not be exported, as there is no general expectation that once exported to plain text, a receiver will have sufficient context for making sense of them in the way the originator was dealing with them internally. By rights, this whole problem of synchronizing the internal anchor points for various ruby schemes should have been handled by noncharacters -- but that mechanism was not really understood and expanded sufficiently until after the interlinear annotation characters were standardized. This moves the entire issue out of the realm of poor support and into the big, dark, scary cavern of pre-deprecation. Unicode 3.0 doesn't say exactly what Ken says. Unicode 3.0 (p. 326) says the annotation characters should only be used under prior agreement between the sender and the receiver because the content may be misinterpreted otherwise. Fine, no problem; those are the same rules that apply to the PUA. 
Ken, though, seems to say they shouldn't be exported at all, and furthermore they shouldn't even have been encoded in the first place, except that the noncharacters (which explicitly mustn't be interchanged) hadn't been invented yet. This sounds like Plane 14, or the combining Vietnamese tone marks, all over again -- Unicode (and/or WG2) invents a mechanism, but then wishes they hadn't, or thinks of a better way, so the mechanism is strongly discouraged and eventually deprecated. (Not that I liked the separate Vietnamese tone marks; don't get me wrong.) Some groups, like IDN and the security mavens, criticize Unicode for its perceived instability. A lot of the attention seems to revolve around gray areas of normalization and bidi, or confusable glyphs (what I call spoof buddies). Can I suggest that a potentially larger source of instability comes from the creation of characters and encoding mechanisms that are subsequently discouraged or deprecated because maybe they weren't fully thought out in the first place? The approval process in Unicode, and especially WG2, is a slow one, and some of these on second thought decisions race ahead of the approval process, so that the mechanisms are already doomed by the time of publication. Everybody will welcome the new conventional, graphical-type characters and scripts that are coming with Unicode 4.0. But maybe before standardizing another COMBINING GRAPHEME JOINER or other control-type character, it would be prudent to study the angles even more thoroughly and carefully, and make *damn* sure the character is going to be usable and not discouraged or even deprecated at birth. (No, I have never been involved in the character standardization process -- but I *have* been on committees that encoded other types of things too hastily and then had to find a way to take back their decision.) -Doug Ewell Fullerton, California
Re: Furigana
Stefan wrote: Many Japanese word processors already have that capability. HTML4 has the ruby tag exactly for that purpose. And Unicode has characters for that purpose, too. Unicode: U+FFF9 kanji U+FFFA furigana U+FFFB HTML4: <RUBY><RB>kanji</RB><RT>furigana</RT></RUBY> Examples: ?漢字?ふりがな? 漢字ふりがな Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points. --Ken
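The mapping Stefan describes, from the internal anchor triple to ruby markup, is exactly the kind of higher-level-protocol export Ken recommends instead of interchanging the anchors themselves. A minimal sketch (the function name is illustrative; the element names follow the W3C Ruby Annotation markup; non-nested triples assumed):

```python
import re

# Sketch: translate internal U+FFF9..U+FFFB annotation triples into
# ruby markup on export, so the anchors never leave the application.
_ANNOT = re.compile("\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB")

def export_as_ruby(text: str) -> str:
    return _ANNOT.sub(r"<ruby><rb>\1</rb><rt>\2</rt></ruby>", text)

assert export_as_ruby("\uFFF9漢字\uFFFAふりがな\uFFFB") == \
    "<ruby><rb>漢字</rb><rt>ふりがな</rt></ruby>"
```

A receiver that understands ruby markup then needs no prior agreement about the anchor characters at all, which is the thread's recurring argument for markup over the plain-text anchors.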
Re: Furigana can be katakana
- Original Message - From: ろ ろ〇〇〇 [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: den 25 januari 2002 23:23 Subject: Furigana can be katakana In my Love Hina vol 7, 千年 has furigana ミレニアム. In cases such as ?瑞典?スウェーデン? (is the furigana encoded correctly?) the furigana should always be written in katakana, right? Stefan _ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
RE: Furigana codes?
Daniel Biddle wrote: On Wed, 5 Jul 2000, Rick McGowan wrote: iRck I thought this was a typo until I saw your address. U263A It's not a typo: Rick's signature has passed through an Indic renderer, so the "i" was reordered. U+FF1AU+FF0DU+FF09 _ Maco`
Re: Furigana codes?
Will someone PLEASE send this boy a book!? iRck Begin forwarded message: From: [EMAIL PROTECTED] Date: Sat, 01 Jul 2000 02:49:30 -0800 (GMT-0800) To: Unicode List [EMAIL PROTECTED] Subject: Furigana codes? X-UML-Sequence: 14481 (2000-07-01 10:49:31 GMT) Are there furigana codes? If not, there darn well need to be. Like: BEGIN WHAT THE FURIGANA IS FOR, then START FURIGANA, then END FURIGANA.
Re: Furigana codes?
From: [EMAIL PROTECTED] Are there furigana codes? If not, there darn well need to be. Like: BEGIN WHAT THE FURIGANA IS FOR, then START FURIGANA, then END FURIGANA. AFAIK, Furigana is not made up of separate code points; it is text that can be Hiragana, Katakana, or Romaji. There are converters built into all versions of Microsoft Access 2000/Excel 2000 (and Asian versions of Access 97 and Excel 97). I have also seen a couple on the web. In any case, what are you wanting to see covered, and where? michka