Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/16/2002 04:58:58 PM "William Overington" wrote:

>The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system
>(details at http://www.mhp.org ) which implements my telesoftware invention.
>A Java program which has been broadcast can read a Unicode plain text file
>and act upon the characters within it, and can read other file formats, such
>as .png files (Portable Network Graphics) and act upon the information in
>those files, so as to produce a display.
>
>So, a collection of files, namely a .uof file in the format that I suggested
>it, a Unicode plain text file with one or more U+FFFC characters in it and
>the appropriate graphics files in .png format as a package of free to the
>end user distance education learning material being broadcast from a direct
>broadcasting satellite or a terrestrial transmitter could be a very useful
>facility as the way to carry text with illustrations.

I'd suggest that it would be far more useful to use a marked-up file format based on XML. It doesn't have to be verbose (besides which, the bandwidth requirements of embedded graphics will be far greater than any requirements for markup used to indicate their position within the text).

The reason I think this would be far more advantageous is that there has been a massive interest throughout the IT industry in XML, meaning that there are lots of software implementations that support it, and it is very easy to build processes for publishing content. You could probably use any commonly-used database product out there to generate XML content suited for DVB-MHP; in fact, it would be easy to take some existing XML-based publishing process and extend it to support an XML-based file format specifically intended for DVB-MHP.

In contrast, if you want to invent a new file format, then you've got to create new software implementations to go with it, and bolting that into any existing publishing process will be far more costly.
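To make the XML suggestion concrete, here is a minimal sketch of what a receiving program might do with such a file. It is written in Python only for brevity; the element names `lesson`, `p` and `img`, the `src` attribute and the file names are assumptions made up for illustration and are not part of any DVB-MHP or Unicode specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """<lesson>
  <p>The horse <img src="horse.png"/> and the dog <img src="dog.png"/> are friends.</p>
</lesson>"""


def flatten(xml_text):
    """Return ("text", str) and ("image", src) items in document order."""
    items = []
    root = ET.fromstring(xml_text)
    for para in root.iter("p"):
        # Text before the first child element of the paragraph.
        if para.text:
            items.append(("text", para.text))
        for child in para:
            if child.tag == "img":
                items.append(("image", child.get("src")))
            # Text that follows the child element, up to the next element.
            if child.tail:
                items.append(("text", child.tail))
    return items


for kind, value in flatten(SAMPLE):
    print(kind, repr(value))
```

The position of each picture is carried by the markup itself, so no separate anchor character or companion file is needed in this approach.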
>Using HTML and a browser is just not the way to proceed in that situation. >HTML and a browser is a very useful technique for the web and indeed is an >option for the DVB-MHP system, yet the basic software system is Java based. Markup does not have to imply HTML and a Web browser. I'm sure you'd find a lot of Java implementations that made use of XML-based file formats, and though I'm not a Java programmer, I'm certain that you can find good support for parsing or generating XML streams in Java. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/17/2002 09:29:00 AM "William Overington" wrote:

>Peter Constable wrote as follows.
>
>>The standard already specifies that FFFC should not be exported from an
>application or interchanged.
>
>As far as I am aware that is not presently the case.
>
>If you still say that that is correct, could you please state the exact text
>of the standard relating to this matter and where in the standard that text
>can be found please?

OK, it doesn't say it explicitly; nevertheless, I believe I know what the intent of the text is, and that it is not condoning interchange of FFFC. The fact that the text isn't more explicit is something that could perhaps be improved; but if you think about what the text on pp 326-7 *does* say, I think this intent can be detected.

It seems clear to me that it assumes usage within the context of some higher-level protocol, such as would be imposed by a software process. For instance, the text makes reference to "the object's formatting information", but Unicode / plain text does not provide representation for such information. Thus, there necessarily must be some other protocol at work within which that information is represented. FFFC, then, is something that is utilised by that higher-level protocol. Hence, this section of the Standard is *not* talking about FFFC being used in interchanged plain text. It is, rather, assuming usage internal to some processing context or other higher-level protocol.

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
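The reading above, in which U+FFFC is an internal anchor managed by a higher-level protocol, might look like the following sketch in an application. This is only an illustration under stated assumptions: the function name `export_plain_text` and the optional placeholder string are hypothetical, and the Standard does not prescribe any particular export behaviour.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def export_plain_text(internal_text, placeholder=""):
    """Return a copy of the text with every U+FFFC anchor replaced.

    The objects themselves and their formatting information live in the
    application's own data structures (the higher-level protocol), so only
    the anchors need to be dealt with when plain text leaves the application.
    """
    return internal_text.replace(OBJECT_REPLACEMENT, placeholder)


internal = "A horse \ufffc and a dog \ufffc."
print(export_plain_text(internal))             # anchors dropped
print(export_plain_text(internal, "[image]"))  # anchors made visible
```

Whether anchors are dropped or replaced with a visible marker is a design choice for the exporting application; the sketch supports both.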
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Tex Texin wrote as follows. >William, > >So let me see if I understand this correctly. > >Let's take 2 perfectly good standards, Unicode and HTML, Yes. and make some >very minor tweaks to them, No. such as changing the meaning of U+FFFC and a >special format for filenames in the beginning of the file and a new >extension, so we have something new. I have suggested no changes whatsoever to HTML at all. The only thing which I have suggested in relation to Unicode in this thread is that, in relation to the fact that information about the object to which any particular use of U+FFFC refers is kept outside the character data stream, that it could be a good idea to define a file format .uof so that details of the names of the files for which the U+FFFC codes are anchors could be provided in a known format, if and only if end users chose to use a .uof file for that purpose on that occasion and not otherwise. This was in the context of seeking to protect the use of U+FFFC as a character which could be used in interchanging of documents following from the discussion of U+FFFC and annotation characters in the thread from off of which I spun this thread, which discussion, by Ken and Doug, is repeated in the first posting of this present thread. I thought it a good idea that the Unicode Technical Committee might like to make such a .uof file format an official Unicode document so as to offer one possible way to use U+FFFC codes. That is now a matter for discussion. If the Unicode Consortium wishes to do that, then fine. If the Unicode Consortium chooses not to do that, then I can write it up myself and publish it, which is not such a good solution, yet is adequate for my own needs and might be useful for some other people if they choose to use the same format for .uof files. 
Hopefully I have now managed to raise the issue of protecting the fact that the U+FFFC character can be used in document interchange and it will hopefully not become deprecated to the status of a noncharacter. There is a practical reason for this, which is, from my own perspective, quite important. This is as follows. The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system (details at http://www.mhp.org ) which implements my telesoftware invention. A Java program which has been broadcast can read a Unicode plain text file and act upon the characters within it, and can read other file formats, such as .png files (Portable Network Graphics) and act upon the information in those files, so as to produce a display. So, a collection of files, namely a .uof file in the format that I suggested it, a Unicode plain text file with one or more U+FFFC characters in it and the appropriate graphics files in .png format as a package of free to the end user distance education learning material being broadcast from a direct broadcasting satellite or a terrestrial transmitter could be a very useful facility as the way to carry text with illustrations. Using HTML and a browser is just not the way to proceed in that situation. HTML and a browser is a very useful technique for the web and indeed is an option for the DVB-MHP system, yet the basic software system is Java based. It is as if the television set is acting as a computer which has a slow read only access disc drive in the sky from which it may gather information, including software. The system is interactive with no return information link to the central broadcasting computer, by means of the telesoftware invention. Overlays and virtual running with programs bigger than the local storage being able to be run using chaining techniques are possible. Please do not think of this as downloading as no uplink request is made! 
>Now the big benefit of this completely new thing, Well, it's only a way of sender and receiver being able to have information in a file with the suffix .uof about what objects are being anchored by U+FFFC codes in a Unicode plain text file which it accompanies. is that programs that >do desktop publishing can use plain text files which are not quite plain >text because they have some special formatting, Well, the plain text files are only Unicode plain text which might contain one or more U+FFFC characters and some of the other Unicode control characters such as CARRIAGE RETURN. but now they can publish >them in better manner than before. Well, my thinking is that it would help to have a well known way to express the meaning of the anchors encoded by U+FFFC in a file rather than having only a vague specification that all other information about the object is kept outside the data stream. I am saying that, yes, all other information about the object is kept outside the data stream and, if, and only if, end users choose to use a .uof file in a standard format to convey that information for some particular use of a U+FFFC code, then that format could be considered for definition and publication by the Unicode Consortium. That does not seem unreasonab
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
James Kass wrote as follows.

>William Overington wrote,
>
>>
>> No, it is a story about an artist who wanted to paint a picture of a horse
>> and a picture of a dog and, since he knew that the horse and the dog were
>> great friends and liked to be together and also that he only had one canvas
>> upon which to paint, the artist painted a picture of a landscape with the
>> horse and the dog in the foreground, thereby, as the saying goes, painting
>> two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm
>> in that he achieved two results by one activity. In addition the picture
>> has various interesting details in the background, such as a windmill in a
>> plain (or is that a windmill in a plain text file). :-)
>>
>
>1) It's gif file format rather than plain text.*
>2) There isn't any windmill.

The picture of the birds has been in our family webspace since 1998 as an illustration for the saying "Painting two birds on one canvas". That saying, originated by me, is a peaceful saying meaning to achieve two results by one activity. I made the picture from clip art as a learning exercise.

The picture of the birds is referenced as a way of illustrating the saying "Painting two birds on one canvas". It is not the picture in the story about which Ken asked. I may well have a go at constructing such a picture, perhaps using clip art.

The reference to a windmill is meant as a humorous aside to Don Quixote tilting at windmills.

I am interested in creative writing, so when Ken asked about the story, I just thought of something to put in my response. Part of the training in, and the fun of, creative writing is to be able to write something promptly to a topic.

William Overington

16 August 2002
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On 08/14/2002 02:04:50 PM "William Overington" wrote: >As this concerns the U+FFFC character and the Unicode Technical Committee is >due to meet next week, I think it might be helpful if this idea is discussed >before the meeting as a straightforward idea like this might mean that the >possibility to exchange U+FFFC characters at all if people want to do so is >not lost. This does not solve any problems not already solved. This is not plain text; it is a form of interchange markup and a higher-level protocol. There are already higher-level markup protocols that accomplish this. The standard already specifies that FFFC should not be exported from an application or interchanged. There is no reason to change this. >>Everybody will welcome the new conventional, graphical-type characters >>and scripts that are coming with Unicode 4.0. > >What are those please? See the "Proposed characters" section of the Unicode site. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: <[EMAIL PROTECTED]>
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William, So let me see if I understand this correctly. Let's take 2 perfectly good standards, Unicode and HTML, and make some very minor tweaks to them, such as changing the meaning of U+FFFC and a special format for filenames in the beginning of the file and a new extension, so we have something new. Now the big benefit of this completely new thing, is that programs that do desktop publishing can use plain text files which are not quite plain text because they have some special formatting, but now they can publish them in better manner than before. For example, plain text with pictures. This is great. (It is true that it is less capable than if we had just used enough html to do the same thing, but .uof is more like plain text than html is.) Programmers will be happy because now they can support plain text with just a few tweaks. Oh I almost forgot, they also have to support Unicode, but slightly tweaked. And they can also support HTML, with some minor tweaks for .uof. Of course programmers don't mind supporting lots of variations of the same thing. Customer support personnel also don't mind. Oh, the plain text programmers will now need to support pictures and other aspects of full publishing, but at least they won't have a complex file format to work with. I guess it doesn't matter that a more complex format is also more expressive and therefore can leverage all of the publishing features. It probably doesn't matter that a desktop publishing product probably already supports more complex formats, and probably also supports html, it will be beneficial to add this slight difference from plain text. I like this very much. It is very much like when the magician slides the knot in the string and makes it disappear. I imagine that over time we will have some more wonderful inventions and add further tweaks and further improve the publishing of plain text. 
There are a few other things I would like to improve in Unicode, so I hope it will be ok to make some other suggestions. We can change the extension to know which tweaks we are talking about. .uo1, .uo2. Just a few small changes to characters and plain text format variations. Stability of the meaning of the file isn't important.

However, I think my first suggestion will be to make the benefits of .uof available to XML. We can call this .uo1.

I am a little disconcerted that html already can do everything that .uof does plus more, and is also supported by all of the publishers that are likely to support .uof. Also, as there are more than a million characters in Unicode, most are unused so far, so changing the meaning of just FFFC in this one context doesn't seem like a big win, considering also every line of code that might work with FFFC now needs to consider the context to determine its semantics. But every invention deserves to be implemented, we need not look at whether the invention satisfies some demand of its customers.

I like the 2 birds picture and I assume it was a metaphor for the idea- one bird was html the other unicode. I was a little disappointed that you used html instead of .uof format though.

Maybe it's the lateness of the hour here. I hope the idea looks as good in the morning.

Oh I almost forgot. I was having difficulty discerning when you and Ken might be joking. The mails read very serious. I would like to suggest we make a new format .uo2. We can indicate line numbers and emotions with plain text characters that look like facial expressions. It would help me know when you both were serious and when you might be joking. Sometimes it is hard to tell. I am going to create a list of facial expressions and assign them in the PUA so we can all have a standard to follow. See my next mail with a list of facial expressions and assignments.

tex

William Overington wrote:
>
> Kenneth Whistler wrote as follows about my idea.
> > >> It occurs to me that it is possible to introduce a convention, either as > a > >> matter included in the Unicode specification, or as just a known about > >> thing, that if one has a plain text Unicode file with a file name that > has > >> some particular extension (any ideas for something like .uof for Unicode > >> object file) > > > >...or to pick an extension, more or less at random, say ".html" > > Well, that could produce confusion with a .html file used for Hyper Text > Markup Language, HTML. > > I suggested .uof so that a .uof file would be known as being for this > purpose. > > > > >> that accompanies another plain text Unicode file which has a > >> file name extension such as .txt, or indeed other choices except .uof (or > >> whatever is chosen after discussion) then the convention could be that > the > >> .uof file has on lines of text, in order, the name of the text file then > the > >> names of the files which contains each object to which a U+FFFC character > >> provides the anchor. > >> > >> For example, a file with a name such as story7.uof might have the > following > >> lines of text as its
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William Overington wrote, > > No, it is a story about an artist who wanted to paint a picture of a horse > and a picture of a dog and, since he knew that the horse and the dog were > great friends and liked to be together and also that he only had one canvas > upon which to paint, the artist painted a picture of a landscape with the > horse and the dog in the foreground, thereby, as the saying goes, painting > two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm > in that he achieved two results by one activity. In addition the picture > has various interesting details in the background, such as a windmill in a > plain (or is that a windmill in a plain text file). :-) > 1) It's gif file format rather than plain text.* 2) There isn't any windmill. Best regards, James Kass, * P.S. - But, it's a nice gif file. In fact, aside from the absence of the windmill, it exceeded my expectations. -JK.
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
> >>Yes, yes, I think this is an idea which could fly. >> >>--Ken >> > >Good. It is a solution which could be very useful for people writing >programs in Java, Pascal and C and so on which programs take in plain text >files and process them for such purposes as producing a desktop publishing >package. Uhh, I think Ken's message was entirely sarcasm or some higher form of rhetorical humor whose obscure name slips my mind right now. The suggestion to use "html" as an extension was the give away - I was laughing out loud from that point on - his point was that the technology to do what you want already exists it is called HTML and it is displayed by "browsers" and so forth. Barry Caplan www.i18n.com
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Kenneth Whistler wrote as follows about my idea. >> It occurs to me that it is possible to introduce a convention, either as a >> matter included in the Unicode specification, or as just a known about >> thing, that if one has a plain text Unicode file with a file name that has >> some particular extension (any ideas for something like .uof for Unicode >> object file) > >...or to pick an extension, more or less at random, say ".html" Well, that could produce confusion with a .html file used for Hyper Text Markup Language, HTML. I suggested .uof so that a .uof file would be known as being for this purpose. > >> that accompanies another plain text Unicode file which has a >> file name extension such as .txt, or indeed other choices except .uof (or >> whatever is chosen after discussion) then the convention could be that the >> .uof file has on lines of text, in order, the name of the text file then the >> names of the files which contains each object to which a U+FFFC character >> provides the anchor. >> >> For example, a file with a name such as story7.uof might have the following >> lines of text as its contents. >> >> story7.txt >> horse.gif >> dog.gif >> painting.jpg > >This is a shaggy dog story, right? No, it is a story about an artist who wanted to paint a picture of a horse and a picture of a dog and, since he knew that the horse and the dog were great friends and liked to be together and also that he only had one canvas upon which to paint, the artist painted a picture of a landscape with the horse and the dog in the foreground, thereby, as the saying goes, painting two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm in that he achieved two results by one activity. In addition the picture has various interesting details in the background, such as a windmill in a plain (or is that a windmill in a plain text file). 
:-)

>> The file story7.uof could thus be used with a file named story.txt so as to
>> indicate which objects were intended to be used for three uses of U+FFFC in
>> the file story7.txt, in the order in which they are to be used.
>
>Or we could go even further, and specify that in the story7.html file,
>the three uses of those objects could be introduced with a very specific
>syntax that would not only indicate the order that they occur in, but
>could indicate the *exact* location one could obtain the objects -- either on
>one's own machine or even anywhere around the world via the Internet! And we could
>even include a mechanism for specifying the exact size that the object should be
>displayed. For example, we could use something like:
>
><img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
> height="260" border="1">
>
>or
>
><img src="http://www.artofeurope.com/velasquez/vel2.jpg">

Now that is a good idea.

In a .uof file specifically for the purpose, a line beginning with a < character could be used to indicate a web based reference, or a local reference, for the object, using exactly the same format as is used in an HTML file. If the line does not start with a < character, then it is simply a file name in the same directory as the .uof file, as I suggested originally.

This would mean that where, say, a .uof file were broadcast upon a telesoftware service that the Java program (also broadcast) analysing the file names in the .uof file need not necessarily be able to decode lines starting with a < character so that the Java program does not need to have the software for that decoding in it, yet the same .uof file specification could be used, both in a telesoftware service and on the web, where a more comprehensive method of referencing objects were needed.

>> I can imagine that such a widely used practice might be helpful in bridging
>> the gap between being able to use a plain text file or maybe having to use
>> some expensive wordprocessing package.
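The two kinds of .uof line described in the reply above (a line beginning with a < character versus a bare file name) could be told apart with very little code. The following Python sketch is only an illustration of that rule; the function name `classify_uof_lines` and the labels "markup" and "file" are hypothetical, and nothing here is an agreed .uof specification.

```python
def classify_uof_lines(uof_text):
    """Split .uof contents into the text file name and labelled object lines.

    The first non-empty line names the accompanying text file. Every later
    line is labelled "markup" if it begins with '<' (an HTML-style reference)
    or "file" if it is a bare file name in the same directory.
    """
    lines = [line.strip() for line in uof_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .uof contents")
    entries = []
    for line in lines[1:]:
        kind = "markup" if line.startswith("<") else "file"
        entries.append((kind, line))
    return lines[0], entries


sample = 'story7.txt\nhorse.gif\n<img src="http://www.artofeurope.com/velasquez/vel2.jpg">\n'
text_file, entries = classify_uof_lines(sample)
print(text_file)
for kind, line in entries:
    print(kind, line)
```

A receiver that cannot decode "markup" lines still needs to count them, since each line corresponds to one U+FFFC in the text; skipping them silently would shift every later anchor onto the wrong object.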
> >And maybe someone will write cheaper software -- we could call it a "browser" -- >that could even be distributed for free, so that people could make use of >this convention for viewing objects correctly distributed with respect to >the text they are embedded in. Indeed, except not call it a browser as the name is already in widespread use for HTML browsers and might cause confusion. Analysing a .uof file would be a much less computational task than analysing the complete syntax of HTML files. >Yes, yes, I think this is an idea which could fly. > >--Ken > Good. It is a solution which could be very useful for people writing programs in Java, Pascal and C and so on which programs take in plain text files and process them for such purposes as producing a desktop publishing package. Hopefully the Unicode Technical Committee will be pleased to add a .uof format file specification into the set of Unicode documents so that the U+FFFC code can be used in an effective manner. The idea could be that if a .uof file is processed then the rules of .uof files apply in that situation, so that if a .uof file
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
On Wed, 14 Aug 2002, James Kass wrote: > One, the use of *.html clearly violates the standard file naming > convention of eight uppercase ASCII letters followed by a period > followed by a *three* letter uppercase ASCII file name extension. I was wondering if the capitalization, "ASCII", is for emphasis... ;) roozbeh
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Kenneth Whistler wrote in response to William Overington,

>
> ...or to pick an extension, more or less at random, say ".html"
>
> > The file story7.uof could thus be used with a file named story.txt so as to
> > indicate which objects were intended to be used for three uses of U+FFFC in
> > the file story7.txt, in the order in which they are to be used.
>
> Or we could go even further, and specify that in the story7.html file,
> the three uses of those objects could be introduced with a very specific
> syntax that would not only indicate the order that they occur in, but
> could indicate the *exact* location one could obtain the objects -- either on
> one's own machine or even anywhere around the world via the Internet! And we could
> even include a mechanism for specifying the exact size that the object should be
> displayed. For example, we could use something like:
>
> <img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
> height="260" border="1">
>
> And maybe someone will write cheaper software -- we could call it a "browser" --
> that could even be distributed for free, so that people could make use of
> this convention for viewing objects correctly distributed with respect to
> the text they are embedded in.
>
> Yes, yes, I think this is an idea which could fly.
>

Well, there might be some serious objections to such a proposal.

One, the use of *.html clearly violates the standard file naming convention of eight uppercase ASCII letters followed by a period followed by a *three* letter uppercase ASCII file name extension.

Secondly, the use of the greater-than and less-than ASCII characters to denote the mark-up sure appears to be a misuse of those characters. This may well cause too much confusion in parsing.

3rd, the cost of development of these hypothetical "browsers" would be quite high, and we couldn't really expect any such expensive software to be literally given away. There would have to be some catch to it all, wouldn't there?
Best regards, James Kass, (P.S. - The point of this response is that maybe we shouldn't hastily reject new concepts just because they seem to fly in the face of existing practices. - JK)
Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
William Overington teased us all unmercifully with: > It occurs to me that it is possible to introduce a convention, either as a > matter included in the Unicode specification, or as just a known about > thing, that if one has a plain text Unicode file with a file name that has > some particular extension (any ideas for something like .uof for Unicode > object file) ...or to pick an extension, more or less at random, say ".html" > that accompanies another plain text Unicode file which has a > file name extension such as .txt, or indeed other choices except .uof (or > whatever is chosen after discussion) then the convention could be that the > .uof file has on lines of text, in order, the name of the text file then the > names of the files which contains each object to which a U+FFFC character > provides the anchor. > > For example, a file with a name such as story7.uof might have the following > lines of text as its contents. > > story7.txt > horse.gif > dog.gif > painting.jpg This is a shaggy dog story, right? > > The file story7.uof could thus be used with a file named story.txt so as to > indicate which objects were intended to be used for three uses of U+FFFC in > the file story7.txt, in the order in which they are to be used. Or we could go even further, and specify that in the story7.html file, the three uses of those objects could be introduced with a very specific syntax that would not only indicate the order that they occur in, but could indicate the *exact* location one could obtain the objects -- either on one's own machine or even anywhere around the world via the Internet! And we could even include a mechanism for specifying the exact size that the object should be displayed. 
For example, we could use something like:

<img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380" height="260" border="1">

or

<img src="http://www.artofeurope.com/velasquez/vel2.jpg">

> I can imagine that such a widely used practice might be helpful in bridging
> the gap between being able to use a plain text file or maybe having to use
> some expensive wordprocessing package.

And maybe someone will write cheaper software -- we could call it a "browser" -- that could even be distributed for free, so that people could make use of this convention for viewing objects correctly distributed with respect to the text they are embedded in.

Yes, yes, I think this is an idea which could fly.

--Ken
An idea for keeping U+FFFC usable. (spins off from Re: Furigana)
Doug Ewell wrote as follows. >Kenneth Whistler wrote: [snipped] >> These animals are more like U+FFFC -- they are internal anchors >> that should not be exported, as there is no general expectation >> that once exported to plain text, a receiver will have sufficient >> context for making sense of them in the way the originator was >> dealing with them internally. >> [snipped] >This moves the entire issue out of the realm of poor support and into >the big, dark, scary cavern of pre-deprecation. > >Unicode 3.0 doesn't say exactly what Ken says. Unicode 3.0 (p. 326) >says the annotation characters should only be used under "prior >agreement between the sender and the receiver because the content may be >misinterpreted otherwise." Fine, no problem; those are the same rules >that apply to the PUA. Ken, though, seems to say they shouldn't be >exported at all, and furthermore they shouldn't even have been encoded >in the first place, except that the noncharacters (which explicitly >mustn't be interchanged) hadn't been invented yet. It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular extension (any ideas for something like .uof for Unicode object file) that accompanies another plain text Unicode file which has a file name extension such as .txt, or indeed other choices except .uof (or whatever is chosen after discussion) then the convention could be that the .uof file has on lines of text, in order, the name of the text file then the names of the files which contains each object to which a U+FFFC character provides the anchor. For example, a file with a name such as story7.uof might have the following lines of text as its contents. 
story7.txt horse.gif dog.gif painting.jpg The file story7.uof could thus be used with a file named story.txt so as to indicate which objects were intended to be used for three uses of U+FFFC in the file story7.txt, in the order in which they are to be used. I have used .gif and .jpg graphics files for my example, but the format could be left open so that a Java class file or anything else could be used as the object that is anchored within the document. There is no obligation that the first part of the file name of the .uof file and of the .txt file should be the same, yet that would typically be a useful thing to do. I can imagine that such a widely used practice might be helpful in bridging the gap between being able to use a plain text file or maybe having to use some expensive wordprocessing package. I am not saying that this suggestion fully solves all of the possible implications of rendering and so forth. I am simply suggesting that having such a convention would be a useful facility. Such a convention, because it uses a special file extension, would not intrude upon the right of anybody to devise their own convention. As this concerns the U+FFFC character and the Unicode Technical Committee is due to meet next week, I think it might be helpful if this idea is discussed before the meeting as a straightforward idea like this might mean that the possibility to exchange U+FFFC characters at all if people want to do so is not lost. >Everybody will welcome the new conventional, graphical-type characters >and scripts that are coming with Unicode 4.0. What are those please? William Overington 14 August 2002
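For readers who want to see the suggested convention in concrete terms, the following Python sketch reads the contents of a .uof file as described in this message and pairs each U+FFFC in the text, in order, with an object file name. The function names `parse_uof` and `pair_anchors` and the sample sentence are assumptions for illustration; no .uof specification has been published, so details such as how a mismatch in counts should be handled are guesses.

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(uof_text):
    """Return (text_file_name, [object_file_names]) from .uof contents.

    The first non-empty line names the text file; each following line names
    the object for the next U+FFFC, in order of occurrence.
    """
    lines = [line.strip() for line in uof_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .uof contents")
    return lines[0], lines[1:]


def pair_anchors(text, object_names):
    """Pair the offset of each U+FFFC in the text with its object name."""
    offsets = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(offsets) != len(object_names):
        # Assumption: a count mismatch is treated as an error.
        raise ValueError("number of U+FFFC anchors and object names differ")
    return list(zip(offsets, object_names))


uof = "story7.txt\nhorse.gif\ndog.gif\npainting.jpg\n"
story = "A horse \ufffc and a dog \ufffc in one painting \ufffc."
text_file, objects = parse_uof(uof)
for offset, name in pair_anchors(story, objects):
    print(offset, name)
```

Because the pairing depends purely on order, inserting or deleting a single U+FFFC in the text without updating the .uof file shifts every later object, which is one practical weakness of keeping the references in a separate file.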