Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-19 Thread Peter_Constable

On 08/16/2002 04:58:58 PM "William Overington" wrote:

>The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system
>(details at http://www.mhp.org ) which implements my telesoftware invention.
>A Java program which has been broadcast can read a Unicode plain text file
>and act upon the characters within it, and can read other file formats, such
>as .png files (Portable Network Graphics) and act upon the information in
>those files, so as to produce a display.
>
>So, a collection of files, namely a .uof file in the format that I suggested
>it, a Unicode plain text file with one or more U+FFFC characters in it and
>the appropriate graphics files in .png format as a package of free to the
>end user distance education learning material being broadcast from a direct
>broadcasting satellite or a terrestrial transmitter could be a very useful
>facility as the way to carry text with illustrations.

I'd suggest that it would be far more useful to use a marked-up file
format based on XML. It doesn't have to be verbose (besides which, the
bandwidth requirements of embedded graphics will be far greater than any
requirements for markup used to indicate their position within the text).
The reason I think this would be far more advantageous is that there has
been a massive interest throughout the IT industry in XML, meaning that
there are lots of software implementations that support it, and it is very
easy to build processes for publishing content. You could probably use
any commonly-used database product out there to generate XML content
suited for DVB-MHP; in fact, it would be easy to take some existing
XML-based publishing process and extend it to support an XML-based file
format specifically intended for DVB-MHP. In contrast, if you want to
invent a new file format, then you've got to create new software
implementations to go with it, and bolting that into any existing
publishing process will be far more costly.
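
As a rough illustration of the kind of marked-up format suggested above, here is a minimal sketch in Python. The element names (lesson, p, img) and the src attribute are invented for this example and are not taken from any DVB-MHP or Unicode specification. The point is only that a stock XML parser already yields both the text and the position of each graphic, with no new parsing software to write.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names are assumptions
# made for this sketch, not part of any published format.
SAMPLE = """<lesson>
  <p>The horse <img src="horse.png"/> and the dog <img src="dog.png"/> are friends.</p>
</lesson>"""


def flatten(xml_text):
    """Return (text, images), where images is a list of (offset, src) pairs.

    Each offset is the position in the flattened text at which the
    corresponding graphic was placed by the markup.
    """
    root = ET.fromstring(xml_text)
    parts = []
    images = []
    length = 0
    for paragraph in root.iter("p"):
        if paragraph.text:
            parts.append(paragraph.text)
            length += len(paragraph.text)
        for child in paragraph:
            if child.tag == "img":
                images.append((length, child.get("src")))
            if child.tail:
                parts.append(child.tail)
                length += len(child.tail)
    return "".join(parts), images


text, images = flatten(SAMPLE)
print(text)
print(images)
```

The same information that a separate listing file would carry by position alone is here carried by the markup itself, so the text and its illustrations cannot drift out of step.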



>Using HTML and a browser is just not the way to proceed in that situation.
>HTML and a browser is a very useful technique for the web and indeed is an
>option for the DVB-MHP system, yet the basic software system is Java based.

Markup does not have to imply HTML and a Web browser. I'm sure you'd find
a lot of Java implementations that made use of XML-based file formats, and
though I'm not a Java programmer, I'm certain that you can find good
support for parsing or generating XML streams in Java.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





















Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-19 Thread Peter_Constable

On 08/17/2002 09:29:00 AM "William Overington" wrote:

>Peter Constable wrote as follows.
>
>>The standard already specifies that FFFC should not be exported from an
>>application or interchanged.
>
>As far as I am aware that is not presently the case.
>
>If you still say that that is correct, could you please state the exact text
>of the standard relating to this matter and where in the standard that text
>can be found please?

OK, it doesn't say it explicitly; nevertheless, I believe I know what the
intent of the text is, and that it is not condoning interchange of FFFC.
The fact that the text isn't more explicit is something that could perhaps
be improved; but if you think about what the text on pp 326-7 *does* say,
I think this intent can be detected. It seems clear to me that it assumes
usage within the context of some higher-level protocol, such as would be
imposed by a software process. For instance, the text makes reference to
"the object's formatting information", but Unicode / plain text does not
provide representation for such information. Thus, there necessarily must
be some other protocol at work within which that information is
represented. FFFC, then, is something that is utilised by that
higher-level protocol. Hence, this section of the Standard is *not*
talking about FFFC being used in interchanged plain text. It is, rather,
assuming usage internal to some processing context or other higher-level
protocol.
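
One way to picture this reading, sketched in Python: an application uses U+FFFC internally, alongside its own table of objects, and drops or substitutes the anchors when it exports plain text. The function names and the placeholder convention are assumptions made for illustration only; the Standard does not prescribe any of this.

```python
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def count_anchors(text):
    """Return how many U+FFFC anchors the text contains."""
    return text.count(OBJECT_REPLACEMENT)


def export_plain_text(internal_text, placeholder=""):
    """Return a copy of the text with every U+FFFC anchor replaced.

    The default removes the anchors entirely; a visible placeholder such
    as "[image]" can be passed instead (a hypothetical convention).
    """
    return internal_text.replace(OBJECT_REPLACEMENT, placeholder)


internal = "A horse \uFFFC and a dog \uFFFC."
print(count_anchors(internal))
print(export_plain_text(internal, "[image]"))
```

The object data and its formatting information never appear in the exported string; they live only in whatever structure the application keeps next to the text.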



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>



















Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread William Overington

Tex Texin wrote as follows.

>William,
>
>So let me see if I understand this correctly.
>
>Let's take 2 perfectly good standards, Unicode and HTML,

Yes.

>and make some
>very minor tweaks to them,

No.

>such as changing the meaning of U+FFFC and a
>special format for filenames in the beginning of the file and a new
>extension, so we have something new.

I have suggested no changes whatsoever to HTML at all.

The only thing which I have suggested in relation to Unicode in this thread
is that, in relation to the fact that information about the object to which
any particular use of U+FFFC refers is kept outside the character data
stream, that it could be a good idea to define a file format .uof so that
details of the names of the files for which the U+FFFC codes are anchors
could be provided in a known format, if and only if end users chose to use a
.uof file for that purpose on that occasion and not otherwise.  This was in
the context of seeking to protect the use of U+FFFC as a character which
could be used in interchanging of documents following from the discussion of
U+FFFC and annotation characters in the thread from off of which I spun this
thread, which discussion, by Ken and Doug, is repeated in the first posting
of this present thread.

I thought it a good idea that the Unicode Technical Committee might like to
make such a .uof file format an official Unicode document so as to offer one
possible way to use U+FFFC codes.  That is now a matter for discussion.  If
the Unicode Consortium wishes to do that, then fine.  If the Unicode
Consortium chooses not to do that, then I can write it up myself and publish
it, which is not such a good solution, yet is adequate for my own needs and
might be useful for some other people if they choose to use the same format
for .uof files.

Hopefully I have now managed to raise the issue of protecting the fact that
the U+FFFC character can be used in document interchange and it will
hopefully not become deprecated to the status of a noncharacter.

There is a practical reason for this, which is, from my own perspective,
quite important.  This is as follows.

The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system
(details at http://www.mhp.org ) which implements my telesoftware invention.
A Java program which has been broadcast can read a Unicode plain text file
and act upon the characters within it, and can read other file formats, such
as .png files (Portable Network Graphics) and act upon the information in
those files, so as to produce a display.

So, a collection of files, namely a .uof file in the format that I suggested
it, a Unicode plain text file with one or more U+FFFC characters in it and
the appropriate graphics files in .png format as a package of free to the
end user distance education learning material being broadcast from a direct
broadcasting satellite or a terrestrial transmitter could be a very useful
facility as the way to carry text with illustrations.

Using HTML and a browser is just not the way to proceed in that situation.
HTML and a browser is a very useful technique for the web and indeed is an
option for the DVB-MHP system, yet the basic software system is Java based.
It is as if the television set is acting as a computer which has a slow read
only access disc drive in the sky from which it may gather information,
including software.  The system is interactive with no return information
link to the central broadcasting computer, by means of the telesoftware
invention.  Overlays and virtual running with programs bigger than the local
storage being able to be run using chaining techniques are possible.  Please
do not think of this as downloading as no uplink request is made!

>Now the big benefit of this completely new thing,

Well, it's only a way of sender and receiver being able to have information
in a file with the suffix .uof about what objects are being anchored by
U+FFFC codes in a Unicode plain text file which it accompanies.

>is that programs that
>do desktop publishing can use plain text files which are not quite plain
>text because they have some special formatting,

Well, the plain text files are only Unicode plain text which might contain
one or more U+FFFC characters and some of the other Unicode control
characters such as CARRIAGE RETURN.

>but now they can publish
>them in better manner than before.

Well, my thinking is that it would help to have a well known way to express
the meaning of the anchors encoded by U+FFFC in a file rather than having
only a vague specification that all other information about the object is
kept outside the data stream.  I am saying that, yes, all other information
about the object is kept outside the data stream and, if, and only if, end
users choose to use a .uof file in a standard format to convey that
information for some particular use of a U+FFFC code, then that format could
be considered for definition and publication by the Unicode Consortium.
That does not seem unreasonab

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread William Overington

James Kass wrote as follows.

>William Overington wrote,
>
>>
>> No, it is a story about an artist who wanted to paint a picture of a horse
>> and a picture of a dog and, since he knew that the horse and the dog were
>> great friends and liked to be together and also that he only had one canvas
>> upon which to paint, the artist painted a picture of a landscape with the
>> horse and the dog in the foreground, thereby, as the saying goes, painting
>> two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm
>> in that he achieved two results by one activity.  In addition the picture
>> has various interesting details in the background, such as a windmill in a
>> plain (or is that a windmill in a plain text file).  :-)
>>
>
>1)  It's gif file format rather than plain text.*
>2)  There isn't any windmill.

The picture of the birds has been in our family webspace since 1998 as an
illustration for the saying "Painting two birds on one canvas".  That
saying, originated by me, is a peaceful saying meaning to achieve two
results by one activity.  I made the picture from clip art as a learning
exercise.

The picture of the birds is referenced as a way of illustrating the saying
"Painting two birds on one canvas".  It is not the picture in the story
about which Ken asked.  I may well have a go at constructing such a picture,
perhaps using clip art.  The reference to a windmill is meant as a humorous
aside to Don Quixote tilting at windmills.

I am interested in creative writing, so when Ken asked about the story, I
just thought of something to put in my response.  Part of the training in,
and the fun of, creative writing is to be able to write something promptly
to a topic.

William Overington

16 August 2002







Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread Peter_Constable

On 08/14/2002 02:04:50 PM "William Overington" wrote:

>As this concerns the U+FFFC character and the Unicode Technical Committee is
>due to meet next week, I think it might be helpful if this idea is discussed
>before the meeting as a straightforward idea like this might mean that the
>possibility to exchange U+FFFC characters at all if people want to do so is
>not lost.

This does not solve any problems not already solved. This is not plain 
text; it is a form of interchange markup and a higher-level protocol. 
There are already higher-level markup protocols that accomplish this. The 
standard already specifies that FFFC should not be exported from an 
application or interchanged. There is no reason to change this.


>>Everybody will welcome the new conventional, graphical-type characters
>>and scripts that are coming with Unicode 4.0.
>
>What are those please?

See the "Proposed characters" section of the Unicode site.



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>





Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread Tex Texin

William,

So let me see if I understand this correctly.

Let's take 2 perfectly good standards, Unicode and HTML, and make some
very minor tweaks to them, such as changing the meaning of U+FFFC and a
special format for filenames in the beginning of the file and a new
extension, so we have something new.

Now the big benefit of this completely new thing, is that programs that
do desktop publishing can use plain text files which are not quite plain
text because they have some special formatting, but now they can publish
them in better manner than before. For example, plain text with
pictures. This is great. (It is true that it is less capable than if we
had just used enough html to do the same thing, but .uof is more like
plain text than html is.) Programmers will be happy because now they can
support plain text with just a few tweaks. Oh I almost forgot, they also
have to support Unicode, but slightly tweaked. And they can also support
HTML, with some minor tweaks for .uof. Of course programmers don't mind
supporting lots of variations of the same thing. Customer support
personnel also don't mind.
Oh, the plain text programmers will now need to support pictures and
other aspects of full publishing, but at least they won't have a complex
file format to work with. I guess it doesn't matter that a more complex
format is also more expressive and therefore can leverage all of the
publishing features. It probably doesn't matter that a desktop
publishing product probably already supports more complex formats, and
probably also supports html, it will be beneficial to add this slight
difference from plain text.

I like this very much. It is very much like when the magician slides the
knot in the string and makes it disappear.

I imagine that over time we will have some more wonderful inventions and
add further tweaks and further improve the publishing of plain text.

There are a few other things I would like to improve in Unicode, so I
hope it will be ok to make some other suggestions. We can change the
extension to know which tweaks we are talking about. .uo1, .uo2. Just a
few small changes to characters and plain text format variations.
Stability of the meaning of the file isn't important.

However, I think my first suggestion will be to make the benefits of
.uof available to XML. We can call this .uo1.

I am a little disconcerted that html already can do everything that .uof
does plus more, and is also supported by all of the publishers that are
likely to support .uof. Also, as there are more than a million characters
in Unicode, most are unused so far, so changing the meaning of just FFFC
in this one context doesn't seem like a big win, considering also every
line of code that might work with FFFC now needs to consider the context
to determine its semantics.
But every invention deserves to be implemented, we need not look at
whether the invention satisfies some demand of its customers.

I like the 2 birds picture and I assume it was a metaphor for the idea-
one bird was html the other unicode. I was a little disappointed that
you used html instead of .uof format though. 

Maybe it's the lateness of the hour here. I hope the idea looks as good
in the morning.

Oh I almost forgot. I was having difficulty discerning when you and Ken
might be joking. The mails read very serious. I would like to suggest we
make a new format .uo2. We can indicate line numbers and emotions with
plain text characters that look like facial expressions. It would help
me know when you both were serious and when you might be joking.
Sometimes it is hard to tell. I am going to create a list of facial
expressions and assign them in the PUA so we can all have a standard to
follow. See my next mail with a list of facial expressions and
assignments.
tex



William Overington wrote:
> 
> Kenneth Whistler wrote as follows about my idea.
> 
> >> It occurs to me that it is possible to introduce a convention, either as a
> >> matter included in the Unicode specification, or as just a known about
> >> thing, that if one has a plain text Unicode file with a file name that has
> >> some particular extension (any ideas for something like .uof for Unicode
> >> object file)
> >
> >...or to pick an extension, more or less at random, say ".html"
> 
> Well, that could produce confusion with a .html file used for Hyper Text
> Markup Language, HTML.
> 
> I suggested .uof so that a .uof file would be known as being for this
> purpose.
> 
> >
> >> that accompanies another plain text Unicode file which has a
> >> file name extension such as .txt, or indeed other choices except .uof (or
> >> whatever is chosen after discussion) then the convention could be that the
> >> .uof file has on lines of text, in order, the name of the text file then the
> >> names of the files which contains each object to which a U+FFFC character
> >> provides the anchor.
> >>
> >> For example, a file with a name such as story7.uof might have the following
> >> lines of text as its

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread James Kass


William Overington wrote,

> 
> No, it is a story about an artist who wanted to paint a picture of a horse
> and a picture of a dog and, since he knew that the horse and the dog were
> great friends and liked to be together and also that he only had one canvas
> upon which to paint, the artist painted a picture of a landscape with the
> horse and the dog in the foreground, thereby, as the saying goes, painting
> two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm
> in that he achieved two results by one activity.  In addition the picture
> has various interesting details in the background, such as a windmill in a
> plain (or is that a windmill in a plain text file).  :-)
> 

1)  It's gif file format rather than plain text.*
2)  There isn't any windmill.

Best regards,

James Kass,

* P.S. - But, it's a nice gif file.  In fact, aside from the absence of
the windmill, it exceeded my expectations.  -JK.








Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-16 Thread Barry Caplan


>
>>Yes, yes, I think this is an idea which could fly.
>>
>>--Ken
>>
>
>Good.  It is a solution which could be very useful for people writing
>programs in Java, Pascal and C and so on which programs take in plain text
>files and process them for such purposes as producing a desktop publishing
>package.


Uhh, I think Ken's message was entirely sarcasm or some higher form of rhetorical 
humor whose obscure name slips my mind right now.

The suggestion to use "html" as an extension was the giveaway - I was laughing out
loud from that point on - his point was that the technology to do what you want
already exists: it is called HTML, and it is displayed by "browsers" and so forth.

Barry Caplan
www.i18n.com





Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-15 Thread William Overington

Kenneth Whistler wrote as follows about my idea.

>> It occurs to me that it is possible to introduce a convention, either as a
>> matter included in the Unicode specification, or as just a known about
>> thing, that if one has a plain text Unicode file with a file name that has
>> some particular extension (any ideas for something like .uof for Unicode
>> object file)
>
>...or to pick an extension, more or less at random, say ".html"

Well, that could produce confusion with a .html file used for Hyper Text
Markup Language, HTML.

I suggested .uof so that a .uof file would be known as being for this
purpose.

>
>> that accompanies another plain text Unicode file which has a
>> file name extension such as .txt, or indeed other choices except .uof (or
>> whatever is chosen after discussion) then the convention could be that the
>> .uof file has on lines of text, in order, the name of the text file then the
>> names of the files which contains each object to which a U+FFFC character
>> provides the anchor.
>>
>> For example, a file with a name such as story7.uof might have the following
>> lines of text as its contents.
>>
>> story7.txt
>> horse.gif
>> dog.gif
>> painting.jpg
>
>This is a shaggy dog story, right?

No, it is a story about an artist who wanted to paint a picture of a horse
and a picture of a dog and, since he knew that the horse and the dog were
great friends and liked to be together and also that he only had one canvas
upon which to paint, the artist painted a picture of a landscape with the
horse and the dog in the foreground, thereby, as the saying goes, painting
two birds on one canvas, http://www.users.globalnet.co.uk/~ngo/bird0001.htm
in that he achieved two results by one activity.  In addition the picture
has various interesting details in the background, such as a windmill in a
plain (or is that a windmill in a plain text file).  :-)

>> The file story7.uof could thus be used with a file named story7.txt so as to
>> indicate which objects were intended to be used for three uses of U+FFFC in
>> the file story7.txt, in the order in which they are to be used.
>
>Or we could go even further, and specify that in the story7.html file,
>the three uses of those objects could be introduced with a very specific
>syntax that would not only indicate the order that they occur in, but
>could indicate the *exact* location one could obtain the objects -- either on
>one's own machine or even anywhere around the world via the Internet! And we could
>even include a mechanism for specifying the exact size that the object should be
>displayed. For example, we could use something like:
>
><img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
> height="260" border="1">
>
>or
>
><img src="http://www.artofeurope.com/velasquez/vel2.jpg">

Now that is a good idea.  In a .uof file specifically for the purpose, a
line beginning with a < character could be used to indicate a web based
reference, or a local reference, for the object, using exactly the same
format as is used in an HTML file.

If the line does not start with a < character, then it is simply a file name
in the same directory as the .uof file, as I suggested originally.  This
would mean that where, say, a .uof file were broadcast upon a telesoftware
service that the Java program (also broadcast) analysing the file names in
the .uof file need not necessarily be able to decode lines starting with a <
character so that the Java program does not need to have the software for
that decoding in it, yet the same .uof file specification could be used,
both in a telesoftware service and on the web, where a more comprehensive
method of referencing objects were needed.
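
To make the proposed convention concrete, here is a minimal sketch in Python of how a receiving program might read such a listing. It follows the rules as described in this thread: the first line names the text file, each later line names one object in order, and a line starting with a < character is treated as a markup-style reference rather than a local file name. The function names, the skipping of blank lines, and the error raised on a count mismatch are all assumptions of this sketch; no .uof specification exists to confirm them.

```python
OBJECT_REPLACEMENT = "\uFFFC"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(uof_text):
    """Split a .uof listing into (text_file_name, object_references)."""
    # Assumption: blank lines carry no meaning and are ignored.
    lines = [line.strip() for line in uof_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("a .uof listing needs at least the text file name")
    return lines[0], lines[1:]


def is_markup_reference(reference):
    """True if the line uses the HTML-style form rather than a bare file name."""
    return reference.startswith("<")


def bind_anchors(text, references):
    """Pair each U+FFFC in the text, in order, with its object reference."""
    offsets = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(offsets) != len(references):
        # Assumption: a mismatch is treated as an error, not silently ignored.
        raise ValueError(
            f"{len(offsets)} anchors but {len(references)} references"
        )
    return list(zip(offsets, references))


listing = "story7.txt\nhorse.gif\ndog.gif\npainting.jpg\n"
text_name, refs = parse_uof(listing)
story = "Horse: \uFFFC Dog: \uFFFC Painting: \uFFFC"
for offset, ref in bind_anchors(story, refs):
    kind = "markup" if is_markup_reference(ref) else "local file"
    print(offset, kind, ref)
```

A program that cannot decode the < form could still call is_markup_reference and skip those entries, which is the behaviour described above for a broadcast Java program.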

>> I can imagine that such a widely used practice might be helpful in bridging
>> the gap between being able to use a plain text file or maybe having to use
>> some expensive wordprocessing package.
>
>And maybe someone will write cheaper software -- we could call it a "browser" --
>that could even be distributed for free, so that people could make use of
>this convention for viewing objects correctly distributed with respect to
>the text they are embedded in.

Indeed, except not call it a browser, as the name is already in widespread
use for HTML browsers and might cause confusion.  Analysing a .uof file
would be a much less computationally demanding task than analysing the
complete syntax of HTML files.

>Yes, yes, I think this is an idea which could fly.
>
>--Ken
>

Good.  It is a solution which could be very useful for people writing
programs in Java, Pascal and C and so on which programs take in plain text
files and process them for such purposes as producing a desktop publishing
package.

Hopefully the Unicode Technical Committee will be pleased to add a .uof
format file specification into the set of Unicode documents so that the
U+FFFC code can be used in an effective manner.  The idea could be that if a
.uof file is processed then the rules of .uof files apply in that situation,
so that if a .uof file

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread Roozbeh Pournader

On Wed, 14 Aug 2002, James Kass wrote:

> One, the use of *.html clearly violates the standard file naming
> convention of eight uppercase ASCII letters followed by a period
> followed by a *three* letter uppercase ASCII file name extension.

I was wondering if the capitalization, "ASCII", is for emphasis... ;)

roozbeh





Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread James Kass


Kenneth Whistler wrote in response to William Overington,

> 
> ...or to pick an extension, more or less at random, say ".html"
> 

> > The file story7.uof could thus be used with a file named story7.txt so as to
> > indicate which objects were intended to be used for three uses of U+FFFC in
> > the file story7.txt, in the order in which they are to be used.
> 
> Or we could go even further, and specify that in the story7.html file,
> the three uses of those objects could be introduced with a very specific
> syntax that would not only indicate the order that they occur in, but
> could indicate the *exact* location one could obtain the objects -- either on 
> one's own machine or even anywhere around the world via the Internet! And we could 
> even include a mechanism for specifying the exact size that the object should be
> displayed. For example, we could use something like:
> 
> <img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
>  height="260" border="1">
> 
> And maybe someone will write cheaper software -- we could call it a "browser" --
> that could even be distributed for free, so that people could make use of
> this convention for viewing objects correctly distributed with respect to
> the text they are embedded in.
> 
> Yes, yes, I think this is an idea which could fly.
> 

Well, there might be some serious objections to such a proposal.

One, the use of *.html clearly violates the standard file naming
convention of eight uppercase ASCII letters followed by a period
followed by a *three* letter uppercase ASCII file name extension.

Secondly, the use of the greater-than and less-than ASCII characters
to denote the mark-up sure appears to be a misuse of those 
characters.  This may well cause too much confusion in parsing.

3rd, the cost of development of these hypothetical "browsers" would
be quite high, and we couldn't really expect any such expensive
software to be literally given away.  There would have to be some
catch to it all, wouldn't there?

Best regards,

James Kass,
(P.S. - The point of this response is that maybe we shouldn't 
hastily reject new concepts just because they seem to fly
in the face of existing practices. - JK)






Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread Kenneth Whistler

William Overington teased us all unmercifully with:

> It occurs to me that it is possible to introduce a convention, either as a
> matter included in the Unicode specification, or as just a known about
> thing, that if one has a plain text Unicode file with a file name that has
> some particular extension (any ideas for something like .uof for Unicode
> object file) 

...or to pick an extension, more or less at random, say ".html"

> that accompanies another plain text Unicode file which has a
> file name extension such as .txt, or indeed other choices except .uof (or
> whatever is chosen after discussion) then the convention could be that the
> .uof file has on lines of text, in order, the name of the text file then the
> names of the files which contains each object to which a U+FFFC character
> provides the anchor.
> 
> For example, a file with a name such as story7.uof might have the following
> lines of text as its contents.
> 
> story7.txt
> horse.gif
> dog.gif
> painting.jpg

This is a shaggy dog story, right?

> 
> The file story7.uof could thus be used with a file named story7.txt so as to
> indicate which objects were intended to be used for three uses of U+FFFC in
> the file story7.txt, in the order in which they are to be used.

Or we could go even further, and specify that in the story7.html file,
the three uses of those objects could be introduced with a very specific
syntax that would not only indicate the order that they occur in, but
could indicate the *exact* location one could obtain the objects -- either on 
one's own machine or even anywhere around the world via the Internet! And we could 
even include a mechanism for specifying the exact size that the object should be
displayed. For example, we could use something like:

<img src="http://www.coteindustries.com/dogs/images/dogs4.jpg" width="380"
 height="260" border="1">

or

<img src="http://www.artofeurope.com/velasquez/vel2.jpg">
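
The syntax being alluded to here is the HTML img element, which already carries the object's location and display size in the text itself. As a rough illustration only, the following Python sketch collects those attributes from a piece of HTML using the standard library's html.parser module. The names ImageCollector and list_images are made up for this example and are not part of any proposal in the thread.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the attributes of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []  # one dict of attributes per <img> tag

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            self.images.append(dict(attrs))


def list_images(html_text):
    """Return a list of attribute dicts, one per embedded image."""
    collector = ImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images
```

Each returned dict holds whatever attributes the tag supplied (for example src, width, height), so the position, source and size of each object travel with the text rather than in a companion file.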

> I can imagine that such a widely used practice might be helpful in bridging
> the gap between being able to use a plain text file or maybe having to use
> some expensive wordprocessing package.

And maybe someone will write cheaper software -- we could call it a "browser" --
that could even be distributed for free, so that people could make use of
this convention for viewing objects correctly distributed with respect to
the text they are embedded in.

Yes, yes, I think this is an idea which could fly.

--Ken






An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread William Overington

Doug Ewell wrote as follows.

>Kenneth Whistler wrote:

[snipped]

>> These animals are more like U+FFFC -- they are internal anchors
>> that should not be exported, as there is no general expectation
>> that once exported to plain text, a receiver will have sufficient
>> context for making sense of them in the way the originator was
>> dealing with them internally.
>>

[snipped]

>This moves the entire issue out of the realm of poor support and into
>the big, dark, scary cavern of pre-deprecation.
>
>Unicode 3.0 doesn't say exactly what Ken says.  Unicode 3.0 (p. 326)
>says the annotation characters should only be used under "prior
>agreement between the sender and the receiver because the content may be
>misinterpreted otherwise."  Fine, no problem; those are the same rules
>that apply to the PUA.  Ken, though, seems to say they shouldn't be
>exported at all, and furthermore they shouldn't even have been encoded
>in the first place, except that the noncharacters (which explicitly
>mustn't be interchanged) hadn't been invented yet.

It occurs to me that it is possible to introduce a convention, either as a
matter included in the Unicode specification, or as just a known about
thing, that if one has a plain text Unicode file with a file name that has
some particular extension (any ideas for something like .uof for Unicode
object file) that accompanies another plain text Unicode file which has a
file name extension such as .txt, or indeed other choices except .uof (or
whatever is chosen after discussion) then the convention could be that the
.uof file has on lines of text, in order, the name of the text file then the
names of the files which contain each object to which a U+FFFC character
provides the anchor.

For example, a file with a name such as story7.uof might have the following
lines of text as its contents.

story7.txt
horse.gif
dog.gif
painting.jpg

The file story7.uof could thus be used with a file named story7.txt so as to
indicate which objects were intended to be used for three uses of U+FFFC in
the file story7.txt, in the order in which they are to be used.
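
A minimal sketch of how a reader of such files might apply the convention, written in Python. It assumes only what is described above: the first line of the .uof file names the text file, and each following line names the object for the next U+FFFC in that text. The function names parse_uof and pair_anchors, and the decision to treat a count mismatch as an error, are assumptions made for this illustration.

```python
OBJECT_ANCHOR = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def parse_uof(uof_text):
    """Split .uof contents into (text_file_name, [object_file_names])."""
    # Blank lines are skipped; that leniency is an assumption of this sketch.
    names = [line.strip() for line in uof_text.splitlines() if line.strip()]
    if not names:
        raise ValueError("empty .uof file")
    return names[0], names[1:]


def pair_anchors(text, object_names):
    """Pair each U+FFFC in text, in order, with an object file name.

    Returns a list of (character_offset, object_name) tuples.
    """
    offsets = [i for i, ch in enumerate(text) if ch == OBJECT_ANCHOR]
    if len(offsets) != len(object_names):
        # The convention as described does not say what to do here;
        # raising an error is one possible choice.
        raise ValueError(
            f"{len(offsets)} anchors but {len(object_names)} object names"
        )
    return list(zip(offsets, object_names))
```

With the story7.uof contents shown above, parse_uof would give story7.txt as the text file and the three graphics files as the objects, and pair_anchors would then attach them to the three U+FFFC characters in the order in which they occur.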

I have used .gif and .jpg graphics files for my example, but the format
could be left open so that a Java class file or anything else could be used
as the object that is anchored within the document.

There is no obligation that the first part of the file name of the .uof file
and of the .txt file should be the same, yet that would typically be a
useful thing to do.

I can imagine that such a widely used practice might be helpful in bridging
the gap between being able to use a plain text file or maybe having to use
some expensive wordprocessing package.

I am not saying that this suggestion fully solves all of the possible
implications of rendering and so forth.  I am simply suggesting that having
such a convention would be a useful facility.  Such a convention, because it
uses a special file extension, would not intrude upon the right of anybody
to devise their own convention.

As this concerns the U+FFFC character and the Unicode Technical Committee is
due to meet next week, I think it might be helpful if this idea is discussed
before the meeting, as a straightforward idea like this might mean that the
possibility of exchanging U+FFFC characters at all, if people want to do so,
is not lost.

>Everybody will welcome the new conventional, graphical-type characters
>and scripts that are coming with Unicode 4.0.

What are those please?

William Overington

14 August 2002