RE: Keys. (derives from Re: Sequences of combining characters.)
Doug Ewell wrote: Marco Cimarosti marco dot cimarosti at essetre dot it wrote: He said that he didn't understand how this detail could help us but, anyway, he obtained the child's name and address from the parent: Daniel Zubeispiel Hauptkirchestrasse, 26 Zürich, Switzerland Is this a pseudonym? I am thinking of the German word Beispiel meaning example. Of course. AFAIK, Zu Beispiel means e.g., for example. Hauptkirchestrasse is a made-up road name meaning cathedral street. Zurich is the only real piece of the address. _ Marco
My German blunders (was Keys. (derives from Re: Sequences of combining characters.))
I (Marco Cimarosti) wrote: Of course. AFAIK, Zu Beispiel means e.g., for example. Hauptkirchestrasse is a made-up road name meaning cathedral street. Zurich is the only real piece of the address. But a native German speaker patiently explained, in a private message: | If it's an example, it's one not constructed by a native speaker ;-) | For one, it's zum Beispiel and the name of the street (road | would have been ...landstrasse) is either missing an 'n', or | possibly has an extra 'e'. | As it stands, it's decidedly odd looking. | | Although, it's supposed to be Swiss. That could explain a lot. Thanks for the corrections. I should not have retained the city where it actually happened: if I just settled the scene in Lugano... _ Marco
RE: Keys. (derives from Re: Sequences of combining characters.)
At 09:56 +0200 2002-09-30, Marco Cimarosti wrote: Of course. AFAIK, Zu Beispiel means e.g., for example. Recte Zum Beispiel. -- Michael Everson * * Everson Typography * * http://www.evertype.com 48B Gleann na Carraige; Cill Fhionntain; Baile Átha Cliath 13; Éire Telephone +353 86 807 9169 * * Fax +353 1 832 2189 (by arrangement)
Re: Keys. (derives from Re: Sequences of combining characters.)
[Still off-topic, but I'm hopeful that progress can be made, so am continuing a little farther] On 09/27/2002 10:26:36 AM William Overington wrote: XML is the way to go. Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in documents which mix ordinary text and markup may or may not, depending upon the application, be a problem. It really isn't a problem. XML provides other means to represent that character when it is needed as part of the content rather than as part of the markup. It is the job of an XML parser to sort that out, and there are various XML parsers that all handle this without a hitch and that are freely available. Someone made reference to MathML, which is a markup language built on XML (XML is a spec for building markup languages), and clearly mathematicians need to be able to represent this character within content, and the special use of U+003C for markup in XML was not seen in any way to be an obstacle. Your proposed markup convention would also need a parser to identify the pieces in a stream of data. If someone wants to use U+2604 in content, you would probably need some indirect way to represent it in a data stream. (E.g. One can imagine a hypothetical message My favourite Unicode character is P1 into which someone might want to insert the COMET character.) So, I expect you'll have to deal with the same problem anyway. But this parser doesn't yet exist; some software developer will have to create it. On the other hand, XML parsers exist today. If you had been pursuing an XML-based approach, you might already be testing live prototypes rather than discussing a hypothetical system. Also, in an earlier message, you mentioned that you wanted to be able to use this messaging system on the Web. And, of course, you want to be able to represent U+003C directly in content. Did you realise that those two are contradictory? HTML has the same heredity as XML (both are implementations of SGML).
It also uses U+003C for markup, and provides the same alternative means to represent that character as part of content. So, if one of the contexts within which you want your system to work is the Web, then you're going to have to deal with indirect representation of U+003C anyway. Since it's already a magic character, why not let it be the magic character for your proposed protocol? XML really *is* the way to go. Please believe us. You don't need to believe me; believe Tex, Ken, Marco and the others who have offered you this recommendation. They really are among the most well-informed contributors to this list. BTW, my mail client (Lotus Notes, for better or worse) reports what time in *my* time zone an author wrote the given message. Such reporting of time in international communications is problematic; time zones need to be stated explicitly. We discovered this quite a while ago after scheduling a tele-conference; the half of the dept. in the UK assumed the time they saw was Dallas time (or maybe they suggested the time and we were reading it), but Notes had silently done a time zone conversion. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
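[Editorial illustration, not part of the original message.] The point that an XML parser "sorts that out" can be sketched in a few lines of Python using only the standard library. The element name message and the sample sentence are assumptions made up for this sketch; nothing here is taken from any system discussed in the thread.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Content that uses U+003C with its ordinary LESS-THAN SIGN meaning.
content = "if a < b then stop"

# escape() replaces &, < and > with entity references, so the content
# can sit inside markup. The element name "message" is hypothetical.
xml_text = "<message>" + escape(content) + "</message>"

# A stock parser turns the entity reference back into the character.
parsed = ET.fromstring(xml_text)

print(xml_text)
print(parsed.text)
```

The writer escapes once on the way out and the reader gets the original character back on the way in, so the person typing the message never has to see the entity reference at all.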
Re: Keys. (derives from Re: Sequences of combining characters.)
At 12:24 PM 9/27/2002 +0100, William Overington wrote: You tell me which one is more likely to result in productive work and adoption by others. Likelihood of success and what actually happens are not the same thing. I do not know which is more likely as I do not know of what has happened already. Well, as you mentioned, the nature of scholarly research demands that you are familiar with the basis for the arguments being presented. If your goal is merely to build such a system, I am sure everyone is willing to concede that it is technically feasible, even bordering on trivial. It is not interesting in a scholarly sense at all, so it is only your ego that is going to benefit. If your goal is to enjoy some commercial success, well, that may be possible too. The utility of the application will be strongly limited by its lack of interoperability with other existing systems, many of which are used by the likely community of users for your system. That community has these choices:
- Not use your system
- Use your system and never interchange data
- Use your system and roll their own tools to do data interchange
- Use your system and demand data interchange tools from their other vendors
- Use your tool and demand data interchange tools from you
- Create a closed source functional near-equivalent of your tool with data interchange facilities
- Create an open source functional near-equivalent of your tool with data interchange facilities
Ponder very carefully the implications of each of these upon: the utility (usefulness and value) of your software, the effects on your limited resources of needing to support an extra layer of data interchange, and the effects on other vendors' limited resources of being asked to support data interchange with a proprietary format in limited use. If you want to share the program with a handful of folks, your proposal might fly.
If you want real people in real places on earth to contribute text, then I predict issues will arise and you will lose all control because the last item in the list above will occur. Just to give you my sense of how much work it would take to do that, I think about an intense week or so for any experienced open source programmer for each type of UI is about right (GNOME, Web, etc), based on your description of the functionality and the availability of major modules such as XML, message catalog, UI, database and Web support. Some people may have deleted the email, some may have read it and disregarded it, yet it is possible that some people might have tried to produce a comet circumflex button on the screen using an all-Unicode font and might be considering the possibilities of how the system could be applied or might even be writing an experimental software program which can take comet circumflex sequences and process them through a database. Speaking of reading the sources, you might want to read Richard Dawkins' The Selfish Gene and other related works on memes to get a sense of why any alternative to XML for data interchange is likely to fail in the marketplaces of business and of ideas even if technically feasible. The topic of keys generally which I have introduced Why are you claiming credit for a system which has been a core part of programming APIs since probably the 1960s? You can search for the documentation online for the printf function and its relatives for *nix, or resource APIs for Windows and Mac for a good start. Any translator who has done localization is familiar with the use of parameterized sentences that you describe, and why they are a problem when it comes to translation. I am sure I am not the only localization consultant on the list that preaches a very limited use of them (what I call constructed sentences). is potentially a far-reaching development in the application of markup in Unicode based systems. It's been done to death in the past.
See Trados, Uniscape, GlobalSight, and countless in-house systems. The only revolutionary aspect is that you want to throw away all the experience and consensus that has been developing in the sw development, i18n, l10n, and translation communities about proper workflow and data interchange. If you came to me with such a tool in 1990, Unicode notwithstanding, it may have been useful. But now, standalone tools are much less useful for a lot of reasons I won't go into here. My own comet circumflex system may be highly useful in business communications and distance education. May be, but most likely not. That you think so indicates you are after a commercial market, and I refer you to the discussion above of likely outcomes. I am happy to respond to questions and to consider documents which people suggest. I have suggested a lot in a message yesterday and a lot more here. I hope your future messages will take the material I have suggested into account. XML exists and it uses
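[Editorial illustration, not part of the original message.] The parameterized sentences and message catalogs referred to above can be sketched roughly as follows in Python. Everything in this sketch is hypothetical: the catalog contents, the message number 12001 as a key, the placeholder name place and the function render are invented for illustration and do not come from any of the products named in the message.

```python
# A toy message catalog: one numbered message per language, each with a
# named placeholder. All entries are made-up examples.
CATALOG = {
    "en": {12001: "The meeting will be held in {place}."},
    "de": {12001: "Die Besprechung findet in {place} statt."},
}


def render(code, lang, **params):
    """Look up a numbered message for a language and fill in its parameters."""
    template = CATALOG[lang][code]
    return template.format(**params)


print(render(12001, "en", place="London"))
print(render(12001, "de", place="London"))
```

Named placeholders let each translation put the parameter wherever its own word order needs it. What the sketch does not solve is the problem the message warns about: the inserted string may itself need translating or inflecting to agree with the sentence around it, which is why constructed sentences are usually kept to a minimum.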
Re: Keys. (derives from Re: Sequences of combining characters.)
At 12:23 PM 9/27/2002 +0100, William Overington wrote: Are you perhaps trying to make a deduction by the fallacy of the undistributed middle, along the following lines. William's need is a markup system. XML is a markup system. William's need is XML. I think what is being suggested is not nearly so obvious as that. It is more along the lines of:
- William's need is a product of which data interchange is a key feature
- Said product needs an architecture and a business model
- Data interchange happens both externally and internally within the program
- The business model chosen may indeed require a non-xml system
- XML data interchange is better supported than any proprietary system.
- If non-xml is chosen for the outside system, it should be converted to xml as early as possible for inbound, and as late as possible for outbound interchange in order to capitalize on xml tools
Of course, if the system is closed on the outside, and useful, it will be quickly duplicated by someone using open interchange formats anyway, but that advice on how to handle that situation only comes at a price :) I am simply saying that XML, as I understand it, does not suit my specific need. It may be that you don't understand your need well enough to understand why XML for outside interchange is an extremely strong contender. text cannot be used directly. For me, that is a major limitation of XML. Why is it a major limitation of XML? Have not already over a million applications and web sites been implemented using XML technology? Is there a record of anyone ever griping about this limitation at all? legacy issue of which I do not want to have the problem with my research in language translation and distance education. How so? A single line of code will automatically escape any characters as needed. Maybe one day Unicode will encode special XML opening and closing angle brackets so that XML can operate without that problem.
This is not up to Unicode to decide, it is XML's choice to specify the way its tags are constructed. XML's family tree starting with SGML (or earlier for all I know) and going through HTML pretty much constrains it. Trillions of people know < as the tag delimiter. Earlier markup languages used a . PERIOD in the first character in a line as a delimiter - I think RTF is of this heritage. When was the last time someone mentioned they were creating or editing a RTF file compared to *ML? However, as XML uses the U+003C character in that manner at the moment, for me it is a problem and it has led me to use the key method using a comet circumflex key. Instead of typing a trivial escape character in the rare case of a < in the content you want to force people to type weird Unicode characters in every tag? Also, I do not need to have all those < and > characters and = characters and / characters within messages. Have you not thought the problem all the way through? Why on earth would you want your message creators typing raw XML anyway? You are going to need some other UI, right? And that message editor can generate the XML, complete with escapes, using existing code you can have for free. This frees your time from having to create your own wheel and maintain it. Well, U+2604 U+0302 U+20E3 is not ridiculous. It is entirely permissible within the Unicode specification. He is not saying it is ridiculous because it is not within the specification. He is saying it is ridiculous because the development community as a whole (a very large whole), both closed source and open source advocates, is rallied around XML as a basis for data interchange. If you ever wanted to move your comet files to another system, or create them from data in an existing system (such as Trados or another translation memory), you will need a 2 way XML-Comet converter anyway. Why bother? If you think it ridiculous then maybe that is good evidence of its originality as a piece of creativity.
I am sure it will create a pretty glyph. But software creation is about way more than pretty glyphs. A comet circumflex key could be viewed as a piece of original art. I specifically designed it so as to be a design which involves an inventive leap so as to produce something new and unexpected, which someone skilled in the art would not produce as the application of skill in the existing art without invention, yet which would display properly using an all-Unicode font. This sounds a lot like you are planning to trademark or patent a character. I would personally travel to the ends of the earth to testify that all possible combining sequences are described as prior art in the description of how to create them in the Unicode specification and thus can never be proprietary. Now if you want to have a graphic artist draw a logo of a comet with a box around it, that is your prerogative. But the idea that combining characters in any fashion is somehow proprietary is not ridiculous, it is just a waste of time. In case you think
Re: Keys. (derives from Re: Sequences of combining characters.)
Marco Cimarosti marco dot cimarosti at essetre dot it wrote: He said that he didn't understand how this detail could help us but, anyway, he obtained the child's name and address from the parent: Daniel Zubeispiel Hauptkirchestrasse, 26 Zürich, Switzerland Is this a pseudonym? I am thinking of the German word Beispiel meaning example. A very funny story, whether or not any names were changed to protect the innocent. -Doug Ewell Fullerton, California
Re: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))
Shawn Steele wrote to the [EMAIL PROTECTED] list, not directly to me, yet began by writing. Mr. Overington, There is then a long document of very helpful information, for which I am grateful. Mr Steele then concludes with the following. I hope that this example improves your understanding of XML and how it may be applied to your inventions. As others have mentioned, this topic is digressing from the purpose of this message board and would be best discussed off line or in a different forum. Well, a letter addressed to me could have been sent by private email. - Shawn Shawn Steele Software Developer Engineer Microsoft Unfortunately, this is then followed by the following. My comments in no way endorse the original Well, that is fine, the letter has been posted to the Unicode list from a Microsoft address, so a clarification makes the situation clear just in case anyone had thought that in some way it might. and are not intended to confer legitimacy, Ah! That is not fine. The original is entirely legitimate and there is no need for legitimacy to be conferred at all, also the conferring of legitimacy is not something which is within the powers of Microsoft to confer, as Microsoft is a corporation and does not vote in public elections, let alone have jurisdiction in such matters. Mentioning legitimacy in that way in a document from Microsoft, a member of the Unicode Consortium, is very unfair. rather they are merely intended to be educational. Well, they are merely intended to be educational. No rather about it. This posting is provided AS IS with no warranties, Well, that is fine, the letter has been posted to the Unicode list from a Microsoft address, so a clarification makes the situation clear just in case anyone had thought that in some way it might. and confers no rights. What rights are being referred to here? William Overington 27 September 2002
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/26/2002 06:05:45 AM William Overington wrote: Dallas is 6 hours behind England on the clock. I'm going to refrain from commenting on anything beyond the markup issues As you wish. Though did you stick to that even in the same sentence? -- and I'm continuing with that only because it's an easy follow-on to what I already wrote, As you wish. even though there is every indication that the sensibility of it will be ignored. This did not appear to have meaning. I checked on the meaning of the word sensibility just to make sure. Did you intend to convey the meaning the good sense of what I write rather than the sensibility of it? Yet what indication whatsoever do you have that I ignore what you write? I do not always agree with you, yet where specific references to documents on the web are made I always attempt to obtain them and study the points you make. Certainly, I may not agree with you. Sometimes I agree, sometimes I do not agree and sometimes I am undecided in a matter. That surely is the nature of critical scholarship and research. A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used <S C="12001">London</S> or <S C="12001" P1="London"/> which are only slightly more verbose, but which follow a widely-implemented standard that can be parsed by lots of existing software, for which there are a large number of tools available, and which a vast number of individuals, businesses and other agencies have an interest in. Your markup convention is completely proprietary, Thank you. That is excellent. I designed the comet circumflex key with the specific intention that it was creatively original whilst being expressible using a standard all-Unicode font. it has no existing software support, and nobody but you has any interest in it. You have no basis whatsoever for claiming that nobody other than me has any interest in it.
Maybe you are not interested, maybe some people you know are not interested, yet I feel that it is unfair for you to make such a statement without evidence when writing from an established organization as that remark may prejudice people from taking an interest in helping to develop the idea because of a political dimension of going against the tide. You have your position and I feel that you should allow someone who does not have such a position an even-handed chance to put forward an idea and have it considered on its merits. You tell me which one is more likely to result in productive work and adoption by others. Likelihood of success and what actually happens are not the same thing. I do not know which is more likely as I do not know of what has happened already. Some people may have deleted the email, some may have read it and disregarded it, yet it is possible that some people might have tried to produce a comet circumflex button on the screen using an all-Unicode font and might be considering the possibilities of how the system could be applied or might even be writing an experimental software program which can take comet circumflex sequences and process them through a database. Look, for example, at The Respectfully Experiment in the Unicode mailing list archives. There a result was assumed and something different was observed in practice. that it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. Marco asked me a specific question, so I answered what he had asked. Perhaps there is an [EMAIL PROTECTED] list somewhere where you might find greater interest in your ideas than here. That is unfair of you. You have chosen to respond to my posts and I have answered the questions which you asked. You even stated in the same post. 
quote I'm going to refrain from commenting on anything beyond the markup issues end quote The topic of keys generally which I have introduced is potentially a far-reaching development in the application of markup in Unicode based systems. My own comet circumflex system may be highly useful in business communications and distance education. I am happy to respond to questions and to consider documents which people suggest. None of us here mind invention, but I think most would believe that inventiveness is most productive when building off the advancement of others rather than reinventing wheels or widgets. XML exists, and it works. XML exists and it uses U+003C in a way that makes using U+003C with the meaning LESS-THAN SIGN in body text intermixed with markup sections awkward. That feature of XML may not matter for situations involving encoding simply literary works, yet for a comprehensive system which can include the U+003C character with the meaning LESS-THAN SIGN in body text and in markup parameters, it does not suit my need. Beside the fact that your proposed markup convention is not a good idea, it has nothing
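[Editorial illustration, not part of the original message.] The XML alternative quoted in this message, an element S carrying a message code C and a parameter P1, can be read with an off-the-shelf parser. The angle brackets and attribute quotes were lost from the archived text, so the exact markup below is a reconstruction and should be treated as an assumption; the second sample, with a less-than sign inside a parameter, is made up to address the concern about markup parameters.

```python
import xml.etree.ElementTree as ET

# Reconstructed form of the quoted example (assumed attribute quoting).
doc = '<S C="12001" P1="London"/>'
elem = ET.fromstring(doc)
code = int(elem.get("C"))   # message number
param = elem.get("P1")      # first parameter

# A hypothetical parameter that itself contains U+003C, written with an
# entity reference inside the attribute value.
doc_lt = '<S C="12001" P1="x &lt; y"/>'
param_lt = ET.fromstring(doc_lt).get("P1")

print(code, param)
print(param_lt)
```

No custom parser is written here: the code number and the parameter come straight out of a standard library call, and the less-than sign survives inside a parameter as well as in body text.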
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable wrote as follows. On 09/26/2002 03:42:16 AM William Overington wrote: Well, it might have been 03:42:16 AM where you are, indeed it probably was, as Dallas is six hours behind England on the clock, but I would not want people to think that I write my posts in the middle of the night! On the one hand, you say XML does not suit my specific need as far as I can tell. But you also said Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. In that quote the codes which you suggest was your list of specific Unicode code points as follows. quote Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE... 10FFFE, 10FFFF. They are non-characters available for exactly this use. end quote I maintain that they are unsuitable for use in documents which are to be sent from one end user to another. Yet the first part of my sentence which you have quoted could by going to the final comma and converting it to a full stop form a sentence on its own as follows. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system. So, I will reason from that. You also quote me as stating the following sentence. XML does not suit my specific need as far as I can tell. I am happy with that. The two sentences are entirely consistent. Are you perhaps trying to make a deduction by the fallacy of the undistributed middle, along the following lines. William's need is a markup system. XML is a markup system. William's need is XML. It may well be that XML could be used to carry the comet circumflex code numbers which I am devising.
I am not saying that it could not be so used. I am simply saying that XML, as I understand it, does not suit my specific need. For example, if I understand it correctly, XML uses U+003C in a document in such a manner that its use for the meaning LESS-THAN SIGN in the body of the text cannot be used directly. For me, that is a major limitation of XML. Now, I am not trying to make some big issue out of this by criticising XML as I am not trying to criticise XML, yet to my mind that is a very big legacy issue of which I do not want to have the problem with my research in language translation and distance education. Maybe one day Unicode will encode special XML opening and closing angle brackets so that XML can operate without that problem. However, as XML uses the U+003C character in that manner at the moment, for me it is a problem and it has led me to use the key method using a comet circumflex key. Also, I do not need to have all those < and > characters and = characters and / characters within messages. One of the things that is especially useful about XML and related technologies is the facility with which data can be repurposed. You have one schema for marking up data, and stylesheets that transform it as needed for different publishing / usage contexts. Also, I don't see how it can be that a character sequence such as U+003C U+0061 U+003E can't be useful to you when some ridiculous character sequence like U+2604 U+0302 U+20E3 is. Well, U+2604 U+0302 U+20E3 is not ridiculous. It is entirely permissible within the Unicode specification. I have used combining characters productively, in accordance with the rules set out in the specification. Please see section 7.9. The button displays using an all-Unicode font. If you think it ridiculous then maybe that is good evidence of its originality as a piece of creativity. A comet circumflex key could be viewed as a piece of original art.
I specifically designed it so as to be a design which involves an inventive leap so as to produce something new and unexpected, which someone skilled in the art would not produce as the application of skill in the existing art without invention, yet which would display properly using an all-Unicode font. The sequence U+003C U+0061 U+003E is unsuitable because it begins with a U+003C character and I do not want the use of U+003C to mean LESS-THAN SIGN to be unavailable in a simple direct manner. I want to be able to use the comet circumflex translation system in documents which contain mathematics and software listings as well as literary text. So, I have decided to use a straightforward system which allows me to do that without problems. An added bonus of using the comet circumflex key is that documents containing comet circumflex codes do not necessarily need to contain any characters from the Latin alphabet. William Overington 27 September 2002
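[Editorial illustration, not part of the original message.] The noncharacter code points mentioned in the quoted suggestion are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFE, U+10FFFF). A small Python check, with the function name is_noncharacter chosen only for this sketch, could look like this:

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of Unicode's 66 noncharacter code points."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The contiguous block U+FDD0..U+FDEF (32 code points).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: xxFFFE and xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Count them across the whole code space as a sanity check.
total = sum(1 for cp in range(0x110000) if is_noncharacter(cp))
print(total)
```

A filter like this is what a receiving program would typically apply at its boundary, which bears on the disagreement above about whether such code points belong in documents passed from one end user to another.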
Re: Keys. (derives from Re: Sequences of combining characters.)
William Overington scripsit: Well, it depends what one is trying to do. If one wishes to establish a system whereby proprietary intellectual property rights exist, then a proprietary coding can be a good idea. That is the function of encryption. XML is the way to go. Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in documents which mix ordinary text and markup may or may not, depending upon the application, be a problem. Since there are several standard ways to represent the semantic LESS-THAN SIGN in XML (&lt; is most typical, but &#x3C; also works), there is no problem, only a little extra work as tradeoff. After all, why not invent your own character code as well as your own markup language? The keys idea is pushing the envelope. As spin off from this discussion, maybe the XML people, and the Unicode Technical Committee, will do something about having special characters for the XML tags rather than using U+003C and thereby help people wanting to place mathematics and software listings in the same file as markup. MathML is a markup standard for mathematical text that is an application of XML, so people wanting to place etc. need no further help. Don't hold your breath, and don't *mutcheh* us about it. What is wrong with private encodings? Interchanging them does not scale. People may ignore them if they wish. They will, they will. High level application semantics assigned to particular code points are potentially very useful. I have published various documents on the web about them with Private Use Area allocations for various items such as colour and point size for text. Of course you can use the Private Use Area for whatever you like. A character standard, however, is intended for encoding *characters*. It is not intended as a source of useful integers -- for that, apply to Dedekind. -- John Cowan [EMAIL PROTECTED] You need a change: try Canada You need a change: try China --fortune cookies opened by a couple that I know
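[Editorial illustration, not part of the original message.] The two representations named above, the predefined entity and the numeric character reference, can be checked against each other with Python's standard parser. The element name m is an arbitrary choice for this sketch, and the CDATA form is a third standard option that the message does not name explicitly.

```python
import xml.etree.ElementTree as ET

# Three ways of writing a LESS-THAN SIGN inside XML content.
named = ET.fromstring("<m>a &lt; b</m>").text        # predefined entity
numeric = ET.fromstring("<m>a &#x3C; b</m>").text    # numeric character reference
cdata = ET.fromstring("<m><![CDATA[a < b]]></m>").text  # CDATA section

print(named)
print(numeric)
print(cdata)
```

All three parse to the same string, so the choice between them is a matter of convenience for whatever tool writes the file.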
Re: Keys. (derives from Re: Sequences of combining characters.)
[This is entirely off-topic.] On 09/27/2002 06:24:27 AM William Overington wrote: Yet what indication whatsoever do you have that I ignore what you write? The fact that you have been given recommendations from several people on this list not to invent new markup conventions but to take advantage of the existing, state-of-the-art technologies for this purpose, yet you have consistently rejected those recommendations. I do not always agree with you, I doubt there's anyone on this list that always agrees with me (I certainly hope not; after the passage of time, I often don't agree with myself :-). it has no existing software support, and nobody but you has any interest in it. You have no basis whatsoever for claiming that nobody other than me has any interest in it. It's only a claim, a hypothesis that I happen to consider to have enough probability of validity to make me feel confident in stating in a public forum. Of course, I may be wrong. Maybe you are not interested, maybe some people you know are not interested, yet I feel that it is unfair for you to make such a statement without evidence when writing from an established organization as that remark may prejudice people from taking an interest in helping to develop the idea because of a political dimension of going against the tide. I feel there is evidence: take a look at any serial publication related to the software industry from the past three years and look for references to XML. It comes up again and again and again. The evidence very strongly points in favour of XML if one is needing a markup convention for some protocol. There may well be some situation in which XML isn't appropriate; e.g. one might have valid reasons for wanting to maintain a binary file format as the native storage representation for a word-processing or spreadsheet app. 
But if one is going to use a *character*-based markup convention, I think you'd be hard pressed to come up with good reasons at this point for using something other than XML. Perhaps there is an [EMAIL PROTECTED] list somewhere where you might find greater interest in your ideas than here. That is unfair of you. If I offended, then I apologize. I merely wished to suggest that your ideas regarding markup are what I think the vast majority on this list would consider eccentric, and to also suggest that it's all off-topic for this list and really should be taken up elsewhere. You even stated in the same post. quote I'm going to refrain from commenting on anything beyond the markup issues end quote And I believe I did so. The topic of keys generally which I have introduced is potentially a far-reaching development in the application of markup in Unicode based systems. My own comet circumflex system may be highly useful in business communications and distance education. I am happy to respond to questions and to consider documents which people suggest. But please, not on this list. This is not the comet circumflex list. XML exists and it uses U+003C in a way that makes using U+003C with the meaning LESS-THAN SIGN in body text intermixed with markup sections awkward. Not significantly so, as evidenced by the fact that many have needed to represent the character within content yet this has not impeded the widespread -- near ubiquitous -- adoption of XML. That feature of XML may not matter for situations involving encoding simply literary works, yet for a comprehensive system which can include the U+003C character with the meaning LESS-THAN SIGN in body text and in markup parameters, it does not suit my need. Then I think you're making decisions about design of a protocol using the wrong criteria.
Actually, I was rather hoping that, with your specific interest in languages that you would have wished to have a try at using the comet circumflex system as one of the features of the comet circumflex system is that it could be used with minority languages as easily as with the major languages of the world. Actually, one of the things that I chose *not* to comment on in the previous message was the very significant issues the comet circumflex system raises in relation to internationalisation and localisation. As someone else pointed out, your system has a problem in that a parameter such as London needs to be localised. There are a range of internationalisation issues that your system doesn't address. It isn't always safe to assume that one can define a matrix statement that can be translated into multiple languages and into which parameter strings can be inserted; issues such as grammatical concord may be a problem. I don't want to get into such a discussion (especially on this list). My point is, I see many potential problems in terms of multilingual application of the system. Also, the users I support are not dealing with text involving a set of short, pre-defined messages, so this system isn't all that relevant for my work. - Peter
Re: Keys. (derives from Re: Sequences of combining characters.)
On Friday, September 27, 2002, at 09:52 AM, [EMAIL PROTECTED] wrote: I doubt there's anyone on this list that always agrees with me I think you're wrong, there, Peter. I *never* disagree with you. :-) == John H. Jenkins [EMAIL PROTECTED] [EMAIL PROTECTED] http://www.tejat.net/
Re: Keys. (derives from Re: Sequences of combining characters.)
William Overington wrote: Message catalogs are not new. I had not heard the description Message catalog previously, so I can search for that too. I have previously searched under telegraphic code and language and translation. look for: software localization, message, catalog, resource files, perhaps localisation ;-) An email correspondent drew my attention to the following list of numbered radiograms. http://www.arrl.org/FandES/field/forms/fsd3.html That is an interesting document. I have not yet found any example oriented to language translation. I have not yet found any example oriented to carrying on a complete conversation. A new prisoner sits down for his first lunch. Someone shouts out 53. Everyone laughs. Another shouts 26. More laughter. He asks his neighbor what's going on... The neighbor explains they have all been there so long they have heard all the jokes told very many times. Finally they just gave them numbers. So when someone shouts out a number they remember the joke and laugh. After a bit the new guy shouts out: 42! Dead silence. He asks his neighbor what went wrong. He turns to him and says That one is not funny.. This is a very old joke. It is an indication of how old the idea of numbered messages might be. ;-) The arrl list was missing quite a few. 73 88 were common for Best regards, and love and kisses. I was rather surprised therefore when the Target products with 88 were recently pulled from the market because they signaled the neo-nazi movement. I thought it meant Love and kisses. A proprietary coding system is a bad idea. Well, it depends what one is trying to do. Yes, for the problem you described, given the availability of an open system, with lots of tool support, creating a proprietary system in which you could not create nearly as many tools as the open-based systems, it would not be competitive. You would really have to build in some significant market advantage. 
Given your lack of familiarity with what exists in the market, and a presumption of a one-man shop (limited resources), we speculated it was a mistake. XML is the way to go. Maybe, maybe not. The issue of U+003C being used to mean LESS-THAN SIGN in documents which mix ordinary text and markup may or may not, depending upon the application, be a problem. You can use the character with some minor escaping. It is a smaller issue than trying to create all the various tools and benefits you would get from XML. but as Peter and others have already defined several times where the envelope needs pushing (e.g. XML), and in particular where they should not (private encodings, and hi level application semantics assigned to particular code points), continued attempts to do so are not welcome. What is wrong with private encodings? The Private Use Area is there to be used. Sure, but use them privately and discuss them privately with people who have an interest in those particular purposes. This is not the place. I know this has been stated before. I think Suzanne or Barry even created a list for purposes of PUA discussion: http://groups.yahoo.com/group/CharMan/ Or start a list of your own. You are welcome (as are others) to send announcements here saying- Hey I have these PUA ideas, and would like to discuss them here and here. It is really quite unfair to the members of the list to cause it to go over the same ground. hth tex -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCrafthttp://www.XenCraft.com Making e-Business Work Around the World -
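[Editorial sketch, not part of the original message.] Tex's remark that U+003C can be used in content "with some minor escaping" can be illustrated with a few lines of Python. This is only a sketch: the element name msg and the sample sentence are invented for the illustration, and the standard-library calls used are xml.sax.saxutils.escape and xml.etree.ElementTree.fromstring.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Body text that contains U+003C with the meaning LESS-THAN SIGN.
text = "if a < b then swap"

# Escaping replaces "<" with the predefined entity "&lt;".
escaped = escape(text)

# "msg" is a hypothetical element name chosen for this sketch.
doc = "<msg>" + escaped + "</msg>"

# Any conforming XML parser turns the entity back into the character.
parsed = ET.fromstring(doc)
print(escaped)
print(parsed.text)
```

The round trip returns the original string unchanged, which is the sense in which the parser "sorts that out" for the application.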
Re: Keys. (derives from Re: Sequences of combining characters.)
John H. Jenkins scripsit: I think you're wrong, there, Peter. I *never* disagree with you. :-) Hmm. Has anyone ever seen Peter and John together? :-) -- John Cowan [EMAIL PROTECTED] www.ccil.org/~cowan www.reutershealth.com In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand. --Gerald Holton
Re: Keys. (derives from Re: Sequences of combining characters.)
Tex Texin scripsit: After a bit the new guy shouts out: 42! Dead silence. He asks his neighbor what went wrong. He turns to him and says That one is not funny.. Other punchlines I have heard: (about a third party): Steve should know he can't handle Swedish dialect. (after uproarious laughter): Hey, we've never heard that one before! (after silence): I guess you just don't know how to tell a joke. This is a very old joke. It is an indication of how old the idea of numbered messages might be. ;-) As William mentions, commercial telegraph codes are almost as old as the telegraph itself; when the five-letter-code principle was eventually accepted internationally, it became possible to use a single group to represent things as complex as We are shipping to you, care of your agent in X, our product Y where all possible combinations of X and Y were given individual codes. This of course was a code commissioned by a private company; public codes necessarily had to be more inclusive and thus more verbose. Several of them were indeed published in multilingual editions, so that the same code sequence could be read as English, French, German, In the case of public codes, company code clerks became quite adept at reading the more frequent codes without reference to the code book. On one occasion, a code clerk got a cable from an agent located halfway around the planet reading simply AHXNO, a code entirely unfamiliar to him. Unfortunately, when he looked it up, he found the reading to be: Met with a fatal accident. -- John Cowanhttp://www.ccil.org/~cowan [EMAIL PROTECTED] Please leave your values| Check your assumptions. In fact, at the front desk. | check your assumptions at the door. --sign in Paris hotel |--Miles Vorkosigan
Re: Keys. (derives from Re: Sequences of combining characters.)
At 04:26 PM 9/27/2002 +0100, William Overington wrote: I had not heard the description Message catalog previously, so I can search for that too. I have previously searched under telegraphic code and language and translation. An email correspondent drew my attention to the following list of numbered I have not yet found any example oriented to language translation. Key Unix libraries have used message catalogs as part of the API since time immemorial. Hence any Unix application with even a whiff of a chance of being internationalized is likely to have used those functions. I have not yet found any example oriented to carrying on a complete conversation. I would look for the earliest references to machine translation in the 1940s and 50s, up to the work with Eliza at MIT in the 60s. I think there is an enormous project whose name I don't recall right now going on in Texas, perhaps Austin, which is spiritually derived from Eliza and focused on sending whole, previously composed sentences back conversational style. If you want to find the whole of the literature in this area, I suggest searching Turing Test. A proprietary coding system is a bad idea. Well, it depends what one is trying to do. If one wishes to establish a system whereby proprietary intellectual property rights exist, then a proprietary coding can be a good idea. Various large companies use proprietary coding systems for files used with their software packages. If, however, one is trying to establish an open system, then you might well be right. Or if you want to minimize the amount of reinventing the wheel you do internally. You can easily use a proprietary format outside and XML inside, just as you can use SJIS outside and Unicode for internal processing. Failure to investigate the state of the art, (especially where google is so effortless), means this idea is not pushing any envelope. Well, if you have any specific suggestions of what keywords to use in a search, that would be very helpful. 
I have given you some. Rather than focusing on pseudo-scientific terms like radiogram, I suggest starting with a familiarity with the history of computer science, both pure and applied research. The keys idea is pushing the envelope. No it is not. As spin off from this discussion, maybe the XML people, and the Unicode Technical Committee, will do something about having special characters for the XML tags rather than using U+003C and thereby help people wanting to place mathematics and software listings in the same file as markup. Is using U+003C a legacy from ASCII days? Why is it not possible to use signs in XML? Most of my postings in this thread are in response to people asking me specific questions and raising interesting points. That is surely why a discussion group exists. But most of the answers you get are based on a shared technical and educational background which you don't have and/or seem to value. It is difficult to describe but a lot of early computer science research was about how to effectively decompose functionality and data. Sadly, I think a lot of this is being lost. For a more technical starting point, look for the works of Edsger Dijkstra starting in the 1960s. For a less technical point of view, look for The Mythical Man-Month from the mid 70s (recently updated), and its spiritual followups by Ed Yourdon and Tom DeMarco. When I read the responses you get, I have the feeling that the authors have internalized the lessons of these important texts (even if they may not have studied them explicitly). Once you internalize the lessons also, then you will have a better understanding of the points of view you are consistently receiving with friction. 
I am hoping that I can publish some web pages with some comet circumflex codes and sentences about asking about the weather conditions and temperatures at the message recipient's location together with codes and sentences for making replies so that hopefully people who might be interested in some concept proving experiments can hopefully have a go at some fascinating experiments with this technology. Unicode can be used to encode many languages and it will be interesting to find out what can be achieved using the comet circumflex system. That might be an interesting web site in its own right, but the technology is nothing special and has been done a million times under a million names and ten million times with no name at all. Barry Caplan Publisher, www.i18n.com
RE: Keys. (derives from Re: Sequences of combining characters.)
Tex Texin wrote: What's funny to me about this message, is a product message catalog I was responsible for localizing had messages created by software developers, such as (paraphrasing from memory): The client is dead. The client has been killed. You killed the client. Some of the translators were horrified. We had to explain that the client was software used by the user, and that to kill it meant the software was no longer operating, not that the product caused the death of the user. And then we had to get the developers to change the message, since even in english they were not the most effective messages. Lucky too, that support couldn't cause someone on the phone to give a command that could kill the client... Years ago, I was in charge of supporting software system composed of a main module, called the parent (task), and of a number of secondary modules, called child (tasks). Each child was identified with a name and a (task) address. One day, the IT manager reported that the system started having problems after a child had turned off the computer. I explained that, according to my knowledge, that was impossible: children ran in a protected area, so the parent would have stopped them before they had any chance of turning off the computer. But he replied that he saw the child turning off the system with his own eyes, and the parent could not stop it. This guy was such an idiot, and I was quite surprised to discover that he could use the utility called Children Monitor. So, I asked him to let me know the child's name and address. He said that he didn't understand how this detail could help us but, anyway, he obtained the child's name and address from the parent: Daniel Zubeispiel Hauptkirchestrasse, 26 Zürich, Switzerland (Seven years-old Daniel, the son of a system engineer, was in the laboratory that day because his school was closed for maintenance.) Ciao. Marco
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter Constable commented as follows. On 09/25/2002 05:55:02 AM William Overington wrote: For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. Sorry to be blunt, but that's silly. If you need a special-purpose character (a code-sequence, to be more precise) for use within your specialised application, use one of FDD0..FDEF, FFFE, FFFF, 1FFFE, 1FFFF, 2FFFE... 10FFFE, 10FFFF. They are non-characters available for exactly this use. Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. If you need real character sequences for markup, there's this thing called XML. Perhaps you've heard of it. It's worth taking a look at; I think it really might catch on some day. I have heard of XML, though I know little about it. I have read some introductory documents about XML. XML does not suit my specific need as far as I can tell. William Overington 26 September 2002
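[Editorial sketch, not part of the original message.] The noncharacters Peter refers to are the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, and so on up to U+10FFFE, U+10FFFF). A small Python predicate, whose name is_noncharacter is chosen only for this illustration, could test for them as follows.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %r" % (cp,))
    # The contiguous block U+FDD0..U+FDEF.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: xxFFFE and xxFFFF.
    return (cp & 0xFFFE) == 0xFFFE


# Count them across the whole code space: 32 + 2 * 17.
total = sum(is_noncharacter(cp) for cp in range(0x110000))
print(total)
```

By contrast, the three code points of the proposed key sequence (U+2604, U+0302, U+20E3) are ordinary assigned characters, so the predicate returns False for each of them.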
Re: Keys. (derives from Re: Sequences of combining characters.)
Marco Cimarosti asked about what key caps have to do with mark up or text files. My idea is as follows. A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 This would have a meaning such as follows. It was a pleasure to welcome you to our stand at the recent exhibition in London. Please now consider the following sequence. U+2604 U+0302 U+20E3 12001 U+2460 Rome U+2604 U+0302 U+20E2 This would have the following meaning. It was a pleasure to welcome you to our stand at the recent exhibition in Rome. This being because my published dictionary would state that sentence 12001 within the Comet Circumflex system has one parameter and has the meaning as follows. It was a pleasure to welcome you to our stand at the recent exhibition in P1. The idea is based upon the telegraphic codes of days gone by, as used, in particular, on railway systems, except that this idea is for automated computer translation of preset sentences with one or more parameters. For example, someone in, say, Japan, who does not speak English (or does not speak it well enough to produce a professional quality translation) could communicate over the internet with someone in England who does not speak Japanese by using sentence C_C+12001 as above, provided that both sender and recipient have a dictionary for the Comet Circumflex system in his or her own language. The system needs the sender to encode the document. A recipient could, with an automated system, simply read the message in his or her own language. However, it will hopefully be possible to have a computer assisted encoding system whereby an end user may select sentences from topic areas and an encoded document be produced. In a computer system which does not have translation software installed, or has it installed but only uses it when specifically requested, the message would appear with a button at the start, provided that a font which carries the characters is being used. 
The message could then be translated, either automatically if translation software with a local database of C_C sentences in the local language is available, or manually from a dictionary of sentences. I expect that, whatever the potential for automation, to get started translations will be done manually. What languages will be used in early experiments will depend largely on whether any people who are fluent in a language other than English and can also translate from English into that language will want to try the system out, and thus upon whatever those languages happen to be. Ultimately, if no one is interested, I can get some translations done into a few languages by paying a professional bureau to do the work for me. However, the scope is there that the sentences could potentially be translated into many languages, both major languages and minority languages. Although I am preparing the sentences in English, it would not be necessary for either a sender or a recipient to know English, as, once the sentences have been translated once into their respective languages, then the code numbers could be used directly without using English in the sending and receiving of the messages. I have it in mind that I might author and publish, as shareware, a collection of sentences which could be used in business communications, hopefully gaining shareware royalties. For example, sentences making an enquiry about an item shown on someone's website, where the part number of the item is a parameter of the sentence. I am also interested in producing a set of sentences which might be useful in a distance education context. I am thinking of producing a few sentences asking about and commenting about the weather as a convenient way to experiment with a few sentences. For example, a sentence such as It is raining. would not have a parameter, a sentence such as The temperature in this room is P1 degrees Celsius. would have one parameter. 
There would clearly need to be lots of sentences encoded. However, I am hoping that meaningful communication will be possible with a collection of sentences which can be used with modern computing equipment. By using the U+2604 U+0302 U+20E3 sequence the system can be used within an email so that some special sentences are either translated manually or left in the original language. That, however, is only useful for one-to-one correspondence, for general publication of learning material only encoded sentences could be used, though that could, in conjunction with illustrations be potentially useful for some purposes. I am not envisaging doing any of the translation myself, as my linguistic knowledge is insufficient for professional quality translation work. Certainly, sentences for this Comet Circumflex system will need to be carefully designed so as to cover the needs of business communication without causing problems for a translation engine inserting parameters, so parameters will need to be either
Re: Keys. (derives from Re: Sequences of combining characters.)
On 09/26/2002 03:42:16 AM William Overington wrote: On the one hand, you say XML does not suit my specific need as far as I can tell. But you also said Documents with the code sequence are intended to be sent over the internet as email, used as web pages and broadcast in multimedia broadcasts over a direct broadcast satellite system, so the codes which you suggest would be unsuitable. One of the things that is especially useful about XML and related technologies is the facility with which data can be repurposed. You have one schema for marking up data, and stylesheets that transform it as needed for different publishing / usage contexts. Also, I don't see how it can be that a character sequence such as U+003C U+0061 U+003E can't be useful to you when some ridiculous character sequence like U+2604 U+0302 U+20E3 is. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
Re: Keys. (derives from Re: Sequences of combining characters.)
On 09/26/2002 06:05:45 AM William Overington wrote: I'm going to refrain from commenting on anything beyond the markup issues -- and I'm continuing with that only because it's an easy follow-on to what I already wrote, even though there is every indication that the sensibility of it will be ignored. A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used <S C="12001">London</S> or <S C="12001" P1="London"/> which are only slightly more verbose, but which follow a widely-implemented standard that can be parsed by lots of existing software, for which there are a large number of tools available, and which a vast number of individuals, businesses and other agencies have an interest in. Your markup convention is completely proprietary, it has no existing software support, and nobody but you has any interest in it. You tell me which one is more likely to result in productive work and adoption by others. that it is because I am an inventor, interested in pushing the envelope as to what is possible scientifically and technologically. Perhaps there is an [EMAIL PROTECTED] list somewhere where you might find greater interest in your ideas than here. None of us here mind invention, but I think most would believe that inventiveness is most productive when building off the advancement of others rather than reinventing wheels or widgets. XML exists, and it works. Beside the fact that your proposed markup convention is not a good idea, it has nothing whatsoever to do with the development of Unicode. This discussion really ought to be taken elsewhere. - Peter --- Peter Constable Non-Roman Script Initiative, SIL International 7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA Tel: +1 972 708 7485 E-mail: [EMAIL PROTECTED]
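[Editorial sketch, not part of the original message.] Peter's two element forms, written out with the angle brackets and attribute quotes that XML requires, can be handled by an off-the-shelf parser in a few lines. The Python below is a minimal sketch under stated assumptions: the CATALOG dictionary, the render function and the {P1} placeholder syntax are invented for the illustration, and only the English wording of sentence 12001 comes from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical message catalog: language -> sentence code -> template.
# Only the English text of sentence 12001 is taken from the discussion.
CATALOG = {
    "en": {
        "12001": "It was a pleasure to welcome you to our stand "
                 "at the recent exhibition in {P1}.",
    },
}


def render(markup: str, lang: str) -> str:
    """Expand one <S> element into a sentence in the requested language."""
    elem = ET.fromstring(markup)
    code = elem.get("C")
    # Attributes other than C are treated as named parameters (assumption).
    params = {k: v for k, v in elem.attrib.items() if k != "C"}
    # In the first form the single parameter is the element content.
    if "P1" not in params and elem.text:
        params["P1"] = elem.text.strip()
    return CATALOG[lang][code].format(**params)


form_a = render('<S C="12001">London</S>', "en")
form_b = render('<S C="12001" P1="London"/>', "en")
print(form_a)
```

Both forms expand to the same sentence. Adding another language would mean adding another entry to CATALOG, which is the part of the work that any such system, XML-based or not, still has to do.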
Re: Keys. (derives from Re: Sequences of combining characters.)
So that Peter's comments cannot be perceived as strictly Peter's view, I am seconding them. Message catalogs are not new. A proprietary coding system is a bad idea. XML is the way to go. Failure to investigate the state of the art, (especially where google is so effortless), means this idea is not pushing any envelope. New ideas are welcome, but as Peter and others have already defined several times where the envelope needs pushing (e.g. XML), and in particular where they should not (private encodings, and hi level application semantics assigned to particular code points), continued attempts to do so are not welcome. tex [EMAIL PROTECTED] wrote: ...Your markup convention is completely proprietary, it has no existing software support, and nobody but you has any interest in it. You tell me which one is more likely to result in productive work and adoption by others. ... None of us here mind invention, but I think most would believe that inventiveness is most productive when building off the advancement of others rather than reinventing wheels or widgets. XML exists, and it works. Beside the fact that your proposed markup convention is not a good idea, it has nothing whatsoever to do with the development of Unicode. This discussion really ought to be taken elsewhere. - Peter -- - Tex Texin cell: +1 781 789 1898 mailto:[EMAIL PROTECTED] Xen Master http://www.i18nGuy.com XenCrafthttp://www.XenCraft.com Making e-Business Work Around the World -
RE: XML Primer (was Keys. (derives from Re: Sequences of combining characters.))
Mr. Overington, Peter didn't specifically mention that his suggestion is an example of XML, although he alluded to that fact. As many people have mentioned before on this list, XML is a more appropriate mechanism for many of your inventions, and it is also a standard. One of the neatest things about XML is that you can invent your own tags, as Peter's example did below. Of course applications still must agree on the meanings of those tags, but your suggestion has the same limitation. A big advantage of XML is that even when the tags are not understood, they can still be safely ignored without fear that other information is lost, garbled or otherwise mangled. Some other examples of how your XML tags may have been chosen are: <CometCircumflex SentenceCode="12001">London</CometCircumflex> or <CometCircumflex SentenceCode="12001" Parameter1="London"/> or <CometCircumflex SentenceCode="12001" Parameter1="London">Thanks for visiting our stand in London.</CometCircumflex> or <CometCircumflex SentenceCode="12001">Thanks for visiting our <Parameter Number="1">London</Parameter> stand.</CometCircumflex> Notice that in the last 2 examples an English string appears, so a reader without your translation system will still have understandable text if your XML tags are ignored (as most programs do when they don't understand XML.) Also, even though English is provided in the last 2 strings, the other necessary information (Sentence=12001 and Parameter #1=London) is included for your translation algorithm. The author chose to use slightly different text than your standard It was a pleasure to welcome you to our stand at the recent exhibition in P1. That allows the author to make minor deviations to customize his text for native speakers, yet the author could still communicate with non-native speakers. I should also mention that your proposed system still has some limitations. For example if the conference were in Cologne, Germany, a Deutsch speaker would expect the city name Köln instead. 
I hope that this example improves your understanding of XML and how it may be applied to your inventions. As others have mentioned, this topic is digressing from the purpose of this message board and would be best discussed off line or in a different forum. - Shawn Shawn Steele Software Developer Engineer Microsoft My comments in no way endorse the original and are not intended to confer legitimacy, rather they are merely intended to be educational. This posting is provided AS IS with no warranties, and confers no rights. -Original Message- A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used <S C="12001">London</S> or <S C="12001" P1="London"/>
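[Editorial sketch, not part of the original message.] Shawn's last example carries both a readable English fallback and the machine-usable sentence code and parameter. The Python below is a sketch of how a receiving program might pull the two apart with the standard-library ElementTree parser. The CITY_NAMES table and the localize_city helper are hypothetical; they only illustrate the Cologne/Köln point and are not a real localisation resource.

```python
import xml.etree.ElementTree as ET

markup = (
    '<CometCircumflex SentenceCode="12001">Thanks for visiting our '
    '<Parameter Number="1">London</Parameter> stand.</CometCircumflex>'
)
elem = ET.fromstring(markup)

# A program that does not know these tags can simply drop them:
# concatenating all text nodes gives the English fallback sentence.
fallback = "".join(elem.itertext())

# A program that does know them reads the code and the parameters.
code = elem.get("SentenceCode")
params = {p.get("Number"): p.text for p in elem.iter("Parameter")}

# Hypothetical per-language table of city names (assumption for this sketch).
CITY_NAMES = {"de": {"Cologne": "Köln"}}


def localize_city(name: str, lang: str) -> str:
    """Return the local name of a city if one is known, else the input."""
    return CITY_NAMES.get(lang, {}).get(name, name)


print(fallback)
print(code, params)
print(localize_city("Cologne", "de"))
```

The fallback path needs no knowledge of the Comet Circumflex vocabulary at all, which is the "safely ignored" property described in the message.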
Re: Keys. (derives from Re: Sequences of combining characters.)
Peter responded: A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used <S C="12001">London</S> or <S C="12001" P1="London"/> or even: <cometcircumflex messageId="12001">London</cometcircumflex> if one likes the ring of comet circumflex for one's tags. which are only slightly more verbose, but which follow a widely-implemented standard namely, XML, which I think effectively gainsays William's earlier comment: XML does not suit my specific need as far as I can tell. And as far as the idea of having parameterized messages, with translation catalogs, I would join the chorus inviting William to investigate state of the art before attempting to invent something that already exists in many forms. Or, to further mangle Marco's musical metaphor, as you go round and around on this topic, make sure that you don't mix up the apples *for* the horses with the horseapples *from* the horses. --Ken ;-)
Keys. (derives from Re: Sequences of combining characters.)
The recent discussion on sequences has led me to have a look through the various combining characters and I have found the following. U+20E3 COMBINING ENCLOSING KEYCAP It has occurred to me that the use of a sequence of a base character, then one or more combining characters so as to produce a sequence which would be otherwise unlikely, followed by U+20E3 might be a very effective way to include specialised markup systems within a plain text file without disrupting the normal textual information conveying capabilities of a file. An all-Unicode font would then produce a graphic representation of the key, without any prior arrangement being necessary, so that such marked-up sequences could be produced using just a regular all-Unicode plain text editor. A receiving program with a specialized plug-in could then decode the markup, or it could be decoded manually in some cases. For example, I am looking at using the following sequence so as to produce a special purpose key within documents. U+2604 U+0302 U+20E3 Hopefully that sequence will be so unlikely to occur other than in my specialised application that the sequence can be used uniquely for that specialised application. I am also thinking in terms of using the following sequence to indicate the end of the markup sequence. U+2604 U+0302 U+20E2 I have it in mind that characters in the range U+2460 through to U+2473 could be used before parameters within the markup system. Also, I have noticed that in the document U02D0.pdf that U+20E4 is shown, in the listing, in magenta whereas U+20DF is shown in black. Could someone say what significance the magenta colouring in the document has please? Is it perhaps to indicate additions since the previous issue of the document? William Overington 25 September 2002
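[Editorial sketch, not part of the original message.] For comparison with the XML alternatives discussed in the replies, here is what a purpose-built scanner for the proposed convention might look like in Python. It assumes only what the message states: a start key U+2604 U+0302 U+20E3, an end key U+2604 U+0302 U+20E2, and the circled numbers U+2460..U+2473 placed before parameters. Everything else is an assumption of this sketch, including the function name parse_keys, the treatment of text before the first parameter marker as an identifier, and the trimming of whitespace.

```python
START = "\u2604\u0302\u20E3"   # COMET, COMBINING CIRCUMFLEX, ENCLOSING KEYCAP
END = "\u2604\u0302\u20E2"     # same base sequence with U+20E2 as terminator
PARAM_FIRST, PARAM_LAST = 0x2460, 0x2473   # CIRCLED DIGIT ONE .. CIRCLED NUMBER TWENTY


def parse_keys(text: str):
    """Return a list of (identifier, {parameter number: value}) pairs."""
    results = []
    pos = 0
    while True:
        start = text.find(START, pos)
        if start == -1:
            break
        end = text.find(END, start + len(START))
        if end == -1:
            break  # unterminated key: ignored in this sketch
        body = text[start + len(START):end]
        ident, params, current = [], {}, None
        for ch in body:
            if PARAM_FIRST <= ord(ch) <= PARAM_LAST:
                # A circled number opens parameter 1..20.
                current = ord(ch) - PARAM_FIRST + 1
                params[current] = ""
            elif current is None:
                ident.append(ch)
            else:
                params[current] += ch
        results.append(("".join(ident).strip(),
                        {k: v.strip() for k, v in params.items()}))
        pos = end + len(END)
    return results


sample = "Dear customer. " + START + " 12001 \u2460 London " + END + " Regards."
print(parse_keys(sample))
```

Even this small scanner has to make decisions that the proposal leaves open, such as what to do with an unterminated key or with a parameter value that itself contains one of the marker characters.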
Re: Keys. (derives from Re: Sequences of combining characters.)
William Overington WOverington at ngo dot globalnet dot co dot uk wrote: Also, I have noticed that in the document U02D0.pdf (actually U20D0.pdf) that U+20E4 is shown, in the listing, in magenta whereas U+20DF is shown in black. Could someone say what significance the magenta colouring in the document has please? Is it perhaps to indicate additions since the previous issue of the document? Since the previous release of Unicode. The magenta characters are those added in Unicode 3.2. They were marked specially in the draft copies of the code charts to indicate the changes (and probably to highlight the fact that the assignments were still tentative), and left that way after 3.2 went live. Whether this was intentional or not, I don't know. -Doug Ewell Fullerton, California
RE: Keys. (derives from Re: Sequences of combining characters.)
William Overington wrote: The recent discussion on sequences has led me to have a look through the various combining characters and I have found the following. U+20E3 COMBINING ENCLOSING KEYCAP It has occurred to me that the use of a sequence of a base character, then one or more combining characters so as to produce a sequence which would be otherwise unlikely, followed by U+20E3 might be a very effective way to include specialised markup systems within a plain text file [...] What the hell do key caps have to do with mark up or text files!!?? Mr. Overington, why do you have this irresistible compulsion to mix up apples and horses? (I feel that the usual apples and oranges is not enough to convey the idea fully.) Regards. _ Marco