Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 4/23/2013 3:00 AM, Philippe Verdy wrote:

> Do you realize the operating cost of any international standards committee, or of maintaining and securing an international registry? Who will pay?

Currently we are all paying, by having interminable discussions of half-baked ideas foisted onto us. There's a word for this. Time for this discussion to be dropped.

A./
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 4/23/2013 2:01 AM, William_J_G Overington wrote:

> On Monday 22 April 2013, Asmus Freytag wrote:
>
> > I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use.
>
> The reason that I want to discuss the scope is because there is uncertainty.

I'm not going to engage in a scope discussion with you, even on this lovely list, without some shred of evidence that there is "compelling need".

Cheers,
A./
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
Do you realize the operating cost of any international standards committee, or of maintaining and securing an international registry? Who will pay? You? Unless there's a very productive and demonstrated need for such a registry, using the existing domain name or URI scheme mechanisms will be enough.

2013/4/23 William_J_G Overington
> On Tuesday 23 April 2013, Philippe Verdy wrote:
>
> > There's also another issue: your proposal now uses identifiers that will be resolved in a registry database that you are the only one to control.
>
> Not at all. The registry would be controlled by an International Standards Organization committee.
>
> As you have raised the matter, here is a quote from a document that I submitted to the ISO/IEC 10646 committee in January 2012.
>
> quote
>
> My current thinking is that an ISO committee entity would choose sentences and symbols and then approach the ISO/IEC 10646 committee on an inter-committee liaison basis to ask for character code points to be assigned to the symbol and sentence pairs. For the avoidance of doubt I have, as at the time of preparing this document, made no application to ISO about such a committee entity carrying out such activities.
>
> My thinking is that that ISO committee entity could potentially be one of the following.
>
> 1. A new ISO committee, generated for the purpose.
>
> 2. The ISO/IEC 10646 committee, or a subcommittee of the ISO/IEC 10646 committee.
>
> 3. An existing ISO committee, other than the ISO/IEC 10646 committee, or a subcommittee of that committee.
>
> end quote
>
> William Overington
>
> 23 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Tuesday 23 April 2013, Philippe Verdy wrote:

> There's also another issue: your proposal now uses identifiers that will be resolved in a registry database that you are the only one to control.

Not at all. The registry would be controlled by an International Standards Organization committee.

As you have raised the matter, here is a quote from a document that I submitted to the ISO/IEC 10646 committee in January 2012.

quote

My current thinking is that an ISO committee entity would choose sentences and symbols and then approach the ISO/IEC 10646 committee on an inter-committee liaison basis to ask for character code points to be assigned to the symbol and sentence pairs. For the avoidance of doubt I have, as at the time of preparing this document, made no application to ISO about such a committee entity carrying out such activities.

My thinking is that that ISO committee entity could potentially be one of the following.

1. A new ISO committee, generated for the purpose.

2. The ISO/IEC 10646 committee, or a subcommittee of the ISO/IEC 10646 committee.

3. An existing ISO committee, other than the ISO/IEC 10646 committee, or a subcommittee of that committee.

end quote

William Overington

23 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 2013/04/23 18:01, William_J_G Overington wrote:

> On Monday 22 April 2013, Asmus Freytag wrote:
>
> > I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use.
>
> The reason that I want to discuss the scope is because there is uncertainty. If people are going to spend a lot of time and effort on the research and development of a system, they will wonder whether the effort would all be wasted if the system, no matter how good and no matter how useful, were to come to nothing because it would be said that encoding such a system in Unicode would be out of scope.

[I'm just hoping this discussion will go away soon.]

You can develop such a system without using the Private Use Area. Just make little pictures out of your "characters", and everybody can include them in a Web page or an office document, print them, and so on. The fact that computers now handle text doesn't mean that text is the only thing computers can handle. Once you have shown that your little pictures are widely used as if they were characters, then you have a good case for encoding. This is how many symbols got encoded; you can check all the documentation that is now public.

> A ruling that such a system, if developed and shown to be useful, would be within scope for encoding in Unicode would allow people to research and develop the system with the knowledge that there will be a clear pathway of opportunity ahead if the research and development leads to good results.

As far as I know, the Unicode Consortium doesn't rule on eventualities.

> So, I feel that wanting to discuss the scope of Unicode so as to clear away uncertainty that may be blocking progress in research and development is a straightforward and reasonable thing to do.

The main blocking factor is the (limited) usefulness of your ideas. In case that's ever solved, the rest will be comparatively easy.

Regards, Martin.
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
There's also another issue: your proposal now uses identifiers that will be resolved in a registry database that you are the only one to control. There are other competing registries for storing images, logos, and so on. Finally, your registry does not exist for now, and nobody other than you uses it. Why would Unicode delegate a part of the encoding process to you only, and only for your specific registry? How many characters would Unicode need to encode to use other registries?

There are already working standards for using registries in open competition: domain names, URL fragments, or URI schemes for URNs. And they don't require any addition of characters to Unicode for domain names or URIs to be encoded in documents.

2013/4/23 William_J_G Overington
> On Monday 22 April 2013, Asmus Freytag wrote:
>
> > I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use.
>
> The reason that I want to discuss the scope is because there is uncertainty. If people are going to spend a lot of time and effort on the research and development of a system, they will wonder whether the effort would all be wasted if the system, no matter how good and no matter how useful, were to come to nothing because it would be said that encoding such a system in Unicode would be out of scope.
>
> A ruling that such a system, if developed and shown to be useful, would be within scope for encoding in Unicode would allow people to research and develop the system with the knowledge that there will be a clear pathway of opportunity ahead if the research and development leads to good results.
>
> So, I feel that wanting to discuss the scope of Unicode so as to clear away uncertainty that may be blocking progress in research and development is a straightforward and reasonable thing to do.
>
> William Overington
>
> 23 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Tuesday 23 April 2013, Charlie Ruland ☘ wrote:

> Taken together the above sentences mean that he has to face the fact that there is no “basis for further discussion of the topic.”

Well, I knew of and had just put up with the old situation, and was researching other topics. I had deposited the documents and fonts with the British Library so that they would be available for researchers in the future.

Then the Unicode Consortium made its announcement.

http://unicode-inc.blogspot.co.uk/2013/04/utc-document-register-now-public.html

quote

This change has been made to increase public involvement in the ongoing deliberations of the UTC in its work developing and maintaining the Unicode Standard and other related standards and reports.

end quote

William Overington

23 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Monday 22 April 2013, Asmus Freytag wrote:

> I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use.

The reason that I want to discuss the scope is because there is uncertainty. If people are going to spend a lot of time and effort on the research and development of a system, they will wonder whether the effort would all be wasted if the system, no matter how good and no matter how useful, were to come to nothing because it would be said that encoding such a system in Unicode would be out of scope.

A ruling that such a system, if developed and shown to be useful, would be within scope for encoding in Unicode would allow people to research and develop the system with the knowledge that there will be a clear pathway of opportunity ahead if the research and development leads to good results.

So, I feel that wanting to discuss the scope of Unicode so as to clear away uncertainty that may be blocking progress in research and development is a straightforward and reasonable thing to do.

William Overington

23 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Monday 22 April 2013, Asmus Freytag wrote:

> I'm afraid that any proposal submitted this way would just become the basis for a rejection "with prejudice".

Well, the rules could be changed. I feel that the existing position is not suitable for the advances in ideas that are taking place with purely electronic publications and communications. It is not the same situation as coining a new word, where the new word only becomes included in the Oxford English Dictionary once the new word has an amount of use by people other than the person who coined the word. Not the same, because with a new word there is not an associated character code point. Achieving widespread use using a Private Use Area code point is not an easy matter for an individual.

> Independent of the lack of technical merit of the proposal, the utter lack of support (or use) by any established community would make such a proposal a non-starter.

Only because rules made long ago, before many recent advances in technology, have not been updated for modern times.

> Mr. Overington is quite aware of what would be the inevitable outcome of submitting an actual proposal, that's why he keeps raising this issue with some regularity here on the open list.

Well, I am aware of the present rules. The reason that I started the thread from which this thread was derived is solely because of an announcement by the Unicode Consortium.

http://unicode-inc.blogspot.co.uk/2013/04/utc-document-register-now-public.html

quote

This change has been made to increase public involvement in the ongoing deliberations of the UTC in its work developing and maintaining the Unicode Standard and other related standards and reports.

end quote

Given this new openness by the Unicode Consortium, I felt that it was worthwhile seeking to put forward my ideas for consideration by the committee.

The http://www.unicode.org/timesens/calendar.html web page at present shows that the next meeting of the Unicode Technical Committee is due to start on 6 May 2013.

The http://www.unicode.org/pending/docsubmit.html web page includes the following.

quote

Once a document is received and accepted for posting to the registry, we will assign a document number to it and tell you the number for future reference. We usually update the document registry when several new documents have accumulated in our queue, so your document may not be posted immediately after acceptance.

end quote

I feel that it would be helpful if there were a change of policy so that each time a document is accepted for addition to the document registry it is added immediately, rather than waiting in a queue. I do not know whether there are any documents in a queue at present. The Unicode Consortium has declared that it wishes to increase public involvement, so why the queue system?

William Overington

23 April 2013
RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
Only a formal proposal can be properly discussed and subsequently rejected at both UTC and SC2/WG2. At this stage there is only a lot of hot air and waste of time and effort.

Sincerely, Erkki

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On behalf of Charlie Ruland ☘
Sent: 23 April 2013 9:24
To: unicode@unicode.org
Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

* Asmus Freytag [2013/4/22]:
> On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:
>
> > [...] Please submit a formal proposal that can serve as a basis for further discussion of the topic. [...]
>
> Mr. Overington is quite aware of what would be the inevitable outcome of submitting an actual proposal, that's why he keeps raising this issue with some regularity here on the open list.
>
> A./

Taken together the above sentences mean that he has to face the fact that there is no “basis for further discussion of the topic.”

Charlie Ruland ☘
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
* Asmus Freytag [2013/4/22]:
> On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:
>
> > [...] Please submit a formal proposal that can serve as a basis for further discussion of the topic. [...]
>
> Mr. Overington is quite aware of what would be the inevitable outcome of submitting an actual proposal, that's why he keeps raising this issue with some regularity here on the open list.
>
> A./

Taken together the above sentences mean that he has to face the fact that there is no “basis for further discussion of the topic.”

Charlie Ruland ☘
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Monday 22 April 2013 I wrote:

> This will need first of all a new version of the font so as to have symbols for the localizable sentence markup bubble brackets and ten localizable digits for use solely within localizable sentence markup bubbles.

After sending that post I made the new version of the font. It is available from a post in the High-Logic forum.

http://forum.high-logic.com/viewtopic.php?p=18680#p18680

The ten localizable digits for use solely within localizable sentence markup bubbles are encoded from U+ED80 through to U+ED89, with Alt codes from Alt 60800 through to Alt 60809. The localizable sentence markup bubble brackets are encoded at U+ED90 and U+ED91, with Alt codes of Alt 60816 and Alt 60817.

I have deliberately made the designs for the two localizable sentence markup bubble brackets not horizontal mirror images of each other, in case that might cause problems when intermixing them within right-to-left scripts. I do not know enough about right-to-left scripts to know whether there would be a problem, so I sought to design the glyphs so as to avoid any problems that might arise.

William Overington

23 April 2013
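For readers wanting to see the mapping above in concrete terms, here is a minimal sketch of how a sentence number could be turned into a string of the Private Use Area characters described. The code points (U+ED80..U+ED89 for the digits, U+ED90 and U+ED91 for the brackets) are taken from the post; which bracket opens and which closes is an assumption, as the post does not say.

```python
# Sketch: wrap a sentence number in a "markup bubble" using the
# Private Use Area code points described above.
# Digits 0-9 map to U+ED80..U+ED89; the bubble brackets are at
# U+ED90 and U+ED91 (which is opening and which is closing is an
# assumption here).

OPEN_BRACKET = "\uED90"   # assumed opening bubble bracket
CLOSE_BRACKET = "\uED91"  # assumed closing bubble bracket
DIGIT_BASE = 0xED80       # U+ED80 encodes the localizable digit 0

def encode_bubble(sentence_number: int) -> str:
    """Encode a sentence number as a markup bubble string."""
    digits = "".join(chr(DIGIT_BASE + int(d)) for d in str(sentence_number))
    return OPEN_BRACKET + digits + CLOSE_BRACKET

def decode_bubble(bubble: str) -> int:
    """Recover the sentence number from a markup bubble string."""
    assert bubble[0] == OPEN_BRACKET and bubble[-1] == CLOSE_BRACKET
    return int("".join(str(ord(c) - DIGIT_BASE) for c in bubble[1:-1]))

bubble = encode_bubble(27)
# bubble is the four characters U+ED90, U+ED82, U+ED87, U+ED91
assert decode_bubble(bubble) == 27
```

Note that, being Private Use Area code points, these characters render as the intended glyphs only with a font (such as the one linked above) that assigns them.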
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 4/22/2013 12:35 PM, Stephan Stiller wrote:

> [Charlie Ruland:]
>
> > The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use.
>
> Are you sure there is a precedent for what is essentially markup for a system of (alpha)numerical IDs?

You don't even have to look that far. These inventions utterly fail the "actual use" test, in the sense that I explained in my other message.

I'm always suspicious if someone wants to discuss scope of the standard before demonstrating a compelling case on the merits of wide-spread actual use.

A./
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:

> * William_J_G Overington [2013/4/22]:
>
> > [...] If the scope of Unicode becomes widened in this way, this will provide a basis upon which those people who so choose may research and develop localizable sentence technology with the knowledge that such research and development could, if successful, lead to encoding in plane 13 of the Unicode system.
>
> I don’t think your problem is “the scope of Unicode” but the size of the community that uses “localizable sentences.” The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use. Please submit a formal proposal that can serve as a basis for further discussion of the topic.

I'm afraid that any proposal submitted this way would just become the basis for a rejection "with prejudice". Independent of the lack of technical merit of the proposal, the utter lack of support (or use) by any established community would make such a proposal a non-starter. In other words, "can be shown to be in actual use" is an important hurdle that this scheme, however dear to its inventor, cannot seem to pass.

The actual bar is a bit higher than you state it. The use has to be of a kind that benefits from standardization. Usually, that means that the use is wide-spread, or, failing that, that the character(s) in question are essential elements of a script or notation that, while themselves perhaps rare, complete a repertoire that has sufficient established use. Characters invented for "possible" use (as in "could become successful") simply don't pass that hurdle, even if, for example, the inventor were to publish documents using these characters. There are honest attempts, for example, to add new symbols to mathematical notation, which have to wait until there's evidence that they have become accepted by the community before they can be considered for encoding.

Mr. Overington is quite aware of what would be the inevitable outcome of submitting an actual proposal, that's why he keeps raising this issue with some regularity here on the open list.

A./
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
[Charlie Ruland:]

> The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use.

Are you sure there is a precedent for what is essentially markup for a system of (alpha)numerical IDs?

Stephan
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
* William_J_G Overington [2013/4/22]:

> [...] If the scope of Unicode becomes widened in this way, this will provide a basis upon which those people who so choose may research and develop localizable sentence technology with the knowledge that such research and development could, if successful, lead to encoding in plane 13 of the Unicode system.
>
> William Overington
>
> 22 April 2013

I don’t think your problem is “the scope of Unicode” but the size of the community that uses “localizable sentences.” The Unicode Consortium is prepared to encode all characters that can be shown to be in actual use. Please submit a formal proposal that can serve as a basis for further discussion of the topic.

Charlie Ruland ☘
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Saturday 20 April 2013, Erkki I Kolehmainen wrote:

> I'm sorry to have to admit that I cannot follow at all your train of thought on what would be the practical value of localizable sentences in any of the forms that you are contemplating. In my mind, they would not appear to broaden the understanding between different cultures (and languages), quite the contrary.

Well, most of the localizable sentences are not intended to broaden the understanding between different cultures (and languages). Broadening the understanding between different cultures (and languages) is a good thing, at an appropriate time. Localizable sentences are intended to assist communication through the language barrier in particular circumstances, which is a different situation. For example, seeking information about relatives and friends after a disaster in a country whose language one does not know.

I have produced some simulations. Please consider the simulations in the locse027_four_simulations.pdf document that is available from the following forum post.

http://forum.high-logic.com/viewtopic.php?p=16264#p16264

Consider please a derivative work of simulation 2. Simulation 2 is in pages 8 through to 17 of the pdf document. Let us suppose that, in this derivative version of simulation 2, the Information Management Centre is located in Finland and that the native language of Sonja is Finnish.

enter simulation

Sonja has, at various times, three different messages displayed upon the screen of the computer that she is using. There is the message from Albert Johnson. There is Sonja's first reply to Albert Johnson. There is Sonja's second reply to Albert Johnson. The messages are displayed in Finnish on the screen of the computer that Sonja is using.

leave simulation

Now, if the three messages that are written in English in the text of the simulations as I wrote them were each translated into Finnish, then the text of the derivative simulation could include those three messages in Finnish as well as in English. That would provide a good simulation of how the messages would be displayed on the computer screen that Sonja is using and on the computer screen that Albert Johnson is using.

I am hoping to prepare Simulation 6, to show a simulation where the localizable sentences could be encoded within a plain text message using localizable sentence markup bubbles, and Simulation 7, where there is a mixture of the two encoding methods. This will need first of all a new version of the font so as to have symbols for the localizable sentence markup bubble brackets and ten localizable digits for use solely within localizable sentence markup bubbles.

I am then hoping to prepare a document to send to the Unicode Technical Committee making reference to the simulations. The purpose of that document is to ask for consideration of whether the scope of Unicode should be widened so as to allow for localizable items to become encoded in plane 13 at some future time. Those localizable items, at present, would be two localizable sentence markup bubble brackets, ten localizable digits for use solely within localizable sentence markup bubbles, a number of localizable sentences and a number of localizable stand-alone phrases. Each localizable item encoded within plane 13 would have an associated symbol for display in situations where automated localization were either not available or not switched on.

If the scope of Unicode becomes widened in this way, this will provide a basis upon which those people who so choose may research and develop localizable sentence technology with the knowledge that such research and development could, if successful, lead to encoding in plane 13 of the Unicode system.

William Overington

22 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Apr 21, 2013, at 11:01 AM, Christopher Fynn wrote:

> In India you could have telegrams containing such sentences delivered in any of the major Indian regional languages.

There is apparently a version of this still in use, seen in the List of Standard Phrases for Greeting Telegrams at the bottom of this page:

http://www.pondyonline.com/User/static/TelegramService.aspx

But it's not clear whether language translation is provided.
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
> In India you could have telegrams containing such sentences delivered in any of the major Indian regional languages. This was a good idea in the days of the low-bandwidth telegraph.

And it was a domain-restricted application.

Stephan
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
William

Your "localizable sentences" idea reminds me of telegraph companies that used to have a number of common sentences that could be transmitted in Morse code by number. In India you could have telegrams containing such sentences delivered in any of the major Indian regional languages.

This was a good idea in the days of the low-bandwidth telegraph - but, as Ken suggested, with modern technology there are now far more sophisticated ways of accomplishing the same sort of thing.

regards

- Chris
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
Some better approaches have been used with practical applications, on TRUE languages supported by ACTIVE communities: sign writing, which represents sign languages, which are FAR richer than what is proposed. They have a true grammar, a true syntax, they are versatile, with good links to other oral languages. And they solve practical problems.

Other approaches include the proliferation of *conventional* pictograms to represent only basic meanings. But what is important is that they are used under a convention that is widely recognized, and supported by active standards. This includes traffic signs on roads, rivers and railways, and pictograms frequently seen on maps or on directing banners in enclosed spaces (e.g. toilets, phone, stairs...). It also includes conventional pictograms for representing a set of dangers, or health and safety or environmental issues (recycling...). Or those used in meteorology. Or the set of logos (logograms) used by organizations as trademarks. But these do not encode sentences; they encode essential items in their own specific domain of application. They are essentially static in nature, not dynamic like actual human languages, and cannot be used to define other concepts than what they represent in isolation. You can't really "speak" with pictograms and logograms.

But to develop such a system to represent true languages, you'll need centuries if not millennia to represent concepts and articulate them, and to include also some phonograms. This results in ideograms, and notably the very rich (and still uncounted) set of sinograms used to write Chinese and partly Japanese and Korean. But in fact this system becomes so complex that it naturally evolved to keep only the phonograms, and you get the various alphabets of the world. The development of orthography comes later, when this written form of the language wants to "normalize" exchanges in a population using various spoken dialects, and when phonograms alone become ambiguous.

For Chinese, the system has evolved by combining ideograms and phonograms to solve the ambiguities that phonograms alone can't solve without an orthography, and that ideograms alone can't solve, even with a rich enough set of ideograms.

Sign writing belongs to the category of alphabets. Its "phonograms" represent gestures, and they are combined to create semantics according to the orthography and syntax of the sign languages they are used for. Even if some gestures used in sign languages may be perceived as ideograms, their use is in fact not significant alone outside of the grammatical context where these signs are used.
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
> I am wondering whether it would be a good idea for there to be a list of numbered preset sentences that are an international standard and then if Google chose to front end Google Translate with precise translations of that list of sentences made by professional linguists who are native speakers, then there could be a system that can produce a translation that is precise for the sentences that are on the list and machine translated for everything else.

Phrase-based machine translation goes much further: it already lets you pair up far more sentences than would fit into a standard with a limited code inventory such as Unicode, and it lets you pair up phrases. The fact that translations are not precise is a problem that has to do with context and with natural language per se.

> Maybe there could then just be two special Unicode characters, one to indicate that the number of a preset sentence is to follow and one to indicate that the number has finished.

That would belong in a higher-level protocol, not Unicode.

> If that were the case then there might well not be symbols for the sentences, yet the precise conveying of messages as envisaged in the simulations would still be achievable.

The sentences will be as precise as the scope of the sentence inventory allows. Enumerating sentences or phrasal fragments (I'm hesitant to talk of "phrases", which for me have constituent nature, but maybe that's just me) is unrealistic unless you are trying to cover only a /very/ limited domain. If all you encode is (say) requests for meals with the 100 most frequently wanted combinations of nutritional restrictions, your sentence inventory will encode those requests precisely, but as soon as you're trying to make adjustments to your formulaic requests (you're willing to eat /any/ vegetarian, gluten-free meal at any time of the day and day of the year? of /any/ size?), the sentences won't be of use anymore. This is really why an approach that enumerates large text chunks is unworkable. (I won't say "useless", but of limited use; "point-at-me" picture books and imprecise translations are likely to do a tolerable job already.) The number of sentences you'll need will be exponential in the number of ingredient options you are intending to vary over.

In any case, we are all left guessing about the intended coverage of any set of sentences you have in mind. From your previous writings I'm guessing (as implied earlier) that you mean something like "travel and emergency communication", but that is already a large domain. If you try to delimit the coverage and come up with a finite list of sentences, you will see that you'll end up with far too many. You'd also need to think about how to make these sentences accessible (via number/ID? that would be difficult or require training for the user if the number of sentences isn't very small). What if you only want the inventory of a travel phrasebook? For that, you have the travel phrasebook (hierarchically organized, not by number), and I have heard of limited-domain computers/apps for crisis situations (the details elude me at the moment).

> Perhaps that is the way forward for some aspects of communication through the language barrier.

You would need to specify which problems precisely you are attempting to solve, what is wrong with the approaches presently available, and why/how your approach does a better job.

Stephan
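The exponential-growth point in the message above can be illustrated with a small sketch. The dietary option names below are invented for illustration; the point is only that each independent yes/no option doubles the number of distinct preset sentences an inventory would need.

```python
from itertools import product

# Hypothetical binary meal options: each independent yes/no option
# doubles the number of distinct preset request sentences needed.
options = ["vegetarian", "vegan", "gluten-free", "nut-free",
           "dairy-free", "halal", "kosher", "low-salt"]

# Every combination of options corresponds to one preset sentence.
combinations = list(product([False, True], repeat=len(options)))

print(len(combinations))  # 2 ** 8 = 256 distinct preset sentences
```

With just eight binary options the inventory already needs 256 sentences, and that is before varying over meal size, time of day, or any non-binary attribute.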
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On 2013-04-20 2:38 AM, William_J_G Overington wrote:

> I am thinking that the fact that I am not a linguist and that I am implicitly seeking the precision of mathematics and seeking provenance of a translation is perhaps the explanation of why I am thinking that localizable sentences is the way forward. There seems to be a fundamental mismatch deep in human culture between the way that mathematics works precisely and the way that translation often conveys an impression of meaning that is not congruently exact. Perhaps that is a factor in all of this.

Natural language lacks the logic and precision of mathematics, and is only unpredictably unambiguous. That's why lojban was invented.

https://en.wikipedia.org/wiki/Lojban

--
Curtis Clark                  http://www.csupomona.edu/~jcclark
Biological Sciences           +1 909 869 4140
Cal Poly Pomona, Pomona CA 91768
RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
LOL... {phone} On Apr 20, 2013 8:44 PM, "Erkki I Kolehmainen" wrote:

> Mr. Overington,
>
> I'm sorry to have to admit that I cannot follow at all your train of thought on what would be the practical value of localizable sentences in any of the forms that you are contemplating. In my mind, they would not appear to broaden the understanding between different cultures (and languages), quite the contrary. I appreciate the fact that there are several respectable members of this community who are far too polite to state bluntly what they think of the technical merits of your proposal.
>
> Sincerely, Erkki I. Kolehmainen
>
> -Original Message-
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On behalf of William_J_G Overington
> Sent: 20 April 2013 12:39
> To: KenWhistler
> Cc: unicode@unicode.org; KenWhistler; wjgo_10...@btinternet.com
> Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
>
> On Friday 19 April 2013, Whistler, Ken wrote:
>
> > You are aware of Google Translate, for example, right?
>
> Yes. I use it from time to time, mostly to translate into English: it is very helpful.
>
> > If you input sentences such as those in your scenarios or the other examples, such as:
>
> > Where can I buy a vegetarian meal with no gluten-containing ingredients in it please?
>
> > You can get immediately serviceable and understandable translations in dozens of languages. For example:
>
> > Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile davon, bitte?
>
> > Not perfect, perhaps, but perfectly comprehensible. And the application will even do a very decent job of text to speech for you.
>
> I am not a linguist and I know literally almost no German, so I am not able to assess the translation quality of sentences. Perhaps someone on this list who is a native speaker of German might comment please.
> I am thinking that the fact that I am not a linguist and that I am implicitly seeking the precision of mathematics and seeking provenance of a translation is perhaps the explanation of why I am thinking that localizable sentences is the way forward. There seems to be a fundamental mismatch deep in human culture between the way that mathematics works precisely and the way that translation often conveys an impression of meaning that is not congruently exact. Perhaps that is a factor in all of this.
>
> Thank you for your reply and for taking the time to look through the simulations and for commenting.
>
> Having read what you have written and having thought about it for a while I am wondering whether it would be a good idea for there to be a list of numbered preset sentences that are an international standard and then if Google chose to front end Google Translate with precise translations of that list of sentences made by professional linguists who are native speakers, then there could be a system that can produce a translation that is precise for the sentences that are on the list and machine translated for everything else.
>
> Maybe there could then just be two special Unicode characters, one to indicate that the number of a preset sentence is to follow and one to indicate that the number has finished.
>
> In that way, text and localizable sentences could still be intermixed in a plain text message. For me, the concept of being able to mix text and localizable sentences in a plain text message is important. Having two special characters of international standard provenance for denoting a localizable sentence markup bubble unambiguously in a plain text document could provide an exact platform.
> If a software package that can handle automated localization were active then it could replace the sequence with the text of the sentence localized into the local language: otherwise the open localizable sentence bubble symbol, some digits and the close localizable sentence bubble symbol would be displayed.
>
> If that were the case then there might well not be symbols for the sentences, yet the precise conveying of messages as envisaged in the simulations would still be achievable.
>
> Perhaps that is the way forward for some aspects of communication through the language barrier.
>
> Another possibility would be to have just a few localizable sentences with symbols as individual characters and to have quite a lot of numbered sentences using a localizable sentence markup bubble and then everything else by machine translation.
>
> I shall try to think some more about this.
>
> > At any rate, if Margaret Gat
RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
Mr. Overington,

I'm sorry to have to admit that I cannot follow at all your train of thought on what would be the practical value of localizable sentences in any of the forms that you are contemplating. In my mind, they would not appear to broaden the understanding between different cultures (and languages), quite the contrary. I appreciate the fact that there are several respectable members of this community who are far too polite to state bluntly what they think of the technical merits of your proposal.

Sincerely, Erkki I. Kolehmainen

-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On behalf of William_J_G Overington
Sent: 20 April 2013 12:39
To: KenWhistler
Cc: unicode@unicode.org; KenWhistler; wjgo_10...@btinternet.com
Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

On Friday 19 April 2013, Whistler, Ken wrote:

> You are aware of Google Translate, for example, right?

Yes. I use it from time to time, mostly to translate into English: it is very helpful.

> If you input sentences such as those in your scenarios or the other examples, such as:

> Where can I buy a vegetarian meal with no gluten-containing ingredients in it please?

> You can get immediately serviceable and understandable translations in dozens of languages. For example:

> Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile davon, bitte?

> Not perfect, perhaps, but perfectly comprehensible. And the application will even do a very decent job of text to speech for you.

I am not a linguist and I know literally almost no German, so I am not able to assess the translation quality of sentences. Perhaps someone on this list who is a native speaker of German might comment please.
I am thinking that the fact that I am not a linguist and that I am implicitly seeking the precision of mathematics and seeking provenance of a translation is perhaps the explanation of why I am thinking that localizable sentences is the way forward. There seems to be a fundamental mismatch deep in human culture between the way that mathematics works precisely and the way that translation often conveys an impression of meaning that is not congruently exact. Perhaps that is a factor in all of this.

Thank you for your reply and for taking the time to look through the simulations and for commenting.

Having read what you have written and having thought about it for a while I am wondering whether it would be a good idea for there to be a list of numbered preset sentences that are an international standard and then if Google chose to front end Google Translate with precise translations of that list of sentences made by professional linguists who are native speakers, then there could be a system that can produce a translation that is precise for the sentences that are on the list and machine translated for everything else.

Maybe there could then just be two special Unicode characters, one to indicate that the number of a preset sentence is to follow and one to indicate that the number has finished.

In that way, text and localizable sentences could still be intermixed in a plain text message. For me, the concept of being able to mix text and localizable sentences in a plain text message is important. Having two special characters of international standard provenance for denoting a localizable sentence markup bubble unambiguously in a plain text document could provide an exact platform. If a software package that can handle automated localization were active then it could replace the sequence with the text of the sentence localized into the local language: otherwise the open localizable sentence bubble symbol, some digits and the close localizable sentence bubble symbol would be displayed.
If that were the case then there might well not be symbols for the sentences, yet the precise conveying of messages as envisaged in the simulations would still be achievable.

Perhaps that is the way forward for some aspects of communication through the language barrier.

Another possibility would be to have just a few localizable sentences with symbols as individual characters and to have quite a lot of numbered sentences using a localizable sentence markup bubble and then everything else by machine translation.

I shall try to think some more about this.

> At any rate, if Margaret Gattenford and her niece are still stuck at their hotel and the snow is blocking the railway line, my suggestion would be that Margaret whip out her mobile phone. And if she doesn't have one, perhaps her niece will lend hers to Margaret.

Well, they were still staying at the hotel some time ago. They feature in locse027_simulation_five.pdf available from the following post.

http://forum.high-logic.com/viewtopic.php?p=16378#p16378
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On Friday 19 April 2013, Whistler, Ken wrote: > You are aware of Google Translate, for example, right? Yes. I use it from time to time, mostly to translate into English: it is very helpful. > If you input sentences such as those in your scenarios or the other examples, > such as: > Where can I buy a vegetarian meal with no gluten-containing ingredients in it > please? > You can get immediately serviceable and understandable translations in dozens > of languages. For example: > Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile davon, > bitte? > Not perfect, perhaps, but perfectly comprehensible. And the application will > even do a very decent job of text to speech for you. I am not a linguist and I know literally almost no German, so I am not able to assess the translation quality of sentences. Perhaps someone on this list who is a native speaker of German might comment please. I am thinking that the fact that I am not a linguist and that I am implicitly seeking the precision of mathematics and seeking provenance of a translation is perhaps the explanation of why I am thinking that localizable sentences is the way forward. There seems to be a fundamental mismatch deep in human culture between the way that mathematics works precisely and the way that translation often conveys an impression of meaning that is not congruently exact. Perhaps that is a factor in all of this. Thank you for your reply and for taking the time to look through the simulations and for commenting.
Having read what you have written and having thought about it for a while I am wondering whether it would be a good idea for there to be a list of numbered preset sentences that are an international standard and then if Google chose to front end Google Translate with precise translations of that list of sentences made by professional linguists who are native speakers, then there could be a system that can produce a translation that is precise for the sentences that are on the list and machine translated for everything else.

Maybe there could then just be two special Unicode characters, one to indicate that the number of a preset sentence is to follow and one to indicate that the number has finished.

In that way, text and localizable sentences could still be intermixed in a plain text message. For me, the concept of being able to mix text and localizable sentences in a plain text message is important. Having two special characters of international standard provenance for denoting a localizable sentence markup bubble unambiguously in a plain text document could provide an exact platform. If a software package that can handle automated localization were active then it could replace the sequence with the text of the sentence localized into the local language: otherwise the open localizable sentence bubble symbol, some digits and the close localizable sentence bubble symbol would be displayed.

If that were the case then there might well not be symbols for the sentences, yet the precise conveying of messages as envisaged in the simulations would still be achievable.

Perhaps that is the way forward for some aspects of communication through the language barrier.

Another possibility would be to have just a few localizable sentences with symbols as individual characters and to have quite a lot of numbered sentences using a localizable sentence markup bubble and then everything else by machine translation.

I shall try to think some more about this.
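[Editor's note: the "two special characters delimiting a preset-sentence number" mechanism described above can be sketched in a few lines of code. This is purely an illustration of the proposal, not anything that was ever encoded: no such "bubble" characters exist in Unicode, so the sketch borrows two Private Use Area code points as stand-ins, and the sentence number 17 and its German wording are invented for the example.]

```python
import re

# Hypothetical stand-ins for the proposed open and close "localizable
# sentence bubble" delimiters. These characters were never encoded;
# Private Use Area code points are used here purely for illustration.
OPEN_BUBBLE = "\uE000"
CLOSE_BUBBLE = "\uE001"

# A toy registry mapping preset-sentence numbers to one locale's wording
# (both the number and the German text are invented for this sketch).
SENTENCES_DE = {
    17: "Wo kann ich bitte ein vegetarisches, glutenfreies Essen kaufen?",
}

def localize(text: str, registry: dict) -> str:
    """Replace each bubble-delimited sentence number with localized text.

    Unknown numbers are left untouched, so software without localization
    support would simply display the open bubble, the digits, and the
    close bubble, as the proposal describes.
    """
    pattern = re.compile(
        re.escape(OPEN_BUBBLE) + r"(\d+)" + re.escape(CLOSE_BUBBLE)
    )
    return pattern.sub(
        lambda m: registry.get(int(m.group(1)), m.group(0)), text
    )
```

Because the delimiters are ordinary characters, localizable sentences and free text intermix in one plain-text string, which is the property the proposal emphasizes; the open question the thread raises is who would maintain the registry itself.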
> At any rate, if Margaret Gattenford and her niece are still stuck at their hotel and the snow is blocking the railway line, my suggestion would be that Margaret whip out her mobile phone. And if she doesn't have one, perhaps her niece will lend hers to Margaret.

Well, they were still staying at the hotel some time ago. They feature in locse027_simulation_five.pdf available from the following post.

http://forum.high-logic.com/viewtopic.php?p=16378#p16378

They also feature in the following document available from the forum post listed below it.

a_simulation_about_an_idea_that_would_use_qr_codes.pdf

http://forum.high-logic.com/viewtopic.php?p=16692#p16692

That idea is not about localizable sentences, yet I found that being able to use the continuing characters and the scenario from the previous simulations was helpful in the creative writing of that simulation.

William Overington

20 April 2013
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
On April 19, 2013, at 1:52 PM, Stephan Stiller wrote: > But I'd argue that the distance of the information content of such > low-quality translations to the information content conveyed by correct and > polished language is often tolerable. Grammar isn't that important for > getting one's point across. As my daughter says, "Talking is for to be understood, so if the meaning conveyed, the point happened."
Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)
> Not perfect, perhaps, but perfectly comprehensible. And the application will even do a very decent job of text to speech for you.

and

> The quality of the translation for these kinds of applications has rapidly improved in recent years

Not that the ability of MT to deal with long/discontinuous dependencies or morphology impresses me. And not that this is gonna significantly change without actual natural language understanding (read: major advances in AI) – this is not only my opinion. /But/ I'd argue that the distance of the information content of such low-quality translations to the information content conveyed by correct and polished language is often tolerable. Grammar isn't that important for getting one's point across.

Images such as those shown on the linked webpage don't convey any subtlety. This is a different problem from the morphology or syntax being broken in a present-day, "state of the art" :-) MT rendition. But I don't see how such images would constitute an improvement, as far as information transmission is concerned.

Stephan