Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Asmus Freytag

On 4/23/2013 3:00 AM, Philippe Verdy wrote:
Do you realize the operating cost of any international standards 
committee, or of the maintenance and securing of an international 
registry? Who will pay?


Currently we all are paying by having interminable discussions of 
half-baked ideas foisted onto us. There's a word for this.


Time for this discussion to be dropped.

A./





Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Asmus Freytag

On 4/23/2013 2:01 AM, William_J_G Overington wrote:

On Monday 22 April 2013, Asmus Freytag  wrote:
  

I'm always suspicious if someone wants to discuss the scope of the standard before 
demonstrating a compelling case on the merits of widespread actual use.
  
The reason that I want to discuss the scope is that there is uncertainty.


I'm not going to engage in a scope discussion with you, even on this 
lovely list, without some shred of evidence that there is "compelling need".


Cheers,

A./




Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Philippe Verdy
Do you realize the operating cost of any international standards committee, or
of the maintenance and securing of an international registry? Who will pay?
You? Unless there is a very productive and demonstrated need for such a
registry, using the existing domain name or URI scheme mechanisms will be
enough.


2013/4/23 William_J_G Overington 

> On Tuesday 23 April 2013, Philippe Verdy  wrote:
>
> > There's also another issue: your proposal now uses identifiers that will
> > be resolved in a registry database you are the only one to control.
>
> Not at all. The registry would be controlled by an International Standards
> Organization committee.
>
> As you have raised the matter, here is a quote from a document that I
> submitted to the ISO/IEC 10646 committee in January 2012.
>
> quote
>
> My current thinking is that an ISO committee entity would choose sentences
> and symbols and then approach the ISO/IEC 10646 committee on an
> inter-committee liaison basis to ask for character code points to be
> assigned to the symbol and sentence pairs. For the avoidance of doubt I
> have, as at the time of preparing this document, made no application to ISO
> about such a committee entity carrying out such activities.
>
> My thinking is that that ISO committee entity could potentially be one of
> the following.
>
> 1. A new ISO committee, generated for the purpose.
>
> 2. The ISO/IEC 10646 committee, or a subcommittee of the ISO/IEC 10646
> committee.
>
> 3. An existing ISO committee, other than the ISO 10646 committee, or a
> subcommittee of that committee.
>
> end quote
>
> William Overington
>
> 23 April 2013
>
>


Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread William_J_G Overington
On Tuesday 23 April 2013, Philippe Verdy  wrote:
 
> There's also another issue: your proposal now uses identifiers that will be 
> resolved in a registry database you are the only one to control.
 
Not at all. The registry would be controlled by an International Standards 
Organization committee. 
 
As you have raised the matter, here is a quote from a document that I submitted 
to the ISO/IEC 10646 committee in January 2012.
 
quote
 
My current thinking is that an ISO committee entity would choose sentences and 
symbols and then approach the ISO/IEC 10646 committee on an inter-committee 
liaison basis to ask for character code points to be assigned to the symbol and 
sentence pairs. For the avoidance of doubt I have, as at the time of preparing 
this document, made no application to ISO about such a committee entity 
carrying out such activities.
 
My thinking is that that ISO committee entity could potentially be one of the 
following.
 
1. A new ISO committee, generated for the purpose.
 
2. The ISO/IEC 10646 committee, or a subcommittee of the ISO/IEC 10646 
committee.
 
3. An existing ISO committee, other than the ISO 10646 committee, or a 
subcommittee of that committee.
 
end quote
  
William Overington
 
23 April 2013





Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Martin J. Dürst



On 2013/04/23 18:01, William_J_G Overington wrote:

On Monday 22 April 2013, Asmus Freytag  wrote:


I'm always suspicious if someone wants to discuss the scope of the standard before 
demonstrating a compelling case on the merits of widespread actual use.


The reason that I want to discuss the scope is that there is uncertainty. If 
people are going to spend a lot of time and effort on the research and 
development of a system, they will want to know whether the effort would all be 
wasted if the system, no matter how good and no matter how useful, were to come 
to nothing because it would be said that encoding such a system in Unicode 
would be out of scope.


[I'm just hoping this discussion will go away soon.]

You can develop such a system without using the private use area. Just 
make little pictures out of your "characters", and everybody can include 
them in a Web page or an office document, print them, and so on. The 
fact that computers now handle text doesn't mean that text is the only 
thing computers can handle.


Once you have shown that your little pictures are widely used as if they 
were characters, then you have a good case for encoding. This is how 
many symbols got encoded; you can check all the documentation that is 
now public.



A ruling that such a system, if developed and shown to be useful, would be 
within scope for encoding in Unicode would allow people to research and develop 
the system with the knowledge that there will be a clear pathway of opportunity 
ahead if the research and development leads to good results.


As far as I know, the Unicode Consortium doesn't rule on eventualities.


So, I feel that wanting to discuss the scope of Unicode so as to clear away 
uncertainty that may be blocking progress in research and development is a 
straightforward and reasonable thing to do.


The main blocking factor is the (limited) usefulness of your ideas. If 
that is ever solved, the rest will be comparatively easy.


Regards,   Martin.



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Philippe Verdy
There's also another issue: your proposal now uses identifiers that will be
resolved in a registry database you are the only one to control. There are
other competing registries for storing images, logos, and so on.
Finally, your registry does not exist for now, or nobody other than you uses
it. And why would Unicode delegate a part of the encoding process only to you,
and only for your specific registry? How many characters would Unicode
need to encode to use other registries?

There are already working standards for using registries in open
competition: domain names, URL fragments, or URI schemes for URNs. And
they don't require any addition of characters to Unicode for domain names
or URIs to be encoded in documents.
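
For example (the URN namespace and the sentence number here are invented purely
for illustration), an identifier such as urn:example:localizable-sentence:0123
could already be carried in plain text today; an application that recognizes
the scheme could look it up in whatever registry it trusts, and one that does
not would simply display the identifier unchanged, all without adding a single
character to Unicode.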



2013/4/23 William_J_G Overington 

> On Monday 22 April 2013, Asmus Freytag  wrote:
>
> > I'm always suspicious if someone wants to discuss the scope of the standard
> > before demonstrating a compelling case on the merits of widespread actual
> > use.
>
> The reason that I want to discuss the scope is that there is
> uncertainty. If people are going to spend a lot of time and effort on the
> research and development of a system, they will want to know whether the effort
> would all be wasted if the system, no matter how good and no matter how useful,
> were to come to nothing because it would be said that encoding such a system in
> Unicode would be out of scope.
>
> A ruling that such a system, if developed and shown to be useful, would be
> within scope for encoding in Unicode would allow people to research and
> develop the system with the knowledge that there will be a clear pathway of
> opportunity ahead if the research and development leads to good results.
>
> So, I feel that wanting to discuss the scope of Unicode so as to clear
> away uncertainty that may be blocking progress in research and development
> is a straightforward and reasonable thing to do.
>
> William Overington
>
> 23 April 2013
>


Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread William_J_G Overington
On Tuesday 23 April 2013, Charlie Ruland ☘  wrote:
 
> Taken together the above sentences mean that he has to face the fact that 
> there is no “basis for further discussion of the topic.”
 
Well I knew and had just put up with the old situation and was researching on 
other topics.
 
I had deposited the documents and fonts with the British Library so that they 
would be available for researchers in the future.
 
Then the Unicode Consortium made its announcement.
 
http://unicode-inc.blogspot.co.uk/2013/04/utc-document-register-now-public.html
 
quote
 
This change has been made to increase public involvement in the ongoing 
deliberations of the UTC in its work developing and maintaining the Unicode 
Standard and other related standards and reports.
 
end quote
 
William Overington
 
23 April 2013
 





Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread William_J_G Overington
On Monday 22 April 2013, Asmus Freytag  wrote:
 
> I'm always suspicious if someone wants to discuss the scope of the standard 
> before demonstrating a compelling case on the merits of widespread actual 
> use.
 
The reason that I want to discuss the scope is that there is uncertainty. If 
people are going to spend a lot of time and effort on the research and 
development of a system, they will want to know whether the effort would all be 
wasted if the system, no matter how good and no matter how useful, were to come 
to nothing because it would be said that encoding such a system in Unicode would 
be out of scope.
 
A ruling that such a system, if developed and shown to be useful, would be 
within scope for encoding in Unicode would allow people to research and develop 
the system with the knowledge that there will be a clear pathway of opportunity 
ahead if the research and development leads to good results.
 
So, I feel that wanting to discuss the scope of Unicode so as to clear away 
uncertainty that may be blocking progress in research and development is a 
straightforward and reasonable thing to do.
 
William Overington
 
23 April 2013

 




 







Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread William_J_G Overington
On Monday 22 April 2013, Asmus Freytag  wrote:
 
> I'm afraid that any proposal submitted this way would just become the basis 
> for a rejection "with prejudice".
 
Well, the rules could be changed. I feel that the existing position is not 
suitable for the advances in ideas that are taking place with purely electronic 
publications and communications. It is not the same situation as coining a new 
word, where the new word only becomes included in the Oxford English Dictionary 
once it has had an amount of use by people other than the person who coined it. 
It is not the same because with a new word there is no associated character code 
point. Achieving widespread use using a Private Use Area code point is not an 
easy matter for an individual.
 
> Independent of the lack of technical merit of the proposal, the utter lack of 
> support (or use) by any established community would make such a proposal a 
> non-starter.
 
Only because rules made long ago, before many recent advances in technology, have 
not been updated for modern times.
 
> Mr. Overington is quite aware of what would be the inevitable outcome of 
> submitting an actual proposal, that's why he keeps raising this issue with some 
> regularity here on the open list.
 
Well, I am aware of the present rules.
 
I started the thread from which this thread was derived solely because of an 
announcement by the Unicode Consortium.
 
http://unicode-inc.blogspot.co.uk/2013/04/utc-document-register-now-public.html
 
quote
 
This change has been made to increase public involvement in the ongoing 
deliberations of the UTC in its work developing and maintaining the Unicode 
Standard and other related standards and reports. 
 
end quote
 
Given this new openness by the Unicode Consortium I felt that it was worthwhile 
seeking to put forward my ideas for consideration by the committee.
 
The http://www.unicode.org/timesens/calendar.html web page at present shows 
that the next meeting of the Unicode Technical Committee is due to start on 6 
May 2013.
 
The http://www.unicode.org/pending/docsubmit.html web page includes the 
following.
 
quote
 
Once a document is received and accepted for posting to the registry, we will 
assign a document number to it and tell you the number for future reference. We 
usually update the document registry when several new documents have 
accumulated in our queue, so your document may not be posted immediately after 
acceptance.
 
end quote
 
I feel that it would be helpful if there were a change of policy so that each time 
a document is accepted for addition to the document registry it is added to the 
document registry immediately rather than waiting in a queue. I do not know 
whether there are any documents in a queue at present. The Unicode Consortium 
has declared that it wishes to increase public involvement, so why the queue 
system?
 
William Overington
 
23 April 2013









RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-23 Thread Erkki I Kolehmainen
Only a formal proposal can be properly discussed and subsequently rejected at 
both UTC and SC2/WG2. At this stage there is only a lot of hot air and waste of 
time and effort.

 

Sincerely, Erkki

 

From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
On behalf of Charlie Ruland ☘
Sent: 23 April 2013 9:24
To: unicode@unicode.org
Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register Now 
Public)

 

* Asmus Freytag [2013/4/22]:



On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:

[...]

Please submit a formal proposal that can serve as a basis for further 
discussion of the topic.

[...]

Mr. Overington is quite aware of what would be the inevitable outcome of 
submitting an actual proposal, that's why he keeps raising this issue with some 
regularity here on the open  list.

Taken together the above sentences mean that he has to face the fact that there 
is no “basis for further discussion of the topic.”

Charlie Ruland ☘




A./



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Charlie Ruland ☘

* Asmus Freytag [2013/4/22]:

On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:

[...]

Please submit a formal proposal that can serve as a basis for further 
discussion of the topic.

[...]

Mr. Overington is quite aware of what would be the inevitable outcome 
of submitting an actual proposal, that's why he keeps raising this 
issue with some regularity here on the open  list.
Taken together the above sentences mean that he has to face the fact 
that there is no “basis for further discussion of the topic.”


Charlie Ruland ☘


A./


Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread William_J_G Overington
On Monday 22 April 2013 I wrote:
 
> This will need first of all a new version of the font so as to have symbols 
> for the localizable sentence markup bubble brackets and ten localizable 
> digits for use solely within localizable sentence markup bubbles.
 
After sending that post I made the new version of the font.
 
It is available from a post in the High-Logic forum.
 
http://forum.high-logic.com/viewtopic.php?p=18680#p18680
 
The ten localizable digits for use solely within localizable sentence markup 
bubbles are encoded from U+ED80 through to U+ED89 with Alt codes from Alt 60800 
through to Alt 60809.
 

 
The localizable sentence markup bubble brackets are encoded at U+ED90 and 
U+ED91 with Alt codes of Alt 60816 and Alt 60817.
 

 
I have made the designs for the two localizable sentence markup bubble brackets 
deliberately not horizontal mirror images of each other in case that might 
cause problems when intermixing them with right-to-left scripts. I do not know 
enough about right-to-left scripts to know whether there would be a problem, so I 
thought that I would seek to design the glyphs so as to avoid any problems that 
might arise.
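
Purely as an illustrative sketch (not part of any proposal), the following
assumes that U+ED80 through U+ED89 map to the digits 0 to 9 in order and that
U+ED90 and U+ED91 are the opening and closing bubble brackets respectively:

    # Compose a localizable sentence markup bubble from the Private Use Area
    # assignments described above (an assumed mapping, for illustration only).
    OPEN_BUBBLE = "\uED90"   # assumed opening bubble bracket
    CLOSE_BUBBLE = "\uED91"  # assumed closing bubble bracket
    DIGIT_BASE = 0xED80      # assumed localizable digit zero

    def encode_bubble(sentence_number: int) -> str:
        """Return the markup bubble string for one numbered sentence."""
        digits = "".join(chr(DIGIT_BASE + int(d)) for d in str(sentence_number))
        return OPEN_BUBBLE + digits + CLOSE_BUBBLE

    # Example: embed sentence number 123 in an otherwise ordinary message.
    message = "Thank you for your reply. " + encode_bubble(123)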
 
William Overington
   
23 April 2013






Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Asmus Freytag

On 4/22/2013 12:35 PM, Stephan Stiller wrote:

[Charlie Ruland:]
The Unicode Consortium is prepared to encode all characters that can 
be shown to be in actual use.
Are you sure there is a precedent for what is essentially markup for a 
system of (alpha)numerical IDs?



You don't even have to look that far. These inventions utterly fail the 
"actual use" test, in the sense that I explained in my other message.


I'm always suspicious if someone wants to discuss the scope of the standard 
before demonstrating a compelling case on the merits of widespread 
actual use.


A./




Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Asmus Freytag

On 4/22/2013 4:27 AM, Charlie Ruland ☘ wrote:

* William_J_G Overington [2013/4/22]:

[...]

If the scope of Unicode becomes widened in this way, this will provide a basis 
upon which those people who so choose may research and develop localizable 
sentence technology with the knowledge that such research and development 
could, if successful, lead to encoding in plane 13 of the Unicode system.
I don’t think your problem is “the scope of Unicode” but the size of 
the community that uses “localizable sentences.” The Unicode 
Consortium is prepared to encode all characters that can be shown to 
be in actual use.


Please submit a formal proposal that can serve as a basis for further 
discussion of the topic.


I'm afraid that any proposal submitted this way would just become the 
basis for a rejection "with prejudice". Independent of the lack of 
technical merit of the proposal, the utter lack of support (or use) by 
any established community would make such a proposal a non-starter.


In other words "can be shown to be in actual use" is an important hurdle 
that this scheme, however dear to its inventor, cannot seem to pass.


The bar would actually be a bit higher than you state it. The use 
has to be of a kind that benefits from standardization. Usually, that 
means that the use is widespread, or, failing that, that the 
character(s) in question are essential elements of a script or notation 
that, while themselves perhaps rare, complete a repertoire that has 
sufficient established use.


Characters invented for "possible" use (as in "could become successful") 
simply don't pass that hurdle, even if, for example, the inventor were to 
publish documents using these characters. There are honest attempts, for 
example, to add new symbols to mathematical notation, which have to wait 
until there's evidence that they have become accepted by the community 
before they can be considered for encoding.


Mr. Overington is quite aware of what would be the inevitable outcome of 
submitting an actual proposal, that's why he keeps raising this issue 
with some regularity here on the open  list.


A./



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Stephan Stiller

[Charlie Ruland:]
The Unicode Consortium is prepared to encode all characters that can 
be shown to be in actual use.
Are you sure there is a precedent for what is essentially markup for a 
system of (alpha)numerical IDs?


Stephan




Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread Charlie Ruland ☘

* William_J_G Overington [2013/4/22]:

[...]

If the scope of Unicode becomes widened in this way, this will provide a basis 
upon which those people who so choose may research and develop localizable 
sentence technology with the knowledge that such research and development 
could, if successful, lead to encoding in plane 13 of the Unicode system.
I don’t think your problem is “the scope of Unicode” but the size of the 
community that uses “localizable sentences.” The Unicode Consortium is 
prepared to encode all characters that can be shown to be in actual use.


Please submit a formal proposal that can serve as a basis for further 
discussion of the topic.


Charlie Ruland ☘

William Overington
  
22 April 2013


Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-22 Thread William_J_G Overington
On Saturday 20 April 2013, Erkki I Kolehmainen  wrote:
 
> I'm sorry to have to admit that I cannot follow at all your train of thought 
> on what would be the practical value of localizable sentences in any of the 
> forms that you are contemplating. In my mind, they would not appear to 
> broaden the understanding between different cultures (and languages), quite 
> the contrary.
  
Well, most of the localizable sentences are not intended to broaden the 
understanding between different cultures (and languages). Broadening the 
understanding between different cultures (and languages) is a good thing, at an 
appropriate time. Localizable sentences are intended to assist communication 
through the language barrier for particular circumstances, which is a different 
situation.
 
For example, seeking information about relatives and friends after a disaster 
in a country whose language one does not know.
 
I have produced some simulations.
 
Please consider the simulations in the locse027_four_simulations.pdf document 
that is available from the following forum post.
 
http://forum.high-logic.com/viewtopic.php?p=16264#p16264
 
Please consider a derivative work of simulation 2. Simulation 2 is on pages 8 
through 17 of the PDF document.
 
Let us suppose that, in this derivative version of simulation 2, the 
Information Management Centre is located in Finland and that the native 
language of Sonja is Finnish.
 
 enter simulation
 
Sonja has, at various times, three different messages displayed upon the screen 
of the computer that she is using.
 
There is the message from Albert Johnson.
 
There is Sonja's first reply to Albert Johnson.
 
There is Sonja's second reply to Albert Johnson.
 
The messages are displayed in Finnish on the screen of the computer that Sonja 
is using.
 
 leave simulation
 
Now, if the three messages that are written in English in the text of the 
simulations as I wrote them were each translated into Finnish then the text of 
the derivative simulation could include those three messages in Finnish as well 
as in English. That would provide a good simulation of how the messages would 
be displayed on the computer screen that Sonja is using and on the computer 
screen that Albert Johnson is using.
 
I am hoping to prepare Simulation 6 to show a simulation where the localizable 
sentences could be encoded within a plain text message using localizable 
sentence markup bubbles and Simulation 7 where there is a mixture of the two 
encoding methods. This will need first of all a new version of the font so as 
to have symbols for the localizable sentence markup bubble brackets and ten 
localizable digits for use solely within localizable sentence markup bubbles.
 
I am then hoping to prepare a document to send to the Unicode Technical 
Committee making reference to the simulations.
 
The purpose of the document that I am hoping to prepare for the Unicode 
Technical Committee is to ask for consideration of whether the scope of Unicode 
should be widened so as to allow for localizable items to become encoded in 
plane 13 at some future time.
 
Those localizable items, at present, would be two localizable sentence markup 
bubble brackets, ten localizable digits for use solely within localizable 
sentence markup bubbles, a number of localizable sentences and a number of 
localizable stand-alone phrases.
 
Each localizable item encoded within plane 13 would have an associated symbol 
for display in situations where automated localization is either not 
available or not switched on.
 
If the scope of Unicode becomes widened in this way, this will provide a basis 
upon which those people who so choose may research and develop localizable 
sentence technology with the knowledge that such research and development 
could, if successful, lead to encoding in plane 13 of the Unicode system. 
  
William Overington
 
22 April 2013










Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Tom Gewecke

On Apr 21, 2013, at 11:01 AM, Christopher Fynn wrote:

>  In India you could have telegrams
> containing such sentences delivered in any of the major Indian
> regional languages.

There is apparently a version of this still in use, seen in the List of 
Standard Phrases for Greeting Telegrams at the bottom of this page:

http://www.pondyonline.com/User/static/TelegramService.aspx

But it's not clear whether language translation is provided.

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Stephan Stiller



In India you could have telegrams
containing such sentences delivered in any of the major Indian
regional languages.

This was a good idea in the days of the low-bandwidth telegraph

And it was a domain-restricted application.

Stephan




Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Christopher Fynn
William

Your "localizable sentences" idea reminds me of telegraph companies
that used to have a number of common sentences that could be
transmitted in Morse code by number. In India you could have telegrams
containing such sentences delivered in any of the major Indian
regional languages.

This was a good idea in the days of the low-bandwidth telegraph - but,
as Ken suggested, with modern technology there are now far more
sophisticated ways of accomplishing the same sort of thing.

regards

- Chris



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-21 Thread Philippe Verdy
Some better approaches have been used with practical applications, for TRUE
languages supported by ACTIVE communities: sign writing, which represents
sign languages, which are FAR richer than what is proposed. They have a true
grammar and a true syntax, they are versatile, with good links to other oral
languages. And they solve practical problems.

Other approaches include the proliferation of *conventional* pictograms to
represent only basic meanings. But what is important is that they are used
under a convention that is widely recognized, and supported by active
standards. This includes traffic signs on roads, rivers, and railways, or
pictograms frequently seen on maps or on directional signs in enclosed spaces
(e.g. toilets, phones, stairs...). This also includes conventional pictograms
for representing a set of dangers, health and safety issues, or environmental
issues (recycling...). Or those used in meteorology. Or the set of logos
(logograms) used by organizations as trademarks. But they do not encode
sentences, only essential items in their own specific domain of application;
they are essentially static in nature, not dynamic like actual human
languages, and cannot be used to define concepts other than what they
represent in isolation.

You can't really "speak" with pictograms and logograms. But to develop them
to represent true languages, you would need centuries if not millennia to
represent concepts and articulate them, and also to include some phonograms.
This results in ideograms, and notably the very rich (and still uncounted)
set of sinograms used to write Chinese and, in part, Japanese and Korean.

But in fact such a system becomes so complex that it naturally evolved to
keep only the phonograms, and you get the various alphabets of the world.
The development of orthography comes later, when this written form of the
language is meant to "normalize" exchanges in a population using various
spoken dialects, and when phonograms alone become ambiguous. For Chinese,
the system has evolved by combining ideograms and phonograms to solve the
ambiguities that phonograms alone can't solve without an orthography, and
that ideograms alone can't solve even with a rich enough set of ideograms.

Sign writing belongs to the category of alphabets. Its "phonograms"
represent gestures, and they are combined to create semantics according to
the orthography and syntax of the sign languages they are used for. Even if
some gestures used in sign languages may be perceived as ideograms, their
use is in fact not significant on its own outside of the grammatical context
where these signs are used.


Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Stephan Stiller



I am wondering whether it would be a good idea for there to be a list of 
numbered preset sentences that are an international standard and then if Google 
chose to front end Google Translate with precise translations of that list of 
sentences made by professional linguists who are native speakers, then there 
could be a system that can produce a translation that is precise for the 
sentences that are on the list and machine translated for everything else.
Phrase-based machine translation goes much further: it already lets you 
pair up far more sentences than would fit into a standard with a limited 
code inventory such as Unicode, and it lets you pair up phrases. The 
fact that translations are not precise is a problem that has to do with 
context and with natural language per se.



Maybe there could then just be two special Unicode characters, one to indicate 
that the number of a preset sentence is to follow and one to indicate that the 
number has finished.

That would belong in a higher-level protocol, not Unicode.
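
For instance, an existing higher-level format could carry the reference as
ordinary markup, something like <locse number="123">fallback text</locse>,
where the element name and attribute are invented here purely for
illustration; nothing new would need to be encoded at the character level.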


If that were the case then there might well not be symbols for the sentences, 
yet the precise conveying of messages as envisaged in the simulations would 
still be achievable.
The sentences will be as precise as the scope of the sentence inventory 
allows. Enumerating sentences or phrasal fragments (I'm hesitant to talk 
of "phrases", which for me have constituent nature, but maybe that's 
just me) is unrealistic unless you are trying to cover only a /very/ 
limited domain. If all you encode is (say) requests for meals with the 
100 most frequently wanted combinations of nutritional restrictions, 
your sentence inventory will encode those requests precisely, but as 
soon as you're trying to make adjustments to your formulaic requests 
(you're willing to eat /any/ vegetarian, gluten-free meal each time of 
the day and day of the year? of /any/ size?), the sentences won't be of 
use anymore. This is really why an approach that enumerates large text 
chunks is unworkable. (I won't say "useless", but of limited use; 
"point-at-me" picture books and imprecise translations are likely to do 
a tolerable job already.) The number of sentences you'll need will be 
exponential in the number of ingredient options you are intending to 
vary over. In any case, we are all left guessing about the intended 
coverage of any set of sentences you have in mind. From your previous 
writings I'm guessing (as implied earlier) that you mean something like 
"travel and emergency communication", but that is already a large 
domain. If you try to delimit the coverage and come up with a finite 
list of sentences, you will see that you'll end up with far too many. 
You'd also need to think about how to make these sentences accessible 
(via number/ID? that would be difficult or require training for the user 
if the number of sentences isn't very small). What if you only want the 
inventory of a travel phrasebook? For that, you have the travel 
phrasebook (hierarchically organized, not by number), and I have heard 
of limited-domain computers/apps for crisis situations (the details 
elude me at the moment).
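
A rough back-of-the-envelope illustration (the figures are invented): with k
independent yes/no options layered on a single base request, a fully
enumerated inventory needs 2^k sentences for that one request alone; ten such
options already give 2^10 = 1,024 variants, and twenty give 2^20 = 1,048,576.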



Perhaps that is the way forward for some aspects of communication through the 
language barrier.
You would need to specify which problems precisely you are attempting to 
solve, what is wrong with the approaches presently available, and 
why/how your approach does a better job.


Stephan



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Curtis Clark

On 2013-04-20 2:38 AM, William_J_G Overington wrote:

I am thinking that the fact that I am not a linguist and that I am implicitly 
seeking the precision of mathematics and seeking provenance of a translation is 
perhaps the explanation of why I am thinking that localizable sentences is the 
way forward. There seems to be a fundamental mismatch deep in human culture of the 
way that mathematics works precisely yet that translation often conveys an 
impression of meaning that is not congruently exact. Perhaps that is a factor 
in all of this.


Natural language lacks the logic and precision of mathematics, and is 
only unpredictably unambiguous. That's why lojban was invented.


https://en.wikipedia.org/wiki/Lojban

--
Curtis Clark                http://www.csupomona.edu/~jcclark
Biological Sciences   +1 909 869 4140
Cal Poly Pomona, Pomona CA 91768




RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Mark Davis ☕
LOL...

{phone}
On Apr 20, 2013 8:44 PM, "Erkki I Kolehmainen"  wrote:

> Mr. Overington,
>
> I'm sorry to have to admit that I cannot follow at all your train of
> thought on what would be the practical value of localizable sentences in
> any of the forms that you are contemplating. In my mind, they would not
> appear to broaden the understanding between different cultures (and
> languages), quite the contrary. I appreciate the fact that there are
> several respectable members of this community who are far too polite to
> state bluntly what they think of the technical merits of your proposal.
>
> Sincerely, Erkki I. Kolehmainen
>
> -Original Message-
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org]
> On behalf of William_J_G Overington
> Sent: 20 April 2013 12:39
> To: KenWhistler
> Cc: unicode@unicode.org; KenWhistler; wjgo_10...@btinternet.com
> Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register
> Now Public)
>
> On Friday 19 April 2013, Whistler, Ken  wrote:
>
> > You are aware of Google Translate, for example, right?
>
> Yes. I use it from time to time, mostly to translate into English: it is
> very helpful.
>
> > If you input sentences such as those in your scenarios or the other
> examples, such as:
>
> > Where can I buy a vegetarian meal with no gluten-containing ingredients
> in it please?
>
> > You can get immediately serviceable and understandable translations in
> dozens of languages. For example:
>
> > Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile
> davon, bitte?
>
> > Not perfect, perhaps, but perfectly comprehensible. And the application
> will even do a very decent job of text to speech for you.
>
> I am not a linguist and I know literally almost no German, so I am not
> able to assess the translation quality of sentences. Perhaps someone on
> this list who is a native speaker of German might comment please.
>
> I am thinking that the fact that I am not a linguist and that I am
> implicitly seeking the precision of mathematics and seeking provenance of a
> translation is perhaps the explanation of why I am thinking that
> localizable sentences is the way forward. There seems to be a fundamental
> mismatch deep in human culture of the way that mathematics works precisely
> yet that translation often conveys an impression of meaning that is not
> congruently exact. Perhaps that is a factor in all of this.
>
> Thank you for your reply and for taking the time to look through the
> simulations and for commenting.
>
> Having read what you have written and having thought about it for a while
> I am wondering whether it would be a good idea for there to be a list of
> numbered preset sentences that are an international standard and then if
> Google chose to front end Google Translate with precise translations of
> that list of sentences made by professional linguists who are native
> speakers, then there could be a system that can produce a translation that
> is precise for the sentences that are on the list and machine translated
> for everything else.
>
> Maybe there could then just be two special Unicode characters, one to
> indicate that the number of a preset sentence is to follow and one to
> indicate that the number has finished.
>
> In that way, text and localizable sentences could still be intermixed in a
> plain text message. For me, the concept of being able to mix text and
> localizable sentences in a plain text message is important. Having two
> special characters of international standard provenance for denoting a
> localizable sentence markup bubble unambiguously in a plain text document
> could provide an exact platform. If a software package that can handle
> automated localization were active then it could replace the sequence with
> the text of the sentence localized into the local language: otherwise the
> open localizable sentence bubble symbol, some digits and the close
> localizable sentence bubble symbol would be displayed.
>
> If that were the case then there might well not be symbols for the
> sentences, yet the precise conveying of messages as envisaged in the
> simulations would still be achievable.
>
> Perhaps that is the way forward for some aspects of communication through
> the language barrier.
>
> Another possibility would be to have just a few localizable sentences with
> symbols as individual characters and to have quite a lot of numbered
> sentences using a localizable sentence markup bubble and then everything
> else by machine translation.
>
> I shall try to think some more about this.
>
> > At any rate, if Margaret Gat

RE: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread Erkki I Kolehmainen
Mr. Overington,

I'm sorry to have to admit that I cannot follow at all your train of thought on 
what would be the practical value of localizable sentences in any of the forms 
that you are contemplating. In my mind, they would not appear to broaden the 
understanding between different cultures (and languages), quite the contrary. I 
appreciate the fact that there are several respectable members of this 
community who are far too polite to state bluntly what they think of the 
technical merits of your proposal.

Sincerely, Erkki I. Kolehmainen   

-Original Message-
From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] 
On behalf of William_J_G Overington
Sent: 20 April 2013 12:39
To: KenWhistler
Cc: unicode@unicode.org; KenWhistler; wjgo_10...@btinternet.com
Subject: Re: Encoding localizable sentences (was: RE: UTC Document Register Now 
Public)

On Friday 19 April 2013, Whistler, Ken  wrote:
 
> You are aware of Google Translate, for example, right?
 
Yes. I use it from time to time, mostly to translate into English: it is very 
helpful.
 
> If you input sentences such as those in your scenarios or the other examples, 
> such as:
 
> Where can I buy a vegetarian meal with no gluten-containing ingredients in it 
> please?
 
> You can get immediately serviceable and understandable translations in dozens 
> of languages. For example:
 
> Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile davon, 
> bitte?
 
> Not perfect, perhaps, but perfectly comprehensible. And the application will 
> even do a very decent job of text to speech for you.
 
I am not a linguist and I know literally almost no German, so I am not able to 
assess the translation quality of sentences. Perhaps someone on this list who 
is a native speaker of German might comment please.
 
I am thinking that the fact that I am not a linguist and that I am implicitly 
seeking the precision of mathematics and seeking provenance of a translation is 
perhaps the explanation of why I am thinking that localizable sentences is the 
way forward. There seems to be a fundamental mismatch deep in human culture of the 
way that mathematics works precisely yet that translation often conveys an 
impression of meaning that is not congruently exact. Perhaps that is a factor 
in all of this.
 
Thank you for your reply and for taking the time to look through the 
simulations and for commenting. 
 
Having read what you have written and having thought about it for a while I am 
wondering whether it would be a good idea for there to be a list of numbered 
preset sentences that are an international standard and then if Google chose to 
front end Google Translate with precise translations of that list of sentences 
made by professional linguists who are native speakers, then there could be a 
system that can produce a translation that is precise for the sentences that 
are on the list and machine translated for everything else.
 
Maybe there could then just be two special Unicode characters, one to indicate 
that the number of a preset sentence is to follow and one to indicate that the 
number has finished.
 
In that way, text and localizable sentences could still be intermixed in a 
plain text message. For me, the concept of being able to mix text and 
localizable sentences in a plain text message is important. Having two special 
characters of international standard provenance for denoting a localizable 
sentence markup bubble unambiguously in a plain text document could provide an 
exact platform. If a software package that can handle automated localization 
were active then it could replace the sequence with the text of the sentence 
localized into the local language: otherwise the open localizable sentence 
bubble symbol, some digits and the close localizable sentence bubble symbol 
would be displayed.
 
If that were the case then there might well not be symbols for the sentences, 
yet the precise conveying of messages as envisaged in the simulations would 
still be achievable.
 
Perhaps that is the way forward for some aspects of communication through the 
language barrier.
 
Another possibility would be to have just a few localizable sentences with 
symbols as individual characters and to have quite a lot of numbered sentences 
using a localizable sentence markup bubble and then everything else by machine 
translation.
 
I shall try to think some more about this.
 
> At any rate, if Margaret Gattenford and her niece are still stuck at their 
> hotel and the snow is blocking the railway line, my suggestion would be that 
> Margaret whip out her mobile phone. And if she doesn't have one, perhaps her 
> niece will lend hers to Margaret.
 
Well, they were still staying at the hotel as of some time ago.
 
They feature in locse027_simulation_five.pdf available from the following post.
 
http://forum.high-logic.com/viewtopic.php?p=16378#p16378
 
The

Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-20 Thread William_J_G Overington
On Friday 19 April 2013, Whistler, Ken  wrote:
 
> You are aware of Google Translate, for example, right?
 
Yes. I use it from time to time, mostly to translate into English: it is very 
helpful.
 
> If you input sentences such as those in your scenarios or the other examples, 
> such as:
 
> Where can I buy a vegetarian meal with no gluten-containing ingredients in it 
> please?
 
> You can get immediately serviceable and understandable translations in dozens 
> of languages. For example:
 
> Wo kann ich ein vegetarisches Essen ohne Gluten-haltigen Bestandteile davon, 
> bitte?
 
> Not perfect, perhaps, but perfectly comprehensible. And the application will 
> even do a very decent job of text to speech for you.
 
I am not a linguist and I know literally almost no German, so I am not able to 
assess the translation quality of sentences. Perhaps someone on this list who 
is a native speaker of German might comment please.
 
I am thinking that the fact that I am not a linguist and that I am implicitly 
seeking the precision of mathematics and seeking provenance of a translation is 
perhaps the explanation of why I am thinking that localizable sentences is the 
way forward. There seems to be a fundamental mismatch deep in human culture of the 
way that mathematics works precisely yet that translation often conveys an 
impression of meaning that is not congruently exact. Perhaps that is a factor 
in all of this.
 
Thank you for your reply and for taking the time to look through the 
simulations and for commenting. 
 
Having read what you have written and having thought about it for a while I am 
wondering whether it would be a good idea for there to be a list of numbered 
preset sentences that are an international standard and then if Google chose to 
front end Google Translate with precise translations of that list of sentences 
made by professional linguists who are native speakers, then there could be a 
system that can produce a translation that is precise for the sentences that 
are on the list and machine translated for everything else.
 
Maybe there could then just be two special Unicode characters, one to indicate 
that the number of a preset sentence is to follow and one to indicate that the 
number has finished.
 
In that way, text and localizable sentences could still be intermixed in a 
plain text message. For me, the concept of being able to mix text and 
localizable sentences in a plain text message is important. Having two special 
characters of international standard provenance for denoting a localizable 
sentence markup bubble unambiguously in a plain text document could provide an 
exact platform. If a software package that can handle automated localization 
were active then it could replace the sequence with the text of the sentence 
localized into the local language: otherwise the open localizable sentence 
bubble symbol, some digits and the close localizable sentence bubble symbol 
would be displayed.
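
As a rough sketch of that substitution step (illustration only, and not part of
any proposal: it borrows as placeholders the Private Use Area code points
mentioned elsewhere in this thread, U+ED90 and U+ED91 for the bubble brackets
and U+ED80 through U+ED89 for the digits, and the entry in the table is an
invented example):

    import re

    # Assumed placeholder code points for the bubble brackets and digits.
    BUBBLE = re.compile("\uED90([\uED80-\uED89]+)\uED91")

    # Hypothetical table of sentences already localized into the reader's
    # language, keyed by the standardized sentence number.
    LOCALIZED = {
        123: "Hyvää huomenta.",  # invented example entry
    }

    def localize(text: str) -> str:
        """Replace each markup bubble with its localized sentence when one is
        known; otherwise leave the bubble visible, as described above."""
        def repl(match):
            number = int("".join(str(ord(c) - 0xED80) for c in match.group(1)))
            return LOCALIZED.get(number, match.group(0))
        return BUBBLE.sub(repl, text)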
 
If that were the case then there might well not be symbols for the sentences, 
yet the precise conveying of messages as envisaged in the simulations would 
still be achievable.
 
Perhaps that is the way forward for some aspects of communication through the 
language barrier.
 
Another possibility would be to have just a few localizable sentences with 
symbols as individual characters and to have quite a lot of numbered sentences 
using a localizable sentence markup bubble and then everything else by machine 
translation.
 
I shall try to think some more about this.
 
> At any rate, if Margaret Gattenford and her niece are still stuck at their 
> hotel and the snow is blocking the railway line, my suggestion would be that 
> Margaret whip out her mobile phone. And if she doesn't have one, perhaps her 
> niece will lend hers to Margaret.
 
Well, they were still staying at the hotel as of some time ago.
 
They feature in locse027_simulation_five.pdf available from the following post.
 
http://forum.high-logic.com/viewtopic.php?p=16378#p16378
 
They also feature in the following document available from the forum post 
listed below it.
 
a_simulation_about_an_idea_that_would_use_qr_codes.pdf
 
http://forum.high-logic.com/viewtopic.php?p=16692#p16692
 
That idea is not about localizable sentences, yet I found that being able to 
use the continuing characters and the scenario from the previous simulations 
was helpful in the creative writing of that simulation.
 
William Overington
 
20 April 2013






Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-19 Thread John H. Jenkins

On 19 April 2013, at 1:52 PM, Stephan Stiller wrote:

> But I'd argue that the distance of the information content of such 
> low-quality translations to the information content conveyed by correct and 
> polished language is often tolerable. Grammar isn't that important for 
> getting one's point across.

As my daughter says, "Talking is for to be understood, so if the meaning 
conveyed, the point happened."



Re: Encoding localizable sentences (was: RE: UTC Document Register Now Public)

2013-04-19 Thread Stephan Stiller



Not perfect, perhaps, but perfectly comprehensible. And the application will 
even
do a very decent job of text to speech for you.

and

The quality of the
translation for these kinds of applications has rapidly improved in recent years


Not that the ability of MT to deal with long/discontinuous dependencies 
or morphology impresses me. And not that this is gonna significantly 
change without actual natural language understanding (read: major 
advances in AI) – this is not only my opinion.


/But/ I'd argue that the distance of the information content of such 
low-quality translations to the information content conveyed by correct 
and polished language is often tolerable. Grammar isn't that important 
for getting one's point across.


Images such as those shown on the linked webpage don't convey any 
subtlety. This is a different problem from the morphology or syntax 
being broken in a present-day, "state of the art" :-) MT rendition. But 
I don't see how such images would constitute an improvement, as far as 
information transmission is concerned.


Stephan