RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-12 Thread Sławomir Osipiuk via Unicode
On Wed, Feb 12, 2020 at 11:28 AM wjgo_10...@btinternet.com via Unicode wrote:
>
> I am reminded of the teletext system (with brand names such as Ceefax and 
> Oracle) in the United Kingdom, which was a broadcasting technology introduced 
> in the 1970s and which became very much a part of British culture during the 
> 1980s and 1990s. A digital signal of a special purpose 7-bit character set 
> was broadcast in the vertical blanking interval of a 625 line analogue 
> television signal.
[...]
> It seems to me that there could be, in the future, a type of thing that sends 
> out a continuous signal over a wire of, say, a temperature reading at its 
> location, all formatted in several languages. So, no passwords, no input from 
> an end user, just a continuous feeding into The Internet of Things its 
> output, with the numerical value in the messages changed as the temperature 
> changes. This would allow the digits to be expressed in the digits used in 
> the particular script of the particular language used in an individual 
> message.

Teletext had a data rate of 7 kilobits/s (less than 1 kilobyte/s), was cleverly 
grafted onto a system never designed for it, and the terminals to display it 
couldn't handle modern markup. Language tags, or something very like them, 
would make sense for very low-rate transmissions like Teletext (or the similar 
Line 21 closed captions in NTSC). It's too late for them, though.

The proposal is for the "Internet of Things". In 2020, 1 kbps transmissions are 
laughably slow, unless you're talking to the Voyager space probes. Receiving 
equipment, even at the lowest end, has more than enough processing power to 
interpret a proper markup language. If for some reason you really do want to 
minimize data rate, you're better off with data compression rather than saving 
bytes by using Unicode language tags instead of XML. The receiving equipment 
can handle a decompression step at basically no cost (that wasn't true in the 
1970s), and markup languages compress very well.
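
As a rough illustration of the compression point, the sketch below builds the 
same multilingual temperature report twice, once as plain text prefixed with 
U+E0001 tag sequences and once as XML with xml:lang attributes, and compresses 
both with zlib. The message wording, the element names and the helper 
functions are all assumptions made up for this example; no real device is 
being described. Note that each tag character lies outside the BMP and so 
costs four bytes in UTF-8.

```python
import zlib

# Hypothetical message templates; wording and languages are illustrative only.
TEXTS = {
    "en": "Temperature: {} degrees",
    "fr": "Température : {} degrés",
    "de": "Temperatur: {} Grad",
}


def tag_prefix(lang: str) -> str:
    """U+E0001 LANGUAGE TAG followed by tag characters (ASCII value + 0xE0000)."""
    return "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in lang)


def as_tagged_text(readings):
    """Plain text, each message prefixed by a language tag sequence."""
    return "".join(
        tag_prefix(lang) + tpl.format(t) + "\n"
        for t in readings
        for lang, tpl in TEXTS.items()
    )


def as_xml(readings):
    """The same messages wrapped in made-up XML elements."""
    items = "".join(
        f'<msg xml:lang="{lang}">{tpl.format(t)}</msg>'
        for t in readings
        for lang, tpl in TEXTS.items()
    )
    return f"<report>{items}</report>"


readings = [20 + i % 5 for i in range(50)]
for name, text in (("tagged", as_tagged_text(readings)), ("xml", as_xml(readings))):
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)
    print(f"{name}: {len(raw)} bytes raw, {len(packed)} bytes compressed")
```

Running it prints the raw and compressed sizes for both encodings, which is 
the comparison that matters when the data rate really is the constraint.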

The particular circumstances that would encourage Unicode tag characters don't 
exist today: a razor-thin data rate and minuscule receiver processing power. With 
the resources we have now, anything done by tag characters can be done BETTER 
with proper encapsulating protocols and markup.

With all that said, there is no Unicode Police that will come banging on your 
door if you make a system that uses the tag characters. If you, or anyone, 
thinks it's the best solution for a particular project, then do it. Deprecation 
just means, "There are better ways of doing this. Seriously, please look 
around." And I think that message is still valid.

(This reply may read overly critical, but I'm very much enjoying this 
discussion.)

Sławomir Osipiuk





RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-12 Thread wjgo_10...@btinternet.com via Unicode


Hi

At the time, I thought that my post yesterday concluded the thread. 
However, later something occurred to me as a result of something in the 
post by Sławomir Osipiuk.


The gentleman wrote as follows:

> Sending multiples of the same message in different languages is really 
> only applicable to broadcast/multicast scenarios, where you have a 
> transmission going out live to multiple recipients who have different 
> language demands. I can't immediately think of any examples where this 
> is done with plain-text only, though I'd be glad to learn about them, 
> if they exist.

Whilst I do not know of anywhere that this is presently done, I 
realized that this would be a practical proposition for some of the 
things in the Internet of Things.

I am reminded of the teletext system (with brand names such as Ceefax 
and Oracle) in the United Kingdom, which was a broadcasting technology 
introduced in the 1970s and which became very much a part of British 
culture during the 1980s and 1990s. A digital signal of a special 
purpose 7-bit character set was broadcast in the vertical blanking 
interval of a 625 line analogue television signal. Basically, most 
lines carried the colour picture, but some lines were not used for it: 
they fell in the time allowed for the scan to go back to the top of the 
picture once it had reached the lower edge. So this digital 
information service got a free ride in the picture signal going out to 
receivers all over the country. The information was organised into pages 
and an end user could go to "text" and then wait for a selected page to 
come round again in the continuous cyclic broadcasting of pages. Pages 
could be arranged by the broadcaster so that, say, the news headlines 
page came around maybe four times in each, say, 20 second cycle and some 
pages only once.

It was very effective, as the special purpose 7-bit character set, 
while being basically ASCII, had control characters that were stateful. 
Each was displayed as a space, yet some of them switched the colour of 
the following text until a new colour control character was received, 
if indeed one was received, or until the end of the 40 character line 
of the display. Each line started with white text, though if the first 
character of the line switched to a colour, the end user would not see 
any white text. The control code set also included switching to chunky 
graphics mode.

There was also a facility to use the system for subtitles to the 
television programme: optional subtitles, so that end users could have 
them on if desired yet other users were not thereby forced to have 
subtitles. It was good, as various participants in a discussion - 
whether news or drama - could each have a colour for their speaking, 
such as green, yellow, cyan, white. No return link was needed to send 
information from the end user to the central broadcasting computer.
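
The stateful colour handling described above can be sketched in a few lines. 
This is only an illustration under stated assumptions: the numeric values of 
the colour codes are taken from the usual teletext Level 1 code table as 
remembered, graphics mode and all other attributes are ignored, and 
render_line is a made-up name rather than part of any real decoder.

```python
# Assumed alphanumeric colour control codes (teletext Level 1 code table).
COLOURS = {
    0x01: "red", 0x02: "green", 0x03: "yellow", 0x04: "blue",
    0x05: "magenta", 0x06: "cyan", 0x07: "white",
}
LINE_WIDTH = 40


def render_line(codes: bytes):
    """Return LINE_WIDTH (character, colour) cells for one display line."""
    colour = "white"  # every line starts with white text
    cells = []
    for code in codes[:LINE_WIDTH]:
        if code in COLOURS:
            colour = COLOURS[code]       # stays in force until the next colour code
            cells.append((" ", colour))  # the control code itself shows as a space
        elif code < 0x20:
            cells.append((" ", colour))  # other control codes: shown as a space here
        else:
            cells.append((chr(code), colour))
    # Pad short lines; the colour state is not carried over to the next line.
    cells.extend((" ", colour) for _ in range(LINE_WIDTH - len(cells)))
    return cells


for ch, colour in render_line(b"\x02Hello \x03world")[:13]:
    print(repr(ch), colour)
```

Because the state resets at the start of each line, a decoder that loses one 
line of the broadcast can still render the next one correctly.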

A system with the same format of display was a viewdata system (brand 
name Prestel) but that was very different from teletext and used a 
two-way telephone line connection. In a viewdata system, the end user 
selected a page from a menu then a message requesting that page was sent 
to the central computer and just that page was sent to the end user. A 
fee for a page was often charged and the system never really took off. 
Teletext thrived because economy of scale brought the cost of 
teletext-capable electronics down and it was installed using a set of 
for-the-purpose integrated circuits during manufacture of most colour 
television sets in that era, and once installed then it was a free 
add-on with no ongoing cost apart from the ordinary television licence.

It seems to me that there could be, in the future, a type of thing that 
sends out a continuous signal over a wire of, say, a temperature reading 
at its location, all formatted in several languages. So, no passwords, 
no input from an end user, just a continuous feeding into The Internet 
of Things its output, with the numerical value in the messages changed 
as the temperature changes. This would allow the digits to be expressed 
in the digits used in the particular script of the particular language 
used in an individual message.
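
Expressing the digits in the digits of a particular script is a simple 
character substitution, since each script's decimal digits occupy ten 
consecutive code points. The following is a minimal sketch; the pairing of 
language codes with digit sets is an assumption made for the example (real 
locales often prefer Western digits), and localise_digits is a made-up name.

```python
# Code point of the digit zero in a few scripts. Which language uses which
# digit set is an assumption for illustration, not locale data.
ZERO = {
    "en": "0",       # DIGIT ZERO
    "ar": "\u0660",  # ARABIC-INDIC DIGIT ZERO
    "hi": "\u0966",  # DEVANAGARI DIGIT ZERO
    "bn": "\u09E6",  # BENGALI DIGIT ZERO
}


def localise_digits(number: str, lang: str) -> str:
    """Replace ASCII digits with the digits assumed for the given language."""
    zero = ord(ZERO[lang])
    table = {ord("0") + d: chr(zero + d) for d in range(10)}
    return number.translate(table)  # signs and decimal points pass through


for lang in ZERO:
    print(lang, localise_digits("23", lang))
```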

William Overington
Wednesday 12 February 2020




Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-11 Thread wjgo_10...@btinternet.com via Unicode

Hi

Thank you to everybody who replied to this thread, both online and 
offline.


Sławomir Osipiuk wrote:

> As for "concatenation of such plain text sequences" where each 
> sequence is in a different language, ...


Actually I was meaning the concatenation of a number of messages, one 
from each of a number of "things", where each message includes text in 
several languages. The result being a report in several languages, just 
by simple concatenation of the number of reports. That is, if there are 
seven sensors, the final report has seven uses of the language code for 
English, seven for French, seven for German, seven for Polish, and so 
on.
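
The concatenation described here can be sketched as follows. It assumes that 
each message starts with U+E0001 followed by tag characters spelling a 
language code (tag characters mirror ASCII at an offset of 0xE0000); the 
sensor wording and the names tagged, split_by_language and sensor_report are 
hypothetical and exist only for this example.

```python
LANGUAGE_TAG = "\U000E0001"
TAG_BASE = 0xE0000  # tag characters mirror ASCII at this offset


def tagged(lang: str, text: str) -> str:
    """Prefix text with a language tag sequence for the given code."""
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in lang) + text


def split_by_language(stream: str):
    """Yield (lang, text) runs from a concatenation of tagged messages."""
    for chunk in stream.split(LANGUAGE_TAG)[1:]:
        i = 0
        while i < len(chunk) and 0xE0020 <= ord(chunk[i]) <= 0xE007E:
            i += 1
        lang = "".join(chr(ord(c) - TAG_BASE) for c in chunk[:i])
        yield lang, chunk[i:]


def sensor_report(sensor_id: int, temp: int) -> str:
    """One hypothetical sensor's message in three languages."""
    return (
        tagged("en", f"Sensor {sensor_id}: {temp} degrees\n")
        + tagged("fr", f"Capteur {sensor_id} : {temp} degrés\n")
        + tagged("pl", f"Czujnik {sensor_id}: {temp} stopni\n")
    )


# Seven sensors, simply concatenated into one report.
report = "".join(sensor_report(n, 18 + n) for n in range(1, 8))
english = "".join(text for lang, text in split_by_language(report) if lang == "en")
print(english)
```

With seven sensors the report carries seven uses of each language code, and a 
reader of one language has to run a filter of this kind before display.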


Mark E. Shoulson wrote:

> So at least this particular application would be a solution to a 
> problem that's already been solved.


Well, maybe it is now a solution that is out there and maybe some day a 
problem will arise for which this would be a solution worth considering. 
So for now it drifts into the archives.


Best regards,

William Overington

Tuesday 11 February 2020




Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Mark E. Shoulson via Unicode

On 2/10/20 6:14 PM, Sławomir Osipiuk via Unicode wrote:

> As for "concatenation of such plain text sequences" where each sequence is in a 
> different language, I must again ask: Is there a system that actually does this, that 
> does not have a higher-level protocol that can carry metadata about the natural language 
> of the text sequences?

Indeed, it seems to me that concatenating such sequences *is* in itself 
a higher-level protocol.  After all, it isn't "plain text" anymore when 
you have to suppress printing out some of it.  And we already have other 
higher-level protocols that can do the job about as efficiently.  So at 
least this particular application would be a solution to a problem 
that's already been solved.


~mark



RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Sławomir Osipiuk via Unicode
The examples given don't convince me that "higher-level protocols" would not be 
sufficient.

There are very few messages being sent in the "Internet of Things" that are 
truly plain-text. Even those that use a text base (as opposed to binary data) 
are still in some kind of structured computer language, be it HTML, XML, JSON, 
etc. The intended natural language can be specified using that structure.
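
As a small sketch of what "using that structure" can look like, the example 
below carries the language of each text in an ordinary JSON field. The field 
names and the sensor identifier are invented for illustration and do not 
follow any particular IoT schema.

```python
import json

# Hypothetical payload: language codes travel as ordinary structured data.
payload = {
    "sensor": "hypothetical-7",
    "celsius": 21.5,
    "messages": [
        {"lang": "en", "text": "Temperature: 21.5 degrees"},
        {"lang": "fr", "text": "Température : 21,5 degrés"},
        {"lang": "pl", "text": "Temperatura: 21,5 stopnia"},
    ],
}

encoded = json.dumps(payload, ensure_ascii=False)

# The receiver picks the text it wants by looking at the metadata.
decoded = json.loads(encoded)
by_lang = {m["lang"]: m["text"] for m in decoded["messages"]}
print(by_lang["pl"])
```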

Sending multiples of the same message in different languages is really only 
applicable to broadcast/multicast scenarios, where you have a transmission 
going out live to multiple recipients who have different language demands. I 
can't immediately think of any examples where this is done with plain-text 
only, though I'd be glad to learn about them, if they exist. 

For any peer-to-peer or client-server interaction, as in your password example, 
it makes more sense to have the recipient request a specific language (e.g. 
using HTTP's "Accept-Language" header) and the sender to send its message in 
that language automatically.
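
A minimal sketch of that negotiation is below. It parses an Accept-Language 
value into (tag, q) pairs and picks the first acceptable language the sender 
has. The matching is deliberately simplified (exact tag, then primary subtag) 
and is not a full implementation of the standard lookup rules; the function 
names and the MESSAGES table are hypothetical.

```python
def parse_accept_language(header: str):
    """Return (tag, q) pairs sorted by descending q; header order breaks ties."""
    prefs = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q))
    return sorted(prefs, key=lambda p: -p[1])  # sorted() is stable


def choose_language(header: str, available, default: str = "en") -> str:
    """Pick a language from `available` for the given header value."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default


# Hypothetical messages held by the sender.
MESSAGES = {"en": "Password required", "fr": "Mot de passe requis",
            "pl": "Wymagane hasło"}
lang = choose_language("pl-PL,pl;q=0.9,en;q=0.8", MESSAGES)
print(lang, MESSAGES[lang])
```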

As for "concatenation of such plain text sequences" where each sequence is in a 
different language, I must again ask: Is there a system that actually does 
this, that does not have a higher-level protocol that can carry metadata about 
the natural language of the text sequences?

Basically, I doubt Unicode language tags would be useful here because there 
simply is no Internet-based system that transmits human-readable text, in 
multiple natural languages, in such a rudimentary way, with no encapsulating 
protocol or metadata. And I doubt there will be; it seems like such a strange 
design choice in this day and age. Though I'd be glad to be corrected if 
someone has an example.

Sławomir Osipiuk





Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask

2020-02-10 Thread Steffen Nurpmeso via Unicode
wjgo_10...@btinternet.com via Unicode wrote in
<141cecf1.23e.1702ea529c1.webtop@btinternet.com>:
 |Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good 
 |reason why I ask
 |
 |There is a German song, Lorelei, and I searched to find an English 
 |translation.

Regarding Rhine and this thing of yours, there is also the German
joke from the middle of the 1950s, i think, with "Tünnes und
Schäl".

  Tünnes und Schäl stehen auf der Rheinbrücke.
  Da fällt Tünnes die Brille in den Fluß und er sagt
  "Da schau, jetzt ist mir die Brille in die Mosel gefallen",
  worauf Schäl sagt, "Mensch, Tünnes, dat is doch de Ring!",
  und Tünnes antwortet "Da kannste mal sehen wie schlecht ich ohne
  Brille sehen kann!"

  Tuennes und Schael stand on the Rhine bridge.
  Then Tuennes' glasses fall into the river, and he says
  "Look, now i lost my glasses to the Moselle",
  whereupon Schael says "Crumbs!, Tuennes, that is the Rhine!",
  and Tuennes responds "There you can see how badly i can see
  without glasses!"

P.S.: i cannot speak "Kölsch" aka Cologne dialect.
P.P.S.: i think i got you wrong.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)