RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
On Wed, Feb 12, 2020 at 11:28 AM wjgo_10...@btinternet.com via Unicode wrote:

> I am reminded of the teletext system (with brand names such as Ceefax and Oracle) in the United Kingdom, which was a broadcasting technology introduced in the 1970s and which became very much a part of British culture during the 1980s and 1990s. A digital signal of a special purpose 7-bit character set was broadcast in the vertical blanking interval of a 625 line analogue television signal.
> [...]
> It seems to me that there could be, in the future, a type of thing that sends out a continuous signal over a wire of, say, a temperature reading at its location, all formatted in several languages. So, no passwords, no input from an end user, just a continuous feeding into The Internet of Things its output, with the numerical value in the messages changed as the temperature changes. This would allow the digits to be expressed in the digits used in the particular script of the particular language used in an individual message.

Teletext had a data rate of 7 kilobits/s (less than 1 kilobyte/s), was cleverly grafted onto a system never designed for it, and the terminals to display it couldn't handle modern markup. Language tags, or something very like them, would make sense for very low-rate transmissions like Teletext (or the similar Line 21 closed captions in NTSC). It's too late for them, though.

The proposal is for the "Internet of Things". In 2020, 1 kbit/s transmissions are laughably slow, unless you're talking to the Voyager space probes. Receiving equipment, even at the lowest end, has more than enough processing power to interpret a proper markup language.

If for some reason you really do want to minimize data rate, you're better off with data compression rather than saving bytes by using Unicode language tags instead of XML.
The receiving equipment can handle a decompression step at basically no cost (that wasn't true in the 1970s), and markup languages compress very well.

The particular circumstances that would encourage Unicode tag characters don't exist today: a razor-thin data rate and minuscule receiver processing power. With the resources we have now, anything done by tag characters can be done BETTER with proper encapsulating protocols and markup.

With all that said, there is no Unicode Police that will come banging on your door if you make a system that uses the tag characters. If you, or anyone, thinks it's the best solution for a particular project, then do it. Deprecation just means, "There are better ways of doing this. Seriously, please look around." And I think that message is still valid.

(This reply may read as overly critical, but I'm very much enjoying this discussion.)

Sławomir Osipiuk
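
To make the compression point concrete, here is a minimal Python sketch that encodes the same multilingual readings once as XML with xml:lang attributes and once as plain text prefixed by U+E0001 language tags, then compresses both with zlib. The element names, the readings and the 28-message batch are assumptions made up for illustration; the input is deliberately repetitive, so the numbers it prints say nothing about any real device.

```python
import zlib

# Hypothetical readings: (language code, message text). Illustrative only.
READINGS = [
    ("en", "Temperature: 21 C"),
    ("fr", "Température : 21 C"),
    ("de", "Temperatur: 21 C"),
    ("pl", "Temperatura: 21 C"),
] * 7  # e.g. seven sensors reporting the same value


def xml_report(readings):
    """Wrap each message in a made-up <msg> element carrying xml:lang."""
    parts = ["<report>"]
    for lang, text in readings:
        parts.append(f'<msg xml:lang="{lang}">{text}</msg>')
    parts.append("</report>")
    return "".join(parts)


def tagged_report(readings):
    """Prefix each message with U+E0001 plus tag characters for the code."""
    parts = []
    for lang, text in readings:
        # Tag characters mirror ASCII at U+E0020..U+E007E.
        tag = "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in lang)
        parts.append(tag + text)
    return "".join(parts)


xml_bytes = xml_report(READINGS).encode("utf-8")
tag_bytes = tagged_report(READINGS).encode("utf-8")
xml_z = zlib.compress(xml_bytes, 9)
tag_z = zlib.compress(tag_bytes, 9)

print("XML:        raw", len(xml_bytes), "bytes, compressed", len(xml_z))
print("Tag chars:  raw", len(tag_bytes), "bytes, compressed", len(tag_z))
```

Note that each tag character costs four bytes in UTF-8, so a two-letter language tag takes twelve bytes before any compression is applied.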
RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
Hi

At the time, I thought that my post yesterday concluded the thread. However, something later occurred to me as a result of something in the post by Sławomir Osipiuk. The gentleman wrote as follows:

> Sending multiples of the same message in different languages is really only applicable to broadcast/multicast scenarios, where you have a transmission going out live to multiple recipients who have different language demands. I can't immediately think of any examples where this is done with plain-text only, though I'd be glad to learn about them, if they exist.

Whilst I do not know of anywhere that this is presently done, I realized that this would be a practical proposition for some of the things in the Internet of Things.

I am reminded of the teletext system (with brand names such as Ceefax and Oracle) in the United Kingdom, which was a broadcasting technology introduced in the 1970s and which became very much a part of British culture during the 1980s and 1990s. A digital signal of a special purpose 7-bit character set was broadcast in the vertical blanking interval of a 625 line analogue television signal. Basically, most lines were used for the colour picture, but some lines were not used during the time allowed for the scan to go back to the top of the picture once it had reached the lower edge of the picture. So this digital information service got a free ride in the picture signal going out to receivers all over the country.

The information was organised into pages and an end user could go to "text" and then wait for a selected page to come round again in the continuous cyclic broadcasting of pages. Pages could be arranged by the broadcaster so that, say, the news headlines page came around maybe four times in each, say, 20 second cycle and some pages only once.
It was very effective, as the special purpose 7-bit character set, while being basically ASCII, had control characters that were stateful. Each was displayed as a space, yet some of them switched the colour of the following text until a new control character for a colour was received, if indeed one was received, or until the end of the 40 character line of the display. Each line started with white text, though if the first character of the line switched to a colour, the end user would not see any white text. The set of control codes also included switching to chunky graphics mode.

There was also a facility to use the system for subtitles to the television programme: optional subtitles, so that end users could have them on if desired, yet other users were not thereby forced to have subtitles. It was good, as the various participants in a discussion, whether news or drama, could each have a colour for their speech, such as green, yellow, cyan or white.

No return link was needed to send information from the end user to the central broadcasting computer.

A system with the same format of display was a viewdata system (brand name Prestel), but that was very different from teletext and used a two-way telephone line connection. In a viewdata system, the end user selected a page from a menu, then a message requesting that page was sent to the central computer and just that page was sent to the end user. A fee for a page was often charged and the system never really took off.

Teletext thrived because economy of scale brought the cost of teletext-capable electronics down. It was installed, using a set of for-the-purpose integrated circuits, during manufacture of most colour television sets in that era, and once installed it was a free add-on with no ongoing cost apart from the ordinary television licence.
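
The stateful colour codes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric code values in COLOURS are assumed for the example rather than taken from this message, and graphics mode and all other attributes are ignored.

```python
# Assumed control code values for text colours (an assumption for
# illustration; not taken from the message above).
COLOURS = {
    0x01: "red", 0x02: "green", 0x03: "yellow", 0x04: "blue",
    0x05: "magenta", 0x06: "cyan", 0x07: "white",
}
LINE_WIDTH = 40


def render_line(codes):
    """Return (character, colour) cells for one line of 7-bit codes."""
    colour = "white"  # every line starts with white text
    cells = []
    for code in codes[:LINE_WIDTH]:
        if code < 0x20:
            # A control character is displayed as a space...
            cells.append((" ", colour))
            # ...and a colour code switches the colour of what follows.
            colour = COLOURS.get(code, colour)
        else:
            cells.append((chr(code), colour))
    return cells


line = bytes([0x02]) + b"Hi" + bytes([0x03]) + b"Yo"
for char, colour in render_line(line):
    print(repr(char), colour)
```

Because the colour state is reset at the start of every call, a colour never carries over from one line to the next, which matches the description of each line starting in white.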
It seems to me that there could be, in the future, a type of thing that sends out a continuous signal over a wire of, say, a temperature reading at its location, all formatted in several languages. So, no passwords, no input from an end user, just a continuous feeding into The Internet of Things its output, with the numerical value in the messages changed as the temperature changes. This would allow the digits to be expressed in the digits used in the particular script of the particular language used in an individual message.

William Overington

Wednesday 12 February 2020
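
For readers who want to see what such a stream might look like, here is a rough Python sketch of the idea: each message is prefixed with U+E0001 LANGUAGE TAG followed by tag characters spelling the language code, and the reading is written in that language's digits. The message templates, the choice of languages and the function names are all hypothetical, and a receiver is assumed to split the stream on the language tag character.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG (deprecated)
TAG_BASE = 0xE0000           # tag characters mirror ASCII at U+E0020..U+E007E

# Digit sets built from code points; the language selection is illustrative.
DIGITS = {
    "en": "0123456789",
    "ar": "".join(chr(0x0660 + i) for i in range(10)),  # Arabic-Indic digits
    "hi": "".join(chr(0x0966 + i) for i in range(10)),  # Devanagari digits
}
# Hypothetical message templates.
TEMPLATES = {
    "en": "Temperature: {}",
    "ar": "درجة الحرارة: {}",
    "hi": "तापमान: {}",
}


def lang_tag(code):
    """Encode a language code such as 'en' as a Unicode language tag."""
    return LANGUAGE_TAG + "".join(chr(TAG_BASE + ord(c)) for c in code)


def message(temperature):
    """One sensor message: the same reading in every configured language."""
    parts = []
    for code, template in TEMPLATES.items():
        table = str.maketrans("0123456789", DIGITS[code])
        parts.append(lang_tag(code) + template.format(str(temperature).translate(table)))
    return "".join(parts)


def split_by_language(stream):
    """Receiver side: group the tagged texts by language code."""
    result = {}
    for chunk in stream.split(LANGUAGE_TAG)[1:]:
        i = 0
        while i < len(chunk) and 0xE0020 <= ord(chunk[i]) <= 0xE007E:
            i += 1
        code = "".join(chr(ord(c) - TAG_BASE) for c in chunk[:i])
        result.setdefault(code, []).append(chunk[i:])
    return result


# Simple concatenation of messages from several hypothetical sensors.
report = "".join(message(t) for t in (21, 19, 23))
for code, texts in split_by_language(report).items():
    print(code, texts)
```

The receiver has to parse the tags and drop the languages it does not want before displaying anything, so even this small sketch already amounts to a small protocol layered over the text.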
Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
Hi

Thank you to everybody who replied to this thread, both online and offline.

Sławomir Osipiuk wrote:

> As for "concatenation of such plain text sequences" where each sequence is in a different language, ...

Actually I meant the concatenation of a number of messages, one from each of a number of "things", where each message includes text in several languages. The result would be a report in several languages, produced just by simple concatenation of the individual reports. That is, if there are seven sensors, the final report has seven uses of the language code for English, seven for French, seven for German, seven for Polish, and so on.

Mark E. Shoulson wrote:

> So at least this particular application would be a solution to a problem that's already been solved.

Well, maybe it is now a solution that is out there and maybe some day a problem will arise for which this would be a solution worth considering.

So for now it drifts into the archives.

Best regards,

William Overington

Tuesday 11 February 2020
Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
On 2/10/20 6:14 PM, Sławomir Osipiuk via Unicode wrote:

> As for "concatenation of such plain text sequences" where each sequence is in a different language, I must again ask: Is there a system that actually does this, that does not have a higher-level protocol that can carry metadata about the natural language of the text sequences?

Indeed, it seems to me that concatenating such sequences *is* in itself a higher-level protocol. After all, it isn't "plain text" anymore when you have to suppress printing out some of it. And we already have other higher-level protocols that can do the job about as efficiently.

So at least this particular application would be a solution to a problem that's already been solved.

~mark
RE: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
The examples given don't convince me that "higher-level protocols" would not be sufficient. There are very few messages being sent in the "Internet of Things" that are truly plain-text. Even those that use a text base (as opposed to binary data) are still in some kind of structured computer language, be it HTML, XML, JSON, etc. The intended natural language can be specified using that structure.

Sending multiples of the same message in different languages is really only applicable to broadcast/multicast scenarios, where you have a transmission going out live to multiple recipients who have different language demands. I can't immediately think of any examples where this is done with plain-text only, though I'd be glad to learn about them, if they exist.

For any peer-to-peer or client-server interaction, as in your password example, it makes more sense to have the recipient request a specific language (e.g. using HTTP's "Accept-Language" header) and the sender to send its message in that language automatically.

As for "concatenation of such plain text sequences" where each sequence is in a different language, I must again ask: Is there a system that actually does this, that does not have a higher-level protocol that can carry metadata about the natural language of the text sequences?

Basically, I doubt Unicode language tags would be useful here because there simply is no Internet-based system that transmits human-readable text, in multiple natural languages, in such a rudimentary way, with no encapsulating protocol or metadata. And I doubt there will be; it seems like such a strange design choice in this day and age. Though I'd be glad to be corrected if someone has an example.

Sławomir Osipiuk
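
As an illustration of the "Accept-Language" approach mentioned above, here is a simplified Python sketch of how a sender might pick one language per request instead of sending them all. It is only a sketch: the parsing is far looser than the HTTP specification's matching rules, and the function names, the MESSAGES table and the fallback to English are assumptions made for the example.

```python
def parse_accept_language(header):
    """Return language ranges from the header, most preferred first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0  # a range without a q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((lang.strip().lower(), quality))
    # sorted() is stable, so equal weights keep their header order.
    ranked = sorted(prefs, key=lambda pair: -pair[1])
    return [lang for lang, quality in ranked if quality > 0]


def choose_language(header, available, default="en"):
    """Pick the first requested language that the sender can provide."""
    for lang in parse_accept_language(header):
        if lang == "*":
            return default
        if lang in available:
            return lang
        primary = lang.split("-")[0]  # fall back from 'pl-pl' to 'pl'
        if primary in available:
            return primary
    return default


# Hypothetical messages a sensor could serve.
MESSAGES = {
    "en": "Temperature: 21 °C",
    "de": "Temperatur: 21 °C",
    "pl": "Temperatura: 21 °C",
}

chosen = choose_language("pl-PL,pl;q=0.9,en;q=0.5", MESSAGES)
print(chosen, "->", MESSAGES[chosen])
```

With this arrangement only one message crosses the wire, and the language metadata lives in the protocol rather than in the text itself.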
Re: Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good reason why I ask
wjgo_10...@btinternet.com via Unicode wrote in <141cecf1.23e.1702ea529c1.webtop@btinternet.com>:
 |Could U+E0001 LANGUAGE TAG become undeprecated please? There is a good
 |reason why I ask
 |
 |There is a German song, Lorelei, and I searched to find an English
 |translation.

Regarding the Rhine and this thing of yours, there is also the German joke from the middle of the 1950s, I think, with "Tünnes und Schäl".

  Tünnes und Schäl stehen auf der Rheinbrücke. Da fällt Tünnes die Brille in den Fluß und er sagt "Da schau, jetzt ist mir die Brille in die Mosel gefallen", worauf Schäl sagt, "Mensch, Tünnes, dat is doch de Ring!", und Tünnes antwortet "Da kannste mal sehen wie schlecht ich ohne Brille sehen kann!"

  Tuennes and Schael stand on the Rhine bridge. Then Tuennes's glasses fall into the river, and he says "Look, now I have lost my glasses to the Moselle", whereupon Schael says "Crumbs, Tuennes, that is the Rhine!", and Tuennes responds "There you can see how badly I can see without glasses!"

P.S.: I cannot speak "Kölsch", aka the Cologne dialect.
P.P.S.: I think I got you wrong.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)