Re: [HOT] Name tag in non-latin script - hindrance for NGOs/aid agencies?

john whelan Thu, 28 Nov 2019 12:02:53 -0800

I think the major problem is how do we move forward.

XML is basically a system of containers: a beginning tag and an end tag.
The nice thing about it is you can add fields to the file and existing
programs will still work. They'll just ignore the new fields.  They don't
have to understand or use Unicode.  The data in the containers themselves
may or may not use the wider Unicode character set.
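
A minimal sketch of that behaviour, using Python's standard xml.etree.ElementTree.
The sample node, the `name:bn` value and the `future_field` key are invented
for illustration and are not real OSM data; `read_tags` and `legacy_reader`
are hypothetical helper names. Note that the XML parser itself still decodes
the text as Unicode; only the application code can stay unaware of the new
fields.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical OSM-style node. The name:bn value is written with numeric
# character references, which always refer to Unicode code points.
SAMPLE = """<osm version="0.6">
  <node id="1" lat="23.81" lon="90.41">
    <tag k="name" v="Dhaka"/>
    <tag k="name:bn" v="&#x9A2;&#x9BE;&#x995;&#x9BE;"/>
    <tag k="future_field" v="added later"/>
  </node>
</osm>"""


def read_tags(osm_xml: str) -> Dict[str, str]:
    """Collect every k/v pair of the first node into a dictionary."""
    root = ET.fromstring(osm_xml)
    node = root.find("node")
    if node is None:
        return {}
    return {tag.get("k", ""): tag.get("v", "") for tag in node.findall("tag")}


def legacy_reader(osm_xml: str) -> Optional[str]:
    """Stand-in for an older program: it only ever asks for the plain name tag."""
    return read_tags(osm_xml).get("name")


if __name__ == "__main__":
    # The extra tags are present in the file but never looked at here.
    print(legacy_reader(SAMPLE))  # prints: Dhaka
```

The older reader keeps working because it looks up one key and never touches
the tags it does not know about.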


We have legacy data.  The suggestion of copying the contents, if in the Latin
alphabet, into name:en is sensible.  It preserves the existing data.

Adding the local name value in a name:xx tag at least allows people to
work in the language of their choice.
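
The legacy-data step could be sketched roughly as follows in Python. The
function names `is_latin_text` and `preserve_latin_name`, the script check
based on Unicode character names, and the sample tags are all assumptions
made for illustration. The check only looks at the script, so it cannot tell
English from other languages written in the Latin alphabet.

```python
import unicodedata
from typing import Dict, Optional


def is_latin_text(text: str) -> bool:
    """Rough check: every letter has a Unicode name that starts with LATIN."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


def preserve_latin_name(tags: Dict[str, str], local_lang: str,
                        local_name: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of tags with the legacy Latin name kept in name:en and
    the local-language name stored under name:<local_lang>."""
    updated = dict(tags)
    name = updated.get("name")
    # Copy only when the existing name is Latin and name:en is not set yet,
    # so nothing that is already there gets overwritten.
    if name and is_latin_text(name) and "name:en" not in updated:
        updated["name:en"] = name
    if local_name:
        updated["name:" + local_lang] = local_name
    return updated


if __name__ == "__main__":
    # Bengali spelling of Dhaka, written with escapes for readability.
    result = preserve_latin_name({"name": "Dhaka"}, "bn", "\u09a2\u09be\u0995\u09be")
    print(sorted(result))  # prints: ['name', 'name:bn', 'name:en']
```

The original `name` value is left untouched here, so existing software and
documentation that rely on it are not affected.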

The secondary problem of legacy data is how it has been used in the past.
There are probably software programs and documentation, including teachers'
notes, wiki pages and tutorials, that would need to be updated to handle
putting the local language in the name field.  On one of my systems a
technical person once changed the value of the port to connect to.  We'd
just printed and sent out a stack of user manuals that had the old port
number printed in them, and the cost to us was quite high.  We also found
that people had made their own shortened notes, and these kept tripping us
up for a long time afterwards.  In the case of OpenStreetMap you've no idea
who has written instructions, or in what language.

Then you get to the person in the field who may not have the latest smart
phone or technical support or even internet.  To quote Philippe: "mobile
phones that don't have a lot of embedded fonts and support a more limited
set".  What they have today works.  Change it so they see small blocks
instead of letters and it no longer works for them.  Are we asking them to
buy new phones?
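
One way software could avoid showing small blocks is to fall back to a name
the device can actually draw. The Python sketch below is only an
illustration: `scripts_used` and `pick_label` are hypothetical helpers, the
set of scripts a device supports is assumed to be known, and the script of a
letter is approximated by the first word of its Unicode character name.

```python
import unicodedata
from typing import Dict, Iterable, Optional, Set


def scripts_used(text: str) -> Set[str]:
    """Approximate the scripts in text by the first word of each letter's
    Unicode name, for example LATIN or BENGALI."""
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


def pick_label(tags: Dict[str, str], device_scripts: Iterable[str]) -> Optional[str]:
    """Return the first of name, name:en whose letters the device can draw."""
    supported = set(device_scripts)
    for key in ("name", "name:en"):
        value = tags.get(key)
        if value and scripts_used(value) <= supported:
            return value
    # Nothing is renderable: hand back the plain name and let the caller decide.
    return tags.get("name")


if __name__ == "__main__":
    # Bengali spelling of Dhaka in name, Latin spelling in name:en.
    tags = {"name": "\u09a2\u09be\u0995\u09be", "name:en": "Dhaka"}
    print(pick_label(tags, {"LATIN"}))  # prints: Dhaka
```

A phone with only Latin fonts gets the name:en value, while a device that
also lists BENGALI gets the local name.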

In commercial computer systems there is something called change management
and it is recognised that changes to computer systems can have a big
impact.  Yes, if we were starting again from scratch we wouldn't have done
things the way they were done, but they were done in a particular way,
usually for cost reasons, and I think that should be recognised.

Cheerio John

On Thu, 28 Nov 2019 at 14:11, Nasir Khan <nasir8...@gmail.com> wrote:

> Hi,
> I believe a little more information needs to be added here to point the
> discussion in the right direction. The language in question does not use
> Latin script, and its Unicode block is completely fine. So there is no
> issue in writing that language anywhere on the internet, and no additional
> special font is needed to render it properly. It is true that Unicode
> contains all the letters of all the languages, so if the font covers the
> language-specific Unicode block it should be displayed properly.
>
> From my experience so far I can say that the country's map is not complete
> in OSM. Being an open source product, there is a trust and dependability
> issue as well. More people are trying to use it and showing interest
> nowadays, because Google is expensive. What are the outlets for using OSM
> from desktop and mobile? Those who have been active and contributing for
> longer than me can easily list the most popular apps or sites for using
> OSM. How many of them support complex non-Latin Unicode characters
> perfectly? I found only a very few, but there could be more.
>
> So the map is incomplete, with little data, there is also incorrect data,
> and we are forcing it to move to a place where it will become completely
> useless, because the software cannot show the text properly. Will that be
> helpful for that language or for that country?
>
> I am not against using the native language; rather, I have been
> contributing to Wikipedia and a number of open source communities in the
> same language version for more than 12 years. I am also involved in a
> number of language-specific national expert committees. But here I am
> giving my opinion not to use the native language now, at least for the
> time being.
>
> It is true that all the people of the country are comfortable with and
> prefer the native language. Can you please provide me data on what
> percentage of them are using the OSM website and how many of them have a
> navigation app which is based on OSM? What are the use cases we are
> targeting?
>
> If we write `name` in English, add the native name in `name:xx`, and
> also add English in `name:en` for now, will it be impossible to move all
> the `name:xx` values to `name` when the scenario improves? I believe it
> could be done via automated scripts.
>
> Regards
> Nasir Khan
>
> --
> *Nasir Khan Saikat*
> www.nasirkhn.com
>
>
>
> On Fri, 29 Nov 2019 at 00:18, Philippe Verdy <verd...@wanadoo.fr> wrote:
>
>> XML never started from scratch on the basis of old versions of SGML, or of
>> any updated version of SGML.
>> When it was created, Unicode was already there and its support in XML was
>> mandatory from the start, including support for UTF-8 by default. It was
>> based on the earlier work on XHTML, which already included Unicode support
>> by default as well, from the then-current development of HTML4, which was
>> also updated to enforce the behavior for Unicode (notably it was made clear
>> that, to be conforming, numeric character references could only refer to
>> UCS code points, independently of the charset used for the document, and
>> that all charsets had to have a mapping to the UCS).
>>
>> Now the issue is possibly elsewhere: when a language uses a script or
>> orthography not based on Unicode because it is still not well supported or
>> has problems.
>> - there were problems for Korean in Unicode 1.0 before the merge with
>> ISO 10646, but Unicode 1.0 has long been dead and no software today makes
>> any reference to Unicode 1.0;
>> - there have been problems with the Unicode encoding for Burmese and
>> Mongolian. They are mostly solved, except for Mongolian, where work is
>> still pending on the behavior of some clusters and the best way to encode
>> the vowels. This will soon change, but yes, in that case there are
>> problems. The change will not be about adopting Unicode or not, but about
>> the best sequences of Unicode characters to use to represent these
>> clusters: this is an orthographic change, not a change of encoding, but
>> yes, in that case it means swapping Unicode fonts for other, updated
>> Unicode fonts; no hacks based on legacy charsets are involved.
>>
>> Now there remain languages/scripts not encoded at all (not in Unicode
>> and not even in any other charset): making a reference to a legacy ISO
>> charset is inapplicable, as there's no such legacy charset. All that can be
>> done for now in these languages is to use some transliteration (but not
>> necessarily Latin): Uyghur, for example, is generally written in that case
>> using Chinese sinograms (with some specific forms in rare cases), or Arabic
>> (with some additional diacritics and forms, but if these forms are not
>> handled in fonts, at least there's a basic orthography that is readable).
>> In the same way, we can substitute some characters in Latin or remove some
>> diacritics for African languages, or simply not encode some ligatures and
>> write digrams instead. This is what happens already when these languages
>> are used in some international documents and forms like passports: there's
>> a degraded orthography, but it is still readable and sufficiently
>> distinctive for practical uses. Isolated text fragments are not the
>> only source of disambiguation, as there is other contextual information,
>> including photo and biometric data or unique identifiers, a scanned
>> handwritten signature, and personal data such as an address for
>> identification purposes.
>>
>> Anyway, even if there's a preferred orthography, slight deviation in
>> orthography is very common and frequently used in public displays or
>> advertising, and no one is confused. And the "preferred" orthography is
>> just a matter of choice and is unstable across time, or even space when
>> there are competing authorities providing their own local terminology for
>> some local official uses, and it is not mandatory everywhere. Most
>> languages also have a lot of dialects that may use a different orthography
>> to render their own local phonology and accents: not everyone agrees with
>> these preferred forms, even in the same location, where dialects are also
>> competing. And let's remember that all modern languages continue to evolve
>> and borrow a lot from other languages, and new terms are creatively added.
>> Finally there are orthographic reforms, but they take a considerable time
>> to be adopted or never reach any acceptance, and legacy orthographies
>> remain visible in a lot of places and publications (plus, people are much
>> more mobile today and there are widespread communities located around the
>> world that adapt constantly to their new context and on which the official
>> reforms have no impact).
>>
>> So in conclusion, there's no other choice than Unicode today. Unicode is
>> mandatory in XML, and in OSM. Don't speak about legacy charsets. We are
>> just concerned with support in fonts: ALL characters encoded up to Unicode
>> 9.0 have suitable fonts immediately usable, and these fonts are all free
>> for use, and based on TrueType/OpenType. All OSM rendering software should
>> be able to use TrueType/OpenType fonts. The only remaining problem is the
>> existence of mobile phones that don't have a lot of embedded fonts and
>> support a more limited set. But none of them use or need any legacy
>> charsets.
>>
>>
>> On Thu, 28 Nov 2019 at 15:11, John Whelan <jwhelan0...@gmail.com> wrote:
>>
>>> The way I would approach this professionally would be to define the
>>> requirements first.
>>>
>>> In this case we have a requirement to display the name in the language
>>> of choice.
>>>
>>> We also have a requirement to be compatible with existing software.
>>>
>>> Pragmatically I would recommend changing the name field to use only an 8
>>> bit Latin alphabet character set, recognizing that not all systems can
>>> handle more complex character sets.  Which precise character set should be
>>> chosen would be a subject for discussion, but either ISO-8859-1 or
>>> Windows-1252 would be contenders.  My personal preference would be the ISO
>>> standard.
>>>
>>> Unicode is nice but we managed with 6 bit character sets for many years
>>> when I started with computers.  Even accented characters were a major
>>> problem.  Also remember that .OSM data is in XML format and XML came out of
>>> SGML, which was first used to transmit documents over modems, so only 7
>>> bits were available for encoding characters.  The extended characters use
>>> a special escape code sequence to hold the Unicode characters.
>>>
>>> Realistically software never wears out but source code gets lost.
>>> Compilers and operating systems get updated.  It may not be possible to
>>> modify existing software to handle Unicode characters.  I have a perfectly
>>> good scanner sitting in the corner that can no longer be used with Win 10
>>> because of a new and improved driver.  With the OpenStreetMap environment
>>> there isn't even a way to get a complete list of software that uses the
>>> OpenStreetMap data so it can be tested.
>>>
>>> The local language can be added in a name:xx tag; then software that can
>>> handle the local names can pick it up.  OsmAnd etc. can be configured to
>>> use the local name transparently so the local population can use it in the
>>> language of their choice.
>>>
>>> This approach would appear to meet the requirements.  The argument that
>>> we should change all the existing software to meet a requirement that was
>>> not clearly defined when the software was written doesn't make sense to me.
>>>
>>> Cheerio John
>>>
>>> Frederik Ramm wrote on 2019-11-28 3:25 AM:
>>>
>>> John,
>>>
>>> On 28.11.19 01:40, John Whelan wrote:
>>>
>>> Is there any reason why name:en could not be used?
>>>
>>> The country's official language requires a "non-standard" font to be
>>> available which does not seem to be a given on all platforms. Like if
>>> you set up a standard tile server and don't install extra fonts you will
>>> see little squares instead of place names all over China.
>>>
>>> Apparently not all applications are as good in name:xx handling as
>>> OsmAnd. A recurring point in the discussion is that the proponents of
>>> using the official language say "we shouldn't fall back to English name
>>> tags just because some apps/web sites are broken, we should file bug
>>> reports with them instead", and the proponents of using English say
>>> "let's be pragmatic, there's no way all these apps/sites will be fixed
>>> within a short time, so we should use English".
>>>
>>> Bye
>>> Frederik
>>>
>>>
>>>
>>> _______________________________________________
>>> HOT mailing list
>>> HOT@openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/hot
>>>
>>
>

