Re: [Tagging] Nonbreakable spaces in name tags (Pander)
Unicode has a non-breaking hyphen, U+2011.

On Feb 1, 2018 13:03, wrote:
> Tagging Digest, Vol 101, Issue 1
>
> Today's Topics:
> 1. Re: Nonbreakable spaces in name tags (Pander)
>
> Message: 1
> Date: Wed, 31 Jan 2018 22:03:29 +0100
> From: Pander
> Subject: Re: [Tagging] Nonbreakable spaces in name tags
>
> Is there also a case for non-breaking hyphens? For example
> 's-Hertogenbosch (a place in the Netherlands) is hyphenated as
> 's- Hertogenbosch which is incorrect.
> [...]

___
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging
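To make the U+2011 suggestion concrete, here is a minimal sketch in Python. The helper name `protect_short_prefix` and the "fewer than two letters" threshold are assumptions taken from the Dutch example in this thread; this is not an existing OSM tool or an agreed tagging rule.

```python
NB_HYPHEN = "\u2011"  # NON-BREAKING HYPHEN


def protect_short_prefix(name: str, min_letters: int = 2) -> str:
    """Swap the first hyphen for U+2011 when fewer than `min_letters`
    letters precede it (the apostrophe is not counted as a letter)."""
    head, sep, tail = name.partition("-")
    if not sep:
        return name  # no hyphen at all, nothing to protect
    letters = sum(ch.isalpha() for ch in head)
    if letters < min_letters:
        return head + NB_HYPHEN + tail
    return name


protected = protect_short_prefix("'s-Hertogenbosch")
unchanged = protect_short_prefix("Saint-Pierre")
```

Only the first hyphen is considered, so a later break such as 's-Hertogen- bosch is still left to the renderer's own hyphenation.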
Re: [Tagging] Nonbreakable spaces in name tags
On 01/31/2018 08:09 PM, Matej Lieskovský wrote:
> @Marc:
> Nominatim handles nbsp well.
> Renderers seem to either ignore it or make use of it.
> Most editors seem to handle it well, but whitespace highlighting would
> be welcome.
> Overpass... theoretically, it is doing exactly what it should be
> doing. Somehow making it simpler to create a regex that does Unicode
> collation would be nice.
>
> Is there anything that literally breaks? I don't think not doing
> Unicode collation at all is an excuse.

Is there also a case for non-breaking hyphens? For example, 's-Hertogenbosch (a place in the Netherlands) gets hyphenated as 's- Hertogenbosch, which is incorrect. In Dutch you are not allowed to split off fewer than two letters. Apparently the apostrophe is counted as a letter too.

It should remain one word or, when a break is really needed, be hyphenated as 's-Hertogen- bosch.
Re: [Tagging] Nonbreakable spaces in name tags
On Wed, Jan 31, 2018 at 6:49 PM, marc marc wrote:
> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.

They are not spelling rules. And your comment implies that people shouldn't try to spell names correctly. If "spelling rules have no place in OSM tags" then there can be no objection to mapping the capital of the UK as "Lundun," etc. There is every reason to hope that mappers will spell names correctly according to the rules of the local language. So spelling rules do have a place in the values of free-form OSM tags such as name=*.

These are, in fact, local typographical conventions, which are almost as important as local spelling rules. Hyphenation conventions, for example, can be quite complex. Even in English, it would be undesirable for "Therapist's Lane" to be hyphenated as "The- rapist's Lane."

> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect return to
> the line.

I'd be very scathing if there is *any* widely-used OSM tool that does not handle Unicode correctly. It's been over 40 years since 7-bit ASCII could be considered adequate, and a couple of decades since operating systems did not handle Unicode as standard.

There are specifications of how Unicode strings should be compared. Programming libraries which follow those specifications shouldn't suffer the problems you anticipate. Programming libraries which don't follow Unicode specifications are *broken*.

You appear not to know that correct Unicode handling is essential for many languages where a single glyph may be composed of two or more combining characters. Without correct Unicode handling it is impossible to represent names correctly in those languages.

The correct course of action is to check that widely-used OSM tools handle Unicode correctly (which they should, anyway) and fix them if they do not.

--
Paul
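The point about glyphs built from combining characters can be illustrated with a small Python sketch. It uses the standard `unicodedata` module; the helper name `same_text` is only an illustration and not part of any OSM tool.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# "Jiří" written with precomposed letters and with base letters
# followed by combining marks (caron U+030C, acute U+0301).
precomposed = "Ji\u0159\u00ED"
decomposed = "Jir\u030Ci\u0301"

naive_equal = precomposed == decomposed          # code point comparison
normalized_equal = same_text(precomposed, decomposed)
```

Both strings render identically, yet a plain code point comparison treats them as different. Comparing after normalization is what a Unicode-aware tool would be expected to do.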
Re: [Tagging] Nonbreakable spaces in name tags
@Marc:
Nominatim handles nbsp well.
Renderers seem to either ignore it or make use of it.
Most editors seem to handle it well, but whitespace highlighting would be welcome.
Overpass... theoretically, it is doing exactly what it should be doing. Somehow making it simpler to create a regex that does Unicode collation would be nice.

Is there anything that literally breaks? I don't think that a tool not doing Unicode collation at all is a valid excuse.
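One way to read the regex wish above: until a tool does the folding itself, a query pattern can accept either kind of space wherever the name contains one. The following Python sketch builds such a pattern; the function name `name_pattern` is hypothetical, and whether a given query tool accepts the resulting pattern syntax is an assumption to check against that tool's documentation.

```python
import re

# Character class accepting an ASCII space or a no-break space.
SPACE_CLASS = "[ \u00A0]"


def name_pattern(name: str) -> "re.Pattern[str]":
    """Build a pattern matching `name` with any mix of space and nbsp."""
    parts = re.split(SPACE_CLASS, name)
    return re.compile(SPACE_CLASS.join(re.escape(part) for part in parts))


pattern = name_pattern("V j\u00E1m\u011B")  # "V jámě"
with_nbsp = pattern.fullmatch("V\u00A0j\u00E1m\u011B")
with_space = pattern.fullmatch("V j\u00E1m\u011B")
```

This only covers U+0020 and U+00A0. Other space variants would need to be added to the class by hand, which is part of why doing it in the tool would be nicer.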
Re: [Tagging] Nonbreakable spaces in name tags
Marc, I think that spelling rules are part of the specific language's rules/naming, and should be allowed in tags. If some tool does not normalize Unicode for searching, the tool should be fixed (that shouldn't be too hard in most cases). Most search engines do these kinds of normalizations before indexing anyway.

Fixing Unicode normalization is far simpler than building a grammar engine that knows how to break words in every language, maintaining a huge set of exceptions for each (there are always exceptions in these things), and attaching this engine to every rendering system.

On Wed, Jan 31, 2018 at 1:49 PM, marc marc wrote:
> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.
> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect return to
> the line.
> [...]
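The "normalize before indexing" idea from the reply above could look roughly like this in Python. This is a sketch under assumptions: `search_key` is a made-up name, and the choice of NFKC plus case folding is one plausible recipe, not what any particular OSM search engine is known to do.

```python
import unicodedata


def search_key(name: str) -> str:
    """Derive a comparison key: compatibility-normalize, case-fold,
    and collapse every run of whitespace to one ASCII space."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


key_nbsp = search_key("V\u00A0J\u00E1m\u011B")   # "V Jámě" with nbsp
key_plain = search_key("v  j\u00E1m\u011B")      # lower case, two spaces
```

The stored tag value is never modified; only the index key is. Note that NFKC also folds other compatibility characters (for example the Roman numeral code points), which may or may not be wanted for a given index.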
Re: [Tagging] Nonbreakable spaces in name tags
On 31.01.2018 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal

that should have been "normal"
Re: [Tagging] Nonbreakable spaces in name tags
I remain convinced that spelling rules have no place in OSM tags, even if it would be convenient.
If they are to be added, the primary tools should first be asked to handle them before their use is considered. Otherwise the slightest search on a street name can fail, which is worse than having an incorrect line break.

On 31.01.18 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal
> ASCII space for processing and comparison purposes and leave it at that.
>
> And we don't have the issues just in name tags, see
> [...]
> from my OH parser.
>
> Simon
Re: [Tagging] Nonbreakable spaces in name tags
IMHO we should in general treat all unicode space variants as a nomal ASCII space for processing and comparison purposes and leave it at that.

And we don't have the issues just in name tags, see

SKIP :
{
  "\r"
| "\n"
| " "
| "\t"
| "\u200A"
| "\u2009"
| "\u00A0"
| "\u2008"
| "\u2002"
| "\u2007"
| "\u3000"
| "\u2003"
| "\u2006"
| "\u2005"
| "\u2004"
}

from my OH parser.

Simon

On 31.01.2018 at 16:25, Matej Lieskovský wrote:
> So... can we reach some conclusion?
>
> I have a particular situation I need to resolve - some streets consist
> of ways that (among other, meaningful differences) vary in their usage
> of non-breakable spaces.
> [...]
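For tools outside a parser grammar, the same list of space variants can be applied as a plain translation table. The sketch below is in Python and reuses exactly the non-ASCII code points from the SKIP list above; the names `SPACE_VARIANTS` and `fold_spaces` are illustrative only.

```python
# The non-ASCII space code points listed in the SKIP block above.
SPACE_VARIANTS = (
    "\u200A\u2009\u00A0\u2008\u2002\u2007"
    "\u3000\u2003\u2006\u2005\u2004"
)

# Map each variant to a normal ASCII space.
TO_ASCII_SPACE = str.maketrans({ch: " " for ch in SPACE_VARIANTS})


def fold_spaces(value: str) -> str:
    """Replace every listed space variant with U+0020, for comparison."""
    return value.translate(TO_ASCII_SPACE)


folded = fold_spaces("Mo\u2009-\u00A0Fr 08:00\u3000-\u200218:00")
```

Unlike a full normalization pass, this touches nothing except the listed spaces, so it is suitable when the rest of the value must be compared byte for byte.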
Re: [Tagging] Nonbreakable spaces in name tags
So... can we reach some conclusion?

I have a particular situation I need to resolve - some streets consist of ways that (among other, meaningful differences) vary in their usage of non-breakable spaces. Here are the possible solutions:

1) Start removing nbsp from local data
2) In case of conflict, prefer the variant without nbsp
3) In case of conflict, choose the more common variant
4) In case of conflict, prefer the variant with (correctly placed) nbsp
5) Start adding nbsp to local data
6) Leave things as they are

To be perfectly honest, unless we can agree on whether nbsp should be encouraged or removed, I will use option 4. Option 6 (status quo) is pretty much the worst of both worlds, 5 is undeniably adding nbsp to the data (and too much work for now), and an eventual conversion from anything to 1 is trivial (which does not work for converting from 2 or 3 to 5). Since option 4 at least makes entire streets have the same name without loss of data or adding nbsp to streets that are ok so far, I consider it to be the best compromise in case of no consensus.

Matej Lieskovský

PS: I am starting to suspect that we might need a wiki page concerning Unicode usage in general (nbsp, soft hyphens, roman numerals, normalisation...). The link below does seem a little underwhelming:
https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters

On 27 January 2018 at 01:50, Johnparis wrote:
> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>
> HTML has &shy; for soft hyphens (Unicode U+00AD).
> [...]
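Option 4 from the list above can be sketched in a few lines of Python. The function `resolve_street_name` is hypothetical, and it cannot judge whether an nbsp is "correctly placed"; it simply assumes that, among variants differing only in nbsp, the one with the most nbsp is the intended one.

```python
from typing import List, Optional

NBSP = "\u00A0"


def resolve_street_name(variants: List[str]) -> Optional[str]:
    """Pick one name for a street whose ways differ only in nbsp usage.

    Returns None when the variants differ in more than nbsp, because
    that is a real naming conflict a human has to look at."""
    plain = {v.replace(NBSP, " ") for v in variants}
    if len(plain) != 1:
        return None
    # Prefer the variant that carries the most non-breakable spaces.
    return max(variants, key=lambda v: v.count(NBSP))


chosen = resolve_street_name(["V j\u00E1m\u011B", "V\u00A0j\u00E1m\u011B"])
conflict = resolve_street_name(["V j\u00E1m\u011B", "U j\u00E1my"])
```

No nbsp is added to streets that have none, which matches the "without loss of data or adding nbsp" constraint stated above.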
Re: [Tagging] Nonbreakable spaces in name tags
HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).

HTML has &shy; for soft hyphens (Unicode U+00AD).

--

Message: 2
Date: Fri, 26 Jan 2018 23:04:32 +0100
From: Richard
To: "Tag discussion, strategy and related tools"
Subject: Re: [Tagging] Nonbreakable spaces in name tags
Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
Content-Type: text/plain; charset=iso-8859-1

[...]
Re: [Tagging] Nonbreakable spaces in name tags
On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
> Greetings!
>
> Several Slavic languages have rather formal rules about line breaks.

The problem is much broader; sooner or later OSM rendering will hit word splitting.

> PS: The rules are formal enough that there exists a 1997 program
> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> and is commonly used for important documents.

Probably not all OSM languages have such tools, and even if they have, it can be tricky to determine which language's rules to apply.

I would think:
* if someone wants to use nonbreakable spaces he should be allowed to do so, and tools should tolerate it (not necessarily understand it, but not break)
* if someone wants to use explicit word-split marks/soft hyphens, this should be somehow allowed too.

Otherwise the software should try to do its best and apply heuristics to avoid splitting lines in the wrong places.
Not splitting 1000 034 should be obvious, Roman numerals as well. Prefer not splitting around "lonely" characters.
The rendering software can also compare texts with name tags and prefer not to split names at all.

Richard
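The heuristics listed above (digit groups, Roman numerals, "lonely" characters) can be sketched as a function that returns the spaces at which a line break looks acceptable. This is an assumption-laden Python illustration, not any renderer's actual algorithm; the Roman numeral test in particular is crude and will misfire on upper-case words made only of the letters I, V, X, L, C, D, M.

```python
import re
from typing import List

# Crude Roman numeral test: upper-case numeral letters, optional period.
ROMAN = re.compile(r"[IVXLCDM]+\.?")


def allowed_breaks(text: str) -> List[int]:
    """Return indexes of spaces where a line break looks acceptable."""
    words = text.split(" ")
    breaks: List[int] = []
    pos = 0
    for left, right in zip(words, words[1:]):
        pos += len(left)  # index of the space following `left`
        lonely = len(left) == 1 or len(right) == 1
        digits = left.isdigit() and right.isdigit()
        roman = bool(ROMAN.fullmatch(left) or ROMAN.fullmatch(right))
        if not (lonely or digits or roman):
            breaks.append(pos)
        pos += 1  # step over the space itself
    return breaks


number_breaks = allowed_breaks("1000 034")
street_breaks = allowed_breaks("Main Street North")
```

A renderer could fall back to these positions only when the name contains no explicit nonbreakable spaces or soft hyphens, so that mapper-supplied hints always win.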
Re: [Tagging] Nonbreakable spaces in name tags
@Erkin: Yes, but the full form can contain a contraction. There is a public transport stop in Prague called "I. P. Pavlova" and (unlike the nearby "náměstí Ivana Petroviče Pavlova") it is ALWAYS written as an abbreviation. Signs, official documents, spoken language... there is a point after which it would be wrong to expand the name.
https://www.openstreetmap.org/node/25936016

Matej

On 26 January 2018 at 19:09, Erkin Alp Güney wrote:
>> (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.
>
> Yours, faithfully
> Erkin Alp
Re: [Tagging] Nonbreakable spaces in name tags
On Fri, Jan 26, 2018 at 09:09:12PM +0300, Erkin Alp Güney wrote:
> > (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.

This is about the case where the official name is an abbreviation, which happens often enough.

Richard
Re: [Tagging] Nonbreakable spaces in name tags
2018-01-26 19:09 GMT+01:00 Erkin Alp Güney:
> > (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.

There are also other name tags, in particular "official_name", which also are about full (and official) forms. E.g.
name=Italia
official_name=Repubblica Italiana

Cheers,
Martin
Re: [Tagging] Nonbreakable spaces in name tags
> (and yes, there are cases when you should use a contraction)

name=* is full form. Not abbreviated in any way.

Yours, faithfully
Erkin Alp
Re: [Tagging] Nonbreakable spaces in name tags
On 26/01/2018 14:48, Matej Lieskovský wrote:
> ... we reached out to the DWG, which did not solve the dispute.

For completeness, I actually suggested posting here rather than have the DWG issue a "commandment" - I have some knowledge of Czech and Czech grammar but not much, and it made sense to have a wider discussion.

There's also a talk-cz discussion
https://lists.openstreetmap.org/pipermail/talk-cz/2018-January/thread.html#18356
(if you're not a fluent Czech reader you'll want to have both the original and translated version of that open together so you can see the original Czech abbreviations etc.).

There are also a bunch of Unicode references in github, such as
https://github.com/openstreetmap/openstreetmap-website/search?q=unicode&type=Issues&utf8=%E2%9C%93
and in particular
https://github.com/openstreetmap/openstreetmap-website/issues/1213
which describes how some Unicode characters are handled now.

Best Regards,
Andy
(DWG member and occasional orderer of "Tři piva prosím")
Re: [Tagging] Nonbreakable spaces in name tags
Your explanation clearly spoke of a space between words. I was referring to syllables only as an example.
French also has non-breakable spaces, as in "M. Dupont".
I don't think OSM is the right place to put grammar rules. IMHO we need a place where all tools that need them can retrieve them.
This does not prevent you from asking that tools treat a non-breakable space as an alias for a space in queries, in the same way that some tools are able to find an object with the wrong name="St Pierre" if you request "Saint-Pierre". But IMHO this remains a mistake, and error management is often desirable.

On 26.01.18 at 16:50, Matej Lieskovský wrote:
> @marc: I just realized - I'm not talking about breaking words between
> syllables but about breaking lines between words. It is not adding a
> character, just using a nonbreakable version of a space. Sorry if I'm
> not being clear.
>
> On 26 January 2018 at 16:47, Matej Lieskovský wrote:
>> In Czech, a nonbreakable space should follow any single-letter
>> preposition or conjunction and academic or military titles. A
>> nonbreakable space should also be used due to some common
>> contractions, between a number and a unit, and around some punctuation
>> marks.
>>
>> I noticed that some Overpass queries were not returning some elements
>> - that is how I found out that we actually have a rather large number
>> of nonbreakable spaces in the data.
>>
>> Nonbreakable spaces are currently quite troublesome - not all
>> consumers actually use Unicode collation, it is invisible in JOSM and
>> it is not exactly easy to input. Also, the chance that we convince all
>> contributors to use it correctly is exactly zero. Along with this
>> potentially being "tagging for the renderer", there are many calls for
>> a mass-removal.
>>
>> On the other hand, there is software that actually handles Unicode
>> collation well and it does make the correct rendering of names an
>> order of magnitude easier. Leaving this up to the renderer sounds
>> logical, but imagine forcing every renderer to figure out what
>> language any given name is in and then running the appropriate
>> subprogram to fill in the nonbreakable spaces. This could require
>> semantic analysis due to the need to add a nonbreakable space after
>> the "V" in "V jámě" (preposition) but before the "V" in "Jiří V."
>> (roman ordinal number) and after the "V." in "V. Špidla" (contraction
>> of name (and yes, there are cases when you should use a contraction)).
>>
>> Nonbreakable spaces are strange - you cannot reliably tell if they are
>> used OTG (but in some cases you can), official documents often ignore
>> them (leaving them up to the automated systems in office software, so
>> they do occur sometimes) and the rules governing them are older than
>> computers, so asking if they are a rule or a character is... dubious.
>>
>> And yes, we do have really long names of things. Names of POIs named
>> after people are a common use case.
>>
>> Matej
>>
>> On 26 January 2018 at 16:11, marc marc wrote:
>>> On 26.01.18 at 15:48, Matej Lieskovský wrote:
>>>> Several Slavic languages have rather formal rules about line breaks.
>>>
>>> It depends on whether it is a grammar rule or a "char".
>>> In French, there is a rule for how to cut a word at the end of a line.
>>> Since it's a grammar rule, I don't see any point in adding a character
>>> between syllables to describe it. It's up to the renderer
>>> to know when it can do it, if people want this feature.
>>> I know nothing about your language, but I feel it looks the same.
>>> If my understanding is correct, I am in favour of not putting
>>> this "nonbreakable" information into a value and moving it to the app
>>> code that needs it (which? do you have values so long that they need
>>> to be broken across several lines?)
>>>
>>> Regards,
>>> Marc
Re: [Tagging] Nonbreakable spaces in name tags
I think it would be best to make the tools we use (JOSM, Overpass API,
iD, etc.) Unicode-aware, so they can handle this correctly.

Polyglot

2018-01-26 16:50 GMT+01:00 Matej Lieskovský:
> @marc: I just realized - I'm not talking about breaking words between
> syllables but about breaking lines between words. It is not adding a
> character, just using a nonbreakable version of a space. Sorry if I'm
> not being clear.
>
> On 26 January 2018 at 16:47, Matej Lieskovský wrote:
> > In Czech, a nonbreakable space should follow any single-letter
> > preposition or conjunction and academic or military titles. A
> > nonbreakable space should also be used due to some common
> > contractions, between a number and a unit, and around some punctuation
> > marks.
> >
> > I noticed that some Overpass queries were not returning some elements
> > - that is how I found out that we actually have a rather large number
> > of nonbreakable spaces in the data.
> >
> > Nonbreakable spaces are currently quite troublesome - not all
> > consumers actually use Unicode collation, they are invisible in JOSM
> > and they are not exactly easy to input. Also, the chance that we
> > convince all contributors to use them correctly is exactly zero. Along
> > with this potentially being "tagging for the renderer", there are many
> > calls for a mass-removal.
> >
> > On the other hand, there is software that actually handles Unicode
> > collation well and it does make the correct rendering of names an
> > order of magnitude easier. Leaving this up to the renderer sounds
> > logical, but imagine forcing every renderer to figure out what
> > language any given name is in and then running the appropriate
> > subprogram to fill in the nonbreakable spaces. This could require
> > semantic analysis due to the need to add a nonbreakable space after
> > the "V" in "V jámě" (preposition) but before the "V" in "Jiří V."
> > (Roman ordinal number) and after the "V." in "V. Špidla" (contraction
> > of name (and yes, there are cases when you should use a contraction)).
> >
> > Nonbreakable spaces are strange - you cannot reliably tell if they are
> > used OTG (but in some cases you can), official documents often ignore
> > them (leaving them up to the automated systems in office software, so
> > they do occur sometimes) and the rules governing them are older than
> > computers, so asking if they are a rule or a character is... dubious.
> >
> > And yes, we do have really long names of things. Names of POIs named
> > after people are a common use case.
> >
> > Matej
> >
> > On 26 January 2018 at 16:11, marc marc wrote:
> >> On 26.01.18 at 15:48, Matej Lieskovský wrote:
> >>> Several Slavic languages have rather formal rules about line breaks.
> >>
> >> It depends on whether it is a grammar rule or a "char".
> >> In French, there is a rule for how to break a word at the end of a line.
> >> Since it's a grammar rule, I don't see any point in adding a character
> >> between syllables to describe it. It's up to the renderer
> >> to know when it can break, if people want this feature.
> >> I know nothing about your language, but I feel it looks the same.
> >> If my understanding is correct, I am in favour of not putting
> >> this "nonbreakable" information into a value and moving it to the app
> >> code that needs it (which? Do you have values so long that they need
> >> to be broken across several lines?)
> >>
> >> Regards,
> >> Marc
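One small piece of the Unicode awareness asked for above is making nonbreakable spaces visible, since the quoted message notes that they are invisible in JOSM. The following is only a minimal Python sketch of that idea, not how any OSM editor works: the helper names reveal_no_break and has_no_break, the choice of marker character, and the decision to cover U+202F as well as U+00A0 are all assumptions made for illustration.

```python
# Hypothetical helpers for illustration; not part of JOSM, iD or Overpass.
NBSP = "\u00a0"         # NO-BREAK SPACE
NARROW_NBSP = "\u202f"  # NARROW NO-BREAK SPACE
MARKER = "\u237d"       # SHOULDERED OPEN BOX, used here as a visible stand-in


def reveal_no_break(name: str) -> str:
    """Return the name with no-break spaces replaced by a visible marker."""
    return name.replace(NBSP, MARKER).replace(NARROW_NBSP, MARKER)


def has_no_break(name: str) -> bool:
    """Report whether the name contains any no-break space."""
    return NBSP in name or NARROW_NBSP in name


if __name__ == "__main__":
    # The two names look identical on screen but differ in one character.
    for name in ["V\u00a0jámě", "V jámě"]:
        print(has_no_break(name), reveal_no_break(name))
```

A check like has_no_break could also be used to list the affected names before anyone decides whether to keep or remove the characters.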
Re: [Tagging] Nonbreakable spaces in name tags
@marc: I just realized - I'm not talking about breaking words between
syllables but about breaking lines between words. It is not adding a
character, just using a nonbreakable version of a space. Sorry if I'm
not being clear.

On 26 January 2018 at 16:47, Matej Lieskovský wrote:
> In Czech, a nonbreakable space should follow any single-letter
> preposition or conjunction and academic or military titles. A
> nonbreakable space should also be used due to some common
> contractions, between a number and a unit, and around some punctuation
> marks.
>
> I noticed that some Overpass queries were not returning some elements
> - that is how I found out that we actually have a rather large number
> of nonbreakable spaces in the data.
>
> Nonbreakable spaces are currently quite troublesome - not all
> consumers actually use Unicode collation, they are invisible in JOSM
> and they are not exactly easy to input. Also, the chance that we
> convince all contributors to use them correctly is exactly zero. Along
> with this potentially being "tagging for the renderer", there are many
> calls for a mass-removal.
>
> On the other hand, there is software that actually handles Unicode
> collation well and it does make the correct rendering of names an
> order of magnitude easier. Leaving this up to the renderer sounds
> logical, but imagine forcing every renderer to figure out what
> language any given name is in and then running the appropriate
> subprogram to fill in the nonbreakable spaces. This could require
> semantic analysis due to the need to add a nonbreakable space after
> the "V" in "V jámě" (preposition) but before the "V" in "Jiří V."
> (Roman ordinal number) and after the "V." in "V. Špidla" (contraction
> of name (and yes, there are cases when you should use a contraction)).
>
> Nonbreakable spaces are strange - you cannot reliably tell if they are
> used OTG (but in some cases you can), official documents often ignore
> them (leaving them up to the automated systems in office software, so
> they do occur sometimes) and the rules governing them are older than
> computers, so asking if they are a rule or a character is... dubious.
>
> And yes, we do have really long names of things. Names of POIs named
> after people are a common use case.
>
> Matej
>
> On 26 January 2018 at 16:11, marc marc wrote:
>> On 26.01.18 at 15:48, Matej Lieskovský wrote:
>>> Several Slavic languages have rather formal rules about line breaks.
>>
>> It depends on whether it is a grammar rule or a "char".
>> In French, there is a rule for how to break a word at the end of a line.
>> Since it's a grammar rule, I don't see any point in adding a character
>> between syllables to describe it. It's up to the renderer
>> to know when it can break, if people want this feature.
>> I know nothing about your language, but I feel it looks the same.
>> If my understanding is correct, I am in favour of not putting
>> this "nonbreakable" information into a value and moving it to the app
>> code that needs it (which? Do you have values so long that they need
>> to be broken across several lines?)
>>
>> Regards,
>> Marc
Re: [Tagging] Nonbreakable spaces in name tags
In Czech, a nonbreakable space should follow any single-letter
preposition or conjunction and academic or military titles. A
nonbreakable space should also be used due to some common
contractions, between a number and a unit, and around some punctuation
marks.

I noticed that some Overpass queries were not returning some elements
- that is how I found out that we actually have a rather large number
of nonbreakable spaces in the data.

Nonbreakable spaces are currently quite troublesome - not all
consumers actually use Unicode collation, they are invisible in JOSM
and they are not exactly easy to input. Also, the chance that we
convince all contributors to use them correctly is exactly zero. Along
with this potentially being "tagging for the renderer", there are many
calls for a mass-removal.

On the other hand, there is software that actually handles Unicode
collation well and it does make the correct rendering of names an
order of magnitude easier. Leaving this up to the renderer sounds
logical, but imagine forcing every renderer to figure out what
language any given name is in and then running the appropriate
subprogram to fill in the nonbreakable spaces. This could require
semantic analysis due to the need to add a nonbreakable space after
the "V" in "V jámě" (preposition) but before the "V" in "Jiří V."
(Roman ordinal number) and after the "V." in "V. Špidla" (contraction
of name (and yes, there are cases when you should use a contraction)).

Nonbreakable spaces are strange - you cannot reliably tell if they are
used OTG (but in some cases you can), official documents often ignore
them (leaving them up to the automated systems in office software, so
they do occur sometimes) and the rules governing them are older than
computers, so asking if they are a rule or a character is... dubious.

And yes, we do have really long names of things. Names of POIs named
after people are a common use case.

Matej

On 26 January 2018 at 16:11, marc marc wrote:
> On 26.01.18 at 15:48, Matej Lieskovský wrote:
>> Several Slavic languages have rather formal rules about line breaks.
>
> It depends on whether it is a grammar rule or a "char".
> In French, there is a rule for how to break a word at the end of a line.
> Since it's a grammar rule, I don't see any point in adding a character
> between syllables to describe it. It's up to the renderer
> to know when it can break, if people want this feature.
> I know nothing about your language, but I feel it looks the same.
> If my understanding is correct, I am in favour of not putting
> this "nonbreakable" information into a value and moving it to the app
> code that needs it (which? Do you have values so long that they need
> to be broken across several lines?)
>
> Regards,
> Marc
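The first rule in the message above (a nonbreakable space after a single-letter preposition or conjunction) is mechanical enough to sketch in Python, and the sketch also shows where a purely mechanical approach stops. Everything here is an assumption for illustration: the function name tie_single_letters is hypothetical, the letter set (k, s, v, z, o, u, a, i) is my reading of which Czech words qualify, and this is not the Vlna program or any renderer's actual logic.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE

# Assumed set of Czech single-letter prepositions and conjunctions,
# in both cases. A real tool would need a vetted list.
_SINGLE_LETTER = re.compile(r"\b([KkSsVvZzOoUuAaIi]) ")


def tie_single_letters(name: str) -> str:
    """Replace the space after a standalone single-letter word with NBSP.

    Only the "letter followed by a plain space" pattern is handled.
    Cases such as "Jiří V." (space BEFORE the numeral) or "V. Špidla"
    (space after an abbreviation) are deliberately left untouched,
    because telling them apart needs more than a regular expression.
    """
    return _SINGLE_LETTER.sub(lambda m: m.group(1) + NBSP, name)


if __name__ == "__main__":
    for name in ["V jámě", "Jiří V.", "V. Špidla"]:
        print(repr(tie_single_letters(name)))
```

Note that the same pattern would also tie a Roman numeral "V" that happens to be followed by a space, which is exactly the kind of ambiguity the message says calls for semantic analysis.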
Re: [Tagging] Nonbreakable spaces in name tags
On 26.01.18 at 15:48, Matej Lieskovský wrote:
> Several Slavic languages have rather formal rules about line breaks.

It depends on whether it is a grammar rule or a "char".
In French, there is a rule for how to break a word at the end of a line.
Since it's a grammar rule, I don't see any point in adding a character
between syllables to describe it. It's up to the renderer
to know when it can break, if people want this feature.
I know nothing about your language, but I feel it looks the same.
If my understanding is correct, I am in favour of not putting
this "nonbreakable" information into a value and moving it to the app
code that needs it (which? Do you have values so long that they need
to be broken across several lines?)

Regards,
Marc
Re: [Tagging] Nonbreakable spaces in name tags
Can you elaborate a bit more?

On 26-01-2018 at 17:48, Matej Lieskovský wrote:
> Greetings!
>
> Several Slavic languages have rather formal rules about line breaks.
> We in Czechia have a few contributors who take the time to add
> nonbreakable spaces to names that "need" them. Needless to say, the
> current situation is rather inconsistent, with nonbreakable spaces
> occurring in the data but nowhere near reliably. The local talk
> is also divided on the topic of whether nonbreakable spaces should be
> encouraged or removed.
>
> We know that at least some renderers (including osm.org) actually make
> use of the nonbreakable spaces. Nominatim does Unicode collation,
> handling nonbreakable spaces well. Overpass does not, and its
> suspicious behaviour was what alerted us to the problem in the first
> place.
>
> Both having and not having nonbreakable spaces have their pros and
> cons. The current state of uncertainty is the worst of both worlds. In
> an attempt to find a resolution and prevent an edit war, we reached out
> to the DWG, which did not resolve the dispute. We now ask for opinions
> here.
>
> Thank you in advance,
> Matej Lieskovský
>
> PS: The rules are formal enough that there exists a 1997 program
> "Vlna" ("Tilde") that can add nonbreakable spaces to TeX source files
> and is commonly used for important documents.
[Tagging] Nonbreakable spaces in name tags
Greetings!

Several Slavic languages have rather formal rules about line breaks.
We in Czechia have a few contributors who take the time to add
nonbreakable spaces to names that "need" them. Needless to say, the
current situation is rather inconsistent, with nonbreakable spaces
occurring in the data but nowhere near reliably. The local talk
is also divided on the topic of whether nonbreakable spaces should be
encouraged or removed.

We know that at least some renderers (including osm.org) actually make
use of the nonbreakable spaces. Nominatim does Unicode collation,
handling nonbreakable spaces well. Overpass does not, and its
suspicious behaviour was what alerted us to the problem in the first
place.

Both having and not having nonbreakable spaces have their pros and
cons. The current state of uncertainty is the worst of both worlds. In
an attempt to find a resolution and prevent an edit war, we reached out
to the DWG, which did not resolve the dispute. We now ask for opinions
here.

Thank you in advance,
Matej Lieskovský

PS: The rules are formal enough that there exists a 1997 program
"Vlna" ("Tilde") that can add nonbreakable spaces to TeX source files
and is commonly used for important documents.
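The matching problem described above (a search typed with an ordinary space failing to find a name stored with a nonbreakable space) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: fold_spaces and names_match are hypothetical helpers, the set of characters folded is my own choice, and nothing here describes how Nominatim or Overpass is actually implemented.

```python
# Hypothetical helpers for illustration; not taken from any OSM tool.
# Map no-break space variants (by code point) to an ordinary space.
_NO_BREAK_TO_SPACE = {
    0x00A0: " ",  # NO-BREAK SPACE
    0x202F: " ",  # NARROW NO-BREAK SPACE
}


def fold_spaces(name: str) -> str:
    """Return the name with no-break spaces replaced by plain spaces."""
    return name.translate(_NO_BREAK_TO_SPACE)


def names_match(stored: str, query: str) -> bool:
    """Compare two names while ignoring the breakable/no-break distinction."""
    return fold_spaces(stored) == fold_spaces(query)


if __name__ == "__main__":
    stored = "V\u00a0jámě"  # as a contributor might have entered it
    query = "V jámě"        # as a user would normally type it
    print(stored == query)              # plain comparison
    print(names_match(stored, query))   # comparison after folding
```

Folding at comparison time leaves the stored value untouched, so a consumer can match names this way whether or not the community decides to keep the nonbreakable spaces in the data.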