Re: [Tagging] Nonbreakable spaces in name tags (Pander)

2018-02-01 Thread Johnparis
Unicode has a non-breaking hyphen

U+2011
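
For reference, a small Python sketch that prints the code points mentioned in this thread with their Unicode names. It uses only the standard `unicodedata` module; the `CODE_POINTS` selection and the helper name `describe` are just for illustration.

```python
import unicodedata

# Code points mentioned in this thread (the selection is illustrative).
CODE_POINTS = [0x00A0, 0x00AD, 0x2011]

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

for cp in CODE_POINTS:
    print(describe(cp))
```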




On Feb 1, 2018 13:03, <tagging-requ...@openstreetmap.org> wrote:


Message: 1
Date: Wed, 31 Jan 2018 22:03:29 +0100
From: Pander <pan...@users.sourceforge.net>
To: tagging@openstreetmap.org
Subject: Re: [Tagging] Nonbreakable spaces in name tags
Message-ID:
<503386b9-baba-930d-ecff-1ea03c801...@users.sourceforge.net>
Content-Type: text/plain; charset=utf-8



On 01/31/2018 08:09 PM, Matej Lieskovský wrote:
> @Marc:
> Nominatim handles nbsp well.
> Renderers seem to either ignore it or make use of it.
> Most editors seem to handle it well, but whitespace highlighting would
> be welcome.
> Overpass... theoretically, it is doing exactly what it should be
> doing. Somehow making it simpler to create a regex that does Unicode
> collation would be nice.
>
> Is there anything that literally breaks? I don't think not doing
> Unicode collation at all is an excuse.
Is there also a case for non-breaking hyphens? For example
   's-Hertogenbosch (a place in the Netherlands)

is hyphenated as

   's-
   Hertogenbosch

which is incorrect. In Dutch you are not allowed to split off fewer than
two letters. Apparently the apostrophe is counted as a letter too. It
should remain one word or, when really needed, be hyphenated as

   's-Hertogen-
   bosch

>
> On 31 January 2018 at 19:43, Simon Poole <si...@poole.ch> wrote:
>>
>> On 31.01.2018 at 19:33, Simon Poole wrote:
>>> IMHO we should in general treat all unicode space variants as a nomal
>> that should have been "normal"




___
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Pander


On 01/31/2018 08:09 PM, Matej Lieskovský wrote:
> @Marc:
> Nominatim handles nbsp well.
> Renderers seem to either ignore it or make use of it.
> Most editors seem to handle it well, but whitespace highlighting would
> be welcome.
> Overpass... theoretically, it is doing exactly what it should be
> doing. Somehow making it simpler to create a regex that does Unicode
> collation would be nice.
>
> Is there anything that literally breaks? I don't think not doing
> Unicode collation at all is an excuse.
Is there also a case for non-breaking hyphens? For example
   's-Hertogenbosch (a place in the Netherlands)

is hyphenated as

   's-
   Hertogenbosch

which is incorrect. In Dutch you are not allowed to split off fewer than
two letters. Apparently the apostrophe is counted as a letter too. It
should remain one word or, when really needed, be hyphenated as

   's-Hertogen-
   bosch
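
A minimal Python sketch of how such a name could be encoded, assuming the data consumer honours U+2011 (non-breaking hyphen) and U+00AD (soft hyphen). Whether any given OSM renderer does so is not something this sketch checks, and the helper name `protect_name` is hypothetical.

```python
NB_HYPHEN = "\u2011"    # NON-BREAKING HYPHEN: no line break allowed here
SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN: optional break point

def protect_name(prefix: str, first: str, rest: str) -> str:
    """Join prefix and name with a non-breaking hyphen and mark the one
    permitted break inside the name with a soft hyphen."""
    return f"{prefix}{NB_HYPHEN}{first}{SOFT_HYPHEN}{rest}"

name = protect_name("'s", "Hertogen", "bosch")
print([f"U+{ord(c):04X}" for c in name if ord(c) > 0x7F])

# Stripping both marks gives back the plain spelling for comparison.
plain = name.replace(NB_HYPHEN, "-").replace(SOFT_HYPHEN, "")
print(plain)
```

The last two lines matter for search: a tool that compares names would need to fold the special characters back to the plain form first.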

>
> On 31 January 2018 at 19:43, Simon Poole  wrote:
>>
>> On 31.01.2018 at 19:33, Simon Poole wrote:
>>> IMHO we should in general treat all unicode space variants as a nomal
>> that should have been "normal"


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Paul Allen
On Wed, Jan 31, 2018 at 6:49 PM, marc marc wrote:

> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.
>

They are not spelling rules.  And your comment implies that people shouldn't
try to spell names correctly.  If "spelling rules have no place in OSM tags"
then there can be no objection to mapping the capital of the UK as "Lundun,"
etc.  There is every reason to hope that mappers will spell names correctly
according to the rules of the local language.  So spelling rules do have a
place in the values of free-form OSM tags such as name=*.

These are, in fact, local typographical conventions.  Almost as important as
local spelling rules.  Hyphenation conventions, for example, can be quite
complex.  Even in English, it would be undesirable for "Therapist's Lane" to
be hyphenated as "The- rapist's Lane."

> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect line
> break.
>

I'd be very scathing if there is *any* widely-used OSM tool that does not
handle Unicode correctly.  It's been over 40 years since 7-bit ASCII could be
considered adequate.  A couple of decades since operating systems did not
handle Unicode as standard.

There are specifications of how Unicode strings should be compared.
Programming libraries which follow those specifications shouldn't suffer the
problems you anticipate.  Programming libraries which don't follow Unicode
specifications are *broken*.

You appear not to know that correct Unicode handling is essential for many
languages where a single glyph may be composed of two or more combining
characters.  Without correct Unicode handling it is impossible to represent
names correctly in those languages.
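
To illustrate the point about combining characters, here is a minimal Python sketch using the standard `unicodedata` module. It stores the same name once precomposed and once with combining marks; a naive comparison fails, while a comparison after NFC normalisation succeeds. The function name `canonically_equal` is only for illustration, and full Unicode collation does considerably more than this.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "Ji\u0159\u00ed"     # ř and í as single code points
decomposed = "Jir\u030ci\u0301"    # r + combining caron, i + combining acute

print(precomposed == decomposed)                    # naive comparison
print(canonically_equal(precomposed, decomposed))   # normalised comparison
```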

The correct course of action is to check that widely-used OSM tools handle
Unicode correctly (which they should, anyway) and fix them if they do not.

-- 
Paul


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Matej Lieskovský
@Marc:
Nominatim handles nbsp well.
Renderers seem to either ignore it or make use of it.
Most editors seem to handle it well, but whitespace highlighting would
be welcome.
Overpass... theoretically, it is doing exactly what it should be
doing. Somehow making it simpler to create a regex that does Unicode
collation would be nice.

Is there anything that literally breaks? I don't think not doing
Unicode collation at all is an excuse.
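
As a rough idea of what such a regex helper could look like, here is a Python sketch that turns a name into a pattern accepting either an ordinary space or a no-break space between words. The helper `name_pattern` and the two-character space class are assumptions for illustration; whether a given Overpass instance accepts the resulting pattern unchanged would need to be checked.

```python
import re

# Space variants to accept; extend as needed (this list is an assumption).
SPACE_CLASS = "[ \u00a0]"

def name_pattern(name: str) -> str:
    """Build a regex that matches `name` with any mix of ASCII spaces
    and no-break spaces between its words."""
    words = re.split("[ \u00a0]+", name.strip())
    return "^" + SPACE_CLASS.join(re.escape(w) for w in words) + "$"

pattern = name_pattern("V jámě")
print(re.search(pattern, "V jámě") is not None)
print(re.search(pattern, "V\u00a0jámě") is not None)
```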

On 31 January 2018 at 19:43, Simon Poole  wrote:
>
>
> On 31.01.2018 at 19:33, Simon Poole wrote:
>> IMHO we should in general treat all unicode space variants as a nomal
> that should have been "normal"


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Yuri Astrakhan
Marc, I think that spelling rules are part of a specific language's
rules/naming, and should be allowed in tags. If some tool does not
normalize Unicode for searching, the tool should be fixed (shouldn't be too
hard in most cases). Most search engines do these kinds of normalizations
before indexing anyway. Fixing Unicode normalization is far simpler than
building a grammar engine that knows how to break words in every language,
maintaining a huge set of exceptions for each (there are always
exceptions in these things), and attaching this engine to every rendering
system.
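
A minimal sketch of the kind of normalization meant here, in Python with the standard `unicodedata` module. The function name `search_key` is hypothetical, and the choice of NFKC plus case folding is an assumption about what a search index might want, not a description of what any particular OSM tool does.

```python
import unicodedata

def search_key(name: str) -> str:
    """Fold a name into a key for indexing and lookup only; the stored
    tag value is left untouched."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # str.split() with no argument splits on any Unicode whitespace.
    return " ".join(folded.split())

print(search_key("V\u00a0jámě") == search_key("v  jámě"))
```

The tag keeps its no-break space for renderers, while a lookup with an ordinary space still finds it.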

On Wed, Jan 31, 2018 at 1:49 PM, marc marc <marc_marc_...@hotmail.com>
wrote:

> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.
> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect line
> break.
>

Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Simon Poole


On 31.01.2018 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal
that should have been "normal"






Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread marc marc
I remain convinced that spelling rules have no place in osm tags
even if it would be convenient.
If they are to be added, the primary tools should first be asked to
manage them before considering their use. Otherwise the slightest search
on a street name can fail, which is worse than having an incorrect line
break.

On 31. 01. 18 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal
> ASCII space for processing and comparison purposes and leave it at that.
> 
> And we don't have the issues just in name tags, see
> 
> SKIP :
> {
>    "\r"
> | "\n"
> | " "
> | "\t"
> | "\u200A"
> | "\u2009"
> | "\u00A0"
> | "\u2008"
> | "\u2002"
> | "\u2007"
> | "\u3000"
> | "\u2003"
> | "\u2006"
> | "\u2005"
> | "\u2004"
> }
> 
> from my OH parser.
> 
> Simon


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Simon Poole
IMHO we should in general treat all unicode space variants as a nomal
ASCII space for processing and comparison purposes and leave it at that.

And we don't have the issues just in name tags, see

SKIP :
{
  "\r"
| "\n"
| " "
| "\t"
| "\u200A"
| "\u2009"
| "\u00A0"
| "\u2008"
| "\u2002"
| "\u2007"
| "\u3000"
| "\u2003"
| "\u2006"
| "\u2005"
| "\u2004"
}

from my OH parser.
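
For tools not built on a parser generator, the same idea can be sketched in Python: map the space-like code points from the SKIP list above to a plain ASCII space before comparing. The names `SPACE_VARIANTS` and `normalise_spaces` are made up for this sketch, which is not taken from the OH parser.

```python
# The space-like code points from the SKIP list above.
SPACE_VARIANTS = (
    "\u00a0\u2002\u2003\u2004\u2005\u2006"
    "\u2007\u2008\u2009\u200a\u3000"
)

# Map every variant to a plain ASCII space.
TO_ASCII_SPACE = str.maketrans({ch: " " for ch in SPACE_VARIANTS})

def normalise_spaces(value: str) -> str:
    """Replace Unicode space variants with U+0020 for comparison."""
    return value.translate(TO_ASCII_SPACE)

print(normalise_spaces("Mo\u2009-\u2009Fr\u00a008:00-17:00"))
```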

Simon


On 31.01.2018 at 16:25, Matej Lieskovský wrote:
> So... can we reach some conclusion?
>
> I have a particular situation I need to resolve - some streets consist
> of ways that (among other, meaningful differences) vary in their usage
> of non-breakable spaces. Here are the possible solutions:
>
> 1) Start removing nbsp from local data
> 2) In case of conflict, prefer the variant without nbsp
> 3) In case of conflict, choose the more common variant
> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
> 5) Start adding nbsp to local data
> 6) Leave things as they are
>
> To be perfectly honest, unless we can agree on whether nbsp should be
> encouraged or removed, I will use option 4. Option 6 (status quo) is
> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
> the data (and too much work for now), and an eventual conversion from
> anything to 1 is trivial (which does not work for converting from 2 or
> 3 to 5). Since option 4 at least makes entire streets have the same
> name without loss of data or adding nbsp to streets that are ok so
> far, I consider it to be the best compromise in case of no consensus.
>
> Matej Lieskovský
>
> PS: I am starting to suspect that we might need a wiki page concerning
> Unicode usage in general (nbsp, soft hyphens, roman numerals,
> normalisation...). The link below does seem a little underwhelming:
> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Matej Lieskovský
So... can we reach some conclusion?

I have a particular situation I need to resolve - some streets consist
of ways that (among other, meaningful differences) vary in their usage
of non-breakable spaces. Here are the possible solutions:

1) Start removing nbsp from local data
2) In case of conflict, prefer the variant without nbsp
3) In case of conflict, choose the more common variant
4) In case of conflict, prefer the variant with (correctly placed) nbsp
5) Start adding nbsp to local data
6) Leave things as they are

To be perfectly honest, unless we can agree on whether nbsp should be
encouraged or removed, I will use option 4. Option 6 (status quo) is
pretty much the worst of both worlds, 5 is undeniably adding nbsp to
the data (and too much work for now), and an eventual conversion from
anything to 1 is trivial (which does not work for converting from 2 or
3 to 5). Since option 4 at least makes entire streets have the same
name without loss of data or adding nbsp to streets that are ok so
far, I consider it to be the best compromise in case of no consensus.
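
Option 4 can be sketched roughly as follows, in Python. This assumes the name variants of one street have already been collected from its ways; the function name `pick_street_name` is hypothetical, the tie-break by frequency is an added assumption, and the code cannot judge whether an nbsp is "correctly placed".

```python
from collections import Counter
from typing import List

NBSP = "\u00a0"

def pick_street_name(variants: List[str]) -> str:
    """Option 4: if the variants differ only in nbsp usage, prefer one
    that contains nbsp; otherwise report the conflict."""
    plain = {v.replace(NBSP, " ") for v in variants}
    if len(plain) != 1:
        raise ValueError(f"names differ beyond nbsp: {sorted(plain)}")
    with_nbsp = [v for v in variants if NBSP in v]
    if not with_nbsp:
        return variants[0]
    # Several nbsp variants: take the most common one (an assumption).
    return Counter(with_nbsp).most_common(1)[0][0]

ways = ["V jámě", "V\u00a0jámě", "V jámě"]
print(repr(pick_street_name(ways)))
```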

Matej Lieskovský

PS: I am starting to suspect that we might need a wiki page concerning
Unicode usage in general (nbsp, soft hyphens, roman numerals,
normalisation...). The link below does seem a little underwhelming:
https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters

On 27 January 2018 at 01:50, Johnparis <ok...@johnfreed.com> wrote:
> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>
> HTML has &shy; for soft hyphens (Unicode U+00AD).


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Johnparis
HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).

HTML has &shy; for soft hyphens (Unicode U+00AD).

--

Message: 2
Date: Fri, 26 Jan 2018 23:04:32 +0100
From: Richard <ricoz@gmail.com>
To: "Tag discussion, strategy and related tools"
<tagging@openstreetmap.org>
Subject: Re: [Tagging] Nonbreakable spaces in name tags
Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
Content-Type: text/plain; charset=iso-8859-1

On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
> Greetings!
>
> Several Slavic languages have rather formal rules about line breaks.

the problem is much broader, sooner or later OSM rendering will hit word
splitting.

> PS: The rules are formal enough that there exists a 1997 program
> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> and is commonly used for important documents.

probably not all OSM languages have such tools and even if they have it can
be tricky to determine which language rules to apply.

I would think..
* if someone wants to use nonbreakable spaces he should be allowed to do
  so and tools should tolerate it (not necessarily understand but not
  break)
* if someone wants to use explicit word-split marks/soft-hyphens
  this should be somehow allowed too.

Otherwise the software should try to do its best and apply heuristics to
avoid splitting lines in wrong places.
Not splitting 1000 034 should be obvious, Roman numerals as well. Prefer not
splitting around "lonely" characters.
The rendering software can also compare texts with name tags and prefer not
to split names at all.

Richard


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Richard
On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
> Greetings!
> 
> Several Slavic languages have rather formal rules about line breaks.

the problem is much broader, sooner or later OSM rendering will hit word 
splitting.

> PS: The rules are formal enough that there exists a 1997 program
> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> and is commonly used for important documents.

probably not all OSM languages have such tools and even if they have it can
be tricky to determine which language rules to apply.

I would think..
* if someone wants to use nonbreakable spaces he should be allowed to do
  so and tools should tolerate it (not necessarily understand but not
  break)
* if someone wants to use explicit word-split marks/soft-hyphens
  this should be somehow allowed too.

Otherwise the software should try to do its best and apply heuristics to avoid
splitting lines in wrong places.
Not splitting 1000 034 should be obvious, Roman numerals as well. Prefer not
splitting around "lonely" characters.
The rendering software can also compare texts with name tags and prefer not
to split names at all.
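
Two of these heuristics could be sketched like this in Python. It is only an illustration under simple assumptions: the function name `protect_breaks` is made up, only ordinary spaces are considered, and the "lonely character" rule cannot tell a preposition from a Roman numeral.

```python
import re

NBSP = "\u00a0"

def protect_breaks(text: str) -> str:
    """Swap ordinary spaces for no-break spaces where a line break
    would look wrong (two heuristics only)."""
    # 1. Keep digit groups together: "1000 034".
    text = re.sub(r"(?<=\d) (?=\d)", NBSP, text)
    # 2. Do not leave a single letter or digit alone at the end of a line.
    text = re.sub(r"(?<!\w)(\w) (?=\w)", r"\1" + NBSP, text)
    return text

# Show the result with "~" standing in for the no-break space.
print(protect_breaks("V jámě 1000 034").replace(NBSP, "~"))
```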

Richard



Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Matej Lieskovský
@Erkin: Yes, but the full form can contain a contraction. There is a
public transport stop in Prague called "I. P. Pavlova" and (unlike the
nearby "náměstí Ivana Petroviče Pavlova") it is ALWAYS written as an
abbreviation. Signs, official documents, spoken language... there is a
point after which it would be wrong to expand the name.

https://www.openstreetmap.org/node/25936016

Matej

On 26 January 2018 at 19:09, Erkin Alp Güney  wrote:
>> (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.
>
> Yours, faithfully
> Erkin Alp


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Richard
On Fri, Jan 26, 2018 at 09:09:12PM +0300, Erkin Alp Güney wrote:
> > (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.

this is about the case where the official name is an abbreviation,
happens often enough.

Richard



Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Martin Koppenhoefer
2018-01-26 19:09 GMT+01:00 Erkin Alp Güney:

> > (and yes, there are cases when you should use a contraction)
> name=* is full form. Not abbreviated in any way.

There are also other name tags, in particular "official_name", which are
also about full (and official) forms. E.g.

   name=Italia
   official_name=Repubblica Italiana


Cheers,
Martin


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Erkin Alp Güney
> (and yes, there are cases when you should use a contraction)
name=* is full form. Not abbreviated in any way.

Yours, faithfully
Erkin Alp



Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Andy Townsend

On 26/01/2018 14:48, Matej Lieskovský wrote:

> ... we reached out
> to the DWG, which did not solve the dispute.


For completeness, I actually suggested posting here rather than have the 
DWG issue a "commandment" - I have some knowledge of Czech and Czech 
grammar but not much, and it made sense to have a wider discussion.


There's also a talk-cz discussion 
https://lists.openstreetmap.org/pipermail/talk-cz/2018-January/thread.html#18356 
(if you're not a fluent Czech reader you'll want to have both the 
original and translated version of that open together so you can see the 
original Czech abbreviations etc.).  There are also a bunch of Unicode 
references in github, such as 
https://github.com/openstreetmap/openstreetmap-website/search?q=unicode&type=Issues&utf8=%E2%9C%93
and in particular 
https://github.com/openstreetmap/openstreetmap-website/issues/1213 which 
describes how some Unicode characters are handled now.


 Best Regards,
Andy (DWG member and occasional orderer of "Tři piva prosím")




Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread marc marc
Your explanation clearly spoke of a space between words.
I was referring to syllables only as an example.
French also has non-breakable spaces, as in "M. Dupont".
I don't think OSM is the right place to put grammar rules.
imho we need a place where all tools that need them can retrieve them.
This does not prevent you from asking that the tools treat a non-breakable
space as an alias for a space in queries,
in the same way that some tools are able to find an object
with the wrong name="St Pierre" if you request "Saint-Pierre".
But imho this remains a mistake, and error management is often desirable.

On 26. 01. 18 at 16:50, Matej Lieskovský wrote:
> @marc: I just realized - I'm not talking about breaking words between
> syllables but about breaking lines between words. It is not adding a
> character, just using a nonbreakable version of a space. Sorry if I'm
> not being clear.
> 
> On 26 January 2018 at 16:47, Matej Lieskovský wrote:
>> In Czech, a nonbreakable space should follow any single-letter
>> preposition or conjunction and academic or military titles. A
>> nonbreakable space should also be used due to some common
>> contractions, between a number and a unit, and around some punctuation
>> marks.
>>
>> I noticed that some Overpass queries were not returning some elements
>> - that is how I found out that we actually have a rather large number
>> of nonbreakable spaces in the data.
>>
>> Nonbreakable spaces are currently quite troublesome - not all
>> consumers actually use Unicode collation, it is invisible in JOSM and
>> it is not exactly easy to input. Also, the chance that we convince all
>> contributors to use it correctly is exactly zero. Along with this
>> potentially being "tagging for the renderer", there are many calls for
>> a mass-removal.
>>
>> On the other hand, there is software that actually handles Unicode
>> collation well and it does make the correct rendering of names an
>> order of magnitude easier. Leaving this up to the renderer sounds
>> logical, but imagine forcing every renderer to figure out what
>> language any given name is in and then running the appropriate
>> subprogram to fill in the nonbreakable spaces. This could require
>> semantic analysis due to the need to add a nonbreakable space after
>> the "V" in "V jámě" (preposition) but before the "V" in "Jiří V."
>> (roman ordinal number) and after the "V." in "V. Špidla" (contraction
>> of name (and yes, there are cases when you should use a contraction)).
>>
>> Nonbreakable spaces are strange - you cannot reliably tell if they are
>> used OTG (but in some cases you can), official documents often ignore
>> them (leaving them up to the automated systems in office software, so
>> they do occur sometimes) and the rules governing them are older than
>> computers, so asking if they are a rule or a character is... dubious.
>>
>> And yes, we do have really long names of things. Names of POIs named
>> after people are a common use case.
>>
>> Matej
>>
>> On 26 January 2018 at 16:11, marc marc  wrote:
>>> Le 26. 01. 18 à 15:48, Matej Lieskovský a écrit :
 Several Slavic languages have rather formal rules about line breaks.
>>>
>>> it depends on whether it is a grammar rule or a "char".
>>> In French, it is a rule to know how to cut a word at the end of a line.
>>> Since it's a grammar rule, I don't see any point in adding a character
>>> between syllables to describe it. it's up to the render
>>> to know when it can do it if ppl wants this feature.
>>> I know nothing about your language, but I feel it look like the same.
>>> If my understanding is correct, I am in favour of not putting
>>> this "nonbreakable" information into a value and moving it to app code
>>> that need it (witch ? have you so long value that's needed to break it
>>> in several line ?)
>>>
>>> Regards,
>>> Marc
>>> ___
>>> Tagging mailing list
>>> Tagging@openstreetmap.org
>>> https://lists.openstreetmap.org/listinfo/tagging
> 
> ___
> Tagging mailing list
> Tagging@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/tagging
> 

___
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Jo
I think it would be best to make the tools we use (JOSM, Overpass API, iD,
etc.) Unicode-aware, so they can handle this correctly.

Polyglot

2018-01-26 16:50 GMT+01:00 Matej Lieskovský:

> @marc: I just realized - I'm not talking about breaking words between
> syllables but about breaking lines between words. It is not adding a
> character, just using a nonbreakable version of a space. Sorry if I'm
> not being clear.


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Matej Lieskovský
@marc: I just realized - I'm not talking about breaking words between
syllables but about breaking lines between words. It is not adding a
character, just using a nonbreakable version of a space. Sorry if I'm
not being clear.



Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Matej Lieskovský
In Czech, a nonbreakable space should follow any single-letter
preposition or conjunction and any academic or military title. A
nonbreakable space should also be used in some common
abbreviations, between a number and a unit, and around some punctuation
marks.

I noticed that some Overpass queries were not returning some elements
- that is how I found out that we actually have a rather large number
of nonbreakable spaces in the data.
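
As an illustration of why such queries miss elements, here is a small
Python sketch (not Overpass QL). A pattern written with an ordinary space
does not match a value tagged with U+00A0, while a pattern that accepts
either character does. The function name space_tolerant_pattern is
hypothetical.

```python
import re


def space_tolerant_pattern(name: str) -> re.Pattern:
    """Compile a pattern in which each ordinary space also matches U+00A0."""
    parts = [re.escape(part) for part in name.split(" ")]
    return re.compile("[ \u00A0]".join(parts))


tagged = "V\u00A0jámě"  # value stored with a no-break space

# A literal-space pattern fails to match the tagged value.
print(re.fullmatch(re.escape("V jámě"), tagged) is None)

# The tolerant pattern matches both spellings.
print(space_tolerant_pattern("V jámě").fullmatch(tagged) is not None)
print(space_tolerant_pattern("V jámě").fullmatch("V jámě") is not None)
```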

Nonbreakable spaces are currently quite troublesome - not all
consumers actually use Unicode collation, they are invisible in JOSM and
they are not exactly easy to input. Also, the chance that we convince all
contributors to use them correctly is exactly zero. Since this is also
potentially "tagging for the renderer", there are many calls for
a mass-removal.
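
The invisibility problem can be sketched outside any editor: the snippet
below replaces each no-break space with a visible marker so that a reviewer
can see where one occurs. It is only an illustration; reveal_no_break is a
hypothetical name and this is not how JOSM or any other editor works.

```python
import unicodedata

# Characters that render like a plain space but forbid a line break.
NO_BREAK_CHARS = ("\u00A0", "\u202F")


def reveal_no_break(text: str) -> str:
    """Replace each no-break space with a visible <U+XXXX NAME> marker."""
    out = []
    for ch in text:
        if ch in NO_BREAK_CHARS:
            out.append(f"<U+{ord(ch):04X} {unicodedata.name(ch)}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal_no_break("V\u00A0jámě"))
```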

On the other hand, there is software that actually handles Unicode
collation well, and having the nonbreakable spaces in the data makes the
correct rendering of names an order of magnitude easier. Leaving this up
to the renderer sounds logical, but imagine forcing every renderer to
figure out what language any given name is in and then running the
appropriate subprogram to fill in the nonbreakable spaces. This could
require semantic analysis, because a nonbreakable space is needed after
the "V" in "V jámě" (preposition), before the "V" in "Jiří V."
(Roman ordinal numeral) and after the "V." in "V. Špidla" (abbreviation
of a given name; and yes, there are cases when you should use an
abbreviation).
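
To make the ambiguity concrete, here is a deliberately naive sketch of a
purely mechanical rule: bind every single-letter preposition or conjunction
to the following word. The word list is an assumption and incomplete, and
naive_bind is a hypothetical name. The rule handles "V jámě" but leaves the
other two cases unchanged, which is exactly where more than pattern
matching would be needed.

```python
import re

NBSP = "\u00A0"
# Assumed, incomplete list of Czech single-letter prepositions and conjunctions.
SINGLE_LETTER_WORDS = "KkSsVvZzOoUuAaIi"

# A standalone single letter, followed by one ordinary space and a word character.
_PATTERN = re.compile(rf"(?<!\w)([{SINGLE_LETTER_WORDS}]) (?=\w)")


def naive_bind(name: str) -> str:
    """Replace the space after a single-letter word with a no-break space."""
    return _PATTERN.sub(lambda m: m.group(1) + NBSP, name)


for example in ("V jámě", "Jiří V.", "V. Špidla"):
    result = naive_bind(example)
    # Show the result with the no-break space made visible as "~".
    print(example, "->", result.replace(NBSP, "~"))
```

In "Jiří V." the space that needs protecting comes before the "V", and in
"V. Špidla" it comes after the full stop, so this rule touches neither.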

Nonbreakable spaces are strange - you cannot reliably tell whether they
are used on the ground (OTG), though in some cases you can; official
documents often ignore them (leaving them up to the automated systems in
office software, so they do occur sometimes); and the rules governing them
are older than computers, so asking if they are a rule or a character
is... dubious.

And yes, we do have really long names of things. Names of POIs named
after people are a common use case.

Matej

On 26 January 2018 at 16:11, marc marc wrote:
> On 26. 01. 18 at 15:48, Matej Lieskovský wrote:
>> Several Slavic languages have rather formal rules about line breaks.
>
> It depends on whether it is a grammar rule or a "char".
> In French, a rule determines where a word may be split at the end of a line.
> Since it's a grammar rule, I don't see any point in adding a character
> between syllables to describe it. It's up to the renderer
> to know when it can break, if people want this feature.
> I know nothing about your language, but I feel it looks like the same case.
> If my understanding is correct, I am in favour of not putting
> this "nonbreakable" information into a value and instead moving it to the
> app code that needs it (which apps? Do you have values so long that they
> need to be broken across several lines?)
>
> Regards,
> Marc


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread marc marc
On 26. 01. 18 at 15:48, Matej Lieskovský wrote:
> Several Slavic languages have rather formal rules about line breaks.

It depends on whether it is a grammar rule or a "char".
In French, a rule determines where a word may be split at the end of a line.
Since it's a grammar rule, I don't see any point in adding a character
between syllables to describe it. It's up to the renderer
to know when it can break, if people want this feature.
I know nothing about your language, but I feel it looks like the same case.
If my understanding is correct, I am in favour of not putting
this "nonbreakable" information into a value and instead moving it to the
app code that needs it (which apps? Do you have values so long that they
need to be broken across several lines?)

Regards,
Marc


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-26 Thread Erkin Alp Güney
Can you elaborate a bit more?


On 26-01-2018 at 17:48, Matej Lieskovský wrote:
> Greetings!
>
> Several Slavic languages have rather formal rules about line breaks.
> We in Czechia have a few contributors who take the time to add
> nonbreakable spaces to names that "need" them. Needless to say, the
> current situation is rather inconsistent, with nonbreakable spaces
> occurring in the data but nowhere near being reliable. The local talk
> is also divided on the topic of whether nonbreakable spaces should be
> encouraged or removed.
>
> We know that at least some renderers (including osm.org) actually make
> use of the nonbreakable spaces. Nominatim does Unicode collation,
> handling nonbreakable spaces well. Overpass does not and its
> suspicious behaviour was what alerted us to the problem in the first
> place.
>
> Both having and not having nonbreakable spaces has its pros and cons.
> The current state of uncertainty is the worst of both worlds. In an
> attempt to find a resolution and prevent an edit war, we reached out
> to the DWG, which did not solve the dispute. We now ask for opinions
> here.
>
> Thank you in advance,
> Matej Lieskovský
>
> PS: The rules are formal enough that there exists a 1997 program
> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> and is commonly used for important documents.