Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Pander


On 01/31/2018 08:09 PM, Matej Lieskovský wrote:
> @Marc:
> Nominatim handles nbsp well.
> Renderers seem to either ignore it or make use of it.
> Most editors seem to handle it well, but whitespace highlighting would
> be welcome.
> Overpass... theoretically, it is doing exactly what it should be
> doing. Somehow making it simpler to create a regex that does Unicode
> collation would be nice.
>
> Is there anything that literally breaks? I don't think not doing
> Unicode collation at all is an excuse.
Is there also a case for non-breaking hyphens? For example
   's-Hertogenbosch (a place in the Netherlands)

is hyphenated as

   's-
   Hertogenbosch

 which is incorrect. In Dutch you are not allowed to split off fewer than
two letters. Apparently the apostrophe is counted as a letter too. The name
should stay on one line or, when really needed, be hyphenated as

   's-Hertogen-
   bosch
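
A minimal sketch of how such a name could be marked up, assuming Python and
assuming that the consuming renderer honours U+2011 (NON-BREAKING HYPHEN) and
U+00AD (SOFT HYPHEN). The helper names are hypothetical and this is not an
agreed tagging practice:

```python
# Hypothetical helpers for illustration only.
NON_BREAKING_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN
SOFT_HYPHEN = "\u00ad"          # U+00AD SOFT HYPHEN, an invisible break hint


def protect_name(name: str, soft_break_after: str = "") -> str:
    """Swap ordinary hyphens for non-breaking ones and, optionally,
    add a soft hyphen right after the given leading fragment."""
    protected = name.replace("-", NON_BREAKING_HYPHEN)
    if soft_break_after:
        fragment = soft_break_after.replace("-", NON_BREAKING_HYPHEN)
        if protected.startswith(fragment):
            protected = fragment + SOFT_HYPHEN + protected[len(fragment):]
    return protected


def plain(name: str) -> str:
    """Undo the markup so the name compares equal to plain ASCII input."""
    return name.replace(NON_BREAKING_HYPHEN, "-").replace(SOFT_HYPHEN, "")


marked = protect_name("'s-Hertogenbosch", soft_break_after="'s-Hertogen")
```

The `plain` helper is there because any tool that searches or compares names
would need to fold these characters back before matching.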

>
> On 31 January 2018 at 19:43, Simon Poole  wrote:
>>
>> On 31.01.2018 at 19:33, Simon Poole wrote:
>>> IMHO we should in general treat all unicode space variants as a nomal
>> that should have been "normal"
>>
>>
>>
>> ___
>> Tagging mailing list
>> Tagging@openstreetmap.org
>> https://lists.openstreetmap.org/listinfo/tagging
>>


___
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Paul Allen
On Wed, Jan 31, 2018 at 6:49 PM, marc marc 
wrote:

> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.
>

They are not spelling rules.  And your comment implies that people
shouldn't try to spell names correctly.  If "spelling rules have no place
in OSM tags" then there can be no objection to mapping the capital of the
UK as "Lundun," etc.  There is every reason to hope that mappers will spell
names correctly according to the rules of the local language.  So spelling
rules do have a place in the values of free-form OSM tags such as name=*.

These are, in fact, local typographical conventions, almost as important
as local spelling rules.  Hyphenation conventions, for example, can be
quite complex.  Even in English, it would be undesirable for "Therapist's
Lane" to be hyphenated as "The- rapist's Lane."

> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect return to
> the line.
>

I'd be very scathing if there is *any* widely-used OSM tool that does not
handle Unicode correctly.  It's been over 40 years since 7-bit ASCII could
be considered adequate, and a couple of decades since operating systems
did not handle Unicode as standard.

There are specifications of how Unicode strings should be compared.
Programming libraries which follow those specifications shouldn't suffer
the problems you anticipate.  Programming libraries which don't follow
Unicode specifications are *broken*.

You appear not to know that correct Unicode handling is essential for many
languages where a single glyph may be composed of two or more combining
characters.  Without correct Unicode handling it is impossible to
represent names correctly in those
languages.
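
A minimal sketch of the combining-character point, assuming Python and its
standard `unicodedata` module. It only demonstrates canonical equivalence via
NFC normalisation, which is one part of Unicode-aware comparison and not full
collation; the function name is hypothetical:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC, so that composed
    and decomposed spellings of the same glyph compare equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "Lieskovsk\u00fd"      # final letter as one code point, U+00FD
decomposed = "Lieskovsky\u0301"   # "y" followed by U+0301 COMBINING ACUTE ACCENT

naive_equal = composed == decomposed          # code-point comparison
normalised_equal = same_text(composed, decomposed)
```

The two spellings render identically, yet a plain code-point comparison treats
them as different strings; normalising both sides first removes that
difference.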

The correct course of action is to check that widely-used OSM tools handle
Unicode correctly (which they should, anyway) and fix them if they do not.

-- 
Paul


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Matej Lieskovský
@Marc:
Nominatim handles nbsp well.
Renderers seem to either ignore it or make use of it.
Most editors seem to handle it well, but whitespace highlighting would
be welcome.
Overpass... theoretically, it is doing exactly what it should be
doing. Somehow making it simpler to create a regex that does Unicode
collation would be nice.

Is there anything that literally breaks? I don't think not doing
Unicode collation at all is an excuse.
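
A rough sketch of what "making it simpler to create a regex" could look like,
assuming Python's `re` module rather than Overpass's own regex dialect. The
helper name is hypothetical, and the set of space characters is taken from the
list Simon Poole posted in this thread:

```python
import re

# ASCII space, U+00A0, U+2002..U+200A and U+3000 (from the list in this thread).
SPACE_CLASS = "[ \u00a0\u2002-\u200a\u3000]"


def name_pattern(name: str) -> "re.Pattern":
    """Build a regex that matches `name` whichever listed space variant
    separates its words."""
    # str.split() with no argument also splits on U+00A0 and friends.
    words = [re.escape(word) for word in name.split()]
    return re.compile("^" + (SPACE_CLASS + "+").join(words) + "$")


pattern = name_pattern("náměstí T. G. Masaryka")
```

A query tool could build such a pattern behind the scenes, so that users type
ordinary spaces and still find names stored with nbsp.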

On 31 January 2018 at 19:43, Simon Poole  wrote:
>
>
> On 31.01.2018 at 19:33, Simon Poole wrote:
>> IMHO we should in general treat all unicode space variants as a nomal
> that should have been "normal"
>
>
>


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Yuri Astrakhan
Marc, I think that spelling rules are part of the specific language's
rules/naming, and should be allowed in tags. If some tool does not
normalize Unicode for searching, the tool should be fixed (shouldn't be too
hard in most cases). Most search engines do these kinds of normalizations
before indexing anyway. Fixing Unicode normalization is far simpler than
building a grammar engine that knows how to break words in every language,
and maintaining a huge set of exceptions for each (there are always
exceptions in these things), and attaching this engine to every rendering
system.
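
A minimal sketch of that kind of pre-indexing normalisation, assuming Python's
standard `unicodedata` module. The function name and the exact set of folding
steps are assumptions; a real search engine may do more or less than this:

```python
import unicodedata


def search_key(name: str) -> str:
    """Reduce a name to a form suitable for index lookups."""
    # NFKC folds compatibility characters; U+00A0 becomes a plain space.
    folded = unicodedata.normalize("NFKC", name)
    # Soft hyphens (U+00AD) are invisible break hints; drop them.
    folded = folded.replace("\u00ad", "")
    # Collapse runs of whitespace and ignore case differences.
    return " ".join(folded.split()).casefold()


with_nbsp = search_key("Karlovo\u00a0náměstí")
without_nbsp = search_key("karlovo  náměstí")
```

Both the stored name and the user's query would go through `search_key`, so
the tag value itself can keep its typographic characters.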

On Wed, Jan 31, 2018 at 1:49 PM, marc marc 
wrote:

> I remain convinced that spelling rules have no place in osm tags
> even if it would be convenient.
> If they are to be added, the primary tools should first be asked to
> manage them before considering their use. otherwise the slightest search
> on a street name can fail, it's worse than having an incorrect return to
> the line.
>
> On 31.01.18 at 19:33, Simon Poole wrote:
> > IMHO we should in general treat all unicode space variants as a nomal
> > ASCII space for processing and comparision purposes and leave it at that.
> >
> > And we don't have the issues just in name tags, see
> >
> > SKIP :
> > {
> >"\r"
> > | "\n"
> > | " "
> > | "\t"
> > | "\u200A"
> > | "\u2009"
> > | "\u00A0"
> > | "\u2008"
> > | "\u2002"
> > | "\u2007"
> > | "\u3000"
> > | "\u2003"
> > | "\u2006"
> > | "\u2005"
> > | "\u2004"
> > }
> >
> > from my OH parser.
> >
> > Simon
> >
> >
> >> On 31.01.2018 at 16:25, Matej Lieskovský wrote:
> >> So... can we reach some conclusion?
> >>
> >> I have a particular situation I need to resolve - some streets consist
> >> of ways that (among other, meaningful differences) vary in their usage
> >> of non-breakable spaces. Here are the possible solutions:
> >>
> >> 1) Start removing nbsp from local data
> >> 2) In case of conflict, prefer the variant without nbsp
> >> 3) In case of conflict, choose the more common variant
> >> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
> >> 5) Start adding nbsp to local data
> >> 6) Leave things as they are
> >>
> >> To be perfectly honest, unless we can agree on whether nbsp should be
> >> encouraged or removed, I will use option 4. Option 6 (status quo) is
> >> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
> >> the data (and too much work for now), and an eventual conversion from
> >> anything to 1 is trivial (which does not work for converting from 2 or
> >> 3 to 5). Since option 4 at least makes entire streets have the same
> >> name without loss of data or adding nbsp to streets that are ok so
> >> far, I consider it to be the best compromise in case of no consensus.
> >>
> >> Matej Lieskovský
> >>
> >> PS: I am starting to suspect that we might need a wiki page concerning
> >> Unicode usage in general (nbsp, soft hyphens, roman numerals,
> >> normalisation...). The link below does seem a little underwhelming:
> >> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
> >>
> >> On 27 January 2018 at 01:50, Johnparis  wrote:
> >>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
> >>>
> >>> HTML has &shy; for soft hyphens (Unicode U+00AD).
> >>>
> >>> --
> >>>
> >>> Message: 2
> >>> Date: Fri, 26 Jan 2018 23:04:32 +0100
> >>> From: Richard 
> >>> To: "Tag discussion, strategy and related tools"
> >>>  
> >>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
> >>> Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
> >>> Content-Type: text/plain; charset=iso-8859-1
> >>>
> >>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
> >>>> Greetings!
> >>>>
> >>>> Several Slavic languages have rather formal rules about line breaks.
> >>> the problem is much broader, sooner or later OSM rendering will hit word
> >>> splitting.
> >>>
> >>>> PS: The rules are formal enough that there exists a 1997 program
> >>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
> >>>> and is commonly used for important documents.
> >>> probably not all OSM languages have such tools and even if they have it can
> >>> be tricky to determine which language rules to apply.
> >>>
> >>> I would think..
> >>> * if someone wants to use nonbreakable spaces he should be allowed to do
> >>>   so and tools should tolerate it (not necessarily understand but not
> >>>   break)
> >>> * if someone wants to use explicit word-split marks/soft-hyphens
> >>>   this should be somehow allowed too.
> >>>
> >>> Otherwise the software should try to do its best and apply heuristics to
> >>> avoid splitting lines in wrong places.
> >>> Not splitting 1000 034 should be obvious, roman numbers as well. Prefer not
> >>> splitting around "lonely" characters.
> >>> The rendering software can also compare texts with name tags and prefer not
> >>> to split names at all.
> >>>
> >>> Richard
> >>>
> >>>
> >>>
> >>>
> >

Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Simon Poole


On 31.01.2018 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal
that should have been "normal"






Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread marc marc
I remain convinced that spelling rules have no place in OSM tags,
even if it would be convenient.
If they are to be added, the primary tools should first be asked to
handle them before their use is considered. Otherwise even the simplest
search on a street name can fail, which is worse than having an incorrect
line break.

On 31.01.18 at 19:33, Simon Poole wrote:
> IMHO we should in general treat all unicode space variants as a nomal
> ASCII space for processing and comparision purposes and leave it at that.
> 
> And we don't have the issues just in name tags, see
> 
> SKIP :
> {
>    "\r"
> | "\n"
> | " "
> | "\t"
> | "\u200A"
> | "\u2009"
> | "\u00A0"
> | "\u2008"
> | "\u2002"
> | "\u2007"
> | "\u3000"
> | "\u2003"
> | "\u2006"
> | "\u2005"
> | "\u2004"
> }
> 
> from my OH parser.
> 
> Simon
> 
> 
> On 31.01.2018 at 16:25, Matej Lieskovský wrote:
>> So... can we reach some conclusion?
>>
>> I have a particular situation I need to resolve - some streets consist
>> of ways that (among other, meaningful differences) vary in their usage
>> of non-breakable spaces. Here are the possible solutions:
>>
>> 1) Start removing nbsp from local data
>> 2) In case of conflict, prefer the variant without nbsp
>> 3) In case of conflict, choose the more common variant
>> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
>> 5) Start adding nbsp to local data
>> 6) Leave things as they are
>>
>> To be perfectly honest, unless we can agree on whether nbsp should be
>> encouraged or removed, I will use option 4. Option 6 (status quo) is
>> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
>> the data (and too much work for now), and an eventual conversion from
>> anything to 1 is trivial (which does not work for converting from 2 or
>> 3 to 5). Since option 4 at least makes entire streets have the same
>> name without loss of data or adding nbsp to streets that are ok so
>> far, I consider it to be the best compromise in case of no consensus.
>>
>> Matej Lieskovský
>>
>> PS: I am starting to suspect that we might need a wiki page concerning
>> Unicode usage in general (nbsp, soft hyphens, roman numerals,
>> normalisation...). The link below does seem a little underwhelming:
>> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
>>
>> On 27 January 2018 at 01:50, Johnparis  wrote:
>>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>>>
>>> HTML has &shy; for soft hyphens (Unicode U+00AD).
>>>
>>> --
>>>
>>> Message: 2
>>> Date: Fri, 26 Jan 2018 23:04:32 +0100
>>> From: Richard 
>>> To: "Tag discussion, strategy and related tools"
>>>  
>>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
>>> Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
>>> Content-Type: text/plain; charset=iso-8859-1
>>>
>>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
>>>> Greetings!
>>>>
>>>> Several Slavic languages have rather formal rules about line breaks.
>>> the problem is much broader, sooner or later OSM rendering will hit word
>>> splitting.
>>>
>>>> PS: The rules are formal enough that there exists a 1997 program
>>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
>>>> and is commonly used for important documents.
>>> probably not all OSM languages have such tools and even if they have it can
>>> be tricky to determine which language rules to apply.
>>>
>>> I would think..
>>> * if someone wants to use nonbreakable spaces he should be allowed to do
>>>   so and tools should tolerate it (not necessarily understand but not
>>>   break)
>>> * if someone wants to use explicit word-split marks/soft-hyphens
>>>   this should be somehow allowed too.
>>>
>>> Otherwise the software should try to do its best and apply heuristics to
>>> avoid
>>> splitting lines in wrong places.
>>> Not splitting 1000 034 should be obvious, roman numbers as well. Prefer not
>>> splitting around "lonely" characters.
>>> The rendering software can also compare texts with name tags and prefer not
>>> to split names at all.
>>>
>>> Richard
>>>
>>>
>>>
>>>


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Simon Poole
IMHO we should in general treat all Unicode space variants as a nomal
ASCII space for processing and comparison purposes and leave it at that.

And we don't have the issues just in name tags, see

SKIP :
{
  "\r"
| "\n"
| " "
| "\t"
| "\u200A"
| "\u2009"
| "\u00A0"
| "\u2008"
| "\u2002"
| "\u2007"
| "\u3000"
| "\u2003"
| "\u2006"
| "\u2005"
| "\u2004"
}

from my OH parser.
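
A minimal Python sketch of the "treat every variant as an ASCII space" idea,
using the code points from the list above. Note the difference from the
grammar fragment: it skips these characters, whereas this sketch maps them to
a plain space before comparing. The names are hypothetical:

```python
# Code points copied from the list above (CR, LF and tab included).
SPACE_VARIANTS = (
    "\r\n\t"
    "\u200a\u2009\u00a0\u2008\u2002\u2007\u3000\u2003\u2006\u2005\u2004"
)

# Translation table mapping each variant to an ASCII space.
TO_ASCII_SPACE = str.maketrans({ch: " " for ch in SPACE_VARIANTS})


def ascii_spaces(value: str) -> str:
    """Return `value` with every listed space variant replaced by ' '."""
    return value.translate(TO_ASCII_SPACE)


def same_value(a: str, b: str) -> bool:
    """Compare two tag values while ignoring which space variant was used."""
    return ascii_spaces(a) == ascii_spaces(b)


cleaned = ascii_spaces("Mo\u2009-\u2009Fr\u00a008:00")
```

Because the mapping is applied only at comparison time, the stored tag value
is left untouched.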

Simon


On 31.01.2018 at 16:25, Matej Lieskovský wrote:
> So... can we reach some conclusion?
>
> I have a particular situation I need to resolve - some streets consist
> of ways that (among other, meaningful differences) vary in their usage
> of non-breakable spaces. Here are the possible solutions:
>
> 1) Start removing nbsp from local data
> 2) In case of conflict, prefer the variant without nbsp
> 3) In case of conflict, choose the more common variant
> 4) In case of conflict, prefer the variant with (correctly placed) nbsp
> 5) Start adding nbsp to local data
> 6) Leave things as they are
>
> To be perfectly honest, unless we can agree on whether nbsp should be
> encouraged or removed, I will use option 4. Option 6 (status quo) is
> pretty much the worst of both worlds, 5 is undeniably adding nbsp to
> the data (and too much work for now), and an eventual conversion from
> anything to 1 is trivial (which does not work for converting from 2 or
> 3 to 5). Since option 4 at least makes entire streets have the same
> name without loss of data or adding nbsp to streets that are ok so
> far, I consider it to be the best compromise in case of no consensus.
>
> Matej Lieskovský
>
> PS: I am starting to suspect that we might need a wiki page concerning
> Unicode usage in general (nbsp, soft hyphens, roman numerals,
> normalisation...). The link below does seem a little underwhelming:
> https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters
>
> On 27 January 2018 at 01:50, Johnparis  wrote:
>> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>>
>> HTML has &shy; for soft hyphens (Unicode U+00AD).
>>
>> --
>>
>> Message: 2
>> Date: Fri, 26 Jan 2018 23:04:32 +0100
>> From: Richard 
>> To: "Tag discussion, strategy and related tools"
>> 
>> Subject: Re: [Tagging] Nonbreakable spaces in name tags
>> Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
>> Content-Type: text/plain; charset=iso-8859-1
>>
>> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
>>> Greetings!
>>>
>>> Several Slavic languages have rather formal rules about line breaks.
>> the problem is much broader, sooner or later OSM rendering will hit word
>> splitting.
>>
>>> PS: The rules are formal enough that there exists a 1997 program
>>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
>>> and is commonly used for important documents.
>> probably not all OSM languaes have such tools and even if they have it can
>> be tricky to determine which language rules to apply.
>>
>> I would think..
>> * if someone wants to use nonbreakable spaces he should be allowed to do
>>   so and tools should tolerate it (not necessarilly understand but not
>>   break)
>> * if someone wants to use explicit word-split marks/soft-hyphens
>>   this should be somehow allowed too.
>>
>> Otherwise the software should try to do its best and apply heuristics to
>> avoid
>> splitting lines in wrong places.
>> Not splitting 1000 034 should be obvious, roman numbers as well. Prefer not
>> splitting around "lonely" characters.
>> The rendering software can also compare texts with name tags and prefer not
>> to split names at all.
>>
>> Richard
>>
>>
>>
>>


Re: [Tagging] Nonbreakable spaces in name tags

2018-01-31 Thread Matej Lieskovský
So... can we reach some conclusion?

I have a particular situation I need to resolve - some streets consist
of ways that (among other, meaningful differences) vary in their usage
of non-breakable spaces. Here are the possible solutions:

1) Start removing nbsp from local data
2) In case of conflict, prefer the variant without nbsp
3) In case of conflict, choose the more common variant
4) In case of conflict, prefer the variant with (correctly placed) nbsp
5) Start adding nbsp to local data
6) Leave things as they are

To be perfectly honest, unless we can agree on whether nbsp should be
encouraged or removed, I will use option 4. Option 6 (status quo) is
pretty much the worst of both worlds, 5 is undeniably adding nbsp to
the data (and too much work for now), and an eventual conversion from
anything to 1 is trivial (which does not work for converting from 2 or
3 to 5). Since option 4 at least makes entire streets have the same
name without loss of data or adding nbsp to streets that are ok so
far, I consider it to be the best compromise in case of no consensus.
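
A rough Python sketch of option 4, and of why a later move to option 1 stays
trivial. The function names are hypothetical. Whether an nbsp is "correctly
placed" needs language rules that are not modelled here, so the sketch simply
assumes the variant with the most nbsp is the intended one:

```python
from collections import defaultdict

NBSP = "\u00a0"


def strip_nbsp(name: str) -> str:
    """Option 1 in one line: replace every nbsp with an ordinary space."""
    return name.replace(NBSP, " ")


def harmonise(names):
    """Option 4: map each name to the variant chosen for its group."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_nbsp(name)].add(name)

    chosen = {}
    for plain_name, variants in groups.items():
        with_nbsp = sorted(v for v in variants if NBSP in v)
        # Assumption: prefer the variant carrying the most nbsp, if any.
        if with_nbsp:
            target = max(with_nbsp, key=lambda v: v.count(NBSP))
        else:
            target = plain_name
        for variant in variants:
            chosen[variant] = target
    return chosen


ways = ["náměstí T. G. Masaryka", "náměstí T.\u00a0G.\u00a0Masaryka", "Dlouhá"]
result = harmonise(ways)
```

Names that never had an nbsp are left alone, so no nbsp is added to streets
that are fine so far, and `strip_nbsp` can still undo everything later.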

Matej Lieskovský

PS: I am starting to suspect that we might need a wiki page concerning
Unicode usage in general (nbsp, soft hyphens, roman numerals,
normalisation...). The link below does seem a little underwhelming:
https://wiki.openstreetmap.org/wiki/Any_tags_you_like#Characters

On 27 January 2018 at 01:50, Johnparis  wrote:
> HTML has &nbsp; for non-breakable spaces (Unicode U+00A0).
>
> HTML has &shy; for soft hyphens (Unicode U+00AD).
>
> --
>
> Message: 2
> Date: Fri, 26 Jan 2018 23:04:32 +0100
> From: Richard 
> To: "Tag discussion, strategy and related tools"
> 
> Subject: Re: [Tagging] Nonbreakable spaces in name tags
> Message-ID: <20180126220432.GA10615@rz.localhost.localdomain>
> Content-Type: text/plain; charset=iso-8859-1
>
> On Fri, Jan 26, 2018 at 03:48:42PM +0100, Matej Lieskovský wrote:
>> Greetings!
>>
>> Several Slavic languages have rather formal rules about line breaks.
>
> the problem is much broader, sooner or later OSM rendering will hit word
> splitting.
>
>> PS: The rules are formal enough that there exists a 1997 program
>> "Vlna" ("Tilde"), that can add nonbreakable spaces to TeX source files
>> and is commonly used for important documents.
>
> probably not all OSM languaes have such tools and even if they have it can
> be tricky to determine which language rules to apply.
>
> I would think..
> * if someone wants to use nonbreakable spaces he should be allowed to do
>   so and tools should tolerate it (not necessarilly understand but not
>   break)
> * if someone wants to use explicit word-split marks/soft-hyphens
>   this should be somehow allowed too.
>
> Otherwise the software should try to do its best and apply heuristics to
> avoid
> splitting lines in wrong places.
> Not splitting 1000 034 should be obvious, roman numbers as well. Prefer not
> splitting around "lonely" characters.
> The rendering software can also compare texts with name tags and prefer not
> to split names at all.
>
> Richard
>
>
>
>


Re: [Tagging] Public art definition

2018-01-31 Thread Daniel Koć

On 31.01.2018 at 09:51, Janko Mihelić wrote:

> On Sun, Jan 28, 2018, 10:50 Tom Pfeifer wrote:
>
>> So, how does "exhibit=artwork" work for you?
>
> +1
> I like that key because it could have lots of useful values, like
> exhibit=animal, exhibit=car, exhibit=moon_rock, etc.


This key has been used a bit already:

https://taginfo.openstreetmap.org/keys/exhibit

so it's a good proposition (for example, not all cars or aircraft are
historic).


Thanks for all the comments in this thread. It seems like there is a
common view on this, so I will update the wiki soon to reflect it.


--
"My method is uncertain/ It's a mess but it's working" [F. Apple]



Re: [Tagging] Public art definition

2018-01-31 Thread Janko Mihelić
On Sun, Jan 28, 2018, 10:50 Tom Pfeifer  wrote:

>
> So, how does "exhibit=artwork" work for you?
>

+1
I like that key because it could have lots of useful values, like
exhibit=animal, exhibit=car, exhibit=moon_rock, etc.

Janko

>