[OSM-talk] Search results quality (and some testing on Elasticsearch)

2020-05-28 Thread José Juan Montes
Hi all,

This is my first message to the list so I take the opportunity to say hello
to all and thanks to the community for the awesome software, data, and
organisation.

Now to the point. In the ES community, we've been discussing how difficult
it is to obtain useful search results from OSM. Too often results are odd or
surprising: the ordering pushes better results down, and sometimes obvious
matches are missed entirely... Specifically, we are referring to the search
engine on the OSM front page, and other Nominatim-based services.

After some analysis, the issues seem related to:

- stop words usage (prepositions, articles...)
- result scoring and ordering (a perfect match placed below far and
unrelated results)
- word matching when there are accents (tildes) or non-ASCII characters
- synonyms / ignoring for some categories and common nouns (street /
road...)
- lack of autocompletion (helps users find a result when they don't
quite know the exact term)
- lack of cross-language search (e.g. in regions with several official
languages, people mix street names and road types between languages)
- lack of tolerance for typos
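
As an illustration of the first few points (stop words, accents, synonyms), here is a minimal Python sketch of query normalization. It is not how Nominatim works; the word lists and function names are hypothetical, and the accent folding is deliberately naive (it also folds "ñ" to "n", which a Spanish-aware analyzer would probably not do).

```python
import unicodedata

# Hypothetical, tiny word lists for illustration only; a real geocoder
# would need per-language lists maintained by people who know the language.
STOP_WORDS = {"de", "la", "el", "the", "of"}
SYNONYMS = {"c/": "calle", "avda": "avenida", "st": "street", "rd": "road"}


def fold_accents(text: str) -> str:
    """Drop combining marks so that 'José' and 'Jose' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_query(query: str) -> list[str]:
    """Fold accents, lowercase, expand abbreviations, drop stop words."""
    tokens = fold_accents(query).casefold().split()
    # Strip a trailing dot before the lookup so "Avda." matches "avda".
    tokens = [SYNONYMS.get(tok.rstrip("."), tok) for tok in tokens]
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(normalize_query("Avda. de la Constitución"))  # ['avenida', 'constitucion']
```

Applying the same normalization to both the indexed names and the query is what makes "Avda. de la Constitución" and "avenida constitucion" meet in the middle.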

Part of the problem is that every language requires particular
considerations, which impacts most of the points above. So in my view, a
suitable solution would need to have good i18n support bottom up.

We think that other (language) communities may be hitting the same
issues, judging by the GitHub issues. I list some references at the bottom,
but they don't seem to get much attention.

Ultimately, the technology stack Nominatim is built upon is not state of
the art. I have done a quick test with Elasticsearch and a simple default
installation with naive data loading already produces decent results. I
later found that alternative search engines exist, for example "Pelias",
which are implemented on top of newer technologies, and their demo seems to
work fine...
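
For readers unfamiliar with Elasticsearch, the sketch below shows roughly what an index definition addressing the points above could look like. It is an assumption-laden illustration, not the configuration used in the tests linked in the refs: the analyzer names, the field names `name` and `location`, and the choice of Spanish stop words are all hypothetical. The filter types (`stop`, `asciifolding`, `edge_ngram`) and the `fuzziness` query option are standard Elasticsearch features. The code only builds the JSON bodies and does not contact a server.

```python
import json

# Hypothetical index definition: accent folding, stop words, and
# edge n-grams (for autocompletion) applied at index time.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "es_stop": {"type": "stop", "stopwords": "_spanish_"},
                "prefixes": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15},
            },
            "analyzer": {
                "place_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "es_stop", "prefixes"],
                },
                # No n-grams at search time: the query is matched as typed.
                "place_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "es_stop"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "place_index",
                "search_analyzer": "place_search",
            },
            "location": {"type": "geo_point"},
        }
    },
}


def build_search(text: str, size: int = 10) -> dict:
    """Build a typo-tolerant match query against the assumed 'name' field."""
    return {
        "size": size,
        "query": {"match": {"name": {"query": text, "fuzziness": "AUTO"}}},
    }


print(json.dumps(build_search("calle mayor"), indent=2))
```

Result ordering and cross-language search are not covered by this sketch; they would need scoring rules and per-language fields on top of it.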

Has any alternative to the current geocoder been tested? What would it take
for this to be improved? If alternatives exist, could the search engine on
the front page be changed, or could options be provided so users can choose
their preferred search engine, maybe even from specialized local/themed
search providers? Perhaps something like that would pave the way for
alternative search software and services, and foster innovation.

Cheers!

Refs:

- https://github.com/osm-search/Nominatim/issues/1811
- https://github.com/osm-search/Nominatim/issues/333
- https://github.com/osm-search/Nominatim/issues/1208
- https://wiki.openstreetmap.org/wiki/Search_engines
- source code of my tests:
https://github.com/jjmontesl/cubetl/tree/master/examples/osm


Jose Juan Montes
___
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk


Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Bryce Cogswell via talk

> On May 28, 2020, at 9:38 AM, Andy Townsend wrote:
> 
> It sounds like he's learnt something really useful about the computer 
> representation of written characters!

This page generates all manner of font variations for a name:
http://qaz.wtf/u/convert.cgi?text=OpenStreetMap

𝐎𝐩𝐞𝐧𝐒𝐭𝐫𝐞𝐞𝐭𝐌𝐚𝐩
𝕺𝖕𝖊𝖓𝕾𝖙𝖗𝖊𝖊𝖙𝕸𝖆𝖕
𝑶𝒑𝒆𝒏𝑺𝒕𝒓𝒆𝒆𝒕𝑴𝒂𝒑
𝓞𝓹𝓮𝓷𝓢𝓽𝓻𝓮𝓮𝓽𝓜𝓪𝓹
𝕆𝕡𝕖𝕟𝕊𝕥𝕣𝕖𝕖𝕥𝕄𝕒𝕡
etc.



Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Andy Townsend

On 28/05/2020 17:29, mbranco2 wrote:
> 𝖒𝖆𝖘𝖙𝖗𝖔 has all to do with mastro, because it's the surname of a
> student in an Italian high school where I'm teaching OSM :-)


It sounds like he's learnt something really useful about the computer 
representation of written characters!


Best Regards,

Andy




Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread mbranco2
𝖒𝖆𝖘𝖙𝖗𝖔 has all to do with mastro, because it's the surname of a
student in an Italian high school where I'm teaching OSM :-)
His class is participating in a mapathon for the UN (
https://tasks.teachosm.org/project/998), commenting changesets with
#AAvogadroONU.
When I asked him how he registered in OSM, he told me that he likes gothic
fonts, and on every social network he registers that way (writing his surname
in gothic with a word processor, then copying and pasting it into the
registration form).

Surely I agree that everyone in the world must be able to register using the
characters of their language, but I supposed that the sequence to change the
font of a character is something different from characters in other alphabets
(sorry, I'm not an expert in character encoding, maybe this is not true).
Or maybe we could also use nicknames in bold, underlined, ...
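
On the encoding question: the "gothic" letters are not a font switch applied to ordinary letters. Each one is a separate Unicode code point from the Mathematical Alphanumeric Symbols block, which is why a plain string comparison with "mastro" fails. A small Python sketch using only the standard library shows this; the username is rebuilt from code points so the example does not depend on how this message is rendered.

```python
import unicodedata

# U+1D586 is MATHEMATICAL BOLD FRAKTUR SMALL A; the other small letters
# follow in alphabetical order.
FRAKTUR_SMALL_A = 0x1D586
fancy = "".join(chr(FRAKTUR_SMALL_A + ord(c) - ord("a")) for c in "mastro")

# Each character has its own code point and its own official name.
for ch in fancy:
    print(f"U+{ord(ch):05X}  {unicodedata.name(ch)}")

# Compatibility normalization (NFKC) maps these symbols back to plain
# letters, which is one way a tolerant search could treat them as equal.
folded = unicodedata.normalize("NFKC", fancy)
print(fancy == "mastro", folded == "mastro")  # False True
```

Bold or underlined text in a word processor, by contrast, is formatting stored outside the characters, so it does not survive copy and paste into a plain-text form.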

On Thu, 28 May 2020 at 15:30, Martin Koppenhoefer <
dieterdre...@gmail.com> wrote:

>
>
> sent from a phone
>
> On 28. May 2020, at 15:08, mbranco2  wrote:
>
> I was surprised finding an OSM username written in gothic characters: I'm
> not sure if this mailing list could show such font, the
> nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've to
> copy and paste his name written with such font, searching with
> osm.org/user/mastro give no results.
>
> Isn't this an anomaly?
>
>
>
> it’s normal, we allow unicode characters for usernames, and there is no
> tolerant “search” behind osm.org/user/username.
> AFAIK it requires the exact string (maybe whitespace trimmed) that the
> mapper has used for registering.
> If the user had written in a different script which you do not have
> available on your keyboard you would have had equally to use copy+paste (or
> click on a link to the user).
>
> 𝖒𝖆𝖘𝖙𝖗𝖔 has nothing to do with mastro, although it might look as if
> it has.
>
> Cheers Martin
>
>
>
>


Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Steve Friedl
> If there is any real problem here, it could be with people using names that 
> look like the names of other community members (but technically are 
> different) for abusive scopes

Can I be Ṡtereo ? :-)

-Original Message-
From: Martin Koppenhoefer  
Sent: Thursday, May 28, 2020 8:38 AM
To: Oleksiy Muzalyev 
Cc: talk@openstreetmap.org
Subject: Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)



sent from a phone

> On 28. May 2020, at 17:20, Oleksiy Muzalyev  
> wrote:
> 
> Practically all people in Russia studied a foreign language within the 
> compulsory education system. ... I am sure that typing several Latin letters 
> would not be a challenge.
> 
> Besides in such disciplines as mathematics or chemistry the Latin letters are 
> being used widely for variables in formulas, etc.
> Customarily, a keyboard with the Russian layout has got also the Latin 
> letters on the keys (buttons) [1].
> 
> I do not know exactly about Japan and China, but I guess that it is about the 
> same there too.


I think the question is not so much whether someone not usually writing in 
latin characters will be able to do it, but more whether a name written in 
latin is suitable for them to identify with. IMHO there is great benefit in 
allowing unicode names, and very little problem with people using “strange 
looking” characters to mean something different from what the characters are 
originally intended for.

If there is any real problem here, it could be with people using names that 
look like the names of other community members (but technically are different) 
for abusive scopes. This shouldn’t be tolerated of course, and would be 
individually reacted to by the admins, if it is detected.

Cheers Martin 


Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Martin Koppenhoefer


sent from a phone

> On 28. May 2020, at 17:20, Oleksiy Muzalyev  
> wrote:
> 
> Practically all people in Russia studied a foreign language within the 
> compulsory education system. ... I am sure that typing several Latin letters 
> would not be a challenge.
> 
> Besides in such disciplines as mathematics or chemistry the Latin letters are 
> being used widely for variables in formulas, etc.
> Customarily, a keyboard with the Russian layout has got also the Latin 
> letters on the keys (buttons) [1].
> 
> I do not know exactly about Japan and China, but I guess that it is about the 
> same there too.


I think the question is not so much whether someone not usually writing in 
latin characters will be able to do it, but more whether a name written in 
latin is suitable for them to identify with. IMHO there is great benefit in 
allowing unicode names, and very little problem with people using “strange 
looking” characters to mean something different from what the characters are 
originally intended for.

If there is any real problem here, it could be with people using names that 
look like the names of other community members (but technically are different) 
for abusive scopes. This shouldn’t be tolerated of course, and would be 
individually reacted to by the admins, if it is detected.
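
How such lookalike names might be detected can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, nothing here reflects how osm.org actually handles usernames, and the key used (compatibility decomposition, removal of combining marks, case folding) catches styled or accented variants but not cross-script homoglyphs such as a Cyrillic "а" standing in for a Latin "a". Those would need a confusables table like the one described in Unicode UTS #39.

```python
import unicodedata
from collections import defaultdict


def skeleton(name: str) -> str:
    """Rough lookalike key: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def find_lookalikes(names: list[str]) -> list[list[str]]:
    """Group names that share a skeleton; return only groups with clashes."""
    groups = defaultdict(list)
    for name in names:
        groups[skeleton(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Build the Fraktur-styled name from code points (U+1D586 is
# MATHEMATICAL BOLD FRAKTUR SMALL A).
fancy = "".join(chr(0x1D586 + ord(c) - ord("a")) for c in "mastro")
clashes = find_lookalikes(["Stereo", "\u1e60tereo", "mastro", fancy, "Andy"])
print(len(clashes))  # 2
```

A check like this could flag candidates for a human to review; it cannot decide whether a given name is abusive.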

Cheers Martin 


Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Oleksiy Muzalyev
Practically all people in Russia studied a foreign language within the 
compulsory education system. Usually it is English, French, or German.
How well they may know the language is debatable, however I am sure that 
typing several Latin letters would not be a challenge.


Besides in such disciplines as mathematics or chemistry the Latin 
letters are being used widely for variables in formulas, etc.
Customarily, a keyboard with the Russian layout has got also the Latin 
letters on the keys (buttons) [1].


I do not know exactly about Japan and China, but I guess that it is 
about the same there too.


[1] 
https://www.pngfind.com/pngs/m/326-3264221_russian-keyboard-layout-norwegian-keyboard-layout-windows-hd.png


On 28-May-20 15:28, Maarten Deen wrote:
> I'm sure this is a feature that's very helpful for everyone not 
> writing in latin script (Russia, China, Japan, etc).






Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Roland Olbricht
See 
https://www.openstreetmap.org/user/%F0%9D%96%92%F0%9D%96%86%F0%9D%96%98%F0%9D%96%99%F0%9D%96%97%F0%9D%96%94
URL-Percent-Encoding works fine, so does the involved XML. It is
solely a problem of UX software.
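
For anyone who wants to build such a link without copy and paste, the percent-encoded form is just the UTF-8 bytes of the username written as %XX. A short Python sketch with the standard library (the username is rebuilt from code points, starting at U+1D586, MATHEMATICAL BOLD FRAKTUR SMALL A):

```python
from urllib.parse import quote, unquote

FRAKTUR_SMALL_A = 0x1D586
username = "".join(chr(FRAKTUR_SMALL_A + ord(c) - ord("a")) for c in "mastro")

# quote() encodes the string as UTF-8 and escapes every non-ASCII byte.
encoded = quote(username, safe="")
url = "https://www.openstreetmap.org/user/" + encoded
print(url)

# Decoding the path segment gives back the exact original string.
assert unquote(encoded) == username
```

Each of the six letters takes four bytes in UTF-8, hence the 24 escaped bytes in the URL.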



Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Hartmut Holzgraefe
On 2020-05-28 15:27, Mateusz Konieczny via talk wrote:
> See case of anyone from Russia or Łukasz from Poland.

or people from Germany who were not as lucky as me who had
the 'ä' in the family name converted to ASCII-friendly 'ae'
some generations ago already (due to a transmission error
at the town hall, nobody would even have known about ASCII
back in those days ...)

-- 
hartmut




Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Mateusz Konieczny via talk



May 28, 2020, 15:38 by martin.zd...@freemap.sk:

> On Thu, May 28, 2020 at 3:34 PM Martin Koppenhoefer <dieterdre...@gmail.com> wrote:
>
>>
>> 𝖒𝖆𝖘𝖙𝖗𝖔 has nothing to do with mastro, although it might look as if it has.
>>
>
> Google has different opinion ;-)
>
"𝖒𝖆𝖘𝖙𝖗𝖔 has nothing to do with mastro" seems to be something unique to Unicode

https://en.wikipedia.org/wiki/Fraktur#Fraktur_in_Unicode



[OSM-talk] ๐Ÿ˜ | Re: OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Rory McCann

ha! Awesome!

My OSM username is like that. It's so out there that I am unable to log 
into the OSM Forum, & help.osm.org. When I try to put it on the wiki 
directly, it deletes the whole page. It's great fun. 🙂 ⁽¹⁾


It's a nice way to find unicode bugs in OSM software.

(And remember: people can change their usernames)

⁽¹⁾ Much less software broke than I thought.

On 28/05/2020 15:02, mbranco2 wrote:

> Hallo,
> I was surprised finding an OSM username written in gothic characters: 
> I'm not sure if this mailing list could show such font, the 
> nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've to 
> copy and paste his name written with such font, searching with 
> osm.org/user/mastro give no results.
>
> Isn't this an anomaly?



Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Martin Ždila
On Thu, May 28, 2020 at 3:34 PM Martin Koppenhoefer 
wrote:

>
> 𝖒𝖆𝖘𝖙𝖗𝖔 has nothing to do with mastro, although it might look as if
> it has.
>

Google has different opinion ;-)

-- 
Martin Ždila


Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Martin Koppenhoefer


sent from a phone

> On 28. May 2020, at 15:08, mbranco2  wrote:
> 
> I was surprised finding an OSM username written in gothic characters: I'm not 
> sure if this mailing list could show such font, the nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 
> ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've to copy 
> and paste his name written with such font, searching with osm.org/user/mastro 
> give no results.
> 
> Isn't this an anomaly?


it’s normal, we allow unicode characters for usernames, and there is no 
tolerant “search” behind osm.org/user/username.
AFAIK it requires the exact string (maybe whitespace trimmed) that the mapper 
has used for registering.
If the user had written in a different script which you do not have available 
on your keyboard you would have had equally to use copy+paste (or click on a 
link to the user).

𝖒𝖆𝖘𝖙𝖗𝖔 has nothing to do with mastro, although it might look as if it has.

Cheers Martin 





Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Tom Hughes via talk

On 28/05/2020 14:02, mbranco2 wrote:

> I was surprised finding an OSM username written in gothic characters: 
> I'm not sure if this mailing list could show such font, the 
> nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've to 
> copy and paste his name written with such font, searching with 
> osm.org/user/mastro give no results.
>
> Isn't this an anomaly?



So we shouldn't allow people who don't use the latin
alphabet to register using names in their native language?

Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Mateusz Konieczny via talk
See case of anyone from Russia or Łukasz from Poland.

Or people from China/Japan.

While banning some characters may be reasonable it is complex,
and unlikely to be very important.


May 28, 2020, 15:02 by mbran...@gmail.com:

> Hallo,
> I was surprised finding an OSM username written in gothic characters: I'm not 
> sure if this mailing list could show such font, the nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 
> ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've to copy 
> and paste his name written with such font, searching with 
> osm.org/user/mastro give no results.
>
> Isn't this an anomaly?
>



Re: [OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread Maarten Deen

On 2020-05-28 15:02, mbranco2 wrote:

> Hallo,
> I was surprised finding an OSM username written in gothic characters:
> I'm not sure if this mailing list could show such font, the nickname
> is 𝖒𝖆𝖘𝖙𝖗𝖔 ("mastro" in normal characters).
> The problem is that, if you want to access this user profile, you've
> to copy and paste his name written with such font, searching with
> osm.org/user/mastro give no results.
>
> Isn't this an anomaly?


I'm sure this is a feature that's very helpful for everyone not writing 
in latin script (Russia, China, Japan, etc). I'm sure there are many 
users that have their name written in Cyrillic or Hanzi or Kanji but I 
can't give examples because I can't enter those letters.

I won't call it an anomaly but an inconvenience.

Regards,
Maarten



[OSM-talk] OSM nicknames are Unicode characters? (not Ascii?)

2020-05-28 Thread mbranco2
Hallo,
I was surprised finding an OSM username written in gothic characters: I'm
not sure if this mailing list could show such font, the
nickname is 𝖒𝖆𝖘𝖙𝖗𝖔 ("mastro" in normal characters).
The problem is that, if you want to access this user profile, you've to
copy and paste his name written with such font, searching with
osm.org/user/mastro give no results.

Isn't this an anomaly?