Re: [OSM-talk] Search results quality (and some testing on Elasticsearch)

2020-05-29 Thread Simon Poole
Hi Jose

Maybe you should have a look at https://github.com/komoot/photon, which
is the go-to ES-based solution for OSM data (I'm not quite sure how you
missed it with the large amount of research you did, but anyway).

The other bit to understand is that the design goals of Nominatim, at
least historically, were not "return a result at all costs" but "return
a result if the object is tagged correctly", which goes hand in hand
with the target audience and goals of openstreetmap.org. In any case,
the reasons we're not running photon on openstreetmap.org are mainly
operational, not technical (i.e. somebody needs to volunteer to a)
integrate it into the web site, b) integrate it into our chef
deployment, and c) provide operational support).

Simon

On 29.05.2020 at 04:19, José Juan Montes wrote:
>
> Hi all,
>
> This is my first message to the list, so I'll take the opportunity to say
> hello to all and thanks to the community for the awesome
> software, data, and organisation.
>
> Now to the point. In the ES community, we've been discussing how
> difficult it is to obtain useful results from OSM. Too often results
> are odd or surprising: the ordering pushes better results down, and
> sometimes obvious matches are missed entirely... Specifically, we are
> referring to the search engine on the OSM front page, and other
> Nominatim-based services.
>
> After some analysis, the issues seem related to:
>
> - stop word usage (prepositions, articles...)
> - result scoring and ordering (a perfect match placed below distant and
> unrelated results)
> - word matching when there are tildes or non-ASCII characters
> - synonyms, or ignoring some categories and common nouns (street /
> road...)
> - lack of autocompletion (helps users find a result when they don't
> quite know the exact term)
> - lack of cross-language search (e.g. in regions with several official
> languages, people mix street names and road types between languages)
> - support for typos
>
> Part of the problem is that every language requires particular
> considerations, which affects most of the points above. So in my view,
> a suitable solution would need good i18n support from the bottom up.
>
> We think that other communities (language-wise) may be hitting the
> same issues, judging by the GitHub issues. I list some references at
> the bottom, but they don't seem to get much attention.
>
> Ultimately, the technology stack Nominatim is built upon is not state
> of the art. I have done a quick test with Elasticsearch, and a simple
> default installation with naive data loading already produces decent
> results. I later found that alternative search engines exist, for
> example "Pelias", which are implemented on top of newer technologies,
> and their demo seems to work fine...
>
> Has any alternative to the current geocoder been tested? What would it
> take for this to be improved? If alternatives exist, can the search
> engine on the front page be changed? Or could options be provided so
> users can choose their preferred search engine, maybe even from
> specialized local/themed search providers? Perhaps something like that
> would pave the way for alternative search software and services, and
> foster innovation.
>
> Cheers!
>
> Refs:
>
> - https://github.com/osm-search/Nominatim/issues/1811
> - https://github.com/osm-search/Nominatim/issues/333
> - https://github.com/osm-search/Nominatim/issues/1208
> - https://wiki.openstreetmap.org/wiki/Search_engines
> - source code of my tests:
> https://github.com/jjmontesl/cubetl/tree/master/examples/osm
>
>
> Jose Juan Montes
>
>
> ___
> talk mailing list
> talk@openstreetmap.org
> https://lists.openstreetmap.org/listinfo/talk




[OSM-talk] Search results quality (and some testing on Elasticsearch)

2020-05-28 Thread José Juan Montes
Hi all,

This is my first message to the list, so I'll take the opportunity to say hello
to all and thanks to the community for the awesome software, data, and
organisation.

Now to the point. In the ES community, we've been discussing how difficult
it is to obtain useful results from OSM. Too often results are odd or
surprising: the ordering pushes better results down, and sometimes obvious
matches are missed entirely... Specifically, we are referring to the search
engine on the OSM front page, and other Nominatim-based services.

After some analysis, the issues seem related to:

- stop word usage (prepositions, articles...)
- result scoring and ordering (a perfect match placed below distant and
unrelated results)
- word matching when there are tildes or non-ASCII characters
- synonyms, or ignoring some categories and common nouns (street /
road...)
- lack of autocompletion (helps users find a result when they don't
quite know the exact term)
- lack of cross-language search (e.g. in regions with several official
languages, people mix street names and road types between languages)
- support for typos
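
To make a few of these points concrete (stop words, accents and typo
tolerance), here is a minimal sketch in plain Python. It is only an
illustration under my own assumptions, not how Nominatim or Elasticsearch
work internally: the stop word list, the function names and the 0.8 cutoff
are all hypothetical.

```python
import difflib
import unicodedata

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"de", "la", "el", "del", "the", "of"}


def fold(text):
    """Lowercase the text and strip combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokens(text):
    """Split folded text on whitespace and drop stop words."""
    return [t for t in fold(text).split() if t not in STOP_WORDS]


def score(query, name):
    """Similarity in [0, 1] between the normalized query and name."""
    a = " ".join(tokens(query))
    b = " ".join(tokens(name))
    return difflib.SequenceMatcher(None, a, b).ratio()


def search(query, names, cutoff=0.8):
    """Return the names scoring at least `cutoff`, best match first."""
    ranked = sorted(((score(query, n), n) for n in names), reverse=True)
    return [n for s, n in ranked if s >= cutoff]
```

With this, a query such as "cale alcala" (one typo, no accent, no
preposition) still matches "Calle de Alcalá", and an exact match after
normalization gets the top score. A real engine needs per-language rules
for each of these steps.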

Part of the problem is that every language requires particular
considerations, which affects most of the points above. So in my view, a
suitable solution would need good i18n support from the bottom up.

We think that other communities (language-wise) may be hitting the same
issues, judging by the GitHub issues. I list some references at the bottom,
but they don't seem to get much attention.

Ultimately, the technology stack Nominatim is built upon is not state of
the art. I have done a quick test with Elasticsearch, and a simple default
installation with naive data loading already produces decent results. I
later found that alternative search engines exist, for example "Pelias",
which are implemented on top of newer technologies, and their demo seems to
work fine...
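
For readers who have not used Elasticsearch, the kind of setup meant here
can be sketched as two request bodies, built below as Python dicts. This
is an illustrative assumption, not the configuration used in the tests
linked at the bottom: the analyzer name "osm_name", the filter name
"lang_stop", the field "name" and the choice of the Spanish stop word list
are hypothetical. The "lowercase", "asciifolding" and "stop" token filters
and the "fuzziness" option of the match query are standard Elasticsearch
features.

```python
import json


def index_settings(stopwords="_spanish_"):
    """Index body with an analyzer that lowercases, folds accents
    to ASCII and removes stop words from the 'name' field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Predefined language stop word list (assumed: Spanish).
                    "lang_stop": {"type": "stop", "stopwords": stopwords}
                },
                "analyzer": {
                    "osm_name": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding", "lang_stop"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {"name": {"type": "text", "analyzer": "osm_name"}}
        },
    }


def fuzzy_query(text):
    """Search body: full-text match on 'name' that tolerates small typos."""
    return {"query": {"match": {"name": {"query": text, "fuzziness": "AUTO"}}}}


if __name__ == "__main__":
    # Print the JSON bodies that would be sent when creating the index
    # and when searching it.
    print(json.dumps(index_settings(), indent=2))
    print(json.dumps(fuzzy_query("cale alcala"), indent=2))
```

The first body would be sent when creating the index and the second when
searching it; other languages would need their own stop word lists and
possibly their own analyzers.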

Has any alternative to the current geocoder been tested? What would it take
for this to be improved? If alternatives exist, can the search engine on
the front page be changed? Or could options be provided so users can choose
their preferred search engine, maybe even from specialized local/themed
search providers? Perhaps something like that would pave the way for
alternative search software and services, and foster innovation.

Cheers!

Refs:

- https://github.com/osm-search/Nominatim/issues/1811
- https://github.com/osm-search/Nominatim/issues/333
- https://github.com/osm-search/Nominatim/issues/1208
- https://wiki.openstreetmap.org/wiki/Search_engines
- source code of my tests:
https://github.com/jjmontesl/cubetl/tree/master/examples/osm


Jose Juan Montes