Hi,

On Mon, Mar 23, 2020 at 05:21:42AM +0530, K Rahul Reddy wrote:
> Hi!
>
> I have been going through various files to find where the tab space is being
> dropped. I found that the normalization works as expected and converts tab
> space to single space. But the final query phrase contained tab spaces. The
> reason is:
>
> Geocode.php:532
>
> $sQuery = $this->sQuery;
>
>
> When it is replaced with
>
> $sQuery = $sNormQuery;
>
> all tab spaces and other white space character are replaced with single
> space.
>
>
> Is there any reason why the initial line was used? Or is it safe to replace?

The normalization done for $sNormQuery is different from the one done
later in line 630, when make_standard_name() is called on the phrase. It
serves its purpose for rechecking, but it cannot be used for looking up
search terms.

Why? As I said before, normalization is done twice: once on the input
data (the names of the places you actually want to search for) and again
on the query input (the one in line 630). It is very important that the
two normalizations produce exactly the same result. Imagine that the data
import normalizes away all capital letters, so 'London' becomes 'london'
before it is saved in the database. Now somebody wants to search for
exactly the same term 'London', but the normalization for the search query
keeps the capital letters. Then our original data would never be found
because 'London' != 'london'.

Capitalization isn't usually the problem, but the two forms of
normalization handle diacritics differently, which leads to exactly the
problem described above: use $sNormQuery instead of $sQuery and half the
places won't be found anymore because the normalized names no longer
match.

Unfortunately, this also means I have to reject your PR for the moment.
Not because it is wrong, but because we have no good way of changing the
normalization without breaking an existing database. There are quite a
lot of Nominatim installations out there which would be expensive to
reinstall. So one of our policies for changes is that they must not break
existing databases, or that we provide an upgrade path for them. In the
case of normalization, this would mean that we either have a mechanism
where an existing database keeps its original normalization algorithm
even when upgrading to a newer Nominatim version, or that there is a
script to 'convert' a database to the new normalization schema. It is
definitely something we want to have in the future, but at the moment we
have neither. So that's why we can't accept normalization changes.

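To make the mismatch concrete, here is a minimal sketch in Python. It is not Nominatim code: both normalization functions, the sample names and the dictionary standing in for the database are made up for illustration, and which of the real normalizations treats diacritics which way is not something this sketch claims to know. It only shows that a lookup fails as soon as the query is normalized differently from the imported data.

```python
import unicodedata


def normalize_import(name: str) -> str:
    """Hypothetical import-time normalization (an assumption, not Nominatim's):
    lowercase, strip diacritics, collapse all whitespace to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def normalize_other(query: str) -> str:
    """Hypothetical second normalization: lowercase and collapse whitespace,
    but keep diacritics. It differs from normalize_import only in that."""
    return " ".join(query.lower().split())


# Stand-in for the database: names are stored under their import-time form.
index = {normalize_import(n): n for n in ["London", "Zürich", "São Paulo"]}


def lookup(query: str, normalizer) -> "str | None":
    """Look up a query after normalizing it with the given function."""
    return index.get(normalizer(query))


# Same normalization on both sides: every name is found, tabs included.
found_same = lookup("São\tPaulo", normalize_import)

# Different normalization on the query side: names with diacritics are lost.
found_other = lookup("Zürich", normalize_other)
```

With these made-up functions, 'London' is found either way, which is why the problem is easy to miss in a quick test. Only names containing diacritics disappear when the query side uses the other normalization.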
Kind regards

Sarah

