Hi,

On Wed, Apr 01, 2020 at 11:19:52PM +0530, K Rahul Reddy wrote:
> I have written test cases in test/bdd. But I found something else while
> doing so. setNearPointFromQuery function used to detect LatLon pairs is
> processed separately. This causes the last two examples in the following
> Scenario to fail.
>
>     Scenario Outline: Search with white space characters
>         When sending json search query "<data>"
>         Then exactly 1 result is returned
>
>     Examples:
>         | data |
>         | amerlugalpe, N 47.15739° E 9.61264° |
>         | amerlugalpe, N 47.15739° E 9.61264° |
>         | amerlugalpe , N 47.15739° E 9.61264° |
>         | amerlugalpe, N 47.15739° E 9.61264° |
>         | amerlugalpe, N 47.15739° E 9.61264° |
>
>
> This could be fixed by using a preg_replace in setNearPointFromQuery
> function in SearchContext.php or by applying preg_replace on $sQuery. The
> former will fix LatLon, but the main query string will still have those
> characters.

Looks like the regexes in parseLatLong() are rather picky there and only
accept real spaces. That could be replaced with the more generic '\s'.

Cheers

Sarah

>
> Which approach should I follow? Or should I ignore this, as this is a part
> of LatLon, and would not consist of other white space characters in general?
>
> Regards,
>
> Rahul
>
> On 01/04/20 11:42 am, Sarah Hoffmann wrote:
> > Hi Rahul,
> >
> > On Wed, Apr 01, 2020 at 05:36:00AM +0530, K Rahul Reddy wrote:
> > > For issue #967 <https://github.com/osm-search/Nominatim/issues/967>, These
> > > are some points I found so far:
> > >
> > > In Geocode.php lookup(),
> > >
> > > 1) The sNormQuery is made by using PHP's Transliterator.
> > >
> > > 2) The normalization method make_standard_name is used on phrases in line
> > > 630. This is an sql function which returns
> > > trim(public.gettokenstring(public.transliteration(name))).
> > >
> > > We need to replace %09-%0d characters in phrases. This can be done
> > > simply by adding
> > >
> > >     $sPhrase = preg_replace('/[\x09|\x0a|\x0b|\x0c|\x0d]/',
> > >                             ' ',
> > >                             $sPhrase);
> > >
> > > before normalization function is called.
> > >
> > > 3) Other solution would be to change normalization (breaks the DB). The
> > > transliteration() uses the utfasciitable.h
> > >
> > > Changing UTFASCIILOOKUP by replacing 9-13 th position elements by '2'
> > > does the job.
> > >
> > >
> > > I have tested both the ways, and both seem to work as expected. What
> > > should
> > > I do now?
> > Go for solution 3). It is true that it breaks the DB but only for places
> > that have characters %09-%0d in their name. That's basically data that is
> > broken in the OSM database already and should be fixed. Therefore it is
> > okay to make an exception to the rule not to change the normalization.
> >
> > Cheers
> >
> > Sarah
_______________________________________________
Geocoding mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/geocoding
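
For anyone following the thread, here is a minimal sketch of the two ideas discussed above: replacing the %09-%0d characters with a space, and matching whitespace with '\s' instead of a literal space. Both function names are hypothetical, and the coordinate pattern is a heavily simplified stand-in, not the actual regex from parseLatLong(). Note that inside a character class a range is enough. A '|' placed between the escapes would be matched as a literal pipe.

```php
<?php
// Illustrative only: these helpers are assumptions, not Nominatim code.

/**
 * Replace the whitespace control characters %09-%0d
 * (tab, LF, VT, FF, CR) with a plain space.
 */
function replaceControlWhitespace(string $sPhrase): string
{
    // A range covers all five characters; no '|' is needed inside [...].
    return preg_replace('/[\x09-\x0d]/', ' ', $sPhrase);
}

/**
 * Hypothetical, simplified matcher for "N <lat>° E <lon>°".
 * With $bStrictSpaces the separators must be literal spaces;
 * otherwise any whitespace matched by \s is accepted.
 */
function matchSimpleLatLon(string $sQuery, bool $bStrictSpaces): ?array
{
    $sSep = $bStrictSpaces ? ' +' : '\s+';
    // The /u modifier makes the multi-byte degree sign a single character,
    // so the '?' applies to the whole sign and not only to its last byte.
    $sPattern = '/\bN' . $sSep . '([0-9.]+)°?' . $sSep
              . 'E' . $sSep . '([0-9.]+)°?/u';

    if (preg_match($sPattern, $sQuery, $aMatch)) {
        return array((float) $aMatch[1], (float) $aMatch[2]);
    }
    return null;
}

// A query where a tab and a newline stand in for ordinary spaces.
$sQuery = "amerlugalpe, N\t47.15739° E\n9.61264°";

var_dump(matchSimpleLatLon($sQuery, true));   // NULL: tab/newline are not literal spaces
var_dump(matchSimpleLatLon($sQuery, false));  // the coordinate pair is found
var_dump(replaceControlWhitespace($sQuery));  // control characters become spaces
```

The '\s' variant only changes what the coordinate matcher accepts. The remaining query string still contains the control characters unless they are also replaced or handled by the normalization.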

