Hi Pavel,

I had a similar problem several years ago. I had to find geographical locations in textual descriptions, geocode these objects to lat/long during indexing, and allow users to filter and sort search results by geographical area. The important issue was that there were several types of geographical objects: street < town < region < country. The idea was to geocode to the narrowest geographical area possible. Relevance in this case can be specified as "find the narrowest result that is uniquely identified by your text or search query". So I came up with a custom algorithm that worked quite well in terms of both performance and precision/recall. Here is a simple description:

* Intersect all text/search query terms with the locations dictionary to keep only the geo terms.
* Search your locations Lucene index and keep only street objects (the narrowest areas). Thanks to the tf*idf formula you'll get the most relevant results first. Then post-process the top N (3/5/10) results and verify that they really are matches: I intersected the search terms with each result's terms and ran another Lucene search to check whether these terms uniquely identify the match. If they do, return the matching street. If there is no match, repeat the same algorithm with towns, then regions, then countries.
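The steps above can be sketched roughly as follows. This is a minimal in-memory Java sketch under stated assumptions, not the actual implementation described: an idf-weighted term overlap stands in for Lucene's tf*idf ranking, a linear scan stands in for the second (verification) Lucene search, and one extra check is an assumption not in the description (the candidate's own name, i.e. its last comma-separated component, must occur in the text, so that "Russia, Ivanovo" is not resolved to the only street in Ivanovo). The names `Level`, `Location`, `Geocoder`, `geocode`, `GeocoderDemo` and the sample data are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical object levels, declared from the narrowest area to the broadest.
enum Level { STREET, TOWN, REGION, COUNTRY }

// A dictionary entry, e.g. (STREET, "Russia, Moscow, Tverskaya street").
record Location(Level level, String path) {
    Set<String> terms() { return Geocoder.tokenize(path); }

    // Assumption: the object's own name is the last comma-separated component.
    Set<String> ownNameTerms() {
        String[] parts = path.split(",");
        return Geocoder.tokenize(parts[parts.length - 1]);
    }
}

class Geocoder {
    private final List<Location> locations;
    private final Map<String, Integer> docFreq = new HashMap<>(); // term -> number of locations

    Geocoder(List<Location> locations) {
        this.locations = locations;
        for (Location loc : locations) {
            for (String t : loc.terms()) docFreq.merge(t, 1, Integer::sum);
        }
    }

    static Set<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // idf-weighted term overlap: a crude stand-in for Lucene's tf*idf ranking.
    private double score(Set<String> geoTerms, Location loc) {
        double s = 0;
        for (String t : loc.terms()) {
            if (geoTerms.contains(t)) s += Math.log(1.0 + (double) locations.size() / docFreq.get(t));
        }
        return s;
    }

    Optional<Location> geocode(String text, int topN) {
        // Step 1: keep only the words that occur in the locations dictionary.
        Set<String> geoTerms = tokenize(text);
        geoTerms.retainAll(docFreq.keySet());

        // Step 2: try each level in turn, starting with the narrowest one.
        for (Level level : Level.values()) {
            List<Location> sameLevel = locations.stream()
                    .filter(l -> l.level() == level)
                    .collect(Collectors.toList());
            List<Location> top = sameLevel.stream()
                    .filter(l -> score(geoTerms, l) > 0)
                    .sorted(Comparator.comparingDouble((Location l) -> score(geoTerms, l)).reversed())
                    .limit(topN)
                    .collect(Collectors.toList());
            for (Location cand : top) {
                // Extra assumption: the candidate's own name must occur in the text.
                if (!geoTerms.containsAll(cand.ownNameTerms())) continue;
                // Verification: the shared terms must identify exactly one object on this level.
                Set<String> shared = new HashSet<>(cand.terms());
                shared.retainAll(geoTerms);
                long hits = sameLevel.stream().filter(l -> l.terms().containsAll(shared)).count();
                if (hits == 1) return Optional.of(cand);
            }
        }
        return Optional.empty(); // nothing is uniquely identified on any level
    }
}

class GeocoderDemo {
    static Geocoder sample() {
        return new Geocoder(List.of(
                new Location(Level.COUNTRY, "Russia"),
                new Location(Level.TOWN, "Russia, Moscow"),
                new Location(Level.TOWN, "Russia, Volgograd"),
                new Location(Level.TOWN, "Russia, Ivanovo"),
                new Location(Level.STREET, "Russia, Moscow, Altayskaya street"),
                new Location(Level.STREET, "Russia, Moscow, Tverskaya street"),
                new Location(Level.STREET, "Russia, Ivanovo, Altayskaya street")));
    }

    public static void main(String[] args) {
        Geocoder g = sample();
        for (String text : List.of("Altayskaya street, Moscow", "Altayskaya street",
                                   "hotel in Ivanovo", "Russia")) {
            System.out.println(text + " -> "
                    + g.geocode(text, 3).map(Location::path).orElse("(no unique match)"));
        }
    }
}
```

With this sample data, "Altayskaya street, Moscow" resolves to the Moscow street, "hotel in Ivanovo" falls back to the town, "Russia" falls back to the country, and "Altayskaya street" alone stays unresolved because two streets share that name.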
HTH,
Alexey

On Wed, Dec 15, 2010 at 6:28 PM, Pavel Minchenkov <char...@gmail.com> wrote:
> Hi,
> Please advise me on how to create custom scoring. I need the documents in the
> results to be ordered by how popular each term in the document is
> (popular = how many times it appears in the index) and by the length of the
> document (fewer terms = higher in the search results).
>
> For example, the index contains the following data:
>
> ID | SEARCH_FIELD
> ------------------------------
> 1  | Russia
> 2  | Russia, Moscow
> 3  | Russia, Volgograd
> 4  | Russia, Ivanovo
> 5  | Russia, Ivanovo, Altayskaya street 45
> 6  | Russia, Moscow, Kremlin
> 7  | Russia, Moscow, Altayskaya street
> 8  | Russia, Moscow, Altayskaya street 15
> 9  | Russia, Moscow, Altayskaya street 15/26
>
> And I should get the following results:
>
> Query      | Document result set
> ----------------------------------------------
> Russia     | 1,2,4,3,6,7,8,9,5
> Moscow     | 2,6,7,8,9
> Ivanovo    | 4,5
> Altayskaya | 7,8,9,5
>
> In fact, it is a search for geographic objects (cities, streets, houses).
> Only part of the address may be given, and the most relevant results should
> still come first.
>
> Thanks.
> --
> Pavel Minchenkov
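For comparison with the quoted question, one ordering rule that reproduces all four expected result lists can be sketched in plain Java, outside Lucene. This is a minimal sketch under assumptions, not the reply's algorithm and not a Lucene scoring API: "popularity" is assumed to mean how many documents start with the same address components (so "Russia, Moscow" with 5 documents beats "Russia, Ivanovo" with 2), and documents are compared by number of comma-separated components first, then by prefix popularity, then by term count, then by ID. The names `HierarchicalOrder`, `search` and `sampleDocs` are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

class HierarchicalOrder {
    // Lower-cased word tokens; "15/26" becomes "15" and "26".
    static List<String> tokens(String field) {
        return Arrays.stream(field.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Address components: "Russia, Moscow, Kremlin" -> [Russia, Moscow, Kremlin].
    static List<String> components(String field) {
        return Arrays.stream(field.split(",")).map(String::trim).collect(Collectors.toList());
    }

    static List<Integer> search(Map<Integer, String> docs, String queryTerm) {
        // Assumed popularity: how many documents start with the same components.
        Map<List<String>, Integer> prefixCount = new HashMap<>();
        for (String field : docs.values()) {
            List<String> comps = components(field);
            for (int i = 1; i <= comps.size(); i++) {
                prefixCount.merge(List.copyOf(comps.subList(0, i)), 1, Integer::sum);
            }
        }
        Comparator<Integer> order = (a, b) -> {
            List<String> ca = components(docs.get(a));
            List<String> cb = components(docs.get(b));
            // 1. Broader objects (fewer components) first.
            if (ca.size() != cb.size()) return Integer.compare(ca.size(), cb.size());
            // 2. Then, component by component, the more popular prefix first.
            for (int i = 1; i <= ca.size(); i++) {
                int pa = prefixCount.get(ca.subList(0, i));
                int pb = prefixCount.get(cb.subList(0, i));
                if (pa != pb) return Integer.compare(pb, pa);
            }
            // 3. Then fewer terms first; the lower ID breaks any remaining tie.
            int ta = tokens(docs.get(a)).size();
            int tb = tokens(docs.get(b)).size();
            return ta != tb ? Integer.compare(ta, tb) : Integer.compare(a, b);
        };
        String q = queryTerm.toLowerCase(Locale.ROOT);
        return docs.keySet().stream()
                .filter(id -> tokens(docs.get(id)).contains(q))
                .sorted(order)
                .collect(Collectors.toList());
    }

    static Map<Integer, String> sampleDocs() {
        return Map.of(
                1, "Russia",
                2, "Russia, Moscow",
                3, "Russia, Volgograd",
                4, "Russia, Ivanovo",
                5, "Russia, Ivanovo, Altayskaya street 45",
                6, "Russia, Moscow, Kremlin",
                7, "Russia, Moscow, Altayskaya street",
                8, "Russia, Moscow, Altayskaya street 15",
                9, "Russia, Moscow, Altayskaya street 15/26");
    }

    public static void main(String[] args) {
        for (String q : List.of("Russia", "Moscow", "Ivanovo", "Altayskaya")) {
            System.out.println(q + " | " + search(sampleDocs(), q));
        }
    }
}
```

Running `main` prints `Russia | [1, 2, 4, 3, 6, 7, 8, 9, 5]`, `Moscow | [2, 6, 7, 8, 9]`, `Ivanovo | [4, 5]` and `Altayskaya | [7, 8, 9, 5]`. The ordering is essentially hierarchical rather than a single tf*idf-style number, which suggests computing such keys at indexing time and sorting on them; how to wire that into Lucene is not shown here.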