Re: how do I get search for "fort st john" to match "ft saint john"
Thanks, guys.

Unfortunately, the Solr instance that contains this schema/data is in a legacy system that requires the fields not to be changed. We will, hopefully in the near future, be able to look at redesigning the schema.

Alternatively, I could bone up on Java (which I haven't used in a long time) and see if I can write a subword synonym plugin of some sort to perform this type of synonym matching.

Thanks anyhow.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128914.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: how do I get search for "fort st john" to match "ft saint john"
And if you use the pf, pf2, and pf3 parameters of edismax, with boosting, you can ensure that the closest matches always appear first. This assumes you do index-time synonym expansion.

-- Jack Krupansky

-----Original Message-----
From: Erick Erickson
Sent: Wednesday, April 2, 2014 3:09 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

No, there isn't a tokenizer that'll do what you want that I know about. Really, I suspect you need to back up a bit and re-think the problem. It looks to me like you've taken a path that's going to cause you endless grief when, as Jack says, phrase searches are built in to the tokenization process.

Best,
Erick
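As a rough illustration of Jack's suggestion, the request parameters might be assembled as below. This is only a sketch: the field name `city` and the boost values are made up for the example, and it assumes the field is tokenized on whitespace with index-time synonym expansion. `defType`, `qf`, `pf`, `pf2`, and `pf3` are edismax parameters; `pf` boosts documents where the whole query appears as a phrase, while `pf2` and `pf3` do the same for word pairs and word triples.

```python
from urllib.parse import urlencode, parse_qs


def build_city_query(user_text, field="city"):
    """Build an edismax query string that boosts close phrase matches.

    The field name and the boost factors are hypothetical; tune them
    against your own data.
    """
    params = {
        "defType": "edismax",
        "q": user_text,
        "qf": field,            # field(s) searched term by term
        "pf": f"{field}^10",    # boost when the full query matches as a phrase
        "pf2": f"{field}^5",    # boost for matching word pairs
        "pf3": f"{field}^7",    # boost for matching word triples
    }
    return urlencode(params)


query_string = build_city_query("ft st john")
print(query_string)
print(parse_qs(query_string)["pf"])
```

The resulting string would be appended to the select handler URL of whatever core holds the city data.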
Re: how do I get search for "fort st john" to match "ft saint john"
No, there isn't a tokenizer that'll do what you want that I know about.

Really, I suspect you need to back up a bit and re-think the problem. It looks to me like you've taken a path that's going to cause you endless grief when, as Jack says, phrase searches are built in to the tokenization process.

Best,
Erick

On Wed, Apr 2, 2014 at 12:58 PM, Jack Krupansky wrote:
> Query by phrase is a core feature of tokenized text in Lucene and Solr, so
> there is no need to use a pattern token filter for that purpose. And yes,
> doing so pretty much breaks most token filters that would assume that the
> text is tokenized.
>
> -- Jack Krupansky
Re: how do I get search for "fort st john" to match "ft saint john"
Query by phrase is a core feature of tokenized text in Lucene and Solr, so there is no need to use a pattern token filter for that purpose. And yes, doing so pretty much breaks most token filters that would assume that the text is tokenized.

-- Jack Krupansky

-----Original Message-----
From: solr-user
Sent: Wednesday, April 2, 2014 12:46 PM
To: solr-user@lucene.apache.org
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design, since I want the user to search by phrase), the phrase "marina former fort ord" (for example) does not get turned into four tokens ("marina", "former", "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms for them (by design).

The original question remains: is there a tokenizer/plugin that will allow me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it would cause way too many results to be returned if I, for example, search on "new" or "st". However, I still want to be able to include "fort saint john" in the results if the user searches for "ft st john" or "fort st john" or ...
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design, since I want the user to search by phrase), the phrase "marina former fort ord" (for example) does not get turned into four tokens ("marina", "former", "fort" and "ord"), and so the SynonymFilterFactory does not create synonyms for them (by design).

The original question remains: is there a tokenizer/plugin that will allow me to apply synonyms to words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it would cause way too many results to be returned if I, for example, search on "new" or "st". However, I still want to be able to include "fort saint john" in the results if the user searches for "ft st john" or "fort st john" or ...

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128640.html
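The behaviour described in this message can be pictured with a toy model. The sketch below is not Solr code: `expand_token` and `analyze` are hypothetical helpers that only imitate the idea that a synonym filter compares whole tokens against the synonym entries. Under that assumption, a value kept as one token never equals "fort" or "saint", so nothing is expanded.

```python
# Toy synonym table mirroring the two lines quoted in the thread.
SYNONYMS = [{"saint", "st", "ste"}, {"fort", "ft"}]


def expand_token(token):
    """Return the token's synonym group if the whole token matches an entry."""
    for group in SYNONYMS:
        if token in group:
            return sorted(group)
    return [token]


def analyze(text, split_on_whitespace):
    """Tokenize, then expand each token; one list of alternatives per position."""
    tokens = text.split() if split_on_whitespace else [text]
    return [expand_token(token) for token in tokens]


# Whole value kept as a single token: no synonym entry matches it.
print(analyze("marina former fort ord", split_on_whitespace=False))
# [['marina former fort ord']]

# Split on whitespace: "fort" is now a token of its own and gets expanded.
print(analyze("marina former fort ord", split_on_whitespace=True))
# [['marina'], ['former'], ['fort', 'ft'], ['ord']]
```

In this model the synonym step only has something to work with once the text has been split into words, which is the point Jack and Erick make in their replies.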
Re: how do I get search for "fort st john" to match "ft saint john"
It seems to me that you are missing this line under Alex.

-----Original Message-----
From: solr-user
To: solr-user
Sent: Tue, Apr 1, 2014 5:01 pm
Subject: Re: how do I get search for "fort st john" to match "ft saint john"

Hi Erick.

Sorry, been away.

The city_index_synonyms.txt file is pretty small, as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either.

My understanding is that Solr would create the appropriate synonym entries in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my original email) and index "fort saint john", does it work for you? I.e., does it return results if you search for "ft st john" and "ft saint john" and "fort st john"?

My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.
Re: how do I get search for "fort st john" to match "ft saint john"
Hi Erick.

Sorry, been away.

The city_index_synonyms.txt file is pretty small, as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either.

My understanding is that Solr would create the appropriate synonym entries in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my original email) and index "fort saint john", does it work for you? I.e., does it return results if you search for "ft st john" and "ft saint john" and "fort st john"?

My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128500.html
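For what it is worth, here is a small model of what index-time expansion with these two synonym lines would give if the field were split on whitespace, which is not how the field type in this thread is configured. The helpers `parse_synonyms`, `index_positions`, and `phrase_matches` are invented for the illustration and do not correspond to Solr classes. Each word position stores the full synonym group, and a phrase query matches when its words line up with consecutive positions.

```python
def parse_synonyms(text):
    """Parse comma-separated equivalence lines such as 'fort,ft'."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group = {word.strip().lower() for word in line.split(",")}
        for word in group:
            table[word] = group
    return table


def index_positions(value, table):
    """One set of interchangeable terms per word position."""
    return [table.get(token, {token}) for token in value.lower().split()]


def phrase_matches(query, positions):
    """True if the query words occur at consecutive positions."""
    words = query.lower().split()
    count = len(words)
    return any(
        all(words[i] in positions[start + i] for i in range(count))
        for start in range(len(positions) - count + 1)
    )


table = parse_synonyms("saint,st,ste\nfort,ft")
doc = index_positions("Fort Saint John", table)
for query in ["ft st john", "ft saint john", "fort st john", "john fort"]:
    print(query, "->", phrase_matches(query, doc))
```

In this model the three spellings asked about in the message all match, while a reordered query such as "john fort" does not, because phrase matching respects word order.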
Re: how do I get search for "fort st john" to match "ft saint john"
What does your synonyms file look like? Because this breaks things up fine, you get individual tokens etc., it seems. If your synonyms file maps saint to st, and fort to ft (or vice versa), it should work.

If this is off base, could you post the synonyms you expect to be applied?

Best,
Erick

On Fri, Mar 28, 2014 at 1:52 PM, solr-user wrote:
> Yes, and I can see that (as expected) per the field type:
>
> 1. the indexed value is lowercased
> 2. stripped of non-alpha characters
> 3. multiple consecutive whitespace is removed
> 4. trimmed
> 5. goes through the SynonymFilterFactory
>
> where:
>
> a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
> b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"
>
> This I already knew. My question wasn't "why" they don't match; it is: how
> do I get a search for "fort st john" to match "ft saint john"? I.e., is
> there a way to index/search that would allow the search to match?
>
> The SynonymFilterFactory during indexing does not create a matching term for
> "marina former ft ord", which I think it would do if the indexed value was a
> word instead of a phrase (i.e. "fort" vs "Marina/Former Fort Ord").
>
> (Note that my terms/understanding of how this works may be incorrect, hence
> my request for assistance/understanding.)
Re: how do I get search for "fort st john" to match "ft saint john"
Yes, and I can see that (as expected) per the field type:

1. the indexed value is lowercased
2. stripped of non-alpha characters
3. multiple consecutive whitespace is removed
4. trimmed
5. goes through the SynonymFilterFactory

where:

a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"

This I already knew. My question wasn't "why" they don't match; it is: how do I get a search for "fort st john" to match "ft saint john"? I.e., is there a way to index/search that would allow the search to match?

The SynonymFilterFactory during indexing does not create a matching term for "marina former ft ord", which I think it would do if the indexed value was a word instead of a phrase (i.e. "fort" vs "Marina/Former Fort Ord").

(Note that my terms/understanding of how this works may be incorrect, hence my request for assistance/understanding.)

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4127764.html
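Steps 1 to 4 above can be mimicked in a few lines to see why the two values end up different. This is an approximation only: the real patterns in the field type are not shown in the thread, so the regular expressions below are assumptions, and `normalize_city` is a hypothetical helper rather than anything Solr provides.

```python
import re


def normalize_city(value):
    """Approximate steps 1-4: lowercase, strip non-alpha, collapse spaces, trim."""
    lowered = value.lower()
    alpha_only = re.sub(r"[^a-z]", " ", lowered)   # assumed: non-letters become spaces
    collapsed = re.sub(r"\s+", " ", alpha_only)    # assumed: runs of whitespace become one space
    return collapsed.strip()


indexed = normalize_city("Marina/Former Fort Ord")
searched = normalize_city("Marina/Former Ft Ord")
print(indexed)              # marina former fort ord
print(searched)             # marina former ft ord
print(indexed == searched)  # False
```

Because each value is then handled as a single term, the comparison is an exact string match, and "fort" versus "ft" inside the string is enough to make it fail.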
Re: how do I get search for "fort st john" to match "ft saint john"
Step 1 is to use the Analysis tool in the admin UI. That will show what each step in your pipeline is doing.

wunder

On Mar 26, 2014, at 2:10 PM, solr-user wrote:

> I have been using Solr for a while but have started running across
> situations where synonyms are required.
>
> The example I have is a group of city names that look like "Fort Saint John"
> (a city), in a text field. Users may want to search for "Ft St John" or
> "Fort St John" or "Ft Saint John", however.
>
> My attempted solution was to create a type that uses SynonymFilterFactory
> and a text file of city-based synonyms like this:
>
> saint,st,ste
> fort,ft
>
> This doesn't work, however, and I am not sure I understand why.
>
> Any help appreciated. Thx.
>
> P.S. I am using Solr 4.6.1 and here is the field type definition from the
> solrconfig.xml:
>
> positionIncrementGap="100">
>
> group="-1" />
>
> replacement=" " replace="all" />
> replacement=" " replace="all" />
>
> synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true" />
>
>
> group="-1" />
>
> replacement=" " replace="all" />
> replacement=" " replace="all" />

--
Walter Underwood
wun...@wunderwood.org
how do I get search for "fort st john" to match "ft saint john"
I have been using Solr for a while but have started running across situations where synonyms are required.

The example I have is a group of city names that look like "Fort Saint John" (a city), in a text field. Users may want to search for "Ft St John" or "Fort St John" or "Ft Saint John", however.

My attempted solution was to create a type that uses SynonymFilterFactory and a text file of city-based synonyms like this:

saint,st,ste
fort,ft

This doesn't work, however, and I am not sure I understand why.

Any help appreciated. Thx.

P.S. I am using Solr 4.6.1 and here is the field type definition from the solrconfig.xml:

--
View this message in context: http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231.html