Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-03 Thread solr-user
Thanks, guys.

Unfortunately, the Solr instance that contains this schema/data is part of a
legacy system that requires the fields to remain unchanged.

We will, hopefully in the near future, be able to look at redesigning the
schema.

Alternatively, I could brush up on Java (which I haven't used in a long
time) and see if I can write a subword synonym plugin of some sort to
perform this type of synonym matching.

Thanks anyhow.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231p4128914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Jack Krupansky
And, if you use the pf, pf2, and pf3 parameters of edismax, with boosting,
you can ensure that the closest matches always appear first.

That assumes you do index-time synonym expansion.
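
A minimal sketch of what such a request might look like, built in Python
without contacting a server. The parameter names (defType, qf, pf, pf2, pf3)
are edismax parameters; the field name "city" and the boost values are
assumptions chosen for illustration only.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, field="city"):
    """Build edismax request parameters with phrase boosts.

    The field name and the boost values are hypothetical.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,          # match individual query terms in this field
        "pf": f"{field}^10",  # boost docs where all query terms form one phrase
        "pf3": f"{field}^5",  # boost docs matching three-word phrases from the query
        "pf2": f"{field}^2",  # boost docs matching two-word phrases from the query
    }


params = build_edismax_params("ft st john")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to a select URL for the core in
question; tuning the boosts is left to experimentation.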

-- Jack Krupansky





Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Erick Erickson
No, there isn't a tokenizer that'll do what you want that I know
about. Really, I suspect you need to back up a bit and re-think the
problem. It looks to me like you've taken a path that's going to cause
you endless grief when, as Jack says, phrase searches are built into
the tokenization process.

Best,
Erick




Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread Jack Krupansky
Query by phrase is a core feature of tokenized text in Lucene and Solr, so 
there is no need to use a pattern token filter for that purpose. And yes, 
doing so pretty much breaks most token filters that would assume that the 
text is tokenized.
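
To illustrate the point, here is a toy model of phrase matching over
tokenized text, assuming whitespace tokenization and index-time synonym
expansion. It is only a sketch: the data structures are simplified stand-ins,
not Lucene's, and the function names are made up for this example.

```python
# Synonym groups taken from the synonyms file quoted in the thread.
SYNONYMS = [{"saint", "st", "ste"}, {"fort", "ft"}]


def analyze(text, expand=True):
    """Lowercase, split on whitespace, and stack synonyms at each position."""
    positions = []
    for word in text.lower().split():
        terms = {word}
        if expand:
            for group in SYNONYMS:
                if word in group:
                    terms |= group
        positions.append(terms)
    return positions


def phrase_match(indexed, query_words):
    """True if query_words occur at consecutive positions of the indexed value."""
    n = len(query_words)
    return any(
        all(query_words[j] in indexed[i + j] for j in range(n))
        for i in range(len(indexed) - n + 1)
    )


doc = analyze("Fort Saint John")
for q in ("ft st john", "fort st john", "ft saint john", "st john fort"):
    print(q, "->", phrase_match(doc, q.split()))
```

In this model the first three queries match because each synonym sits at the
same position as the original word, while the reordered query does not. A
one-word query such as "st" would still match, which is the concern raised
elsewhere in the thread.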


-- Jack Krupansky




Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-02 Thread solr-user
Hi Erick.

No, that doesn't fix the problem either (I have tested this previously and
did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design,
since I want the user to search by phrase), the phrase "marina former fort
ord" (for example) does not get turned into four tokens ("marina", "former",
"fort" and "ord"), and so the SynonymFilterFactory does not create synonyms
for them (by design).

The original question remains: is there a tokenizer/plugin that will allow
me to synonym words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it
would cause way too many results to get returned if I, for example, search on
"new" or "st" ...  However, I still want to be able to include "fort saint
john" in the results if the user searches for "ft st john" or "fort st john"
or ...
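
The behaviour described above can be imitated in a few lines of Python. This
is a rough sketch, not Solr's implementation: the normalization regex is an
assumption standing in for the field type's filters, and the helper names are
hypothetical.

```python
import re

# Synonym groups taken from the synonyms file quoted in the thread.
SYNONYMS = [{"saint", "st", "ste"}, {"fort", "ft"}]


def expand(token):
    """Return the synonym group containing the token, or the token alone."""
    for group in SYNONYMS:
        if token in group:
            return set(group)
    return {token}


def normalize(text):
    """Lowercase, replace runs of non-letters with one space, and trim."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def single_token(text):
    """Keep the whole normalized value as ONE token (phrase-as-token design)."""
    return [normalize(text)]


def whitespace_tokens(text):
    """Split the normalized value into one token per word."""
    return normalize(text).split()


value = "Marina/Former Fort Ord"
single = [expand(t) for t in single_token(value)]
split = [expand(t) for t in whitespace_tokens(value)]
print(single)
print(split)
```

With a single token, the synonym lookup sees "marina former fort ord" as one
term, which is not in any synonym group, so nothing is added. Only the
per-word version gives the lookup a chance to see "fort" on its own.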





Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-01 Thread alxsss
It seems to me that you are missing this line

  

under
 

Alex.

 

 


 


Re: how do I get search for "fort st john" to match "ft saint john"

2014-04-01 Thread solr-user
Hi Erick.

Sorry, been away.  

The city_index_synonyms.txt file is pretty small as it contains just these
two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't
used either.

My understanding is that Solr would create the appropriate synonym entries
in the index and so treat "fort" and "ft" as equal.

If you have a simple one-line schema (that uses the type definition from my
original email) and index "fort saint john", does it work for you?  i.e.
does it return results if you search for "ft st john" and "ft saint john"
and "fort st john"?

My Solr 4.6.1 instance doesn't.  I am wondering if synonyms just don't work
for all/some words in a phrase.
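
For reference, a small sketch of how a comma-separated synonyms file like the
one above is commonly interpreted, assuming the usual meaning of the expand
option: with expand="true" every term in a line maps to all terms in that
line, and with expand="false" every term maps to the first one. The parser
below is a simplified illustration (it ignores explicit "=>" mappings) and
is not Solr's code.

```python
def parse_synonyms(text, expand=True):
    """Parse comma-separated synonym lines into a term -> replacements map."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = [t.strip().lower() for t in line.split(",")]
        # expand=True: each term maps to the whole group;
        # expand=False: each term maps to the first term only.
        targets = terms if expand else terms[:1]
        for term in terms:
            mapping[term] = list(targets)
    return mapping


rules = parse_synonyms("saint,st,ste\nfort,ft\n")
reduced = parse_synonyms("saint,st,ste\nfort,ft\n", expand=False)
print(rules["ft"])
print(reduced["ste"])
```

Note that every key is a single word, so the lookup can only succeed when the
tokenizer hands the filter single-word tokens.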





Re: how do I get search for "fort st john" to match "ft saint john"

2014-03-29 Thread Erick Erickson
What does your synonyms file look like? Because this breaks things up
fine, you get individual tokens etc. it seems. If your synonyms file
maps saint to st, and fort to ft (or vice-versa) it should work.

If this is off base, could you post the synonyms you expect to be applied?

Best,
Erick



Re: how do I get search for "fort st john" to match "ft saint john"

2014-03-28 Thread solr-user
Yes, and I can see that (as expected) per the field type:

1. the indexed value is lowercased
2. stripped of non-alpha characters
3. multiple consecutive whitespace is removed
4. trimmed
5. goes through the SynonymFilterFactory where:

a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"

This I already knew.  My question wasn't "why" they don't match; it is: how
do I get search for "fort st john" to match "ft saint john"?  i.e. is there a
way to index/search that would allow the search to match?

The SynonymFilterFactory during indexing does not create a matching term for
"marina former ft ord", which I think it would do if the indexed value were a
word instead of a phrase (i.e. "fort" vs "Marina/Former Fort Ord").

(Note that my terms/understanding of how this works may be incorrect, hence
my request for assistance/understanding.)







Re: how do I get search for "fort st john" to match "ft saint john"

2014-03-26 Thread Walter Underwood
Step 1 is to use the Analysis tool in the admin UI. That will show what each 
step in your pipeline is doing.
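
The same information can usually be fetched over HTTP. The sketch below only
builds the URL and assumes the field analysis request handler is registered
at /analysis/field in solrconfig.xml; the host, the core name "collection1",
and the field type name "city_text" are placeholders, not values from this
thread.

```python
from urllib.parse import urlencode


def analysis_url(base, core, field_type, index_value, query_value):
    """Build a field-analysis URL (assumes a handler at /analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,    # field type to analyze with
        "analysis.fieldvalue": index_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{base}/{core}/analysis/field?{urlencode(params)}"


url = analysis_url(
    "http://localhost:8983/solr",  # placeholder host
    "collection1",                 # placeholder core name
    "city_text",                   # placeholder field type name
    "Fort Saint John",
    "ft st john",
)
print(url)
```

Comparing the index-side and query-side token lists in the response shows at
which step the two stop lining up.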

wunder

On Mar 26, 2014, at 2:10 PM, solr-user  wrote:

> I have been using solr for a while but started running across situations
> where synonyms are required.
> 
> the example I have is group of city names that look like "Fort Saint John"
> (a city), in a text field.  Users may want to search for "Ft St John" or
> "Fort St John" or "Ft Saint John" however
> 
> My attempted solution was to create a type that uses SynonymFilterFactory
> and a text file of city based synonyms like this:
> 
>   saint,st,ste
>   fort,ft
> 
> this doesnt work however and I am not sure I understand why.
> 
> any help appreciated.  thx
> 
> p.s. I am using Solr 4.6.1 and here is the field type definition from the
> solrconfig.xml:
> 
> positionIncrementGap="100">
>  
> group="-1" />
>
> replacement=" " replace="all" />
> replacement=" " replace="all" />
>
> synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true" />
>  
>      
>     group="-1" />
>    
> replacement=" " replace="all" />
> replacement=" " replace="all" />
>
>  
>
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/how-do-I-get-search-for-fort-st-john-to-match-ft-saint-john-tp4127231.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Walter Underwood
wun...@wunderwood.org





how do I get search for "fort st john" to match "ft saint john"

2014-03-26 Thread solr-user
I have been using Solr for a while but started running across situations
where synonyms are required.

The example I have is a group of city names that look like "Fort Saint John"
(a city), in a text field.  Users may want to search for "Ft St John" or
"Fort St John" or "Ft Saint John", however.

My attempted solution was to create a type that uses SynonymFilterFactory
and a text file of city-based synonyms like this:

   saint,st,ste
   fort,ft

This doesn't work, however, and I am not sure I understand why.

Any help appreciated.  thx

p.s. I am using Solr 4.6.1 and here is the field type definition from the
solrconfig.xml:


  






  
  





  



