Re: Question regarding synonym

2009-10-05 Thread darniz

Yes, that's why we decided to expand these terms while indexing.
If we have
bayrische motoren werke => bmw

and I have a document which has bmw in it, searching for text:bayrische does
not give me results. I have to give
text:"bayrische motoren werke"; then it actually takes the synonym and gets
me the document.

Now suppose I change the synonym mapping to
bayrische motoren werke, bmw
with the expand parameter set to true, and also use this file at indexing.

Then at the time I index this document, along with bmw I also index the
words bayrische motoren werke.

Any text query like text:motoren or text:bayrische will give me results now.

Please correct me if my assumption is wrong.

Thanks
darniz

-- 
View this message in context: 
http://www.nabble.com/Question-regarding-synonym-tp25720572p25754288.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question regarding synonym

2009-10-05 Thread Christian Zambrano

You are correct.

I would recommend using the SynonymFilter only at index time,
unless you have a very good reason to apply it at query time.
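
As a rough illustration of why index-time expansion makes single-word queries work, here is a minimal Python sketch. It is a toy model and not Solr code: the names GROUP, expand_for_index and build_index are hypothetical, the sample documents are made up, and token positions are ignored, so it only shows which terms end up in the index.

```python
from collections import defaultdict

# Hypothetical equivalence group, modelled on the synonyms line
# "bayrische motoren werke, bmw" used with expand=true.
GROUP = [["bayrische", "motoren", "werke"], ["bmw"]]


def expand_for_index(tokens, group):
    """Whenever one member of the group is seen, emit the words of every member."""
    out = []
    i = 0
    while i < len(tokens):
        for member in group:
            if tokens[i:i + len(member)] == member:
                for synonym in group:
                    out.extend(synonym)
                i += len(member)
                break
        else:
            # No group member starts at this token: keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


def build_index(docs, group):
    """Build a term -> set of document ids map from the expanded tokens."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in expand_for_index(text.lower().split(), group):
            index[term].add(doc_id)
    return index


# Made-up sample documents, for illustration only.
docs = {1: "new bmw sedan", 2: "used toyota pickup"}
index = build_index(docs, GROUP)

print(sorted(index["motoren"]))    # [1]
print(sorted(index["bayrische"]))  # [1]
print(sorted(index["toyota"]))     # [2]
```

Because the expanded words are stored with the document, a plain term lookup at query time finds it without any synonym handling in the query analyzer.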




Re: Question regarding synonym

2009-10-04 Thread Christian Zambrano



On 10/02/2009 06:02 PM, darniz wrote:
> Thanks.
> As I said, it even works by giving double quotes too,
> like carDescription:"austin martin"
>
> So is the conclusion that in order to map a two-word synonym I always have to
> enclose it in double quotes, so that it does not split the words?



   

Yes, but there are things you need to keep in mind.

From the Solr wiki:

Keep in mind that while the SynonymFilter will happily work with
synonyms containing multiple words (ie: "sea biscuit, sea biscit,
seabiscuit"), the recommended approach for dealing with synonyms like
this is to expand the synonym when indexing. This is because there are
two potential issues that can arise at query time:

  1. The Lucene QueryParser tokenizes on white space before giving any
     text to the Analyzer, so if a person searches for the words
     sea biscit the analyzer will be given the words "sea" and "biscit"
     separately, and will not know that they match a synonym.

  2. Phrase searching (ie: "sea biscit") will cause the QueryParser to
     pass the entire string to the analyzer, but if the SynonymFilter
     is configured to expand the synonyms, then when the QueryParser
     gets the resulting list of tokens back from the Analyzer, it will
     construct a MultiPhraseQuery that will not have the desired
     effect. This is because of the limited mechanism available for the
     Analyzer to indicate that two terms occupy the same position:
     there is no way to indicate that a "phrase" occupies the same
     position as a term. For our example the resulting MultiPhraseQuery
     would be "(sea | sea | seabiscuit) (biscuit | biscit)" which would
     not match the simple case of "seabiscuit" occurring in a document.
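
To make the second issue concrete, here is a minimal Python sketch of position-based phrase matching. It is only a toy model of the idea described in the wiki text, not Lucene's MultiPhraseQuery implementation, and the function name multi_phrase_match is made up for illustration.

```python
def multi_phrase_match(doc_tokens, positions):
    """Return True if some run of consecutive document tokens has, at each
    offset, a token drawn from the alternatives allowed at that position."""
    n = len(positions)
    for start in range(len(doc_tokens) - n + 1):
        if all(doc_tokens[start + k] in positions[k] for k in range(n)):
            return True
    return False


# Alternatives per position after expanding the phrase "sea biscit":
# position 0 allows sea or seabiscuit, position 1 allows biscuit or biscit.
query = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]

print(multi_phrase_match(["sea", "biscuit"], query))  # True
print(multi_phrase_match(["sea", "biscit"], query))   # True
print(multi_phrase_match(["seabiscuit"], query))      # False
```

The query always demands two consecutive positions, so a document that contains only the single token seabiscuit can never satisfy it, even though seabiscuit is listed as an alternative.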












Question regarding synonym

2009-10-02 Thread darniz

Hi,
I have a question regarding the SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin

What is baffling me is that if I give the words austin martin at query time,

they first go through the whitespace tokenizer, which generates two words on
the analysis page: austin and martin.

Then, after the synonym filter, they are replaced with the words
aston martin

That's good and that's what I want, but I am wondering: since the input went
through the whitespace tokenizer first and was split into two different words,
austin and martin, how was the filter able to match the entire synonym and
replace it?
If I give only austin, then after passing through the synonym filter it is not
replaced with aston.
That leads me to conclude that even though austin martin went through the
whitespace tokenizer factory and got split in two, the word ordering is
still preserved to find a synonym match.

Can anybody please explain if my observation is correct? This is a very
critical aspect for my work.

Thanks
darniz 



RE: Question regarding synonym

2009-10-02 Thread Ensdorf Ken
> Hi,
> I have a question regarding the SynonymFilter.
> I have a one-way mapping defined:
> austin martin, astonmartin => aston martin
>
> ...
>
> Can anybody please explain if my observation is correct? This is a very
> critical aspect for my work.

That is correct - the synonym filter can recognize multi-token synonyms from
consecutive tokens in a stream.
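
A minimal Python sketch of that behaviour, assuming a greedy longest-match over consecutive tokens. This is a simplified model rather than the actual SynonymFilter code; the names whitespace_tokenize, apply_synonyms and RULES are hypothetical.

```python
def whitespace_tokenize(text):
    """Split on whitespace, as a whitespace tokenizer would."""
    return text.split()


def apply_synonyms(tokens, rules):
    """Replace the longest run of consecutive tokens that matches a rule.

    `rules` maps a tuple of input tokens to the list of replacement tokens.
    """
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate run first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule starts at this token: pass it through unchanged.
            out.append(tokens[i])
            i += 1
    return out


# One-way mapping: austin martin, astonmartin => aston martin
RULES = {
    ("austin", "martin"): ["aston", "martin"],
    ("astonmartin",): ["aston", "martin"],
}

print(apply_synonyms(whitespace_tokenize("austin martin"), RULES))  # ['aston', 'martin']
print(apply_synonyms(whitespace_tokenize("austin"), RULES))         # ['austin']
```

The filter sees the tokens in their original order, so it can still recognise the two-word rule after tokenization, while austin on its own matches no rule and is left alone.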



RE: Question regarding synonym

2009-10-02 Thread darniz

This is not working when I search documents. I have a document which contains
the text aston martin.

When I search carDescription:"austin martin" I get a match, but when I don't
give double quotes,

like carDescription:austin martin
there is no match.

In the analyser, if I give austin martin without quotes, when it passes
through the synonym filter it matches aston martin.
Maybe by default the analyser treats it as a phrase "austin martin", but when I
try to do a query by typing
carDescription:austin martin I get 0 documents. The following is the debug
node info with debugQuery=on:

<str name="rawquerystring">carDescription:austin martin</str>
<str name="querystring">carDescription:austin martin</str>
<str name="parsedquery">carDescription:austin text:martin</str>
<str name="parsedquery_toString">carDescription:austin text:martin</str>

I don't know why it breaks up the words; maybe it's the desired behaviour.
When I give carDescription:"austin martin", of course, it is able to map
to the synonym and I get the desired result.

Any opinion?

darniz







Re: Question regarding synonym

2009-10-02 Thread Christian Zambrano
When you use a field qualifier (fieldName:valueToLookFor) it only applies
to the word right after the colon. If you look at the debug
information you will notice that for the second word it is using the
default field.

<str name="parsedquery_toString">carDescription:austin *text*:martin</str>

The following should work:

carDescription:(austin martin)
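
The difference between the three query forms can be sketched with a toy clause splitter in Python. This is an assumption-laden simplification, not the Lucene QueryParser: the regular expression, the function name split_clauses and the default field name "text" (taken from the debug output above) are all illustrative.

```python
import re

# One clause is an optional "field:" prefix followed by a parenthesised
# group, a double-quoted phrase, or a single run of non-space characters.
CLAUSE = re.compile(r'(?:(\w+):)?(\([^)]*\)|"[^"]*"|\S+)')


def split_clauses(query, default_field="text"):
    """Return (field, value) pairs; unqualified clauses get the default field."""
    return [(field or default_field, value)
            for field, value in CLAUSE.findall(query)]


print(split_clauses('carDescription:austin martin'))
# [('carDescription', 'austin'), ('text', 'martin')]

print(split_clauses('carDescription:(austin martin)'))
# [('carDescription', '(austin martin)')]

print(split_clauses('carDescription:"austin martin"'))
# [('carDescription', '"austin martin"')]
```

In the first form the qualifier binds only to austin, and martin falls through to the default field, which matches the parsedquery shown in the debug output. Grouping with parentheses or quoting keeps both words under carDescription.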




Re: Question regarding synonym

2009-10-02 Thread darniz

Thanks.
As I said, it even works by giving double quotes too,
like carDescription:"austin martin"

So is the conclusion that in order to map a two-word synonym I always have to
enclose it in double quotes, so that it does not split the words?

