Re: questions about synonyms

2010-09-02 Thread Lance Norskog
> 2. Is there a way to highlight synonyms in search results?

From the highlighter's point of view, there are one or more terms at a
position. The SynonymFilter adds or changes those terms. Other filters
also add or change those terms. The highlighter highlights whatever it
finds.
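
A minimal sketch of that idea in plain Python, not Solr's actual highlighter: each position holds a set of terms, a toy synonym step adds terms at the same position, and the highlighter marks any position that holds a query term. The `SYNONYMS` table and the `analyze` and `highlight` helpers are hypothetical names used only for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table for illustration; Solr would read synonyms.txt.
SYNONYMS = {"car": ["automobile"], "automobile": ["car"]}

def analyze(text, synonyms=None):
    """Map each token position to the set of terms found there."""
    positions = defaultdict(set)
    for pos, word in enumerate(text.split()):
        term = word.lower()
        positions[pos].add(term)
        # A synonym filter adds extra terms at the same position.
        positions[pos].update((synonyms or {}).get(term, []))
    return positions

def highlight(text, query, synonyms=None):
    """Wrap every word whose position holds a query term in <em> tags."""
    query_terms = {q.lower() for q in query.split()}
    positions = analyze(text, synonyms)
    marked = []
    for pos, word in enumerate(text.split()):
        if positions[pos] & query_terms:
            marked.append("<em>" + word + "</em>")
        else:
            marked.append(word)
    return " ".join(marked)
```

With the synonym table applied, `highlight("My automobile is red", "car", SYNONYMS)` returns `My <em>automobile</em> is red`; without it, nothing is highlighted.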

-- 
Lance Norskog
goks...@gmail.com


questions about synonyms

2010-08-31 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I have a couple of questions about synonyms.

1. I have a very large text file of synonyms. How can I use it? Do I need to
index this text file first?

2. Is there a way to highlight synonyms in search results?

3. Does anyone use WordNet with Solr?

Thanks so much in advance,


Re: questions about synonyms

2010-08-31 Thread Chris Hostetter

: Subject: questions about synonyms
: References: b28a6774-1ccc-4c2a-8d7b-0ee2b07a5...@apache.org
: In-Reply-To: b28a6774-1ccc-4c2a-8d7b-0ee2b07a5...@apache.org

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: questions about synonyms

2010-08-31 Thread Geert-Jan Brits
Concerning:
> 1. I got a very big text file of synonyms. How I can use it? Do I need to
> index this text file first?

Have you seen
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter ?

Cheers,
Geert-Jan


Re: Questions about synonyms and highlighting

2009-10-07 Thread Shalin Shekhar Mangar
I'm not an expert on hit highlighting but please find some answers inline:

On Wed, Sep 30, 2009 at 9:03 PM, Nourredine K. nourredin...@yahoo.com wrote:

> Hi,
>
> Can you please give me some answers to these questions:
>
> 1 - How can I get the synonyms found for a keyword?
>
> I mean, I search for "foo" and my synonyms.txt file contains the following
> tokens: foo, foobar, fee (with expand = true).
> My index contains foo and foobar. I want to display a message on the
> results page, in the header for example, listing only the 2 matched tokens
> and not fee, like "Results found for foo and foobar".


Whichever of those tokens is present in the index will be matched, but I don't
think it is possible to show only the synonyms that matched some documents.
Adding debugQuery=on gives you more information, such as how the score
for a particular document was calculated for the given query.
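
One possible client-side workaround, sketched in plain Python under stated assumptions and not a Solr feature: expand the keyword from the comma-separated synonym groups yourself (mimicking expand = true), then keep only the expansions known to be in the index. `load_synonyms`, `matched_synonyms` and the `indexed_terms` set are hypothetical; how `indexed_terms` is obtained from Solr is outside this sketch.

```python
def load_synonyms(lines):
    """Parse comma-separated groups; every term maps to its whole group."""
    table = {}
    for line in lines:
        line = line.strip()
        # Comments and explicit "a => b" mappings are out of scope here.
        if not line or line.startswith("#") or "=>" in line:
            continue
        group = [t.strip().lower() for t in line.split(",") if t.strip()]
        for term in group:
            known = table.setdefault(term, [])
            for other in group:
                if other not in known:
                    known.append(other)
    return table

def matched_synonyms(keyword, table, indexed_terms):
    """Return the expansions of `keyword` that are present in the index."""
    key = keyword.lower()
    expansions = table.get(key, [key])
    return [term for term in expansions if term in indexed_terms]
```

For the example in the question, `matched_synonyms("foo", load_synonyms(["foo, foobar, fee"]), {"foo", "foobar"})` returns `["foo", "foobar"]`, which is enough to build a "Results found for foo and foobar" message.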


> 2 - Can Solr analyze an index to extract associations between tokens?
>
> For example, if foo often appears with fee in a field, it will
> associate the 2 tokens.


Solr won't compute associations but there are ways of achieving something
similar. For example, the MoreLikeThis functionality clusters related
documents through co-occurrence of terms in a given field. Also, the
TermVectorComponent can give you position information for terms in a
document. You can use that to build your own co-occurrence associations.
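
A rough sketch of building such associations yourself, in plain Python. It assumes you have already extracted, for each document, a mapping from term to its list of positions (the input shape here is an assumption, not the exact TermVectorComponent response format); `cooccurrences` and `window` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(docs, window=5):
    """Count, per term pair, the documents where both terms occur
    within `window` positions of each other."""
    counts = Counter()
    for positions in docs:  # positions: {term: [pos, ...]} for one document
        for a, b in combinations(sorted(positions), 2):
            near = any(
                abs(i - j) <= window
                for i in positions[a]
                for j in positions[b]
            )
            if near:
                counts[(a, b)] += 1
    return counts
```

Pairs are keyed in alphabetical order, so the association between foo and fee is read as `counts[("fee", "foo")]`; a threshold on that count is one simple way to decide which tokens to treat as associated.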

If you just want to query for two words within a fixed position difference,
you can do proximity matches.

http://lucene.apache.org/java/2_9_0/queryparsersyntax.html#Proximity%20Searches

Perhaps somebody else can weigh in on your questions #3 and #4.

-- 
Regards,
Shalin Shekhar Mangar.


Re : Questions about synonyms and highlighting

2009-10-07 Thread Nourredine K.
> I'm not an expert on hit highlighting but please find some answers inline:

Thanks Shalin for your answers. They help a lot.

I am posting questions #3 and #4 again for the others :)


3 - Is it possible, and if so how, to configure Solr to enable or disable
highlighting for tokens with diacritics?


Settings for "vélo" (all highlighted) ==> the two words <em>vélo</em> and
<em>velo</em> are highlighted
Settings for "vélo" ==> the first word <em>vélo</em> is highlighted but not
the second: velo


4 - The same question for highlighting with lemmatisation?


Settings for "manage" (all highlighted) ==> the two words <em>manage</em> and
<em>management</em> are highlighted
Settings for "manage" ==> the first word <em>manage</em> is highlighted but
not the second: management

Regards,

Nourredine.


  

Re: Re : Questions about synonyms and highlighting

2009-10-07 Thread Avlesh Singh

> 4 - The same question for highlighting with lemmatisation?
> Settings for "manage" (all highlighted) ==> the two words <em>manage</em>
> and <em>management</em> are highlighted
> Settings for "manage" ==> the first word <em>manage</em> is highlighted
> but not the second: management


There is no lemmatisation support in Solr as of now. The only support you
get is stemming.
Let me understand this correctly: you basically want searches to run
against the stemmed base but want to selectively highlight the original and/or
stemmed words. Right? If yes, then AFAIK, this is not possible. Search
passes through your field's analyzers (tokenizers and filters). Highlighters
typically use the same set of analyzers, so their behavior will be the same
as in search; this essentially means that the keywords manage, managing,
management and manager are all REDUCED to manage for both searchers and
highlighters.
If this can be done at all, the only place to enable your feature would be
the Lucene highlighter APIs. Someone more knowledgeable can tell you whether
that is possible.

I have no idea about your #3, though my way of handling accents is to
apply an ISOLatin1AccentFilterFactory and get rid of them altogether :)
I am curious to know the answer though.
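
To make the "same analyzers for search and highlighting" point concrete, here is a small Python sketch, not Solr code. It assumes a toy analyzer that lowercases and optionally strips accents (a rough stand-in for an accent filter); `fold_accents` and `highlight` are hypothetical helpers.

```python
import unicodedata

def fold_accents(token):
    """Strip combining marks, so an accented 'e' becomes a plain 'e'."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def highlight(text, query, fold=True):
    """Highlight words whose analyzed form equals an analyzed query term."""
    def analyze(token):
        token = token.lower()
        return fold_accents(token) if fold else token

    wanted = {analyze(q) for q in query.split()}
    return " ".join(
        "<em>" + word + "</em>" if analyze(word) in wanted else word
        for word in text.split()
    )
```

With folding on, a query for vélo highlights both vélo and velo; with `fold=False`, only vélo is highlighted. The choice is made by the analyzer, not by the highlighter.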

Cheers
Avlesh


Re : Re : Questions about synonyms and highlighting

2009-10-07 Thread Nourredine K.
Thanks Avlesh.

Now I understand better how highlighting works.

As you've said, since it is based on the analysers, highlighting will handle
things the same way search does.

A clarification about examples #3 and #4: they are mutually exclusive. I wanted
to know how to do highlighting with stemming OR without (not both at the same
time).

So I think you've answered #3 too :) It all depends on your analysers. And in
my case, the ISOLatin1AccentFilterFactory could do the job.

Thanks again Shalin and Avlesh.

Regards,

Nourredine.



__
Do You Yahoo!?
Tired of spam? Yahoo! Mail offers you the best possible protection
against unsolicited messages
http://mail.yahoo.fr Yahoo! Mail 

Fwd: Questions about synonyms and highlighting

2009-10-06 Thread Nourredine K.
Hello,

Even short/partial answers could satisfy me :)


Nourredine.


Questions about synonyms and highlighting

2009-09-30 Thread Nourredine K.
Hi,

Can you please give me some answers to these questions:

1 - How can I get the synonyms found for a keyword?

I mean, I search for "foo" and my synonyms.txt file contains the following
tokens: foo, foobar, fee (with expand = true).
My index contains foo and foobar. I want to display a message on the results
page, in the header for example, listing only the 2 matched tokens and not
fee, like "Results found for foo and foobar".

2 - Can Solr analyze an index to extract associations between tokens?

For example, if foo often appears with fee in a field, it will associate
the 2 tokens.

3 - Is it possible, and if so how, to configure Solr to enable or disable
highlighting for tokens with diacritics?

Settings for "vélo" (all highlighted) ==> the two words <em>vélo</em> and
<em>velo</em> are highlighted
Settings for "vélo" ==> the first word <em>vélo</em> is highlighted but not
the second: velo

4 - The same question for highlighting with lemmatisation?

Settings for "manage" (all highlighted) ==> the two words <em>manage</em> and
<em>management</em> are highlighted
Settings for "manage" ==> the first word <em>manage</em> is highlighted but
not the second: management


Thanks in advance.

Regards,

Nourredine.

