Re: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Jan Høydahl
If you want to highlight, you need to turn on highlighting for the actual field 
you search, and that field needs to be stored, i.e. hl.fl=ContentSearchPhonetic
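
A minimal sketch of such a request, assuming a Solr instance at http://localhost:8983/solr/select (a hypothetical URL) and assuming ContentSearchPhonetic has been changed to stored="true" and the documents re-indexed. The parameters q, hl, hl.fl and wt are standard Solr request parameters; the helper name build_highlight_url is made up for illustration.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_url(term, field="ContentSearchPhonetic"):
    """Build a select URL that searches and highlights the same field."""
    params = {
        "q": f"{field}:{term}",  # search the phonetic field
        "hl": "true",            # turn highlighting on
        "hl.fl": field,          # highlight the field that was searched
        "wt": "xml",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_highlight_url("fakt")
print(url)
# Round-trip the query string to confirm which field is highlighted.
print(parse_qs(urlsplit(url).query)["hl.fl"])
```

Passing hl.fl on the request overrides the hl.fl default set in the request handler, so the solrconfig defaults can stay as they are while testing.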

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com


RE: Solr Phonetic Search Highlight issue in search results

2013-04-02 Thread Soumyanayan Kar
Thanks a lot, Erick, for trying this out.

Will wait for a reply from your end.

Thanks & Regards,

Soumya.



Re: Solr Phonetic Search Highlight issue in search results

2013-04-01 Thread Erick Erickson
Good question, you're causing me to think... about code I know very
little about <G>.

So rather than spouting off, I tried it and... it works fine for me, either with
or without the fast vector highlighter, on an admittedly very simple test.

So I think I'd try peeling off all the extra stuff you've put into your configs
(sorry, I don't have time right now to try to reproduce) and get the very
simple case working, then build the rest back up and see where the
problem begins.

Sorry for the mis-direction!

Erick




RE: Solr Phonetic Search Highlight issue in search results

2013-03-31 Thread Soumyanayan Kar
Hi Erick,

Thanks for the reply. But help me understand this: if Solr is able to
isolate the two documents which contain the term "fact", the phonetic
equivalent of the search term "fakt", then why is it unable to
highlight the terms based on the same logic it uses to find the documents?

Also, it highlights the results correctly in other searches which are
also approximate rather than exact, e.g. Fuzzy or Synonym search. In
those cases, too, the highlighted terms in the results are far from the
actual search term, but they are still highlighted correctly.

Maybe I am getting it completely wrong, but it looks like there is something
wrong with my implementation.

Thanks & Regards,

Soumya.



Solr Phonetic Search Highlight issue in search results

2013-03-26 Thread Soumyanayan Kar
When we issue a query with Phonetic Search, it returns the correct
documents but does not return the highlights. When we use Stemming or
Synonym searches, we get the proper highlights.

For example, when we execute a phonetic query for the term "fakt"
(ContentSearchPhonetic:fakt) in the Solr Admin interface, it returns two
documents containing the term "fact" (the phonetic token equivalent), but the
list of highlights is empty, as shown in the response below.

 

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">ContentSearchPhonetic:fakt</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <long name="DocId">1</long>
      <str name="DocTitle">Doc 1</str>
      <str name="Content">Anyway, this game was excellent and was well
worth the time.  The graphics are truly amazing and the sound track was
pretty pleasant also. The  preacher was in  fact a thief.</str>
      <long name="_version_">1430480998833848320</long>
    </doc>
    <doc>
      <long name="DocId">2</long>
      <str name="DocTitle">Doc 2</str>
      <str name="Content">stunning. The  preacher was in  fact an
excellent thief who  had stolen the original manuscript of Hamlet  from an
exhibit on the  Riviera, where  he also  acquired his remarkable and
tan.</str>
      <long name="_version_">1430480998841188352</long>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1"/>
    <lst name="2"/>
  </lst>
</response>
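
A response like this can also be checked programmatically. The following is only a rough sketch: the sample XML is a trimmed, well-formed stand-in for the response above, and the function name docs_without_highlights is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for a Solr XML response (illustrative only).
SAMPLE = """<response>
  <lst name="highlighting">
    <lst name="1"/>
    <lst name="2"/>
  </lst>
</response>"""


def docs_without_highlights(xml_text):
    """Return the keys of documents whose highlighting entry is empty."""
    root = ET.fromstring(xml_text)
    section = root.find("./lst[@name='highlighting']")
    if section is None:
        return []
    # An entry with no child elements carries no snippets for that document.
    return [doc.get("name") for doc in section.findall("lst") if len(doc) == 0]


print(docs_without_highlights(SAMPLE))
```

Running a check like this after each config change makes it easy to see at which step the highlights disappear.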

 

Relevant section of Solr schema:

 

<field name="DocId" type="long" indexed="true" stored="true" required="true"/>
<field name="DocTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>

<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>DocId</uniqueKey>

<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

 

Relevant section of Solr config:

 

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">100</int>
    <str name="df">ContentSearch</str>
    <bool name="hl">true</bool>
    <str name="hl.fl">Content</str>
    <str name="f.Content.hl.fragsize">150</str>
    <str name="f.Content.hl.snippets">40</str>
  </lst>
</requestHandler>

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <!-- Configure the standard fragmenter -->
    <!-- This could most likely be commented out in the default case -->
    <fragmenter name="gap"
                default="true"
                class="solr.highlight.GapFragmenter">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>

    <!-- A regular-expression-based fragmenter
         (for sentence extraction)
      -->
    <fragmenter name="regex"
                class="solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <!-- slightly smaller fragsizes work better because of slop -->
        <int name="hl.fragsize">70</int>
        <!-- allow 50% slop on fragment sizes -->
        <float name="hl.regex.slop">0.5</float>
        <!-- a basic sentence pattern -->
        <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
      </lst>
    </fragmenter>

 

Has anyone experienced this kind of behaviour before? Need some direction
for troubleshooting.

 

Soumya.

 

 



Re: Solr Phonetic Search Highlight issue in search results

2013-03-26 Thread Erick Erickson
How would you expect it to highlight successfully? The term is "fakt";
there's nothing built in (and, indeed, couldn't be) to un-phoneticize it
into "fact" and apply that to the Content field. The whole point of
phonetic processing is to do a lossy translation from the word into some
variant, losing precision all the way.

So this behavior is unsurprising...
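
To illustrate what "lossy" means here, a toy sketch follows. It is deliberately NOT Double Metaphone (the encoder configured in the schema), just a made-up encoder that folds a few similar-sounding consonants together and drops non-leading vowels; the function name and the folding table are assumptions for illustration only.

```python
def toy_phonetic(word):
    """Toy lossy encoder (NOT Double Metaphone): fold similar-sounding
    consonants together and drop vowels that are not the first letter."""
    folds = {"c": "k", "q": "k", "z": "s", "v": "f"}
    vowels = set("aeiou")
    out = []
    for i, ch in enumerate(word.lower()):
        if not ch.isalpha():
            continue
        ch = folds.get(ch, ch)
        if i > 0 and ch in vowels:
            continue  # non-leading vowels are discarded
        if out and out[-1] == ch:
            continue  # collapse repeated letters
        out.append(ch)
    return "".join(out).upper()


for w in ("fact", "fakt", "faked"):
    print(w, "->", toy_phonetic(w))
```

Both "fact" and "fakt" collapse to the same code, and nothing in that code says which spelling produced it, so the code alone cannot be mapped back to the original word.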

Best
Erick





Re: highlight issue

2011-12-02 Thread Ravish Bhagdev
Also, I'm not entirely sure wildcards are supported on text-based fields, only
on strings. Things may have changed in recent versions of Solr, though; I
am not sure.

R




highlight issue

2011-12-01 Thread Radha Krishna Reddy
Hi,

I am indexing around 2000 names using Solr. The highlight flag is on while
querying.

For some names, the search substring is added a second time at the start of
the highlighted result.

Suppose my search query is "Rak". In my database I have the name
"Rakesh Chaturvedi".
I am getting <em>Rak</em><em>Rak</em>esh Chaturvedi as the response.

The same happens with the following names.

Search Dhar -- highlight <em>Dhar</em><em>Dhar</em>mesh Darshan
Search Suda -- highlight <em>Suda</em><em>Suda</em>rshan Faakir

Can someone help me?

I am using the following filters for index and query.

<fieldType name="text_autofill" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
            maxGramSize="50" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>

Thanks and Regards,
Radha Krishna Reddy.


Re: highlight issue

2011-12-01 Thread Koji Sekiguchi


I don't think the Highlighter can support n-gram fields.
Can you try commenting out EdgeNGramFilterFactory, re-indexing, and then highlighting again?
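
For context, here is a rough sketch of what a front edge n-gram filter with minGramSize=1 and maxGramSize=50 produces for one indexed value. It is a simplification under stated assumptions: it models only the keyword tokenizer, lowercasing, and edge n-gram steps, ignores WordDelimiterFilterFactory entirely, and the function name edge_ngrams is made up for illustration.

```python
def edge_ngrams(token, min_gram=1, max_gram=50):
    """Return the front-edge n-grams of a single token, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# A keyword tokenizer keeps the whole value as one token; lowercase it first.
grams = edge_ngrams("Rakesh Chaturvedi".lower())
print(grams[:4])
print("rak" in grams, "rakesh" in grams)
```

Every gram is a prefix of the same original value, so a single name is indexed as many terms that all cover overlapping text at the start of the field.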

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/