Re: slow highlighting because of stemming

2011-08-01 Thread Orosz György
Thanks for the answers!
This was the solution! :) (My mistake was that I used the value "on"
instead of "true"; I don't know why.)
Gyuri

2011/7/30 Michael Sokolov 

> On 7/30/2011 3:46 AM, Orosz György wrote:
>
>> Hi,
>>
>> Thanks for the answer!
>> I am doing some logging about stemming, and what I can see is that a lot of
>> tokens are stemmed for the highlighting. This is the strange part, since I
>> don't understand why a highlighter would need stemming again.
>>
> Consider that the highlighter needs to match terms from the query with
> terms from the document, just like search. If the indexed document has been
> stemmed, then the query also needs to be stemmed, or you won't see matches.
>
> -Mike
>
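
A minimal sketch of the point above, assuming a schema.xml field type whose
index-time and query-time analyzers share the same stemming filter. The type
name "text_stemmed" and the choice of the Snowball Hungarian stemmer are
illustrative assumptions, not the configuration used in this thread. Any
component that analyzes text for a field of this type, the highlighter
included, goes through the same chain, so the stemmer runs again.

```xml
<!-- schema.xml sketch; "text_stemmed" and the stemmer choice are assumptions -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <!-- applied to document text when it is indexed -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
  </analyzer>
  <!-- applied to query text; must produce the same stems to get matches -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
  </analyzer>
</fieldType>
```

If the query analyzer dropped the stemming filter, unstemmed query terms would
not line up with the stemmed terms in the index, for search and for
highlighting alike.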


Re: slow highlighting because of stemming

2011-07-30 Thread Orosz György
Hi,

Thanks for the answer!
I am doing some logging about stemming, and what I can see is that a lot of
tokens are stemmed for the highlighting. This is the strange part, since I
don't understand why a highlighter would need stemming again.
Anyway, my documents are not really large, just a few kilobytes, but thanks
for the suggestion.

If you could tell me how to skip the stemming during highlighting, that
would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov 

> I'm not sure I would identify stemming as the culprit here.
>
> Do you have very large documents?  If so, there is a patch for FVH
> committed to limit the number of phrases it looks at; see hl.phraseLimit,
> but this won't be available until 3.4 is released.
>
> You can also limit the amount of each document that is analyzed by the
> regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to
> FVH? not sure)
>
> Using RegexFragmenter is also probably slower than something like
> SimpleFragmenter.
>
> There is work to implement faster highlighting for Solr/Lucene, but it
> depends on some basic changes to the search architecture so it might be a
> while before that becomes available.  See
> https://issues.apache.org/jira/browse/LUCENE-3318 if you're interested in
> following that development.
>
> -Mike
>
>
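
A minimal sketch of how the suggestions above could be expressed as Solr
request defaults in solrconfig.xml. The handler name and all numeric values
are illustrative assumptions. In Solr the analysis limit is exposed as
hl.maxAnalyzedChars (the Lucene Highlighter setting is maxDocCharsToAnalyze),
and hl.fragmenter=gap selects Solr's simple gap fragmenter instead of the
regex one.

```xml
<!-- solrconfig.xml sketch; handler name and values are assumptions -->
<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- analyze at most this many characters of each document
         (Solr's default is 51200) -->
    <int name="hl.maxAnalyzedChars">10000</int>
    <!-- "gap" = simple fixed-size fragments; "regex" = RegexFragmenter -->
    <str name="hl.fragmenter">gap</str>
    <!-- FastVectorHighlighter only, and only from Solr 3.4 on -->
    <int name="hl.phraseLimit">1000</int>
  </lst>
</requestHandler>
```

The same parameters can also be passed per request, for example
hl.fragmenter=gap in the query string, which makes it easy to compare timings
before changing the defaults.
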
> On 07/29/2011 04:55 AM, Orosz György wrote:
>
>> Dear all,
>>
>> I am quite new to Solr and would like to ask for your help.
>> I am developing an application which should be able to highlight the results
>> of a query. For this I am using the regex fragmenter:
>>
>> class="org.apache.solr.highlight.RegexFragmenter">
>>
>>   500
>>   0.5
>>   </str>
>>

slow highlighting because of stemming

2011-07-29 Thread Orosz György
Dear all,

I am quite new to Solr and would like to ask for your help.
I am developing an application which should be able to highlight the results
of a query. For this I am using the regex fragmenter:

   

  500
  0.5
  
 
 true
  [-\w ,/\n\"']{20,300}[.?!]
  dokumentum_syn_query

   
  
The field is indexed with term vectors and offsets:

[field definition: XML markup lost in the archive]

The highlighting works well, except that it is really slow. I realized that
this is because the highlighter/fragmenter stems all the result documents
again.

Could you please help me understand why this happens and how I can avoid it?
(I thought that using FastVectorHighlighter would solve my problem, but it
didn't.)

Thanks in advance!
Gyuri Orosz
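
A minimal sketch of a field set up for FastVectorHighlighter, since the field
definition in the message above did not survive. FastVectorHighlighter needs
term vectors with both positions and offsets on a stored field, and it has to
be switched on with hl.useFastVectorHighlighter. The field name is taken from
the values in the message and may not be the actual field; the type name and
the handler name are hypothetical.

```xml
<!-- schema.xml sketch; the type name "text_stemmed" is hypothetical -->
<field name="dokumentum_syn_query" type="text_stemmed"
       indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- solrconfig.xml sketch; the handler name is hypothetical -->
<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- a boolean parameter: use "true", as the follow-up in this thread notes -->
    <bool name="hl.useFastVectorHighlighter">true</bool>
  </lst>
</requestHandler>
```

Changing the term vector attributes only affects documents indexed afterwards,
so existing documents need to be reindexed before the setting takes effect.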