Hi,
Thanks for the answer!
I am doing some logging around stemming, and I can see that a lot of
tokens are stemmed during highlighting. That is the strange part, since I
don't understand why the highlighter needs to do stemming again.
Anyway, my documents are not really large, just a few kilobytes, but thanks
for this suggestion.
If you could help me figure out how to skip the stemming step for
highlighting, that would be great!
Thanks,
Gyuri
2011/7/29 Mike Sokolov soko...@ifactory.com
I'm not sure I would identify stemming as the culprit here.
Do you have very large documents? If so, there is a patch for FVH
committed to limit the number of phrases it looks at; see hl.phraseLimit,
but this won't be available until 3.4 is released.
You can also limit how much of each document is analyzed by the
regular Highlighter using maxDocCharsToAnalyze (though I'm not sure
whether this applies to FVH).
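[For reference, a minimal sketch of how these two knobs could be set as request-handler defaults; the values shown are just illustrative (51200 happens to be Solr's documented default for hl.maxAnalyzedChars, which is the request parameter backing maxDocCharsToAnalyze; hl.phraseLimit needs Solr 3.4+):

<lst name="defaults">
  <!-- cap on characters of each document the regular Highlighter analyzes -->
  <int name="hl.maxAnalyzedChars">51200</int>
  <!-- cap on phrases FVH considers; only available from Solr 3.4 on -->
  <int name="hl.phraseLimit">5000</int>
</lst>

Both can also be passed per-request as plain query parameters.]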
Using RegexFragmenter is also probably slower than something like
SimpleFragmenter.
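[A sketch of what switching fragmenters could look like in solrconfig.xml; solr.highlight.GapFragmenter is Solr's wrapper around Lucene's SimpleFragmenter, and the fragsize value here is just carried over from the original config:

<fragmenter name="gap" class="solr.highlight.GapFragmenter" default="true">
  <lst name="defaults">
    <int name="hl.fragsize">500</int>
  </lst>
</fragmenter>

This avoids running the regex over every candidate fragment.]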
There is work to implement faster highlighting for Solr/Lucene, but it
depends on some basic changes to the search architecture, so it might be a
while before that becomes available. See
https://issues.apache.org/jira/browse/LUCENE-3318 if
you're interested in following that development.
-Mike
On 07/29/2011 04:55 AM, Orosz György wrote:
Dear all,
I am quite new to using Solr, but would like to ask for your help.
I am developing an application which should be able to highlight the
results of a query. For this I am using the regex fragmenter:
<highlighting>
  <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">500</int>
      <float name="hl.regex.slop">0.5</float>
      <str name="hl.pre"><![CDATA[<b>]]></str>
      <str name="hl.post"><![CDATA[</b>]]></str>
      <str name="hl.useFastVectorHighlighter">true</str>
      <str name="hl.regex.pattern">[-\w ,/\n\']{20,300}[.?!]</str>
      <str name="hl.fl">dokumentum_syn_query</str>
    </lst>
  </fragmenter>
</highlighting>
The field is indexed with term vectors and offsets:
<field name="dokumentum_syn_query" type="huntext_syn" indexed="true"
       stored="true" multiValued="true" termVectors="on" termPositions="on"
       termOffsets="on"/>

<fieldType name="huntext_syn" class="solr.TextField" stored="true"
           indexed="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_query.txt" enablePositionIncrements="true"/>
    <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
            lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
            cache="alma"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
The highlighting works well, except that it's really slow. I realized that
this is because the highlighter/fragmenter does stemming on all the result
documents again.
Could you please help me understand why this happens and how I can avoid
it? (I thought that using FastVectorHighlighter would solve my problem, but
it didn't.)
Thanks in advance!
Gyuri Orosz