Matthew Flowerday created SOLR-15246:
----------------------------------------

             Summary: A unified highlighting search under solr 8.8.0/8.8.1 can 
take over 20 mins to run and eventually times out.
                 Key: SOLR-15246
                 URL: https://issues.apache.org/jira/browse/SOLR-15246
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: highlighter
    Affects Versions: 8.8.1, 8.8
         Environment: Solr running on Windows
            Reporter: Matthew Flowerday


Solr 8.8.0 introduced a new unified highlighter parameter, &hl.fragAlignRatio, 
which defaults to 0.5 if not set. It attempts to improve highlighting so that 
the highlighted text does not appear at the far left of the snippet. This works 
well, but if a search result contains numerous occurrences of the search term 
within a record, performance degrades severely:

2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select params={hl.snippets=2&q=test&hl=on&hl.maxAnalyzedChars=1000000&fl=id,description,specification,score&start=20&hl.fl=*&rows=10&_=1614405119134} hits=57008 status=0 QTime=1414320

2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException
              at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
org.eclipse.jetty.io.EofException: null
              at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
              at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
              at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]

 

When I set &hl.fragAlignRatio=0.25, results came back much more quickly:

2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.25&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=87024

With &hl.fragAlignRatio=0.1:

2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.1&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=69033

And with &hl.fragAlignRatio=0.0:

2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] o.a.s.c.S.Request [holmes]  webapp=/solr path=/select params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.0&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690} hits=136939 status=0 QTime=2841
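
To make the timings above easier to compare, here is a minimal Python sketch 
that pulls hl.fragAlignRatio and QTime out of request log lines and converts 
QTime from milliseconds to seconds. The lines are trimmed copies of the logs in 
this report, and the helper name summarize is made up for illustration. Note 
that the first entry came from a different core (uleaf) with different 
start/rows values, so it is not a like-for-like comparison with the other three.

```python
import re

# Trimmed copies of the request log lines in this report (only the fields
# needed here are kept; the values themselves are unchanged).
LOG_LINES = [
    "params={hl.snippets=2&q=test&hl=on} hits=57008 status=0 QTime=1414320",
    "params={hl.fragAlignRatio=0.25&rows=100} hits=136939 status=0 QTime=87024",
    "params={hl.fragAlignRatio=0.1&rows=100} hits=136939 status=0 QTime=69033",
    "params={hl.fragAlignRatio=0.0&rows=100} hits=136939 status=0 QTime=2841",
]


def summarize(line):
    """Return (fragAlignRatio as text or 'unset', QTime in seconds)."""
    ratio = re.search(r"hl\.fragAlignRatio=([0-9.]+)", line)
    qtime = re.search(r"QTime=(\d+)", line)
    return (ratio.group(1) if ratio else "unset", int(qtime.group(1)) / 1000.0)


for line in LOG_LINES:
    ratio, seconds = summarize(line)
    print(f"hl.fragAlignRatio={ratio:<6} QTime={seconds:.1f} s")
```

The "unset" row is the request that relied on the default, which this report 
states is 0.5.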

I left our setting at 0.0, which is presumably how it behaved in 7.7.1 (fully 
left aligned). I am not sure how many times a word has to occur in a record 
before performance degrades, but with too many occurrences the impact can be 
very large.
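
As a rough illustration of the workaround, the following Python sketch builds a 
select URL that pins hl.fragAlignRatio to 0.0 on each request. The parameter 
names and values are taken from the logged requests above; the base URL 
http://localhost:8983/solr and the helper name build_select_url are assumptions 
for the example, not part of my setup as reported.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, frag_align_ratio=0.0):
    """Build a /select URL with unified highlighting and an explicit
    hl.fragAlignRatio (0.0 = fully left aligned, per this report)."""
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "*",
        "hl.snippets": 2,
        "hl.maxAnalyzedChars": 1000000,
        "hl.fragAlignRatio": frag_align_ratio,
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Assumed host and port; "holmes" is the core name seen in the logs.
url = build_select_url("http://localhost:8983/solr", "holmes", "test")
print(url)
```

The same value could presumably also be set once as a request handler default 
rather than on every request.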

It might be an idea to set the default value to, say, 0.25 instead of 0.5 so 
that people are not caught out.

I also noticed that setting &timeAllowed=90000 did not stop the request before 
it finished, perhaps because the query itself finished quickly and the time was 
spent in highlighting. It might be an idea to have &timeAllowed also cover 
highlighting, so that a request does not run until the Jetty timeout is hit. 
The machine ran one core at 100% for about 20 minutes.

I raised this at the request of a member of the user forum.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
