[ https://issues.apache.org/jira/browse/SOLR-15246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301051#comment-17301051 ]

Matthew Flowerday commented on SOLR-15246:
------------------------------------------

Hi David

Thank you for looking into the issue I raised. A member of the Solr user forum
requested that I raise it on JIRA. After some digging I found a workaround:
setting hl.fragAlignRatio=0.0 restores the behaviour we had in 7.7.1 and
brings the speed back.
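For reference, the workaround as a request might look like the minimal sketch
below. It assumes a local Solr at http://localhost:8983 and the `uleaf` core
seen in the logs further down (both are assumptions); the other parameters are
copied from those logs. It only builds the URL and does not send anything.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/uleaf/select"


def build_highlight_url(query, frag_align_ratio=0.0):
    """Build a /select URL for the unified highlighter with an explicit
    hl.fragAlignRatio (0.0 = fragments start at the first match)."""
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "*",
        "hl.snippets": 2,
        "hl.maxAnalyzedChars": 1000000,
        "hl.fragAlignRatio": frag_align_ratio,
        "fl": "id,description,specification,score",
        "rows": 10,
    }
    # urlencode percent-escapes '*' and ',' so the URL is safe to send as-is.
    return SOLR_SELECT + "?" + urlencode(params)


print(build_highlight_url("test"))
```

The same value could instead be set as a request-handler default in
solrconfig.xml so that every client picks it up.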

Our product looks for all the matches that Solr finds for the query and then
displays context for only a configurable number of fields (by default the
first 10); that is why we use hl.fl=*.

I tried hl.requireFieldMatch=true with hl.fl=*, but then no highlighted text
was returned.

What I have seen when using hl.fl=* in the Solr Admin UI is that all the fields
in the database come back (well, I did specify hl.fl=* after all!) regardless of
whether they contain any text. Could this be improved by returning only those
fields which contain text? That would reduce the amount of data transferred.
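Until the server does this, a client can drop the empty entries itself. The
following is a minimal sketch, assuming the JSON response shape in which
`highlighting` maps each document id to a dict of field name to a list of
snippets, with fields that have no match appearing as empty lists (as observed
in the Admin UI). `MAX_FIELDS` and `non_empty_highlights` are hypothetical
names standing in for our product's configurable limit of 10 fields.

```python
MAX_FIELDS = 10  # hypothetical: mirrors the product's default of 10 fields


def non_empty_highlights(response, max_fields=MAX_FIELDS):
    """Return {doc_id: {field: snippets}}, keeping only fields that have at
    least one snippet and at most max_fields fields per document, in the
    order the response lists them."""
    result = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        kept = {}
        for field, snippets in fields.items():
            if len(kept) >= max_fields:
                break  # limit reached for this document
            if snippets:  # skip empty lists and None
                kept[field] = snippets
        result[doc_id] = kept
    return result


sample = {
    "highlighting": {
        "doc1": {
            "description": ["a <em>test</em> record"],
            "specification": [],
            "notes": ["another <em>test</em>"],
        }
    }
}
print(non_empty_highlights(sample))
```

Filtering on the client does not reduce the data sent over the wire, which is
why doing it on the server side would still be worthwhile.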

The search term was 'test', which appeared in numerous fields of a record and
numerous times within a field, so it was more of a corner case than a
mainstream one. We had been running 8.8.1 for a while before we hit the issue,
so it is not that common. Our test team tends to search for the word 'test',
whereas the development team does not, as it is rather a common word!

Our production servers run SolrCloud spread over a number of shards, so that
will help, as you suggested.

I will try out the test you suggested and get back to you.

Thank you again for getting in contact.

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX






> A unified highlighting search under solr 8.8.0/8.8.1 can take over 20 mins to 
> run and eventually times out.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15246
>                 URL: https://issues.apache.org/jira/browse/SOLR-15246
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 8.8, 8.8.1
>         Environment: I was running Solr under Windows
>            Reporter: Matthew Flowerday
>            Priority: Minor
>
> With Solr 8.8.0, a new unified highlighter parameter, &hl.fragAlignRatio, was
> introduced; if not set, it defaults to 0.5. It aims to improve highlighting
> so that the highlighted text does not appear right at the left edge of the
> fragment. This works well, but if a search result contains numerous
> occurrences of the search term within the record, performance drops sharply!
> 2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf] 
> o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select 
> params={hl.snippets=2&q=test&hl=on&hl.maxAnalyzedChars=1000000&fl=id,description,specification,score&start=20&hl.fl=*&rows=10&_=1614405119134}
>  hits=57008 status=0 QTime=1414320
> 2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf] 
> o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we 
> are shutting down => org.eclipse.jetty.io.EofException
>               at 
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
> org.eclipse.jetty.io.EofException: null
>               at 
> org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) 
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>               at 
> org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) 
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>               at 
> org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) 
> ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>  
> When I set &hl.fragAlignRatio=0.25, results came back much more quickly:
> 2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.25&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690}
>  hits=136939 status=0 QTime=87024
> And with &hl.fragAlignRatio=0.1:
> 2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.1&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690}
>  hits=136939 status=0 QTime=69033
> And with &hl.fragAlignRatio=0.0:
> 2021-02-27 15:20:38.194 INFO  (qtp1291367132-24) [   x:holmes] 
> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
> params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.0&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=1000000&hl.fl=*&hl.method=unified&timeAllowed=90000&_=1614430061690}
>  hits=136939 status=0 QTime=2841
> I left our setting at 0.0, which is presumably how it was in 7.7.1 (fully
> left-aligned). I am not sure how many times a word has to occur in a record
> for performance to drop sharply, but if it occurs too often it can have a
> BIG impact.
> It might be an idea to set the default to, say, 0.25 instead of 0.5 so that
> people are not caught out.
> I also noticed that setting &timeAllowed=90000 did not stop the request
> before it finished, perhaps because the query itself finished quickly and the
> time was spent in highlighting. It might be an idea for &timeAllowed to also
> cover highlighting, so that the request does not run until the Jetty timeout
> is hit. The machine ran one core at 100% for about 20 minutes!
> I raised this at the request of a member of the user forum.
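The three `holmes` log lines above come from otherwise identical requests
(rows=100, hl.snippets=2, hl.maxAnalyzedChars=1000000, hits=136939), so their
QTime values can be compared directly. A small sketch of that arithmetic
follows. The `uleaf` request did not set hl.fragAlignRatio, so it presumably
ran at the 0.5 default, but it used different rows, start and hit counts, so it
is reported only as an absolute duration rather than as a ratio.

```python
# QTime values (milliseconds) copied from the holmes log lines above,
# keyed by the hl.fragAlignRatio used in each request.
qtime_ms = {0.25: 87024, 0.1: 69033, 0.0: 2841}

# The uleaf request (no explicit hl.fragAlignRatio); different rows/start/hits,
# so only its absolute duration is meaningful.
default_uleaf_ms = 1414320

baseline = qtime_ms[0.0]
for ratio, ms in sorted(qtime_ms.items(), reverse=True):
    print(f"hl.fragAlignRatio={ratio}: {ms / 1000:.1f} s, "
          f"{ms / baseline:.1f}x the 0.0 time")

print(f"uleaf request without the parameter: {default_uleaf_ms / 60000:.1f} min")
```

On these numbers, 0.25 is roughly 30 times slower than 0.0 and 0.1 roughly
24 times slower, and the request without the parameter ran for about
23.6 minutes, consistent with the "over 20 mins" in the title.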



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
