[jira] [Commented] (SOLR-6889) Highlight using multiple threads
[ https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084589#comment-16084589 ]

David Smiley commented on SOLR-6889:
------------------------------------

I do think there would be a benefit in some cases, yes: most likely with the ANALYSIS offset source, and perhaps more so when there are many fields to highlight, which adds to the workload.

> Highlight using multiple threads
> --------------------------------
>
>                 Key: SOLR-6889
>                 URL: https://issues.apache.org/jira/browse/SOLR-6889
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 6.0
>            Reporter: Shinichiro Abe
>         Attachments: SOLR-6889.patch
>
>
> I think we could gain a little search performance by using
> Stream.parallel().forEach(), which is processor-aware via the fork/join
> framework under the hood.
> It would especially affect the for-loops over the docList, e.g. debugging
> and highlighting.
> It seems to me that this improvement is effective in environments with many CPUs.
> My test conditions:
> 1. Core i5 (2 cores, 4 threads), standalone Solr.
> 2. q=日本&debug=true&hl=true, other parameters are
> [here|https://github.com/anond2/simplesearch/blob/master/conf/solrconfig.xml#L836].
> 3. 7171 hits / 12000 docs (taken from a ja.wikipedia dump)
> 4. Compared to trunk, parallel streams are a little faster.
> My query execution results (QTime):
> {noformat}
> == rows=10 ==
>       trunk   patch
> 1st     236     146
> 2nd     179     100
> 3rd      79      72
> 4th      75      53
> 5th      91      80
> == rows=50 ==
>       trunk   patch
> 1st     485     325
> 2nd     225     243
> 3rd     199     151
> 4th     168     127
> 5th     149     118
> == rows=100 ==
>       trunk   patch
> 1st     948     607
> 2nd     472     390
> 3rd     237     201
> 4th     256     200
> 5th     224     178
> == rows=500 ==
>       trunk   patch
> 1st    3248    2826
> 2nd    1545    1067
> 3rd    1563     801
> 4th    1551     816
> 5th    1452     777
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
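The idea in the issue description, replacing a per-document for-loop with Stream.parallel().forEach(), can be sketched roughly as below. This is not the code from SOLR-6889.patch: the class name, the method names, and the string-replacing highlight() stand-in are all hypothetical, and a real highlighter does far more work per document. The sketch only shows how each document's result can be written to its own slot so that output order still matches docList order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Sketch only, not Solr code. Every name in this class is hypothetical. */
final class ParallelDocLoopSketch {

    /** Stand-in for expensive per-document work such as highlighting. */
    static String highlight(String text, String term) {
        return text.replace(term, "<em>" + term + "</em>");
    }

    /** Sequential baseline: documents are processed one after another. */
    static List<String> highlightSequential(List<String> docs, String term) {
        return docs.stream()
                   .map(d -> highlight(d, term))
                   .collect(Collectors.toList());
    }

    /**
     * Parallel variant: a parallel forEach over the document indices.
     * Each task writes only to its own array slot, so no locking is needed
     * and the result order matches the input order.
     */
    static List<String> highlightParallel(List<String> docs, String term) {
        String[] out = new String[docs.size()];
        IntStream.range(0, docs.size())
                 .parallel()
                 .forEach(i -> out[i] = highlight(docs.get(i), term));
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("日本の歴史", "東京は日本の首都", "no match here");
        System.out.println(highlightParallel(docs, "日本"));
    }
}
```

Parallel streams run on the common fork/join pool by default, so the degree of parallelism follows the number of available processors; whether that yields a gain depends on how expensive the per-document work is relative to the scheduling overhead.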
[jira] [Commented] (SOLR-6889) Highlight using multiple threads
[ https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084502#comment-16084502 ]

Shinichiro Abe commented on SOLR-6889:
--------------------------------------

I agree. But could highlighting become much faster if FieldOffsetStrategy#getOffsetsEnums() were parallelized, especially with the OffsetSource.ANALYSIS strategy, i.e. the storeOffsetsWithPositions = false case, in which the user can choose the fields to highlight after indexing? At the time, I assumed that the text analysis work done by the standard highlighter could be parallelized, borrowing the idea from the facet.threads method. That said, I have seen a benchmark in which the UH's offsetSource=ANALYSIS is already much faster than the standard highlighter.
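As an illustration of the facet.threads-style idea mentioned above, the following sketch runs the per-field analysis work on a bounded thread pool. It is an assumption-laden sketch, not Solr or Lucene code: the class, the analyzeAndHighlight() stand-in, and the threads argument are hypothetical, and no such highlighter option is implied to exist.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch only, not Solr code. Every name in this class is hypothetical. */
final class PerFieldHighlightSketch {

    /** Stand-in for re-analyzing a field's text and marking the matching term. */
    static String analyzeAndHighlight(String fieldText, String term) {
        return fieldText.replace(term, "<em>" + term + "</em>");
    }

    /**
     * Submits one task per field to a pool capped at `threads` workers,
     * then gathers the results in the original field order.
     */
    static Map<String, String> highlightFields(Map<String, String> fields, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                Callable<String> task = () -> analyzeAndHighlight(e.getValue(), term);
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> p : pending.entrySet()) {
                result.put(p.getKey(), p.getValue().get()); // blocks until that field is done
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while highlighting", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("highlighting task failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike a parallel stream on the common pool, an explicit pool lets the caller cap the number of worker threads per request, which is the property that makes a facet.threads-like setting possible. Creating a pool per call, as done here for brevity, would be too costly in a real server; a shared executor would be the more likely choice.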
[jira] [Commented] (SOLR-6889) Highlight using multiple threads
[ https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083382#comment-16083382 ]

David Smiley commented on SOLR-6889:
------------------------------------

Oh, and by sharding your data more you get some amount of threading automatically. So this also lowers the usefulness of adding threading.