[jira] [Commented] (SOLR-6889) Highlight using multiple threads

2017-07-12 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084589#comment-16084589
 ] 

David Smiley commented on SOLR-6889:


I do think there would be a benefit in some cases, yes: most likely with the 
ANALYSIS offset source, and perhaps also when there are many fields to 
highlight, which adds to the workload.

> Highlight using multiple threads
> 
>
> Key: SOLR-6889
> URL: https://issues.apache.org/jira/browse/SOLR-6889
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 6.0
> Reporter: Shinichiro Abe
> Attachments: SOLR-6889.patch
>
>
> I think we could gain a little search performance by using 
> Stream.parallel().forEach(), which is processor-aware via the fork/join 
> framework under the hood.
> It would especially affect the for-loop processing over the docList, e.g. 
> debugging and highlighting.
> This improvement seems most effective in environments with many CPUs.
> My test conditions:
> 1. Core i5 (2 cores, 4 threads), standalone Solr.
> 2. q=日本&debug=true&hl=true; other parameters are 
> [here|https://github.com/anond2/simplesearch/blob/master/conf/solrconfig.xml#L836].
> 3. 7171 hits / 12000 docs (taken from a ja.wikipedia dump)
> 4. Compared to trunk, parallel streams are a little faster.
> My query execution results (QTime, in milliseconds):
> {noformat}
> == rows=10 ==
>      trunk  patch
> 1st  236    146
> 2nd  179    100
> 3rd  79     72
> 4th  75     53
> 5th  91     80
> == rows=50 ==
>      trunk  patch
> 1st  485    325
> 2nd  225    243
> 3rd  199    151
> 4th  168    127
> 5th  149    118
> == rows=100 ==
>      trunk  patch
> 1st  948    607
> 2nd  472    390
> 3rd  237    201
> 4th  256    200
> 5th  224    178
> == rows=500 ==
>      trunk  patch
> 1st  3248   2826
> 2nd  1545   1067
> 3rd  1563   801
> 4th  1551   816
> 5th  1452   777
> {noformat}
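
For readers of the archive, the idea in the quoted description can be sketched roughly as below. This is only an illustration of running a per-document loop through a parallel stream; it is not the attached SOLR-6889.patch. The class ParallelHighlightSketch and the method highlightDoc are hypothetical stand-ins and are not Solr APIs.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelHighlightSketch {

    // Hypothetical stand-in for the per-document highlighting work; not a Solr API.
    static String highlightDoc(int docId, String term) {
        String text = "doc " + docId + " mentions " + term;
        return text.replace(term, "<em>" + term + "</em>");
    }

    // Sequential baseline: documents are processed one after another.
    static List<String> highlightSequential(List<Integer> docIds, String term) {
        return docIds.stream()
                .map(id -> highlightDoc(id, term))
                .collect(Collectors.toList());
    }

    // Parallel variant: the work is split across the common fork/join pool.
    // collect() preserves encounter order, so results still line up with docIds.
    static List<String> highlightParallel(List<Integer> docIds, String term) {
        return docIds.parallelStream()
                .map(id -> highlightDoc(id, term))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> docIds = IntStream.range(0, 500).boxed().collect(Collectors.toList());
        List<String> seq = highlightSequential(docIds, "solr");
        List<String> par = highlightParallel(docIds, "solr");
        System.out.println("same results: " + seq.equals(par));
    }
}
```

The sketch collects results with map() and collect() instead of forEach(), because forEach() on a parallel stream gives no ordering guarantee and would need a thread-safe sink for the per-document output.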



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6889) Highlight using multiple threads

2017-07-12 Thread Shinichiro Abe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084502#comment-16084502
 ] 

Shinichiro Abe commented on SOLR-6889:
--

I agree. But could it be made much faster by parallelizing 
FieldOffsetStrategy#getOffsetsEnums(), especially for the OffsetSource.ANALYSIS 
strategy, i.e. the storeOffsetsWithPositions = false case, in which the user 
can select the fields to highlight after indexing? At the time, I assumed that 
the text analysis work done by the standard highlighter could be parallelized, 
borrowing the idea from the facet.threads method. That said, I have seen a 
benchmark where the UH (UnifiedHighlighter) with offsetSource=ANALYSIS is 
already much faster than the standard highlighter.
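
As a rough illustration of the facet.threads-style idea mentioned above, the sketch below runs per-field analysis on a bounded thread pool. It is an assumption-laden example, not the UnifiedHighlighter's implementation: FieldThreadsSketch, analyzeOffsets, and the threads argument are hypothetical names, and a simple substring scan stands in for real text analysis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FieldThreadsSketch {

    // Hypothetical stand-in for re-analyzing one field's text to find match offsets.
    static List<Integer> analyzeOffsets(String content, String term) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = content.indexOf(term); i >= 0; i = content.indexOf(term, i + 1)) {
            offsets.add(i);
        }
        return offsets;
    }

    // Submits one task per field to a pool capped at `threads` (a hypothetical knob).
    static Map<String, List<Integer>> offsetsPerField(
            Map<String, String> fields, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            Map<String, Future<List<Integer>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : fields.entrySet()) {
                Callable<List<Integer>> task = () -> analyzeOffsets(e.getValue(), term);
                pending.put(e.getKey(), pool.submit(task));
            }
            // Collect in submission order so the result keeps the field order.
            Map<String, List<Integer>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<List<Integer>>> e : pending.entrySet()) {
                result.put(e.getKey(), e.getValue().get());
            }
            return result;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while analyzing fields", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("field analysis failed", ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "solr highlighting");
        fields.put("body", "parallel solr and more solr");
        System.out.println(offsetsPerField(fields, "solr", 2));
    }
}
```

An explicit pool size is used here instead of a parallel stream so that the degree of parallelism is a per-request choice, which is the part of facet.threads the comment refers to.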







[jira] [Commented] (SOLR-6889) Highlight using multiple threads

2017-07-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083382#comment-16083382
 ] 

David Smiley commented on SOLR-6889:


Oh, and by sharding your data more you get some amount of threading 
automatically.  So this also lowers the usefulness of adding threading.



