[ https://issues.apache.org/jira/browse/SOLR-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232772#comment-14232772 ]
Per Steffensen edited comment on SOLR-6810 at 12/3/14 3:53 PM:
---------------------------------------------------------------

We have solved the problem (reducing response time by a factor of 60 on our particular system/data/distribution) in the following way.

We introduced the concept of a "distributed query algorithm" (DQA), controlled by the request parameter {{dqa}}. The existing (default) distributed query algorithm is now named {{find-id-relevance_fetch-by-ids}} (short alias {{firfbi}}), and a new alternative distributed query algorithm is introduced, called {{find-relevance_find-ids-limited-rows_fetch-by-ids}} (short alias {{frfilrfbi}}).
* {{find-id-relevance_fetch-by-ids}} works as it always has - see the JavaDoc of {{ShardParams.FIND_ID_RELEVANCE_FETCH_BY_IDS}}
* {{find-relevance_find-ids-limited-rows_fetch-by-ids}} works in a different way - see the JavaDoc of {{ShardParams.FIND_RELEVANCE_FIND_IDS_LIMITED_ROWS_FETCH_BY_IDS}}

I believe "distributed query algorithm" is the counterpart of elasticsearch's "search type", just with much better naming that says something about what it actually controls :-)

Both DQAs support the {{distrib.singlePass}} flag. I have *renamed* it to {{dqa.forceSkipGetIds}}, because only {{find-id-relevance_fetch-by-ids}} becomes single-pass (going from 2 passes to 1) with this flag; {{find-relevance_find-ids-limited-rows_fetch-by-ids}} goes from 3 passes to 2. {{dqa.forceSkipGetIds=true}} is the default for {{find-relevance_find-ids-limited-rows_fetch-by-ids}}. There is really no need to ever run with {{dqa.forceSkipGetIds=false}} for this DQA, but it is supported.

I am attaching a patch corresponding to our solution, which is going into production as we speak to reduce our response times by a factor of 60. You do not necessarily need to adopt it as is, but let's at least consider it a starting point for a discussion.

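To make the pass counts above concrete, here is a minimal Java sketch of the two DQAs and the effect of {{dqa.forceSkipGetIds}}. It is only an illustration: {{DqaSketch}}, {{fromParam}} and {{passes}} are hypothetical names and not the patch's {{ShardParams.DQA}}, and the default of {{false}} for {{find-id-relevance_fetch-by-ids}} is an assumption (the comment only states the default for the new DQA).

```java
import java.util.Arrays;

/** Hypothetical model of the two DQAs described above; NOT the patch's ShardParams.DQA. */
enum DqaSketch {
    // full name, short alias, passes without skipping get-ids, default of dqa.forceSkipGetIds
    // (the "false" default for firfbi is an assumption, not stated in the comment)
    FIRFBI("find-id-relevance_fetch-by-ids", "firfbi", 2, false),
    FRFILRFBI("find-relevance_find-ids-limited-rows_fetch-by-ids", "frfilrfbi", 3, true);

    final String id;
    final String alias;
    final int fullPasses;
    final boolean forceSkipGetIdsDefault;

    DqaSketch(String id, String alias, int fullPasses, boolean forceSkipGetIdsDefault) {
        this.id = id;
        this.alias = alias;
        this.fullPasses = fullPasses;
        this.forceSkipGetIdsDefault = forceSkipGetIdsDefault;
    }

    /** Resolves a value of the dqa request parameter, given as full name or short alias. */
    static DqaSketch fromParam(String value) {
        return Arrays.stream(values())
                .filter(d -> d.id.equals(value) || d.alias.equals(value))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown dqa: " + value));
    }

    /** Number of passes; a null flag means "use this DQA's default for dqa.forceSkipGetIds". */
    int passes(Boolean forceSkipGetIds) {
        boolean skip = (forceSkipGetIds != null) ? forceSkipGetIds : forceSkipGetIdsDefault;
        return skip ? fullPasses - 1 : fullPasses;
    }
}

class DqaSketchDemo {
    public static void main(String[] args) {
        // Print the pass count for each DQA with the flag unset, true and false.
        for (DqaSketch d : DqaSketch.values()) {
            System.out.println(d.alias + ": default=" + d.passes(null)
                    + ", forceSkipGetIds=true -> " + d.passes(true)
                    + ", forceSkipGetIds=false -> " + d.passes(false));
        }
    }
}
```

The sketch only encodes what the comment states: skipping the get-ids step removes exactly one pass from either algorithm, so the new DQA never becomes single-pass.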
Details about the patch
* {{ShardParams.DQA}}: Enum of the DQAs, including various helper methods that IMHO belong here
* {{QueryComponent}}/{{ResponseBuilder}}: Changed to implement both DQAs
* {{SolrIndexSearcher.doc}}: Does not go to the store if only the score is asked for. This is important for the optimization
* {{TestIndexSearcher}}: Added a test of this particular new aspect of {{SolrIndexSearcher}}
* {{TestDistributedQueryAlgorithm}}: A new test class dedicated to testing DQAs. The {{testDocReads}} test shows exactly what the new DQA does for you. It asserts that you only go to the store X times across the cluster, and not (up to) #shards * X times (X = rows in the outer query)
* {{LeafReaderTestWrappers}}: Test wrappers for {{LeafReader}} instances. They help collect information about how {{LeafReader}} instances are used in different test scenarios. Used by {{TestIndexSearcher}}. Can be extended with other kinds of wrappers that collect different kinds of information
* {{SolrIndexSearcherTestWrapper}} and {{SolrCoreTestWrapper}}: Generic classes that help wrap all {{LeafReader}} instances under a {{SolrIndexSearcher}} or a {{SolrCore}}, respectively. Used by {{TestDistributedQueryAlgorithm}}
* {{DistributedQueryComponentOptimizationTest}}: Updated with new tests around DQAs, and made more systematic in the way the tests are performed, to avoid adding hundreds of almost identical code lines
* {{ShardRoutingTest}}: Same comments as for {{DistributedQueryComponentOptimizationTest}} above
* {{SolrTestCaseJ4}}: Randomly selects a DQA for each individual query fired while running the test suite, when the request does not explicitly specify which DQA to use. Includes helper methods for fixing the DQA in tests that focus on DQA testing
* The fix for SOLR-6812 is included in the patch because it is needed to keep the test suite green. It should probably be committed as part of SOLR-6812 and left out of this SOLR-6810.
The new DQA ({{find-relevance_find-ids-limited-rows_fetch-by-ids}}) has {{dqa.forceSkipGetIds}} (old {{distrib.singlePass}}) set to true by default. Since the tests randomly select the DQA for every query, we are also indirectly randomizing {{dqa.forceSkipGetIds}}. Therefore the test suite will likely fail if skip-get-ids does not work for all kinds of requests. This is actually also a good way to have {{dqa.forceSkipGetIds}} (old {{distrib.singlePass}}) tested, so that we will not have a partially working feature (as before SOLR-6795/SOLR-6796/SOLR-6812/SOLR-6813). The tests added to {{DistributedQueryComponentOptimizationTest}} in SOLR-6795 and SOLR-6796 have been removed again, because those problems (along with any other problems with {{dqa.forceSkipGetIds}}) will now (potentially) be revealed anyway by the indirect randomized testing of {{dqa.forceSkipGetIds}}
* I do not have a solution to SOLR-6813, so I am temporarily making sure that it will not make the test suite fail, by forcing the particular query in {{DistributedExpandComponentTest}} to use {{find-id-relevance_fetch-by-ids}} (making it use {{dqa.forceSkipGetIds=false}}) - see the lines {{switchToOriginalDQADefaultProvider()}} and {{switchToTestDQADefaultProvider()}}. Those lines should be removed when SOLR-6813 has been resolved. The query also works with {{find-relevance_find-ids-limited-rows_fetch-by-ids}} and {{dqa.forceSkipGetIds=false}}, so it is not {{find-relevance_find-ids-limited-rows_fetch-by-ids}} that does not work. It is {{dqa.forceSkipGetIds=true}} that does not work for this particular query.

was (Author: steff1193):
We have solved the problem (reducing response time by a factor of 60 on our particular system/data/distribution) in the following way.

We introduced the concept of a "distributed query algorithm" (DQA), controlled by the request parameter {{dqa}}.
The existing (default) distributed query algorithm is now named {{find-id-relevance_fetch-by-ids}} (short alias {{firfbi}}), and a new alternative distributed query algorithm is introduced, called {{find-relevance_find-ids-limited-rows_fetch-by-ids}} (short alias {{frfilrfbi}}).
* {{find-id-relevance_fetch-by-ids}} works as it always has - see the JavaDoc of {{ShardParams.FIND_ID_RELEVANCE_FETCH_BY_IDS}}
* {{find-relevance_find-ids-limited-rows_fetch-by-ids}} works in a different way - see the JavaDoc of {{ShardParams.FIND_RELEVANCE_FIND_IDS_LIMITED_ROWS_FETCH_BY_IDS}}

I believe "distributed query algorithm" is the counterpart of elasticsearch's "search type", just with much better naming that says something about what it actually controls :-)

Both DQAs support the {{distrib.singlePass}} flag. I have *renamed* it to {{dqa.forceSkipGetIds}}, because only {{find-id-relevance_fetch-by-ids}} becomes single-pass (going from 2 passes to 1) with this flag; {{find-relevance_find-ids-limited-rows_fetch-by-ids}} goes from 3 passes to 2. {{dqa.forceSkipGetIds=true}} is the default for {{find-relevance_find-ids-limited-rows_fetch-by-ids}}. There is really no need to ever run with {{dqa.forceSkipGetIds=false}} for this DQA, but it is supported.

I am attaching a patch corresponding to our solution, which is going into production as we speak to reduce our response times by a factor of 60. You do not necessarily need to adopt it as is, but let's at least consider it a starting point for a discussion.

Details about the patch
* {{ShardParams.DQA}}: Enum of the DQAs, including various helper methods that IMHO belong here
* {{QueryComponent}}/{{ResponseBuilder}}: Changed to implement both DQAs
* {{SolrIndexSearcher.doc}}: Does not go to the store if only the score is asked for. This is important for the optimization
* {{TestIndexSearcher}}: Added a test of this particular new aspect of {{SolrIndexSearcher}}
* {{TestDistributedQueryAlgorithm}}: A new test class dedicated to testing DQAs.
The {{testDocReads}} test shows exactly what the new DQA does for you. It asserts that you only go to the store X times across the cluster, and not (up to) #shards * X times (X = rows in the outer query)
* {{LeafReaderTestWrappers}}: Test wrappers for {{LeafReader}} instances. They help collect information about how {{LeafReader}} instances are used in different test scenarios. Used by {{TestIndexSearcher}}. Can be extended with other kinds of wrappers that collect different kinds of information
* {{SolrIndexSearcherTestWrapper}} and {{SolrCoreTestWrapper}}: Generic classes that help wrap all {{LeafReader}} instances under a {{SolrIndexSearcher}} or a {{SolrCore}}, respectively. Used by {{TestDistributedQueryAlgorithm}}
* {{DistributedQueryComponentOptimizationTest}}: Updated with new tests around DQAs, and made more systematic in the way the tests are performed, to avoid adding hundreds of almost identical code lines
* {{ShardRoutingTest}}: Same comments as for {{DistributedQueryComponentOptimizationTest}} above
* {{SolrTestCaseJ4}}: Randomly selects a DQA for each individual query fired while running the test suite, when the request does not explicitly specify which DQA to use. Includes helper methods for fixing the DQA in tests that focus on DQA testing
* The fix for SOLR-6812 is included in the patch because it is needed to keep the test suite green. It should probably be committed as part of SOLR-6812 and left out of this SOLR-6810.
The new DQA ({{find-relevance_find-ids-limited-rows_fetch-by-ids}}) has {{dqa.forceSkipGetIds}} (old {{distrib.singlePass}}) set to true by default. Since the tests randomly select the DQA for every query, we are also indirectly randomizing {{dqa.forceSkipGetIds}}. Therefore the test suite will likely fail if skip-get-ids does not work for all kinds of requests.
This is actually also a good way to have {{dqa.forceSkipGetIds}} (old {{distrib.singlePass}}) tested, so that we will not have a partially working feature (as before SOLR-6795/SOLR-6796/SOLR-6812/SOLR-6813). The tests added to {{DistributedQueryComponentOptimizationTest}} in SOLR-6795 and SOLR-6796 have been removed again, because those problems (along with any other problems with {{dqa.forceSkipGetIds}}) will now (potentially) be revealed anyway by the indirect randomized testing of {{dqa.forceSkipGetIds}}
* I do not have a solution to SOLR-6813, so I am temporarily making sure that it will not make the test suite fail, by forcing the particular query in {{DistributedExpandComponentTest}} to use {{find-id-relevance_fetch-by-ids}} (making it use {{dqa.forceSkipGetIds=true}}) - see the lines {{switchToOriginalDQADefaultProvider()}} and {{switchToTestDQADefaultProvider()}}. Those lines should be removed when SOLR-6813 has been resolved. The query also works with {{find-relevance_find-ids-limited-rows_fetch-by-ids}} and {{dqa.forceSkipGetIds=false}}, so it is not {{find-relevance_find-ids-limited-rows_fetch-by-ids}} that does not work. It is {{dqa.forceSkipGetIds=false}} that does not work.

> Faster searching limited but high rows across many shards all with many hits
> ----------------------------------------------------------------------------
>
> Key: SOLR-6810
> URL: https://issues.apache.org/jira/browse/SOLR-6810
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Per Steffensen
> Labels: distributed_search, performance
> Attachments: branch_5x_rev1642874.patch
>
>
> Searching "limited but high rows across many shards all with many hits" is slow
> E.g.
> * Query from outside client: q=something&rows=1000
> * Resulting in sub-requests to each shard, something like this
> ** 1) q=something&rows=1000&fl=id,score
> ** 2) Request the full documents with ids in the global top-1000 found among the top-1000 from each shard
> What does the subject mean
> * "limited but high rows" means 1000 in the example above
> * "many shards" means 200-1000 in our case
> * "all with many hits" means that each of the shards has a significant number of hits on the query
> The problem grows with all three factors above
> Doing such a query on our system takes between 5 minutes and 1 hour, depending on a lot of things. It ought to be much faster, so let's make it so.
> Profiling shows that the problem is that it takes a lot of time to access the store to get ids for (up to) 1000 docs (the value of the rows parameter) per shard. With 1000 shards, that is up to 1 million ids that have to be fetched. There is really no good reason to ever read information from the store for more than the overall top-1000 documents that have to be returned to the client.
> For further detail see the mail thread "Slow searching limited but high rows across many shards all with high hits" started 13/11-2014 on dev@lucene.apache.org
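
The arithmetic in the issue description can be sketched in a few lines of Java. This is a simplified model and not code from the patch: {{StoreReadSketch}} and its methods are hypothetical names, shards are plain arrays of scores, and the "limited rows" step is my reading of the name {{find-relevance_find-ids-limited-rows_fetch-by-ids}} (shards first report scores only, then ids are looked up only for the rows that made the global top).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Simplified, hypothetical model of store reads for ids; not code from the patch. */
class StoreReadSketch {

    /** Default behaviour: every shard looks up ids for its own top `rows` hits. */
    static long defaultIdReads(double[][] shardScores, int rows) {
        long reads = 0;
        for (double[] scores : shardScores) {
            reads += Math.min(rows, scores.length);
        }
        return reads;
    }

    /** Assumed new behaviour: per shard, how many of its hits are in the global top `rows`. */
    static int[] limitedRowsPerShard(double[][] shardScores, int rows) {
        // Each shard reports only the scores of its local top `rows` hits (no store access).
        List<double[]> candidates = new ArrayList<>();
        for (int s = 0; s < shardScores.length; s++) {
            double[] sorted = shardScores[s].clone();
            Arrays.sort(sorted); // ascending, so the best scores are at the end
            int take = Math.min(rows, sorted.length);
            for (int i = 0; i < take; i++) {
                candidates.add(new double[] {sorted[sorted.length - 1 - i], s});
            }
        }
        // The coordinator merges the scores and keeps the global top `rows`.
        candidates.sort((a, b) -> Double.compare(b[0], a[0]));
        int[] limited = new int[shardScores.length];
        for (int i = 0; i < Math.min(rows, candidates.size()); i++) {
            limited[(int) candidates.get(i)[1]]++;
        }
        return limited;
    }

    public static void main(String[] args) {
        int shards = 200, hitsPerShard = 5000, rows = 1000;
        Random random = new Random(42);
        double[][] scores = new double[shards][hitsPerShard];
        for (double[] shard : scores) {
            for (int i = 0; i < shard.length; i++) {
                shard[i] = random.nextDouble();
            }
        }
        long limitedReads = Arrays.stream(limitedRowsPerShard(scores, rows)).sum();
        System.out.println("id reads, every shard resolves its top rows: " + defaultIdReads(scores, rows));
        System.out.println("id reads, only global top rows resolved:     " + limitedReads);
    }
}
```

In this model the first count grows with shards * rows while the second is bounded by rows, which is the difference the issue is about.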