Re: Performance degradation with distributed search
Yonik, thanks for your explanation. I've created a ticket here:
https://issues.apache.org/jira/browse/SOLR-3104

On Mon, Feb 6, 2012 at 4:28 PM, Yonik Seeley wrote:
> On Mon, Feb 6, 2012 at 6:16 PM, XJ wrote:
> > Sorry I didn't make this clear. Yeah we use dismax in main query, as well as
> > in sort orders (different from main queries). Because of our complicated
> > business logic, we need many different relevancy queries in different sort
> > orders (other than sort by score, we also have around 20 other different
> > sort orders, some of them are dismax queries). However, this is something we
> > can not get away from right now. What kind of optimization I can try to do
> > there?
>
> OK, so basically it's slow because functions with embedded relevancy
> queries are "forward only" - if you request the value for a docid
> previous to the last, we need to reboot the query (re-weight, ask for
> the scorer, etc). This means that for your 30 documents, that will
> require rebooting the query about 15 times (assuming that roughly half
> of the time the next docid will be less than the previous one).
>
> Unfortunately there's not much you can do externally... we need to
> implement optimizations at the Solr level for this.
> Can you open a JIRA issue for this?
>
> -Yonik
> lucidimagination.com
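The "about 15 reboots for 30 documents" estimate quoted above can be checked with a small simulation. This is only a toy model of the cost described in the thread, not Solr code: the class ForwardOnlyValues and the function count_restarts are hypothetical names, and the document ids are random rather than taken from a real index.

```python
import random


class ForwardOnlyValues:
    """Toy stand-in for a forward-only per-document value source.

    It can only move to higher docids. Asking for a docid before the
    current position forces a restart, which models the "reboot"
    (re-weight, ask for the scorer, etc.) described in the thread.
    """

    def __init__(self):
        self.position = -1   # last docid served
        self.restarts = 0    # number of times we had to start over

    def value(self, docid):
        if docid < self.position:
            self.restarts += 1   # going backwards: rebuild and start over
        self.position = docid
        return float(docid)      # placeholder value, not a real score


def count_restarts(docids):
    """Return how many restarts a given visiting order causes."""
    source = ForwardOnlyValues()
    for docid in docids:
        source.value(docid)
    return source.restarts


if __name__ == "__main__":
    rng = random.Random(0)
    # 30 returned documents out of 112 matches, as in the thread
    docids = rng.sample(range(112), 30)
    print("arbitrary order:", count_restarts(docids))
    print("ascending order:", count_restarts(sorted(docids)))
```

In this model an arbitrary visiting order restarts on roughly half of the steps, while visiting the same docids in ascending order never restarts. That is an observation about the toy model only; the thread does not say how the optimization would be implemented in Solr.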
Re: Performance degradation with distributed search
On Mon, Feb 6, 2012 at 5:53 PM, XJ wrote:
> Yes as I mentioned in previous email, we do dismax queries(with different mm
> values), solr function queries (map, etc) math calculations (sum, product,
> log). I understand those are expensive. But worst case it should only double
> the time not going from 200ms to 1200ms right?

You mention dismax... but I assume that's as the main query and you
sort by score (which is fine). The only issue with relevancy queries
is if you sorted by one that was not the main query - this is not yet
optimized.

But for straight function queries that don't contain embedded
relevancy queries, I would definitely not expect the degradation you
are seeing - hence we should try to get to the bottom of this.

-Yonik
lucidimagination.com
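To make the distinction in this message concrete, here is a sketch of the two kinds of sort it contrasts. The field names (title, description, price, popularity), the search term, and the parameter name qq are hypothetical. The thread does not show XJ's actual sort expressions, so these are illustrations only.

```python
from urllib.parse import urlencode

# Main dismax query, shared by both examples (hypothetical fields and term).
main_query = {
    "q": "ipod",
    "defType": "dismax",
    "qf": "title description",
}

# Sort by a plain function of field values: no relevancy query is embedded,
# so this is the "straight function query" case.
plain_function_sort = dict(
    main_query,
    sort="product(popularity,log(sum(price,1))) desc",
)

# Sort by a function that embeds a second relevancy query (here a dismax
# query passed through the hypothetical parameter qq). This is the case the
# message describes as not yet optimized.
embedded_query_sort = dict(
    main_query,
    qq="{!dismax qf=title mm=1}ipod",
    sort="query($qq,0) desc",
)

if __name__ == "__main__":
    print(urlencode(plain_function_sort))
    print(urlencode(embedded_query_sort))
```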
Re: Performance degradation with distributed search
Yes, as I mentioned in my previous email, we use dismax queries (with different mm values), Solr function queries (map, etc.), and math calculations (sum, product, log). I understand those are expensive. But in the worst case it should only double the time, not go from 200ms to 1200ms, right?

XJ
Re: Performance degradation with distributed search
On Mon, Feb 6, 2012 at 5:35 PM, XJ wrote:
> hm.. just looked at the log only 112 matched, and start=0, rows=30

Are any of the sort criteria sort-by-function with anything complex
(like an embedded relevance query)?

-Yonik
lucidimagination.com
Re: Performance degradation with distributed search
Hm, I just looked at the log: only 112 documents matched, with start=0 and rows=30.
Re: Performance degradation with distributed search
On Mon, Feb 6, 2012 at 3:30 PM, oleole wrote:
> Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true
> to the query) and it surprised me too. Could it due to we're using many
> complex sortings (20 sortings with dismax, and, or...). Any thing it can be
> optimized? Looks like it's calculated twice in solr?

It currently does calculate it twice... but only for those documents
being returned (which should not be significant).
What is "rows" set to?

-Yonik
lucidimagination.com
Re: Performance degradation with distributed search
BTW, we just upgraded from Solr 1.4 to Solr 3.5. That's why we want to explore the improvements and new features of distributed search.
Re: Performance degradation with distributed search
Yonik,

Thanks for your reply. Yeah, that's the first thing I tried (adding fsv=true to the query), and it surprised me too. Could it be because we're using many complex sorts (20 sorts with dismax, and, or...)? Is there anything that can be optimized? It looks like it's calculated twice in Solr?

XJ

--
View this message in context: http://lucene.472066.n3.nabble.com/Performance-degradation-with-distributed-search-tp3715060p3720739.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Performance degradation with distributed search
On Sat, Feb 4, 2012 at 1:20 AM, XJ wrote:
> When I look into details (slow queries), I found some real issues that I
> need help with. For example, a query which takes 200ms with geo sharding,
> now timeout (>2000ms) with distributed search. And each shard query
> (isShard=true) takes about 1200ms. But if I run the query toward the shard
> only (without distributed search), it only takes <200ms. So I compared the
> two query urls, the only difference is shard query using distribute
> search has "fsv=true".

That's odd... I wouldn't expect fsv to make much of a difference.
Can you try running the query on the shard only and adding fsv=true to
verify that it's the culprit?

Also, what version of Solr are you using?

-Yonik
lucidimagination.com
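The check suggested here (run the same query directly against one shard, once without and once with fsv=true) could be set up roughly as follows. This is a sketch under assumptions: the shard URL, the query term, and the sort expression are hypothetical placeholders, and only fsv=true, start=0 and rows=30 come from the thread.

```python
from urllib.parse import urlencode

# Hypothetical address of a single shard; replace with a real one.
SHARD_URL = "http://localhost:8983/solr/select"


def shard_query_url(params, fsv=False):
    """Build a query URL for one shard, optionally adding fsv=true."""
    merged = dict(params)
    if fsv:
        merged["fsv"] = "true"
    return SHARD_URL + "?" + urlencode(merged)


if __name__ == "__main__":
    # Hypothetical query; start and rows match the values reported in the thread.
    params = {"q": "ipod", "start": 0, "rows": 30, "sort": "score desc"}
    print(shard_query_url(params))            # baseline, no fsv
    print(shard_query_url(params, fsv=True))  # same query with fsv=true
```

Requesting both URLs several times and comparing the QTime values in the responses would show whether fsv=true alone accounts for the slowdown.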