Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yonik, thanks for your explanation. I've created a ticket here
https://issues.apache.org/jira/browse/SOLR-3104

On Mon, Feb 6, 2012 at 4:28 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 6:16 PM, XJ  wrote:
> > Sorry I didn't make this clear. Yes, we use dismax in the main query, as
> > well as in sort orders (different from the main queries). Because of our
> > complicated business logic, we need many different relevancy queries in
> > different sort orders (besides sort by score, we have around 20 other
> > sort orders, some of them dismax queries). However, this is something we
> > cannot get away from right now. What kind of optimization can I try
> > there?
>
> OK, so basically it's slow because functions with embedded relevancy
> queries are "forward only" - if you request the value for a docid
> previous to the last, we need to reboot the query (re-weight, ask for
> the scorer, etc).  This means that for your 30 documents, that will
> require rebooting the query about 15 times (assuming that roughly half
> of the time the next docid will be less than the previous one).
>
> Unfortunately there's not much you can do externally... we need to
> implement optimizations at the Solr level for this.
> Can you open a JIRA issue for this?
>
> -Yonik
> lucidimagination.com
>
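
The "forward only" cost described above can be illustrated with a small sketch. This is not Solr code: count_restarts and average_restarts are hypothetical helpers that only model the rule from the explanation (a scorer must be rebuilt whenever the requested docid is lower than the previous one), and the random docid order is an assumption standing in for documents arriving in sort order.

```python
import random


def count_restarts(doc_ids):
    """Count how often a forward-only scorer would have to be rebuilt.

    A restart is counted whenever the requested doc id is smaller than the
    one requested just before it, because the scorer cannot move backwards.
    """
    restarts = 0
    previous = None
    for doc_id in doc_ids:
        if previous is not None and doc_id < previous:
            restarts += 1
        previous = doc_id
    return restarts


def average_restarts(num_docs=30, trials=2000, seed=0):
    """Average restart count over many random orderings of distinct doc ids."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Distinct doc ids in random order (assumption: the sort order is
        # unrelated to doc id order).
        order = rng.sample(range(1000), num_docs)
        total += count_restarts(order)
    return total / trials


if __name__ == "__main__":
    print("average restarts for 30 docs:", average_restarts())
    # Visiting the same documents in ascending doc id order needs no restart.
    print("restarts when sorted:", count_restarts(sorted([42, 7, 99, 13])))
```

For 30 documents the average comes out near 14.5, which matches the "about 15 times" estimate. The last line shows why visiting documents in ascending docid order would avoid restarts altogether; whether Solr can adopt that approach is a question for the JIRA issue, not something this sketch establishes.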


Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 5:53 PM, XJ  wrote:
> Yes, as I mentioned in my previous email, we use dismax queries (with
> different mm values), Solr function queries (map, etc.), and math
> calculations (sum, product, log). I understand those are expensive. But in
> the worst case it should only double the time, not go from 200ms to 1200ms,
> right?

You mention dismax... but I assume that's as the main query and you
sort by score (which is fine).
The only issue with relevancy queries is if you sorted by one that was
not the main query - this is not yet optimized.

But for straight function queries that don't contain embedded
relevancy queries, I would definitely not expect the degradation you
are seeing - hence we should try to get to the bottom of this.

-Yonik
lucidimagination.com
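
To make the distinction above concrete, here is a minimal sketch of the two kinds of request, written as parameter dictionaries. The field names (title, body), the mm value, and the parameter name sortq are made up for illustration; the query() function with $param dereferencing is standard Solr function query syntax as far as I know, but please check it against your Solr version.

```python
from urllib.parse import urlencode


def sort_by_score_params(user_query):
    """Dismax as the main query, sorted by its own score (the cheap case)."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "title body",  # hypothetical field names
        "sort": "score desc",
    }


def sort_by_embedded_query_params(user_query, sort_terms):
    """Main query plus a sort on a *different* relevancy query.

    The sort value comes from a function (query(...)) that wraps a second
    dismax query, which is the case described as not yet optimized.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "title body",  # hypothetical field names
        "sort": "query($sortq,0) desc",
        "sortq": "{!dismax qf=title mm=2}" + sort_terms,  # hypothetical
    }


if __name__ == "__main__":
    print(urlencode(sort_by_score_params("red shoes")))
    print(urlencode(sort_by_embedded_query_params("red shoes", "leather boots")))
```

Only the second shape embeds a relevancy query inside the sort, so only that one should hit the slow path discussed in this thread.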



> XJ
>
> On Mon, Feb 6, 2012 at 2:37 PM, Yonik Seeley wrote:
>>
>> On Mon, Feb 6, 2012 at 5:35 PM, XJ  wrote:
>> > hm.. just looked at the log only 112 matched, and start=0, rows=30
>>
>> Are any of the sort criteria sort-by-function with anything complex
>> (like an embedded relevance query)?
>>
>> -Yonik
>> lucidimagination.com
>>
>>
>> >
>> > On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:
>> >>
>> >> On Mon, Feb 6, 2012 at 3:30 PM, oleole  wrote:
>> >> > Thanks for your reply. Yeah that's the first thing I tried (adding
>> >> > fsv=true to the query) and it surprised me too. Could it be because
>> >> > we're using many complex sort orders (20 sort orders with dismax, and,
>> >> > or...)? Is there anything that can be optimized? It looks like it's
>> >> > calculated twice in Solr?
>> >>
>> >> It currently does calculate it twice... but only for those documents
>> >> being returned (which should not be significant).
>> >> What is "rows" set to?
>> >>
>> >> -Yonik
>> >> lucidimagination.com
>> >
>> >
>
>


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yes, as I mentioned in my previous email, we use dismax queries (with
different mm values), Solr function queries (map, etc.), and math
calculations (sum, product, log). I understand those are expensive. But in
the worst case it should only double the time, not go from 200ms to 1200ms,
right?

XJ

On Mon, Feb 6, 2012 at 2:37 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 5:35 PM, XJ  wrote:
> > hm.. just looked at the log only 112 matched, and start=0, rows=30
>
> Are any of the sort criteria sort-by-function with anything complex
> (like an embedded relevance query)?
>
> -Yonik
> lucidimagination.com
>
>
> >
> > On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:
> >>
> >> On Mon, Feb 6, 2012 at 3:30 PM, oleole  wrote:
> >> > Thanks for your reply. Yeah that's the first thing I tried (adding
> >> > fsv=true to the query) and it surprised me too. Could it be because
> >> > we're using many complex sort orders (20 sort orders with dismax, and,
> >> > or...)? Is there anything that can be optimized? It looks like it's
> >> > calculated twice in Solr?
> >>
> >> It currently does calculate it twice... but only for those documents
> >> being returned (which should not be significant).
> >> What is "rows" set to?
> >>
> >> -Yonik
> >> lucidimagination.com
> >
> >
>


Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 5:35 PM, XJ  wrote:
> hm.. just looked at the log only 112 matched, and start=0, rows=30

Are any of the sort criteria sort-by-function with anything complex
(like an embedded relevance query)?

-Yonik
lucidimagination.com


>
> On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:
>>
>> On Mon, Feb 6, 2012 at 3:30 PM, oleole  wrote:
>> > Thanks for your reply. Yeah that's the first thing I tried (adding
>> > fsv=true to the query) and it surprised me too. Could it be because
>> > we're using many complex sort orders (20 sort orders with dismax, and,
>> > or...)? Is there anything that can be optimized? It looks like it's
>> > calculated twice in Solr?
>>
>> It currently does calculate it twice... but only for those documents
>> being returned (which should not be significant).
>> What is "rows" set to?
>>
>> -Yonik
>> lucidimagination.com
>
>


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
hm.. just looked at the log only 112 matched, and start=0, rows=30

On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 3:30 PM, oleole  wrote:
> > Thanks for your reply. Yeah that's the first thing I tried (adding
> > fsv=true to the query) and it surprised me too. Could it be because
> > we're using many complex sort orders (20 sort orders with dismax, and,
> > or...)? Is there anything that can be optimized? It looks like it's
> > calculated twice in Solr?
>
> It currently does calculate it twice... but only for those documents
> being returned (which should not be significant).
> What is "rows" set to?
>
> -Yonik
> lucidimagination.com
>


Re: Performance degradation with distributed search

2012-02-06 Thread Yonik Seeley
On Mon, Feb 6, 2012 at 3:30 PM, oleole  wrote:
> Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true
> to the query) and it surprised me too. Could it be because we're using many
> complex sort orders (20 sort orders with dismax, and, or...)? Is there
> anything that can be optimized? It looks like it's calculated twice in Solr?

It currently does calculate it twice... but only for those documents
being returned (which should not be significant).
What is "rows" set to?

-Yonik
lucidimagination.com


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
BTW, we just upgraded from Solr 1.4 to Solr 3.5. That's why we want to
explore the improvements and new features of distributed search.

On Mon, Feb 6, 2012 at 12:30 PM, oleole  wrote:

> Yonik,
>
> Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true
> to the query) and it surprised me too. Could it be because we're using many
> complex sort orders (20 sort orders with dismax, and, or...)? Is there
> anything that can be optimized? It looks like it's calculated twice in Solr?
>
> XJ
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Performance-degradation-with-distributed-search-tp3715060p3720739.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Performance degradation with distributed search

2012-02-06 Thread oleole
Yonik,

Thanks for your reply. Yeah that's the first thing I tried (adding fsv=true
to the query) and it surprised me too. Could it be because we're using many
complex sort orders (20 sort orders with dismax, and, or...)? Is there
anything that can be optimized? It looks like it's calculated twice in Solr?

XJ



Re: Performance degradation with distributed search

2012-02-04 Thread Yonik Seeley
On Sat, Feb 4, 2012 at 1:20 AM, XJ  wrote:
> When I looked into the details (slow queries), I found some real issues that
> I need help with. For example, a query which takes 200ms with geo sharding
> now times out (>2000ms) with distributed search, and each shard query
> (isShard=true) takes about 1200ms. But if I run the query against the shard
> only (without distributed search), it takes <200ms. So I compared the
> two query URLs; the only difference is that the shard query issued by
> distributed search has "fsv=true".

That's odd... I wouldn't expect fsv to make much of a difference.
Can you try running the query on the shard only and adding fsv=true to
verify that it's the culprit?

Also, what version of Solr are you using?

-Yonik
lucidimagination.com
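
The check suggested above (run the same query directly against one shard, with and without fsv=true) could be scripted roughly as follows. This is only a sketch: the base URL, the query parameters in the usage comment, and the helper names are placeholders, not values taken from this thread.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params, fsv=False):
    """Build a shard query URL, optionally adding fsv=true."""
    query = dict(params)
    if fsv:
        query["fsv"] = "true"
    return base_url + "?" + urlencode(query)


def time_request(url, timeout=10):
    """Return the wall-clock time of one HTTP GET in milliseconds."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as response:
        response.read()
    return (time.perf_counter() - start) * 1000.0


def compare(base_url, params, runs=5):
    """Time the query with and without fsv=true; keep the best of several runs."""
    results = {}
    for fsv in (False, True):
        url = build_url(base_url, params, fsv=fsv)
        results[fsv] = min(time_request(url) for _ in range(runs))
    return results


# Example usage (placeholder host and parameters, adjust to your shard):
# timings = compare("http://localhost:8983/solr/select",
#                   {"q": "foo", "start": "0", "rows": "30"})
# print("without fsv: %.0f ms, with fsv: %.0f ms" % (timings[False], timings[True]))
```

Repeated runs may be served from Solr's caches, so the numbers are only indicative; the QTime value in each response is another way to compare the two variants.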