fsv=true not returning sort_values for distributed searches

2012-05-22 Thread XJ
We use fsv=true to help debug sorting, which works great for
non-distributed searches. However, it's not working (no sort_values in the
response) for multi-shard queries. Any idea how to get this fixed?

thanks,
XJ


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yonik, thanks for your explanation. I've created a ticket here:
https://issues.apache.org/jira/browse/SOLR-3104

On Mon, Feb 6, 2012 at 4:28 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 6:16 PM, XJ wrote:
> > Sorry I didn't make this clear. Yeah, we use dismax in the main query,
> > as well as in sort orders (different from the main queries). Because of
> > our complicated business logic, we need many different relevancy
> > queries in different sort orders (other than sort by score, we also
> > have around 20 other sort orders, some of them dismax queries).
> > However, this is something we cannot get away from right now. What
> > kind of optimization can I try there?
>
> OK, so basically it's slow because functions with embedded relevancy
> queries are "forward only" - if you request the value for a docid
> previous to the last, we need to reboot the query (re-weight, ask for
> the scorer, etc).  This means that for your 30 documents, that will
> require rebooting the query about 15 times (assuming that roughly half
> of the time the next docid will be less than the previous one).
>
> Unfortunately there's not much you can do externally... we need to
> implement optimizations at the Solr level for this.
> Can you open a JIRA issue for this?
>
> -Yonik
> lucidimagination.com
>
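
The "about 15 reboots for 30 documents" estimate above can be checked with a
small simulation. This is only a toy sketch, not Solr code: the class
ForwardOnlyScorer and the helpers count_resets and average_resets are
hypothetical names, and the assumption is that the 30 returned documents are
visited in an order unrelated to their docids, so each step goes backwards
roughly half of the time.

```python
import random


class ForwardOnlyScorer:
    """Toy stand-in (hypothetical) for a scorer that can only move forward."""

    def __init__(self):
        self.position = -1  # last docid that was asked for
        self.resets = 0     # how many times the query had to be "rebooted"

    def value(self, doc_id):
        if doc_id < self.position:
            # Going backwards: in the real system this is where the query
            # would be re-weighted and a new scorer requested.
            self.resets += 1
        self.position = doc_id
        return float(doc_id)  # placeholder value, not a real score


def count_resets(doc_ids):
    """Count the reboots needed to visit doc_ids in the given order."""
    scorer = ForwardOnlyScorer()
    for doc_id in doc_ids:
        scorer.value(doc_id)
    return scorer.resets


def average_resets(num_docs=30, trials=2000, seed=0):
    """Average reboot count when docids arrive in random order."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        doc_ids = rng.sample(range(100000), num_docs)
        total += count_resets(doc_ids)
    return total / trials


if __name__ == "__main__":
    print("average reboots for 30 docs:", average_resets())
```

Visiting the documents in ascending docid order gives zero reboots in this
model, which is why an optimization at the Solr level could help.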


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Yes, as I mentioned in my previous email, we do dismax queries (with
different mm values), Solr function queries (map, etc.) and math
calculations (sum, product, log). I understand those are expensive. But in
the worst case it should only double the time, not go from 200ms to 1200ms,
right?

XJ

On Mon, Feb 6, 2012 at 2:37 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 5:35 PM, XJ wrote:
> > Hm, I just looked at the log: only 112 matched, with start=0 and rows=30.
>
> Are any of the sort criteria sort-by-function with anything complex
> (like an embedded relevance query)?
>
> -Yonik
> lucidimagination.com
>
>
> >
> > On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:
> >>
> >> On Mon, Feb 6, 2012 at 3:30 PM, oleole wrote:
> >> > Thanks for your reply. Yeah, that's the first thing I tried (adding
> >> > fsv=true to the query) and it surprised me too. Could it be because
> >> > we're using many complex sorts (20 sorts with dismax, and, or...)?
> >> > Is there anything that can be optimized? It looks like it's
> >> > calculated twice in Solr?
> >>
> >> It currently does calculate it twice... but only for those documents
> >> being returned (which should not be significant).
> >> What is "rows" set to?
> >>
> >> -Yonik
> >> lucidimagination.com
> >
> >
>


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
Hm, I just looked at the log: only 112 matched, with start=0 and rows=30.

On Mon, Feb 6, 2012 at 1:33 PM, Yonik Seeley wrote:

> On Mon, Feb 6, 2012 at 3:30 PM, oleole wrote:
> > Thanks for your reply. Yeah, that's the first thing I tried (adding
> > fsv=true to the query) and it surprised me too. Could it be because
> > we're using many complex sorts (20 sorts with dismax, and, or...)?
> > Is there anything that can be optimized? It looks like it's calculated
> > twice in Solr?
>
> It currently does calculate it twice... but only for those documents
> being returned (which should not be significant).
> What is "rows" set to?
>
> -Yonik
> lucidimagination.com
>


Re: Performance degradation with distributed search

2012-02-06 Thread XJ
BTW, we just upgraded from Solr 1.4 to Solr 3.5. That's why we want to
explore the improvements/new features of distributed search.

On Mon, Feb 6, 2012 at 12:30 PM, oleole wrote:

> Yonik,
>
> Thanks for your reply. Yeah, that's the first thing I tried (adding
> fsv=true to the query) and it surprised me too. Could it be because we're
> using many complex sorts (20 sorts with dismax, and, or...)? Is there
> anything that can be optimized? It looks like it's calculated twice in
> Solr?
>
> XJ
>


Performance degradation with distributed search

2012-02-03 Thread XJ
Hello,

I am experimenting with Solr distributed search with random sharding (we
currently use geo sharding), hoping to gain some performance and also
scalability in the future. (The index size keeps growing, and geo sharding
is hard to scale.)

However, I'm seeing worse performance with distributed search on a test
server with 6 shards, a 15-core CPU and 24G of memory; the index size is
about 8G on each shard. With geo sharding it can easily take a 150 QPS load
with good response times. Now, with distributed search, there are timeouts
and the average response time also increases. This is probably no big
surprise, since I'm using the same number of shards plus the overhead of
distributed search/merge/HTTP network, etc.

When I looked into the details (slow queries), I found some real issues
that I need help with. For example, a query which takes 200ms with geo
sharding now times out (>2000ms) with distributed search, and each shard
query (isShard=true) takes about 1200ms. But if I run the query against the
shard alone (without distributed search), it takes <200ms. So I compared
the two query URLs: the only difference is that the shard query issued by
distributed search has "fsv=true". I understand field sort values are
needed during the merge process, but I didn't expect that to make this much
difference in performance, although we do have a lot of sort orders (about
20 different sort orders).

Any suggestions/comments on the performance problem I'm having with
distributed search? Is distributed search the right choice for me? What
other setups/ideas can I try?

thanks,
XJ
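
The URL comparison described in the message above could be reproduced with a
small helper like the following. This is a rough sketch under assumptions:
the base URL, the query string and the helper names shard_url and param_diff
are made up for illustration; only the isShard=true and fsv=true parameters
and the start/rows values come from the thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def shard_url(base, params, fsv):
    """Build a shard request URL, optionally adding fsv=true."""
    query = dict(params)
    if fsv:
        query["fsv"] = "true"
    return base + "?" + urlencode(query)


def param_diff(url_a, url_b):
    """Return {param: (value_in_a, value_in_b)} for params that differ."""
    a = dict(parse_qsl(urlsplit(url_a).query))
    b = dict(parse_qsl(urlsplit(url_b).query))
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Hypothetical host and query; replace with the real shard and request.
    base = "http://localhost:8983/solr/select"
    params = {"q": "*:*", "start": "0", "rows": "30", "isShard": "true"}
    without_fsv = shard_url(base, params, fsv=False)
    with_fsv = shard_url(base, params, fsv=True)
    print(param_diff(without_fsv, with_fsv))
```

Timing the two generated URLs against a single shard would show whether
fsv=true alone accounts for the slowdown.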