Is negative boost possible?

2009-08-18 Thread Larry He
Hi all,

I am looking for a way to assign a negative boost to a term in a Solr
query. Our use case: we want to boost matching documents that were updated
recently and penalize those that have not been updated for a long time.
Other terms in the query affect the scores as well. For example, we
construct a query similar to this:

*:* field1:value1^2 field2:value2^2 lastUpdateTime:[NOW/DAY-90DAYS TO *]^5 lastUpdateTime:[* TO NOW/DAY-365DAYS]^-3

I noticed that a negative boost factor cannot simply be used in the query.
Is there any way to achieve this result?

Regards,
Shi Quan He
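A commonly used workaround, sketched here rather than taken from this thread: since boosts must be positive, invert the clause and boost every document that does NOT match the old-date range. The inverted clause has to be anchored on *:*, because a purely negative subquery on its own matches nothing. The Python below only assembles the query string; the helpers boosted and penalize are hypothetical, and the field names and boost values come from the example query above.

```python
def boosted(clause: str, boost: float) -> str:
    """Attach a boost to one query clause; Solr boosts must be positive."""
    if boost <= 0:
        raise ValueError("boost must be positive")
    return f"{clause}^{boost:g}"


def penalize(clause: str, boost: float) -> str:
    """Emulate a negative boost on `clause` (hypothetical helper).

    Instead of lowering the score of documents that match `clause`,
    raise the score of every document that does NOT match it. The
    pure-negative part is anchored on *:* so it matches the rest.
    """
    return boosted(f"(*:* -{clause})", boost)


def build_query() -> str:
    recent = "lastUpdateTime:[NOW/DAY-90DAYS TO *]"
    stale = "lastUpdateTime:[* TO NOW/DAY-365DAYS]"
    clauses = [
        "*:*",
        boosted("field1:value1", 2),
        boosted("field2:value2", 2),
        boosted(recent, 5),
        penalize(stale, 3),  # stands in for the unsupported "^-3"
    ]
    return " ".join(clauses)


if __name__ == "__main__":
    print(build_query())
```

Stale documents are not pushed below zero; they simply miss the extra ^3 that every other document receives, which has a similar effect on the ranking.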


Re: Solr query performance issue

2009-05-26 Thread Larry He
We actually want an OR operator on those values. Filters can only be
combined with AND, right?

Would the query perform better written as field1:01 field1:02 field1:03
instead of field1:(01 02 03)?

BR,
Larry

On Tue, May 26, 2009 at 5:15 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

>
> What about field1:01 . field:100 being used as separate filters (that
> would then get ANDed) -- doable?
>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: Development Team 
> > To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> > Sent: Tuesday, May 26, 2009 4:54:34 PM
> > Subject: Re: Solr query performance issue
> >
> > Yes, those terms are important in calculating the relevancy scores, so
> > they are not in the filter queries. I was hoping that if I could cache
> > everything about a field, any combination of values on that field would
> > be read from the cache. Then it would not matter whether I query for
> > field1:(02 04 05), field1:(01 02) or field1:03; the response time would
> > be equally quick. Is there any way to achieve that?
> > Yeah, the range queries are a bottleneck too; I will give the TrieRange
> > fields a try. Thanks for your advice.
> >
> > Best Regards,
> > Shi Quan He
> >
> > On Tue, May 26, 2009 at 3:55 PM, Yonik Seeley wrote:
> >
> > > On Tue, May 26, 2009 at 3:42 PM, Larry He wrote:
> > > > We have about 100 different fields and 1 million documents indexed
> > > > with Solr. Many of the fields are multi-valued, and some are numbers
> > > > (for range search). We expect to run Solr queries containing over 30
> > > > terms, and the response time is often well over a second. I found
> > > > that Solr's caches, such as QueryResultCache and FilterCache, do not
> > > > help us much in this case, as most of the queries have combinations
> > > > of terms that are unlikely to repeat. An example of our query would
> > > > look like:
> > > >
> > > > field1:(02 04 05) field2:(01 02 03) field2:(01 02 03) ...
> > > >
> > > > My question is: how can we improve the performance of these queries?
> > >
> > > Filters are independently cached... but they are currently only "AND"
> > > filters, so you could only split it up like so:
> > >
> > > fq=field1:(02 04 05)&fq=field2:(01 02 03)&fq=field2:(01 02 03)
> > > But that won't help unless any of the individual fq params are
> > > repeated across different queries.
> > >
> > > Range search can also be sped up a lot via the new TrieRange fields,
> > > or via the frange (function range query) capabilities in Solr 1.4.
> > > It's not clear whether the range queries or the term queries are your
> > > current bottleneck.
> > >
> > > If the range queries aren't your bottleneck and separate filters don't
> > > work, then a query type could be developed that would help your
> > > situation by caching matches on term queries. Are relevancy scores
> > > important for the clauses like field1:(02 04 05), or do you sort by
> > > some other criteria?
> > >
> > > -Yonik
> > > http://www.lucidimagination.com
> > >
>
>
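Yonik's fq suggestion can be sketched as a small helper that turns each field group into its own fq parameter, so that each one can be cached independently in the filter cache. This is a minimal sketch using Python's standard urllib.parse.urlencode; build_params is a hypothetical helper, and it assumes the default OR operator applies within each group. As Otis and Yonik point out, separate fq filters are intersected (ANDed) and do not contribute to scoring, so this only suits clauses that need neither cross-group OR nor relevance.

```python
from urllib.parse import urlencode


def build_params(q: str, groups: dict[str, list[str]]) -> str:
    """Build a query string with one fq parameter per field group.

    Each fq is cached on its own, and Solr intersects (ANDs) the
    results of all fq parameters. Values inside one group are ORed,
    assuming the default query operator is OR.
    """
    params = [("q", q)]
    for field, values in groups.items():
        params.append(("fq", f"{field}:({' '.join(values)})"))
    return urlencode(params)


if __name__ == "__main__":
    print(build_params("*:*", {
        "field1": ["02", "04", "05"],
        "field2": ["01", "02", "03"],
    }))
```

The resulting string would be appended to a Solr select request; as Yonik notes, it only helps if individual fq values repeat across different queries.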


Solr query performance issue

2009-05-26 Thread Larry He
Hi All,

We have about 100 different fields and 1 million documents indexed with
Solr. Many of the fields are multi-valued, and some are numbers (for range
search). We expect to run Solr queries containing over 30 terms, and the
response time is often well over a second. I found that Solr's caches, such
as QueryResultCache and FilterCache, do not help us much in this case, as
most of the queries have combinations of terms that are unlikely to repeat.
An example of our query would look like:

field1:(02 04 05) field2:(01 02 03) field3:(02 03 04 06) ...

My question is: how can we improve the performance of these queries? Does
Lucene have to read the index files again if we first run a query
containing the term field1:01 and then a second query containing
field1:02? If we have sufficient memory, is it possible to cache certain
fields so that Lucene does not need to read from the index files at all? I
hope someone can provide some suggestions.

Thanks,
Larry He
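One way to picture the per-field caching asked about here (and the query type "caching matches on term queries" that Yonik mentions in his reply) is an in-memory map from each field:value term to the set of matching document IDs, with a group such as field1:(02 04 05) answered by ORing the cached sets. The sketch below is a toy model in plain Python, not a Solr or Lucene API: the class TermMatchCache, the sample postings, and the use of Python integers as bitsets are all assumptions, and it computes matches only, not relevance scores.

```python
class TermMatchCache:
    """Hypothetical cache of per-term matches, stored as int bitsets.

    Bit i of a term's bitset is set when document i contains the term.
    """

    def __init__(self, postings: dict[tuple[str, str], list[int]]):
        # Build every term's bitset once, up front (the "warming" step).
        self._bits: dict[tuple[str, str], int] = {}
        for term, doc_ids in postings.items():
            bits = 0
            for doc in doc_ids:
                bits |= 1 << doc
            self._bits[term] = bits

    def any_of(self, field: str, values: list[str]) -> int:
        """Docs matching field:(v1 v2 ...), i.e. the OR of cached terms."""
        result = 0
        for value in values:
            result |= self._bits.get((field, value), 0)
        return result

    @staticmethod
    def doc_ids(bits: int) -> list[int]:
        """Expand a bitset back into a sorted list of document IDs."""
        return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


if __name__ == "__main__":
    cache = TermMatchCache({
        ("field1", "01"): [0, 3],
        ("field1", "02"): [1, 3],
        ("field1", "04"): [2],
        ("field2", "01"): [2, 4],
    })
    # Any combination of field1 values is answered from memory.
    print(TermMatchCache.doc_ids(cache.any_of("field1", ["02", "04", "05"])))
```

A dense bitset over 1 million documents takes about 125 KB (1,000,000 bits / 8), so the number of distinct terms to cache is worth estimating first. Because scoring is left out, this only speeds up matching, which is why Yonik asks whether relevancy scores matter for these clauses.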