Re: how to write an efficient query with a subquery to restrict the search space?

2014-02-03 Thread Otis Gospodnetic
Hi,

Sounds like a possible document and query routing use case.

Otis
Solr & ElasticSearch Support
http://sematext.com/


Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-31 Thread svante karlsson
It seems to be faster to first restrict the search space and then do the
scoring than to just use the full query and let Solr handle everything.

For example, in my application one of the scoring fields effectively hits
1/12 of the database (a month field), and if we have 100'' items in the
database then this matters.

/svante




Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-30 Thread Jack Krupansky
Lucene's default scoring should give you much of what you want - ranking 
hits of low-frequency terms higher - without any special query syntax - just 
list out your terms and use OR as your default operator.


-- Jack Krupansky
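
As a rough illustration of the suggestion above, the request parameters could be assembled as below. This is only a sketch: the helper name build_or_query is made up for the example, the field names and values are the placeholders from the original question, and it assumes the standard query parser, where the q.op parameter sets the default operator.

```python
from urllib.parse import urlencode, parse_qs

def build_or_query(terms, rows=100):
    """Build a Solr query string that lists the terms with no explicit operators.

    terms is a list of (field, value) pairs. q.op=OR makes OR the default
    operator, so ranking is left to the default scoring.
    """
    q = " ".join(f"{field}:{value}" for field, value in terms)
    return urlencode({"q": q, "q.op": "OR", "rows": rows, "fl": "*"})

# Placeholder terms taken from the original question.
query_string = build_or_query([
    ("field1", "val1"),
    ("field2", "val2"),
    ("field3", "val3"),
    ("field4", "val4"),
])
decoded = parse_qs(query_string)
print(decoded["q"][0])
```

The resulting string would be appended to the collection's select URL; whether the scoring alone is fast enough on a very large index is the question discussed in the rest of the thread.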



how to write an efficient query with a subquery to restrict the search space?

2014-01-23 Thread svante karlsson
I have a Solr db containing 1 billion records that I'm trying to use in a
NoSQL fashion.

What I want to do is find the best matches using all search terms but
restrict the search space to the most unique terms.

In this example I know that val2 and val4 are rare terms and val1 and val3
are more common. In my real scenario I'll have 20 fields that I want to
include or exclude in the inner query depending on the uniqueness of the
requested value.


My first approach was:
q=field1:val1 OR field2:val2 OR field3:val3 OR field4:val4 AND (field2:val2
OR field4:val4)&rows=100&fl=*

but what I think I get is
...  field4:val4 AND (field2:val2 OR field4:val4), and this result is then
OR'ed with the rest.

If I write
q=(field1:val1 OR field2:val2 OR field3:val3 OR field4:val4) AND
(field2:val2 OR field4:val4)&rows=100&fl=*

then what I think I get is two sub-queries that are evaluated separately and
then joined. Performance-wise this is bad.

What's the best way to write these types of queries?


Are there any performance issues when running it on several SolrCloud nodes
vs. a single instance, or should it scale?



/svante


Re: how to write an efficient query with a subquery to restrict the search space?

2014-01-23 Thread Raymond Wiker
Maybe you could move (field2:val2 OR field4:val4) into a filter? E.g.,

q=(field1:val1 OR field2:val2 OR field3:val3 OR
field4:val4)&fq=(field2:val2 OR field4:val4)

If I have this correctly, the fq part should be evaluated first, and may
even be found in the filter cache.
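
A minimal sketch of how such a request could be built, assuming the field names and values from the question; the helper name build_filtered_query is hypothetical. The rare terms go into fq, which restricts the result set without contributing to the score, and all terms stay in q for ranking.

```python
from urllib.parse import urlencode, parse_qs

def build_filtered_query(scoring_terms, filter_terms, rows=100):
    """Build a Solr query string with a scoring query (q) and a filter query (fq).

    Both arguments are lists of (field, value) pairs. The filter terms
    restrict the search space; the scoring terms decide the ranking.
    """
    q = " OR ".join(f"{field}:{value}" for field, value in scoring_terms)
    fq = " OR ".join(f"{field}:{value}" for field, value in filter_terms)
    return urlencode({"q": f"({q})", "fq": f"({fq})", "rows": rows, "fl": "*"})

# Placeholder terms from the question: val2 and val4 are the rare ones.
all_terms = [("field1", "val1"), ("field2", "val2"),
             ("field3", "val3"), ("field4", "val4")]
rare_terms = [("field2", "val2"), ("field4", "val4")]

query_string = build_filtered_query(all_terms, rare_terms)
decoded = parse_qs(query_string)
print(decoded["q"][0])
print(decoded["fq"][0])
```

With 20 fields, the caller would decide per request which terms count as rare and pass only those as filter_terms. Note that a filter built from request-specific values will rarely be reused, so the filter cache may help less than it does for a fixed filter.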


