RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken


> >> Filter queries with arbitrary text values may swamp the cache in
> 1.3.
> >
> > Are you implying this won't happen in 1.4?
>
> I intended to say just this, but I was on the wrong track.
>
> > Can you point me to the feature that would mitigate this?
>
> What I was thinking of is the following:
>
> [#SOLR-475] multi-valued faceting via un-inverted field
> https://issues.apache.org/jira/browse/SOLR-475
>
> But as you can see, this refers to faceting on multi-valued fields, not
> to filter queries with arbitrary text. I was off on a tangent. Sorry.
>
> To get back to your initial mail, I tend to think that drop-down boxes
> (the values of which you control) are a nice match for the filter query,
> whereas user-entered text is more likely to be a candidate for the main
> query.
>
> Michael Ludwig

I agree, which brings me back to the issue of combining dismax with standard 
queries.  It looks like we may need to create a custom query parser to get 
optimal performance.  Thanks again.




Re: nested dismax queries

2009-06-29 Thread Michael Ludwig

Ensdorf Ken wrote:

>> Filter queries with arbitrary text values may swamp the cache in 1.3.
>
> Are you implying this won't happen in 1.4?

I intended to say just this, but I was on the wrong track.

> Can you point me to the feature that would mitigate this?


What I was thinking of is the following:

[#SOLR-475] multi-valued faceting via un-inverted field
https://issues.apache.org/jira/browse/SOLR-475

But as you can see, this refers to faceting on multi-valued fields, not
to filter queries with arbitrary text. I was off on a tangent. Sorry.

To get back to your initial mail, I tend to think that drop-down boxes
(the values of which you control) are a nice match for the filter query,
whereas user-entered text is more likely to be a candidate for the main
query.

Michael Ludwig
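
[Editorial sketch] The split suggested above (controlled drop-down values go
into fq, user-entered text goes into the main query) could look roughly like
this when building the request parameters. The field name "industry" and the
use of defType=dismax are assumptions for illustration, not details taken
from the thread; the helper function is hypothetical.

```python
from urllib.parse import urlencode


def build_params(user_text, dropdown_filters):
    """Return (name, value) pairs for a Solr select request.

    user_text        -- free text typed by the user; becomes the main query
    dropdown_filters -- dict of field -> value chosen from controlled lists;
                        each entry becomes its own fq parameter
    """
    # Assumption: the dismax parser is selected with defType=dismax.
    params = [("defType", "dismax"), ("q", user_text)]
    for field, value in sorted(dropdown_filters.items()):
        # One fq per field, so each filter can be cached on its own.
        # Note: this sketch does not escape quotes inside the value.
        params.append(("fq", f'{field}:"{value}"'))
    return params


# "industry" is a hypothetical field name.
params = build_params("Alabama Biotechnology", {"industry": "Biotechnology"})
query_string = urlencode(params)
print(query_string)
```

Because the drop-down values come from a fixed list, the number of distinct
fq strings stays bounded, while the unbounded user text never reaches the
filter cache.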


RE: nested dismax queries

2009-06-29 Thread Ensdorf Ken
> Filter queries with arbitrary text values may swamp the cache in 1.3.

Are you implying this won't happen in 1.4?  Can you point me to the feature 
that would mitigate this?

>
> Otherwise, the combinations aren't infinite. Keep the filters separate
> in order to limit their number. Specify two simple filters instead of
> one composite filter, "fq=x:bla" and "fq=y:blub" instead of "fq=x:bla
> AND y:blub". See:
>
> filterCache/@size, queryResultCache/@size, documentCache/@size
> http://markmail.org/thread/tb6aanicpt43okcm
>
> Michael Ludwig

That's what I was thinking would make the most sense, assuming the intersection 
of the cached bitmaps is efficient enough.  Thanks for the reply.

-Ken
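
[Editorial sketch] To illustrate the intersection of cached bitmaps mentioned
above, here is a toy model in which each cached filter is a Python integer
used as a bitset and two filters are combined with a bitwise AND. This is
only a conceptual model, not Solr's actual cache or DocSet code; the document
ids and the helper names are made up.

```python
def to_bitmap(doc_ids):
    """Encode a collection of document ids as an integer bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def from_bitmap(bits):
    """Decode an integer bitset back into a sorted list of document ids."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


# Hypothetical cache: one entry per simple filter query.
filter_cache = {
    "x:bla": to_bitmap([1, 3, 5, 8]),
    "y:blub": to_bitmap([3, 4, 8, 9]),
}


def apply_filters(filter_queries, cache):
    """Intersect the cached bitmaps of all given filter queries."""
    result = None
    for fq in filter_queries:
        bits = cache[fq]
        result = bits if result is None else result & bits
    return result


matching = from_bitmap(apply_filters(["x:bla", "y:blub"], filter_cache))
print(matching)
```

In this model the cost of combining two cached filters is a single AND over
the bitsets, independent of how the individual filters were computed.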


Re: nested dismax queries

2009-06-29 Thread Michael Ludwig

Ensdorf Ken wrote:

> For example, a user might enter "Alabama Biotechnology" in the main
> search box, triggering a dismax request which returns lots of
> different types of results.  They may then want to refine their search
> by selecting a specific industry from a drop-down box.  We handle this
> by adding a filter query (fq=) to the original query.  We have dozens
> of additional fields like this - some with a finite set of discrete
> values, some with arbitrary text values.  The combinations are
> infinite, and I'm worried we will overwhelm the filterCache by
> supporting all of these cases as filter queries.


Filter queries with arbitrary text values may swamp the cache in 1.3.

Otherwise, the combinations aren't infinite. Keep the filters separate
in order to limit their number. Specify two simple filters instead of
one composite filter, "fq=x:bla" and "fq=y:blub" instead of "fq=x:bla
AND y:blub". See:

filterCache/@size, queryResultCache/@size, documentCache/@size
http://markmail.org/thread/tb6aanicpt43okcm

Michael Ludwig
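
[Editorial sketch] A small counting example of why separate filters limit the
number of cache entries. It assumes each distinct fq string occupies one
cache entry; the fields x and y come from the message above, while the value
lists are invented for illustration.

```python
from itertools import product

# Hypothetical value lists: 10 values for field x, 20 values for field y.
x_values = [f"x{i}" for i in range(10)]
y_values = [f"y{i}" for i in range(20)]


def separate_keys(xs, ys):
    """Distinct fq strings when every field gets its own fq parameter."""
    return {f"x:{x}" for x in xs} | {f"y:{y}" for y in ys}


def composite_keys(xs, ys):
    """Distinct fq strings when both fields are combined into one fq."""
    return {f"x:{x} AND y:{y}" for x, y in product(xs, ys)}


separate = separate_keys(x_values, y_values)
composite = composite_keys(x_values, y_values)
print(len(separate), len(composite))
```

With separate filters the number of possible entries grows with the sum of
the value counts (10 + 20), whereas composite filters grow with their product
(10 * 20), which is what makes the composite form hard on the cache.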


nested dismax queries

2009-06-19 Thread Ensdorf Ken
The recent discussion of filter queries has got me thinking about other ways to 
improve performance of our app.  We have an index with a lot of fields and we 
support both single-search-box style queries using DisMax and fielded search 
using the standard query handler.  We also support using both strategies in the 
same search.

For example, a user might enter "Alabama Biotechnology" in the main search box, 
triggering a dismax request which returns lots of different types of results.  
They may then want to refine their search by selecting a specific industry from 
a drop-down box.  We handle this by adding a filter query (fq=) to the original 
query.  We have dozens of additional fields like this - some with a finite set 
of discrete values, some with arbitrary text values.  The combinations are 
infinite, and I'm worried we will overwhelm the filterCache by supporting all 
of these cases as filter queries.

I'm investigating nested queries as an alternative way to support this type of 
hybrid search.  It appears that this only works when the top-level request 
query is a standard lucene-style query and the nested query is a dismax, and 
not the other way around - correct me if I am wrong here.  It also appears 
that what is specified in the {!xxx} as the nested query type must be an actual 
query type and not the name of a request handler defined in solrconfig.xml.  
Thus it would seem that the nested query string must supply all of the default 
parameters for a dismax request.  Is this correct?  Is there another approach 
that I am missing?  I suppose I could create a new query parser class that 
would supply the defaults, but that seems like overkill.

Any comments are welcome, I just want to know that I am not completely off 
track and there isn't some really simple way to achieve this that I have 
overlooked.  Thanks all!

-Ken
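
[Editorial sketch] One way the nested approach described above might be
assembled: a standard lucene-style top-level query with a fielded clause plus
a nested dismax clause that carries its own defaults as local params. The
_query_ pseudo-field and the {!dismax ...} local-params form follow Solr's
nested query syntax as I understand it; the v=$qq parameter dereference, the
field names (industry, name, description) and the qf/mm values are
assumptions and should be checked against the Solr version in use.

```python
def nested_dismax_clause(param_name, local_params):
    """Build a nested dismax clause whose text comes from another parameter.

    param_name   -- name of the request parameter holding the user text
    local_params -- dismax defaults (e.g. qf, mm) passed as local params
    """
    parts = " ".join(f"{key}='{value}'" for key, value in local_params.items())
    # Doubled braces produce literal { and } in an f-string.
    return f'_query_:"{{!dismax {parts} v=${param_name}}}"'


def hybrid_query(user_text, fielded_clauses, local_params):
    """Return request parameters combining fielded clauses with nested dismax."""
    clauses = [f"+{clause}" for clause in fielded_clauses]
    clauses.append("+" + nested_dismax_clause("qq", local_params))
    return {"q": " ".join(clauses), "qq": user_text}


# Hypothetical fields and dismax defaults.
defaults = {"qf": "name^2 description", "mm": "1"}
request = hybrid_query("Alabama Biotechnology",
                       ['industry:"Biotechnology"'],
                       defaults)
print(request["q"])
print(request["qq"])
```

Keeping the dismax defaults in one dictionary on the client side is a simple
alternative to writing a custom query parser class just to supply defaults,
at the cost of sending them with every request.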