is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2
hello everyone,

can people give me their thoughts on this.

currently, my schema has individual fields to search on.

are there advantages or disadvantages to taking several of the individual
search fields and combining them into a single search field?

would this affect search times, term tokenization or possibly other things?

example of individual fields

brand
category
partno

example of a single combined search field

part_info (would combine brand, category and partno)

thank you for any feedback
mark





--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-a-downside-to-combining-search-fields-with-copyfield-tp3905349p3905349.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey

On 4/12/2012 7:27 AM, geeky2 wrote:

> currently, my schema has individual fields to search on.
>
> are there advantages or disadvantages to taking several of the individual
> search fields and combining them into a single search field?
>
> would this affect search times, term tokenization or possibly other things?
>
> example of individual fields
>
> brand
> category
> partno
>
> example of a single combined search field
>
> part_info (would combine brand, category and partno)


You end up with one multivalued field, which means that you can only 
have one analyzer chain.  With separate fields, each field can be 
analyzed differently.  Also, if you are indexing and/or storing the 
individual fields, you may have data duplication in your index, making 
it larger and increasing your disk/RAM requirements.  That field will 
have a higher termcount than the individual fields, which means that 
searches against it will naturally be just a little bit slower.  Your 
application will not have to do as much work to construct a query, though.
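
A minimal schema.xml sketch of the combined-field approach discussed above. Only the field names (brand, category, partno, part_info) come from the original question; the field type names and the indexed/stored settings are assumptions made for illustration.

```xml
<!-- schema.xml fragment (sketch; type names and attributes are assumed) -->
<fields>
  <!-- Separate fields: each can use its own fieldType, and so its own analyzer chain -->
  <field name="brand"    type="text_general" indexed="true" stored="true"/>
  <field name="category" type="text_general" indexed="true" stored="true"/>
  <field name="partno"   type="string"       indexed="true" stored="true"/>

  <!-- Combined field: one fieldType, so one analyzer chain for every copied value.
       multiValued because it receives values from several sources.
       stored="false" keeps the duplicated text out of the stored data. -->
  <field name="part_info" type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Each copyField copies the raw source value; analysis happens in part_info -->
<copyField source="brand"    dest="part_info"/>
<copyField source="category" dest="part_info"/>
<copyField source="partno"   dest="part_info"/>
```

With this layout the terms are still indexed twice (once per source field, once in part_info), which is the index growth mentioned above.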


If you are already planning to use dismax/edismax, then you don't need 
the overhead of a copyField.  You can simply provide access to (e)dismax 
search with the qf (and possibly pf) parameters predefined, or your 
application can provide these parameters.


http://wiki.apache.org/solr/ExtendedDisMax

Thanks,
Shawn



Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread geeky2


> You end up with one multivalued field, which means that you can only
> have one analyzer chain.

actually two of the three fields being considered for combination into a
single field ARE multivalued fields.

would this be an issue?

> With separate fields, each field can be
> analyzed differently.  Also, if you are indexing and/or storing the
> individual fields, you may have data duplication in your index, making
> it larger and increasing your disk/RAM requirements.

this makes sense

> That field will
> have a higher termcount than the individual fields, which means that
> searches against it will naturally be just a little bit slower.

ok

> Your
> application will not have to do as much work to construct a query, though.

actually this is the primary reason this came up.

> If you are already planning to use dismax/edismax, then you don't need
> the overhead of a copyField.  You can simply provide access to (e)dismax
> search with the qf (and possibly pf) parameters predefined, or your
> application can provide these parameters.
>
> http://wiki.apache.org/solr/ExtendedDisMax

can you elaborate on this and how EDisMax would preclude the need for
copyfield?

i am using extended dismax now in my request handlers.

here is an example of one of my requestHandlers

<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">5</int>
    <str name="qf">itemNo^1.0</str>
    <str name="q.alt">*:*</str>
  </lst>
  <lst name="appends">
    <str name="fq">itemType:1</str>
    <str name="sort">rankNo asc, score desc</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>









Re: is there a downside to combining search fields with copyfield?

2012-04-12 Thread Shawn Heisey

On 4/12/2012 1:37 PM, geeky2 wrote:

> can you elaborate on this and how EDisMax would preclude the need for
> copyfield?
>
> i am using extended dismax now in my request handlers.
>
> here is an example of one of my requestHandlers
>
> <requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
>   <lst name="defaults">
>     <str name="defType">edismax</str>
>     <str name="echoParams">all</str>
>     <int name="rows">5</int>
>     <str name="qf">itemNo^1.0</str>
>     <str name="q.alt">*:*</str>
>   </lst>
>   <lst name="appends">
>     <str name="fq">itemType:1</str>
>     <str name="sort">rankNo asc, score desc</str>
>   </lst>
>   <lst name="invariants">
>     <str name="facet">false</str>
>   </lst>
> </requestHandler>


I'm not sure whether or not you can use a multiValued field as the 
source for copyField.  This is the sort of thing that the devs tend to 
think of, so my initial thought would be that it should work, though I 
would definitely test it to be absolutely sure.


Your request handler above has qf set to include the field called 
itemNo.  If you made another that had the following in it, you could do 
without a copyField, by using that request handler.  You would want to 
customize the field boosts:


<str name="qf">brand^2.0 category^3.0 partno</str>

To really leverage edismax, assuming that you are using a tokenizer that 
splits any of these fields into multiple tokens, and that you want to 
use relevancy ranking, you might want to consider defining pf as well.
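
A rough sketch of what such a handler could look like with both qf and pf defined. The handler name and every boost value here are hypothetical and would need tuning against real data; only the field names come from the thread.

```xml
<!-- solrconfig.xml fragment (sketch; handler name and boosts are hypothetical) -->
<requestHandler name="partInfoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: the fields each query term is searched in, with per-field boosts -->
    <str name="qf">brand^2.0 category^3.0 partno</str>
    <!-- pf: extra boost for documents where the query terms occur together
         as a phrase in these fields -->
    <str name="pf">brand^4.0 category^6.0</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>
```

pf only helps on fields whose analyzer produces several tokens, which is why partno is left out of it in this sketch.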


Some observations about your handler above... you are free to ignore 
this: I believe that you don't really need the ^1.0 that's in qf, 
because there's only one field, and 1.0 is the default boost.  Also, 
from what I can tell, because you are only using one qf field and are 
not using any of the dismax-specific goodies like pf or mm, you don't 
really need edismax at all here.  If I'm right, to remove edismax, just 
specify itemNo as the value for the df parameter (default field) and 
remove the defType.  The q.alt parameter might also need to come out.
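
As a sketch only, the handler quoted above might then be reduced to something like the following. It assumes every request supplies its own q, which is why q.alt is dropped; whether that holds for this application is not known from the thread.

```xml
<!-- solrconfig.xml fragment (sketch; assumes clients always send a q parameter) -->
<requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <!-- no defType, so the default query parser is used -->
    <!-- df replaces the single-field qf -->
    <str name="df">itemNo</str>
    <str name="echoParams">all</str>
    <int name="rows">5</int>
  </lst>
  <lst name="appends">
    <str name="fq">itemType:1</str>
    <str name="sort">rankNo asc, score desc</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>
```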


Solr 3.6 (should be released soon) has deprecated the defaultSearchField 
and defaultOperator parameters in schema.xml; the df and q.op handler 
parameters are the replacements.  This will be enforced in Solr 4.0.
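
For illustration, the move from schema.xml to handler parameters might look roughly like this. The field name and the AND operator are placeholder values, not settings taken from the thread.

```xml
<!-- schema.xml: the deprecated elements (shown commented out; values are placeholders)
<defaultSearchField>itemNo</defaultSearchField>
<solrQueryParser defaultOperator="AND"/>
-->

<!-- solrconfig.xml: the same defaults expressed as handler parameters -->
<lst name="defaults">
  <str name="df">itemNo</str>
  <str name="q.op">AND</str>
</lst>
```

Because these are ordinary request parameters, different handlers can use different default fields and operators.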


http://wiki.apache.org/solr/SearchHandler#Query_Params

Thanks,
Shawn