is there a downside to combining search fields with copyfield?
hello everyone,

can people give me their thoughts on this?

currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them into a single search field?

would this affect search times, term tokenization, or possibly other things?

example of individual fields:

  brand
  category
  partno

example of a single combined search field:

  part_info (would combine brand, category and partno)

thank you for any feedback,
mark

--
View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-downside-to-combining-search-fields-with-copyfield-tp3905349p3905349.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: is there a downside to combining search fields with copyfield?
On 4/12/2012 7:27 AM, geeky2 wrote:
> currently, my schema has individual fields to search on. are there advantages or disadvantages to taking several of the individual search fields and combining them into a single search field?
>
> would this affect search times, term tokenization, or possibly other things?
>
> example of individual fields:
>
>   brand
>   category
>   partno
>
> example of a single combined search field:
>
>   part_info (would combine brand, category and partno)

You end up with one multivalued field, which means that you can only have one analyzer chain. With separate fields, each field can be analyzed differently. Also, if you are indexing and/or storing the individual fields, you may have data duplication in your index, making it larger and increasing your disk/RAM requirements. That field will have a higher term count than the individual fields, which means that searches against it will naturally be just a little bit slower. Your application will not have to do as much work to construct a query, though.

If you are already planning to use dismax/edismax, then you don't need the overhead of a copyField. You can simply provide access to (e)dismax search with the qf (and possibly pf) parameters predefined, or your application can provide these parameters.

http://wiki.apache.org/solr/ExtendedDisMax

Thanks,
Shawn
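
For reference, the combined-field approach being asked about might look roughly like this in schema.xml. This is only a sketch: the field names come from the question, but the field types (text_general, string) and the indexed/stored settings are assumptions, not the poster's actual schema.

```xml
<!-- schema.xml fragment (sketch). Field types and attributes are assumed for illustration. -->
<fields>
  <!-- Individual fields: each can use its own type and analyzer chain. -->
  <field name="brand"    type="text_general" indexed="true" stored="true"/>
  <field name="category" type="text_general" indexed="true" stored="true"/>
  <field name="partno"   type="string"       indexed="true" stored="true"/>

  <!-- Combined field: multiValued because several sources are copied into it.
       Every copied value is analyzed by this one field type.
       stored="false" avoids keeping a second stored copy of the same data. -->
  <field name="part_info" type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>

<!-- Copy the raw input of each source field into the combined field at index time. -->
<copyField source="brand"    dest="part_info"/>
<copyField source="category" dest="part_info"/>
<copyField source="partno"   dest="part_info"/>
```

With something like this in place, the application would query part_info alone, at the cost of the extra indexed terms and the single shared analyzer chain described above.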
Re: is there a downside to combining search fields with copyfield?
> You end up with one multivalued field, which means that you can only have one analyzer chain.

actually two of the three fields being considered for combination into a single field ARE multivalued fields. would this be an issue?

> With separate fields, each field can be analyzed differently. Also, if you are indexing and/or storing the individual fields, you may have data duplication in your index, making it larger and increasing your disk/RAM requirements.

this makes sense.

> That field will have a higher term count than the individual fields, which means that searches against it will naturally be just a little bit slower.

ok.

> Your application will not have to do as much work to construct a query, though.

actually this is the primary reason this came up.

> If you are already planning to use dismax/edismax, then you don't need the overhead of a copyField. You can simply provide access to (e)dismax search with the qf (and possibly pf) parameters predefined, or your application can provide these parameters.
>
> http://wiki.apache.org/solr/ExtendedDisMax

can you elaborate on this and how EDisMax would preclude the need for copyfield?

i am using extended dismax now in my request handlers.

here is an example of one of my requestHandlers:

  <requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="echoParams">all</str>
      <int name="rows">5</int>
      <str name="qf">itemNo^1.0</str>
      <str name="q.alt">*:*</str>
    </lst>
    <lst name="appends">
      <str name="fq">itemType:1</str>
      <str name="sort">rankNo asc, score desc</str>
    </lst>
    <lst name="invariants">
      <str name="facet">false</str>
    </lst>
  </requestHandler>

> Thanks,
> Shawn
Re: is there a downside to combining search fields with copyfield?
On 4/12/2012 1:37 PM, geeky2 wrote:
> can you elaborate on this and how EDisMax would preclude the need for copyfield?
>
> i am using extended dismax now in my request handlers.
>
> here is an example of one of my requestHandlers:
>
>   <requestHandler name="partItemNoSearch" class="solr.SearchHandler" default="false">
>     <lst name="defaults">
>       <str name="defType">edismax</str>
>       <str name="echoParams">all</str>
>       <int name="rows">5</int>
>       <str name="qf">itemNo^1.0</str>
>       <str name="q.alt">*:*</str>
>     </lst>
>     <lst name="appends">
>       <str name="fq">itemType:1</str>
>       <str name="sort">rankNo asc, score desc</str>
>     </lst>
>     <lst name="invariants">
>       <str name="facet">false</str>
>     </lst>
>   </requestHandler>

I'm not sure whether or not you can use a multiValued field as the source for copyField. This is the sort of thing that the devs tend to think of, so my initial thought would be that it should work, though I would definitely test it to be absolutely sure.

Your request handler above has qf set to include the field called itemNo. If you made another handler that had the following in it, you could do without a copyField by using that request handler. You would want to customize the field boosts:

  <str name="qf">brand^2.0 category^3.0 partno</str>

To really leverage edismax, assuming that you are using a tokenizer that splits any of these fields into multiple tokens, and that you want to use relevancy ranking, you might want to consider defining pf as well.

Some observations about your handler above (you are free to ignore this): I believe that you don't really need the ^1.0 that's in qf, because there's only one field, and 1.0 is the default boost. Also, from what I can tell, because you are only using one qf field and are not using any of the dismax-specific goodies like pf or mm, you don't really need edismax at all here. If I'm right, to remove edismax, just specify itemNo as the value for the df parameter (default field) and remove the defType. The q.alt parameter might also need to come out.

Solr 3.6 (should be released soon) has deprecated the defaultSearchField and defaultOperator parameters in schema.xml; the df and q.op handler parameters are the replacement. This will be enforced in Solr 4.0.

http://wiki.apache.org/solr/SearchHandler#Query_Params

Thanks,
Shawn
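
The two suggestions in this reply could be sketched in solrconfig.xml roughly as follows. Treat this as an untested illustration: the handler names partInfoSearch and partItemNoSimple and the pf value are hypothetical, and the qf boosts are the placeholder values from the message, which would need tuning.

```xml
<!-- Sketch 1: edismax searching the individual fields directly, so no copyField is needed.
     Handler name and pf fields are hypothetical; boosts are placeholders. -->
<requestHandler name="partInfoSearch" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">5</int>
    <!-- qf: the fields each query term is matched against, with per-field boosts. -->
    <str name="qf">brand^2.0 category^3.0 partno</str>
    <!-- pf: extra boost when the query terms occur together as a phrase in these fields. -->
    <str name="pf">brand category</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

<!-- Sketch 2: the original single-field handler without edismax.
     defType, qf and q.alt are removed; df names the default search field. -->
<requestHandler name="partItemNoSimple" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="rows">5</int>
    <str name="df">itemNo</str>
  </lst>
  <lst name="appends">
    <str name="fq">itemType:1</str>
    <str name="sort">rankNo asc, score desc</str>
  </lst>
  <lst name="invariants">
    <str name="facet">false</str>
  </lst>
</requestHandler>
```

In the first sketch the application only has to send q; the field list and boosts live in the handler, which is the same convenience the combined part_info field was meant to provide.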