Hi!

Wildcard queries have a built-in upper limit on the number of terms they expand to, which defaults to 512 (according to http://ferret.davebalmain.com/api/classes/Ferret/Search/WildcardQuery.html).
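As a rough illustration of that limit, here is a toy Ruby model of wildcard expansion with a term cap. It is only a sketch, not Ferret's implementation: `wildcard_to_regexp`, `expand_wildcard`, the sample terms and the cap of 3 are all made up for the example, and it assumes terms are collected in sorted order.

```ruby
# Toy model of wildcard expansion with a term cap. This is NOT Ferret code;
# both helpers and the tiny cap are hypothetical, for illustration only.

# Turn a wildcard pattern (* = any run of characters, ? = exactly one
# character) into an anchored Regexp.
def wildcard_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when "*" then ".*"
    when "?" then "."
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z")
end

# Collect matching terms in sorted order, stopping once max_terms is reached.
def expand_wildcard(terms, pattern, max_terms)
  regexp = wildcard_to_regexp(pattern)
  terms.sort.lazy.select { |term| regexp.match?(term) }.first(max_terms)
end

terms  = %w[asdf1 asdf2 asdf3 asdf4 other]
capped = expand_wildcard(terms, "asdf*", 3)    # expansion cut off at 3 terms
all    = expand_wildcard(terms, "asdf*", 512)  # cap is never reached here
missed = all - capped                          # terms the capped query never sees
```

With the cap at 3, the term asdf4 is never collected, so a document that matches only through that term would be missing from the results.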
So when you query for asdf*, Ferret expands this to all terms in your index starting with asdf, but stops after collecting 512 terms. It then retrieves all documents containing these 512 terms, and so misses documents that should match your query but do so only through a term that wasn't collected in the first step.

Of course you can set the max_terms count to a higher value, but in the long run this isn't really a solution. If I understand you correctly, your tuple field right now has a single term for each document, and that term is different for each document. Splitting up your tuple values into several different terms could help to reduce the number of terms that need to be fetched for a wildcard query.

Cheers,
Jens

On Mon, Nov 05, 2007 at 04:11:53PM -0500, Noah M. Daniels wrote:
> Hi,
>
> Apologies for reposting this for those who read this via ruby-forum,
> but it didn't make it to the list before, and the list seems more
> active...
>
> I'm using ferret (via acts_as_ferret) in a somewhat unorthodox
> manner and am having a strange wildcard problem. Before anyone wonders
> why we're doing things this way, the answer is basically that it lets
> us precompute what would be expensive database queries and store the
> results in a simple way (ferret index) prior to pushing the static
> data to our production server.
>
> Basically, I've got two (for the sake of simplicity) models, both of
> which are indexed on a similar (but separate) non-model field.
> However, one of those two models does not seem to get the proper
> number of results for a wildcard search:
>
> First of all, there's a non-indexed model called ProductTuple that's
> got a supplier_id as well as a product_category_id and
> product_material_id as well as some other id fields that aren't really
> important here.
> Thus, a ProductTuple has foreign key relationships to
> Suppliers and ProductCategories and ProductMaterials, but for ferret
> purposes just think of those foreign keys as what they are - ids (e.g.
> integers).
>
> The first model, Supplier, is ferret-indexed on several fields, such
> as the supplier name and supplier country, as well as the
> 'ferret_product_tuples' non-model field. ferret_product_tuples simply
> takes all the product tuples for a supplier and concatenates their
> product_category_id, product_material_id, etc. with delimiters.
>
> So, for a product tuple with product_category_id 82,
> product_material_id 88, and undefined product_technique_id, the
> resulting part of the ferret_product_tuple string would look like
> x00082_00088_00000x (where we use 00000 to indicate null). the xs are
> used as anchors, essentially, as a given supplier's
> ferret_product_tuple string might look like 'x00082_00088_00000x
> x00000_00081_00013x'.
>
> Now, the ferret query that gets constructed when we do the relevant
> queries simply looks like:
>
> 'ferret_product_tuple:x00082_?????_?????x'
>
> and this would, in the above instance, match that supplier.
>
> Everything I've described works _perfectly_, EXCEPT...
>
> we also index product_categories on this same string. So product
> category #82 would have a bunch of ferret_product_tuple strings that
> start out x00082 and have various things in the other positions.
>
> Here's what's strange... a product_category query for
> 'ferret_product_tuple:x?????_?????_?????x' should return ALL product
> categories, right? Yet it only returns six. A product category query
> for 'ferret_product_tuple:x?????_00081_?????x' should return all the
> product categories that share product_tuples with product_material
> #81, but in fact returns only a small number of categories.
> Yet making
> the wildcard match MORE restrictive by substituting
> 'ferret_product_tuple:x00082_00081_?????x' into that query yields
> product_category #82, which is erroneously not included in the 6
> results for 'ferret_product_tuple:x?????_00081_?????x'.
>
> So, have I stumbled upon a bug in the wildcard handling? My initial
> thought was that the different analyzer I was using for the
> product_category index was the culprit, but I changed that analyzer
> out to no effect, so I've ruled that out.
>
> Any ideas? Thanks!
>
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[EMAIL PROTECTED] | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa

