On Wed, Dec 27, 2006 at 06:05:46PM +0100, Martin Bernd Schmeil wrote:
> Hi all,
>
> just started to play with (acts_as_)ferret a couple of hours ago, when I
> learned that ferret supports fuzzy search.
>
> I could not find an answer to the problem i need to solve yet:
>
> I have a few models with one to many relations to Clients: Addresses,
> Contacts, Phone numbers, etc.
>
> i.e. a client may have many addresses and so on.
>
> I need to match a "flat" (each attribute only once) client record
> against all the models attributes mentioned above and get a list of
> clients with descending probability of being a duplicate.
>
> Is this possible?

As a first try I'd build a single Ferret document for each client,
containing all of that client's contacts, addresses and phone numbers.
For better results you could keep all addresses in one field, phone
numbers in another, and contact names in a third field.

Then take each record you suspect of being a duplicate and build a query
from it, distributing its data across the fields in the same way.
Running that query against the index should give you a list of possible
duplicate records sorted by relevance.

> Which options should I use to save memory and
> performance?

There seems to be no need to store the field contents themselves in the
index, so storing should be turned off with :store => :no when the index
is created. Otherwise I'd first make it work and then check whether
further optimization is needed at all. Ferret is *really* fast.

cheers,
Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       [EMAIL PROTECTED]
Schnorrstraße 76                         Tel +49 351 46766 0
D-01069 Dresden                         Fax +49 351 46766 66
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
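
A rough sketch of the approach described above, in plain Ruby so it runs without the ferret gem. The Client struct, the field names (:addresses, :contacts, :phones), the helper names and the sample data are all made up for illustration. The query string assumes Ferret's query syntax, where "field:term" restricts a term to a field and a trailing "~" marks it as fuzzy.

```ruby
# Hypothetical stand-in for the Client model and its one-to-many relations.
Client = Struct.new(:id, :addresses, :contacts, :phones)

# Collapse one client and all related rows into a single hash,
# with one field per kind of data.
def client_document(client)
  {
    :id        => client.id.to_s,
    :addresses => client.addresses.join(' '),
    :contacts  => client.contacts.join(' '),
    :phones    => client.phones.join(' ')
  }
end

# Build a query string from a flat candidate record, using the same
# field layout. Every word becomes a fuzzy term ("word~") restricted
# to its field; punctuation is dropped so it cannot clash with the
# query syntax.
def duplicate_query(record)
  [:addresses, :contacts, :phones].map do |field|
    words = record[field].to_s.downcase.scan(/[[:alnum:]]+/)
    words.map { |word| "#{field}:#{word}~" }
  end.flatten.join(' ')
end

# Made-up sample data.
client = Client.new(1,
                    ['Main Street 5 Springfield', 'Oak Avenue 12 Shelbyville'],
                    ['John Smith'],
                    ['555 0100'])
doc = client_document(client)

candidate = { :addresses => 'Main St. 5, Springfield',
              :contacts  => 'J. Smith',
              :phones    => '555-0100' }
query = duplicate_query(candidate)

puts doc.inspect
puts query
```

With the ferret gem, each hash would be added to the index (for example with index << doc) and the query string passed to a search call such as index.search_each; neither call is exercised here. If the field contents are not stored, as suggested above, the :id field at least would presumably need to stay stored so that hits can be mapped back to clients.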

