On Wed, Dec 27, 2006 at 06:05:46PM +0100, Martin Bernd Schmeil wrote:
> Hi all,
> 
> just started to play with (acts_as_)ferret a couple of hours ago, when I 
> learned that ferret supports fuzzy search.
> 
> I could not find an answer to the problem I need to solve yet:
> 
> I have a few models with one to many relations to Clients: Addresses, 
> Contacts, Phone numbers, etc.
> 
> i.e. a client may have many addresses and so on.
> 
> I need to match a "flat" (each attribute only once) client record 
> against all the attributes of the models mentioned above and get a list 
> of clients in descending order of probability of being a duplicate.
> 
> Is this possible? 

As a first try I'd build a single Ferret document for each client,
containing all his contacts, addresses and phone numbers. For better
results you could keep all addresses in one field, phone numbers in
another, and contact names in a third field.
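
A minimal sketch of such a per-client document, in plain Ruby. The
Struct definitions and attribute names (street, city, name, number)
are hypothetical stand-ins for your ActiveRecord models, not
anything Ferret prescribes; adjust them to your schema.

```ruby
# Hypothetical stand-ins for the real models; only the attributes
# used below are modelled.
Address = Struct.new(:street, :city)
Contact = Struct.new(:name)
Phone   = Struct.new(:number)
Client  = Struct.new(:id, :addresses, :contacts, :phones)

# Collapse a client and its associated records into one flat hash,
# with one field per kind of data.
def client_document(client)
  {
    :id        => client.id.to_s,
    :addresses => client.addresses.map { |a| [a.street, a.city].compact.join(' ') }.join(' '),
    :phones    => client.phones.map(&:number).join(' '),
    :contacts  => client.contacts.map(&:name).join(' ')
  }
end

client = Client.new(
  42,
  [Address.new('Main Street 1', 'Springfield'), Address.new('Oak Avenue 7', 'Shelbyville')],
  [Contact.new('Jane Doe')],
  [Phone.new('555-0100')]
)
p client_document(client)
```

A hash like this is the shape of document a Ferret index accepts, so
each client ends up as exactly one entry in the index.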

Then take each record you suspect of being a duplicate and build a query
from it, distributing its data across the same fields as in the index.
Running that query against the index should give you a list of possible
duplicate records, sorted by relevance with the most likely match first.
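
One way to build such a query, as a rough sketch: split each field's
value into terms and append "~" to each one to get a fuzzy term
query on that field. The helper names (fuzzy_clauses,
duplicate_query) are made up for illustration, and I'm assuming the
query parser's default of ORing clauses together; lowercasing is
done by hand here on the assumption that fuzzy terms are not run
through the analyzer.

```ruby
# Turn free text into fuzzy term clauses for one field.
# Only alphanumeric runs are kept, so no query syntax characters
# from the input can leak into the query string.
def fuzzy_clauses(field, text)
  text.to_s.downcase.scan(/[[:alnum:]]+/).map { |term| "#{field}:#{term}~" }
end

# Build one query string from a flat record whose keys match the
# field names used in the index.
def duplicate_query(record)
  record.flat_map { |field, text| fuzzy_clauses(field, text) }.join(' ')
end

candidate = {
  :addresses => 'Main Street 1, Springfield',
  :phones    => '555-0100',
  :contacts  => 'Jane Doe'
}
puts duplicate_query(candidate)
```

The resulting string would then be handed to the index's search
method, and the hits come back ordered by score.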

> Which options should I use to save memory and 
> performance?

There seems to be no need to store the field contents themselves in the
index, since you only need to know which clients matched, so storing can
be turned off with :store => :no when the index is created. Beyond that,
I'd first make it work and only then check whether further optimization
is needed at all - Ferret is *really* fast.
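
For illustration, the per-field options could be collected in a
plain hash like the one below. The field names follow the three
fields suggested earlier in this reply, and the helper
(field_options) is hypothetical. The separate :id entry is only
relevant if you use Ferret directly; acts_as_ferret takes care of
the id itself. How the hash is passed on depends on your setup, so
treat the comments at the end as written from memory rather than as
a reference.

```ruby
# Searchable fields: indexed, but their contents are not stored.
UNSTORED = { :store => :no }.freeze

# The id has to be retrievable so a hit can be mapped back to a
# client row; it is kept as a single untokenized term.
ID_FIELD = { :store => :yes, :index => :untokenized }.freeze

def field_options(names)
  options = { :id => ID_FIELD.dup }
  names.each { |name| options[name] = UNSTORED.dup }
  options
end

p field_options([:addresses, :phones, :contacts])

# With plain Ferret each entry would go to FieldInfos#add_field,
# e.g. field_infos.add_field(:addresses, :store => :no).
# With acts_as_ferret the per-field options go into :fields.
```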

cheers,
Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       [EMAIL PROTECTED]
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
