Derek Atkins <[EMAIL PROTECTED]> writes:

> Chris and I were talking on #gnucash about potentially expanding the
> match mapper to use some sort of Bayesian filtering to determine the
> destination account mapping. However I'm not sure how such a system
> would work -- or where the necessary databases would get stored (or
> even what the databases would look like).

That would be what I proposed a few weeks ago. I've been thinking about
it further, and I'm not sure the database would have to be stored
anywhere. When the import begins, you scan the last 1,000 or so
transactions on the account you're importing to, load them into an
in-memory database, and use that.

> Adding in other information to the bayesian mix would certainly be
> possible, once we come up with an architecture. But you really don't
> have a lot of information to work with when trying to choose a
> destination account.

My thinking was to use the Levenshtein distance (same idea as agrep) for
the text fields, the percentage difference between the amounts, the day
of month, the day of week, etc.

The algorithm would be a bit different from e-mail spam matching,
though. Instead of pulling out hundreds of attributes from an e-mail
message and using an index to find the weights quickly, gnucash would
have only a half dozen or so attributes but would have to scan the
database completely to find approximate matches.

-- greg
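
To make the idea above concrete, here is a rough Python sketch of the full-scan matching. It is only an illustration under assumptions: the Txn record, the function names, and the weights (0.6/0.2/0.1/0.1) are all hypothetical and not part of gnucash, and it uses a simple nearest-match score rather than real Bayesian weighting.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Txn:
    """Hypothetical transaction record; not a gnucash type."""
    description: str
    amount: float
    when: date
    account: str  # destination account, known for past transactions


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(new: Txn, old: Txn) -> float:
    """Score in [0, 1]; the weights are arbitrary placeholders."""
    a, b = new.description.lower(), old.description.lower()
    longest = max(len(a), len(b)) or 1
    text = 1.0 - levenshtein(a, b) / longest
    # Relative difference between the amounts, clamped to [0, 1].
    biggest = max(abs(new.amount), abs(old.amount)) or 1.0
    amount = max(0.0, 1.0 - abs(new.amount - old.amount) / biggest)
    same_dom = 1.0 if new.when.day == old.when.day else 0.0
    same_dow = 1.0 if new.when.weekday() == old.when.weekday() else 0.0
    return 0.6 * text + 0.2 * amount + 0.1 * same_dom + 0.1 * same_dow


def guess_account(new: Txn, history: List[Txn]) -> Optional[str]:
    """Scan the whole in-memory history and return the best match's account."""
    if not history:
        return None
    best = max(history, key=lambda old: similarity(new, old))
    return best.account
```

Here history would be the last 1,000 or so transactions loaded when the import begins. With only that many rows and a half dozen attributes, a complete scan per imported transaction should be cheap enough that no index or on-disk database is needed.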
