Derek Atkins <[EMAIL PROTECTED]> writes:

> The problem is that data import is "lossy", you don't necessarily have
> all the import information in the GNC Transaction.  For example, you
> lose the QIF Category name, but you DEFINITELY want to be able to map
> from QIF Category to GNC Account.

Well, my first reaction is that importing shouldn't be lossy. A lossy step
closes off options such as showing the user where a transaction came from
and what information the bank actually presented. I could see that being
useful in a dispute with a bank, or for debugging problems after the fact.

> In order to just load txns and build the map at runtime you'd need to be
> able to store all this information.  You'd also lose badly when you try to go
> across Accounting Periods.

That sounds like a case of denormalizing the underlying data representation
in order to implement a presentation-level feature. I would have expected
accounting periods to simply mark date boundaries, or to mark individual
transactions as unmodifiable. I wouldn't have expected them to actually move
the transactions around so that accessing them requires special actions.

> Yea, I've read the email spam-matching schemes and you're absolutely
> right that our dest-account matching would be different..  Actually I
> think it would be VERY different.  With spam you only need a binary
> (or perhaps tri-state) answer to the question, "is this spam?".  The
> answers are yes, no, or maybe.

Well, not really. Spam filtering is a degenerate case of a more general
algorithm. "ifilter", for example, is a mail filtering program that sorts
mail into multiple folders automatically using the same algorithm.

The difference I was pointing out is that those implementations are
optimized for having lots of fields and lots of data, whereas GnuCash has a
lot less data to work with. As a result those implementations use indexed
databases, but I'm thinking GnuCash will just iterate through a fixed number
of transactions of history and apply the matching heuristics to every entry.

I'm hoping it will be just as effective because the data is much more
structured. E-mail is free-form text; the transaction information is at
least trying to identify itself.

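To make the "iterate through a fixed window of history" idea concrete, here
is a rough sketch in Python of a multi-class naive Bayes matcher. It is only
an illustration, not GnuCash code: the function names, the (description,
account) tuple layout, the window size of 500, and the account names in the
usage note are all made up for the example. It rebuilds its word counts from
the recent history on every call instead of keeping an indexed database.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Split a free-form description into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def guess_account(history, description, window=500):
    """Guess a destination account for `description`.

    `history` is a list of (description, account) pairs, oldest first
    (a hypothetical layout chosen for this sketch). Only the last
    `window` entries are scanned; nothing is indexed or cached.
    """
    word_counts = defaultdict(Counter)  # account -> word -> occurrences
    txn_counts = Counter()              # account -> number of transactions
    vocab = set()
    for desc, account in history[-window:]:
        txn_counts[account] += 1
        for word in tokens(desc):
            word_counts[account][word] += 1
            vocab.add(word)
    if not txn_counts:
        return None  # no history yet, so no basis for a guess

    words = tokens(description)
    total = sum(txn_counts.values())
    best, best_score = None, -math.inf
    for account, count in txn_counts.items():
        # log prior: how often this account was chosen in the window
        score = math.log(count / total)
        # add-one smoothing so an unseen word never zeroes out an account
        denom = sum(word_counts[account].values()) + len(vocab) + 1
        for word in words:
            score += math.log((word_counts[account][word] + 1) / denom)
        if score > best_score:
            best, best_score = account, score
    return best
```

Given a history where two "SAFEWAY ..." entries went to a groceries account
and one "SHELL OIL ..." entry went to a gas account, a new "SAFEWAY #4321
TACOMA" line should come out as groceries. Spam filtering is the same loop
with exactly two "accounts".
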
> Choosing a destination account is much more tricky -- you've got
> potentially hundreds of choices to match into.  If you have ideas for
> a decent matching algorithm I'd love to hear it.  Code would be
> better, but we should work on designs before coding, IMHO.

I had a plan for a matching heuristic, but I think the Bayesian filter is a
better idea. Any hard-coded heuristic will work well for some people but
fail completely for others. A Bayesian filter should adapt much better to
various systems with different data formats.

-- 
greg

_______________________________________________
gnucash-devel mailing list
[EMAIL PROTECTED]
http://www.gnucash.org/cgi-bin/mailman/listinfo/gnucash-devel
