Re: [GNC-dev] Understanding the bayesian import matching algorithm

2020-07-02 Thread Derek Atkins
Hi,

On Thu, July 2, 2020 3:10 pm, Christian Gruber wrote:
> Hi,
>
> while further studying the Bayesian import matching algorithm I have now
> reached the point where I want to understand how Bayes' formula is
> applied to the problem of matching transactions to accounts using
> tokens. But I need further information, since it isn't clear to
> me what is really calculated there.
>
> The implementation can be found in the following functions in Account.cpp:
>
>   * get_first_pass_probabilities()
>   * build_probabilities()
>   * highest_probability()
>
> Actually, the latter could be omitted as it only selects the account
> with the highest matching probability.
>
> Studying the code and the sparse comments on the implementation, it seems
> to be a variant of the naive Bayes classifier,
> with the tokens used as (independent) "features" and the accounts used
> as "classes". But comparing this algorithm to the code leaves several
> questions open.
>
> Does anybody know of a more precise description of the algorithm on which
> the implementation in GnuCash is based?

I'm not sure how much detail you need right now; I helped with some of the
initial implementations but I'm sure it's all been rewritten by now.  The
idea is that the description/memo strings are tokenized and used as inputs
into the probabilities that the transaction would go into the target
account.  If you have a high-enough probability it will auto-select that
account for that transaction.

When you assign an account (during import), it adds those tokens to the
account's list of tokens for future guessing.
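
The process described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the code in
Account.cpp: it uses a textbook naive Bayes with add-one smoothing, and the
class name, method names, whitespace tokenizer and the 0.7 threshold are all
hypothetical. The real C++ implementation may well differ.

```python
import math
from collections import Counter, defaultdict


class TokenMatcher:
    """Toy naive Bayes matcher. All names are hypothetical, not GnuCash APIs."""

    def __init__(self, threshold=0.7):
        # Assumed cut-off for auto-selection; not GnuCash's actual value.
        self.threshold = threshold
        self.token_counts = defaultdict(Counter)  # account -> token -> count
        self.account_counts = Counter()           # account -> trained transactions

    @staticmethod
    def tokenize(text):
        # Assumption: lower-cased, whitespace-separated words.
        return text.lower().split()

    def train(self, text, account):
        # Assigning an account adds the transaction's tokens to that account.
        self.account_counts[account] += 1
        self.token_counts[account].update(self.tokenize(text))

    def probabilities(self, text):
        # Posterior probability of each known account given the tokens.
        if not self.account_counts:
            return {}
        tokens = self.tokenize(text)
        vocab = {t for counts in self.token_counts.values() for t in counts}
        total = sum(self.account_counts.values())
        log_scores = {}
        for account, n in self.account_counts.items():
            counts = self.token_counts[account]
            denom = sum(counts.values()) + len(vocab)  # add-one smoothing
            score = math.log(n / total)                # prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            log_scores[account] = score
        # Normalise in a numerically stable way.
        top = max(log_scores.values())
        weights = {a: math.exp(s - top) for a, s in log_scores.items()}
        norm = sum(weights.values())
        return {a: w / norm for a, w in weights.items()}

    def guess(self, text):
        # Return the best account only if it is probable enough.
        probs = self.probabilities(text)
        if not probs:
            return None
        account, p = max(probs.items(), key=lambda kv: kv[1])
        return account if p >= self.threshold else None


matcher = TokenMatcher()
matcher.train("SHELL fuel station", "Expenses:Auto:Fuel")
matcher.train("ACME grocery store", "Expenses:Groceries")
print(matcher.guess("shell fuel"))     # Expenses:Auto:Fuel (probability 0.8)
print(matcher.guess("unknown payee"))  # None (both accounts at 0.5)
```

Working in log space avoids underflow when a description has many tokens;
the threshold check is what keeps a weak guess from being auto-selected.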

Did you have a specific question about the process?  For the complete
algorithm you can look at the code.  It's really not all that complicated
(or at least it wasn't when first implemented).

> Regards,
> Christian

-derek
-- 
   Derek Atkins 617-623-3745
   de...@ihtfp.com www.ihtfp.com
   Computer and Internet Security Consultant

___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel


[GNC-dev] Understanding the bayesian import matching algorithm

2020-07-02 Thread Christian Gruber

Hi,

while further studying the Bayesian import matching algorithm I have now
reached the point where I want to understand how Bayes' formula is
applied to the problem of matching transactions to accounts using
tokens. But I need further information, since it isn't clear to
me what is really calculated there.


The implementation can be found in the following functions in Account.cpp:

 * get_first_pass_probabilities()
 * build_probabilities()
 * highest_probability()

Actually, the latter could be omitted as it only selects the account 
with the highest matching probability.


Studying the code and the sparse comments on the implementation, it seems
to be a variant of the naive Bayes classifier,
with the tokens used as (independent) "features" and the accounts used
as "classes". But comparing this algorithm to the code leaves several
questions open.
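
For reference, the textbook form of that classifier can be written as
follows. This is the standard formula only, not necessarily what the GnuCash
code computes. The symbols are assumptions for illustration: $A_k$ is a
candidate account, and $t_1, \dots, t_n$ are the tokens of the transaction
being imported, treated as conditionally independent given the account.

```latex
P(A_k \mid t_1, \dots, t_n)
  = \frac{P(A_k) \prod_{i=1}^{n} P(t_i \mid A_k)}
         {\sum_{j} P(A_j) \prod_{i=1}^{n} P(t_i \mid A_j)}
```

The account with the largest value of this posterior would be the one
selected.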


Does anybody know of a more precise description of the algorithm on which
the implementation in GnuCash is based?


Regards,
Christian



Re: [GNC-dev] About budgets in 3.8, 3.9 and 3.10

2020-07-02 Thread Christopher Lam
Now that 3.11 is truly out, the budget editor and report should behave
exactly as in 3.7.
4.0 should behave the same way. Please verify; any bug reports and
fixes will apply to the 4.x series.

On Fri, 8 May 2020 at 12:58, Christopher Lam wrote:

> 3.11 being due in 3 weeks' time, the candidate fix for budgets is merged
> in daily builds. The next build after 4th May in
> https://code.gnucash.org/builds/win32/maint/ will have the budgets
> reverted to 3.7 behaviour.
>
> On Wed, 29 Apr 2020 at 20:05, John Ralls wrote:
>
>>
>>
>> > On Apr 29, 2020, at 11:45 AM, Phil Longstaff wrote:
>> >
>> > Agreed.
>> >
>> > It is correct that Assets = Liabilities + Equity uses only positive
>> > values.
>> > However, each balance is a credit balance or a debit balance. It is
>> > perfectly reasonable to associate one of those types of balance with
>> > positive numbers and the other with negative numbers.
>> >
>>
>> It is. What's not reasonable is to have a preference (as opposed to a
>> book option) that changes which is negative, especially for stored values.
>> Even having a book option is dicey unless that option affects exactly one
>> place in all of GnuCash's code base.
>>
>> Regards,
>> John Ralls
>>
>