Re: Categorizing transactions automatically on import (OFX categorizer, dumb categorizer)

TRS-80 Wed, 15 Jul 2020 07:14:49 -0700

On 2020-04-10 04:23, Stefano Zacchiroli wrote:

I'm pretty sure it can be made fully general with a mixin that takes any
importer and uses your categorizer before returning results. Try
something like this (again, untested):


------------------------------------------------------------------------

class CategorizerMixin():

    @staticmethod
    def categorize_entry(entry):
        entry.meta['test'] = 'foo'
        return entry

    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.categorize_entry, entries))


class MyOFX(ofx.Importer, CategorizerMixin): pass


class MyOtherImporter(OtherImporter, CategorizerMixin): pass


CONFIG = [
    myOFX('1...', 'Assets:...'),
    MyOtherImporter(...),
]



@Martin,

I was curious if you ever got this working?

@All,

Thanks a lot Stefano for that example, it is greatly appreciated! Like Martin, I am also still learning Python. But with the help of your example, I now have it working.

One thing I wanted to point out, not to be a grammar Nazi, but for others like us who are Python novices, is that the "MyOFX" class was capitalized where it was declared, but then in the "CONFIG = [" section it was referenced as "myOFX(...)". Just a small typo, but enough to result in a "NameError: name 'myOFX' is not defined" error, which I spent a little while searching the Internet about until I realized it was just a capitalization mistake. :)

Since this is one of the few threads that come up when you search the mailing list for "categorizer OFX" (and by far the most relevant, IMO), I will share my more complete working example. A few notes:

1. All I did was basically combine Stefano's wrapper with my previously working (with CSV anyway) categorizer, which itself I found some time ago searching around for "dumb categorizer."

2. I also added a few comments, mostly to remind myself of things the next time I touch this, which might be a long time from now. :) Maybe they will also be helpful for others, so I left them in.

3. I changed the name of the function from "categorize_entry" to "new_categorizer" (to distinguish it from the previous "dumb_categorizer" it was based upon). Oh yes, and I changed "entry" to "txn." I am not sure which is better / more correct, but this was the way all 100+ of my pre-existing rules were written, so I decided it was easier to change this in a couple of places in the invocation rather than in all 100+ of my existing rules (even with a good editor). ;)

4. Again, I am a Python noob, but from the little I read about Python mixins (https://www.ianlewis.org/en/mixins-and-python), I think the order of the "parent" classes matters: Python looks up a method in them from left to right, so the mixin has to be listed first for its extract() to be the one that gets called. Therefore we are supposed to write them like:


"class MyOFX(CategorizerMixin, ofx.Importer):" instead of like:
"class MyOFX(ofx.Importer, CategorizerMixin):"

With the second form, unless ofx.Importer.extract() itself calls super(), the mixin's extract() is never reached and nothing gets categorized. Something I think I may have learned and wanted to share.

5. Be careful with the order of your rules: the first one that matches will "win." I mostly keep mine alphabetical (to keep them organized); however, I have had to move a few to the bottom for prioritization reasons. I make sure to note them accordingly in their own section.

6. Note the ".lower()" method, which converts the incoming txn.narration to all lower case (just for purposes of rule matching; it doesn't change it permanently). That is also why all the rules are lower case.

7.a. If you are looking for a reference as to what other fields might be available for you to work with in a categorizer, then you can find your answers in "beancount/core/data.py"[0], in particular the "Transaction" and "Posting" definitions.

7.b. Somewhat related to the above, I think this can be used as a base to extend the "dumb categorizer" quite far in custom directions, without the need of using "AI" or any sort of Bayesian "smarts" which I don't really want. Personally, I by far prefer to explicitly define my categorizing rules, and I figure that there are probably others out there who also feel the same way (I don't want any "surprises" nor do I want to fight with my machines; I like them to do /exactly/ as I tell them). ;) For those who feel differently, there is Smart Categorizer of course (already mentioned further up thread).

8. There are a couple of different ways you can attach the Postings to the Transaction (whether to sort values or not), which I explain at the bottom, after the example. Other than that, the rest of this post will be example.

[0]:https://github.com/beancount/beancount/blob/master/beancount/core/data.py#L168
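
To make note 4 concrete, here is a minimal sketch of how the order of the parent classes decides whose extract() runs. FakeImporter is a hypothetical stand-in for a real importer such as ofx.Importer (an assumption, so the snippet runs without beancount installed), and the "categorizing" is just a string suffix:

```python
class FakeImporter:
    """Hypothetical stand-in for a real importer such as ofx.Importer."""

    def extract(self, file, existing_entries=None):
        return ['entry-1', 'entry-2']


class CategorizerMixin:
    def extract(self, file, existing_entries=None):
        # super() continues along the method resolution order (MRO)
        # of the concrete class, not of the mixin itself.
        entries = super().extract(file, existing_entries)
        return [entry + ' (categorized)' for entry in entries]


class MixinFirst(CategorizerMixin, FakeImporter):
    pass


class MixinLast(FakeImporter, CategorizerMixin):
    pass


# Method lookup walks the MRO from left to right.
print([cls.__name__ for cls in MixinFirst.__mro__])
print(MixinFirst().extract('dummy.ofx'))  # mixin runs, then the importer
print(MixinLast().extract('dummy.ofx'))   # importer answers first, mixin is skipped
```

In this sketch only MixinFirst returns categorized entries. MixinLast silently returns the importer's raw entries, because FakeImporter.extract() is found first and never calls super().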


Alright then, without further ado:

class CategorizerMixin():

    @staticmethod
    def new_categorizer(txn):

        # If you want to add any metadata, do it here for all directives,
        # including e.g. Balance assertions (which do not have any legs):
        #
        txn.meta['meta_for_all_directives'] = 'foo'

        # At this time the txn has only one posting
        try:
            posting1 = txn.postings[0]
        except IndexError:
            return txn
        # Ex. Balance objects don't have any postings, either
        except AttributeError:
            return txn

        # Otherwise, to add metadata to all normal transactions (with one
        # or more legs), add it here (after the early returns above):
        #
        txn.meta['meta_for_transactions'] = 'bar'


        # Guess the account(s) of the other posting(s)
        #

        # Standard searches, listed alphabetically. Better to make a search
        # string longer than to keep it short and end up with false positives.

        if 'aldi' in txn.narration.lower():
            account = 'Expenses:Groceries'

        elif 'aliexpress' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'amazon' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'amzn' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'anytime fit' in txn.narration.lower():
            account = 'Expenses:Self:Fitness'

        elif 'applebees' in txn.narration.lower():
            account = 'Expenses:Social:EatingDrinkingOut'

        elif 'arby\'s' in txn.narration.lower():
            account = 'Expenses:Food:EatingOut'

        elif 'atm' in txn.narration.lower():  # pretty broad but so far working, includes deposits
            account = 'Assets:Self:Cash:Wallet-C'

        # ... 100+ more 'elif' statements ...  ;)

        else:
            account = 'Expenses:Unknown:NewOneLeg'  # default to this if nothing else


        # Make the other posting(s)
        posting2 = posting1._replace(
            account=account,
            units=-posting1.units
        )

        # Insert / Append the posting into the transaction (see note below)
        txn.postings.append(posting2)

        return txn


    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.new_categorizer, entries))

END OF EXAMPLE
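
As a side note on notes 5 and 6, the same "first match wins, compare in lower case" behaviour could also be expressed as an ordered list of rules instead of a long if/elif chain. This is only a sketch of that alternative, not the code from the example above: RULES, DEFAULT_ACCOUNT and guess_account are names made up for illustration, and the rule entries are copied from the example.

```python
# Ordered rules: (lower-case substring, account). The first match wins,
# so broad patterns such as 'atm' belong near the bottom.
RULES = [
    ('aldi', 'Expenses:Groceries'),
    ('amazon', 'Expenses:Unknown:CheckReceipt'),
    ('amzn', 'Expenses:Unknown:CheckReceipt'),
    ('anytime fit', 'Expenses:Self:Fitness'),
    ('atm', 'Assets:Self:Cash:Wallet-C'),
]
DEFAULT_ACCOUNT = 'Expenses:Unknown:NewOneLeg'


def guess_account(narration, rules=RULES, default=DEFAULT_ACCOUNT):
    """Return the account of the first rule whose substring matches."""
    text = narration.lower()  # only for matching; the narration is not modified
    for pattern, account in rules:
        if pattern in text:
            return account
    return default


print(guess_account('ALDI 1234 SPRINGFIELD'))
print(guess_account('ATM WITHDRAWAL'))
print(guess_account('Some new merchant'))
# A short pattern can match where it should not: 'treatment' contains 'atm'.
print(guess_account('Spa treatment'))
```

Inside new_categorizer this would be called as guess_account(txn.narration). The last call shows why longer search strings are safer: a short pattern like 'atm' also matches unrelated narrations.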

OK, so about that "Insert / Append" section. Originally the "dumb categorizer" as I found it contained the following code. What this does is sort the Posting legs such that the smaller amount is always first. For example, subtracting some amount out of a Checking account would be first, then the + to the Expense account would be second. Which is fine, until you make a deposit. I prefer that whatever is happening to the account in question (whether + or -) be listed first, and then the "other" account be listed second. It's all a matter of preference, I just wanted to point it out. I include the original code below in case you prefer the other way.


    # Insert / Append the posting into the transaction
    if posting1.units < posting2.units:
        txn.postings.append(posting2)
    else:
        txn.postings.insert(0, posting2)

    return txn
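
To see the difference between the two approaches, here is a small sketch that applies both to a withdrawal and to a deposit. Leg is a hypothetical two-field stand-in for beancount's Posting, and plain Decimal values stand in for Amount (both are assumptions, so the snippet runs without beancount); the account names are made up as well.

```python
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for beancount's Posting, reduced to two fields.
Leg = namedtuple('Leg', 'account units')


def add_sorted(posting1, posting2):
    """Original 'dumb categorizer' behaviour: smaller amount first."""
    if posting1.units < posting2.units:
        return [posting1, posting2]
    return [posting2, posting1]


def add_appended(posting1, posting2):
    """Always keep the imported account's leg first."""
    return [posting1, posting2]


withdrawal = Leg('Assets:Checking', Decimal('-25.00'))
expense = Leg('Expenses:Groceries', -withdrawal.units)
deposit = Leg('Assets:Checking', Decimal('100.00'))
income = Leg('Income:Unknown', -deposit.units)

for first, second in [(withdrawal, expense), (deposit, income)]:
    print('sorted:  ', [leg.account for leg in add_sorted(first, second)])
    print('appended:', [leg.account for leg in add_appended(first, second)])
```

For the withdrawal both variants give the same order. For the deposit, the sorted variant puts the "other" account first, while the appended variant keeps Assets:Checking first.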
