On 2020-04-10 04:23, Stefano Zacchiroli wrote:
I'm pretty sure it can be made fully general with a mixin that takes any
importer and uses your categorizer before returning results. Try
something like this (again, untested):

------------------------------------------------------------------------

class CategorizerMixin():

    @staticmethod
    def categorize_entry(entry):
        entry.meta['test'] = 'foo'
        return entry

    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.categorize_entry, entries))


class MyOFX(ofx.Importer, CategorizerMixin): pass


class MyOtherImporter(OtherImporter, CategorizerMixin): pass


CONFIG = [
    myOFX('1...', 'Assets:...'),
    MyOtherImporter(...),
]



@Martin,

I was curious if you ever got this working?

@All,

Thanks a lot Stefano for that example, it is greatly appreciated! Like Martin, I am also still learning Python. But with the help of your example, I now have it working.

One thing I wanted to point out, not to nitpick, but for others like us who are Python novices: the "MyOFX" class was capitalized where it was declared, but in the "CONFIG = [" section it was referenced as "myOFX(...)". Just a small typo, but enough to cause a "NameError: name 'myOFX' is not defined" error, which I spent a little while searching the Internet about before I realized it was just a capitalization mistake. :)

Since this is one of the few threads that come up when you search the mailing list for "categorizer OFX" (and by far the most relevant, IMO) I will share my more complete working example. A few notes:

1. All I did was basically combine Stefano's wrapper with my categorizer, which was already working (with CSV anyway) and which I found some time ago searching around for "dumb categorizer."

2. I also added a few comments, mostly to remind myself of things the next time I touch this which might be a long time from now. :) Maybe they will also be helpful for others, so I left them in.

3. I changed the name of the function from "categorize_entry" to "new_categorizer" (to distinguish it from the previous "dumb_categorizer" it was based upon). Oh yes, and I changed "entry" to "txn." I am not sure which is better / more correct, but this was the way all 100+ of my pre-existing rules were written, so I decided it was easier to change this in a couple of places in the invocation rather than in all 100+ of my existing rules (even with a good editor). ;)

4. Again, I am a Python noob, but from the little I read about Python mixins (https://www.ianlewis.org/en/mixins-and-python), the order of the "parent" classes matters: Python looks up methods from left to right, so the mixin has to be listed before the importer for its "extract" to be the one that gets called. Therefore we are supposed to write them like:

"class MyOFX(CategorizerMixin, ofx.Importer):" instead of like:
"class MyOFX(ofx.Importer, CategorizerMixin):"

With the second form, as far as I can tell, the importer's own "extract" is found first and the categorizer in the mixin never runs, so the order does matter here. Something I think I may have learned and wanted to share.

5. Be careful with the order of your rules: the first one that matches will "win." I mostly keep mine alphabetical (to keep them organized); however, I have had to move a few to the bottom for prioritization reasons. I make sure to note them accordingly in their own section.

6. Note the ".lower()" method, which returns an all-lower-case copy of the incoming/existing txn.narration (just for purposes of rule matching; it doesn't change the narration permanently). That is also why all the rules are written in lower case.

7.a. If you are looking for a reference as to what other fields might be available for you to work with in a categorizer, then you can find your answers in "beancount/core/data.py"[0], in particular the "Transaction" and "Posting" directives.

7.b. Somewhat related to above, I think this can be used as a base to extend the "dumb categorizer" quite far in custom directions, without the need of using "AI" or any sort of Bayesian "smarts" which I don't really want. Personally, I by far prefer to explicitly define my categorizing rules, and I figure that there are probably others out there who also feel the same way (I don't want any "surprises" nor do I want to fight with my machines; I like them to do /exactly/ as I tell them). ;) For those who feel differently, there is Smart Categorizer of course (already mentioned further up thread).

8. There are a couple of different ways you can attach the Postings to the Transaction (whether to sort values or not), which I explain at the bottom, after the example. Other than that, the rest of this post will be example.

[0]: https://github.com/beancount/beancount/blob/master/beancount/core/data.py#L168
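
To illustrate note 4, here is a minimal sketch of how base-class order affects which "extract" runs. "BaseImporter" is a hypothetical stand-in for a real importer such as ofx.Importer (it is not Beancount code), and the entries are plain strings just to keep the sketch self-contained.

```python
class BaseImporter:
    """Hypothetical stand-in for a real importer (not Beancount code)."""

    def extract(self, file, existing_entries=None):
        return ["entry-1", "entry-2"]


class CategorizerMixin:
    @staticmethod
    def categorize_entry(entry):
        return entry + " (categorized)"

    def extract(self, file, existing_entries=None):
        # super() continues along the method resolution order (MRO)
        entries = super().extract(file, existing_entries)
        return [self.categorize_entry(e) for e in entries]


class MixinFirst(CategorizerMixin, BaseImporter):
    pass


class MixinLast(BaseImporter, CategorizerMixin):
    pass


# Methods are looked up left to right along the MRO.
print([c.__name__ for c in MixinFirst.__mro__])
print(MixinFirst().extract("dummy.ofx"))  # the mixin's extract wraps the importer's
print(MixinLast().extract("dummy.ofx"))   # the importer's extract is found first
```

In this sketch only "MixinFirst" returns categorized entries; with "MixinLast" the mixin's "extract" is never reached, because the stand-in importer's "extract" does not call super().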

Alright then, without further ado:

class CategorizerMixin():

    @staticmethod
    def new_categorizer(txn):

        # If you want to add any metadata, do it here for all directives, including e.g. Balance
        # assertions (which do not have any legs):
        #
        txn.meta['meta_for_all_directives'] = 'foo'

        # At this time the txn has only one posting
        try:
            posting1 = txn.postings[0]
        except IndexError:
            return txn
        # Ex. Balance objects don't have any postings, either
        except AttributeError:
            return txn

        # Otherwise, to add metadata to all normal transactions (with one or more legs), add them
        # here (after things above return):
        #
        txn.meta['meta_for_transactions'] = 'bar'


        # Guess the account(s) of the other posting(s)
        #
        # Standard searches, listed alphabetically. Better for a search string to be longer than
        # to be too short and end up with false positives.

        if 'aldi' in txn.narration.lower():
            account = 'Expenses:Groceries'

        elif 'aliexpress' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'amazon' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'amzn' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'

        elif 'anytime fit' in txn.narration.lower():
            account = 'Expenses:Self:Fitness'

        elif 'applebees' in txn.narration.lower():
            account = 'Expenses:Social:EatingDrinkingOut'

        elif 'arby\'s' in txn.narration.lower():
            account = 'Expenses:Food:EatingOut'

        elif 'atm' in txn.narration.lower():  # pretty broad but so far working, includes deposits
            account = 'Assets:Self:Cash:Wallet-C'

        # ... 100+ more 'elif' statements ...  ;)

        else:
            account = 'Expenses:Unknown:NewOneLeg'  # default to this if nothing else

        # Make the other posting(s)
        posting2 = posting1._replace(
            account=account,
            units=-posting1.units
        )

        # Insert / Append the posting into the transaction (see note below)
        txn.postings.append(posting2)

        return txn


    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.new_categorizer, entries))

END OF EXAMPLE
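
As an aside, the long if/elif chain in the example could also be written as an ordered table of rules. The following is only a sketch of that idea, not what I actually run: the names "RULES", "DEFAULT_ACCOUNT" and "guess_account" are made up for illustration. It keeps the same two properties as the chain: the narration is lower-cased only for matching, and the first rule that matches wins.

```python
# Ordered list of (search string, account); earlier rules take priority.
RULES = [
    ('aldi', 'Expenses:Groceries'),
    ('amazon', 'Expenses:Unknown:CheckReceipt'),
    ('amzn', 'Expenses:Unknown:CheckReceipt'),
    ('anytime fit', 'Expenses:Self:Fitness'),
    # Broad patterns go last so that more specific rules can win first.
    ('atm', 'Assets:Self:Cash:Wallet-C'),
]

DEFAULT_ACCOUNT = 'Expenses:Unknown:NewOneLeg'


def guess_account(narration, rules=RULES, default=DEFAULT_ACCOUNT):
    """Return the account of the first rule whose string occurs in the narration."""
    text = narration.lower()  # a lower-case copy; the original is untouched
    for needle, account in rules:
        if needle in text:
            return account
    return default


print(guess_account('ALDI SUED 123'))   # matches 'aldi'
print(guess_account('Aldi ATM fee'))    # 'aldi' is listed before 'atm', so it wins
print(guess_account('Some New Shop'))   # nothing matches, falls back to the default
```

Inside the categorizer this would be used as "account = guess_account(txn.narration)" in place of the if/elif block.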

OK, so about that "Insert / Append" section. Originally the "dumb categorizer" as I found it contained the following code. What this does is to sort the Posting legs such that the smaller amount is always first. For example, subtracting some amount out of a Checking account would be first, then the + to Expense account would be second. Which is fine, until you make a deposit. I prefer that whatever is happening to the account in question (whether + or -) be listed first, and then the "other" account be listed second. It's all a matter of preference, I just wanted to point it out. I include the original code below in case you prefer the other way.

    # Insert / Append the posting into the transaction
    if posting1.units < posting2.units:
        txn.postings.append(posting2)
    else:
        txn.postings.insert(0, posting2)

    return txn
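
To make the difference between the two orderings concrete, here is a small self-contained sketch. The "Amount" and "Posting" tuples below are simplified, hypothetical stand-ins for Beancount's own types (only the fields needed here), and the function names are made up for illustration.

```python
from collections import namedtuple
from decimal import Decimal

# Simplified stand-ins, NOT the real Beancount types.
Amount = namedtuple('Amount', 'number currency')
Posting = namedtuple('Posting', 'account units')


def other_leg(posting1, account):
    """Build the balancing posting with the opposite sign."""
    units = Amount(-posting1.units.number, posting1.units.currency)
    return posting1._replace(account=account, units=units)


def imported_first(posting1, posting2):
    """The ordering I use: the imported account's leg always comes first."""
    return [posting1, posting2]


def smaller_first(posting1, posting2):
    """The original 'dumb categorizer' ordering: smaller amount first."""
    if posting1.units < posting2.units:
        return [posting1, posting2]
    return [posting2, posting1]


withdrawal = Posting('Assets:Checking', Amount(Decimal('-50.00'), 'USD'))
deposit = Posting('Assets:Checking', Amount(Decimal('200.00'), 'USD'))

for first in (withdrawal, deposit):
    second = other_leg(first, 'Expenses:Unknown:NewOneLeg')
    print([p.account for p in imported_first(first, second)],
          [p.account for p in smaller_first(first, second)])
```

For the withdrawal both orderings agree; for the deposit, "smaller_first" puts the other account ahead of the Checking leg.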
