On 2020-04-10 04:23, Stefano Zacchiroli wrote:
I'm pretty sure it can be made fully general with a mixin that takes
any importer and uses your categorizer before returning results. Try
something like this (again, untested):
------------------------------------------------------------------------
class CategorizerMixin():
    @staticmethod
    def categorize_entry(entry):
        entry.meta['test'] = 'foo'
        return entry

    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.categorize_entry, entries))

class MyOFX(ofx.Importer, CategorizerMixin): pass
class MyOtherImporter(OtherImporter, CategorizerMixin): pass

CONFIG = [
    myOFX('1...', 'Assets:...'),
    MyOtherImporter(...),
]
------------------------------------------------------------------------
@Martin,
I was curious if you ever got this working?
@All,
Thanks a lot Stefano for that example, it is greatly appreciated! Like
Martin, I am also still learning Python. But with the help of your
example, I now have it working.
One thing I wanted to point out, not to nitpick, but for others like us
who are Python novices: the "MyOFX" class was capitalized where it was
declared, but then in the "CONFIG = [" section it was referenced as
"myOFX(...)". Just a small typo, but enough to result in a
"NameError: name 'myOFX' is not defined" error, which I spent a little
while searching the Internet about before I realized it was just a
capitalization mistake. :)
Since this is one of the few threads that come up when you search the
mailing list for "categorizer OFX" (and by far the most relevant, IMO) I
will share my more complete working example. A few notes:
1. All I did was basically combine Stefano's wrapper with my previously
working (with CSV, anyway) categorizer, which I found some time ago
while searching around for "dumb categorizer."
2. I also added a few comments, mostly to remind myself of things the
next time I touch this which might be a long time from now. :) Maybe
they will also be helpful for others, so I left them in.
3. I changed the name of the function from "categorize_entry" to
"new_categorizer" (to distinguish it from the previous
"dumb_categorizer" it was based upon). Oh yes and I changed "entry" to
"txn." I am not sure which is better / more correct, but this was the
way all 100+ of my pre-existing rules were written, so I decided it was
easier to change this in a couple of places in the invocation rather
than in all 100+ of my existing rules (even with a good editor). ;)
4. Again, I am a Python noob, but from the little I read about Python
mixins (https://www.ianlewis.org/en/mixins-and-python), the order of the
parent classes matters. Python looks up methods from left to right, so
the mixin has to be listed first for its extract() to be found before
the importer's; super().extract() then calls the importer's version.
Therefore we are supposed to write them like:
"class MyOFX(CategorizerMixin, ofx.Importer):" instead of like:
"class MyOFX(ofx.Importer, CategorizerMixin):"
With the mixin listed last, an importer that defines its own extract()
(as ofx.Importer does) is found first and the categorizer never runs.
Something I think I may have learned and wanted to share.
5. Be careful with the order of your rules: the first one that matches
will "win." I mostly keep mine alphabetical (to keep them organized),
however I have had to move a few to the bottom for prioritization
reasons. I make sure to note them accordingly in their own section.
6. Note the ".lower()" method, which converts the incoming/existing
txn.narration to all lower case (just for purposes of rule matching; it
doesn't change it permanently). This is also why all the rules are
written in lower case.
7.a. If you are looking for a reference as to what other fields might be
available for you to work with in a categorizer, then you can find your
answers in "beancount/core/data.py"[0], in particular the "Transaction"
and "Posting" directives.
7.b. Somewhat related to above, I think this can be used as a base to
extend the "dumb categorizer" quite far in custom directions, without
the need of using "AI" or any sort of Bayesian "smarts" which I don't
really want. Personally, I by far prefer to explicitly define my
categorizing rules, and I figure that there are probably others out
there who also feel the same way (I don't want any "surprises" nor do I
want to fight with my machines; I like them to do /exactly/ as I tell
them). ;) For those who feel differently, there is Smart Categorizer
of course (already mentioned further up thread).
8. There are a couple of different ways you can attach the Postings to
the Transaction (whether to sort them by value or not), which I explain
at the bottom, after the example. Other than that, the rest of this post
is the example.
[0]:
https://github.com/beancount/beancount/blob/master/beancount/core/data.py#L168
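
On note 4, here is a minimal sketch of how base-class order affects
which extract() gets called. FakeImporter is a hypothetical stand-in for
a real importer such as ofx.Importer, and the string "entries" are
placeholders for real directives; only the method lookup order is being
illustrated, nothing here touches beancount itself.

```python
class FakeImporter:
    """Hypothetical stand-in for a real importer such as ofx.Importer."""

    def extract(self, file, existing_entries=None):
        # A real importer would parse the file; this returns placeholders.
        return ['entry-1', 'entry-2']


class CategorizerMixin:
    @staticmethod
    def categorize_entry(entry):
        return entry + ':categorized'

    def extract(self, file, existing_entries=None):
        # super() continues along the method resolution order (MRO).
        entries = super().extract(file, existing_entries)
        return list(map(self.categorize_entry, entries))


class MixinFirst(CategorizerMixin, FakeImporter):
    pass


class MixinLast(FakeImporter, CategorizerMixin):
    pass


mro_names = [cls.__name__ for cls in MixinFirst.__mro__]
mixin_first_result = MixinFirst().extract('dummy.ofx')
mixin_last_result = MixinLast().extract('dummy.ofx')

print(mro_names)
print(mixin_first_result)
print(mixin_last_result)
```

In this sketch MixinLast returns the entries untouched, because
FakeImporter.extract() is found first and never calls super().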
Alright then, without further ado:
class CategorizerMixin():
    @staticmethod
    def new_categorizer(txn):
        # If you want to add any meta data, do it here for all directives,
        # including eg. Balance assertions (which do not have any legs):
        #
        txn.meta['meta_for_all_directives'] = 'foo'

        # At this time the txn has only one posting
        try:
            posting1 = txn.postings[0]
        except IndexError:
            return txn
        # Ex. Balance objects don't have any postings, either
        except AttributeError:
            return txn

        # Otherwise to add metadata to all normal transactions (with one
        # or more legs), add them here (after things above return):
        #
        txn.meta['meta_for_transactions'] = 'bar'

        # Guess the account(s) of the other posting(s)
        #
        # Standard searches, listed alphabetically. Better for a search
        # string to be longer than to be shorter and end up with false
        # positives.
        if 'aldi' in txn.narration.lower():
            account = 'Expenses:Groceries'
        elif 'aliexpress' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'
        elif 'amazon' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'
        elif 'amzn' in txn.narration.lower():
            account = 'Expenses:Unknown:CheckReceipt'
        elif 'anytime fit' in txn.narration.lower():
            account = 'Expenses:Self:Fitness'
        elif 'applebees' in txn.narration.lower():
            account = 'Expenses:Social:EatingDrinkingOut'
        elif 'arby\'s' in txn.narration.lower():
            account = 'Expenses:Food:EatingOut'
        elif 'atm' in txn.narration.lower():
            # pretty broad but so far working, includes deposits
            account = 'Assets:Self:Cash:Wallet-C'
        # ... 100+ more 'elif' statements ... ;)
        else:
            # default to this if nothing else
            account = 'Expenses:Unknown:NewOneLeg'

        # Make the other posting(s)
        posting2 = posting1._replace(
            account=account,
            units=-posting1.units
        )

        # Insert / Append the posting into the transaction (see note
        # below)
        txn.postings.append(posting2)

        return txn

    def extract(self, file, existing_entries=None):
        entries = super().extract(file, existing_entries)
        return list(map(self.new_categorizer, entries))
END OF EXAMPLE
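
For anyone who finds a 100+ branch if/elif chain unwieldy, the same
"first match wins" lookup could probably be written as an ordered list
of (substring, account) pairs. This is only a sketch of that idea, not
what I actually run: RULES, DEFAULT_ACCOUNT and guess_account are
made-up names, the sample narrations are invented, and only a handful of
the rules from the example are reproduced.

```python
# Ordered (substring, account) pairs. Order matters: the first match wins.
# All substrings are lower case because the narration is lower-cased first.
RULES = [
    ('aldi', 'Expenses:Groceries'),
    ('amazon', 'Expenses:Unknown:CheckReceipt'),
    ('amzn', 'Expenses:Unknown:CheckReceipt'),
    ('anytime fit', 'Expenses:Self:Fitness'),
    ('atm', 'Assets:Self:Cash:Wallet-C'),  # short, so prone to false positives
]
DEFAULT_ACCOUNT = 'Expenses:Unknown:NewOneLeg'


def guess_account(narration, rules=RULES, default=DEFAULT_ACCOUNT):
    """Return the account of the first rule whose substring matches."""
    lowered = narration.lower()  # matching only; the narration is unchanged
    for needle, account in rules:
        if needle in lowered:
            return account
    return default


for narration in ('ALDI SUED 1234', 'ATM WITHDRAWAL NEAR ALDI',
                  'Spa Treatment', 'Corner Bakery'):
    print(narration, '->', guess_account(narration))
```

The 'Spa Treatment' case shows why longer search strings are safer: the
short string 'atm' also occurs inside "treatment".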
OK, so about that "Insert / Append" section. Originally the "dumb
categorizer" as I found it contained the following code. What this does
is to sort the Posting legs such that the smaller amount is always
first. For example, subtracting some amount out of a Checking account
would be first, then the + to Expense account would be second. Which is
fine, until you make a deposit. I prefer that whatever is happening to
the account in question (whether + or -) be listed first, and then the
"other" account be listed second. It's all a matter of preference, I
just wanted to point it out. I include the original code below in case
you prefer the other way.
        # Insert / Append the posting into the transaction
        if posting1.units < posting2.units:
            txn.postings.append(posting2)
        else:
            txn.postings.insert(0, posting2)

        return txn
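
To make the difference between the two orderings concrete, here is a
small sketch that runs both on a withdrawal and on a deposit. Note the
assumptions: Posting here is a simplified stand-in with a bare Decimal
in "units" (real beancount Postings carry an Amount and more fields),
and the helper names and the 'Income:Unknown' account are invented for
the illustration.

```python
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for beancount's Posting (which has more fields).
Posting = namedtuple('Posting', 'account units')


def make_other_leg(posting1, account):
    """Build the balancing leg, as the categorizer does with _replace()."""
    return posting1._replace(account=account, units=-posting1.units)


def append_always(posting1, posting2):
    """My preference: the imported account's leg always comes first."""
    return [posting1, posting2]


def smaller_first(posting1, posting2):
    """Original 'dumb categorizer' ordering: smaller amount first."""
    if posting1.units < posting2.units:
        return [posting1, posting2]
    return [posting2, posting1]


def accounts(postings):
    return [p.account for p in postings]


withdrawal = Posting('Assets:Checking', Decimal('-20.00'))
deposit = Posting('Assets:Checking', Decimal('50.00'))
withdrawal_leg = make_other_leg(withdrawal, 'Expenses:Groceries')
deposit_leg = make_other_leg(deposit, 'Income:Unknown')

print(accounts(append_always(withdrawal, withdrawal_leg)))
print(accounts(smaller_first(withdrawal, withdrawal_leg)))
print(accounts(append_always(deposit, deposit_leg)))
print(accounts(smaller_first(deposit, deposit_leg)))
```

The two orderings agree for the withdrawal and differ only for the
deposit, where the sorted version puts the "other" account first.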