On 21/12/2021 00:55, Aaron Stacy wrote:
Hi, I'm looking for suggestions for categorizing spending (not so much
things like paycheck, brokerage transactions, etc, but stuff like credit
card spending for budgeting). My ledger has around 2800 transactions
over about 2 years, so it's not a ton of data, but it seems like enough
that I could leverage something smarter than just string matching
the transaction narrations.
Does anyone have recommendations for categorizing spending?
I'm thinking of applying a full text search index as follows:
- Each expense account is a "document".
- The document's contents are the narrations of every transaction for that
account.
- To categorize a new transaction, use an engine like Lucene
<https://lucene.apache.org> or sklearn's TfidfVectorizer and pick the most
likely account.
Any thoughts on this approach? (aside from being over-engineered. I'm an
engineer, IDK what to tell you it's what I do)
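
For reference, the "one document per account" scheme above can be sketched in
plain Python. This is only an illustration and not anyone's production setup:
it is a hand-rolled stand-in for Lucene or TfidfVectorizer, and the function
names, the tokenizer, the smoothed IDF formula, and the sample narrations are
all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the narration and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(transactions):
    """Build one TF-IDF vector per account from (narration, account) pairs."""
    term_counts = defaultdict(Counter)
    for narration, account in transactions:
        # Each account is a "document" made of all its narrations.
        term_counts[account].update(tokenize(narration))

    n_docs = len(term_counts)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    vectors = {}
    for account, counts in term_counts.items():
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[account] = {t: w / norm for t, w in vec.items()}
    return idf, vectors


def rank_accounts(narration, idf, vectors):
    """Return (account, cosine similarity) pairs, best match first."""
    counts = Counter(t for t in tokenize(narration) if t in idf)
    query = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in query.values()))
    if norm == 0:
        return []  # no known token: nothing sensible to suggest
    scores = {
        account: sum(w * vec.get(t, 0.0) for t, w in query.items()) / norm
        for account, vec in vectors.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ledger excerpt, for illustration only.
ledger = [
    ("WHOLE FOODS MARKET #123", "Expenses:Groceries"),
    ("SHELL OIL 5544", "Expenses:Fuel"),
    ("NETFLIX.COM", "Expenses:Subscriptions"),
]
idf, vectors = build_index(ledger)
print(rank_accounts("SHELL OIL 1234", idf, vectors))
```

The first entry of the returned list is the most likely account. A narration
with no known tokens returns an empty list, which is a cue to categorize it
by hand.
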
I use Beancount, and to assign accounts to transactions I use a machine
learning classifier, implemented with sklearn and trained on my existing ledger.
This works reasonably well for recurring transactions but is not
infallible. I found that putting a threshold on the confidence score
from the classifier is essential for not ending up with completely bogus
account assignments.
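
A minimal sketch of the thresholding idea follows. It is not my actual code:
the choice of TfidfVectorizer plus LogisticRegression, the C value, the 0.6
threshold, the function names, and the training pairs are all assumptions
picked for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (narration, account) pairs extracted from an existing ledger.
TRAINING = [
    ("WHOLE FOODS MARKET #123", "Expenses:Groceries"),
    ("WHOLE FOODS MARKET #789", "Expenses:Groceries"),
    ("TRADER JOES #456", "Expenses:Groceries"),
    ("SHELL OIL 5544", "Expenses:Fuel"),
    ("SHELL OIL 9921", "Expenses:Fuel"),
    ("CHEVRON GAS STATION", "Expenses:Fuel"),
    ("NETFLIX.COM", "Expenses:Subscriptions"),
    ("NETFLIX.COM MONTHLY", "Expenses:Subscriptions"),
    ("SPOTIFY USA", "Expenses:Subscriptions"),
]


def train(examples):
    """Fit a TF-IDF + logistic regression pipeline on narration/account pairs."""
    narrations, accounts = zip(*examples)
    model = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(C=10.0, max_iter=1000),
    )
    model.fit(list(narrations), list(accounts))
    return model


def classify(model, narration, threshold=0.6):
    """Return (account, confidence); account is None below the threshold."""
    probabilities = model.predict_proba([narration])[0]
    best = probabilities.argmax()
    confidence = float(probabilities[best])
    if confidence < threshold:
        # Not confident enough: leave the posting for manual assignment.
        return None, confidence
    return model.classes_[best], confidence


model = train(TRAINING)
for text in ("WHOLE FOODS MARKET #555", "XYZZY PLUGH"):
    print(text, classify(model, text))
```

The right threshold depends on the ledger and the classifier. It is worth
checking it against transactions held out from training before trusting it.
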
Cheers,
Dan