On 21/12/2021 00:55, Aaron Stacy wrote:
Hi, I'm looking for suggestions for categorizing spending (not so much things like paychecks, brokerage transactions, etc., but stuff like credit card spending for budgeting). My ledger has around 2800 transactions over about 2 years, so it's not a ton of data, but it seems like enough that I could leverage something smarter than just string matching on the transaction narrations.

Does anyone have recommendations for categorizing spending?

I'm thinking of applying a full text search index as follows:

- Each expense account is a "document".
- The document's content is the narration of every transaction for that account.
- To categorize a new transaction, use an engine like Lucene <https://lucene.apache.org> or sklearn's TfidfVectorizer and pick the most likely account.
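
A minimal sketch of the document-per-account idea, assuming scikit-learn is installed. The sample narrations, the account names, and the `categorize` helper are hypothetical, and cosine similarity is just one reasonable way to pick the "most likely" account:

```python
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample data: (narration, expense account) pairs from a ledger.
history = [
    ("WHOLE FOODS MARKET #123", "Expenses:Groceries"),
    ("TRADER JOES 552", "Expenses:Groceries"),
    ("SHELL OIL 5744", "Expenses:Auto:Fuel"),
    ("CHEVRON STATION 0091", "Expenses:Auto:Fuel"),
    ("NETFLIX.COM", "Expenses:Entertainment"),
]

# One "document" per account: all of its narrations joined together.
docs = defaultdict(list)
for narration, account in history:
    docs[account].append(narration)
accounts = sorted(docs)
corpus = [" ".join(docs[a]) for a in accounts]

# Fit TF-IDF on the account documents; each row of doc_matrix is one account.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)


def categorize(narration):
    """Return (account, similarity) for the best-matching account document."""
    query = vectorizer.transform([narration])
    scores = cosine_similarity(query, doc_matrix)[0]
    best = scores.argmax()
    return accounts[best], float(scores[best])
```

A similarity of 0.0 means the narration shares no known token with any account, so the returned account is arbitrary in that case and should be ignored.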

Any thoughts on this approach? (Aside from it being over-engineered. I'm an engineer, IDK what to tell you, it's what I do.)

I use Beancount, and to assign accounts to transactions I use a machine learning classifier, implemented with sklearn and trained on my existing ledger.

This works reasonably well for recurring transactions but is not infallible. I found that putting a threshold on the confidence score from the classifier is essential for not ending up with completely bogus account assignments.
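
A rough sketch of what such a thresholded classifier could look like. The choice of LogisticRegression on TF-IDF features, the sample data, the 0.5 cutoff, and the `suggest_account` helper are all assumptions for illustration, not necessarily what my setup uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data taken from an existing ledger.
narrations = [
    "WHOLE FOODS MARKET #123",
    "TRADER JOES 552",
    "SHELL OIL 5744",
    "CHEVRON STATION 0091",
    "NETFLIX.COM",
    "SPOTIFY USA",
]
labels = [
    "Expenses:Groceries",
    "Expenses:Groceries",
    "Expenses:Auto:Fuel",
    "Expenses:Auto:Fuel",
    "Expenses:Entertainment",
    "Expenses:Entertainment",
]

# TF-IDF features feeding a probabilistic classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(narrations, labels)

THRESHOLD = 0.5  # assumed cutoff; tune it against your own ledger


def suggest_account(narration, threshold=THRESHOLD):
    """Return the predicted account, or None if the classifier is not confident."""
    probs = model.predict_proba([narration])[0]
    best = probs.argmax()
    if probs[best] < threshold:
        return None  # leave the transaction for manual categorization
    return str(model.classes_[best])
```

Transactions for which `suggest_account` returns None are left uncategorized and can be assigned by hand, which is usually preferable to a confidently wrong posting.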

Cheers,
Dan
