I made a change a little while ago that allows you to turn your import file into a script:

    from beancount.ingest.scripts_utils import ingest
    ...
    CONFIG = [ .. list of importer instances .. ]
    ...
    ingest(CONFIG)

This makes your .import file into a script that you can run with an "identify", "extract" or "file" subcommand. (You can still use the bean-identify, bean-extract and bean-file programs with it as before.)

But why am I mentioning this? Because the purpose of that change was to let you insert code before and/or after running the ingestion processes, and also to pass arguments to the ingestion tools to customize them. See here:
https://bitbucket.org/blais/beancount/src/353d874f678149eb4af951d1e57b92041f7bbc7b/beancount/ingest/scripts_utils.py#lines-29

What you're looking for here is "detect_duplicates_func". You should be able to put

    ingest(CONFIG, my_duplicates_func)

at the bottom of your .import file and it should be invoked.

If for whatever reason it doesn't fulfill your customization need, please let me know.

On Sun, Sep 2, 2018 at 8:23 AM Stefano Zacchiroli <z...@upsilon.cc> wrote:
> Heya,
> I'm using the built-in CSV importer (beancount.ingest.importers.csv)
> with bean-extract and, in spite of being documented as bare bone, it
> works perfectly fine for my need :)
>
> The only issue I'm facing is that I want to customize the behavior of
> beancount.ingest.similar.SimilarityComparator and I didn't find a way to
> do so.
>
> (In short, I've a special metadata key, bank-label, which I import from
> my CSV files and which I trust as quasi-unique ID for deduplicating
> transactions. That key + transaction date would be my ideal
> deduplication criteria. SimilarityComparator() is both more strict,
> e.g., it requires dates to be relatively near in time, without a way to
> pass a different time window; and more lax, e.g., allow amounts to vary
> a bit; than what I want.)
>
> Ideally, I'd like to write my own SimilarityComparator and pass it down
> to bean-extract via the importer configuration, but the configuration
> API doesn't allow to do so ATM. Would such a generalization be welcome
> to you, Martin? (as bug report and/or patch)
>
> Cheers
> --
> Stefano Zacchiroli . z...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
> Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
> Former Debian Project Leader & OSI Board Director . . . o o o . . . o .
> « the first rule of tautology club is the first rule of tautology club »
>
> --
> You received this message because you are subscribed to the Google Groups
> "Beancount" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to beancount+unsubscr...@googlegroups.com.
> To post to this group, send email to beancount@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beancount/20180902122320.GA27063%40upsilon.cc
> .
> For more options, visit https://groups.google.com/d/optout.
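
For the bank-label case described in the quoted message, a duplicates function could look roughly like the sketch below. It is only an illustration under stated assumptions, not the library's implementation. It assumes that detect_duplicates_func is called with a list of (key, new_entries) pairs plus the list of existing ledger entries, that it returns the same structure, and that duplicates are flagged with a "__duplicate__" metadata key; please check all three against scripts_utils.py and extract.py in your installed version. Txn is a hypothetical stand-in for beancount's Transaction so that the sketch runs on its own.

```python
import collections
import datetime

# Hypothetical stand-in for beancount's Transaction, reduced to the fields
# used here. Real entries are namedtuples too, so _replace() works the same.
Txn = collections.namedtuple("Txn", "meta date narration")

# Assumed marker key; verify it against your installed beancount version.
DUPLICATE_META = "__duplicate__"


def dedup_key(entry):
    """Return (date, bank-label), or None if the entry carries no bank-label."""
    meta = getattr(entry, "meta", None) or {}
    label = meta.get("bank-label")
    return None if label is None else (entry.date, label)


def my_duplicates_func(new_entries_list, existing_entries):
    """Mark new entries whose (date, bank-label) already exists in the ledger.

    Assumed contract: new_entries_list is a list of (key, entries) pairs and
    the return value has the same shape, with duplicates marked in metadata.
    """
    seen = {dedup_key(entry) for entry in existing_entries}
    seen.discard(None)  # entries without a bank-label never match anything

    marked_list = []
    for key, new_entries in new_entries_list:
        marked = []
        for entry in new_entries:
            if dedup_key(entry) in seen:
                meta = dict(entry.meta)
                meta[DUPLICATE_META] = True
                entry = entry._replace(meta=meta)
            marked.append(entry)
        marked_list.append((key, marked))
    return marked_list
```

With a function of this shape, the last line of the .import file would be the ingest(CONFIG, my_duplicates_func) call mentioned above. Matching on exact date plus bank-label deliberately ignores amounts and has no date window, which is the behavior asked for in the quoted message.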