Re: [GNC-dev] Normalizing/obfuscating live data
On 04/02/2019 08:40, Christian Stimming wrote:
> In a real data file there are still more places with text that need to be
> modified, e.g. the scheduled transaction templates, bayes import matching,
> and such. Also, the dates are left unmodified which may or may not be a
> problem.

Stripping out scheduled tx should be OK unless they are specific to the problem being reported. Because gnc is, by definition, a tx stream processor, future tx are not normally noticed until encountered. (Personally I love the ability to generate tx in the future; it allows me to model my immediate monetary future. A very positive thing.)

I think all of the import stuff should be stripped too.

Dates are more interesting, Christian. People (right or wrong) place value on dates (in my culture it will be 14 Feb soon).

How about this as a proposal? If the dates in the file are in sequence, it usually won't matter how much time is in between each date. Why do I say this? Because gnc is a *sequential* tx processor, and as such the *sequence* of transactions can be important but the actual dates often aren't.

If anyone is struggling with this conceptually: in a gnc file the date defines the order in which a tx is processed; that is what a transaction stream program does. The tx may be in the wrong order in the file (this is part of the reason why gnc does the weird thing of loading everything into memory, it can't trust the file!), so it has to work out which tx is first, which one comes next, and so on. I don't think I am teaching ChristianS anything, just explaining stuff.

So, I think the dates can be modified so long as the *order* of dates and times is left intact.

Proposal: make the first date random (after 1971 or some later date for technical reasons), then treat the tx in date+time sequence, adding one day each time a difference is noted. This will produce a time-compressed file that obfuscates when someone actually did something.

Thoughts?
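To make the proposal concrete, here is a minimal sketch in Python (not the Perl of the existing script, and not tested against a real data file). It assumes the timestamps have already been pulled out of the file as `datetime` values. The function name `compress_dates`, the `seed` parameter, and the upper bound of 2000 for the random start are all made up for illustration. It reads "a difference is noted" as "the date+time differs from the previous one in sorted order", which is only one possible reading.

```python
import random
from datetime import datetime, timedelta


def compress_dates(stamps, seed=None,
                   earliest=datetime(1971, 1, 1),
                   latest=datetime(2000, 1, 1)):
    """Return new timestamps, one per input, in the same positions.

    Order is preserved: equal inputs get equal outputs, and each
    distinct later input lands exactly one day after the previous
    distinct input. The real gaps between transactions are discarded.
    """
    rng = random.Random(seed)
    # Random first date inside the (assumed) allowed window.
    span_days = (latest - earliest).days
    start = earliest + timedelta(days=rng.randrange(span_days))

    # Walk the distinct timestamps in date+time sequence,
    # adding one day each time the value changes.
    mapping = {}
    for offset, stamp in enumerate(sorted(set(stamps))):
        mapping[stamp] = start + timedelta(days=offset)

    # Hand back results in the caller's original (file) order.
    return [mapping[s] for s in stamps]


if __name__ == "__main__":
    file_order = [
        datetime(2019, 2, 4, 8, 40),
        datetime(2018, 12, 25, 0, 0),   # out of order in the file
        datetime(2019, 2, 4, 8, 40),    # same moment as the first
        datetime(2019, 2, 14, 0, 0),
    ]
    for old, new in zip(file_order, compress_dates(file_order, seed=1)):
        print(old.isoformat(), "->", new.isoformat())
```

Because the mapping is built from the sorted set of timestamps, a file whose tx are stored out of order still comes out with the same relative order, just squeezed onto consecutive days.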
--
Wm
___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Re: [GNC-dev] Normalizing/obfuscating live data
On Sunday, 3 February 2019, 17:03:06 CET, John Ralls wrote:
> > On Feb 2, 2019, at 8:10 PM, David Carlson wrote:
> >
> > OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am
> > not a computer programmer. I have no clue how to use it. Can someone
> > help me?

Thanks for the pointer. I've copied this script into our git at ./util/obfuscate.pl

The script manages to process my 50MB file in approx. 30 seconds. The account and txn texts are all nicely obfuscated. The script also contains a random obfuscation for the amounts, which will simply cause lots of transactions to the equity account upon loading, but this could be enabled as well.

In a real data file there are still more places with text that need to be modified, e.g. the scheduled transaction templates, bayes import matching, and such. Also, the dates are left unmodified, which may or may not be a problem.

Anyone, please feel free to try this script and add more obfuscation steps. I would like to reach a state where the script obfuscates my personal data file enough that I feel I can make it available as a test file.

Usage of the script: save your normal file in uncompressed form as an XML file. Then run

    ./obfuscate.pl inputfile.gnucash > outputfile.gnucash

(Contrary to the comments, the output is just written to stdout, not in-place into the file.)

Thanks for the idea here!

Regards,
Christian

> Run it from a command line using perl, assuming here that you have
> Strawberry installed on C:
>
> c:\strawberry\perl\bin\perl.exe ObfuscateScript path/to/myfile.gnucash
>
> Note that it rewrites the file in place, so make a copy and run it on that.
> The file needs to be uncompressed.
>
> Regards,
> John Ralls
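For anyone who wants to experiment with extra obfuscation steps, the following Python sketch shows one way text fields in an uncompressed XML file could be scrambled. It is not what obfuscate.pl does internally, only an illustration. The element names in `TEXT_TAGS` are assumptions about which elements carry free text, and the names `scramble` and `obfuscate_xml` are hypothetical. A real data file has more places to cover (scheduled transaction templates, import matching data, and so on), and non-ASCII letters are left untouched here.

```python
import random
import re
import string

# Assumed text-bearing elements; extend this for a real data file.
TEXT_TAGS = ("act:name", "trn:description", "split:memo")

# An XML entity (kept intact), or a single ASCII letter or digit.
_TOKEN = re.compile(r"&#?\w+;|[A-Za-z]|[0-9]")


def scramble(text, rng):
    """Replace ASCII letters and digits with random ones of the same kind."""
    def swap(match):
        tok = match.group(0)
        if tok.startswith("&"):
            return tok  # leave entities such as &amp; alone
        if tok.isdigit():
            return rng.choice(string.digits)
        if tok.isupper():
            return rng.choice(string.ascii_uppercase)
        return rng.choice(string.ascii_lowercase)
    return _TOKEN.sub(swap, text)


def obfuscate_xml(xml, seed=None):
    """Scramble the text inside the elements listed in TEXT_TAGS.

    Identical input strings map to identical output strings, so an
    account name that occurs twice still matches after obfuscation.
    """
    rng = random.Random(seed)
    seen = {}
    tags = "|".join(re.escape(t) for t in TEXT_TAGS)
    pattern = re.compile(r"<(%s)>([^<]*)</\1>" % tags)

    def repl(match):
        tag, text = match.group(1), match.group(2)
        if text not in seen:
            seen[text] = scramble(text, rng)
        return "<%s>%s</%s>" % (tag, seen[text], tag)

    return pattern.sub(repl, xml)


if __name__ == "__main__":
    import sys
    # Same calling convention as described above: read a file, write to stdout.
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(obfuscate_xml(handle.read()))
```

Like the Perl script, this writes to stdout, so the input file is never modified. Working with regular expressions instead of an XML parser keeps the rest of the file byte-for-byte unchanged, at the cost of missing elements that carry attributes or nested markup.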