Re: [GNC-dev] Normalizing/obfuscating live data

2019-02-11 Thread Wm via gnucash-devel

On 04/02/2019 08:40, Christian Stimming wrote:

> In a real data file there are still more places with text that need to be
> modified, e.g. the scheduled transaction templates, bayes import matching, and
> such. Also, the dates are left unmodified which may or may not be a problem.


Stripping out scheduled tx should be OK unless they are specific to the 
problem being reported.


Because gnc is, by definition, a tx stream processor, future tx are not 
normally noticed until they are encountered.  (Personally, I love the 
ability to generate tx in the future; it lets me model my immediate 
monetary future.  A very positive thing.)


I think all of the import stuff should be stripped too.

Dates are more interesting, Christian: people (rightly or wrongly) place 
value on dates (in my culture it will be 14 Feb soon).


How about this as a proposal?

If the dates in the file stay in sequence, it usually won't matter how 
much time lies between one date and the next.


Why do I say this?

Because gnc is a *sequential* tx processor and as such the *sequence* of 
transactions can be important but the actual dates often aren't.


If anyone is struggling with this conceptually: in a gnc file the date 
defines the order in which a tx is processed.  That is what a transaction 
stream program does.  The tx may be in the wrong order in the file (this 
is part of the reason why gnc does the weird thing of loading everything 
into memory: it can't trust the file!), so it has to work out which tx is 
first, which one comes next, and so on.


I don't think I am teaching ChristianS anything, just explaining stuff.

So, I think the dates can be modified so long as the *order* of dates 
and times is preserved.


Proposal: make the first date random (after 1971, or some later date, for 
technical reasons), then treat the tx in date+time sequence, adding one 
day each time a difference is noted.  This will produce a time-compressed 
file that obfuscates when someone actually did something.
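
A minimal sketch of the proposal above, written in Python rather than in the
Perl of the obfuscation script. It assumes the tx timestamps have already been
pulled out of the file as datetime values. The function name compress_dates,
the seed parameter, and the 1971 to 2000 window for the random first date are
illustrative choices, not anything gnc defines. "A difference" is read here as
a difference in the full date+time.

```python
import random
from datetime import date, datetime, timedelta


def compress_dates(timestamps, seed=None,
                   earliest=date(1971, 1, 1), latest=date(2000, 1, 1)):
    """Map each original timestamp to an obfuscated date, preserving order.

    The first (oldest) timestamp gets a random date in [earliest, latest].
    Every further distinct timestamp, taken in ascending order, gets the
    day after the previous one. Equal timestamps share one obfuscated date.
    """
    rng = random.Random(seed)  # seed is only there to make runs repeatable
    window_days = (latest - earliest).days
    first_date = earliest + timedelta(days=rng.randrange(window_days + 1))

    mapping = {}
    # Sorting the distinct timestamps gives the processing sequence;
    # the position in that sequence is the number of days to add.
    for offset, ts in enumerate(sorted(set(timestamps))):
        mapping[ts] = first_date + timedelta(days=offset)
    return mapping


if __name__ == "__main__":
    originals = [
        datetime(2019, 2, 4, 8, 40),
        datetime(2018, 12, 25, 9, 0),
        datetime(2019, 2, 4, 8, 40),   # same moment as the first entry
        datetime(2019, 2, 11, 12, 0),
    ]
    for ts, new_date in sorted(compress_dates(originals).items()):
        print(ts.isoformat(), "->", new_date.isoformat())
```

Tx that share a timestamp land on the same obfuscated day, so the relative
order is unchanged while the real gaps between tx are no longer visible.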


Thoughts?

--
Wm

___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel


Re: [GNC-dev] Normalizing/obfuscating live data

2019-02-04 Thread Christian Stimming
On Sunday, 3 February 2019, 17:03:06 CET, John Ralls wrote:
> > On Feb 2, 2019, at 8:10 PM, David Carlson wrote:
> > 
> > OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am
> > not a computer programmer.  I have no clue how to use it.  Can someone
> > help me?

Thanks for the pointer. I've copied this script into our git at 
  ./util/obfuscate.pl
The script manages to process my 50MB file in approx. 30 seconds. The account 
and txn texts are all nicely obfuscated. 

The script also contains a random obfuscation for the amounts. That step will 
simply cause lots of transactions to the equity account upon loading, but it 
could be enabled as well.

In a real data file there are still more places with text that need to be 
modified, e.g. the scheduled transaction templates, bayes import matching, and 
such. Also, the dates are left unmodified which may or may not be a problem. 

Anyone, please feel free to try this script and add more obfuscation 
steps. I would like to reach a state where the script obfuscates my 
personal data file enough that I feel I can make it available as a test 
file.

Usage of the script: Save your normal file as an uncompressed XML file. 
Then run:

   ./obfuscate.pl  inputfile.gnucash > outputfile.gnucash

(Contrary to the comments, the output is simply written to stdout; the input 
file is not rewritten in place.)
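
If the data file at hand was saved with compression enabled, it has to be
unpacked before the script can read it. A minimal sketch in Python, assuming
the compressed form is plain gzip (as far as I know, that is what gnc writes
for compressed XML files); the function name write_uncompressed_copy and the
file names are hypothetical.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def write_uncompressed_copy(src_path, dst_path):
    """Write an uncompressed copy of src_path to dst_path.

    Returns True if the source was gzip-compressed and had to be unpacked,
    False if it was already plain and was copied unchanged.
    """
    with open(src_path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC

    # gzip.open decompresses transparently; plain open copies byte for byte.
    opener = gzip.open if is_gzip else open
    with opener(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return is_gzip


if __name__ == "__main__":
    import sys
    unpacked = write_uncompressed_copy(sys.argv[1], sys.argv[2])
    print("decompressed" if unpacked else "already uncompressed, copied as is")
```

This works on a copy, so the original data file is never touched.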

Thanks for the idea here!

Regards,
Christian

 
> Run it from a command line using perl, assuming here that you have
> Strawberry installed on C:
> 
>   c:\strawberry\perl\bin\perl.exe ObfuscateScript path/to/myfile.gnucash
> 
> Note that it rewrites the file in place, so make a copy and run it on that.
> The file needs to be uncompressed.
> 
> Regards,
> John Ralls



