Situation: someone reports a problem with gnc, and at triage it becomes clear that some data will be required to identify or solve the problem. The normal question: can you give us a file?

Problem: for any number of reasons, ranging from plain old personal privacy through to people in supposedly liberal societies avoiding tax and people in supposedly conservative societies avoiding persecution, sending live data isn't always appropriate. The USA has become very weird about this, and most of our development people are in the USA, so hopefully they'll understand the politics of privacy, eventually.

Suggestion: we try to make providing a file easier for people.

My suggestion is that we ask people to save a *copy* of their data in SQLite and then run a script across that copy that munges and obfuscates:

1. account names [1]

2. numbers [2]

[1] People following this will probably be aware that gnc doesn't know much about account names beyond broad classes, in spite of providing lots of names and not accommodating other accounting concepts, such as the fact that there is a level one up [3]. My point here is that account names are important to people but not to gnc, so why not just randomize them? The obvious way: copy the actual account name (the guid) to the user-visible one. This is a one-way change unless someone has unusual settings on their SQLite file; if someone has those settings, it seems reasonable to presume they also know how to turn them off and save the file again.
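
To make [1] concrete, here is a minimal sketch in Python of the "copy the guid over the name" step. It assumes the SQLite file has an `accounts` table with `guid` and `name` columns; that table layout, the function name `obfuscate_account_names` and the sample rows are my assumptions for illustration, not an agreed script. It builds a tiny in-memory table rather than touching a real file.

```python
import sqlite3


def obfuscate_account_names(conn):
    """Replace every user-visible account name with that account's guid."""
    # Assumption: an `accounts` table with `guid` and `name` columns.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("UPDATE accounts SET name = guid")
    return cur.rowcount  # number of accounts rewritten


# Stand-in for a *copy* of a real file: a tiny in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (guid TEXT PRIMARY KEY, name TEXT, account_type TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("0a1b2c", "Secret Savings", "BANK"),
        ("3d4e5f", "Offshore Holdings", "ASSET"),
    ],
)
conn.commit()

changed = obfuscate_account_names(conn)
names = [row[0] for row in conn.execute("SELECT name FROM accounts ORDER BY guid")]
```

Against a real copy you would pass the path of the saved copy to `sqlite3.connect()` instead of `":memory:"`. Other free-text columns would probably need similar treatment, but that is for the discussion to decide.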

[2] As long as the transaction stream balances, the actual numbers don't matter (there will be occasions where the numbers are important, but these tend to be number extremes related to commodities rather than anyone using gnc to do a Mr Putin vs Mr Trump sports bet). In most cases, multiplying any matching numbers by the same semi-random factor should produce a good file for examination, so long as it is done consistently [4].
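
A rough sketch of [2] in Python, under stated assumptions: a `splits` table that stores each amount as an integer numerator (`value_num`, `quantity_num`) over a denominator, with `tx_guid` linking splits to their transaction. The table layout, both function names and the sample rows are hypothetical. I use an integer factor so the arithmetic stays exact and a balanced transaction stays balanced; that is my choice for the sketch, not something settled.

```python
import random
import sqlite3


def scale_split_values(conn, factor):
    """Multiply every split numerator by the same integer factor."""
    # Assumption: `splits` has integer `value_num` and `quantity_num` columns.
    with conn:
        conn.execute(
            "UPDATE splits SET value_num = value_num * ?, "
            "quantity_num = quantity_num * ?",
            (factor, factor),
        )


def unbalanced_transactions(conn):
    """Return tx_guids whose split values do not sum to zero."""
    # Simplification: assumes splits in a transaction share one denominator.
    rows = conn.execute(
        "SELECT tx_guid FROM splits GROUP BY tx_guid HAVING SUM(value_num) <> 0"
    )
    return [row[0] for row in rows]


# Stand-in for a *copy* of a real file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE splits (guid TEXT PRIMARY KEY, tx_guid TEXT, "
    "value_num INTEGER, value_denom INTEGER, "
    "quantity_num INTEGER, quantity_denom INTEGER)"
)
conn.executemany(
    "INSERT INTO splits VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("s1", "tx1", 2500, 100, 2500, 100),
        ("s2", "tx1", -2500, 100, -2500, 100),
        ("s3", "tx2", 1000, 100, 1000, 100),
        ("s4", "tx2", -400, 100, -400, 100),
        ("s5", "tx2", -600, 100, -600, 100),
    ],
)
conn.commit()

factor = random.randint(2, 997)  # the same semi-random factor for every split
scale_split_values(conn, factor)
scaled = [row[0] for row in conn.execute("SELECT value_num FROM splits ORDER BY guid")]
still_unbalanced = unbalanced_transactions(conn)
```

Because every split gets the same factor, each transaction's sum is simply the old sum times the factor, so anything that balanced before still balances afterwards.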

[3] That is a long argument that interests me conceptually rather than personally; it doesn't affect me as a UK person, but it makes me think internationally.

[4] I don't think a reductive discussion of true vs near-true random [5] is appropriate. The significant point is that the person viewing the data won't be able to work out the original number without significant effort, and in most cases simply won't be able to work it out at all; we're talking about computing assets I doubt anyone here has access to. *And* I believe the gnc people are actually motivated by solving problems, belief in the project and ordinary stuff like that, so they won't even be looking.

[5] Random is fun if only because there are so many ways of doing it.

Questions: why SQLite rather than XML? Because if a person runs an agreed script across their file, we can be sure of the outcome. Editing an XML file informally is scary; it immediately raises questions about consistency of data. Other SQL formats are not widely used, so my proposal is that we go for the lowest common denominator (LCD) where we can achieve normalization.

Normalization will have to be balanced: privacy vs contribution to the project.

I definitely want contributions from other people who work well with SQL. Let's think about this together, people. I have written some scripts that confuse *my* data, and I know that Geert is still waiting for me to send him a file.

Geert is a good person, I just don't want to show him very personal stuff in my file.

I have a plan to make showing a file easier. Is anyone interested?

This is the *start* of a conversation, I welcome thoughts.

_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
