Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 23:05, David Cousens wrote:
> I don't since I retired a few years ago, but I did for 8 years prior to retiring (and I used MYOB for the 10 years prior to that before escaping). I am certainly not alone. You could have a proviso that the script won't work for files using the business functions but that then detracts considerably from its usefulness as a general diagnostic tool.

I'm respecting you more as we progress, DavidC. The broad point is that a normalization is without opinion or value. No person would know if you had run your business successfully or not. The fear is "the government will know I earned 20AUD on a contract and I didn't report". struth is your government has much larger issues to deal with, ask them to pay attention to that. That is, if you can manage one government for more than 3 fucking months at a time!

---
Point: MYOB is respected in Oz, Liz says so, it must be true. Rest of the world doesn't give a flying fuck about whether it is a good double accounting prog or not.
---

> Sqlite itself and its availability on Linux is not really an issue. Most distros have it in their software repositories. What may be more of an issue is that a lot of people who don't use the database backends because they don't want the additional hassles of learning to use and maintain databases may be reluctant to install it.

True, I think this is also a red herring, most people are using Windows and SQLite comes with gnc for free. Shouldn't you be asking why more people aren't using what they already have?

> I'm retired.

Disagree, your mind is still active :)

> Taking an extra half day to learn something new doesn't worry me as long as it happens before my time is up. But if I am running a busy life and/or a business as I used to, I would be more reluctant. Again not a show stopper, only a limitation on general applicability. David Cousens

Have a hug.
-- Wm ___ gnucash-devel mailing list gnucash-devel@gnucash.org https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 03/02/2019 04:10, David Carlson wrote:
> OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am not a computer programmer. I have no clue how to use it. Can someone help me?

It is Perl; if you have F::Q working you probably have enough kit to run it.

-- Wm
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 03/02/2019 16:03, John Ralls wrote:
> On Feb 2, 2019, at 8:10 PM, David Carlson wrote:
>> OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am not a computer programmer. I have no clue how to use it. Can someone help me?
>
> Run it from a command line using perl, assuming here that you have Strawberry installed on C:
>
>     c:\strawberry\perl\bin\perl.exe ObfuscateScript path/to/myfile.gnucash
>
> Note that it rewrites the file in place, so make a copy and run it on that. The file needs to be uncompressed.

Apart from the write in place I quite like it as an idea to progress thought.

Positive: it is in perl which (many|most) people may have a working version of if they are using F::Q

Negative: it doesn't reconcile well, but this may actually be a positive because ...

Positive: if the script breaks some splits this should be seen as a good thing by some, it makes the work of the super secret agents running gnc harder.

Thinking aloud: another way of normalizing would be to split to some point beyond usefulness and let gnc put it back together again using Actions / Check & Repair

===

Remember flox, the idea is a file that someone else (who probably didn't vote for the idiot Trump) could look at to see *your* problem. Does the remote person want to see you paid USD10 for a burger meal and some beer then vomited on the pavement and had to pay a fine for that? Nope. The remote person wants to see what the fuck you have put in your file that is screwing up the transaction stream.

-- Wm
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 03/02/2019 02:01, David Cousens wrote:
> As Geert pointed out whole of program testing is very difficult and rapidly reaches a situation where complexity is equal to or greater than the program complexity and this is really what gave rise to unit testing where you test individual components which do a specific function.

That can't fix a problem where an incorrect presumption was made in the first place.

> One area in which an example file rather than a test file might be useful is in developing the documentation. The guide section on Accounts Transaction following through to Personal Finances in essence constructs a simple file while doing the tutorial. Here though it is the process of constructing the data in the file that is useful. A completed example file is not of great use.

I'd advise against using any file as the right file for documentation purposes. There are just too many edge cases.

Something I think would be amusing rather than instructive would be to put all of the example tx in the docs into one file. I doubt it would be useful to anyone other than an historian of finance programs but it would be fun to see what we ended up with. If someone is thinking of presenting a paper at a conference try it, mention me if you are feeling generous :)

> It is also likely that most problems which are likely to require this depth of investigation are unlikely to show up in a test file unless you can execute a series of entries in a scripted manner i.e. interact with the gui from a script and this is not possible with GnuCash at the moment AFAIK. The problem is usually somewhere in the process of getting to the results in the file and what is in the file is merely a symptom of the problem.

gnc is a transaction stream application. Each time you open a file it starts from 0 and does addition and subtraction. No more, no less.

On top of that we have pretty stuff, convenient ways of adding new transactions to the stream, convenient ways of reporting the results of the stream. Nevertheless, it is still just a program interpreting a stream of transactions. gnc is a convenience.

I don't see why I should have to give live data to people I don't know in person ... and I don't even have super secret stuff like tax havens or a Donald Trump blow job account or a religious belief. I just feel uncomfortable showing ordinary tx to people I don't know, it is that simple to me.

Q: Why does someone need to see *my* (or your) tx to fix a problem?
A: they don't

So, we are stuck.

-- Wm
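The "transaction stream" view above can be illustrated with a small sketch. It assumes a toy model in which each transaction is a list of (account, amount) splits; the account names and the helpers `replay`, `is_balanced` and `scale` are hypothetical and are not GnuCash code. It shows balances being rebuilt from zero by addition and subtraction, and that multiplying all splits of a transaction by one factor keeps that transaction balanced.

```python
from collections import defaultdict
from fractions import Fraction


def replay(stream):
    """Start every account at 0 and add each split in order."""
    balances = defaultdict(Fraction)
    for tx in stream:
        for account, amount in tx:
            balances[account] += amount
    return dict(balances)


def is_balanced(tx):
    """A double-entry transaction balances when its splits sum to zero."""
    return sum(amount for _, amount in tx) == 0


def scale(tx, factor):
    """Multiply every split of one transaction by the same factor."""
    return [(account, amount * factor) for account, amount in tx]


# Hypothetical two-transaction book.
stream = [
    [("Assets:Bank", Fraction(100)), ("Income:Salary", Fraction(-100))],
    [("Expenses:Food", Fraction(10)), ("Assets:Bank", Fraction(-10))],
]

# One arbitrary factor per transaction hides the real amounts.
factors = [Fraction(3, 2), Fraction(7, 4)]
munged = [scale(tx, f) for tx, f in zip(stream, factors)]
```

The munged stream still replays cleanly, but the account balances no longer correspond to the real ones, which is the intent of normalizing.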
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On Saturday 2 February 2019 22:36:18 CET, Wm via gnucash-devel wrote:
> On 02/02/2019 15:24, Geert Janssens wrote:
> > As for Colin's question: on Windows and MacOS sqlite is supported out of
> > the box. On linux it may require the additional installation of a libdbi
> > driver. Most distros I know have packages for this driver but they may
> > not be installed by default.
>
> It would be an odd distro that excluded SQLite, it is a requisite for a
> lot of other stuff like browsers. Thinking aloud: maybe a server only
> install might not have it or someone stupid enough to put their data on
> Amazon might not have it available. The question then becomes, why was
> the person so stupid?

Well, I do understand sqlite is available by default, but gnucash requires libdbi with the sqlite backend (which in turn indeed uses sqlite). I haven't checked whether all supported distros also have that combination installed by default. I don't know if web browsers also use libdbi. I know firefox does not. And I haven't and won't spend time to check this for all those distros.

However I do agree this should only be a small hurdle. And I understand your script is an optional aid for those people that would want a better privacy guarantee before sending their data in for analysis.

Geert
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Op zaterdag 2 februari 2019 22:36:18 CET schreef Wm via gnucash-devel: > On 02/02/2019 15:24, Geert Janssens wrote: > > Yes, if you use business features, you may have entered business > > identifying data in File->Properties. It think that's what David is > > referring to. > I agree, the third party should not be identified. > > > Similarly there may be customer and vendor data (names addresses) in the > > book that should equally be obfuscated. Just random data is fine. > > Yes. > > Geert, at the moment I am putting guid in place of random, do you think > that is a wrong way to approach this? > I think GUIDs are probably fine as well. Note I'm going by the theoretical goal of not being able to reconstruct the user's real financial data from the obfuscated file. Personally I'm not interested in doing that at all, but people's paranoia levels may vary. So talking of guids. If I remember correctly the default guids for accounts coming from gnucash account templates are hard-coded (or at least they used to be until somewhere in the 2.6 series. So if that is still true then guid for account names is only fake obfuscation. And perhaps these guids should be replaced throughout the book during the obfuscation before replacing account names with guids > Actually, the nearer we get to complete random the less useful the file > becomes. Actual random data is harder than most people think and pretty > much defeats the purpose if you think about it. > >From a human's point of view a guid is just random numbers. So I don't see how that makes a difference. If the same random value is used where the data was the same in the original book, it's just like using a guid. And I'm no talking of numbers for this part, I'm talking about customer names, vendor addresses, that kind of stuff. > > Continuing on that vein, if you have bills and invoices, aside from > > randomizing the transaction's split amounts and values you'll also have to > > do the same for invoice entries. 
> > I don't think that is true in most situations and even if what you say > is true, I don't see it as a good argument against *attempting* a > normalized book for most people. > It's true if the bug to investigate is somewhere in the business code. In that case what your invoice data says should match what the resulting transactions say. Those are stored in different parts in the book, but are interrelated. But even if the bug is not in business data, the business data should be properly anonymized or removed anyway such that the user can confidently share it without risking real financial or private info can be extracted from it. Of course in that context the business data no longer has to be consistent though I still believe it makes debugging harder if it isn't. > > And to make the book useful for detecting > > business data bugs this should happen in such a way that invoice tax and > > discount amounts remain consistent after multiplying with random numbers > > *and* that the invoice totals continue to match the business transactions > > amounts in AR/AP accounts. > > There will be situations that involve the person doing the triage > needing to see actual transactions, I have already commented on that. > Sure. However that's not what I'm implying here. The extra business requirements are an extension of your initial concept that transactions should continue to balance. From a business data point of view invoices with their entries should continue to balance with their invoice transactions or the data quickly becomes meaningless. > > And to make that one level more complicated, after that the payment > > transactions *also* have to continue to match the new randomized invoice > > amount (if the invoice was paid in full). > > U, I don't think that is true. If the munged numbers match (and > they will, that is what the script will do) the transaction stream will > be OK. 
> > It is possible I have missed your point, Geert, but I think it is > looking like I understand the contents of the gnc files better than you :( > You did miss the point. You only think of balancing transactions. I'm also thinking of balancing lots, a more hidden aspect of the business data that's crucial to debug payment issues. My next reservation was also about consistent lots. > > It doesn't end there, payments can be split over multiple invoices, so > > again when one randomizes invoice amounts care must be taken to adjust > > the payments in proportion to the invoice amount change or fully paid > > invoices suddenly can become partially paid or overpaid. > > Not true. > > Geert, I don't want to say this but I believe you are actually wrong, > for once. It would be more useful to explain why you think that. > > > While this is probably all possible I believe the resulting script will be > > so complex that it will become a source of bugs in itself which would > > divert developer time to debugging and maintaining this script
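Geert's remark that using "the same random value ... where the data was the same in the original book" behaves just like a guid can be sketched as a consistent replacement table. This is only an illustration: the `Pseudonymizer` class and the sample names are hypothetical, and nothing here reads or writes a GnuCash file. The same table could in principle also be used to swap out existing GUIDs, which is what the concern about hard-coded template GUIDs would call for.

```python
import uuid


class Pseudonymizer:
    """Replace each distinct value with one random token, consistently."""

    def __init__(self):
        self._seen = {}

    def replace(self, value):
        # The first sighting draws a fresh random token; later sightings
        # of the same value reuse it, so relationships in the book survive.
        if value not in self._seen:
            self._seen[value] = uuid.uuid4().hex
        return self._seen[value]


# Hypothetical customer names, one of them repeated.
names = ["Acme Ltd", "Bob's Bakery", "Acme Ltd"]
pseudonymizer = Pseudonymizer()
obfuscated = [pseudonymizer.replace(name) for name in names]
```

Because the tokens are drawn at random rather than derived from the input, the original names cannot be recomputed from the output; the mapping table itself must of course not be shipped with the file.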
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
> On Feb 2, 2019, at 8:10 PM, David Carlson wrote:
>
> OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am
> not a computer programmer. I have no clue how to use it. Can someone help
> me?

Run it from a command line using perl, assuming here that you have Strawberry installed on C:

    c:\strawberry\perl\bin\perl.exe ObfuscateScript path/to/myfile.gnucash

Note that it rewrites the file in place, so make a copy and run it on that. The file needs to be uncompressed.

Regards,
John Ralls
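For anyone unsure how to do the "make a copy" and "uncompressed" steps, here is a minimal sketch in Python rather than Perl. It assumes the data file is either plain XML or gzip-compressed XML (GnuCash can write its XML file gzip-compressed); the function name `make_uncompressed_copy` and the file names in the usage note are made up for illustration. It never touches the original file.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def make_uncompressed_copy(source, target):
    """Write an uncompressed copy of `source` to `target`; leave `source` alone."""
    source, target = Path(source), Path(target)
    with source.open("rb") as handle:
        compressed = handle.read(2) == GZIP_MAGIC
    if compressed:
        # Decompress while copying.
        with gzip.open(source, "rb") as src, target.open("wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        # Already plain; a straight copy is enough.
        shutil.copyfile(source, target)
    return target
```

Usage would be along the lines of `make_uncompressed_copy("myfile.gnucash", "myfile-copy.gnucash")`, after which the obfuscation script is pointed at the copy.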
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
OK, I want to try https://wiki.gnucash.org/wiki/ObfuscateScript but I am not a computer programmer. I have no clue how to use it. Can someone help me?

David C
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Steve,

As Geert pointed out, whole-of-program testing is very difficult and rapidly reaches a situation where the complexity of the tests is equal to or greater than the program complexity. This is really what gave rise to unit testing, where you test individual components which each do a specific function.

One area in which an example file rather than a test file might be useful is in developing the documentation. The guide section on Accounts Transaction following through to Personal Finances in essence constructs a simple file while doing the tutorial. Here though it is the process of constructing the data in the file that is useful. A completed example file is not of great use.

It is also likely that most problems which require this depth of investigation are unlikely to show up in a test file unless you can execute a series of entries in a scripted manner, i.e. interact with the GUI from a script, and this is not possible with GnuCash at the moment AFAIK. The problem is usually somewhere in the process of getting to the results in the file, and what is in the file is merely a symptom of the problem.

David

- David Cousens
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Hello Wm

On 01.02.19 at 14:36, Wm via gnucash-devel wrote:
>
> My suggestion is we ask people to save a *copy* of their data in SQLite
> and they then run a script across that copy that munges and obfuscates
>

Did you see https://wiki.gnucash.org/wiki/ObfuscateScript ? It is targeting xml files and was uploaded in 2010. So it might be slightly bit-rotten.

Regards
Frank
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Wm,

> On 02/02/2019 15:24, Geert Janssens wrote:
>> It doesn't end there, payments can be split over multiple invoices, so
>> again when one randomizes invoice amounts care must be taken to adjust
>> the payments in proportion to the invoice amount change or fully paid
>> invoices suddenly can become partially paid or overpaid.
>
> Not true.
>
> Geert, I don't want to say this but I believe you are actually wrong,
> for once.

In what way is what Geert says here not true? Payments can be split over multiple invoices. A single invoice could also have several payments associated with it. These sorts of situations arise frequently in small businesses where you may need to micromanage your cash flow. If, in the randomisation process, you do not apply the same random factor to all the invoices covered by that payment, then what he says is exactly what will happen. This means your script will have to detect all of the invoices related to a payment. OK, it can be dealt with, but again the script complexity is increased considerably to do so.

> Most people don't use the business functions

I don't since I retired a few years ago, but I did for 8 years prior to retiring (and I used MYOB for the 10 years prior to that before escaping). I am certainly not alone. You could have a proviso that the script won't work for files using the business functions but that then detracts considerably from its usefulness as a general diagnostic tool.

Sqlite itself and its availability on Linux is not really an issue. Most distros have it in their software repositories. What may be more of an issue is that a lot of people who don't use the database backends because they don't want the additional hassles of learning to use and maintain databases may be reluctant to install it. It's not that it is all that difficult if you're familiar with it, but if you are not, it is an additional hurdle and learning curve.

I'm retired. Taking an extra half day to learn something new doesn't worry me as long as it happens before my time is up. But if I am running a busy life and/or a business as I used to, I would be more reluctant. Again not a show stopper, only a limitation on general applicability.

David Cousens

- David Cousens
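The arithmetic behind the "same random factor" point can be checked with a tiny worked example. The invoice identifiers, amounts and the helpers `munge` and `outstanding` are all made up for illustration; this is a sketch of the argument, not of how GnuCash stores invoices, payments or lots.

```python
from fractions import Fraction

# Hypothetical data: one payment of 150 settles two invoices in full.
invoices = {"INV-1": Fraction(100), "INV-2": Fraction(50)}
payment = Fraction(150)


def munge(invoices, payment, invoice_factors, payment_factor):
    """Scale each invoice by its own factor and the payment by another."""
    scaled = {key: amount * invoice_factors[key] for key, amount in invoices.items()}
    return scaled, payment * payment_factor


def outstanding(invoices, payment):
    """Amount still owed across the invoices covered by the payment."""
    return sum(invoices.values()) - payment


# Case 1: a single shared factor for everything linked to the payment.
shared = Fraction(3, 2)
same_invoices, same_payment = munge(
    invoices, payment, {"INV-1": shared, "INV-2": shared}, shared
)

# Case 2: independent factors per invoice and for the payment.
mixed_invoices, mixed_payment = munge(
    invoices, payment, {"INV-1": Fraction(2), "INV-2": Fraction(3)}, Fraction(3, 2)
)
```

With the shared factor the invoices stay fully paid (nothing outstanding). With independent factors the invoices total 350 against a payment of 225, so a fully paid pair turns into a partially paid one, which is the effect being described.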
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 15:24, Geert Janssens wrote: Yes, if you use business features, you may have entered business identifying data in File->Properties. It think that's what David is referring to. I agree, the third party should not be identified. Similarly there may be customer and vendor data (names addresses) in the book that should equally be obfuscated. Just random data is fine. Yes. Geert, at the moment I am putting guid in place of random, do you think that is a wrong way to approach this? Actually, the nearer we get to complete random the less useful the file becomes. Actual random data is harder than most people think and pretty much defeats the purpose if you think about it. Continuing on that vein, if you have bills and invoices, aside from randomizing the transaction's split amounts and values you'll also have to do the same for invoice entries. I don't think that is true in most situations and even if what you say is true, I don't see it as a good argument against *attempting* a normalized book for most people. And to make the book useful for detecting business data bugs this should happen in such a way that invoice tax and discount amounts remain consistent after multiplying with random numbers *and* that the invoice totals continue to match the business transactions amounts in AR/AP accounts. There will be situations that involve the person doing the triage needing to see actual transactions, I have already commented on that. And to make that one level more complicated, after that the payment transactions *also* have to continue to match the new randomized invoice amount (if the invoice was paid in full). U, I don't think that is true. If the munged numbers match (and they will, that is what the script will do) the transaction stream will be OK. 
It is possible I have missed your point, Geert, but I think it is looking like I understand the contents of the gnc files better than you :( It doesn't end there, payments can be split over multiple invoices, so again when one randomizes invoice amounts care must be taken to adjust the payments in proportion to the invoice amount change or fully paid invoices suddenly can become partially paid or overpaid. Not true. Geert, I don't want to say this but I believe you are actually wrong, for once. While this is probably all possible I believe the resulting script will be so complex that it will become a source of bugs in itself which would divert developer time to debugging and maintaining this script rather than working on the effectively reported bug for which a sample data file was asked in the first place... H, I accept your point and disagree. Up until a book with only transactions, no business data at all it sounded like a useful tool. Be a brave man, Geert, most people don't use the business functions :) Oh and we haven't mentioned SXs and budgets yet... Unless they are material to the file being investigated I suggest we just delete all SXs and budget stuff. As for Colin's question: on Windows and MacOS sqlite is supported out of the box. On linux it may require the additional installation of a libdbi driver. Most distros I know have packages for this driver but they may not be installed by default. It would be an odd distro that excluded SQLite, it is a requisite for a lot of other stuff like browsers. Thinking aloud: maybe a server only install might not have it or someone stupid enough to put their data on Amazon might not have it available. The question then becomes, why was the person so stupid? As far as I am concerned this conversation is ongoing, if only because Geert says he still needs a file from me to replicate a basic problem that I don't think needs any data from me at all. 
-- Wm ___ gnucash-devel mailing list gnucash-devel@gnucash.org https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 16:11, Geert Janssens wrote:
> But I don't know how feasible it is to effectively obfuscate that data without resorting to a complex script

The script will be seen by others that do understand sql before anyone innocent gets to use it, promise.

If the script is well documented (I don't see the point of obfuscated sql when we are doing something like this as time is not the major issue, getting the problem fixed is) then people that can read will use it.

Further, most of the actual gnc code is so fucking obfuscated it is acknowledged only a handful of people can read it, so do you really want to raise the issue of obfuscation, Geert?

Seriously, people that don't know how code works are already trusting their financial data to code they have no clue about. Why is my suggestion going to increase or decrease trust or increase or decrease complexity? Gr.

> that may introduce its own set of bugs

My script cannot introduce a bug, we are normalizing data <-- read that again, please.

> or inadvertently also obfuscate the actual issue.

That is a possibility. I consider this a positive not a negative from a triage POV.

The user says: "oops, my problem doesn't exist after I ran the normalizing script" <-- is this good or bad?

If the script is well documented the user can edit it and run it again, possibly solving the problem themselves.

> The latter is quickly tested, the former is a time waster.

This is a very good point and I repeat, this is not suggested as compulsory, this is intended to make things easier not harder for people that do want to report things that may be specific to them without exposing irrelevant details they may consider private or personal.

-- Wm
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 15:40, David Carlson wrote:
> Wouldn't it be simpler to create a library of template files designed to exercise various features that a user could find one to illustrate his concern?

To some extent this is already done in the build process. Life always throws up something unexpected. Further, users are by definition lazy and want the devs to look at *their* data rather than being expected to trawl through a set of files containing data not relevant to their real life situation in the hope that one of them shows the fault that, by definition, shouldn't have existed in the first place. See the circular bit?

> This would bypass the need to figure out how to sanitize every possible user file.

Sanitizing isn't that hard and we don't actually need perfection, just sufficient so that people are confident that the devs aren't snooping on them.

> If the user wants, he could still build his own example file as some users do now.

The problem is that some people build files that don't work for everyone; it does say "normalizing" in the Subject line, none of this is ever going to be compulsory.

-- Wm
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 09:59, Colin Law wrote:
> Can all users save files as sqlite? Does that need anything extra installed on the OS side that may not be there? Also what about different builds of GC, do they all have sqlite?

I'm fairly sure all of the official builds can save SQLite. If someone is rolling their own on a platform without the sqlite libraries then I think it would be unusual for them not to also have access to gnc on one of the production platforms, the whole idea being that the data should be easily transferable.

Even if someone didn't have SQLite my suggestion isn't taking something away from them. If someone can't save an SQLite file and run a script, the existing options are still there.

-- Wm
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On Saturday 2 February 2019 16:40:34 CET, David Carlson wrote:
> Wouldn't it be simpler to create a library of template files designed to
> exercise various features that a user could find one to illustrate his
> concern?
>
> This would bypass the need to figure out how to sanitize every possible user
> file.
>
> If the user wants, he could still build his own example file as some users
> do now.

Both approaches have benefits and drawbacks. The number of possible ways something can go wrong in gnucash is near infinite. Sometimes the problems only appear purely due to the amount of data, sometimes they come from migration issues (migration from older gnucash versions,...). It would be equally hard to come up with a set of template files that would cover all of those.

From that point of view the idea to be able to look at the user's own data file is attractive as that is known to illustrate the problem. But I don't know how feasible it is to effectively obfuscate that data without resorting to a complex script that may introduce its own set of bugs or inadvertently also obfuscate the actual issue. The latter is quickly tested, the former is a time waster.

Geert
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On Sat, Feb 2, 2019, 9:25 AM Geert Janssens Op zaterdag 2 februari 2019 10:19:02 CET schreef Wm via gnucash-devel: > > On 02/02/2019 00:16, David Cousens wrote: > > > As well as the account names you might also want to munge data in the > > > description/memo fields. This can contain identifying information for > > > customers/vendors. > > > > How about we just zap the stuff in description/memo fields by default? > > They're not mathematically significant and rarely cause double entry > > problems unless someone introduces unusual UI stuff in which case they > > should be able to provide an example. > > > > > Also possible any data relating to the owner of the file > > > which is stored in the file/database. > > > > Does your file/database have an obvious owner? Mine doesn't apart from > > the name of the file which is the first and obvious thing to change > > before you send it off for someone else to look at. > > > > If you mean bits of text in reports they wouldn't be included in an > > SQLite file. > > > > If you mean bits of text in outbound documents I think we've already > > zapped them. > > > > Have I missed your point? > > > > Yes, if you use business features, you may have entered business > identifying > data in File->Properties. It think that's what David is referring to. > Similarly there may be customer and vendor data (names addresses) in the > book > that should equally be obfuscated. Just random data is fine. > > Continuing on that vein, if you have bills and invoices, aside from > randomizing the transaction's split amounts and values you'll also have to > do > the same for invoice entries. And to make the book useful for detecting > business data bugs this should happen in such a way that invoice tax and > discount amounts remain consistent after multiplying with random numbers > *and* > that the invoice totals continue to match the business transactions > amounts in > AR/AP accounts. 
> > And to make that one level more complicated, after that the payment > transactions *also* have to continue to match the new randomized invoice > amount (if the invoice was paid in full). > > It doesn't end there, payments can be split over multiple invoices, so > again > when one randomizes invoice amounts care must be taken to adjust the > payments > in proportion to the invoice amount change or fully paid invoices suddenly > can > become partially paid or overpaid. > > While this is probably all possible I believe the resulting script will be > so > complex that it will become a source of bugs in itself which would divert > developer time to debugging and maintaining this script rather than > working on > the effectively reported bug for which a sample data file was asked in the > first place... > > Up until a book with only transactions, no business data at all it sounded > like a useful tool. > > Oh and we haven't mentioned SXs and budgets yet... > > As for Colin's question: on Windows and MacOS sqlite is supported out of > the > box. On linux it may require the additional installation of a libdbi > driver. > Most distros I know have packages for this driver but they may not be > installed by default. > > Geert > > > ___ > gnucash-devel mailing list > gnucash-devel@gnucash.org > https://lists.gnucash.org/mailman/listinfo/gnucash-devel Wouldn't it be simpler to create a library of template files designed to exercise various features that a user could find one to illustrate his concern? Thiswould bypass the need to figure out how to sanitize every possible user file. If the user wants, he could still build his own example file as some users do now. David Carlson > > ___ gnucash-devel mailing list gnucash-devel@gnucash.org https://lists.gnucash.org/mailman/listinfo/gnucash-devel
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On Saturday 2 February 2019 10:19:02 CET, Wm via gnucash-devel wrote:
> On 02/02/2019 00:16, David Cousens wrote:
> > As well as the account names you might also want to munge data in the
> > description/memo fields. This can contain identifying information for
> > customers/vendors.
>
> How about we just zap the stuff in description/memo fields by default?
> They're not mathematically significant and rarely cause double entry
> problems unless someone introduces unusual UI stuff in which case they
> should be able to provide an example.
>
> > Also possibly any data relating to the owner of the file
> > which is stored in the file/database.
>
> Does your file/database have an obvious owner? Mine doesn't apart from
> the name of the file which is the first and obvious thing to change
> before you send it off for someone else to look at.
>
> If you mean bits of text in reports they wouldn't be included in an
> SQLite file.
>
> If you mean bits of text in outbound documents I think we've already
> zapped them.
>
> Have I missed your point?

Yes, if you use business features, you may have entered business identifying data in File->Properties. I think that's what David is referring to. Similarly there may be customer and vendor data (names, addresses) in the book that should equally be obfuscated. Just random data is fine.

Continuing in that vein, if you have bills and invoices, aside from randomizing the transaction's split amounts and values you'll also have to do the same for invoice entries. And to make the book useful for detecting business data bugs this should happen in such a way that invoice tax and discount amounts remain consistent after multiplying with random numbers *and* that the invoice totals continue to match the business transaction amounts in AR/AP accounts.

And to make that one level more complicated, after that the payment transactions *also* have to continue to match the new randomized invoice amount (if the invoice was paid in full).
It doesn't end there: payments can be split over multiple invoices, so again when one randomizes invoice amounts care must be taken to adjust the payments in proportion to the invoice amount change, or fully paid invoices can suddenly become partially paid or overpaid.

While this is probably all possible, I believe the resulting script will be so complex that it will become a source of bugs in itself, which would divert developer time to debugging and maintaining this script rather than working on the effectively reported bug for which a sample data file was asked in the first place...

Up until a book with only transactions, no business data at all, it sounded like a useful tool.

Oh and we haven't mentioned SXs and budgets yet...

As for Colin's question: on Windows and MacOS sqlite is supported out of the box. On Linux it may require the additional installation of a libdbi driver. Most distros I know have packages for this driver but they may not be installed by default.

Geert
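The consistency constraints Geert lists can be illustrated with a small sketch. This is not a GnuCash script: the invoice and payment structures are hypothetical stand-ins, and the assumption is that every linked amount is multiplied by one whole-number factor. Under that assumption the linear relationships (entries plus tax equal the invoice total, payment splits equal the totals they settle) survive, while scaling invoices independently of their payments turns a fully paid invoice into a partially paid one.

```python
from fractions import Fraction

# Hypothetical toy data: two invoices with a 20% tax rate, and one payment
# split across both so that each invoice is paid in full.
INVOICES = {
    "INV-1": {"entries": [Fraction(100), Fraction(50)], "tax_rate": Fraction(1, 5)},
    "INV-2": {"entries": [Fraction(80)], "tax_rate": Fraction(1, 5)},
}
PAYMENT_SPLITS = {"INV-1": Fraction(180), "INV-2": Fraction(96)}


def invoice_total(invoice):
    """Net entries plus tax, kept as exact fractions."""
    net = sum(invoice["entries"], Fraction(0))
    return net + net * invoice["tax_rate"]


def scale_invoices(invoices, factors):
    """Multiply each invoice's entries by its own factor; tax rates are untouched."""
    return {
        name: {
            "entries": [entry * factors[name] for entry in inv["entries"]],
            "tax_rate": inv["tax_rate"],
        }
        for name, inv in invoices.items()
    }


def scale_payments(payment_splits, factor):
    """Multiply every payment split by the same factor."""
    return {name: amount * factor for name, amount in payment_splits.items()}


def all_paid_in_full(invoices, payment_splits):
    """True when every invoice total equals the payment split applied to it."""
    return all(
        invoice_total(inv) == payment_splits[name] for name, inv in invoices.items()
    )


if __name__ == "__main__":
    payments = scale_payments(PAYMENT_SPLITS, 7)
    same = scale_invoices(INVOICES, {"INV-1": 7, "INV-2": 7})
    mixed = scale_invoices(INVOICES, {"INV-1": 3, "INV-2": 5})
    print("one factor everywhere:", all_paid_in_full(same, payments))
    print("independent factors:  ", all_paid_in_full(mixed, payments))
```

Rounding tax to the currency fraction, lots, SXs and budgets are deliberately left out, so the sketch says nothing about how complex a script for a real book would be.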
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Can all users save files as SQLite? Does that need anything extra installed on the OS side that may not be there? Also, what about different builds of GnuCash: do they all have SQLite?

Colin
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 02/02/2019 00:16, David Cousens wrote:
> As well as the account names you might also want to munge data in the
> description/memo fields. This can contain identifying information for
> customers/vendors.

How about we just zap the stuff in description/memo fields by default? They're not mathematically significant and rarely cause double entry problems unless someone introduces unusual UI stuff in which case they should be able to provide an example.

> Also possibly any data relating to the owner of the file
> which is stored in the file/database.

Does your file/database have an obvious owner? Mine doesn't apart from the name of the file which is the first and obvious thing to change before you send it off for someone else to look at.

If you mean bits of text in reports they wouldn't be included in an SQLite file.

If you mean bits of text in outbound documents I think we've already zapped them.

Have I missed your point? Always possible, don't be put off by my rough and tumble impression of the idiot Trump, I do actually care.

> The combination of the above would probably be considered commercially
> sensitive information and at a personal level what banks/service
> companies etc you deal with might be a possible problem if it is in the
> public domain.

Ummm, that isn't really our problem, David. If you subscribe to the "I'm an American and the government supports me" foolishness I'm wondering why the fuck any of you voted for the imbecile in charge at the moment!

Any banking account details have already been removed.

Next?

-- 
Wm
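The "zap by default" idea, together with the earlier suggestion of copying each account's guid over its visible name, might look roughly like the sketch below. It runs against a throwaway in-memory database; the table and column names (accounts.name, accounts.description, transactions.description, splits.memo) are assumptions modelled on a GnuCash SQLite file and should be checked against a real copy before anything like this is run on one.

```python
import sqlite3


def build_toy_book() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a book saved as SQLite.

    The schema is a simplified assumption, not the full GnuCash schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (guid TEXT PRIMARY KEY, name TEXT, description TEXT);
        CREATE TABLE transactions (guid TEXT PRIMARY KEY, description TEXT);
        CREATE TABLE splits (
            guid TEXT PRIMARY KEY, tx_guid TEXT, account_guid TEXT, memo TEXT
        );
        INSERT INTO accounts VALUES
            ('a1', 'Checking at Some Bank', 'main account'),
            ('a2', 'Consulting Income', '');
        INSERT INTO transactions VALUES ('t1', 'Invoice 42, ACME Pty Ltd');
        INSERT INTO splits VALUES
            ('s1', 't1', 'a1', 'paid by cheque'),
            ('s2', 't1', 'a2', '');
        """
    )
    return conn


def zap_text(conn: sqlite3.Connection) -> None:
    """Blank free-text fields and replace account names with their guids."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE accounts SET name = guid, description = ''")
        conn.execute("UPDATE transactions SET description = ''")
        conn.execute("UPDATE splits SET memo = ''")


if __name__ == "__main__":
    book = build_toy_book()
    zap_text(book)
    for row in book.execute("SELECT guid, name, description FROM accounts"):
        print(row)
```

Business data entered under File->Properties, customers and vendors are not touched here; they would need the same treatment in whichever tables hold them.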
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
Wm,

As well as the account names you might also want to munge data in the description/memo fields. This can contain identifying information for customers/vendors. Also possibly any data relating to the owner of the file which is stored in the file/database.

The combination of the above would probably be considered commercially sensitive information and at a personal level what banks/service companies etc you deal with might be a possible problem if it is in the public domain.

David Cousens
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 01/02/2019 13:36, Wm via gnucash-devel wrote:

would someone other than idiot Stephen M Butler attempt a reply please

TIA
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 01/02/2019 19:17, Stephen M. Butler wrote:

Ummm, Stephen M. Butler I don't think you were my intended audience. Let me put you down gently.

> It might be better to have a standardized test file that folks could
> download, and run their scenario against.

Nope, we can do that already, I was addressing other realistic situations.

> However, there are situations that arise where the only solution is to
> look at the original file. In that case some obfuscation would be
> helpful. I would think that memos and descriptions would also need to be
> randomized.

My suggestion is they are zapped, no personal stuff at all

> After a careful read, I realized you did intend to randomize the
> transaction amounts (which would have to be careful to ensure the DR/CR
> remained balanced).

I'm one of the more intelligent people here, the tx will remain balanced.

> Otherwise, one could at least get the total
> Assets/Liabilities/Income/Expense values known for the submitter. That
> may be sensitive information. I know that I've shared some information
> that later reflection was "did I really give them that!"

Um

> Now, to the XML vs SQLite argument. Whatever script is applied to one
> could easily have a counterpart that would apply to the other. You
> wouldn't have to manually (informally) edit the XML. A known script
> should provide a known outcome.

Not true in reverse if someone throws in some numbers no other person knows about. Think about diminishing returns.

I can't correct this fucked up quote below, must be a Mexican border issue, sigh. Looks like a Trump voter, fucked quotient in general.

> I suspect that many folks are using an XML back-end and would rather not
> fiddle with a database back-end.

We know that, we ask for a specific db when we need to test stuff.

I've given up correcting the quoting, sorry, folks.

> I'm in that camp even though I'm a trained Oracle DBA and spent a couple
> decades using that back-end professionally.

We are unimpressed unless you contribute.
Some of us also think training may have been wasted time if you end up not knowing much about databases.

> I think the first step is having a standard test file that a user could
> apply to their favorite back-end, run their scenario, check the results.

Wrong, please read what I said before. G. I hate it when someone so obviously doesn't read.

> If the problem is verified, then we have pretty good evidence the problem
> is in the application. If the problem doesn't show up, then it indicates
> the problem may be in the data. That would require a "data forensic
> expert" (aka developer or some assistant) to look deeper into the user's
> data file. In that case a good obfuscation tool would come in handy.

I'd say something obviously rude around now but Liz would zap me instead of the fool if past rules are anything to go by :(

I'd like someone with a clue to attempt an answer.

-- 
Wm
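The claim that transactions stay balanced when amounts are munged can be checked with a short sketch. It assumes split values are stored as integer numerator/denominator pairs (the column names value_num and value_denom are modelled on a GnuCash SQLite file and are an assumption here), and it multiplies every numerator by one whole-number factor so that denominators never change. The toy table and the helper names are hypothetical.

```python
import secrets
import sqlite3
from fractions import Fraction


def build_toy_splits() -> sqlite3.Connection:
    """Two balanced transactions in a simplified, assumed splits table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE splits (
            guid TEXT PRIMARY KEY, tx_guid TEXT,
            value_num INTEGER, value_denom INTEGER
        );
        INSERT INTO splits VALUES
            ('s1', 't1',  12345, 100), ('s2', 't1', -12345, 100),
            ('s3', 't2',   5000, 100), ('s4', 't2',  -2000, 100),
            ('s5', 't2',  -3000, 100);
        """
    )
    return conn


def pick_factor() -> int:
    """A whole number from 3 to 99, so numerators stay integers."""
    return secrets.randbelow(97) + 3


def scale_values(conn: sqlite3.Connection, factor: int) -> None:
    """Multiply every split value by the same factor in one transaction."""
    with conn:
        conn.execute("UPDATE splits SET value_num = value_num * ?", (factor,))


def unbalanced_transactions(conn: sqlite3.Connection) -> list:
    """Return the tx_guids whose split values do not sum to exactly zero."""
    totals = {}
    rows = conn.execute("SELECT tx_guid, value_num, value_denom FROM splits")
    for tx_guid, num, denom in rows:
        totals[tx_guid] = totals.get(tx_guid, Fraction(0)) + Fraction(num, denom)
    return sorted(tx for tx, total in totals.items() if total != 0)


if __name__ == "__main__":
    book = build_toy_splits()
    scale_values(book, pick_factor())
    print("unbalanced after scaling:", unbalanced_transactions(book))
```

One caveat on the design: a single factor preserves every ratio between amounts, so anyone who knows one original figure can recover the rest. Quantities, prices and the business tables would need matching treatment and are not covered by this sketch.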
Re: [GNC-dev] Normalizing live data, a suggestion for discussion
On 2/1/19 5:36 AM, Wm via gnucash-devel wrote:
> Situation: someone reports a problem with gnc, at triage it becomes
> clear some data is going to be required to identify or solve the
> problem. Normal question? Can you give us a file.
>
> Problem: for any number of reasons ranging from plain old personal
> privacy through to people that live in supposed liberal societies
> avoiding tax and people in supposed conservative societies avoiding
> persecution, sending live data isn't always appropriate. The USA has
> become very weird about this and most of our development people are in
> the USA so hopefully they'll understand the politics of privacy,
> eventually.
>
> Suggestion: we try to make providing a file easier for people.
>
> My suggestion is we ask people to save a *copy* of their data in
> SQLite and they then run a script across that copy that munges and
> obfuscates
>
> 1. account names [1]
>
> 2. numbers [2]
>
> [1] people following this will probably be aware that gnc doesn't know
> about account names much beyond broad classes in spite of providing
> lots of names and not accommodating other accounting concepts such as
> the fact there is a level one up [3] My point here is that account
> names are important to people but not gnc so why not just randomize
> them? Obvious way? copy the actual account name (the guid) to the user
> visible one. this is a one way change unless someone has unusual
> settings on their SQLite file, if someone has those settings it seems
> reasonable to presume they also know how to turn them off and save the
> file again.
>
> [2] as long as the transaction stream balances the actual numbers
> don't matter (there will be occasions where the numbers are important
> but these tend to be number extremes related to commodities rather
> than anyone using gnc to do a Mr Putin vs Mr Trump sports bet).
> In most cases multiplying any matching numbers by the same semi-random
> number should produce a good file for examination so long as it is done
> consistently [4]
>
> [3] that is a long argument I am interested in conceptually rather
> than personally, it doesn't affect me as a UK person but makes me
> think Internationally.
>
> [4] I don't think a reductive discussion of true vs near true random
> [5] is appropriate, the significant point is the person viewing the
> data won't be able to work out the original number without significant
> effort and in most cases simply won't be able to work it out at all,
> we're talking computing assets I doubt anyone here has access to in
> order to get back *and* I believe the gnc people are actually
> motivated by solving problems, belief in the project and ordinary
> stuff like that so they won't even be looking.
>
> [5] Random is fun if only because there are so many ways of doing it.
>
> Questions: why SQLite rather than XML? Because if a person runs an
> agreed script across their file we can be sure of an outcome. Editing
> an XML file informally is scary, it immediately raises questions about
> consistency of data. Other SQL formats are not widely used, my
> proposal is we go for LCD where we can achieve normalization.
>
> Normalization will have to be balanced: privacy vs contribution to the
> project.
>
> I definitely want contribution from other people that work well with
> SQL, let's think about this together, people, I have written some
> scripts that confuse *my* data and I know that Geert is still waiting
> for me to send him a file.
>
> Geert is a good person, I just don't want to show him very personal
> stuff in my file.
>
> I have a plan for making showing a file easier, is anyone interested?
>
> This is the *start* of a conversation, I welcome thoughts.

It might be better to have a standardized test file that folks could download, and run their scenario against.
However, there are situations that arise where the only solution is to look at the original file. In that case some obfuscation would be helpful. I would think that memos and descriptions would also need to be randomized.

After a careful read, I realized you did intend to randomize the transaction amounts (which would have to be careful to ensure the DR/CR remained balanced). Otherwise, one could at least get the total Assets/Liabilities/Income/Expense values known for the submitter. That may be sensitive information. I know that I've shared some information that later reflection was "did I really give them that!"

Now, to the XML vs SQLite argument. Whatever script is applied to one could easily have a counterpart that would apply to the other. You wouldn't have to manually (informally) edit the XML. A known script should provide a known outcome.

I suspect that many folks are using an XML back-end and would rather not fiddle with a database back-end. I'm in that camp even though I'm a trained Oracle DBA and spent a couple decades using that back-end professionally.

I think the first step is having a standard test file that a user could apply