Re: [GNC-dev] avoid the brain dead import
Dear Wm,

On Thu, August 30, 2018 10:10 am, Wm via gnucash-devel wrote:
> On 29/08/2018 23:52, David Cousens wrote:
>
>> I think the decision about whether to import a small number of
>> transactions by hand is really one for the user and not the importer to
>> make. I would import small batches, maybe 20-30 to test the importer
>> function and ensure it was working as expected before attempting to
>> import 10k.
>
> You are missing the point entirely.
>
> The importer compares the tx being imported against *every* extant tx.

The importer compares against existing transactions to detect duplicates. This is done because there is absolutely no guarantee that the user won't import the same transaction multiple times. This can happen by accident (importing the same file multiple times), or it could happen because the data source provides the same data multiple times (e.g., some banks will provide overlapping downloads).

It has to search every existing transaction because there is no way in the underlying code not to do that. Theoretically you should only need to search through transactions within a relatively short time frame (say, +/- 2-3 weeks). However, there is no way to do this. Even when you create a QofSearch with a limited date range, it will *still* iterate through every existing transaction. Of course, if the date is not in range the transaction will get thrown out, but by that point the damage has been done.

This issue will only get fixed when we can move GnuCash to be a true DB app. Then the SQL code can truly limit the search space properly.

Hopefully this explains what's going on.

-derek

--
Derek Atkins                 617-623-3745
de...@ihtfp.com              www.ihtfp.com
Computer and Internet Security Consultant

___
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
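[Editor's note: the difference Derek describes between in-memory iteration and SQL-side filtering can be illustrated with a minimal sketch. The schema below is hypothetical and deliberately simplified; it is not GnuCash's actual table layout. The point is only that an indexed, date-bounded WHERE clause lets the database restrict the candidate set, instead of iterating every transaction and discarding the out-of-range ones after the fact.]

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical, simplified schema -- not GnuCash's real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (post_date TEXT, description TEXT, amount INTEGER)"
)
conn.execute("CREATE INDEX idx_post_date ON transactions (post_date)")

# 300 daily transactions starting 2018-01-01.
days = [(date(2018, 1, 1) + timedelta(days=i)).isoformat() for i in range(300)]
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(d, "payment", 100) for d in days],
)

# Duplicate check for an imported tx dated 2018-08-15: only consider
# +/- 21 days around it.  With the index, the engine never touches rows
# outside the window -- unlike an in-memory scan, which visits every
# transaction and throws the out-of-range ones away afterwards.
target = date(2018, 8, 15)
lo = (target - timedelta(days=21)).isoformat()
hi = (target + timedelta(days=21)).isoformat()
candidates = conn.execute(
    "SELECT post_date, description, amount FROM transactions"
    " WHERE post_date BETWEEN ? AND ?",
    (lo, hi),
).fetchall()
print(len(candidates))  # 43 candidate rows instead of all 300
```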
Re: [GNC-dev] avoid the brain dead import
On 29/08/2018 23:52, David Cousens wrote:

> I think the decision about whether to import a small number of
> transactions by hand is really one for the user and not the importer to
> make. I would import small batches, maybe 20-30 to test the importer
> function and ensure it was working as expected before attempting to
> import 10k.

You are missing the point entirely.

The importer compares the tx being imported against *every* extant tx.

Read that twice, please.
Re: [GNC-dev] avoid the brain dead import
William,

I have experienced the importer trying to match data out of the range of dates in the current import. From memory it only occurred when I first changed over to version 3.0. The matcher appeared to have lost all memory of which accounts to assign in the changeover from 2.6. However, I found that after importing 1-2 months of data it was functioning normally again. I have been using the OFX importer for 3-4 years without any significant problems.

Your point about large data files sounds valid. I haven't looked at the code for the match picker, so I don't know how it works, or whether it works on the historical data to extract the information it needs to choose an account to assign or data to match. As it is a Bayesian mechanism, at some point it has to examine the existing data and construct some sort of probability table, so my guess would be that this could be the step which is taking so long. Being able to set a preference for a date range or period to use in constructing the initial probability tables is probably a good idea if this is the case.

My experience on the changeover from 2.6 to 3.0, when it appeared to have lost any memory of previous import assignments, indicated that the importer was constructing those tables from the data it imports and not from the historical data, but I could be wrong. I would expect it to be using a Kalman filtering approach on the input data but can't be sure until I get a good look at the code. It did initially attempt to match transactions that were otherwise similar to transactions in the previous month or two. I only have data going back ~8 years, and have been retired for a large percentage of that, so my files aren't huge and I may not be hitting your problem, if it is the case that it does look further back.

I think the decision about whether to import a small number of transactions by hand is really one for the user and not the importer to make.
I would import small batches, maybe 20-30 to test the importer function and ensure it was working as expected before attempting to import 10k.

On Wed, 2018-08-29 at 22:00 +0100, Wm via gnucash-devel wrote:
> On 25/08/2018 07:22, David Cousens wrote:
>
> i thank David for his posting which i have read, I don't address all
> he said
>
> > Keep trying. The brain dead importer does get less brain dead with
> > repeated use.
>
> i'm not sure it does get better as implemented, because 2 of the bits of
> brain dead-ity are
>
> 1. the universe against which the importer is comparing imported tx is
> going to be growing, so as a strategy it is doomed to sluggishness and
> eventually not being used unless there is some limit to the universe
> (week / month / quarter / year / decade)
>
> 2. unless there is something better, users are going to try and use it
> and become more frustrated and stop using it.
>
> fairly easy to think about ways of fixing 1. like "do you want the
> importer to really, really, really compare the imported tx against your
> stuff from the 1980's? y/N" at the moment this is defaulting to Y
> without asking and I don't think that makes sense.
>
> I mean, think of inflation? Why would one of anything in 2018 be
> sensibly matched against the same thing 30 years ago?
>
> There isn't even the opportunity to time limit the universe, and some
> folk have stuff going back much longer than me and have many more tx
> than me.
>
> fixing 2. just involves some thought about the user, almost no
> programming. Redundant questions for the user would be, "you are
> importing 3 tx, you have 10K tx in your file, this could take fucking
> hours, do you want to continue or just type them in by hand? if you
> want my advice by hand is quicker"
>
> See? the importer has no idea of scale, 3 tx incoming? I'll do it by hand.
Re: [GNC-dev] avoid the brain dead import
On 25/08/2018 07:22, David Cousens wrote:

i thank David for his posting which i have read, I don't address all he said

> Keep trying. The brain dead importer does get less brain dead with
> repeated use.

i'm not sure it does get better as implemented, because 2 of the bits of brain dead-ity are

1. the universe against which the importer is comparing imported tx is going to be growing, so as a strategy it is doomed to sluggishness and eventually not being used unless there is some limit to the universe (week / month / quarter / year / decade)

2. unless there is something better, users are going to try and use it and become more frustrated and stop using it.

fairly easy to think about ways of fixing 1. like "do you want the importer to really, really, really compare the imported tx against your stuff from the 1980's? y/N" at the moment this is defaulting to Y without asking and I don't think that makes sense.

I mean, think of inflation? Why would one of anything in 2018 be sensibly matched against the same thing 30 years ago?

There isn't even the opportunity to time limit the universe, and some folk have stuff going back much longer than me and have many more tx than me.

fixing 2. just involves some thought about the user, almost no programming. Redundant questions for the user would be, "you are importing 3 tx, you have 10K tx in your file, this could take fucking hours, do you want to continue or just type them in by hand? if you want my advice by hand is quicker"

See? the importer has no idea of scale, 3 tx incoming? I'll do it by hand.
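[Editor's note: the time-limited "universe" Wm asks for is cheap if the existing transactions are kept sorted by date. A minimal sketch, under the assumption of a plain in-memory list (not GnuCash's actual data structures): binary search finds the +/- N-day window in O(log n), so transactions from the 1980s are never even examined.]

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta

def window_candidates(book, imported_date, days=21):
    """Return only the extant transactions within +/- `days` of the
    imported transaction's date.  `book` is a list of (date, description)
    tuples kept sorted by date.  Binary search locates the window in
    O(log n); everything outside it is never touched."""
    lo = bisect_left(book, (imported_date - timedelta(days=days),))
    # Pair the upper date with a maximal string so ties on the date
    # itself are still included.
    hi = bisect_right(book, (imported_date + timedelta(days=days), chr(0x10FFFF)))
    return book[lo:hi]

# Ten years of daily transactions...
book = [(date(2008, 1, 1) + timedelta(days=i), "tx") for i in range(3650)]
# ...but matching a late-2017 import only ever looks at a 4-week slice.
cands = window_candidates(book, date(2017, 12, 1), days=14)
print(len(cands))  # 29 candidates out of 3650 transactions
```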
Re: [GNC-dev] avoid the brain dead import
William,

I think the answer to your question lies in the fact that the files users wish to import don't come from a single source and don't always conform to any well defined standard with regard to both the data format and the information supplied. Importing OFX data is considerably more straightforward than importing CSV data for this reason, as it does conform to a reasonably well defined standard, but even then some institutions do manage to stuff it up.

In most cases users don't necessarily have any control over what another institution includes in the files they supply. Most include transactions between the To and From dates inclusively that you might enter when requesting a data download, but this is not guaranteed. Stupid? Yes, but the importer has to cope with stupid, as well as with nicely formatted and thought-out files. Not all data for a bank account includes the detail of which account you may want the second split of a transaction to go to, and even if it does it may not match your choices in setting up your chart of accounts. If it does, then GnuCash deals with that.

I am an accountant (retired) and I have imported the same file (or at least overlapping data in different files) on more than one occasion since I have been using GnuCash. The point of the matcher is to pick this up before you have entered the data into your accounts, rather than have to deal with the far more laborious task of working out which transactions were duplicated in an import and deleting them from your records one by one once they have been imported. If you get the date format wrong relative to your locale format on an import, it can be particularly difficult; swapping days and years produces some interesting results.

The matcher also has a Bayesian learning system which can allocate the transfer account for the second split on the basis of matching information in the description and other fields.
My experience has been that after I have imported one or two months' data, it will generally assign the transfer account for about 60% of the data in the succeeding months, handles regular payments and deposits pretty well, and gets better still after a few months. I import a few hundred transactions a month, generally in 5-10 minutes from OFX files, with no problem. CSV importing (e.g. PayPal) can be far more problematical, but the ability of the importer in v3.2 to save import settings is a great help.

There is a recent patch (Bug 796778) which might help you shorten the initial input before the matcher works efficiently, but it is not yet incorporated in the master branch. It implements multiple selection of rows in the matcher, e.g. from the same vendor, using Ctrl-click, Shift-click and the rubberbanding techniques implemented in GTK, and the assignment of those rows to a single transfer account. It speeds up the initial import of data quite a bit, but is less effective once the Bayesian matching is trained (which is possibly why it has not been implemented before now), as that tends to pick up repeated transactions fairly well. The downside is of course that there is always a transaction or two from the same vendor or customer which may have to go to a different transfer account, i.e. you still have to check that it has been correctly assigned by the matcher.

Keep trying. The brain dead importer does get less brain dead with repeated use.

David Cousens
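[Editor's note: the Bayesian transfer-account assignment David describes can be sketched as a toy naive-Bayes classifier over description tokens. This is a guess at the shape of the mechanism, not the actual GnuCash matcher code or its data layout: per-account token counts play the role of the "probability tables" discussed earlier in the thread.]

```python
from collections import defaultdict

# Toy naive-Bayes sketch of transfer-account assignment from transaction
# descriptions -- illustrative only, not GnuCash's implementation.
class TokenMatcher:
    def __init__(self):
        self.tok = defaultdict(lambda: defaultdict(int))  # account -> token -> count
        self.tot = defaultdict(int)                       # account -> total tokens
        self.docs = defaultdict(int)                      # account -> transactions seen
        self.vocab = set()                                # all tokens ever seen

    def train(self, description, account):
        """Record the confirmed account choice for one transaction."""
        self.docs[account] += 1
        for t in description.lower().split():
            self.tok[account][t] += 1
            self.tot[account] += 1
            self.vocab.add(t)

    def guess(self, description):
        """Return the account maximising P(account) * prod P(token|account)."""
        n = sum(self.docs.values())
        best, best_p = None, 0.0
        for acct in self.docs:
            p = self.docs[acct] / n                       # prior P(account)
            for t in description.lower().split():
                # Laplace-smoothed P(token | account)
                p *= (self.tok[acct][t] + 1) / (self.tot[acct] + len(self.vocab))
            if p > best_p:
                best, best_p = acct, p
        return best

m = TokenMatcher()
m.train("COLES SUPERMARKET 1234", "Expenses:Groceries")
m.train("COLES SUPERMARKET 5678", "Expenses:Groceries")
m.train("SHELL FUEL 0999", "Expenses:Fuel")
print(m.guess("COLES SUPERMARKET 4321"))  # -> Expenses:Groceries
```

This also shows why the matcher improves with use, as David says: each confirmed import adds counts to the tables, sharpening the per-account token probabilities.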