Re: [sqlite] Standard deviation last x entries
This is getting far away from SQLite now and moving to medical statistics. The thing is that the authors of this algorithm have taken a large amount of patient data and have calculated what the best predictors are to calculate the chance for the individual patient to have a heart attack or stroke in the next 10 years. For blood pressure they calculated that SD was the best one to get a figure for the variability of systolic blood pressure and they calculated that this SD was relevant. All explained in the article mentioned in the mentioned link: https://qrisk.org/three/ RBS On Sun, Oct 20, 2019 at 2:04 PM Richard Damon wrote: > In my experience max(x) - min(x) isn't a great measure for this sort of > thing, as it is ultimately dependent on outliers. Something like > 75%-tile to 25%-tile (or other similar values) might make more sense. > > Ultimately, taking a Standard Deviation (or Variance) of a set of > reading for a set of persons seems a bit unusual as it is mixing two > different statistics. One is that each person themselves have a mean and > Standard Deviation to their readings, and then you have the statistics > of the whole population, with things like the group mean (the mean of > the individual means), the variation of the means ( the Standard > Deviation of the Individual Means), the mean variation (the Mean of the > individual Standard Deviations) and even the variability of the > varibility (The Standard Deviation of the individual Standard deviations). > > Note that I would definitely treat the readings from an individual as a > sample, unless the patients were being CONTINUALLY monitored, as > periodic readings ARE samples (and presumably the patient is more > important than the actual readings). I might also be tempted to treat > your total group of patients as a sample, unless you think you are (and > have) treating everyone that falls in the category. > > On 10/20/19 8:31 AM, Gabor Grothendieck wrote: > > Another approach is to question whether you really need the > > standard deviation or if any measure of variation would do. > > > > You could use the range, max(x) - min(x) , and just report it as the > range > > rather than the standard deviation. Also range/4 approximates the > > standard deviation (google the range rule) > > although the approximation is not very good. > > > > > > > > On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck > > wrote: > >> There is a stdev function for sqlite here: > >> https://www.sqlite.org/contrib//download/extension-functions.c?get=25 > >> > >> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt > wrote: > >>> Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > Sorry, I forgot to tell that. It is date column with an integer > number. > > ID xValue xDate > > 1 130 40123 > 1 120 41232 > 1 140 40582 > 1 100 40888 > 1 110 42541 > 2 140 41225 > 2 130 41589 > 2 150 40872 > >>> Because SQLite on Android has at least CTEs available, > >>> (since Android 5 or so), I thought I make an attempt using them... > >>> > >>> For your above sample-set I've created a table Test this way > >>> (with slightly changed Column-Names + different IDs with "gaps" > >>> + a few more sample-records, to cover especially n=1 sets): > >>> > >>> Create Table Test(ID, xVal, xDat); > >>> > >>> Insert Into Test Values(3, 130, 40123); > >>> Insert Into Test Values(3, 120, 41232); > >>> Insert Into Test Values(3, 140, 40582); > >>> Insert Into Test Values(3, 100, 40888); > >>> Insert Into Test Values(3, 110, 42541); > >>> > >>> Insert Into Test Values(5, 140, 41225); > >>> Insert Into Test Values(5, 130, 41589); > >>> Insert Into Test Values(5, 150, 40872); > >>> > >>> Insert Into Test Values(6, 110, 41225); > >>> Insert Into Test Values(6, 115, 41227); > >>> > >>> Insert Into Test Values(9, 105, 41225); > >>> > >>> After the above Table-Creation + Inserts, one should > >>> define a "Base-View" first, which is then able to select > >>> appropriate aggregates (respecting your "max-4" condition): > >>> > >>> Create View vw_StdDev_Max4_Aggregates As > >>> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > >>> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > >>> Limit 4) > >>> Group By ID; > >>> > >>> Only this view above has to be adjusted, according to your > >>> real table- and column-names, but the following two "derived views", > >>> will work after these changes as well, as long as you leave the > >>> above view-name - and the field-names: (ID, a1, a2, n) intact. > >>> > >>> So what remains, is two more view-defs (one for the population-StdDev - > >>> and one with a few slight changes, to cover the sample-StdDev as well. > >>> > >>> Create View vw_StdDev_Samples As > >>> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( > >>>With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > >>> Select s2, (s
Re: [sqlite] Standard deviation last x entries
In my experience max(x) - min(x) isn't a great measure for this sort of thing, as it is ultimately dependent on outliers. Something like 75%-tile to 25%-tile (or other similar values) might make more sense. Ultimately, taking a Standard Deviation (or Variance) of a set of reading for a set of persons seems a bit unusual as it is mixing two different statistics. One is that each person themselves have a mean and Standard Deviation to their readings, and then you have the statistics of the whole population, with things like the group mean (the mean of the individual means), the variation of the means ( the Standard Deviation of the Individual Means), the mean variation (the Mean of the individual Standard Deviations) and even the variability of the varibility (The Standard Deviation of the individual Standard deviations). Note that I would definitely treat the readings from an individual as a sample, unless the patients were being CONTINUALLY monitored, as periodic readings ARE samples (and presumably the patient is more important than the actual readings). I might also be tempted to treat your total group of patients as a sample, unless you think you are (and have) treating everyone that falls in the category. On 10/20/19 8:31 AM, Gabor Grothendieck wrote: > Another approach is to question whether you really need the > standard deviation or if any measure of variation would do. > > You could use the range, max(x) - min(x) , and just report it as the range > rather than the standard deviation. Also range/4 approximates the > standard deviation (google the range rule) > although the approximation is not very good. > > > > On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck > wrote: >> There is a stdev function for sqlite here: >> https://www.sqlite.org/contrib//download/extension-functions.c?get=25 >> >> On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt wrote: >>> Am 12.10.2019 um 16:47 schrieb Bart Smissaert: Sorry, I forgot to tell that. It is date column with an integer number. ID xValue xDate 1 130 40123 1 120 41232 1 140 40582 1 100 40888 1 110 42541 2 140 41225 2 130 41589 2 150 40872 >>> Because SQLite on Android has at least CTEs available, >>> (since Android 5 or so), I thought I make an attempt using them... >>> >>> For your above sample-set I've created a table Test this way >>> (with slightly changed Column-Names + different IDs with "gaps" >>> + a few more sample-records, to cover especially n=1 sets): >>> >>> Create Table Test(ID, xVal, xDat); >>> >>> Insert Into Test Values(3, 130, 40123); >>> Insert Into Test Values(3, 120, 41232); >>> Insert Into Test Values(3, 140, 40582); >>> Insert Into Test Values(3, 100, 40888); >>> Insert Into Test Values(3, 110, 42541); >>> >>> Insert Into Test Values(5, 140, 41225); >>> Insert Into Test Values(5, 130, 41589); >>> Insert Into Test Values(5, 150, 40872); >>> >>> Insert Into Test Values(6, 110, 41225); >>> Insert Into Test Values(6, 115, 41227); >>> >>> Insert Into Test Values(9, 105, 41225); >>> >>> After the above Table-Creation + Inserts, one should >>> define a "Base-View" first, which is then able to select >>> appropriate aggregates (respecting your "max-4" condition): >>> >>> Create View vw_StdDev_Max4_Aggregates As >>> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T >>> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc >>> Limit 4) >>> Group By ID; >>> >>> Only this view above has to be adjusted, according to your >>> real table- and column-names, but the following two "derived views", >>> will work after these changes as well, as long as you leave the >>> above view-name - and the field-names: (ID, a1, a2, n) intact. >>> >>> So what remains, is two more view-defs (one for the population-StdDev - >>> and one with a few slight changes, to cover the sample-StdDev as well. >>> >>> Create View vw_StdDev_Samples As >>> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >>>With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >>> Limit 32 >>>) Select s From r Order By i Desc Limit 1 >>> ) s From vw_StdDev_Max4_Aggregates Where n>1 >>> Union All >>> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; >>> >>> Create View vw_StdDev_Populations As >>> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >>>With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All >>> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >>> Limit 32 >>>) Select s From r Order By i Desc Limit 1 >>> ) sigma From vw_StdDev_Max4_Aggregates Where n>1 >>> Union All >>> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; >>> >>> >>> Ok, here the result I get on my test-table, when the 3 views >>> are in place, and the following gets
Re: [sqlite] Standard deviation last x entries
Yes, could use something else than SD, but the mentioned calculator uses SD. Got this all working fine now. RBS On Sun, 20 Oct 2019, 13:31 Gabor Grothendieck, wrote: > Another approach is to question whether you really need the > standard deviation or if any measure of variation would do. > > You could use the range, max(x) - min(x) , and just report it as the range > rather than the standard deviation. Also range/4 approximates the > standard deviation (google the range rule) > although the approximation is not very good. > > > > On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck > wrote: > > > > There is a stdev function for sqlite here: > > https://www.sqlite.org/contrib//download/extension-functions.c?get=25 > > > > On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt > wrote: > > > > > > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > > > Sorry, I forgot to tell that. It is date column with an integer > number. > > > > > > > > ID xValue xDate > > > > > > > > 1 130 40123 > > > > 1 120 41232 > > > > 1 140 40582 > > > > 1 100 40888 > > > > 1 110 42541 > > > > 2 140 41225 > > > > 2 130 41589 > > > > 2 150 40872 > > > > > > Because SQLite on Android has at least CTEs available, > > > (since Android 5 or so), I thought I make an attempt using them... > > > > > > For your above sample-set I've created a table Test this way > > > (with slightly changed Column-Names + different IDs with "gaps" > > > + a few more sample-records, to cover especially n=1 sets): > > > > > > Create Table Test(ID, xVal, xDat); > > > > > > Insert Into Test Values(3, 130, 40123); > > > Insert Into Test Values(3, 120, 41232); > > > Insert Into Test Values(3, 140, 40582); > > > Insert Into Test Values(3, 100, 40888); > > > Insert Into Test Values(3, 110, 42541); > > > > > > Insert Into Test Values(5, 140, 41225); > > > Insert Into Test Values(5, 130, 41589); > > > Insert Into Test Values(5, 150, 40872); > > > > > > Insert Into Test Values(6, 110, 41225); > > > Insert Into Test Values(6, 115, 41227); > > > > > > Insert Into Test Values(9, 105, 41225); > > > > > > After the above Table-Creation + Inserts, one should > > > define a "Base-View" first, which is then able to select > > > appropriate aggregates (respecting your "max-4" condition): > > > > > > Create View vw_StdDev_Max4_Aggregates As > > > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > > > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > > > Limit 4) > > > Group By ID; > > > > > > Only this view above has to be adjusted, according to your > > > real table- and column-names, but the following two "derived views", > > > will work after these changes as well, as long as you leave the > > > above view-name - and the field-names: (ID, a1, a2, n) intact. > > > > > > So what remains, is two more view-defs (one for the population-StdDev - > > > and one with a few slight changes, to cover the sample-StdDev as well. > > > > > > Create View vw_StdDev_Samples As > > > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( > > >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > > Limit 32 > > >) Select s From r Order By i Desc Limit 1 > > > ) s From vw_StdDev_Max4_Aggregates Where n>1 > > > Union All > > > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > > > > > Create View vw_StdDev_Populations As > > > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( > > >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > > Limit 32 > > >) Select s From r Order By i Desc Limit 1 > > > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > > > Union All > > > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > > > > > > > Ok, here the result I get on my test-table, when the 3 views > > > are in place, and the following gets executed: > > > > > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > > > ID n sigma > > > > > > 3 4 14.7901994577491 > > > 5 3 8.16496580927733 > > > 6 2 2.5 > > > 9 1 0 > > > > > > > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > > > ID n s > > > > > > 3 4 17.0782512765993 > > > 5 3 10.1 > > > 6 2 3.53553390593274 > > > 9 1 null > > > > > > Please note the returned null (vs. the real 0) in the last result > > > (done, because "unbiased estimates" are undefined for sample-size == 1) > > > > > > Finally for those interested, the CTE-based "iterated Square-Root", > > > cut out for an isolated calculation of sqrt(2). > > > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > > >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > > Limit 32 > > > ) Select s Fr
Re: [sqlite] Standard deviation last x entries
Another approach is to question whether you really need the standard deviation or if any measure of variation would do. You could use the range, max(x) - min(x) , and just report it as the range rather than the standard deviation. Also range/4 approximates the standard deviation (google the range rule) although the approximation is not very good. On Fri, Oct 18, 2019 at 6:41 PM Gabor Grothendieck wrote: > > There is a stdev function for sqlite here: > https://www.sqlite.org/contrib//download/extension-functions.c?get=25 > > On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt wrote: > > > > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > > > ID xValue xDate > > > > > > 1 130 40123 > > > 1 120 41232 > > > 1 140 40582 > > > 1 100 40888 > > > 1 110 42541 > > > 2 140 41225 > > > 2 130 41589 > > > 2 150 40872 > > > > Because SQLite on Android has at least CTEs available, > > (since Android 5 or so), I thought I make an attempt using them... > > > > For your above sample-set I've created a table Test this way > > (with slightly changed Column-Names + different IDs with "gaps" > > + a few more sample-records, to cover especially n=1 sets): > > > > Create Table Test(ID, xVal, xDat); > > > > Insert Into Test Values(3, 130, 40123); > > Insert Into Test Values(3, 120, 41232); > > Insert Into Test Values(3, 140, 40582); > > Insert Into Test Values(3, 100, 40888); > > Insert Into Test Values(3, 110, 42541); > > > > Insert Into Test Values(5, 140, 41225); > > Insert Into Test Values(5, 130, 41589); > > Insert Into Test Values(5, 150, 40872); > > > > Insert Into Test Values(6, 110, 41225); > > Insert Into Test Values(6, 115, 41227); > > > > Insert Into Test Values(9, 105, 41225); > > > > After the above Table-Creation + Inserts, one should > > define a "Base-View" first, which is then able to select > > appropriate aggregates (respecting your "max-4" condition): > > > > Create View vw_StdDev_Max4_Aggregates As > > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > > Limit 4) > > Group By ID; > > > > Only this view above has to be adjusted, according to your > > real table- and column-names, but the following two "derived views", > > will work after these changes as well, as long as you leave the > > above view-name - and the field-names: (ID, a1, a2, n) intact. > > > > So what remains, is two more view-defs (one for the population-StdDev - > > and one with a few slight changes, to cover the sample-StdDev as well. > > > > Create View vw_StdDev_Samples As > > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( > >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > >) Select s From r Order By i Desc Limit 1 > > ) s From vw_StdDev_Max4_Aggregates Where n>1 > > Union All > > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > > > Create View vw_StdDev_Populations As > > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( > >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > >) Select s From r Order By i Desc Limit 1 > > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > > Union All > > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > > > > Ok, here the result I get on my test-table, when the 3 views > > are in place, and the following gets executed: > > > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > > ID n sigma > > > > 3 4 14.7901994577491 > > 5 3 8.16496580927733 > > 6 2 2.5 > > 9 1 0 > > > > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > > ID n s > > > > 3 4 17.0782512765993 > > 5 3 10.1 > > 6 2 3.53553390593274 > > 9 1 null > > > > Please note the returned null (vs. the real 0) in the last result > > (done, because "unbiased estimates" are undefined for sample-size == 1) > > > > Finally for those interested, the CTE-based "iterated Square-Root", > > cut out for an isolated calculation of sqrt(2). > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select s From r Order By i Desc Limit 1 > > > > The max. iterations are limited to 32 - though for input-values > > in a "typical range", the iteration will usually exit earlier... > > In case of the example above -> sqrt(2), the iteration-counter > > i reached only 6 (as can be seen, when you include i alongside s > > in the final Select. > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > >Select s2, (s2>0)*(s+s2/s)*.
Re: [sqlite] Standard deviation last x entries
Thanks for that and have tried this now (on Android app) and works fine. It is fast as well, although slightly slower than the previous version. I ran this on a column with 8000 values ranging from 0 to 1600 and this took about 140 milli-seconds on a fast Samsung S9 phone. Database is on a SD. I think as square root it such a common thing it will be worth it to be added to the standard SQLite functions. RBS On Sun, Oct 20, 2019 at 3:24 AM Keith Medcalf wrote: > > Here is a recursive CTE that will calculate the square root to the best > precision your processor math implementation is capable of. It limits the > recursion by prohibiting the insertion of duplicate guesses by using UNION > rather than UNION ALL, which will cause pathological cases that oscillate > to terminate. It then scans those guesses looking for the one that is the > "closest" approximation to the actual square root. If you ask for the > square root of a negative number, you get null, and if subnormal division > gives a nan (NULL) that will also terminate the recursion. > > with guesses(findRootOf, guessRoot) > as ( >select ?1, > case when ?1 < 0 then null else ?1 / 2.0 end > union >select findRootOf, > (guessRoot + findRootOf / guessRoot) / 2.0 as newGuessRoot > from guesses > where guessRoot > 0.0 > ) > select guessRoot > from guesses > order by abs(guessRoot*guessRoot - findRootOf) >limit 1; > > Note that the parameter is used twice. Once to put in the table, and > again to compute the first "guess" (which is really to prime the guess with > NULL to prevent attempts to calculate the root of a negative number). > > guessRoot exactly matches the results of the sqrt function 76% of the > time, and is within 1 ULP 100% of the time. > guessRoot*guessRoot == findRootOf 48% of the time, within 1 ULP 94% of the > time, and within 2 ULP 100% of the time. > *based on 5,000,000 randomly generated positive real numbers > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Here is a recursive CTE that will calculate the square root to the best precision your processor math implementation is capable of. It limits the recursion by prohibiting the insertion of duplicate guesses by using UNION rather than UNION ALL, which will cause pathological cases that oscillate to terminate. It then scans those guesses looking for the one that is the "closest" approximation to the actual square root. If you ask for the square root of a negative number, you get null, and if subnormal division gives a nan (NULL) that will also terminate the recursion. with guesses(findRootOf, guessRoot) as ( select ?1, case when ?1 < 0 then null else ?1 / 2.0 end union select findRootOf, (guessRoot + findRootOf / guessRoot) / 2.0 as newGuessRoot from guesses where guessRoot > 0.0 ) select guessRoot from guesses order by abs(guessRoot*guessRoot - findRootOf) limit 1; Note that the parameter is used twice. Once to put in the table, and again to compute the first "guess" (which is really to prime the guess with NULL to prevent attempts to calculate the root of a negative number). guessRoot exactly matches the results of the sqrt function 76% of the time, and is within 1 ULP 100% of the time. guessRoot*guessRoot == findRootOf 48% of the time, within 1 ULP 94% of the time, and within 2 ULP 100% of the time. *based on 5,000,000 randomly generated positive real numbers -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Note that you might need to change the 2.22044604925031e-16 constant to 2 ULPs (4.44089209850063e-16) in order to get around some pathological rounding cases. It probably will not be noticeable faster since you are only saving a few machine cycles per iteration. It might be noticeable unless you are running the calculation thousands or millions of times. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Bart Smissaert >Sent: Friday, 18 October, 2019 16:48 >To: SQLite mailing list >Subject: Re: [sqlite] Standard deviation last x entries > >> However, to optimize the calculation the following is more efficient > >I tested it, but didn't find it any faster. >Naming makes it a lot clearer though. > >RBS > >On Fri, Oct 18, 2019 at 10:56 PM Keith Medcalf >wrote: > >> >> We are calculating the square root using Newtons Method of successive >> approximation >> >> https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap04/sqrt.html >> >> (x>0)*(y+x/y)/2 yi >> >> is computing the new guess of the square root of x based on the last >guess >> y and calling that calculation yi (the AS keyword is omitted). The >(x>0) >> part is to make sure that x is always a positive number otherwise the >new >> guess will be 0. This means that the attempt to find the square root >of a >> negative number (or 0) will always return 0 for each of two iterations >thus >> meeting the stop condition because abs(0 - 0) is less than any positive >> number. >> >> To use a CTE with better named variables one would use: >> >> with guesses(findRootOf, guessRoot, iteration) >> as ( >> select ?, 1.0, 1 >> union all >> select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf >/ >> guessRoot) / 2.0 as newGuess, iteration + 1 >>from guesses >> where abs(newGuess - guessRoot) > 1e-12 >> limit 32 >> ) >> select guessRoot >> from guesses >> order by iteration desc >>limit 1; >> >> However, to optimize the calculation the following is more efficient >> (since it only calculates the new guess once each iteration, and >computes >> to the limit of precision). We are substituting calculating the guess >> twice and its difference from a constant (9 operations) with computing >the >> actual precision (6 operations): >> >> with guesses(findRootOf, guessRoot, iteration) >> as ( >> select ?, 1.0, 1 >> union all >> select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf >/ >> guessRoot) / 2.0, iteration + 1 >>from guesses >> where abs(guessRoot*guessRoot - findRootOf) > >> (2.22044604925031e-16 * FindRootOf) >> limit 32 >> ) >> select guessRoot >> from guesses >> order by iteration desc >>limit 1; >> >> where 2.22044604925031e-16 is the epsilon of IEEE754 double precision >> floating point numbers (the limit of computational precision in the >last >> binary place) >> >> -- >> The fact that there's a Highway to Hell but only a Stairway to Heaven >says >> a lot about anticipated traffic volume. >> >> >-Original Message- >> >From: sqlite-users On >> >Behalf Of Bart Smissaert >> >Sent: Friday, 18 October, 2019 14:55 >> >To: SQLite mailing list >> >Subject: Re: [sqlite] Standard deviation last x entries >> > >> >Hi Olaf, >> > >> >Could you tell me what this is doing: >> > >> >,(x>0)*(y+x/y)/2 yi >> > >> >Especially the yi after the 2 >> >How does the yi relate to the preceding bit? >> > >> >RBS >> > >> > >> > >> >On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt >wrote: >> > >> >> Am 18.10.2019 um 19:45 schrieb Bart Smissaert: >> >> >> >> > Regarding: >> >> > >> >> > With r(s2, s, i) As (Select 2, 1, 1 Union All >> >> > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2- >s*s)>1e-12 >> >> > Limit 32 >> >> > ) Select s From r Order By i Desc Limit 1 >> >> > >> >> > How would this work if I wanted to update all the values in a >table >> >> column >> >> > to have the square root? >> >&
Re: [sqlite] Standard deviation last x entries
> However, to optimize the calculation the following is more efficient I tested it, but didn't find it any faster. Naming makes it a lot clearer though. RBS On Fri, Oct 18, 2019 at 10:56 PM Keith Medcalf wrote: > > We are calculating the square root using Newtons Method of successive > approximation > > https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap04/sqrt.html > > (x>0)*(y+x/y)/2 yi > > is computing the new guess of the square root of x based on the last guess > y and calling that calculation yi (the AS keyword is omitted). The (x>0) > part is to make sure that x is always a positive number otherwise the new > guess will be 0. This means that the attempt to find the square root of a > negative number (or 0) will always return 0 for each of two iterations thus > meeting the stop condition because abs(0 - 0) is less than any positive > number. > > To use a CTE with better named variables one would use: > > with guesses(findRootOf, guessRoot, iteration) > as ( > select ?, 1.0, 1 > union all > select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / > guessRoot) / 2.0 as newGuess, iteration + 1 >from guesses > where abs(newGuess - guessRoot) > 1e-12 > limit 32 > ) > select guessRoot > from guesses > order by iteration desc >limit 1; > > However, to optimize the calculation the following is more efficient > (since it only calculates the new guess once each iteration, and computes > to the limit of precision). We are substituting calculating the guess > twice and its difference from a constant (9 operations) with computing the > actual precision (6 operations): > > with guesses(findRootOf, guessRoot, iteration) > as ( > select ?, 1.0, 1 > union all > select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / > guessRoot) / 2.0, iteration + 1 >from guesses > where abs(guessRoot*guessRoot - findRootOf) > > (2.22044604925031e-16 * FindRootOf) > limit 32 > ) > select guessRoot > from guesses > order by iteration desc >limit 1; > > where 2.22044604925031e-16 is the epsilon of IEEE754 double precision > floating point numbers (the limit of computational precision in the last > binary place) > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > >-Original Message- > >From: sqlite-users On > >Behalf Of Bart Smissaert > >Sent: Friday, 18 October, 2019 14:55 > >To: SQLite mailing list > >Subject: Re: [sqlite] Standard deviation last x entries > > > >Hi Olaf, > > > >Could you tell me what this is doing: > > > >,(x>0)*(y+x/y)/2 yi > > > >Especially the yi after the 2 > >How does the yi relate to the preceding bit? > > > >RBS > > > > > > > >On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > > > >> Am 18.10.2019 um 19:45 schrieb Bart Smissaert: > >> > >> > Regarding: > >> > > >> > With r(s2, s, i) As (Select 2, 1, 1 Union All > >> > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >> > Limit 32 > >> > ) Select s From r Order By i Desc Limit 1 > >> > > >> > How would this work if I wanted to update all the values in a table > >> column > >> > to have the square root? > >> > >> Well, as with any other (single-value-returning) Sub-Select > >> (which was enclosed in parentheses)... > >> > >> E.g. > >> "Update MyTable Set MyCol = (Select 1)" > >> would update MyCol with the value 1 across the whole table... > >> > >> Same thing basically (only "a bit larger") with a CTE-based > >> "Single-Value-SubSelect"... > >> > >> BTW, I've updated and tuned the thing a bit (which should now > >> offer more precision, and ~20% more performance as well): > >> > >> With r (x, y, i) As ( > >>Select ?, 1, 1 Union All > >>Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit > >32 > >> ) Select y From r Order By i Desc Limit 1 > >> > >> Note the question-marked "Parameter" for the "squared Input-Value" > >> (in the first Select Statement of the CTEs triple). > >> > >> Integrated into an Update-Query it could look this way: > >> > >> Update MyTable Set MySqua
Re: [sqlite] Standard deviation last x entries
Thanks, I did see that, but can't use this on Android. RBS On Fri, Oct 18, 2019 at 11:41 PM Gabor Grothendieck wrote: > There is a stdev function for sqlite here: > https://www.sqlite.org/contrib//download/extension-functions.c?get=25 > > On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt wrote: > > > > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > > > ID xValue xDate > > > > > > 1 130 40123 > > > 1 120 41232 > > > 1 140 40582 > > > 1 100 40888 > > > 1 110 42541 > > > 2 140 41225 > > > 2 130 41589 > > > 2 150 40872 > > > > Because SQLite on Android has at least CTEs available, > > (since Android 5 or so), I thought I make an attempt using them... > > > > For your above sample-set I've created a table Test this way > > (with slightly changed Column-Names + different IDs with "gaps" > > + a few more sample-records, to cover especially n=1 sets): > > > > Create Table Test(ID, xVal, xDat); > > > > Insert Into Test Values(3, 130, 40123); > > Insert Into Test Values(3, 120, 41232); > > Insert Into Test Values(3, 140, 40582); > > Insert Into Test Values(3, 100, 40888); > > Insert Into Test Values(3, 110, 42541); > > > > Insert Into Test Values(5, 140, 41225); > > Insert Into Test Values(5, 130, 41589); > > Insert Into Test Values(5, 150, 40872); > > > > Insert Into Test Values(6, 110, 41225); > > Insert Into Test Values(6, 115, 41227); > > > > Insert Into Test Values(9, 105, 41225); > > > > After the above Table-Creation + Inserts, one should > > define a "Base-View" first, which is then able to select > > appropriate aggregates (respecting your "max-4" condition): > > > > Create View vw_StdDev_Max4_Aggregates As > > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > > Limit 4) > > Group By ID; > > > > Only this view above has to be adjusted, according to your > > real table- and column-names, but the following two "derived views", > > will work after these changes as well, as long as you leave the > > above view-name - and the field-names: (ID, a1, a2, n) intact. > > > > So what remains, is two more view-defs (one for the population-StdDev - > > and one with a few slight changes, to cover the sample-StdDev as well. > > > > Create View vw_StdDev_Samples As > > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( > >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > >) Select s From r Order By i Desc Limit 1 > > ) s From vw_StdDev_Max4_Aggregates Where n>1 > > Union All > > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > > > Create View vw_StdDev_Populations As > > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( > >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > >) Select s From r Order By i Desc Limit 1 > > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > > Union All > > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > > > > Ok, here the result I get on my test-table, when the 3 views > > are in place, and the following gets executed: > > > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > > ID n sigma > > > > 3 4 14.7901994577491 > > 5 3 8.16496580927733 > > 6 2 2.5 > > 9 1 0 > > > > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > > ID n s > > > > 3 4 17.0782512765993 > > 5 3 10.1 > > 6 2 3.53553390593274 > > 9 1 null > > > > Please note the returned null (vs. the real 0) in the last result > > (done, because "unbiased estimates" are undefined for sample-size == 1) > > > > Finally for those interested, the CTE-based "iterated Square-Root", > > cut out for an isolated calculation of sqrt(2). > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select s From r Order By i Desc Limit 1 > > > > The max. iterations are limited to 32 - though for input-values > > in a "typical range", the iteration will usually exit earlier... > > In case of the example above -> sqrt(2), the iteration-counter > > i reached only 6 (as can be seen, when you include i alongside s > > in the final Select. > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select i, s From r Order By i Desc Limit 1 > > > > HTH > > > > Olaf > > > > > > > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.or
Re: [sqlite] Standard deviation last x entries
There is a stdev function for sqlite here: https://www.sqlite.org/contrib//download/extension-functions.c?get=25 On Wed, Oct 16, 2019 at 7:57 PM Olaf Schmidt wrote: > > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > ID xValue xDate > > > > 1 130 40123 > > 1 120 41232 > > 1 140 40582 > > 1 100 40888 > > 1 110 42541 > > 2 140 41225 > > 2 130 41589 > > 2 150 40872 > > Because SQLite on Android has at least CTEs available, > (since Android 5 or so), I thought I make an attempt using them... > > For your above sample-set I've created a table Test this way > (with slightly changed Column-Names + different IDs with "gaps" > + a few more sample-records, to cover especially n=1 sets): > > Create Table Test(ID, xVal, xDat); > > Insert Into Test Values(3, 130, 40123); > Insert Into Test Values(3, 120, 41232); > Insert Into Test Values(3, 140, 40582); > Insert Into Test Values(3, 100, 40888); > Insert Into Test Values(3, 110, 42541); > > Insert Into Test Values(5, 140, 41225); > Insert Into Test Values(5, 130, 41589); > Insert Into Test Values(5, 150, 40872); > > Insert Into Test Values(6, 110, 41225); > Insert Into Test Values(6, 115, 41227); > > Insert Into Test Values(9, 105, 41225); > > After the above Table-Creation + Inserts, one should > define a "Base-View" first, which is then able to select > appropriate aggregates (respecting your "max-4" condition): > > Create View vw_StdDev_Max4_Aggregates As > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > Limit 4) > Group By ID; > > Only this view above has to be adjusted, according to your > real table- and column-names, but the following two "derived views", > will work after these changes as well, as long as you leave the > above view-name - and the field-names: (ID, a1, a2, n) intact. > > So what remains, is two more view-defs (one for the population-StdDev - > and one with a few slight changes, to cover the sample-StdDev as well. > > Create View vw_StdDev_Samples As > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) s From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > Create View vw_StdDev_Populations As > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > Ok, here the result I get on my test-table, when the 3 views > are in place, and the following gets executed: > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > ID n sigma > > 3 4 14.7901994577491 > 5 3 8.16496580927733 > 6 2 2.5 > 9 1 0 > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > ID n s > > 3 4 17.0782512765993 > 5 3 10.1 > 6 2 3.53553390593274 > 9 1 null > > Please note the returned null (vs. the real 0) in the last result > (done, because "unbiased estimates" are undefined for sample-size == 1) > > Finally for those interested, the CTE-based "iterated Square-Root", > cut out for an isolated calculation of sqrt(2). > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select s From r Order By i Desc Limit 1 > > The max. iterations are limited to 32 - though for input-values > in a "typical range", the iteration will usually exit earlier... > In case of the example above -> sqrt(2), the iteration-counter > i reached only 6 (as can be seen, when you include i alongside s > in the final Select. > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select i, s From r Order By i Desc Limit 1 > > HTH > > Olaf > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
> and calling that calculation yi (the AS keyword is omitted). Ah, indeed, simple! Thanks for that explanation. Will study this and try it out. RBS On Fri, Oct 18, 2019 at 10:56 PM Keith Medcalf wrote: > > We are calculating the square root using Newtons Method of successive > approximation > > https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap04/sqrt.html > > (x>0)*(y+x/y)/2 yi > > is computing the new guess of the square root of x based on the last guess > y and calling that calculation yi (the AS keyword is omitted). The (x>0) > part is to make sure that x is always a positive number otherwise the new > guess will be 0. This means that the attempt to find the square root of a > negative number (or 0) will always return 0 for each of two iterations thus > meeting the stop condition because abs(0 - 0) is less than any positive > number. > > To use a CTE with better named variables one would use: > > with guesses(findRootOf, guessRoot, iteration) > as ( > select ?, 1.0, 1 > union all > select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / > guessRoot) / 2.0 as newGuess, iteration + 1 >from guesses > where abs(newGuess - guessRoot) > 1e-12 > limit 32 > ) > select guessRoot > from guesses > order by iteration desc >limit 1; > > However, to optimize the calculation the following is more efficient > (since it only calculates the new guess once each iteration, and computes > to the limit of precision). We are substituting calculating the guess > twice and its difference from a constant (9 operations) with computing the > actual precision (6 operations): > > with guesses(findRootOf, guessRoot, iteration) > as ( > select ?, 1.0, 1 > union all > select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / > guessRoot) / 2.0, iteration + 1 >from guesses > where abs(guessRoot*guessRoot - findRootOf) > > (2.22044604925031e-16 * FindRootOf) > limit 32 > ) > select guessRoot > from guesses > order by iteration desc >limit 1; > > where 2.22044604925031e-16 is the epsilon of IEEE754 double precision > floating point numbers (the limit of computational precision in the last > binary place) > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > >-Original Message- > >From: sqlite-users On > >Behalf Of Bart Smissaert > >Sent: Friday, 18 October, 2019 14:55 > >To: SQLite mailing list > >Subject: Re: [sqlite] Standard deviation last x entries > > > >Hi Olaf, > > > >Could you tell me what this is doing: > > > >,(x>0)*(y+x/y)/2 yi > > > >Especially the yi after the 2 > >How does the yi relate to the preceding bit? > > > >RBS > > > > > > > >On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > > > >> Am 18.10.2019 um 19:45 schrieb Bart Smissaert: > >> > >> > Regarding: > >> > > >> > With r(s2, s, i) As (Select 2, 1, 1 Union All > >> > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > >> > Limit 32 > >> > ) Select s From r Order By i Desc Limit 1 > >> > > >> > How would this work if I wanted to update all the values in a table > >> column > >> > to have the square root? > >> > >> Well, as with any other (single-value-returning) Sub-Select > >> (which was enclosed in parentheses)... > >> > >> E.g. > >> "Update MyTable Set MyCol = (Select 1)" > >> would update MyCol with the value 1 across the whole table... > >> > >> Same thing basically (only "a bit larger") with a CTE-based > >> "Single-Value-SubSelect"... > >> > >> BTW, I've updated and tuned the thing a bit (which should now > >> offer more precision, and ~20% more performance as well): > >> > >> With r (x, y, i) As ( > >>Select ?, 1, 1 Union All > >>Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit > >32 > >> ) Select y From r Order By i Desc Limit 1 > >> > >> Note the question-marked "Parameter" for the "squared Input-Value" > >> (in the first Select Statement of the CTEs triple). > >> > >> Integrated into an Update-Query it could look this way: > >> > >> Update MyTable Set MySquareRoot = ( -- SubS
Re: [sqlite] Standard deviation last x entries
We are calculating the square root using Newtons Method of successive approximation https://pages.mtu.edu/~shene/COURSES/cs201/NOTES/chap04/sqrt.html (x>0)*(y+x/y)/2 yi is computing the new guess of the square root of x based on the last guess y and calling that calculation yi (the AS keyword is omitted). The (x>0) part is to make sure that x is always a positive number otherwise the new guess will be 0. This means that the attempt to find the square root of a negative number (or 0) will always return 0 for each of two iterations thus meeting the stop condition because abs(0 - 0) is less than any positive number. To use a CTE with better named variables one would use: with guesses(findRootOf, guessRoot, iteration) as ( select ?, 1.0, 1 union all select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / guessRoot) / 2.0 as newGuess, iteration + 1 from guesses where abs(newGuess - guessRoot) > 1e-12 limit 32 ) select guessRoot from guesses order by iteration desc limit 1; However, to optimize the calculation the following is more efficient (since it only calculates the new guess once each iteration, and computes to the limit of precision). We are substituting calculating the guess twice and its difference from a constant (9 operations) with computing the actual precision (6 operations): with guesses(findRootOf, guessRoot, iteration) as ( select ?, 1.0, 1 union all select findRootOf, (findRootOf > 0) * (guessRoot + findRootOf / guessRoot) / 2.0, iteration + 1 from guesses where abs(guessRoot*guessRoot - findRootOf) > (2.22044604925031e-16 * FindRootOf) limit 32 ) select guessRoot from guesses order by iteration desc limit 1; where 2.22044604925031e-16 is the epsilon of IEEE754 double precision floating point numbers (the limit of computational precision in the last binary place) -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Bart Smissaert >Sent: Friday, 18 October, 2019 14:55 >To: SQLite mailing list >Subject: Re: [sqlite] Standard deviation last x entries > >Hi Olaf, > >Could you tell me what this is doing: > >,(x>0)*(y+x/y)/2 yi > >Especially the yi after the 2 >How does the yi relate to the preceding bit? > >RBS > > > >On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > >> Am 18.10.2019 um 19:45 schrieb Bart Smissaert: >> >> > Regarding: >> > >> > With r(s2, s, i) As (Select 2, 1, 1 Union All >> > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >> > Limit 32 >> > ) Select s From r Order By i Desc Limit 1 >> > >> > How would this work if I wanted to update all the values in a table >> column >> > to have the square root? >> >> Well, as with any other (single-value-returning) Sub-Select >> (which was enclosed in parentheses)... >> >> E.g. >> "Update MyTable Set MyCol = (Select 1)" >> would update MyCol with the value 1 across the whole table... >> >> Same thing basically (only "a bit larger") with a CTE-based >> "Single-Value-SubSelect"... >> >> BTW, I've updated and tuned the thing a bit (which should now >> offer more precision, and ~20% more performance as well): >> >> With r (x, y, i) As ( >>Select ?, 1, 1 Union All >>Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit >32 >> ) Select y From r Order By i Desc Limit 1 >> >> Note the question-marked "Parameter" for the "squared Input-Value" >> (in the first Select Statement of the CTEs triple). >> >> Integrated into an Update-Query it could look this way: >> >> Update MyTable Set MySquareRoot = ( -- SubSelect-Opening-Paren... >> With r (x, y, i) As ( >> Select MySquaredValue, 1, 1 Union All >> Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit >32 >> ) Select y From r Order By i Desc Limit 1 >> ) -- SubSelect-Closing-Paren... >> >> HTH >> >> Olaf >> >> ___ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >> >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Hi Olaf, Could you tell me what this is doing: ,(x>0)*(y+x/y)/2 yi Especially the yi after the 2 How does the yi relate to the preceding bit? RBS On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > Am 18.10.2019 um 19:45 schrieb Bart Smissaert: > > > Regarding: > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select s From r Order By i Desc Limit 1 > > > > How would this work if I wanted to update all the values in a table > column > > to have the square root? > > Well, as with any other (single-value-returning) Sub-Select > (which was enclosed in parentheses)... > > E.g. > "Update MyTable Set MyCol = (Select 1)" > would update MyCol with the value 1 across the whole table... > > Same thing basically (only "a bit larger") with a CTE-based > "Single-Value-SubSelect"... > > BTW, I've updated and tuned the thing a bit (which should now > offer more precision, and ~20% more performance as well): > > With r (x, y, i) As ( >Select ?, 1, 1 Union All >Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > > Note the question-marked "Parameter" for the "squared Input-Value" > (in the first Select Statement of the CTEs triple). > > Integrated into an Update-Query it could look this way: > > Update MyTable Set MySquareRoot = ( -- SubSelect-Opening-Paren... > With r (x, y, i) As ( > Select MySquaredValue, 1, 1 Union All > Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > ) -- SubSelect-Closing-Paren... > > HTH > > Olaf > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Have tested this and CTE pure SQL method is a lot faster (about 4 x) than my previous method in code: strSQL = "SELECT ID, SD_BP FROM QR3PARAMS" RS1 = cConn.SQL1.ExecQuery(strSQL) strSQL = "UPDATE QR3PARAMS SET SD_BP = ? WHERE ID = ?" cConn.BeginTransaction Do While RS1.NextRow cConn.SQL1.ExecNonQuery2(strSQL, Array As String(Sqrt(RS1.GetDouble2(1)), RS1.GetInt2(0))) Loop cConn.EndTransaction This is B4A code on Android with the SQLCipher driver. RBS On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > Am 18.10.2019 um 19:45 schrieb Bart Smissaert: > > > Regarding: > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select s From r Order By i Desc Limit 1 > > > > How would this work if I wanted to update all the values in a table > column > > to have the square root? > > Well, as with any other (single-value-returning) Sub-Select > (which was enclosed in parentheses)... > > E.g. > "Update MyTable Set MyCol = (Select 1)" > would update MyCol with the value 1 across the whole table... > > Same thing basically (only "a bit larger") with a CTE-based > "Single-Value-SubSelect"... > > BTW, I've updated and tuned the thing a bit (which should now > offer more precision, and ~20% more performance as well): > > With r (x, y, i) As ( >Select ?, 1, 1 Union All >Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > > Note the question-marked "Parameter" for the "squared Input-Value" > (in the first Select Statement of the CTEs triple). > > Integrated into an Update-Query it could look this way: > > Update MyTable Set MySquareRoot = ( -- SubSelect-Opening-Paren... > With r (x, y, i) As ( > Select MySquaredValue, 1, 1 Union All > Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > ) -- SubSelect-Closing-Paren... > > HTH > > Olaf > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Hi Olaf, OK, thanks, that works fine indeed. > Well, as with any other (single-value-returning) Sub-Select It is just that saw SQL's where the update part came after the CTE, eg: https://stackoverflow.com/questions/27637278/how-to-remove-duplicate-rows-with-cte-when-partitioning-by-another-tables-colum WITH cte AS ( SELECT ROW_NUMBER() OVER (PARTITION BY GroupId ORDER BY Created) AS Rnk FROM @Item AS i JOIN @ItemType AS it ON i.ItemTypeId = it.Id ) DELETE FROM cte WHERE Rnk > 1; RBS On Fri, Oct 18, 2019 at 8:26 PM Olaf Schmidt wrote: > Am 18.10.2019 um 19:45 schrieb Bart Smissaert: > > > Regarding: > > > > With r(s2, s, i) As (Select 2, 1, 1 Union All > > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > > Limit 32 > > ) Select s From r Order By i Desc Limit 1 > > > > How would this work if I wanted to update all the values in a table > column > > to have the square root? > > Well, as with any other (single-value-returning) Sub-Select > (which was enclosed in parentheses)... > > E.g. > "Update MyTable Set MyCol = (Select 1)" > would update MyCol with the value 1 across the whole table... > > Same thing basically (only "a bit larger") with a CTE-based > "Single-Value-SubSelect"... > > BTW, I've updated and tuned the thing a bit (which should now > offer more precision, and ~20% more performance as well): > > With r (x, y, i) As ( >Select ?, 1, 1 Union All >Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > > Note the question-marked "Parameter" for the "squared Input-Value" > (in the first Select Statement of the CTEs triple). > > Integrated into an Update-Query it could look this way: > > Update MyTable Set MySquareRoot = ( -- SubSelect-Opening-Paren... > With r (x, y, i) As ( > Select MySquaredValue, 1, 1 Union All > Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 > ) Select y From r Order By i Desc Limit 1 > ) -- SubSelect-Closing-Paren... > > HTH > > Olaf > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Am 18.10.2019 um 19:45 schrieb Bart Smissaert: Regarding: With r(s2, s, i) As (Select 2, 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select s From r Order By i Desc Limit 1 How would this work if I wanted to update all the values in a table column to have the square root? Well, as with any other (single-value-returning) Sub-Select (which was enclosed in parentheses)... E.g. "Update MyTable Set MyCol = (Select 1)" would update MyCol with the value 1 across the whole table... Same thing basically (only "a bit larger") with a CTE-based "Single-Value-SubSelect"... BTW, I've updated and tuned the thing a bit (which should now offer more precision, and ~20% more performance as well): With r (x, y, i) As ( Select ?, 1, 1 Union All Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 ) Select y From r Order By i Desc Limit 1 Note the question-marked "Parameter" for the "squared Input-Value" (in the first Select Statement of the CTEs triple). Integrated into an Update-Query it could look this way: Update MyTable Set MySquareRoot = ( -- SubSelect-Opening-Paren... With r (x, y, i) As ( Select MySquaredValue, 1, 1 Union All Select x,(x>0)*(y+x/y)/2 yi,i+1 From r Where Abs(yi-y)>1e-12 Limit 32 ) Select y From r Order By i Desc Limit 1 ) -- SubSelect-Closing-Paren... HTH Olaf ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Hi Olaf, Regarding: With r(s2, s, i) As (Select 2, 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select s From r Order By i Desc Limit 1 How would this work if I wanted to update all the values in a table column to have the square root? RBS On Thu, Oct 17, 2019 at 12:56 AM Olaf Schmidt wrote: > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > ID xValue xDate > > > > 1 130 40123 > > 1 120 41232 > > 1 140 40582 > > 1 100 40888 > > 1 110 42541 > > 2 140 41225 > > 2 130 41589 > > 2 150 40872 > > Because SQLite on Android has at least CTEs available, > (since Android 5 or so), I thought I make an attempt using them... > > For your above sample-set I've created a table Test this way > (with slightly changed Column-Names + different IDs with "gaps" > + a few more sample-records, to cover especially n=1 sets): > > Create Table Test(ID, xVal, xDat); > > Insert Into Test Values(3, 130, 40123); > Insert Into Test Values(3, 120, 41232); > Insert Into Test Values(3, 140, 40582); > Insert Into Test Values(3, 100, 40888); > Insert Into Test Values(3, 110, 42541); > > Insert Into Test Values(5, 140, 41225); > Insert Into Test Values(5, 130, 41589); > Insert Into Test Values(5, 150, 40872); > > Insert Into Test Values(6, 110, 41225); > Insert Into Test Values(6, 115, 41227); > > Insert Into Test Values(9, 105, 41225); > > After the above Table-Creation + Inserts, one should > define a "Base-View" first, which is then able to select > appropriate aggregates (respecting your "max-4" condition): > > Create View vw_StdDev_Max4_Aggregates As > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > Limit 4) > Group By ID; > > Only this view above has to be adjusted, according to your > real table- and column-names, but the following two "derived views", > will work after these changes as well, as long as you leave the > above view-name - and the field-names: (ID, a1, a2, n) intact. > > So what remains, is two more view-defs (one for the population-StdDev - > and one with a few slight changes, to cover the sample-StdDev as well. > > Create View vw_StdDev_Samples As > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) s From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > Create View vw_StdDev_Populations As > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > Ok, here the result I get on my test-table, when the 3 views > are in place, and the following gets executed: > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > ID n sigma > > 3 4 14.7901994577491 > 5 3 8.16496580927733 > 6 2 2.5 > 9 1 0 > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > ID n s > > 3 4 17.0782512765993 > 5 3 10.1 > 6 2 3.53553390593274 > 9 1 null > > Please note the returned null (vs. the real 0) in the last result > (done, because "unbiased estimates" are undefined for sample-size == 1) > > Finally for those interested, the CTE-based "iterated Square-Root", > cut out for an isolated calculation of sqrt(2). > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select s From r Order By i Desc Limit 1 > > The max. iterations are limited to 32 - though for input-values > in a "typical range", the iteration will usually exit earlier... > In case of the example above -> sqrt(2), the iteration-counter > i reached only 6 (as can be seen, when you include i alongside s > in the final Select. > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select i, s From r Order By i Desc Limit 1 > > HTH > > Olaf > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo
Re: [sqlite] Standard deviation last x entries
Hi Olaf, Tested all on the Windows app and works perfectly fine indeed. Thanks for that. RBS On Thu, Oct 17, 2019 at 12:56 AM Olaf Schmidt wrote: > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > ID xValue xDate > > > > 1 130 40123 > > 1 120 41232 > > 1 140 40582 > > 1 100 40888 > > 1 110 42541 > > 2 140 41225 > > 2 130 41589 > > 2 150 40872 > > Because SQLite on Android has at least CTEs available, > (since Android 5 or so), I thought I make an attempt using them... > > For your above sample-set I've created a table Test this way > (with slightly changed Column-Names + different IDs with "gaps" > + a few more sample-records, to cover especially n=1 sets): > > Create Table Test(ID, xVal, xDat); > > Insert Into Test Values(3, 130, 40123); > Insert Into Test Values(3, 120, 41232); > Insert Into Test Values(3, 140, 40582); > Insert Into Test Values(3, 100, 40888); > Insert Into Test Values(3, 110, 42541); > > Insert Into Test Values(5, 140, 41225); > Insert Into Test Values(5, 130, 41589); > Insert Into Test Values(5, 150, 40872); > > Insert Into Test Values(6, 110, 41225); > Insert Into Test Values(6, 115, 41227); > > Insert Into Test Values(9, 105, 41225); > > After the above Table-Creation + Inserts, one should > define a "Base-View" first, which is then able to select > appropriate aggregates (respecting your "max-4" condition): > > Create View vw_StdDev_Max4_Aggregates As > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > Limit 4) > Group By ID; > > Only this view above has to be adjusted, according to your > real table- and column-names, but the following two "derived views", > will work after these changes as well, as long as you leave the > above view-name - and the field-names: (ID, a1, a2, n) intact. > > So what remains, is two more view-defs (one for the population-StdDev - > and one with a few slight changes, to cover the sample-StdDev as well. > > Create View vw_StdDev_Samples As > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) s From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > Create View vw_StdDev_Populations As > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > Ok, here the result I get on my test-table, when the 3 views > are in place, and the following gets executed: > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > ID n sigma > > 3 4 14.7901994577491 > 5 3 8.16496580927733 > 6 2 2.5 > 9 1 0 > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > ID n s > > 3 4 17.0782512765993 > 5 3 10.1 > 6 2 3.53553390593274 > 9 1 null > > Please note the returned null (vs. the real 0) in the last result > (done, because "unbiased estimates" are undefined for sample-size == 1) > > Finally for those interested, the CTE-based "iterated Square-Root", > cut out for an isolated calculation of sqrt(2). > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select s From r Order By i Desc Limit 1 > > The max. iterations are limited to 32 - though for input-values > in a "typical range", the iteration will usually exit earlier... > In case of the example above -> sqrt(2), the iteration-counter > i reached only 6 (as can be seen, when you include i alongside s > in the final Select. > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select i, s From r Order By i Desc Limit 1 > > HTH > > Olaf > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Ignore the mentioned problem, must have been an e-mail artefact and view creates fine. Will test all later. RBS On Thu, Oct 17, 2019 at 11:02 AM Bart Smissaert wrote: > Hi Olaf, > > Thanks, will try that out. > I have a slight problem with your first create view on Android (fine in my > Windows desktop app), objecting to: > T.ID, highlighting the dot. > I tried with giving an alias T2 to the table in the subselect, but that > made no difference. > Will try later. > > RBS > > On Thu, Oct 17, 2019 at 12:56 AM Olaf Schmidt wrote: > >> Am 12.10.2019 um 16:47 schrieb Bart Smissaert: >> > Sorry, I forgot to tell that. It is date column with an integer number. >> > >> > ID xValue xDate >> > >> > 1 130 40123 >> > 1 120 41232 >> > 1 140 40582 >> > 1 100 40888 >> > 1 110 42541 >> > 2 140 41225 >> > 2 130 41589 >> > 2 150 40872 >> >> Because SQLite on Android has at least CTEs available, >> (since Android 5 or so), I thought I make an attempt using them... >> >> For your above sample-set I've created a table Test this way >> (with slightly changed Column-Names + different IDs with "gaps" >> + a few more sample-records, to cover especially n=1 sets): >> >> Create Table Test(ID, xVal, xDat); >> >> Insert Into Test Values(3, 130, 40123); >> Insert Into Test Values(3, 120, 41232); >> Insert Into Test Values(3, 140, 40582); >> Insert Into Test Values(3, 100, 40888); >> Insert Into Test Values(3, 110, 42541); >> >> Insert Into Test Values(5, 140, 41225); >> Insert Into Test Values(5, 130, 41589); >> Insert Into Test Values(5, 150, 40872); >> >> Insert Into Test Values(6, 110, 41225); >> Insert Into Test Values(6, 115, 41227); >> >> Insert Into Test Values(9, 105, 41225); >> >> After the above Table-Creation + Inserts, one should >> define a "Base-View" first, which is then able to select >> appropriate aggregates (respecting your "max-4" condition): >> >> Create View vw_StdDev_Max4_Aggregates As >> Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T >> Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc >> Limit 4) >> Group By ID; >> >> Only this view above has to be adjusted, according to your >> real table- and column-names, but the following two "derived views", >> will work after these changes as well, as long as you leave the >> above view-name - and the field-names: (ID, a1, a2, n) intact. >> >> So what remains, is two more view-defs (one for the population-StdDev - >> and one with a few slight changes, to cover the sample-StdDev as well. >> >> Create View vw_StdDev_Samples As >> Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >>With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All >> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >> Limit 32 >>) Select s From r Order By i Desc Limit 1 >> ) s From vw_StdDev_Max4_Aggregates Where n>1 >> Union All >> Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; >> >> Create View vw_StdDev_Populations As >> Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >>With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All >> Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >> Limit 32 >>) Select s From r Order By i Desc Limit 1 >> ) sigma From vw_StdDev_Max4_Aggregates Where n>1 >> Union All >> Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; >> >> >> Ok, here the result I get on my test-table, when the 3 views >> are in place, and the following gets executed: >> >> Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; >> ID n sigma >> >> 3 4 14.7901994577491 >> 5 3 8.16496580927733 >> 6 2 2.5 >> 9 1 0 >> >> >> Select ID, n_SampleSize, s From vw_StdDev_Samples; >> ID n s >> >> 3 4 17.0782512765993 >> 5 3 10.1 >> 6 2 3.53553390593274 >> 9 1 null >> >> Please note the returned null (vs. the real 0) in the last result >> (done, because "unbiased estimates" are undefined for sample-size == 1) >> >> Finally for those interested, the CTE-based "iterated Square-Root", >> cut out for an isolated calculation of sqrt(2). >> >> With r(s2, s, i) As (Select 2, 1, 1 Union All >>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >> Limit 32 >> ) Select s From r Order By i Desc Limit 1 >> >> The max. iterations are limited to 32 - though for input-values >> in a "typical range", the iteration will usually exit earlier... >> In case of the example above -> sqrt(2), the iteration-counter >> i reached only 6 (as can be seen, when you include i alongside s >> in the final Select. >> >> With r(s2, s, i) As (Select 2, 1, 1 Union All >>Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 >> Limit 32 >> ) Select i, s From r Order By i Desc Limit 1 >> >> HTH >> >> Olaf >> >> >> >> __
Re: [sqlite] Standard deviation last x entries
Hi Olaf, Thanks, will try that out. I have a slight problem with your first create view on Android (fine in my Windows desktop app), objecting to: T.ID, highlighting the dot. I tried with giving an alias T2 to the table in the subselect, but that made no difference. Will try later. RBS On Thu, Oct 17, 2019 at 12:56 AM Olaf Schmidt wrote: > Am 12.10.2019 um 16:47 schrieb Bart Smissaert: > > Sorry, I forgot to tell that. It is date column with an integer number. > > > > ID xValue xDate > > > > 1 130 40123 > > 1 120 41232 > > 1 140 40582 > > 1 100 40888 > > 1 110 42541 > > 2 140 41225 > > 2 130 41589 > > 2 150 40872 > > Because SQLite on Android has at least CTEs available, > (since Android 5 or so), I thought I make an attempt using them... > > For your above sample-set I've created a table Test this way > (with slightly changed Column-Names + different IDs with "gaps" > + a few more sample-records, to cover especially n=1 sets): > > Create Table Test(ID, xVal, xDat); > > Insert Into Test Values(3, 130, 40123); > Insert Into Test Values(3, 120, 41232); > Insert Into Test Values(3, 140, 40582); > Insert Into Test Values(3, 100, 40888); > Insert Into Test Values(3, 110, 42541); > > Insert Into Test Values(5, 140, 41225); > Insert Into Test Values(5, 130, 41589); > Insert Into Test Values(5, 150, 40872); > > Insert Into Test Values(6, 110, 41225); > Insert Into Test Values(6, 115, 41227); > > Insert Into Test Values(9, 105, 41225); > > After the above Table-Creation + Inserts, one should > define a "Base-View" first, which is then able to select > appropriate aggregates (respecting your "max-4" condition): > > Create View vw_StdDev_Max4_Aggregates As > Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T > Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc > Limit 4) > Group By ID; > > Only this view above has to be adjusted, according to your > real table- and column-names, but the following two "derived views", > will work after these changes as well, as long as you leave the > above view-name - and the field-names: (ID, a1, a2, n) intact. > > So what remains, is two more view-defs (one for the population-StdDev - > and one with a few slight changes, to cover the sample-StdDev as well. > > Create View vw_StdDev_Samples As > Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( >With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) s From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; > > Create View vw_StdDev_Populations As > Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( >With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All > Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 >) Select s From r Order By i Desc Limit 1 > ) sigma From vw_StdDev_Max4_Aggregates Where n>1 > Union All > Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; > > > Ok, here the result I get on my test-table, when the 3 views > are in place, and the following gets executed: > > Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; > ID n sigma > > 3 4 14.7901994577491 > 5 3 8.16496580927733 > 6 2 2.5 > 9 1 0 > > > Select ID, n_SampleSize, s From vw_StdDev_Samples; > ID n s > > 3 4 17.0782512765993 > 5 3 10.1 > 6 2 3.53553390593274 > 9 1 null > > Please note the returned null (vs. the real 0) in the last result > (done, because "unbiased estimates" are undefined for sample-size == 1) > > Finally for those interested, the CTE-based "iterated Square-Root", > cut out for an isolated calculation of sqrt(2). > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select s From r Order By i Desc Limit 1 > > The max. iterations are limited to 32 - though for input-values > in a "typical range", the iteration will usually exit earlier... > In case of the example above -> sqrt(2), the iteration-counter > i reached only 6 (as can be seen, when you include i alongside s > in the final Select. > > With r(s2, s, i) As (Select 2, 1, 1 Union All >Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 > Limit 32 > ) Select i, s From r Order By i Desc Limit 1 > > HTH > > Olaf > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqli
Re: [sqlite] Standard deviation last x entries
Am 12.10.2019 um 16:47 schrieb Bart Smissaert: Sorry, I forgot to tell that. It is date column with an integer number. ID xValue xDate 1 130 40123 1 120 41232 1 140 40582 1 100 40888 1 110 42541 2 140 41225 2 130 41589 2 150 40872 Because SQLite on Android has at least CTEs available, (since Android 5 or so), I thought I make an attempt using them... For your above sample-set I've created a table Test this way (with slightly changed Column-Names + different IDs with "gaps" + a few more sample-records, to cover especially n=1 sets): Create Table Test(ID, xVal, xDat); Insert Into Test Values(3, 130, 40123); Insert Into Test Values(3, 120, 41232); Insert Into Test Values(3, 140, 40582); Insert Into Test Values(3, 100, 40888); Insert Into Test Values(3, 110, 42541); Insert Into Test Values(5, 140, 41225); Insert Into Test Values(5, 130, 41589); Insert Into Test Values(5, 150, 40872); Insert Into Test Values(6, 110, 41225); Insert Into Test Values(6, 115, 41227); Insert Into Test Values(9, 105, 41225); After the above Table-Creation + Inserts, one should define a "Base-View" first, which is then able to select appropriate aggregates (respecting your "max-4" condition): Create View vw_StdDev_Max4_Aggregates As Select ID, Avg(xVal) a1, Avg(xVal*xVal) a2, Count(*) n From Test T Where xDat In (Select xDat From Test Where T.ID=ID Order By xDat Desc Limit 4) Group By ID; Only this view above has to be adjusted, according to your real table- and column-names, but the following two "derived views", will work after these changes as well, as long as you leave the above view-name - and the field-names: (ID, a1, a2, n) intact. So what remains, is two more view-defs (one for the population-StdDev - and one with a few slight changes, to cover the sample-StdDev as well. Create View vw_StdDev_Samples As Select ID, n n_SampleSize, (a2-a1*a1)*n/(n-1) "s^2", ( With r(s2, s, i) As (Select (a2-a1*a1)*n/(n-1), 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select s From r Order By i Desc Limit 1 ) s From vw_StdDev_Max4_Aggregates Where n>1 Union All Select ID, n, Null, Null From vw_StdDev_Max4_Aggregates Where n=1; Create View vw_StdDev_Populations As Select ID, n n_PopulationSize, (a2-a1*a1) "sigma^2", ( With r(s2, s, i) As (Select (a2-a1*a1), 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select s From r Order By i Desc Limit 1 ) sigma From vw_StdDev_Max4_Aggregates Where n>1 Union All Select ID, n, 0, 0 From vw_StdDev_Max4_Aggregates Where n=1; Ok, here the result I get on my test-table, when the 3 views are in place, and the following gets executed: Select ID, n_PopulationSize, sigma From vw_StdDev_Populations; ID n sigma 3 4 14.7901994577491 5 3 8.16496580927733 6 2 2.5 9 1 0 Select ID, n_SampleSize, s From vw_StdDev_Samples; ID n s 3 4 17.0782512765993 5 3 10.1 6 2 3.53553390593274 9 1 null Please note the returned null (vs. the real 0) in the last result (done, because "unbiased estimates" are undefined for sample-size == 1) Finally for those interested, the CTE-based "iterated Square-Root", cut out for an isolated calculation of sqrt(2). With r(s2, s, i) As (Select 2, 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select s From r Order By i Desc Limit 1 The max. iterations are limited to 32 - though for input-values in a "typical range", the iteration will usually exit earlier... In case of the example above -> sqrt(2), the iteration-counter i reached only 6 (as can be seen, when you include i alongside s in the final Select. With r(s2, s, i) As (Select 2, 1, 1 Union All Select s2, (s2>0)*(s+s2/s)*.5, i+1 From r Where Abs(s2-s*s)>1e-12 Limit 32 ) Select i, s From r Order By i Desc Limit 1 HTH Olaf ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
This is not a sample, but all the systolic blood pressure values of all of our patients. All this has to do with the calculation of the QRisk3 score: https://qrisk.org/three/ RBS On Sat, Oct 12, 2019 at 5:34 PM Richard Damon wrote: > One thing to point out, it sounds like you are dealing with a sample, > and I think you want to get the estimated standard deviation of the > process, which says you really want to use the adjusted formula that > uses N-1 in the denominator, as the expected value of the standard > deviation of a sample is smaller than the standard deviation of the > process/population. > > One quick test is to think what you would do if you had just a single > point. Does this mean you really have a Standard Deviation of Zero (as > that is the SD of a population of one) or does it mean you don't have an > idea of what the SD of the population is. > > On 10/12/19 12:13 PM, Bart Smissaert wrote: > > Thanks, I do know how to calculate the SD in code, but I thought in this > > particular case it might be faster to do this in SQL. > > Only problem is the square root and for that reason I will test this in > > code as well and see how it compares with SQL. > > > > I found a way to get the one from last step, so that is without doing the > > final square root step. > > Note that this has to do with a number of systolic blood pressure > readings > > and I am actually taking the last 6 and that is useful as due > > to treatment the SD often will be less for the later values: > > > > UPDATE QR3PARAMS SET AVG_BP = > > (SELECT AVG(SYST) FROM (SELECT SYSTOLIC AS SYST FROM BP WHERE ID = > > QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) > > > > UPDATE QR3PARAMS SET DEV_BP = > > (SELECT AVG(SYST * SYST) FROM (SELECT SYSTOLIC - QR3PARAMS.AVG_BP AS SYST > > FROM BP WHERE ID = QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) > > > > RBS > > > > > > > > > > On Sat, Oct 12, 2019 at 4:21 PM Keith Medcalf > wrote: > > > >> The Standard Deviation of the Population is the Square Root of the mean > of > >> the second order differences. > >> > >> For your input values, you calculate the mean. > >> Then you calculate the mean of the square of the difference between each > >> value and the mean of the values. > >> Then you take the square root of that. > >> > >> You can also compute it using a running calculation (generally more > >> accurate on IEEE754 hardware). > >> > >> So just retrieve the values you are interested in and do the > calculations > >> in your application. That is, if you only want FOUR values, then write > a > >> query that returns only those FOUR values, and use those to computer the > >> answer. While you can do all the calculations (including the Square > Root) > >> in SQL, it is likely to be horrendously slow. > >> > >> However, before you can do *ANYTHING* else, you need to be able to > >> demonstrate that you can write a select statement that retrieves the > values > >> you want to work on, only the values you want to work on, and nothing > but > >> the values you want to work on. Having 47,000 values and sauing that > you > >> would like to find the standard deviation of some set of four of them is > >> entertaining, but not useful in any way. > >> > >> -- > >> The fact that there's a Highway to Hell but only a Stairway to Heaven > says > >> a lot about anticipated traffic volume. > >> > >>> -Original Message- > >>> From: sqlite-users On > >>> Behalf Of Bart Smissaert > >>> Sent: Saturday, 12 October, 2019 08:09 > >>> To: General Discussion of SQLite Database >>> us...@mailinglists.sqlite.org> > >>> Subject: [sqlite] Standard deviation last x entries > >>> > >>> How do I get the standard deviation of the last 4 entries (there could > be > >>> less than 4) of an integer column grouped by an integer ID entry in > >>> another > >>> column in the same table. > >>> > >>> So data could be like this: > >>> > >>> ID xValue > >>> > >>> 1 130 > >>> 1 120 > >>> 1 140 > >>> 1 100 > >>> 1 110 > >>> 2 140 > >>> 2 130 > >>> 2 150 > >>> > >>> I know that SQLite doesn't have the SQRT function (and I can't make a > UDF > >>> as I am doing this in an Android phone app) but I can do this last step > >>> in > >>> code. > >>> > >>> RBS > >>> ___ > >>> sqlite-users mailing list > >>> sqlite-users@mailinglists.sqlite.org > >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >> > >> > >> ___ > >> sqlite-users mailing list > >> sqlite-users@mailinglists.sqlite.org > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > >> > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > -- > Richard Damon > > ___ > sqlite-users mailing list > sqlite-users@mailingl
Re: [sqlite] Standard deviation last x entries
One thing to point out, it sounds like you are dealing with a sample, and I think you want to get the estimated standard deviation of the process, which says you really want to use the adjusted formula that uses N-1 in the denominator, as the expected value of the standard deviation of a sample is smaller than the standard deviation of the process/population. One quick test is to think what you would do if you had just a single point. Does this mean you really have a Standard Deviation of Zero (as that is the SD of a population of one) or does it mean you don't have an idea of what the SD of the population is. On 10/12/19 12:13 PM, Bart Smissaert wrote: > Thanks, I do know how to calculate the SD in code, but I thought in this > particular case it might be faster to do this in SQL. > Only problem is the square root and for that reason I will test this in > code as well and see how it compares with SQL. > > I found a way to get the one from last step, so that is without doing the > final square root step. > Note that this has to do with a number of systolic blood pressure readings > and I am actually taking the last 6 and that is useful as due > to treatment the SD often will be less for the later values: > > UPDATE QR3PARAMS SET AVG_BP = > (SELECT AVG(SYST) FROM (SELECT SYSTOLIC AS SYST FROM BP WHERE ID = > QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) > > UPDATE QR3PARAMS SET DEV_BP = > (SELECT AVG(SYST * SYST) FROM (SELECT SYSTOLIC - QR3PARAMS.AVG_BP AS SYST > FROM BP WHERE ID = QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) > > RBS > > > > > On Sat, Oct 12, 2019 at 4:21 PM Keith Medcalf wrote: > >> The Standard Deviation of the Population is the Square Root of the mean of >> the second order differences. >> >> For your input values, you calculate the mean. >> Then you calculate the mean of the square of the difference between each >> value and the mean of the values. >> Then you take the square root of that. >> >> You can also compute it using a running calculation (generally more >> accurate on IEEE754 hardware). >> >> So just retrieve the values you are interested in and do the calculations >> in your application. That is, if you only want FOUR values, then write a >> query that returns only those FOUR values, and use those to computer the >> answer. While you can do all the calculations (including the Square Root) >> in SQL, it is likely to be horrendously slow. >> >> However, before you can do *ANYTHING* else, you need to be able to >> demonstrate that you can write a select statement that retrieves the values >> you want to work on, only the values you want to work on, and nothing but >> the values you want to work on. Having 47,000 values and sauing that you >> would like to find the standard deviation of some set of four of them is >> entertaining, but not useful in any way. >> >> -- >> The fact that there's a Highway to Hell but only a Stairway to Heaven says >> a lot about anticipated traffic volume. >> >>> -Original Message- >>> From: sqlite-users On >>> Behalf Of Bart Smissaert >>> Sent: Saturday, 12 October, 2019 08:09 >>> To: General Discussion of SQLite Database >> us...@mailinglists.sqlite.org> >>> Subject: [sqlite] Standard deviation last x entries >>> >>> How do I get the standard deviation of the last 4 entries (there could be >>> less than 4) of an integer column grouped by an integer ID entry in >>> another >>> column in the same table. >>> >>> So data could be like this: >>> >>> ID xValue >>> >>> 1 130 >>> 1 120 >>> 1 140 >>> 1 100 >>> 1 110 >>> 2 140 >>> 2 130 >>> 2 150 >>> >>> I know that SQLite doesn't have the SQRT function (and I can't make a UDF >>> as I am doing this in an Android phone app) but I can do this last step >>> in >>> code. >>> >>> RBS >>> ___ >>> sqlite-users mailing list >>> sqlite-users@mailinglists.sqlite.org >>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >> >> >> ___ >> sqlite-users mailing list >> sqlite-users@mailinglists.sqlite.org >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users -- Richard Damon ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Thanks, I do know how to calculate the SD in code, but I thought in this particular case it might be faster to do this in SQL. Only problem is the square root and for that reason I will test this in code as well and see how it compares with SQL. I found a way to get the one from last step, so that is without doing the final square root step. Note that this has to do with a number of systolic blood pressure readings and I am actually taking the last 6 and that is useful as due to treatment the SD often will be less for the later values: UPDATE QR3PARAMS SET AVG_BP = (SELECT AVG(SYST) FROM (SELECT SYSTOLIC AS SYST FROM BP WHERE ID = QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) UPDATE QR3PARAMS SET DEV_BP = (SELECT AVG(SYST * SYST) FROM (SELECT SYSTOLIC - QR3PARAMS.AVG_BP AS SYST FROM BP WHERE ID = QR3PARAMS.ID ORDER BY ENTRY_DATE DESC LIMIT 6)) RBS On Sat, Oct 12, 2019 at 4:21 PM Keith Medcalf wrote: > > The Standard Deviation of the Population is the Square Root of the mean of > the second order differences. > > For your input values, you calculate the mean. > Then you calculate the mean of the square of the difference between each > value and the mean of the values. > Then you take the square root of that. > > You can also compute it using a running calculation (generally more > accurate on IEEE754 hardware). > > So just retrieve the values you are interested in and do the calculations > in your application. That is, if you only want FOUR values, then write a > query that returns only those FOUR values, and use those to computer the > answer. While you can do all the calculations (including the Square Root) > in SQL, it is likely to be horrendously slow. > > However, before you can do *ANYTHING* else, you need to be able to > demonstrate that you can write a select statement that retrieves the values > you want to work on, only the values you want to work on, and nothing but > the values you want to work on. Having 47,000 values and sauing that you > would like to find the standard deviation of some set of four of them is > entertaining, but not useful in any way. > > -- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > >-Original Message- > >From: sqlite-users On > >Behalf Of Bart Smissaert > >Sent: Saturday, 12 October, 2019 08:09 > >To: General Discussion of SQLite Database >us...@mailinglists.sqlite.org> > >Subject: [sqlite] Standard deviation last x entries > > > >How do I get the standard deviation of the last 4 entries (there could be > >less than 4) of an integer column grouped by an integer ID entry in > >another > >column in the same table. > > > >So data could be like this: > > > >ID xValue > > > >1 130 > >1 120 > >1 140 > >1 100 > >1 110 > >2 140 > >2 130 > >2 150 > > > >I know that SQLite doesn't have the SQRT function (and I can't make a UDF > >as I am doing this in an Android phone app) but I can do this last step > >in > >code. > > > >RBS > >___ > >sqlite-users mailing list > >sqlite-users@mailinglists.sqlite.org > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
#!python3 from __future__ import absolute_import, division, print_function, unicode_literals import apsw db = apsw.Connection() db.executescript(""" create table x ( value integer not null ); insert into x values (120), (130), (140), (110); """) # Method 1: Using partial Running calc sumx = 0.0 sumx2 = 0.0 count = 0 for row in db.execute('select value from x'): sumx += row.value sumx2 += row.value ** 2 count += 1 sd1 = (sumx2 / count - (sumx / count) ** 2) ** 0.5 # Method 2: Using double retrieval brute force calc avgx = 0.0 avgm = 0.0 count = 0 for row in db.execute('select avg(value) as avg from x'): avgx = row.avg for row in db.execute('select value from x'): avgm += (row.value - avgx) ** 2 count += 1 sd2 = (avgm / count) ** 0.5 # Method 3: Using my extension which does full running calc for row in db.execute('select stdevp(value) as stdev from x'): sd3 = row.stdev # Method 4: Using mostly pure SQL for row in db.execute('select count(value) as count, sum(value) as sumx, sum(value*value) as sumx2 from x'): sd4 = (row.sumx2 / row.count - (row.sumx / row.count) ** 2) ** 0.5 # Method 5: Using even more calcs in SQL for row in db.execute('''select sumx2 / count - (sumx / count) * (sumx / count) as var from (select count(value) as count, sum(value) as sumx, sum(value*value) as sumx2 from x)'''): sd5 = row.var ** 0.5 print(sd1, sd2, sd3, sd4, sd5) => 11.180339887498949 11.180339887498949 11.180339887498949 11.180339887498949 11.180339887498949 You could even calculate the square root in a recursive CTE if you wanted and not have your application do anything at all other than submit the query. However, Method 1 is likely the most efficient (and easiest) to do assuming that you do not have an extension which calculates the aggregate, and is also likely to execute the fastest. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Richard Damon >Sent: Saturday, 12 October, 2019 09:33 >To: sqlite-users@mailinglists.sqlite.org >Subject: Re: [sqlite] Standard deviation last x entries > >On 10/12/19 11:23 AM, Richard Damon wrote: >> On 10/12/19 10:08 AM, Bart Smissaert wrote: >>> How do I get the standard deviation of the last 4 entries (there could >be >>> less than 4) of an integer column grouped by an integer ID entry in >another >>> column in the same table. >>> >>> So data could be like this: >>> >>> ID xValue >>> >>> 1 130 >>> 1 120 >>> 1 140 >>> 1 100 >>> 1 110 >>> 2 140 >>> 2 130 >>> 2 150 >>> >>> I know that SQLite doesn't have the SQRT function (and I can't make a >UDF >>> as I am doing this in an Android phone app) but I can do this last >step in >>> code. >>> >>> RBS >> I may not be the best at forming the SQL to do this, but here is the >> general idea I would use. First use a query with ORDER BY DECR and >LIMIT >> to get the last 4 items, and select the value, and a computer column of >> value2 = value * value. >> >> Then over this result, do a GROUP BY to compute N = count(value), S = >> sum(value), S2 =sum(value2) >> >> You can then compute the variance (standard deviation squared) from the >> equation >> >> Variance = S2 / N - (S / N) * (S / N) >> >> This equation assumes that you are working with the full population and >> not just a sample, if you data is to be considered a sample you need to >> make a small adjustment in the formula to >> >> Estimated Variance of Popultion = S2 / (N-1) - S * S / (N * (N-1)) >> >> >(Sorry about the messed up subject, reposting to fix it) > >-- >Richard Damon > >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
On 10/12/19 11:23 AM, Richard Damon wrote: > On 10/12/19 10:08 AM, Bart Smissaert wrote: >> How do I get the standard deviation of the last 4 entries (there could be >> less than 4) of an integer column grouped by an integer ID entry in another >> column in the same table. >> >> So data could be like this: >> >> ID xValue >> >> 1 130 >> 1 120 >> 1 140 >> 1 100 >> 1 110 >> 2 140 >> 2 130 >> 2 150 >> >> I know that SQLite doesn't have the SQRT function (and I can't make a UDF >> as I am doing this in an Android phone app) but I can do this last step in >> code. >> >> RBS > I may not be the best at forming the SQL to do this, but here is the > general idea I would use. First use a query with ORDER BY DECR and LIMIT > to get the last 4 items, and select the value, and a computer column of > value2 = value * value. > > Then over this result, do a GROUP BY to compute N = count(value), S = > sum(value), S2 =sum(value2) > > You can then compute the variance (standard deviation squared) from the > equation > > Variance = S2 / N - (S / N) * (S / N) > > This equation assumes that you are working with the full population and > not just a sample, if you data is to be considered a sample you need to > make a small adjustment in the formula to > > Estimated Variance of Popultion = S2 / (N-1) - S * S / (N * (N-1)) > > (Sorry about the messed up subject, reposting to fix it) -- Richard Damon ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
The Standard Deviation of the Population is the Square Root of the mean of the second order differences. For your input values, you calculate the mean. Then you calculate the mean of the square of the difference between each value and the mean of the values. Then you take the square root of that. You can also compute it using a running calculation (generally more accurate on IEEE754 hardware). So just retrieve the values you are interested in and do the calculations in your application. That is, if you only want FOUR values, then write a query that returns only those FOUR values, and use those to computer the answer. While you can do all the calculations (including the Square Root) in SQL, it is likely to be horrendously slow. However, before you can do *ANYTHING* else, you need to be able to demonstrate that you can write a select statement that retrieves the values you want to work on, only the values you want to work on, and nothing but the values you want to work on. Having 47,000 values and sauing that you would like to find the standard deviation of some set of four of them is entertaining, but not useful in any way. -- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users On >Behalf Of Bart Smissaert >Sent: Saturday, 12 October, 2019 08:09 >To: General Discussion of SQLite Database us...@mailinglists.sqlite.org> >Subject: [sqlite] Standard deviation last x entries > >How do I get the standard deviation of the last 4 entries (there could be >less than 4) of an integer column grouped by an integer ID entry in >another >column in the same table. > >So data could be like this: > >ID xValue > >1 130 >1 120 >1 140 >1 100 >1 110 >2 140 >2 130 >2 150 > >I know that SQLite doesn't have the SQRT function (and I can't make a UDF >as I am doing this in an Android phone app) but I can do this last step >in >code. > >RBS >___ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
Sorry, I forgot to tell that. It is date column with an integer number. ID xValue xDate 1 130 40123 1 120 41232 1 140 40582 1 100 40888 1 110 42541 2 140 41225 2 130 41589 2 150 40872 RBS On Sat, Oct 12, 2019 at 3:18 PM Igor Tandetnik wrote: > On 10/12/2019 10:08 AM, Bart Smissaert wrote: > > How do I get the standard deviation of the last 4 entries (there could be > > less than 4) of an integer column grouped by an integer ID entry in > another > > column in the same table. > > What do you mean by "last 4 entries"? What determines the order? How does > one tell which entry is the first and which is the last? > -- > Igor Tandetnik > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Standard deviation last x entries
On 10/12/2019 10:08 AM, Bart Smissaert wrote: How do I get the standard deviation of the last 4 entries (there could be less than 4) of an integer column grouped by an integer ID entry in another column in the same table. What do you mean by "last 4 entries"? What determines the order? How does one tell which entry is the first and which is the last? -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users