Re: [HACKERS] Weighted Stats
On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: > > > > Please find attached a patch that uses the float8 version to cover the > > numeric types. > > Is there a well-defined meaning for having a negative weight? If no, > should it be disallowed? Done. > Shouldn't these then give the same result: > > select stddev_samp(val) from foo; > stddev_samp > --- > 2887.054977297105 > > select weighted_stddev_samp(val,count) from foo2; > weighted_stddev_samp > -- > 2887.19919651336 > > The 5th digit seems too early to be seeing round-off error. Fixed down-thread. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Fri, Apr 08, 2016 at 01:47:56PM -0700, David Fetter wrote: > Sorry about the delay. This patch disallows negative weights, > although it still has that odd difference Jeff found. Difference corrected. It turned out that my reference was mistaken on the calculation. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index cb7fe38..ba31d02 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12750,6 +12750,29 @@ NULL baz(3 rows) +weighted_average + + +weighted_avg + + weighted_avg(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, numeric, or interval + + + numeric for any integer-type argument, + double precision for a floating-point argument, + otherwise the same as the argument data type + + the average (arithmetic mean) of all input values, weighted by the input weights + + + + + bit_and bit_and(expression) @@ -13430,6 +13453,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +population + + +weighted_stddev_pop + + weighted_stddev_pop(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, or numeric + + + double precision for floating-point arguments, + otherwise numeric + + weighted population standard deviation of the input values + + + + + standard deviation sample @@ -13454,6 +13500,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +sample + + +weighted_stddev_samp + + weighted_stddev_samp(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, or numeric + + + double precision for floating-point arguments, + otherwise numeric + + weighted sample standard deviation of the input values + + + + + variance variance(expression) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index c7c0b58..fe4a5e4 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -2405,6 +2405,7 @@ setseed(PG_FUNCTION_ARGS) * float8_accum- accumulate for AVG(), variance aggregates, etc. * float4_accum- same, but input data is float4 * float8_avg - produce final result for float AVG() + * float8_weighted_avg - produce final result for float WEIGHTED_AVG() * float8_var_samp - produce final result for float VAR_SAMP() * float8_var_pop - produce final result for float VAR_POP() * float8_stddev_samp - produce final result for float STDDEV_SAMP() @@ -3205,6 +3206,163 @@ float8_regr_intercept(PG_FUNCTION_ARGS) PG_RETURN_FLOAT8(numeratorXXY / numeratorX); } +/* + * === + * WEIGHTED AGGREGATES + * === + * + * The transition datatype for these aggregates is a 5-element array + * of float8, holding the values N, sum(W), sum(W*X), and sum(W*X*X) + * in that order. + * + * First, an accumulator function for those we can't pirate from the + * other accumulators. This accumulator function takes out some of + * the rounding error inherent in the general one. + * https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods + * + * It consists of a five-element array which includes: + * + * N, the number of non-zero-weighted values seen thus far, + * W, the running sum of weights, + * WX, the running dot product of weights and values, + * A, an intermediate value used in the calculation, and + * Q, another intermediate value. + * + */ + +Datum +float8_weighted_accum(PG_FUNCTION_ARGS) +{ + ArrayType *transarray = PG_GETARG_ARRAYTYPE_P(0); + float8 newvalX = PG_GETARG_FLOAT8(1); + float8 newvalW = PG_GETARG_FLOAT8(2); + float8 *transvalues; + float8 N, /* common */ + W, /* common */ + WX, /* Used in avg */ + A, /* Used in stddev_* */ + Q; /* Used in stddev_* */ + + tr
Re: [HACKERS] Weighted Stats
On Sun, Mar 20, 2016 at 03:38:40PM -0700, David Fetter wrote: > On Sat, Mar 19, 2016 at 05:04:08PM +0100, Tomas Vondra wrote: > > Hi, > > > > On 03/19/2016 07:34 AM, David Fetter wrote: > > >On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > > >>On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: > > >>> > > >>>Please find attached a patch that uses the float8 version to cover the > > >>>numeric types. > > >> > > >>Is there a well-defined meaning for having a negative weight? If no, > > >>should it be disallowed? > > > > > >Opinions on this appear to vary. A Wikipedia article defines weights > > >as non-negative, while a manual to which it refers only uses non-zero. > > > > > >https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition > > >https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html > > > > I don't think that actually allows negative weights. It says that > > > >w_i = 1/\sigma_i^2 > > > > and variance is always > 0, thus w_i > 0. The zero is used as a special flag > > to remove the sample from the data set in a simple way. > > > > >I'm not sure which if either would be authoritative, but I could > > >certainly make up variants for each assumption. > > > > > >The assumption they have in common about weights is that a zero > > >weight is not part of the calculation, which assumption is > > >implemented in the previously submitted code. > > > > I think that if we're not sure what should happen with negative weights, > > then we should disallow them. It's easy to allow them later once we have a > > reasonable definition, but if we allow them now and later realize it should > > behave differently, we'll be in trouble because of breaking existing uses. > > OK > > > I can't really come up with a reasonable example that would actually use > > negative weights. Can you? That would probably help with defining the > > behavior correctly. > > No, but I'm not a statistician. I've seen them mentioned in contexts > that appear to be discussions among same, and again opinions vary. > > > Allowing negative weights has other consequences. For example, what if > > sum(W) ends up being 0? For example > > > > CREATE TABLE t (a float, b float); > > INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i); > > INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i); > > > > SELECT weighted_avg(a,b) FROM t; > > weighted_avg > > -- > > NaN > > (1 row) > > > > Is that the correct behavior? Why? > > It's not, and you're right. > > I will send a patch that disallows negative weights this evening or > tomorrow. It will be slightly more complicated as I believe I will > need to create a new accumulator function for the weighted_avg() case > where I had been using an extant one before. > > Cheers, > David. Sorry about the delay. This patch disallows negative weights, although it still has that odd difference Jeff found. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 88145c5..b939340 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12750,6 +12750,29 @@ NULL baz(3 rows) +weighted_average + + +weighted_avg + + weighted_avg(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, numeric, or interval + + + numeric for any integer-type argument, + double precision for a floating-point argument, + otherwise the same as the argument data type + + the average (arithmetic mean) of all input values, weighted by the input weights + + + + + bit_and bit_and(expression) @@ -13430,6 +13453,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +population + + +weighted_stddev_pop + + weighted_stddev_pop(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, or numeric + + + double precision for floating-point arguments, + otherwise numeric + + weighted population standard deviation of the input values + + + + + standard deviation sample @@ -13454,6 +13500,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +sample + + +weighted_stddev_samp + + weighted_stddev_samp(value expression, weight expression) + + +
Re: [HACKERS] Weighted Stats
On Sat, Mar 19, 2016 at 05:04:08PM +0100, Tomas Vondra wrote: > Hi, > > On 03/19/2016 07:34 AM, David Fetter wrote: > >On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > >>On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: > >>> > >>>Please find attached a patch that uses the float8 version to cover the > >>>numeric types. > >> > >>Is there a well-defined meaning for having a negative weight? If no, > >>should it be disallowed? > > > >Opinions on this appear to vary. A Wikipedia article defines weights > >as non-negative, while a manual to which it refers only uses non-zero. > > > >https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition > >https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html > > I don't think that actually allows negative weights. It says that > >w_i = 1/\sigma_i^2 > > and variance is always > 0, thus w_i > 0. The zero is used as a special flag > to remove the sample from the data set in a simple way. > > >I'm not sure which if either would be authoritative, but I could > >certainly make up variants for each assumption. > > > >The assumption they have in common about weights is that a zero > >weight is not part of the calculation, which assumption is > >implemented in the previously submitted code. > > I think that if we're not sure what should happen with negative weights, > then we should disallow them. It's easy to allow them later once we have a > reasonable definition, but if we allow them now and later realize it should > behave differently, we'll be in trouble because of breaking existing uses. OK > I can't really come up with a reasonable example that would actually use > negative weights. Can you? That would probably help with defining the > behavior correctly. No, but I'm not a statistician. I've seen them mentioned in contexts that appear to be discussions among same, and again opinions vary. > Allowing negative weights has other consequences. For example, what if > sum(W) ends up being 0? For example > > CREATE TABLE t (a float, b float); > INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i); > INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i); > > SELECT weighted_avg(a,b) FROM t; > weighted_avg > -- > NaN > (1 row) > > Is that the correct behavior? Why? It's not, and you're right. I will send a patch that disallows negative weights this evening or tomorrow. It will be slightly more complicated as I believe I will need to create a new accumulator function for the weighted_avg() case where I had been using an extant one before. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
Hi, On 03/19/2016 07:34 AM, David Fetter wrote: On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: Please find attached a patch that uses the float8 version to cover the numeric types. Is there a well-defined meaning for having a negative weight? If no, should it be disallowed? Opinions on this appear to vary. A Wikipedia article defines weights as non-negative, while a manual to which it refers only uses non-zero. https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html I don't think that actually allows negative weights. It says that w_i = 1/\sigma_i^2 and variance is always > 0, thus w_i > 0. The zero is used as a special flag to remove the sample from the data set in a simple way. I'm not sure which if either would be authoritative, but I could certainly make up variants for each assumption. The assumption they have in common about weights is that a zero weight is not part of the calculation, which assumption is implemented in the previously submitted code. I think that if we're not sure what should happen with negative weights, then we should disallow them. It's easy to allow them later once we have a reasonable definition, but if we allow them now and later realize it should behave differently, we'll be in trouble because of breaking existing uses. I can't really come up with a reasonable example that would actually use negative weights. Can you? That would probably help with defining the behavior correctly. Allowing negative weights has other consequences. For example, what if sum(W) ends up being 0? For example CREATE TABLE t (a float, b float); INSERT INTO t SELECT i, 1 FROM generate_series(1,1000) s(i); INSERT INTO t SELECT i, -1 FROM generate_series(1,1000) s(i); SELECT weighted_avg(a,b) FROM t; weighted_avg -- NaN (1 row) Is that the correct behavior? Why? So -1 to allowing negative weights, unless we can come up with proper definition or at least good examples demonstrating the usefulness. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Fri, Mar 18, 2016 at 11:34 PM, David Fetter wrote: > On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: >> Also, I think it might not give the correct answer even without >> negative weights: >> >> create table foo as select floor(random()*1)::int val from >> generate_series(1,1000); >> >> create table foo2 as select val, count(*) from foo group by val; >> >> Shouldn't these then give the same result: >> >> select stddev_samp(val) from foo; >> stddev_samp >> --- >> 2887.054977297105 >> >> select weighted_stddev_samp(val,count) from foo2; >> weighted_stddev_samp >> -- >> 2887.19919651336 >> >> The 5th digit seems too early to be seeing round-off error. > > Please pardon me if I've misunderstood, but you appear to be assuming > that > > SELECT val, count(*) FROM foo GROUP BY val > > will produce precisely identical count(*)s at each row, which it > overwhelmingly likely won't, producing the difference you see above. I think the count for each val that gets put in foo2.count should be the same as the weight of that val as it occurs in foo. Surely they shouldn't all have the same weight in foo2, unless they all have the same number of appearances in foo. Which, as you say, they are not likely to. But still, the foo2.count that they do individually get should be equal to their weight, shouldn't it? The other two methods (*avg and *stddev_pop) do give the same answers using the two different methods (unweighted against foo, weighted against foo2) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Fri, Mar 18, 2016 at 06:12:12PM -0700, Jeff Janes wrote: > On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: > > > > Please find attached a patch that uses the float8 version to cover the > > numeric types. > > Is there a well-defined meaning for having a negative weight? If no, > should it be disallowed? Opinions on this appear to vary. A Wikipedia article defines weights as non-negative, while a manual to which it refers only uses non-zero. https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Mathematical_definition https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html I'm not sure which if either would be authoritative, but I could certainly make up variants for each assumption. The assumption they have in common about weights is that a zero weight is not part of the calculation, which assumption is implemented in the previously submitted code. > I don't know what I was expecting, but not this: > > select weighted_avg(x,1000-2*x) from generate_series(1,1000) f(x); >weighted_avg > -- > 1671666717.1 I'm guessing that negative weights can cause bizarre outcomes, assuming it turns out we should allow them. > Also, I think it might not give the correct answer even without > negative weights: > > create table foo as select floor(random()*1)::int val from > generate_series(1,1000); > > create table foo2 as select val, count(*) from foo group by val; > > Shouldn't these then give the same result: > > select stddev_samp(val) from foo; > stddev_samp > --- > 2887.054977297105 > > select weighted_stddev_samp(val,count) from foo2; > weighted_stddev_samp > -- > 2887.19919651336 > > The 5th digit seems too early to be seeing round-off error. Please pardon me if I've misunderstood, but you appear to be assuming that SELECT val, count(*) FROM foo GROUP BY val will produce precisely identical count(*)s at each row, which it overwhelmingly likely won't, producing the difference you see above. What have I misunderstood? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Tue, Mar 15, 2016 at 8:36 AM, David Fetter wrote: > > Please find attached a patch that uses the float8 version to cover the > numeric types. Is there a well-defined meaning for having a negative weight? If no, should it be disallowed? I don't know what I was expecting, but not this: select weighted_avg(x,1000-2*x) from generate_series(1,1000) f(x); weighted_avg -- 1671666717.1 Also, I think it might not give the correct answer even without negative weights: create table foo as select floor(random()*1)::int val from generate_series(1,1000); create table foo2 as select val, count(*) from foo group by val; Shouldn't these then give the same result: select stddev_samp(val) from foo; stddev_samp --- 2887.054977297105 select weighted_stddev_samp(val,count) from foo2; weighted_stddev_samp -- 2887.19919651336 The 5th digit seems too early to be seeing round-off error. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Fri, Jan 08, 2016 at 04:37:36PM +1100, Haribabu Kommi wrote: > On Mon, Dec 21, 2015 at 1:50 PM, David Fetter wrote: > > On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote: > >> On 11/2/15 5:46 PM, David Fetter wrote: > >> >I'd like to add weighted statistics to PostgreSQL > >> > >> Anything happen with this? If community isn't interested, ISTM it'd be good > >> to put this in PGXN. > > > > I think it's already in PGXN as an extension, and I'll get another > > version out this early this week, as it involves mostly adding some > > tests. > > > > I'll do the float8 ones for core this week, too, and unless there's a > > really great reason to do more data types on the first pass, it should > > be in committable shape. > > I reviewed the patch, following are my observations. > > 1. + precision, numeric, or interval > > with interval type it is giving problem. As interval data type is not > supported, > so remove it in the list of supported inputs. Done. > > 2. +float8_weighted_avg(PG_FUNCTION_ARGS) > > It will be helpful, if you provide some information as a function header, > how the weighted average is calculated similar like other weighted functions. Done. > 3. + transvalues = check_float8_array(transarray, > "float8_weighted_stddev_accum", 4); > > The second parameter to check_float8_array should be "float8_weighted_accum". Done. > 4. There is an OID conflict of 4066 with latest master code. Fixed. > 5.+ A += newvalW * ( newvalX - transvalues[2] ) / W; > + CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2]) > || isinf(1.0/W), true); > > + Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A); > + CHECKFLOATVAL(A, isinf(newvalX - transvalues[3]) || isinf(newvalX - > A) || isinf(1.0/W), true); > > > Is the need of calculation also needs to be passed to CHECKFLOATVAL? > Just passing > the variables involved in the calculation isn't enough? If expressions > are required then > it should be something as follows? > > CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) || > isinf(newvalX - transvalues[2]) || isinf(1.0/W), true); > > CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX - > transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true); Done. Please find attached a patch that uses the float8 version to cover the numeric types. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 000489d..09ada7e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -12645,6 +12645,29 @@ NULL baz(3 rows) +weighted_average + + +weighted_avg + + weighted_avg(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, numeric, or interval + + + numeric for any integer-type argument, + double precision for a floating-point argument, + otherwise the same as the argument data type + + the average (arithmetic mean) of all input values, weighted by the input weights + + + + + bit_and bit_and(expression) @@ -13288,6 +13311,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +population + + +weighted_stddev_pop + + weighted_stddev_pop(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, or numeric + + + double precision for floating-point arguments, + otherwise numeric + + weighted population standard deviation of the input values + + + + + standard deviation sample @@ -13311,6 +13357,29 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; +weighted standard deviation +sample + + +weighted_stddev_samp + + weighted_stddev_samp(value expression, weight expression) + + + smallint, int, + bigint, real, double + precision, or numeric + + + double precision for floating-point arguments, + otherwise numeric + + weighted sample standard deviation of the input values + + + + + variance variance(expression) diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c index d4e5d55..c03f712 100644 --- a/src/backend/utils/adt/float.c +++ b/src/backend/utils/adt/float.c @@ -2364,6 +2364,7 @@ setseed(PG_FUNCTION_ARGS) * float8_accum- accumulate for AVG(), var
Re: [HACKERS] Weighted Stats
I'm closing this for the current commitfest as returned-with-feedback. Please resubmit for the 2016-03 CF once you have it. Thanks! -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Fri, Jan 08, 2016 at 04:37:36PM +1100, Haribabu Kommi wrote: > On Mon, Dec 21, 2015 at 1:50 PM, David Fetter wrote: > > On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote: > >> On 11/2/15 5:46 PM, David Fetter wrote: > >> >I'd like to add weighted statistics to PostgreSQL > >> > >> Anything happen with this? If community isn't interested, ISTM it'd be good > >> to put this in PGXN. > > > > I think it's already in PGXN as an extension, and I'll get another > > version out this early this week, as it involves mostly adding some > > tests. > > > > I'll do the float8 ones for core this week, too, and unless there's a > > really great reason to do more data types on the first pass, it should > > be in committable shape. > > I reviewed the patch, following are my observations. > > 1. + precision, numeric, or interval > > with interval type it is giving problem. As interval data type is not > supported, > so remove it in the list of supported inputs. I'd meant to add more, but will make sure that the next version documents exactly the types it supports. > postgres=# select weighted_avg(f7,f1) from tbl; > ERROR: function weighted_avg(interval, smallint) does not exist at character > 8 > HINT: No function matches the given name and argument types. You > might need to add explicit type casts. > > > 2. +float8_weighted_avg(PG_FUNCTION_ARGS) > > It will be helpful, if you provide some information as a function header, > how the weighted average is calculated similar like other weighted functions. Will do. > 3. + transvalues = check_float8_array(transarray, > "float8_weighted_stddev_accum", 4); > > The second parameter to check_float8_array should be "float8_weighted_accum". Oops. Will fix. > 4. There is an OID conflict of 4066 with latest master code. Will fix. > 5.+ A += newvalW * ( newvalX - transvalues[2] ) / W; > + CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2]) > || isinf(1.0/W), true); > > + Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A); > + CHECKFLOATVAL(A, isinf(newvalX - transvalues[3]) || isinf(newvalX - > A) || isinf(1.0/W), true); > > > Is the need of calculation also needs to be passed to CHECKFLOATVAL? > Just passing > the variables involved in the calculation isn't enough? If expressions > are required then > it should be something as follows? > > CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) || > isinf(newvalX - transvalues[2]) || isinf(1.0/W), true); > > CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX - > transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true); Will fix. > I verified the stddev transition and final function calculations > according to wikipedia > and they are fine. Thanks for reviewing this! Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Mon, Dec 21, 2015 at 1:50 PM, David Fetter wrote: > On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote: >> On 11/2/15 5:46 PM, David Fetter wrote: >> >I'd like to add weighted statistics to PostgreSQL >> >> Anything happen with this? If community isn't interested, ISTM it'd be good >> to put this in PGXN. > > I think it's already in PGXN as an extension, and I'll get another > version out this early this week, as it involves mostly adding some > tests. > > I'll do the float8 ones for core this week, too, and unless there's a > really great reason to do more data types on the first pass, it should > be in committable shape. I reviewed the patch, following are my observations. 1. + precision, numeric, or interval with interval type it is giving problem. As interval data type is not supported, so remove it in the list of supported inputs. postgres=# select weighted_avg(f7,f1) from tbl; ERROR: function weighted_avg(interval, smallint) does not exist at character 8 HINT: No function matches the given name and argument types. You might need to add explicit type casts. 2. +float8_weighted_avg(PG_FUNCTION_ARGS) It will be helpful, if you provide some information as a function header, how the weighted average is calculated similar like other weighted functions. 3. + transvalues = check_float8_array(transarray, "float8_weighted_stddev_accum", 4); The second parameter to check_float8_array should be "float8_weighted_accum". 4. There is an OID conflict of 4066 with latest master code. 5.+ A += newvalW * ( newvalX - transvalues[2] ) / W; + CHECKFLOATVAL(A, isinf(newvalW) || isinf(newvalX - transvalues[2]) || isinf(1.0/W), true); + Q += newvalW * (newvalX - transvalues[2]) * (newvalX - A); + CHECKFLOATVAL(A, isinf(newvalX - transvalues[3]) || isinf(newvalX - A) || isinf(1.0/W), true); Is the need of calculation also needs to be passed to CHECKFLOATVAL? Just passing the variables involved in the calculation isn't enough? If expressions are required then it should be something as follows? CHECKFLOATVAL(A, isinf(transvalues[2]) || isinf(newvalW) || isinf(newvalX - transvalues[2]) || isinf(1.0/W), true); CHECKFLOATVAL(Q, isinf(transvalues[3]) || isinf(newvalX - transvalues[2]) || isinf(newvalX - A) || isinf(1.0/W), true); I verified the stddev transition and final function calculations according to wikipedia and they are fine. Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On Sun, Dec 20, 2015 at 06:13:33PM -0600, Jim Nasby wrote: > On 11/2/15 5:46 PM, David Fetter wrote: > >I'd like to add weighted statistics to PostgreSQL > > Anything happen with this? If community isn't interested, ISTM it'd be good > to put this in PGXN. I think it's already in PGXN as an extension, and I'll get another version out this early this week, as it involves mostly adding some tests. I'll do the float8 ones for core this week, too, and unless there's a really great reason to do more data types on the first pass, it should be in committable shape. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Weighted Stats
On 11/2/15 5:46 PM, David Fetter wrote: I'd like to add weighted statistics to PostgreSQL Anything happen with this? If community isn't interested, ISTM it'd be good to put this in PGXN. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers