Re: Does rewriteTargetListIU still need to add UPDATE tlist entries?

2021-04-26 Thread Dean Rasheed
On Mon, 26 Apr 2021 at 15:09, Tom Lane wrote: > > Thanks for looking at that. On reflection I think this must be so, > because those rewriter mechanisms were designed long before we had > trigger-updatable views, and rewriteTargetListIU has never added > tlist items like this for any other sort

Re: pgbench - add pseudo-random permutation function

2021-04-06 Thread Dean Rasheed
On Mon, 5 Apr 2021 at 13:07, Fabien COELHO wrote: > > Attached a v28 which I hope fixes the many above issues and stays with > ints. The randu64 is still some kind of a joke, I artificially reduced the > cost by calling jrand48 once and extending it to 64 bits, so it could give > an idea of the

pgsql: pgbench: Function to generate random permutations.

2021-04-06 Thread Dean Rasheed
pgbench: Function to generate random permutations. This adds a new function, permute(), that generates pseudorandom permutations of arbitrary sizes. This can be used to randomly shuffle a set of values to remove unwanted correlations. For example, permuting the output from a non-uniform random

Re: pgbench - add pseudo-random permutation function

2021-04-04 Thread Dean Rasheed
On Fri, 2 Apr 2021 at 06:38, Fabien COELHO wrote: > > >> r = (uint64) (pg_erand48(random_state.xseed) * size); > >> > >> I do not understand why the random values are multiplied by anything in > >> the first place… > > > > These are just random integers in the range [0,mask] and [0,size-1], > >

Re: pgbench - add pseudo-random permutation function

2021-04-01 Thread Dean Rasheed
On Wed, 31 Mar 2021 at 18:53, Fabien COELHO wrote: > > While looking at it, I have some doubts on this part: > > m = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)) | 1; > r = (uint64) (pg_erand48(random_state.xseed) * (mask + 1)); > r = (uint64) (pg_erand48(random_state.xseed) *
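The quoted lines draw an odd multiplier m (the "| 1") and an offset r below a power-of-two bound (mask + 1). As a rough illustration of why that shape is useful, here is a minimal sketch of a seeded permutation of [0, size). It is not the algorithm that was committed to pgbench: the name permute_sketch, the single affine round, and the cycle-walking step are all assumptions made for this example.

```python
import random


def permute_sketch(value, size, seed):
    """Map value in [0, size) to a pseudorandom position in [0, size).

    Hypothetical illustration only; not pgbench's permute().
    """
    if size <= 0 or not 0 <= value < size:
        raise ValueError("need 0 <= value < size")

    # Smallest all-ones bit mask that covers size - 1.
    mask = (1 << (size - 1).bit_length()) - 1

    rng = random.Random(seed)
    m = rng.randrange(mask + 1) | 1  # odd, hence invertible modulo 2^k
    r = rng.randrange(mask + 1)      # additive offset

    x = value
    while True:
        # An affine map with an odd multiplier is a bijection on [0, mask].
        x = (x * m + r) & mask
        # Cycle-walk: keep applying the map until we land inside [0, size).
        if x < size:
            return x


# Every input maps to a distinct output for a fixed seed.
outputs = [permute_sketch(v, 10, seed=42) for v in range(10)]
```

Because the affine step is a bijection on the power-of-two domain, repeatedly applying it from a point inside [0, size) must eventually return to that range, so the loop terminates and distinct inputs give distinct outputs. A single round like this is statistically weak; it only shows the structure.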

Re: pgbench - add pseudo-random permutation function

2021-03-31 Thread Dean Rasheed
On Wed, 31 Mar 2021 at 09:02, Fabien COELHO wrote: > > >> First, I have a thing against erand48. > > > Also, there is a 64 bits seed provided to the function which instantly > ignores 16 of them, which looks pretty silly to me. > Yeah, that was copied from set_random_seed(). > At least, I

Re: pgbench - add pseudo-random permutation function

2021-03-30 Thread Dean Rasheed
On Tue, 30 Mar 2021 at 20:31, Dean Rasheed wrote: > > Yeah, that's probably a fair point. However, all the existing pgbench > random functions are using it, so I think it's fair enough for > permute() to do the same (and actually 2^48 is pretty huge). Switching > to a 64

Re: pgbench - add pseudo-random permutation function

2021-03-30 Thread Dean Rasheed
On Tue, 30 Mar 2021 at 19:26, Fabien COELHO wrote: > > First, I have a thing against erand48. Yeah, that's probably a fair point. However, all the existing pgbench random functions are using it, so I think it's fair enough for permute() to do the same (and actually 2^48 is pretty huge).

Re: pgbench - add pseudo-random permutation function

2021-03-30 Thread Dean Rasheed
On Mon, 22 Mar 2021 at 13:43, Dean Rasheed wrote: > > On Sun, 14 Mar 2021 at 16:08, Fabien COELHO wrote: > > > > > My main question on this now is, do you have a scholar reference for > > > this algorithm? > > > > Nope, otherwise I would

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 19:59, Tomas Vondra wrote: > > Attached is an updated patch series, with all the changes discussed > here. I've cleaned up the ndistinct stuff a bit more (essentially > reverting back from GroupExprInfo to GroupVarInfo name), and got rid of > the

Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > here's an updated patch. 0001 The change to the way that CreateStatistics() records dependencies isn't quite right -- recordDependencyOnSingleRelExpr() will not create any dependencies if the expression uses only a whole-row Var. However,

Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > Actually, I don't think we need that block at all - there's no point in > keeping the exact expression, because if there was a statistics matching > it it'd be matched by the examine_variable. So if we get here, we have > to just split it into the

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 16:48, Tomas Vondra wrote: > > As for the changes proposed in the create_statistics, do we really want > to use univariate / multivariate there? Yes, the terms are correct, but > I'm not sure how many people looking at CREATE STATISTICS will > understand them. > Hmm, I

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 14:48, Tomas Vondra wrote: > > AFAIK the primary issue here is that the two places disagree. While > estimate_num_groups does this > > varnos = pull_varnos(root, (Node *) varshere); > if (bms_membership(varnos) == BMS_SINGLETON) > { ... } > > the

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 10:22, Tomas Vondra wrote: > > Thanks, it seems to be some thinko in handling in PlaceHolderVars, which > seem to break the code's assumptions about varnos. This fixes it for me, > but I need to look at it more closely. > I think that makes sense. Reviewing the docs, I

Re: pgbench - add pseudo-random permutation function

2021-03-22 Thread Dean Rasheed
On Sun, 14 Mar 2021 at 16:08, Fabien COELHO wrote: > > > My main question on this now is, do you have a scholar reference for > > this algorithm? > > Nope, otherwise I would have put a reference. I'm a scholar though, if > it helps:-) > > I could not find any algorithm that fitted the bill. The

Re: PoC/WIP: Extended statistics on expressions

2021-03-18 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 21:31, Tomas Vondra wrote: > > I agree applying at least the [(a+b),c] stats is probably the right > approach, as it means we're considering at least the available > information about dependence between the columns. > > I think to improve this, we'll need to teach the code

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 20:48, Dean Rasheed wrote: > > For reference, here is the test case I was using (which isn't really very > good for > catching dependence between columns): > And here's a test case with much more dependence between the columns: DROP TABLE IF EXISTS foo; C

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 19:07, Tomas Vondra wrote: > > On 3/17/21 7:54 PM, Dean Rasheed wrote: > > > > it might have been better to estimate the first case as > > > > ndistinct((a+b)) * ndistinct(c) * ndistinct(d) > > > > and the second case as >

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 17:26, Tomas Vondra wrote: > > My concern is that the current behavior (where we prefer expression > stats over multi-column stats to some extent) works fine as long as the > parts are independent, but once there's dependency it's probably more > likely to produce

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Sun, 7 Mar 2021 at 21:10, Tomas Vondra wrote: > > 2) ndistinct > > There's one thing that's bugging me, in how we handle "partial" matches. > For each expression we track both the original expression and the Vars > we extract from it. If we can't find a statistics matching the whole >

Re: pgbench - add pseudo-random permutation function

2021-03-11 Thread Dean Rasheed
On Thu, 11 Mar 2021 at 19:06, David Bowen wrote: > > The algorithm for generating a random permutation with a uniform distribution > across all permutations is easy: > for (i=0; i<n; i++) { > swap a[n-i] with a[rand(n-i+1)] > } > > where 0 <= rand(x) < x and a[i] is initially i (see Knuth, Section 3.4.2
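The swap loop described in the quoted message is the classic in-place shuffle. A minimal sketch follows, using zero-based indexing rather than the indexing in the quote; the function name shuffle_sketch and the choice of random.Random as the generator are assumptions for illustration.

```python
import random


def shuffle_sketch(n, rng):
    """Return a uniformly random permutation of 0..n-1 by repeated swaps."""
    a = list(range(n))  # a[i] is initially i
    # Walk from the last slot down to slot 1; slot 0 needs no swap.
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        a[i], a[j] = a[j], a[i]
    return a


perm = shuffle_sketch(8, random.Random(1))
```

Note that this approach materialises the whole array, so its memory use grows linearly with n.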

Re: pgbench - add pseudo-random permutation function

2021-03-11 Thread Dean Rasheed
On Thu, 11 Mar 2021 at 00:58, Bruce Momjian wrote: > > Maybe Dean Rasheed can help because of his math background --- CC'ing him. > Reading the thread I can see how such a function might be useful to scatter non-uniformly random values. The implementation looks plausible too, thoug

Re: PoC/WIP: Extended statistics on expressions

2021-03-05 Thread Dean Rasheed
On Thu, 4 Mar 2021 at 22:16, Tomas Vondra wrote: > > Attached is a slightly improved version of the patch series, addressing > most of the issues raised in the previous message. Cool. Sorry for the delay replying. > 0003-Extended-statistics-on-expressions-20210304.patch > > Mostly unchanged,

Re: PoC/WIP: Extended statistics on expressions

2021-01-22 Thread Dean Rasheed
On Fri, 22 Jan 2021 at 04:46, Justin Pryzby wrote: > > I think you'll maybe have to do something better - this seems a bit too weird: > > | postgres=# CREATE STATISTICS s2 ON (i+1) ,i FROM t; > | postgres=# \d t > | ... > | "public"."s2" (ndistinct, dependencies, mcv) ON i FROM t > I guess

Re: PoC/WIP: Extended statistics on expressions

2021-01-21 Thread Dean Rasheed
On Tue, 19 Jan 2021 at 01:57, Tomas Vondra wrote: > > > A slightly bigger issue that I don't like is the way it assigns > > attribute numbers for expressions starting from > > MaxHeapAttributeNumber+1, so the first expression has an attnum of > > 1601. That leads to pretty inefficient use of

Re: PoC/WIP: Extended statistics on expressions

2021-01-18 Thread Dean Rasheed
Looking through extended_stats.c, I found a corner case that can lead to a seg-fault: CREATE TABLE foo(); CREATE STATISTICS s ON (1) FROM foo; ANALYSE foo; This crashes in lookup_var_attr_stats(), because it isn't expecting nvacatts to be 0. I can't think of any case where building stats on a

Re: PoC/WIP: Extended statistics on expressions

2021-01-07 Thread Dean Rasheed
Starting to look at the planner code, I found an oversight in the way expression stats are read at the start of planning -- it is necessary to call ChangeVarNodes() on any expressions if the relid isn't 1, otherwise the stats expressions may contain Var nodes referring to the wrong relation.

Re: PoC/WIP: Extended statistics on expressions

2021-01-06 Thread Dean Rasheed
Looking over the statscmds.c changes, there are a few XXX's and FIXME's that need resolving, and I had a couple of other minor comments: + /* +* An expression using mutable functions is probably wrong, +* since if you aren't going to get the same result for the +

Re: PoC/WIP: Extended statistics on expressions

2021-01-05 Thread Dean Rasheed
On Tue, 5 Jan 2021 at 00:45, Tomas Vondra wrote: > > On 1/4/21 4:34 PM, Dean Rasheed wrote: > > > > * In src/bin/psql/describe.c, I think the \d output should also > > exclude the "expressions" stats kind and just list the other kinds (or > > have no kinds

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- REL_11_STABLE Details ---

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- master Details ---

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- REL_13_STABLE Details ---

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- REL_12_STABLE Details ---

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- REL9_6_STABLE Details ---

pgsql: Add an explicit cast to double when using fabs().

2021-01-05 Thread Dean Rasheed
Add an explicit cast to double when using fabs(). Commit bc43b7c2c0 used fabs() directly on an int variable, which apparently requires an explicit cast on some platforms. Per buildfarm. Branch -- REL_10_STABLE Details ---

pgsql: Fix numeric_power() when the exponent is INT_MIN.

2021-01-05 Thread Dean Rasheed
Fix numeric_power() when the exponent is INT_MIN. In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast

pgsql: Fix numeric_power() when the exponent is INT_MIN.

2021-01-05 Thread Dean Rasheed
Fix numeric_power() when the exponent is INT_MIN. In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast

pgsql: Fix numeric_power() when the exponent is INT_MIN.

2021-01-05 Thread Dean Rasheed
Fix numeric_power() when the exponent is INT_MIN. In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast

pgsql: Fix numeric_power() when the exponent is INT_MIN.

2021-01-05 Thread Dean Rasheed
Fix numeric_power() when the exponent is INT_MIN. In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast

pgsql: Fix numeric_power() when the exponent is INT_MIN.

2021-01-05 Thread Dean Rasheed
Fix numeric_power() when the exponent is INT_MIN. In power_var_int(), the computation of the number of significant digits to use in the computation used log(Abs(exp)), which isn't safe because Abs(exp) returns INT_MIN when exp is INT_MIN. Use fabs() instead of Abs(), so that the exponent is cast
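To see why an integer Abs() is unsafe at INT_MIN while fabs() is not, the sketch below emulates 32-bit two's-complement wraparound in Python. The helper abs_int32 is hypothetical, and the wraparound it models is an assumption about typical platforms (negating INT_MIN is formally undefined behaviour in C).

```python
import math

INT_MIN = -2**31


def abs_int32(x):
    """Emulate an integer abs() whose result wraps to signed 32 bits."""
    result = -x if x < 0 else x
    # Wrap into the signed 32-bit range, as two's-complement hardware would.
    return (result + 2**31) % 2**32 - 2**31


# The integer version cannot represent 2^31, so it hands back INT_MIN.
wrapped = abs_int32(INT_MIN)

# Converting to double first avoids the problem: 2^31 is exactly representable.
safe = math.fabs(INT_MIN)
digits_estimate = math.log(safe)
```

With the wrapped value, a subsequent log() would receive a negative argument (Python raises ValueError there, whereas C's log() returns NaN), which is the kind of input the commit message describes as unsafe.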

Bug in numeric_power() if exponent is INT_MIN

2021-01-04 Thread Dean Rasheed
(Amusingly I only found this after discovering that Windows Calculator has a similar bug which causes it to crash if you try to raise a number to the power INT_MIN.) On my machine, numeric_power() loses all precision if the exponent is INT_MIN, though the actual failure mode might well be

Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Dean Rasheed
On Fri, 11 Dec 2020 at 20:17, Tomas Vondra wrote: > > OK. Attached is an updated version, reworking it this way. Cool. I think this is an exciting development, so I hope it makes it into the next release. I have started looking at it. So far I have only looked at the catalog, parser and client

Re: PoC/WIP: Extended statistics on expressions

2020-12-11 Thread Dean Rasheed
On Tue, 8 Dec 2020 at 12:44, Tomas Vondra wrote: > > Possibly. But I don't think it's worth the extra complexity. I don't > expect people to have a lot of overlapping stats, so the amount of > wasted space and CPU time is expected to be fairly limited. > > So I don't think it's worth spending too

pgsql: Improve estimation of ANDs under ORs using extended statistics.

2020-12-08 Thread Dean Rasheed
bare AND clauses, looking for compatible RestrictInfo clauses underneath them. Dean Rasheed, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=cfbvwrvhflj_...@mail.gmail.com Branch -- master Details --- https://git.postgresql.org/pg

pgsql: Improve estimation of OR clauses using multiple extended statist

2020-12-08 Thread Dean Rasheed
object do not apply to clauses covered by other statistics objects. Dean Rasheed, reviewed by Tomas Vondra. Discussion: https://postgr.es/m/CAEZATCW=J65GUFm50RcPv-iASnS2mTXQbr=cfbvwrvhflj_...@mail.gmail.com Branch -- master Details --- https://git.postgresql.org/pg/commitdiff

Re: Additional improvements to extended statistics

2020-12-07 Thread Dean Rasheed
On Wed, 2 Dec 2020 at 15:51, Dean Rasheed wrote: > > The sort of queries I had in mind were things like this: > > WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2) > > However, the new code doesn't apply the extended stats directly using > clauselist_selectivity_or() for this

Re: PoC/WIP: Extended statistics on expressions

2020-12-07 Thread Dean Rasheed
On Mon, 7 Dec 2020 at 14:15, Tomas Vondra wrote: > > On 12/7/20 10:56 AM, Dean Rasheed wrote: > > it might actually be > > neater to have separate documented syntaxes for single- and > > multi-column statistics: > > > > CREATE STATISTICS [ IF NOT EXISTS ] st

Re: PoC/WIP: Extended statistics on expressions

2020-12-07 Thread Dean Rasheed
On Thu, 3 Dec 2020 at 15:23, Tomas Vondra wrote: > > Attached is a patch series rebased on top of 25a9e54d2d. After reading this thread and [1], I think I prefer the name "standard" rather than "expressions", because it is meant to describe the kind of statistics being built rather than what

Re: Additional improvements to extended statistics

2020-12-03 Thread Dean Rasheed
On Wed, 2 Dec 2020 at 16:34, Tomas Vondra wrote: > > On 12/2/20 4:51 PM, Dean Rasheed wrote: > > > > Barring any further comments, I'll push this sometime soon. > > +1 > Pushed. Regards, Dean

pgsql: Improve estimation of OR clauses using extended statistics.

2020-12-03 Thread Dean Rasheed
Improve estimation of OR clauses using extended statistics. Formerly we only applied extended statistics to an OR clause as part of the clauselist_selectivity() code path for an OR clause appearing in an implicitly-ANDed list of clauses. This meant that it could only use extended statistics if

Re: Additional improvements to extended statistics

2020-12-02 Thread Dean Rasheed
On Sun, 29 Nov 2020 at 21:02, Tomas Vondra wrote: > > I wonder how much of the comment before clauselist_selectivity should > move to clauselist_selectivity_ext - it does talk about range clauses > and so on, but clauselist_selectivity does not really deal with that. > But maybe that's just an

Re: Additional improvements to extended statistics

2020-12-01 Thread Dean Rasheed
On Sun, 29 Nov 2020 at 21:02, Tomas Vondra wrote: > > Those are fairly minor issues. I don't have any deeper objections, and > it seems committable. Do you plan to do that sometime soon? > OK, I've updated the patch status in the CF app, and I should be able to push it in the next day or so.

Re: proposal: possibility to read dumped table's name from file

2020-11-26 Thread Dean Rasheed
On Thu, 26 Nov 2020 at 06:43, Pavel Stehule wrote: > > On Wed, 25 Nov 2020 at 21:00, Tom Lane wrote: >> >> (One thing to consider is >> how painful will it be for people to quote table names containing >> funny characters, for instance. On the command line, we largely >> depend on the

Re: proposal: possibility to read dumped table's name from file

2020-11-25 Thread Dean Rasheed
On Thu, 19 Nov 2020 at 19:57, Pavel Stehule wrote: > > minor update - fixed handling of processing names with double quotes inside > I see this is marked RFC, but reading the thread it doesn't feel like we have reached consensus on the design for this feature. I agree that being able to

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

2020-11-23 Thread Dean Rasheed
On Sun, 22 Nov 2020 at 20:58, Tom Lane wrote: > > I found only one nitpicky bug: in > findDefaultOnlyColumns, the test must be bms_is_empty(default_only_cols) > not just default_only_cols == NULL, or it will fail to fall out early > as intended when the first row contains some DEFAULTs but later

Re: [bug+patch] Inserting DEFAULT into generated columns from VALUES RTE

2020-11-20 Thread Dean Rasheed
On Sun, 6 Sept 2020 at 22:42, Tom Lane wrote: > > I think you'd be better off to make transformInsertStmt(), specifically > its multi-VALUES-rows code path, check for all-DEFAULT columns and adjust > the tlist itself. Doing it there might be a good bit less inefficient > for very long VALUES

Re: Additional improvements to extended statistics

2020-11-19 Thread Dean Rasheed
On Wed, 18 Nov 2020 at 22:37, Tomas Vondra wrote: > > Seems fine to me, although the "_opt_ext_stats" is rather cryptic. > AFAICS we use "_internal" for similar functions. > There's precedent for using "_opt_xxx" for function variants that add an option to existing functions, but I agree that in

Re: Additional improvements to extended statistics

2020-11-17 Thread Dean Rasheed
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra wrote: > > Here is an improved WIP version of the patch series, modified to address > the issue with repeatedly applying the extended statistics, as discussed > with Dean in this thread. It's a bit rough and not committable, but I > need some feedback so

Re: Additional improvements to extended statistics

2020-11-12 Thread Dean Rasheed
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra wrote: > > Here is an improved WIP version of the patch series, modified to address > the issue with repeatedly applying the extended statistics, as discussed > with Dean in this thread. It's a bit rough and not committable, but I > need some feedback so

Re: Infinities in type numeric

2020-07-22 Thread Dean Rasheed
On Tue, 21 Jul 2020 at 23:18, Tom Lane wrote: > > Here's a v4 that syncs numeric in_range() with the new behavior of > float in_range(), and addresses your other comments too. > LGTM. Regards, Dean

Re: Wrong results from in_range() tests with infinite offset

2020-07-21 Thread Dean Rasheed
On Tue, 21 Jul 2020 at 03:06, Tom Lane wrote: > > Pushed, but I chickened out of back-patching. The improvement in what > happens for finite comparison values seems somewhat counterbalanced by > the possibility that someone might not like the definition we arrived > at for infinities. So, it's

Re: Wrong results from in_range() tests with infinite offset

2020-07-18 Thread Dean Rasheed
On Fri, 17 Jul 2020 at 01:59, Tom Lane wrote: > > Dean Rasheed writes: > > On Thu, 16 Jul 2020, 22:50 Tom Lane, wrote: > >> Actually, after staring at those results awhile longer, I decided > >> they were wrong. The results shown here seem actually sane --

Re: NaN divided by zero should yield NaN

2020-07-17 Thread Dean Rasheed
On Thu, 16 Jul 2020 at 20:29, Tom Lane wrote: > > Dean Rasheed questioned this longstanding behavior: > > regression=# SELECT 'nan'::float8 / '0'::float8; > ERROR: division by zero > > After a bit of research I think he's right: per IEEE 754 this should > yield NaN, not

Re: Wrong results from in_range() tests with infinite offset

2020-07-16 Thread Dean Rasheed
On Thu, 16 Jul 2020, 22:50 Tom Lane, wrote: > I wrote: > > When the current row's value is +infinity, actual computation of > > base - offset would yield NaN, making it a bit unclear whether > > we should consider -infinity to be in-range. It seems to me that > > we should, as that gives more
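The NaN problem mentioned in the quoted text can be reproduced with ordinary IEEE 754 doubles. The sketch below uses a deliberately naive, hypothetical helper (in_range_preceding_naive is not PostgreSQL's in_range() signature) to show what happens when base - offset is computed literally.

```python
import math


def in_range_preceding_naive(value, base, offset):
    """Naive test for 'value >= base - offset' with no special cases."""
    return value >= base - offset


inf = math.inf

# infinity minus infinity has no defined value, so the result is NaN.
bound = inf - inf
bound_is_nan = math.isnan(bound)

# Every ordered comparison against NaN is false, so the naive test rejects
# -infinity even though the quoted message argues it should be in range.
naive_result = in_range_preceding_naive(-inf, inf, inf)
```

This is why an implementation has to handle infinite base and offset values explicitly instead of relying on the subtraction.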

Re: Infinities in type numeric

2020-07-15 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 18:24, Tom Lane wrote: > > The attached v3 patch fixes these things and also takes care of an > oversight in v2: I'd made numeric() apply typmod restrictions to Inf, > but not numeric_in() or numeric_recv(). I believe the patch itself > is in pretty good shape now, though

Re: factorial of negative numbers

2020-06-16 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 12:18, Peter Eisentraut wrote: > > On 2020-06-16 11:49, Dean Rasheed wrote: > > With [1], we could return 'Infinity', which would be more correct from > > a mathematical point of view, and might be preferable to erroring-out > > in some con

Re: factorial of negative numbers

2020-06-16 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 10:09, Juan José Santamaría Flecha wrote: > > It is defined as NaN (or undefined), which is not in the realm of integer > numbers. You might get a clear idea of the logic from [1], where they also > make a case for the error being ERRCODE_DIVISION_BY_ZERO. > > [1]

Re: factorial of negative numbers

2020-06-16 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 09:55, Bruce Momjian wrote: > > On Tue, Jun 16, 2020 at 08:31:21AM +0100, Dean Rasheed wrote: > > > > Most common implementations do regard factorial as undefined for > > anything other than positive integers, as well as following the > > co

Re: factorial of negative numbers

2020-06-16 Thread Dean Rasheed
On Tue, 16 Jun 2020 at 06:00, Ashutosh Bapat wrote: > > Division by zero is really undefined, 12345678 * 12345678 (just some numbers) > is out of range of say int4, but factorial of a negative number has some > meaning and is defined but PostgreSQL does not support it. > Actually, I think

Re: Infinities in type numeric

2020-06-15 Thread Dean Rasheed
On Fri, 12 Jun 2020 at 02:16, Tom Lane wrote: > > * I had to invent some semantics for non-standardized functions, > particularly numeric_mod, numeric_gcd, numeric_lcm. This area > could use review to be sure that I chose desirable behaviors. > I think the semantics you've chosen for

Re: Definitional issue: stddev_pop (and related) for 1 input

2020-06-13 Thread Dean Rasheed
On Fri, 12 Jun 2020 at 20:53, Tom Lane wrote: > > I wrote: > > Before v12, stddev_pop() had the following behavior with just a > > single input value: > > ... > > As of v12, though, all three cases produce 0. I am not sure what > > to think about that with respect to an infinity input, but I'm >

Re: Poll: are people okay with function/operator table redesign?

2020-05-05 Thread Dean Rasheed
On Mon, 4 May 2020 at 22:22, Tom Lane wrote: > > * is also quite sticky about inserting other sorts > of font-changing environments inside it. As an example, it'll let > you include but not , which seems pretty weird > to me. This is problematic in some places where it's desirable to > have

Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-01 Thread Dean Rasheed
On Tue, 31 Mar 2020 at 22:16, Tom Lane wrote: > > > Dean Rasheed writes: > >> ... > >> It looks to me as though the problem is that statext_store() needs to > >> take its lock on pg_statistic_ext_data *before* searching for the > >> stats tuple to upd

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-31 Thread Dean Rasheed
On Tue, 31 Mar 2020 at 04:39, David Rowley wrote: > > On Sat, 28 Mar 2020 at 22:22, David Rowley wrote: > > I'm unsure yet if this has caused an instability on lousyjack's run in > > [1]. > > pogona has just joined in on the fun [1], so, we're not out the woods > on this yet. I'll start having a

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-29 Thread Dean Rasheed
On Sat, 28 Mar 2020 at 13:18, Dean Rasheed wrote: > > OK, I've pushed that with your recommendation for that function name. > Does this now complete everything that you wanted to do for functional dependency stats for PG13? Re-reading the thread, I couldn't see anything else that neede

pgsql: Improve the performance and accuracy of numeric sqrt() and ln().

2020-03-28 Thread Dean Rasheed
Improve the performance and accuracy of numeric sqrt() and ln(). Instead of using Newton's method to compute numeric square roots, use the Karatsuba square root algorithm, which performs better for numbers of all sizes. In practice, this is 3-5 times faster for inputs with just a few digits and

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-28 Thread Dean Rasheed
On Wed, 25 Mar 2020 at 00:28, Tomas Vondra wrote: > > Seems OK to me. > > I'd perhaps name deps_clauselist_selectivity differently, it's a bit too > similar to dependencies_clauselist_selectivity. Perhaps something like > clauselist_apply_dependencies? But that's a minor detail. > OK, I've

pgsql: Prevent functional dependency estimates from exceeding column es

2020-03-28 Thread Dean Rasheed
Prevent functional dependency estimates from exceeding column estimates. Formerly we applied a functional dependency "a => b with dependency degree f" using the formula P(a,b) = P(a) * [f + (1-f)*P(b)] This leads to the possibility that the combined selectivity P(a,b) could exceed P(b), which
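A small worked example makes the problem with the quoted formula concrete. The selectivities and dependency degree below are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
def dependency_selectivity(p_a, p_b, f):
    """Former formula: P(a,b) = P(a) * [f + (1-f)*P(b)]."""
    return p_a * (f + (1.0 - f) * p_b)


# Assumed inputs: a matches half the rows, b matches a tenth,
# and the dependency a => b holds with degree 0.9.
p_a, p_b, f = 0.5, 0.1, 0.9

combined = dependency_selectivity(p_a, p_b, f)  # 0.5 * 0.91
exceeds_p_b = combined > p_b
```

Here the combined estimate is about 0.455, well above P(b) = 0.1, even though the rows matching both clauses can never outnumber the rows matching b alone.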

Re: INSERT ... OVERRIDING USER VALUE vs GENERATED ALWAYS identity columns

2020-03-27 Thread Dean Rasheed
On Fri, 27 Mar 2020 at 11:29, Peter Eisentraut wrote: > > We appear to have lost track of this. Ah yes, indeed! > I have re-read everything and > expanded your patch a bit with additional documentation and comments in > the tests. I looked that over, and it all looks good to me. Regards, Dean

Re: Some improvements to numeric sqrt() and ln()

2020-03-25 Thread Dean Rasheed
On Sun, 22 Mar 2020 at 22:16, Tom Lane wrote: > > With resolutions of the XXX items, I think this'd be committable. > Thanks for looking at this! Here is an updated patch with the following updates based on your comments: * Now uses integer arithmetic to compute res_weight and res_ndigits,

Re: Additional improvements to extended statistics

2020-03-24 Thread Dean Rasheed
On Tue, 24 Mar 2020 at 01:08, Tomas Vondra wrote: > > Hmmm. So let's consider a simple OR clause with two arguments, both > covered by single statistics object. Something like this: > >CREATE TABLE t (a int, b int); >INSERT INTO t SELECT mod(i, 10), mod(i, 10) > FROM

Re: Additional improvements to extended statistics

2020-03-23 Thread Dean Rasheed
On Sat, 21 Mar 2020 at 21:59, Tomas Vondra wrote: > > Ah, right. Yeah, I think that should work. I thought there would be some > volatility due to groups randomly not making it into the MCV list, but > you're right it's possible to construct the data in a way to make it > perfectly deterministic.

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-19 Thread Dean Rasheed
On Wed, 18 Mar 2020 at 00:29, Tomas Vondra wrote: > > OK, I took a look. I think from the correctness POV the patch is OK, but > I think the dependencies_clauselist_selectivity() function now got a bit > too complex. I've been able to parse it now, but I'm sure I'll have > trouble in the future

Re: Additional improvements to extended statistics

2020-03-19 Thread Dean Rasheed
On Wed, 18 Mar 2020 at 19:31, Tomas Vondra wrote: > > Attached is a rebased patch series, addressing both those issues. > > I've been wondering why none of the regression tests failed because of > the 0.0 vs. 1.0 issue, but I think the explanation is pretty simple - to > make the tests stable,

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Dean Rasheed
On Tue, 17 Mar 2020 at 15:37, Tomas Vondra wrote: > > On Tue, Mar 17, 2020 at 12:42:52PM +, Dean Rasheed wrote: > > >The other thing that I'm still concerned about is the possibility of > >returning estimates with P(a,b) > P(a) or P(b). I think that such a > >

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-17 Thread Dean Rasheed
On Sat, 14 Mar 2020 at 18:45, Tomas Vondra wrote: > > I realized there's one more thing that probably needs discussing. > Essentially, these two clause types are the same: > >a IN (1, 2, 3) > >(a = 1 OR a = 2 OR a = 3) > > but with 8f321bd1 we only recognize the first one as compatible

Re: Additional improvements to extended statistics

2020-03-15 Thread Dean Rasheed
On Sun, 15 Mar 2020 at 00:08, Tomas Vondra wrote: > > On Sat, Mar 14, 2020 at 05:56:10PM +0100, Tomas Vondra wrote: > > > >Attached is a patch series rebased on top of the current master, after > >committing the ScalarArrayOpExpr enhancements. I've updated the OR patch > >to get rid of the code

Re: Additional improvements to extended statistics

2020-03-13 Thread Dean Rasheed
On Mon, 9 Mar 2020 at 00:06, Tomas Vondra wrote: > > On Mon, Mar 09, 2020 at 01:01:57AM +0100, Tomas Vondra wrote: > > > >Attached is an updated patch series > >with parts 0002 and 0003 adding tests demonstrating the issue and then > >fixing it (both shall be merged to 0001). > > > > One day I

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-13 Thread Dean Rasheed
On Thu, 12 Mar 2020 at 17:30, Tomas Vondra wrote: > > I'm sorry, but I don't see how we could do this for arbitrary clauses. I > think we could do that for clauses that have equality semantics and > reference column values as a whole. So I think it's possible to do this > for IN clauses (which is

Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-03-12 Thread Dean Rasheed
[ For the sake of the archives, some of the discussion on the other thread [1-3] should really have been on this thread. ] On Sun, 2 Feb 2020 at 18:41, Tomas Vondra wrote: > > I think the challenge here is in applying the functional dependency > computed for the whole array to individual

Re: Additional improvements to extended statistics

2020-03-11 Thread Dean Rasheed
On Mon, 9 Mar 2020 at 18:19, Tomas Vondra wrote: > > On Mon, Mar 09, 2020 at 08:35:48AM +, Dean Rasheed wrote: > > > > P(a,b) = P(a) * [f + (1-f)*P(b)] > > > >because it might return a value that is larger than P(b), which > >obviously should not be
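
The concern about the formula quoted in this message can be checked numerically. This is a minimal sketch with made-up selectivities: the function name `dependency_estimate` and the sample values are hypothetical and not taken from the patch. A joint probability can never exceed either marginal, yet the formula does so whenever P(a) is large relative to P(b) and f is close to 1:

```python
def dependency_estimate(p_a, p_b, f):
    """Evaluate P(a,b) = P(a) * [f + (1-f)*P(b)].

    f is the degree of the functional dependency, from 0 (the clauses
    are treated as independent) to 1 (the second clause is fully implied).
    """
    return p_a * (f + (1.0 - f) * p_b)


# Hypothetical selectivities, chosen so that P(a) is much larger than P(b).
p_a, p_b, f = 0.5, 0.1, 0.9

estimate = dependency_estimate(p_a, p_b, f)
exceeds_marginal = estimate > p_b  # a joint probability should never do this

# With f = 0 the formula reduces to the independence assumption P(a)*P(b).
independent = dependency_estimate(p_a, p_b, 0.0)
```

With these inputs the estimate is about 0.455, well above P(b) = 0.1, which is the inconsistency the message describes.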

Re: Additional improvements to extended statistics

2020-03-09 Thread Dean Rasheed
On Mon, 9 Mar 2020 at 00:02, Tomas Vondra wrote: > > Speaking of which, would you take a look at [1]? I think supporting SAOP > is fine, but I wonder if you agree with my conclusion we can't really > support inclusion @> as explained in [2]. > Hmm, I'm not sure. However, thinking about your

Re: Additional improvements to extended statistics

2020-03-08 Thread Dean Rasheed
On Fri, 6 Mar 2020 at 12:58, Tomas Vondra wrote: > > Here is a rebased version of this patch series. I've polished the first > two parts a bit - estimation of OR clauses and (Var op Var) clauses. > Hi, I've been looking over the first patch (OR list support). It mostly looks reasonable to me,

Re: Some improvements to numeric sqrt() and ln()

2020-03-04 Thread Dean Rasheed
On Wed, 4 Mar 2020 at 14:41, David Steele wrote: > > Are these improvements targeted at PG13 or PG14? This seems a pretty > big change for the last CF of PG13. > Well of course that's not entirely up to me, but I was hoping to commit it for PG13. It's very well covered by a large number of

Re: Some improvements to numeric sqrt() and ln()

2020-03-03 Thread Dean Rasheed
On Tue, 3 Mar 2020 at 00:17, Tels wrote: > > Thank you for these patches, these sound like really nice improvements. Thanks for looking! > One thing came to my mind while reading the patch: > > +* If r < 0 Then > +* Let r = r + 2*s - 1 > +
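
The quoted patch comment is cut off, so the rest of the algorithm is not reproduced here. The one visible step can still be verified on its own: if a candidate root s is one too large, the remainder r = n - s^2 is negative, and because n - (s-1)^2 = (n - s^2) + 2*s - 1, adding 2*s - 1 yields the remainder for s - 1. The sketch below assumes that the step following the quoted line decrements s; the function name `adjust_root` is hypothetical:

```python
import math


def adjust_root(n, s):
    """Return (s, r) with r = n - s*s >= 0, given s that may be one too large."""
    r = n - s * s
    if r < 0:
        # n - (s-1)^2 == (n - s^2) + 2*s - 1, so r can be fixed up in place.
        r = r + 2 * s - 1
        s = s - 1  # assumed follow-up step, not visible in the truncated quote
    return s, r


n = 99
s, r = adjust_root(n, 10)      # 10*10 = 100 overshoots 99 by one
is_floor_root = (s == math.isqrt(n))
```

The correction costs one addition and one subtraction, so the remainder never has to be recomputed from scratch.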

Re: Some improvements to numeric sqrt() and ln()

2020-03-01 Thread Dean Rasheed
On Fri, 28 Feb 2020 at 08:15, Dean Rasheed wrote: > > It's possible that there are further gains to be had in the sqrt() > algorithm on platforms that support 128-bit integers, but I haven't > had a chance to investigate that yet. > Rebased patch attached, now using 128-bit in

pgsql: Fix corner-case loss of precision in numeric ln().

2020-03-01 Thread Dean Rasheed
Fix corner-case loss of precision in numeric ln(). When deciding on the local rscale to use for the Taylor series expansion, ln_var() neglected to account for the fact that the result is subsequently multiplied by a factor of 2^(nsqrt+1), where nsqrt is the number of square root operations
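
A rough sketch of why the multiplication matters: scaling a value by 2^(nsqrt+1) scales its absolute error by the same factor, so the intermediate result needs about (nsqrt+1)*log10(2) extra decimal digits to keep the final error unchanged. The helper below is an illustration of that arithmetic only. It is not the code in ln_var(), and the function name is hypothetical:

```python
import math


def extra_decimal_digits(nsqrt):
    """Decimal digits needed to absorb a later multiplication by 2**(nsqrt+1)."""
    factor_exponent = nsqrt + 1
    return math.ceil(factor_exponent * math.log10(2))


# For each number of square-root reductions, the extra digits d satisfy
# 10**d >= 2**(nsqrt+1), so the amplified error still fits in the margin.
margins = {nsqrt: extra_decimal_digits(nsqrt) for nsqrt in range(0, 12)}
```

For example, nine square-root operations imply a factor of 2^10 = 1024, which calls for four extra decimal digits.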

Some improvements to numeric sqrt() and ln()

2020-02-28 Thread Dean Rasheed
Attached is a WIP patch to improve the performance of numeric sqrt() and ln(), which also makes a couple of related improvements to div_var_fast(), all of which have knock-on benefits for other numeric functions. The actual impact varies greatly depending on the inputs, but the overall effect is

Re: Marking some contrib modules as trusted extensions

2020-01-31 Thread Dean Rasheed
On Wed, 29 Jan 2020 at 21:39, Tom Lane wrote: > > >>> pg_stat_statements > > Mmm, I'm not convinced --- the ability to see what statements are being > executed in other sessions (even other databases) is something that > paranoid installations might not be so happy about. Our previous >
