Re: [HACKERS] Row Level Security Documentation
On 26 September 2017 at 00:42, Stephen Frost <sfr...@snowman.net> wrote:
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> Attached is a proposed patch...
>
> I've taken a look through this and generally agree with it.

Thanks for looking at this.

> the bits inside ... tags should be consistently capitalized (you had
> one case of 'All' that I noticed)

Yes, that's a typo. Will fix.

> I wonder if it'd be better to have the "simple" description *first*
> instead of later on, eg, where you have "Thus the overall result is
> that" we might want to try and reword things to describe it as
> "Overall, it works thusly, ..., and the specifics are ...".

OK, that makes sense. I'll try it that way round.

> That's a relatively minor point, however, and I do feel that this patch
> is a definite improvement.  Were you thinking of just applying this for
> v10, or back-patching all or part of it..?

I was planning on back-patching it to 9.5, taking out the parts relating
to restrictive policies as appropriate. Currently the 9.5 and 9.6 docs
are identical, as are 10 and HEAD, and 9.5/9.6 only differ from 10/HEAD
in a couple of places where they mention restrictive policies. IMO we
should stick to that, making any improvements available in the
back-branches.

I was also thinking the same about the new summary table, although I
haven't properly reviewed that yet.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row Level Security Documentation
On 5 August 2017 at 10:03, Fabien COELHO wrote:
> Patch applies cleanly, make html ok, new table looks good to me.

So I started looking at this patch, but before even considering the new
table proposed, I think there are multiple issues that need to be
resolved with the current docs:

Firstly, there are 4 separate places in the current CREATE POLICY docs
that say that multiple policies are combined using OR, which is of
course no longer correct, since PG10 added RESTRICTIVE policy support.
In fact, it wasn't even strictly correct in 9.5/9.6, because if multiple
different types of policy apply to a single command (e.g. SELECT and
UPDATE policies, being applied to an UPDATE command), then they are
combined using AND. Rather than trying to make this correct in 4
different places, I think there should be a single new section of the
docs that explains how multiple policies are combined. This will also
reduce the number of places where the pre- and post-v10 docs differ,
making them easier to maintain.

In the 6th paragraph of the description section, the terminology used is
mixed up and does not match the terminology used elsewhere ("command"
instead of "policy" and "policy" instead of "expression").

The description of UPDATE policies lists the commands that the policy
applies to (UPDATE and INSERT ... ON CONFLICT DO UPDATE), but fails to
mention SELECT FOR UPDATE and SELECT FOR SHARE.

The proposed patch distinguishes between UPDATE/DELETE commands that
have WHERE or RETURNING clauses, and those that don't, claiming that the
former require SELECT permissions and the latter don't. That's based on
a couple of statements from the current docs, but it's not entirely
accurate. For example, "UPDATE foo SET a=NULL WHERE true RETURNING
now()" does not require SELECT permissions despite having both WHERE and
RETURNING clauses (OK, I admit that's a rather contrived example, but
still...), and conversely (a more realistic example) "UPDATE foo SET
a=b+c" does require SELECT permissions despite not having WHERE or
RETURNING clauses. I think we need to be more precise about when SELECT
permissions are required.

In the notes section, there is a note about there needing to be at least
one permissive policy for restrictive policies to take effect. That's
well worth pointing out; however, I fear that this note is buried so far
down the page (amongst some other very wordy notes) that readers will
miss it. I suggest moving it to the parameters section, where permissive
and restrictive policies are first introduced, because it's a pretty
crucial fact to be aware of when using these new types of policy.

Attached is a proposed patch to address these issues, along with a few
other minor tidy-ups.

Regards,
Dean

create-policy-docs.patch
Description: Binary data
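To make the combination rules described above concrete, here is a hedged
sketch (all table, column and policy names are hypothetical; the
restrictive-policy syntax applies to PG10 and later):

```sql
-- Hypothetical illustration of how multiple policies combine.
CREATE TABLE accounts (id int, owner text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- Two permissive SELECT policies: a row is visible if EITHER passes (OR).
CREATE POLICY sel_own ON accounts FOR SELECT
    USING (owner = current_user);
CREATE POLICY sel_admin ON accounts FOR SELECT
    USING (current_user = 'admin');

-- A restrictive policy (PG10+): visible rows must ALSO pass this (AND).
CREATE POLICY only_positive ON accounts AS RESTRICTIVE FOR SELECT
    USING (balance >= 0);

-- Net qual applied to SELECT, roughly:
--   (owner = current_user OR current_user = 'admin') AND balance >= 0
-- For an UPDATE command, any applicable UPDATE policies would then be
-- combined with the SELECT policies using AND.
```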
[HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
On 14 September 2017 at 03:49, Noah Misch wrote:
> On Wed, Sep 13, 2017 at 12:06:40PM -0400, Robert Haas wrote:
>> OK, thanks.  I still don't really like allowing this, but I can see
>> that compatibility with other systems has some value here, and if
>> nobody else is rejecting these cases, maybe we shouldn't either.  So
>> I'll hold my nose and change my vote to canonicalizing rather than
>> rejecting outright.
>
> I vote for rejecting it.  DDL compatibility is less valuable than other
> compatibility.  The hypothetical affected application can change its
> DDL to placate PostgreSQL and use that modified DDL for all other
> databases, too.

So we have 3 options:

1. Do nothing, allowing and keeping any values after a MINVALUE/MAXVALUE.
2. Allow such values, but canonicalise what we store.
3. Reject anything other than MINVALUE/MAXVALUE after MINVALUE/MAXVALUE.

My order of preference is still (1), (2) then (3) because:

- Compatibility.
- Less verbose / easier to type.
- We allow multiple syntaxes for equivalent constraints in other places,
  without canonicalising what the user enters.
- Regarding Robert's coding argument, code in general should not be
  looking at and basing decisions on any values it sees after a
  MINVALUE/MAXVALUE. If it does, at the very least it's doing more work
  than it needs to, and at worst, there's a potential bug there which
  would be more readily exposed by allowing arbitrary values there. Code
  that only worked because of earlier canonicalisation would strike me
  as being somewhat fragile.

However, it seems that there is a clear consensus towards (2) or (3) and
we have viable patches for each, although I'm not sure which of those
the consensus is really leaning towards. Robert, since partitioning was
originally your commit, and you know the wider codebase better, I'm
happy to go with whatever you decide.
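As a hedged sketch of the three options (table names are hypothetical,
and the exact error text under option 3 is illustrative):

```sql
CREATE TABLE t (a int, b int) PARTITION BY RANGE (a, b);

-- The "0" after MINVALUE below has no effect on the partition constraint:
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (MINVALUE, 0) TO (10, 20);

-- Option 1: accept the command and store (MINVALUE, 0) as entered.
-- Option 2: accept it, but canonicalise and store (MINVALUE, MINVALUE).
-- Option 3: reject it with an error along the lines of
--   ERROR: every bound following MINVALUE must also be MINVALUE
```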
Regards,
Dean
Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
On 13 September 2017 at 14:51, Robert Haas wrote:
> Coincidentally, I wrote a patch for this too, but mine goes back to
> rejecting MINVALUE or MAXVALUE followed by anything else.

LGTM, if we decide to go this way.

One minor review comment -- you missed an example code snippet using
(MINVALUE, 0) further down the page in create_table.sgml, which needs
updating.

Regards,
Dean
Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
On 13 September 2017 at 10:05, Amit Langote wrote:
> Coincidentally, I just wrote the patch for canonicalizing stored
> values, instead of erroring out.  Please see attached if that's what
> you were thinking too.

Looks reasonable to me, if we decide to go this way.

One minor review comment -- it isn't really necessary to have the
separate new boolean local variables as well as the datum kind
variables. Just tracking the datum kind is sufficient and slightly
simpler. That would also address a slight worry I have that your coding
might result in a compiler warning about the kind variables being used
uninitialised with some less intelligent compilers, not smart enough to
work out that it can't actually happen.

Regards,
Dean
Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
On 13 September 2017 at 14:53, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Sep 13, 2017 at 4:51 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> A drawback to doing this is that we lose compatibility with syntaxes
>> supported by other databases, which was part of the reason for
>> choosing the terms MINVALUE and MAXVALUE in the first place.
>>
>> So thinking about this afresh, my preference would actually be to just
>> canonicalise the values stored rather than erroring out.
>
> Can you be more specific about what other databases do here?  Which
> other systems support MINVALUE/MAXVALUE, and what are their respective
> behaviors in this situation?

Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE. Actually, Oracle and
MySQL only use MAXVALUE, not MINVALUE, because they don't allow gaps
between partitions and the first partition implicitly starts at
MINVALUE, so the bounds that we currently support are a strict superset
of those supported by Oracle and MySQL.

Both Oracle and MySQL allow finite values after MAXVALUE (usually listed
as "0" in code examples, e.g. see [1]). Oracle explicitly documents the
fact that values after MAXVALUE are irrelevant in [1]. I'm not sure if
MySQL explicitly documents that, but it does behave the same. Also, both
Oracle and MySQL store what the user entered (they do not canonicalise),
as can be seen by looking at ALL_TAB_PARTITIONS in Oracle, or "show
create table" in MySQL.

I have not used DB2.

Regards,
Dean

[1] https://docs.oracle.com/cd/E18283_01/server.112/e16541/part_admin001.htm
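For reference, a hedged sketch of the Oracle-style syntax being
described above (not PostgreSQL syntax; table and column names are
hypothetical):

```sql
-- Oracle-style range partitioning: only upper bounds are specified,
-- and values after MAXVALUE (the "0" below) are documented as
-- irrelevant and ignored.
CREATE TABLE sales (yr NUMBER, mon NUMBER)
PARTITION BY RANGE (yr, mon) (
    PARTITION p1 VALUES LESS THAN (2017, 7),
    PARTITION p2 VALUES LESS THAN (MAXVALUE, 0)
);
```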
Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
Robert Haas writes:
> On Tue, Sep 12, 2017 at 9:58 AM, Alvaro Herrera wrote:
>> Did anything happen on this, or did we just forget it completely?
>
> I forgot it.  :-(
>
> I really think we should fix this.

Ah, sorry. This was for me to follow up, and I dropped the ball. Here's
a patch restoring the original error checks (actually not the same
coding as used in previous versions of the patch, because that would
have allowed a MINVALUE after a MAXVALUE and vice versa).

A drawback to doing this is that we lose compatibility with syntaxes
supported by other databases, which was part of the reason for choosing
the terms MINVALUE and MAXVALUE in the first place.

So thinking about this afresh, my preference would actually be to just
canonicalise the values stored rather than erroring out. Thoughts?

Regards,
Dean

restore-minvalue-maxvalue-error-checks.patch
Description: Binary data
Re: [HACKERS] dubious error message from partition.c
On 9 August 2017 at 13:03, Robert Haas wrote:
> On Tue, Aug 8, 2017 at 11:34 PM, Tom Lane wrote:
>> A small suggestion is that it'd be better to write it like "Specified
>> upper bound \"%s\" precedes lower bound \"%s\"."  I think "succeeds"
>> has more alternate meanings than "precedes", so the wording you have
>> seems more confusing than it needs to be.  (Of course, the situation
>> could be the opposite in other languages, but translators have the
>> ability to reverse the ordering if they need to.)
>
> I think that doesn't quite work, because the failure is caused by LB
> <= UB, not LB < UB.  We could fix that by writing "precedes or equals"
> but that seems lame.  Maybe:
>
> Lower bound %s does not precede upper bound %s.

There was an earlier suggestion to use "greater than or equal to". I
think that would work quite well:

ERROR: invalid range bounds for partition \"%s\"
DETAIL: Lower bound %s is greater than or equal to upper bound %s.

Regards,
Dean
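A hedged sketch of the failure case being discussed (table names
hypothetical; the error wording is precisely what this thread is
deciding, so none is shown):

```sql
CREATE TABLE t (a int) PARTITION BY RANGE (a);

-- The lower bound (10) is greater than or equal to the upper bound (5),
-- so this partition definition would be rejected:
CREATE TABLE t1 PARTITION OF t FOR VALUES FROM (10) TO (5);
```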
[HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b
On 8 August 2017 at 19:22, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jul 21, 2017 at 4:24 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> Also drop the constraint prohibiting finite values after an unbounded
>> column, and just document the fact that any values after MINVALUE or
>> MAXVALUE are ignored.  Previously it was necessary to repeat UNBOUNDED
>> multiple times, which was needlessly verbose.
>
> I would like to (belatedly) complain about this part of this commit.
> Now you can do stuff like this:
>
> rhaas=# create table foo (a int, b text) partition by range (a, b);
> CREATE TABLE
> rhaas=# create table foo1 partition of foo for values from (minvalue,
> 0) to (maxvalue, 0);
> CREATE TABLE
>
> The inclusion of 0 has no effect, but it is still stored in the
> catalog, shows up in \d foo1, and somebody might think it does
> something.  I think we should go back to requiring bounds after
> minvalue or maxvalue to be minvalue or maxvalue.  I thought maybe the
> idea here was that you were going to allow something like this to
> work, which actually would have saved some typing:
>
> create table foo2 partition of foo for values from (minvalue) to (maxvalue);
>
> But no:
>
> ERROR: FROM must specify exactly one value per partitioning column
>
> So the only way that this has made things less verbose is by letting
> you put in a meaningless value of the data type which takes fewer than
> 8 characters to type.  That doesn't seem to me to be a defensible way
> of reducing verbosity.

Well, perhaps verbosity-reduction isn't sufficient justification, but I
still think this is correct, because logically any values in the bound
after MINVALUE/MAXVALUE are irrelevant, so it seems overly restrictive
to force all later values to be MINVALUE/MAXVALUE when the code will
just ignore those values.
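To illustrate the point about ignored values, a hedged sketch (table
names are hypothetical):

```sql
CREATE TABLE bar (a int, b int) PARTITION BY RANGE (a, b);

-- The two "0" values below are ignored: once a column's bound is
-- MINVALUE/MAXVALUE, all later columns are unbounded, and the implied
-- partition constraint does not reference them at all.
CREATE TABLE bar1 PARTITION OF bar
    FOR VALUES FROM (MINVALUE, 0) TO (MAXVALUE, 0);
```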
I don't think we should allow values after MINVALUE/MAXVALUE to be
omitted altogether, because we document multi-column ranges as being
most easily understood according to the rules of row-wise comparisons,
and they require equal numbers of values in each row.

It's also worth noting that this choice is consistent with other
databases, so at least some people will be used to being able to do
this.

Regards,
Dean
Re: [HACKERS] Minor comment update in partition.c
On 31 July 2017 at 12:53, Beena Emerson wrote:
> The commit d363d42bb9a4399a0207bd3b371c966e22e06bd3 changed
> RangeDatumContent *content to PartitionRangeDatumKind *kind, but a
> comment on function partition_rbound_cmp was left unedited and it
> still mentions content1 instead of kind1.
>
> PFA the patch to fix this.

Yes, you're right, thanks. Patch committed.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 17 July 2017 at 17:37, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
> On 17 July 2017 at 16:34, Robert Haas <robertmh...@gmail.com> wrote:
>> Do you want to own this open item, then?
>
> OK.
>
> I need to give the patch another read-through, and then I'll aim to
> push it sometime in the next few days.

Committed.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 17 July 2017 at 16:34, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Jul 16, 2017 at 6:40 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
>> also be done using MINVALUE/MAXVALUE, by artificially adding another
>> partitioning column and making it unbounded above/below, but that
>> would really just be a hack, and it (artificially adding an extra
>> column) would be unnecessary if we added INCLUSIVE/EXCLUSIVE support
>> in a later release.  Thus, I think the 2 features would complement
>> each other quite nicely.
>
> OK, works for me.  I'm not really keen about the MINVALUE/MAXVALUE
> syntax -- it's really +/- infinity, not a value at all -- but I
> haven't got a better proposal and yours at least has the virtue of
> perhaps being familiar to those who know about Oracle.

Cool. Sounds like we've reached a consensus, albeit with some
reservations around the fact that MINVALUE/MAXVALUE aren't actually
values, despite their names. +/- infinity *are* values for some
datatypes such as timestamps, so it had to be something different from
that, and MINVALUE/MAXVALUE are quite short and simple, and match the
syntax used by 3 other databases.

> Do you want to own this open item, then?

OK.

I need to give the patch another read-through, and then I'll aim to
push it sometime in the next few days.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 14 July 2017 at 06:12, Robert Haas wrote:
> I agree that it's a big problem that (10, UNBOUNDED)
> interpreted as a maximum value means first_column <= 10 and when
> interpreted as a minimum value means first_column >= 10, because those
> things aren't opposites of each other.  I guess the proposal here
> would make (10, MAXVALUE) as a maximum value mean first_column <= 10
> and as a minimum would mean first_column > 10, and contrariwise for
> MINVALUE.  That seems to restore the intended design principle of the
> system, which is good

Right. So in general, when using MINVALUE/MAXVALUE for the 2nd column of
a 2-column partitioning scheme, the partition constraints simplify as
follows:

FROM (x, MINVALUE) => col1 >= x
FROM (x, MAXVALUE) => col1 > x
TO (x, MINVALUE)   => col1 < x
TO (x, MAXVALUE)   => col1 <= x

which restores the property that one partition can be made contiguous
with another by having the upper bounds of one partition equal to the
lower bounds of the other. Note that the choice of MINVALUE or MAXVALUE
only affects whether the constraint on the previous column is inclusive
or exclusive. That's quite different from what an INCLUSIVE/EXCLUSIVE
flag would do.

> , but...
>
> ...originally, Amit proposed to attach a postfix INCLUSIVE or
> EXCLUSIVE to each bound specification, and this does feel like a bit
> of a back door to the same place, kinda.  A partition defined to run
> from (10, MAXVALUE) TO (11, MAXVALUE) is a lot like a partition
> defined to run from (10) EXCLUSIVE to (11) EXCLUSIVE.  And if we
> eventually decide to allow that, then what will be the difference
> between a partition which starts at (10, MAXVALUE) EXCLUSIVE and one
> which starts from (10, MAXVALUE) INCLUSIVE?
The INCLUSIVE/EXCLUSIVE flag would apply to the constraint as a whole:

FROM (x, y) INCLUSIVE => (col1, col2) >= (x, y)
FROM (x, y) EXCLUSIVE => (col1, col2) > (x, y)
TO (x, y) INCLUSIVE   => (col1, col2) <= (x, y)
TO (x, y) EXCLUSIVE   => (col1, col2) < (x, y)

which, when expanded out, actually only affects the constraint on the
final column, and then only in the case where all the other columns are
equal to the partition bound value:

FROM (x, y) INCLUSIVE => col1 > x OR (col1 = x AND col2 >= y)
FROM (x, y) EXCLUSIVE => col1 > x OR (col1 = x AND col2 > y)
TO (x, y) INCLUSIVE   => col1 < x OR (col1 = x AND col2 <= y)
TO (x, y) EXCLUSIVE   => col1 < x OR (col1 = x AND col2 < y)

So while MINVALUE/MAXVALUE makes a particular column unbounded
below/above, and as a side-effect can influence the inclusivity of the
preceding column, INCLUSIVE/EXCLUSIVE affects the inclusivity of the
final column (something that MINVALUE/MAXVALUE cannot do).

MINVALUE/MAXVALUE takes precedence, in the sense that if the bound on
any column is MINVALUE/MAXVALUE, that column and any later columns are
unbounded and no longer appear in the partition constraint expression,
and so any INCLUSIVE/EXCLUSIVE flag would have no effect. That seems
pretty intuitive to me -- "unbounded inclusive" is no different from
"unbounded exclusive".

Technically, anything that can be done using INCLUSIVE/EXCLUSIVE can
also be done using MINVALUE/MAXVALUE, by artificially adding another
partitioning column and making it unbounded above/below, but that would
really just be a hack, and it (artificially adding an extra column)
would be unnecessary if we added INCLUSIVE/EXCLUSIVE support in a later
release. Thus, I think the 2 features would complement each other quite
nicely.

Regards,
Dean
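A hedged sketch of the simplification rules discussed in this thread
(table and column names are hypothetical):

```sql
CREATE TABLE t (c1 int, c2 int) PARTITION BY RANGE (c1, c2);

-- TO (5, MINVALUE) simplifies to c1 < 5:
CREATE TABLE t_lo PARTITION OF t
    FOR VALUES FROM (MINVALUE, MINVALUE) TO (5, MINVALUE);

-- FROM (5, MINVALUE) TO (5, MAXVALUE) gives c1 >= 5 AND c1 <= 5,
-- i.e. exactly c1 = 5, with c2 unbounded:
CREATE TABLE t_eq PARTITION OF t
    FOR VALUES FROM (5, MINVALUE) TO (5, MAXVALUE);

-- FROM (5, MAXVALUE) gives c1 > 5; each lower bound above matches the
-- previous partition's upper bound, so the three partitions are
-- contiguous and non-overlapping:
CREATE TABLE t_hi PARTITION OF t
    FOR VALUES FROM (5, MAXVALUE) TO (MAXVALUE, MAXVALUE);
```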
Re: [HACKERS] Multi column range partition table
On 12 July 2017 at 10:46, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Jul 12, 2017 at 12:54 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> On 11 July 2017 at 13:29, Ashutosh Bapat
>>> The description in this paragraph seems to be attaching intuitive
>>> meaning of word "unbounded" to MAXVALUE and MINVALUE, which have
>>> different intuitive meanings of themselves.  Not sure if that's how
>>> we should describe MAXVALUE/MINVALUE.
>>
>> I'm not sure I understand your point.  MINVALUE and MAXVALUE do mean
>> unbounded below and above respectively.  This paragraph is just
>> making the point that that isn't the same as -/+ infinity.
>
> What confuses me and probably users is something named min/max"value"
> is not a value but something lesser or greater than any other "value".

Ah OK, I see what you're saying.

It's worth noting though that, after a little looking around, I found
that Oracle, MySQL and DB2 all use MINVALUE/MAXVALUE for unbounded range
partitions (although in the case of Oracle and MySQL, they currently
only support specifying upper bounds, and only use MAXVALUE at the
moment). So MINVALUE/MAXVALUE are likely to be familiar to at least some
people coming from other databases.

Of course, for those other databases, the surrounding syntax for
creating partitioned tables is completely different, but at least this
makes the bounds themselves portable (our supported set of bounds will
be a superset of those supported by Oracle and MySQL, and I think the
same as those supported by DB2).

I also personally quite like those terms, because they're nice and
concise, and it's pretty obvious which is which.

Regards,
Dean
Re: [HACKERS] New partitioning - some feedback
On 12 July 2017 at 23:23, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
> I also agree that there probably isn't much need for a list that
> *only* includes partitions, but if someone comes up with a convincing
> use case, then we could add another 2-letter command for that.

Actually, I just thought of a round-about sort of use case: the various
2-letter commands \dE, \dt, etc... are designed to work together, so you
can do things like \dEt or \dtE to list all local and foreign tables,
whilst excluding views, sequences, etc.

So if, for the sake of argument, \dP were made to list partitions, then
you'd be able to do things like \dEPt to list all the various kinds of
tables, including partitions, whilst excluding views, etc. That seems
somewhat more elegant and flexible than \d++ or \d! or whatever. Of
course, you'd have to decide whether a foreign partition came under \dE,
\dP, both or something else.

I'm not sure that we should eat another letter of the alphabet just for
that case, because there aren't many left, and I don't think any will be
natural mnemonics like the others.

Regards,
Dean
Re: [HACKERS] New partitioning - some feedback
On 12 July 2017 at 15:58, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On 2017/07/11 13:34, Alvaro Herrera wrote:
>>> However, the "list tables" command \dt should definitely IMO not
>>> list partitions.
>>
>> Do you mean never?  Even if a modifier is specified?  In the patch I
>> proposed, \d! (or \d+ or \d++, if '!' turns out to be unpopular) will
>> list partitions, but \d or \dt won't.  That is, partitions are hidden
>> by default.
>
> I don't think there is any need for a single list of all partitions of
> all tables -- is there?  I can't think of anything, but then I haven't
> been exposed very much to this feature yet.  For now, I lean towards
> "never".

So just focusing on the listing issue for now... I tend to agree with
some of the upstream comments that a bare \d should list everything,
including partitions, because partitions are still tables that you might
want to do DML or DDL on.

Also, if you look at what we already have, \d lists all types of
relations, and then there are 2-letter commands \dE, \di, \dm, \ds, \dt
and \dv that list just specific kinds of relations; for example, \dE
lists foreign tables, and \dt lists local tables, specifically excluding
foreign tables. So ISTM that the most logical extension of that is:

\d  - list all relations, including partitions
\dt - list only tables that are not foreign tables or partitions of
      other tables

(That's not quite an exact extension of the existing logic, because of
course it's partitioned tables that have the different relkind, not the
partitions, but the above seems like the most useful behaviour.)

With this, I don't think there's any need for any additional modifiers,
like ! or ++. I also agree that there probably isn't much need for a
list that *only* includes partitions, but if someone comes up with a
convincing use case, then we could add another 2-letter command for
that.

> I don't think \d! works terribly well as a mental model, but maybe
> that's just me.

Yeah, I agree. It just looks ugly somehow.
>> So it seems most of us are in favor of showing partitioned tables as
>> "partitioned table" instead of "table" in the table listing.
>
> Yeah, that sounds good to me.

+1

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 11 July 2017 at 13:29, Ashutosh Bapat wrote:
> + Also note that some element types, such as timestamp,
> + have a notion of "infinity", which is just another value that can
> + be stored.  This is different from MINVALUE and
> + MAXVALUE, which are not real values that can be stored,
> + but rather they are ways of saying the value is unbounded.
> + MAXVALUE can be thought of as being greater than any
> + other value, including "infinity", and MINVALUE as being
> + less than any other value, including "minus infinity".  Thus the range
> + FROM ('infinity') TO (MAXVALUE) is not an empty range; it
> + allows precisely one value to be stored: the timestamp
> + "infinity".
>
> The description in this paragraph seems to be attaching intuitive
> meaning of word "unbounded" to MAXVALUE and MINVALUE, which have
> different intuitive meanings of themselves.  Not sure if that's how we
> should describe MAXVALUE/MINVALUE.

I'm not sure I understand your point. MINVALUE and MAXVALUE do mean
unbounded below and above respectively. This paragraph is just making
the point that that isn't the same as -/+ infinity.

> Most of the patch seems to be replacing "content" with "kind",
> RangeDatumContent with PartitionRangeDatumKind and RANGE_DATUM_FINITE
> with PARTITION_RANGE_DATUM_VALUE.  But those changes in name don't
> seem to be adding much value to the patch.  Replacing
> RANGE_DATUM_NEG_INF and RANGE_DATUM_POS_INF with
> PARTITION_RANGE_DATUM_MINVALUE and PARTITION_RANGE_DATUM_MAXVALUE
> looks like a good change in line with the MINVALUE/MAXVALUE change.
> Maybe we should reuse the previous variables and enum type name,
> except those two, so that the total change introduced by the patch is
> minimal.

No, this isn't just renaming that other enum.
It's about replacing the boolean "infinite" flag on PartitionRangeDatum
with something that can properly enumerate the 3 kinds of
PartitionRangeDatum that are allowed (and, as noted above,
"finite"/"infinite" isn't the right terminology either). Putting that
new enum in parsenodes.h makes it globally available, wherever the
PartitionRangeDatum structure is used. A side-effect of that change is
that the old RangeDatumContent enum that was local to partition.c is no
longer needed.

RangeDatumContent wouldn't be a good name for a globally visible enum of
this kind, because that name fails to link it to partitioning in any
way, and could easily be confused as having something to do with RTEs or
range types. Also, the term "content" is more traditionally used in the
Postgres sources for a field *holding* content, rather than a field
specifying the *kind* of content. On the other hand, you'll note that
the term "kind" is by far the most commonly used term for naming this
kind of enum, and any matching fields.

IMO, code consistency and readability take precedence over keeping patch
sizes down.

Regards,
Dean
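The doc paragraph quoted above can be illustrated with a hedged sketch
(table names are hypothetical):

```sql
CREATE TABLE events (t timestamp) PARTITION BY RANGE (t);

-- 'infinity' is a real, storable timestamp value; MAXVALUE is not a
-- value at all.  This range is therefore not empty: it admits exactly
-- one value, the timestamp 'infinity'.
CREATE TABLE events_inf PARTITION OF events
    FOR VALUES FROM ('infinity') TO (MAXVALUE);
```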
Re: [HACKERS] Multi column range partition table
On 6 July 2017 at 22:43, Joe Conway wrote:
> I agree we should get this right the first time and I also agree with
> Dean's proposal, so I guess I'm a +2

On 7 July 2017 at 03:21, Amit Langote wrote:
> +1 to releasing this syntax in PG 10.

So that's 3 votes in favour of replacing UNBOUNDED with
MINVALUE/MAXVALUE for range partition bounds in PG 10. Not a huge
consensus, but no objections either.

Anyone else have an opinion? Robert, have you been following this
thread?

I was thinking of pushing this later today, in time for beta2.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 7 July 2017 at 03:21, Amit Langote wrote:
> The patch looks generally good, although I found and fixed some minor
> issues (typos and such).  Please find attached the updated patch.

Thanks for the review. Those changes all look good.

I also see that I missed an example in the docs at the bottom of the
CREATE TABLE page, so I'll go update that.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 6 July 2017 at 21:04, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Dean Rasheed <dean.a.rash...@gmail.com> writes:
>> However, this is also an incompatible syntax change, and any attempt
>> to support both the old and new syntaxes is likely to be messy, so we
>> really need to get consensus on whether this is the right thing to
>> do, and whether it *can* be done now for PG10.
>
> FWIW, I'd much rather see us get it right the first time than release
> PG10 with a syntax that we'll regret later.  I do not think that beta2,
> or even beta3, is too late for such a change.
>
> I'm not taking a position on whether this proposal is actually better
> than what we have.  But if there's a consensus that it is, we should
> go ahead and do it, not worry that it's too late.

OK, thanks. That's good to know.

Regards,
Dean
Re: [HACKERS] Multi column range partition table
On 5 July 2017 at 18:07, Dean Rasheed <dean.a.rash...@gmail.com> wrote: > So if we were to go for maximum flexibility and compatibility with > Oracle, then perhaps what we would do is more like the original idea > of UNBOUNDED ABOVE/BELOW, except call them MINVALUE and MAXVALUE, > which conveniently are already unreserved keywords, as well as being > much shorter. Plus, we would also relax the constraint about having > finite values after MINVALUE/MAXVALUE. > So I know that I have flip-flopped a few times on this now, but I'm now starting to think that this approach, replacing UNBOUNDED with MINVALUE and MAXVALUE is the best way to go, along with permitting finite values after MINVALUE/MAXVALUE. This gives the greatest flexibility, it's not too verbose, and it makes it easy to define contiguous sets of partitions just by making the lower bound of one match the upper bound of another. With this approach, any partition bounds that Oracle allows are also valid in PostgreSQL, not that I would normally give too much weight to that, but it is I think quite a nice syntax. Of course, we also support things that Oracle doesn't allow, such as MINVALUE and gaps between partitions. Parts of the patch are similar to your UNBOUNDED ABOVE/BELOW patch, but there are a number of differences -- most notably, I replaced the "infinite" boolean flag on PartitionRangeDatum with a 3-value enum and did away with all the DefElem nodes and the associated special string constants being copied and compared. However, this is also an incompatible syntax change, and any attempt to support both the old and new syntaxes is likely to be messy, so we really need to get consensus on whether this is the right thing to do, and whether it *can* be done now for PG10. 
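For illustration, here is a sketch of what the proposed syntax would allow (table and partition names invented; this reflects the proposal under discussion, not behaviour committed at the time of writing):

```sql
-- Hypothetical example of the proposed MINVALUE/MAXVALUE bounds.
-- The lower bound of each partition matches the upper bound of the
-- previous one, giving contiguous, non-overlapping ranges.
CREATE TABLE m (a int, b int) PARTITION BY RANGE (a, b);

CREATE TABLE m_p1 PARTITION OF m
    FOR VALUES FROM (MINVALUE, MINVALUE) TO (10, MINVALUE);
CREATE TABLE m_p2 PARTITION OF m
    FOR VALUES FROM (10, MINVALUE) TO (10, 100);
CREATE TABLE m_p3 PARTITION OF m
    FOR VALUES FROM (10, 100) TO (20, MINVALUE);
```

Note that m_p2 and m_p3 use a finite value (100) after a finite value in the first column, while m_p1's upper bound shows MINVALUE in a later column, the case the relaxed rules are meant to permit.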
Regards, Dean Replace_UNBOUNDED_with_MINVALUE_and_MAXVALUE.patch Description: Binary data
Re: [HACKERS] Multi column range partition table
On 5 July 2017 at 10:43, Amit Langote wrote: > 0001 is your patch to tidy up check_new_partition_bound() (must be > applied before 0002) > I pushed this first patch, simplifying check_new_partition_bound() for range partitions, since it seemed like a good simplification, but note that I don't think that was actually the cause of the latent bug you saw upthread. I think the real issue was in partition_rbound_cmp() -- normally, if the upper bound of one partition coincides with the lower bound of another, that function would report the upper bound as the smaller one, but that logic breaks if any of the bound values are infinite, since then it will exit early, returning 0, without ever comparing the "lower" flags on the bounds. I'm tempted to push a fix for that independently, since it's a bug waiting to happen, even though it's not possible to hit it currently. Regards, Dean
Re: [HACKERS] Multi column range partition table
On 5 July 2017 at 10:43, Amit Langote wrote: > In retrospect, that sounds like something that was implemented in the > earlier versions of the patch, whereby there was no ability to specify > UNBOUNDED on a per-column basis. So the syntax was: > > FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED } > > But, it was pointed out to me [1] that that doesn't address the use case, > for example, where part1 goes up to (10, 10) and part2 goes from (10, 10) > up to (10, unbounded). > [Reading that other thread] It's a reasonable point that our syntax is quite different from Oracle's, and doing this takes it even further away, and removes support for things that they do support. For the record, Oracle allows things like the following:

DROP TABLE t1;
CREATE TABLE t1 (a NUMBER, b NUMBER, c NUMBER)
  PARTITION BY RANGE (a,b,c)
  (PARTITION t1p1 VALUES LESS THAN (1,2,3),
   PARTITION t1p2 VALUES LESS THAN (2,3,4),
   PARTITION t1p3 VALUES LESS THAN (3,MAXVALUE,5),
   PARTITION t1p4 VALUES LESS THAN (4,MAXVALUE,6)
  );

INSERT INTO t1 VALUES(1,2,3);
INSERT INTO t1 VALUES(2,3,4);
INSERT INTO t1 VALUES(3,4,5);
INSERT INTO t1 VALUES(3.01,4,5);
INSERT INTO t1 VALUES(4,5,10);

COLUMN subobject_name FORMAT a20;
SELECT a, b, c, subobject_name
  FROM t1, user_objects o
 WHERE o.data_object_id = dbms_rowid.rowid_object(t1.ROWID)
 ORDER BY a,b,c;

   A    B    C SUBOBJECT_NAME
---- ---- ---- --------------
   1    2    3 T1P2
   2    3    4 T1P3
   3    4    5 T1P3
3.01    4    5 T1P4
   4    5   10 T1P4

So they use MAXVALUE instead of UNBOUNDED for an upper bound, which is more explicit. They don't have an equivalent MINVALUE, but it's arguably not necessary, since the first partition's lower bound is implicitly unbounded. With this syntax they don't need to worry about gaps or overlaps between partitions, which is nice, but arguably less flexible. They're also more lax about allowing finite values after MAXVALUE, and they document the fact that any value after a MAXVALUE is ignored.
I don't think their scheme provides any way to define a partition of the above table that would hold all rows for which a < some value. So if we were to go for maximum flexibility and compatibility with Oracle, then perhaps what we would do is more like the original idea of UNBOUNDED ABOVE/BELOW, except call them MINVALUE and MAXVALUE, which conveniently are already unreserved keywords, as well as being much shorter. Plus, we would also relax the constraint about having finite values after MINVALUE/MAXVALUE. I think I'll go play around with that idea to see what it looks like in practice. Your previous patch already does much of that, and is far less invasive. Regards, Dean
Re: [HACKERS] Multi column range partition table
On 5 July 2017 at 10:43, Amit Langotewrote: >> So the more I think about this, the more I think that a cleaner design >> would be as follows: >> >> 1). Don't allow UNBOUNDED, except in the first column, where it can >> keep it's current meaning. >> >> 2). Allow the partition bounds to have fewer columns than the >> partition definition, and have that mean the same as it would have >> meant if you were partitioning by that many columns. So, for >> example, if you were partitioning by (col1,col2), you'd be allowed >> to define a partition like so: >> >> FROM (x) TO (y) >> >> and it would mean >> >> x <= col1 < y >> >> Or you'd be able to define a partition like >> >> FROM (x1,x2) TO (y) >> >> which would mean >> >> (col1 > x1) OR (col1 = x1 AND col2 >= x2) AND col1 < y >> >> 3). Don't allow any value after UNBOUNDED (i.e., only specify >> UNBOUNDED once in a partition bound). > > I assume we don't need the ability of specifying ABOVE/BELOW in this design. > Yes that's right. > In retrospect, that sounds like something that was implemented in the > earlier versions of the patch, whereby there was no ability to specify > UNBOUNDED on a per-column basis. So the syntax was: > > FROM { (x [, ...]) | UNBOUNDED } TO { (y [, ...]) | UNBOUNDED } > Yes, that's where I ended up too. > But, it was pointed out to me [1] that that doesn't address the use case, > for example, where part1 goes up to (10, 10) and part2 goes from (10, 10) > up to (10, unbounded). > > The new design will limit the usage of unbounded range partitions at the > tail ends. > True, but I don't think that's really a problem. When the first column is a discrete type, an upper bound of (10, unbounded) can be rewritten as (11) in the new design. When it's a continuous type, e.g. floating point, it can no longer be represented, because (10.0, unbounded) really means (col1 <= 10.0). 
But we've already decided not to support anything other than inclusive lower bounds and exclusive upper bounds, so allowing this upper bound goes against that design choice. >> Of course, it's pretty late in the day to be proposing this kind of >> redesign, but I fear that if we don't tackle it now, it will just be >> harder to deal with in the future. >> >> Actually, a quick, simple hacky implementation might be to just fill >> in any omitted values in a partition bound with negative infinity >> internally, and when printing a bound, omit any values after an >> infinite value. But really, I think we'd want to tidy up the >> implementation, and I think a number of things would actually get much >> simpler. For example, get_qual_for_range() could simply stop when it >> reached the end of the list of values for the bound, and it wouldn't >> need to worry about an unbounded value following a bounded one. >> >> Thoughts? > > I cooked up a patch for the "hacky" implementation for now, just as you > described in the above paragraph. Will you be willing to give it a look? > I will also think about the non-hacky way of implementing this. > OK, I'll take a look. Meanwhile, I already had a go at the "non-hacky" implementation (WIP patch attached). The more I worked on it, the simpler things got, which I think is a good sign. Part-way through, I realised that the PartitionRangeDatum Node type is no longer needed, because each bound value is now necessarily finite, so the lowerdatums and upperdatums lists in a PartitionBoundSpec can now be made into lists of Const nodes, making them match the listdatums field used for LIST partitioning, and then a whole lot of related code gets simplified. It needed a little bit more code in partition.c to track individual bound sizes, but there were a number of other places that could be simplified, so overall this represents a reduction in the code size and complexity. 
It's not complete (e.g., no doc updates yet), but it passes all the tests, and so far seems to work as I would expect. Regards, Dean

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
new file mode 100644
index 7da2058..aade9f5
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -66,23 +66,26 @@
  * is an upper bound.
  */

-/* Ternary value to represent what's contained in a range bound datum */
-typedef enum RangeDatumContent
+/* Type of an individual bound of a range partition */
+typedef enum RangeBoundKind
 {
-	RANGE_DATUM_FINITE = 0,		/* actual datum stored elsewhere */
-	RANGE_DATUM_NEG_INF,		/* negative infinity */
-	RANGE_DATUM_POS_INF			/* positive infinity */
-} RangeDatumContent;
+	RANGE_BOUND_FINITE = 0,		/* actual bound stored in the datums array */
+	RANGE_BOUND_NEG_INF,		/* negative infinity; NULL datums array */
+	RANGE_BOUND_POS_INF			/* positive infinity; NULL datums array */
+} RangeBoundKind;

 typedef struct PartitionBoundInfoData
 {
 	char		strategy;		/* list or range bounds? */
 	int			ndatums;		/*
Re: [HACKERS] Multi column range partition table
On 3 July 2017 at 10:32, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > On 2017/07/03 17:36, Dean Rasheed wrote: >> The bigger question is do we want this for PG10? If so, time is >> getting tight. My feeling is that we do, because otherwise we'd be >> changing the syntax in PG11 of a feature only just released in PG10, >> and I think the current syntax is flawed, so it would be better not to >> have it in any public release. I'd feel better hearing from the >> original committer though. > > The way I have extended the syntax in the posted patch, ABOVE/BELOW (or > whatever we decide instead) are optional. UNBOUNDED without the > ABOVE/BELOW specifications implicitly means UNBOUNDED ABOVE if in FROM and > vice versa, which seems to me like sensible default behavior and what's > already present in PG 10. > > Do you think ABOVE/BELOW shouldn't really be optional? > Hmm, I'm not so sure about that. The more I think about this, the more I think that the current design is broken, and that introducing UNBOUNDED ABOVE/BELOW is just a sticking plaster to cover that up. Yes, it allows nicer multi-column ranges to be defined, as demonstrated upthread. But, it also allows some pretty counterintuitive things like making the lower bound exclusive and the upper bound inclusive. I think that's actually the real problem with the current design. If I have a single-column partition like (col) FROM (x) TO (y) it's pretty clear that's a simple range, inclusive at the lower end and exclusive at the upper end: (x) <= (col) < (y) If I now make that a 2-column partition, but leave the second column unbounded: (col1,col2) FROM (x,UNBOUNDED) TO (y,UNBOUNDED) my initial expectation would have been for that to mean the same thing, i.e., (x) <= (col1) < (y) but that only happens if "UNBOUNDED" means negative infinity in both places. 
That then starts to give the sort of desirable properties you'd expect, like using the same expression for the lower bound of one partition as the upper bound of another makes the two partitions contiguous. But of course, that's not exactly a pretty design either, because then you'd be saying that UNBOUNDED means positive infinity if it's the upper bound of the first column, and negative infinity if it's the lower bound of the first column or either bound of any other column.

Another aspect of the current design I don't like is that you have to keep repeating UNBOUNDED [ABOVE/BELOW] for each of the rest of the columns in the bound, and anything else is an error. That's a pretty verbose way of saying "the rest of the columns are unbounded".

So the more I think about this, the more I think that a cleaner design would be as follows:

1). Don't allow UNBOUNDED, except in the first column, where it can keep its current meaning.

2). Allow the partition bounds to have fewer columns than the partition definition, and have that mean the same as it would have meant if you were partitioning by that many columns. So, for example, if you were partitioning by (col1,col2), you'd be allowed to define a partition like so:

  FROM (x) TO (y)

and it would mean

  x <= col1 < y

Or you'd be able to define a partition like

  FROM (x1,x2) TO (y)

which would mean

  ((col1 > x1) OR (col1 = x1 AND col2 >= x2)) AND col1 < y

3). Don't allow any value after UNBOUNDED (i.e., only specify UNBOUNDED once in a partition bound).

This design has a few neat properties:

- Lower bounds are always inclusive and upper bounds are always exclusive.
- If the expression for the lower bound of one partition is the same as the expression for the upper bound of another, the 2 partitions are contiguous, making it easy to define a covering set of partitions.
- It's much easier to understand what a bound of "(x)" means than "(x,UNBOUNDED [ABOVE/BELOW])".
- It's much less verbose, and there's no needless repetition.
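A sketch of what point 2 would look like in practice (names invented; this abbreviated form is a proposal under discussion here, not committed syntax):

```sql
-- Partitioning by (col1, col2), with bounds listing fewer columns than
-- the partition key, as proposed above.
CREATE TABLE t (col1 int, col2 int) PARTITION BY RANGE (col1, col2);

-- Would mean simply: 1 <= col1 < 2, whatever col2 is.
CREATE TABLE t_p1 PARTITION OF t FOR VALUES FROM (1) TO (2);

-- Would mean: ((col1 > 2) OR (col1 = 2 AND col2 >= 5)) AND col1 < 3
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (2, 5) TO (3);
```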
Of course, it's pretty late in the day to be proposing this kind of redesign, but I fear that if we don't tackle it now, it will just be harder to deal with in the future. Actually, a quick, simple hacky implementation might be to just fill in any omitted values in a partition bound with negative infinity internally, and when printing a bound, omit any values after an infinite value. But really, I think we'd want to tidy up the implementation, and I think a number of things would actually get much simpler. For example, get_qual_for_range() could simply stop when it reached the end of the list of values for the bound, and it wouldn't need to worry about an unbounded value following a bounded one. Thoughts? Regards, Dean
Re: [HACKERS] Multi column range partition table
On 3 July 2017 at 06:00, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > On 2017/07/03 2:15, Dean Rasheed wrote: >> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the >> terminology already in use of upper and lower bounds. > > I was starting to like the Ashutosh's suggested UNBOUNDED MIN/MAX syntax, > but could you clarify your comment that ABOVE/BELOW is the terminology > already in use of upper and lower bounds? I couldn't find ABOVE/BELOW in > our existing syntax anywhere that uses the upper/lower bound notion, so > was confused a little bit. > I just meant that the words "above" and "below" more closely match the already-used terms "upper" and "lower" for the bounds, so that terminology seemed more consistent, e.g. "UNBOUNDED ABOVE" => no upper bound. > Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa. > Right. I'm not particularly wedded to that terminology. I always find naming things hard, so if anyone can think of anything better, let's hear it. The bigger question is do we want this for PG10? If so, time is getting tight. My feeling is that we do, because otherwise we'd be changing the syntax in PG11 of a feature only just released in PG10, and I think the current syntax is flawed, so it would be better not to have it in any public release. I'd feel better hearing from the original committer though. Meanwhile, I'll continue trying to review the latest patches... Regards, Dean
Re: [HACKERS] Multi column range partition table
On 30 June 2017 at 10:04, Ashutosh Bapat wrote: > On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote > wrote: >> >> Alright, I spent some time implementing a patch to allow specifying >> -infinity and +infinity in arbitrary ways. Of course, it prevents >> nonsensical inputs with appropriate error messages. > > I don't think -infinity and +infinity are the right terms. For a > string or character data type there is no -infinity and +infinity. > Similarly for enums. We need to extend UNBOUNDED somehow to indicate > the end of a given type in the given direction. I thought about > UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side > would cause confusion. Also LEFT/RIGHT may work for a single > dimensional datatype but not for multi-dimensional spaces. How about > MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities. > Yes, I think you're right. Also, some datatypes include values that are equal to +/-infinity, which would then behave differently from unbounded as range bounds, so it wouldn't be a good idea to overload that term. My first thought was UNBOUNDED ABOVE/BELOW, because that matches the terminology already in use of upper and lower bounds. Regards, Dean
Re: [HACKERS] Multi column range partition table
On 30 June 2017 at 09:06, Amit Langote wrote: > When testing the patch, I realized that the current code in > check_new_partition_bound() that checks for range partition overlap had a > latent bug that resulted in false positives for the new cases that the new > less restrictive syntax allowed. I spent some time simplifying that code > while also fixing the aforementioned bug. It's implemented in the > attached patch 0001. > I haven't had time to look at 0002 yet, but looking at 0001, I'm not convinced that this really represents much of a simplification, but I do prefer the way it now consistently reports the first overlapping partition in the error message. I'm not entirely convinced by this change either:

-if (equal || off1 != off2)
+if (off2 > off1 + 1 || ((off2 == off1 + 1) && !equal))

Passing probe_is_bound = true to partition_bound_bsearch() will I think cause it to return equal = false when the upper bound of one partition equals the lower bound of another, so relying on the "equal" flag here seems a bit dubious. I think I can just about convince myself that it works, but not for the reasons stated in the comments. It also seems unnecessary for this code to be doing 2 binary searches. I think a better simplification would be to just do one binary search to find the gap that the lower bound fits in, and then test the upper bound of the new partition against the lower bound of the next partition (if there is one), as in the attached patch.
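For reference, the overlap check under discussion is the one that rejects a new partition whose range intersects an existing one, along these lines (names invented; error text approximate, not verbatim server output):

```sql
CREATE TABLE t (a int) PARTITION BY RANGE (a);
CREATE TABLE t_p1 PARTITION OF t FOR VALUES FROM (1) TO (10);

-- The new lower bound (5) falls inside t_p1's range, so
-- check_new_partition_bound() should report the overlapping partition:
CREATE TABLE t_p2 PARTITION OF t FOR VALUES FROM (5) TO (20);
-- ERROR:  partition "t_p2" would overlap partition "t_p1"
```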
Regards, Dean diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c new file mode 100644 index 7da2058..96760a0 --- a/src/backend/catalog/partition.c +++ b/src/backend/catalog/partition.c @@ -745,78 +745,62 @@ check_new_partition_bound(char *relname, if (partdesc->nparts > 0) { PartitionBoundInfo boundinfo = partdesc->boundinfo; - int off1, -off2; - bool equal = false; + int offset; + bool equal; Assert(boundinfo && boundinfo->ndatums > 0 && boundinfo->strategy == PARTITION_STRATEGY_RANGE); /* - * Firstly, find the greatest range bound that is less - * than or equal to the new lower bound. + * Test whether the new lower bound (which is treated + * inclusively as part of the new partition) lies inside an + * existing partition, or in a gap. + * + * If it's in a gap, the next index value will be -1 (the + * lower bound of the next partition). This is also true + * if there is no next partition, since the index array is + * initialised with an extra -1 at the end. + * + * Note that this also allows for the possibility that the + * new lower bound equals an existing upper bound. */ - off1 = partition_bound_bsearch(key, boundinfo, lower, true, - ); + offset = partition_bound_bsearch(key, boundinfo, lower, + true, ); - /* - * off1 == -1 means that all existing bounds are greater - * than the new lower bound. In that case and the case - * where no partition is defined between the bounds at - * off1 and off1 + 1, we have a "gap" in the range that - * could be occupied by the new partition. We confirm if - * so by checking whether the new upper bound is confined - * within the gap. - */ - if (!equal && boundinfo->indexes[off1 + 1] < 0) + if (boundinfo->indexes[offset + 1] < 0) { - off2 = partition_bound_bsearch(key, boundinfo, upper, - true, ); - /* - * If the new upper bound is returned to be equal to - * the bound at off2, the latter must be the upper - * bound of some partition with which the new - * partition clearly overlaps. 
- * - * Also, if bound at off2 is not same as the one - * returned for the new lower bound (IOW, off1 != - * off2), then the new partition overlaps at least one - * partition. + * Check that the new partition will fit in the gap. + * For it to fit, the new upper bound must be less than + * or equal to the lower bound of the next partition, + * if there is one. */ - if (equal || off1 != off2) + if (offset + 1 < boundinfo->ndatums) { - overlap = true; + int32 cmpval; - /* - * The bound at off2 could be the lower bound of - * the partition with which the new partition - * overlaps. In that case, use the upper bound - * (that is, the bound at off2 + 1) to get the - * index of that partition. - */ - if (boundinfo->indexes[off2] < 0) -with = boundinfo->indexes[off2 + 1]; - else -with = boundinfo->indexes[off2]; + cmpval = partition_bound_cmp(key, boundinfo, + offset + 1, upper, + true); + if (cmpval < 0) + { +/* + * The new
Re: [HACKERS] Multi column range partition table
On 23 June 2017 at 08:01, Ashutosh Bapat wrote: > The way we have designed our syntax, we don't have a way to tell that > p3 comes after p2 and they have no gap between those. But I don't > think that's your question. What you are struggling with is a way to > specify a lower bound (10, +infinity) so that anything with i1 > 10 > would go to partition 3. > I think actually there is a fundamental problem here, which arises because UNBOUNDED has 2 different meanings depending on context, and thus it is not possible in general to specify the start of one range to be equal to the end of the previous range, as is necessary to get contiguous non-overlapping ranges. Note that this isn't just a problem for floating point datatypes either; it also applies to other types such as strings. For example, given a partition over (text, int) types defined with the following values:

  FROM ('a', UNBOUNDED) TO ('b', UNBOUNDED)

which is equivalent to

  FROM ('a', -INFINITY) TO ('b', +INFINITY)

where should the next range start? Even if you were to find a way to specify "the next string after 'b'", it wouldn't exactly be pretty. The problem is that the above partition corresponds to "all the strings starting with 'a', plus the string 'b'", which is pretty ugly. A neater way to define the pair of ranges in this case would be:

  FROM ('a', -INFINITY) TO ('b', -INFINITY)
  FROM ('b', -INFINITY) TO ('c', -INFINITY)

since then all strings starting with 'a' would fall into the first partition and all the strings starting with 'b' would fall into the second one.
Currently, when there are 2 partition columns, the partition constraint is defined as

  (a is not null) and (b is not null)
  and (a > al or (a = al and b >= bl))
  and (a < au or (a = au and b < bu))

if the upper bound bu were allowed to be -INFINITY (something that should probably be forbidden unless the previous column's upper bound were finite), then this would simplify to

  (a is not null) and (b is not null)
  and (a > al or (a = al and b >= bl))
  and (a < au)

and in the example above, where bl is -INFINITY, it would further simplify to

  (a is not null) and (b is not null)
  and (a >= al) and (a < au)

There would also be a similar simplification possible if the lower bound of a partition column were allowed to be +INFINITY. So, I think that having UNBOUNDED represent both -INFINITY and +INFINITY depending on context is a design flaw, and that we need to allow both -INFINITY and +INFINITY as upper and lower bounds (provided they are preceded by a column with a finite bound). I think that, in general, that's the only way to allow contiguous non-overlapping partitions to be defined on multiple columns. Regards, Dean
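In the MINVALUE/MAXVALUE spelling proposed elsewhere in this thread, the text/int example above might be sketched as follows (names invented; the constraint is shown approximately, not as verbatim server output):

```sql
CREATE TABLE s (a text, b int) PARTITION BY RANGE (a, b);

-- Contiguous partitions: the upper bound of s_a equals the lower bound of s_b.
CREATE TABLE s_a PARTITION OF s
    FOR VALUES FROM ('a', MINVALUE) TO ('b', MINVALUE);
CREATE TABLE s_b PARTITION OF s
    FOR VALUES FROM ('b', MINVALUE) TO ('c', MINVALUE);

-- Per the simplification described above, s_a's implied constraint
-- reduces to roughly:
--   (a IS NOT NULL) AND (b IS NOT NULL) AND (a >= 'a') AND (a < 'b')
```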
Re: [HACKERS] Rules on table partitions
On 20 June 2017 at 03:01, Amit Langote wrote: > Hmm, yes. The following exercise convinced me. > > create table r (a int) partition by range (a); > create table r1 partition of r for values from (1) to (10); > create rule "_RETURN" as on select to r1 do instead select * from r; > > insert into r values (1); > ERROR: cannot insert into view "r1" > HINT: To enable inserting into the view, provide an INSTEAD OF INSERT > trigger or an unconditional ON INSERT DO INSTEAD rule. > > The error is emitted by CheckValidResultRel() that is called on individual > leaf partitions when setting up tuple-routing in ExecInitModifyTable. > > I agree that we should forbid this case, so please find attached a patch. > Committed. Regards, Dean
Re: [HACKERS] Rules on table partitions
On 20 June 2017 at 03:01, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > On 2017/06/19 20:19, Dean Rasheed wrote: >> Currently we allow rules to be defined on table partitions, but these >> rules only fire when the partition is accessed directly, not when it >> is accessed via the parent: > > Yeah, the same thing as will happen with an inheritance setup, but I guess > you are asking whether that's what we want. > > With old-style inheritance based setup, you will see exactly the exact > same result: > > Note that inheritance is expanded in the planner, not the rewriter, so the > rules of partitions (that are added to the query later than the rewriter) > won't fire, AFAICS. Same will apply to partitions. > Ah yes, you're right. I was misremembering how that worked. In an old-style inheritance-based setup you would have to define triggers on the parent table to handle INSERT, and since they would insert directly into the relevant child, any child INSERT rules would be fired, but child UPDATE and DELETE rules would not. At least with built-in partitioning it's consistent in not firing rules on the partitions for any command. >> Perhaps we >> should explicitly forbid this for now -- i.e., raise a "not supported" >> error when attempting to add a rule to a partition, or attach a table >> with rules to a partitioned table. > > We could do that, but an oft-raised question is how different we should > make new partitions from the old-style inheritance child tables? > > Although a slightly different territory, you will also notice that > statement triggers of partitions won't be fired unless they are explicitly > named in the query, which is what happens for inheritance in general and > hence also for partitions. 
> >> Personally, I wouldn't regard adding proper support for rules on >> partitions as a high priority, so I'd be OK with it remaining >> unsupported unless someone cares enough to implement it, but that >> seems preferable to leaving it partially working in this way. > > Sure, if consensus turns out to be that we prohibit rules, statement > triggers, etc. that depend on the relation being explicitly named in the > query to be defined on partitions, I could draft up a patch for v10. > Hmm, perhaps it's OK as-is. The user can always define the rules/triggers on both the parent and the children, if that's what they want. >> Also, as things stand, it is possible to do the following: >> >> CREATE TABLE t2(a int, b int) PARTITION BY RANGE(a); >> CREATE TABLE t2_p PARTITION OF t2 FOR VALUES FROM (1) TO (10); >> CREATE RULE "_RETURN" AS ON SELECT TO t2_p DO INSTEAD SELECT * FROM t2; >> >> which results in the partition becoming a view that selects from the >> parent, which surely ought to be forbidden. > > Hmm, yes. The following exercise convinced me. > > create table r (a int) partition by range (a); > create table r1 partition of r for values from (1) to (10); > create rule "_RETURN" as on select to r1 do instead select * from r; > > insert into r values (1); > ERROR: cannot insert into view "r1" > HINT: To enable inserting into the view, provide an INSTEAD OF INSERT > trigger or an unconditional ON INSERT DO INSTEAD rule. > > The error is emitted by CheckValidResultRel() that is called on individual > leaf partitions when setting up tuple-routing in ExecInitModifyTable. > > I agree that we should forbid this case, Yeah. Also it would be possible to define the rule to select from a non-empty table, leading to rows appearing in the partition, but not the parent. Since we normally explicitly forbid the use of a view as a table partition, it seems worth closing the loophole in this case. > so please find attached a patch. Thanks, I'll take a look at it later today. 
Regards, Dean
[HACKERS] Rules on table partitions
Currently we allow rules to be defined on table partitions, but these rules only fire when the partition is accessed directly, not when it is accessed via the parent:

CREATE TABLE t1(a int, b int) PARTITION BY RANGE(a);
CREATE TABLE t1_p PARTITION OF t1 FOR VALUES FROM (1) TO (10);
INSERT INTO t1 VALUES (1,101), (2,201);

CREATE TABLE t1_p_log(a int, b int, d date);
CREATE RULE t1_p_log_rule AS ON UPDATE TO t1_p
  DO ALSO INSERT INTO t1_p_log VALUES(old.a, old.b, now());

UPDATE t1 SET b=b+1 WHERE a=1;
UPDATE t1_p SET b=b+1 WHERE a=2;

SELECT * FROM t1_p_log;

 a |  b  |     d
---+-----+------------
 2 | 201 | 2017-06-19
(1 row)

I'd regard that as a bug, especially since this kind of thing would have worked with old-style user-defined partitioning. Perhaps we should explicitly forbid this for now -- i.e., raise a "not supported" error when attempting to add a rule to a partition, or attach a table with rules to a partitioned table. Personally, I wouldn't regard adding proper support for rules on partitions as a high priority, so I'd be OK with it remaining unsupported unless someone cares enough to implement it, but that seems preferable to leaving it partially working in this way.

Also, as things stand, it is possible to do the following:

CREATE TABLE t2(a int, b int) PARTITION BY RANGE(a);
CREATE TABLE t2_p PARTITION OF t2 FOR VALUES FROM (1) TO (10);
CREATE RULE "_RETURN" AS ON SELECT TO t2_p DO INSTEAD SELECT * FROM t2;

which results in the partition becoming a view that selects from the parent, which surely ought to be forbidden.

Regards, Dean
Re: [HACKERS] PG10 Partitioned tables and relation_is_updatable()
On 13 June 2017 at 05:50, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Tue, Jun 13, 2017 at 12:03 AM, Dean Rasheed <dean.a.rash...@gmail.com> > wrote: >> Barring objections, I'll push my original patch and work up patches >> for the other couple of issues I found. > > No objections, the patch is good to go as is. Sorry for high-jacking > this thread. > Thanks for the review. Patch pushed. Here are patches for the other 2 issues, with test cases showing how they currently fail on HEAD. Regards, Dean diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c new file mode 100644 index 4a75842..dad31df --- a/src/backend/commands/policy.c +++ b/src/backend/commands/policy.c @@ -474,7 +474,8 @@ RemoveRoleFromObjectPolicy(Oid roleid, O rel = relation_open(relid, AccessExclusiveLock); - if (rel->rd_rel->relkind != RELKIND_RELATION) + if (rel->rd_rel->relkind != RELKIND_RELATION && + rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not a table", diff --git a/src/test/regress/expected/rowsecurity.out b/src/test/regress/expected/rowsecurity.out new file mode 100644 index e2ec961..26d28f2 --- a/src/test/regress/expected/rowsecurity.out +++ b/src/test/regress/expected/rowsecurity.out @@ -3885,6 +3885,7 @@ RESET SESSION AUTHORIZATION; CREATE ROLE regress_rls_dob_role1; CREATE ROLE regress_rls_dob_role2; CREATE TABLE dob_t1 (c1 int); +CREATE TABLE dob_t2 (c1 int) PARTITION BY RANGE (c1); CREATE POLICY p1 ON dob_t1 TO regress_rls_dob_role1 USING (true); DROP OWNED BY regress_rls_dob_role1; DROP POLICY p1 ON dob_t1; -- should fail, already gone @@ -3892,6 +3893,9 @@ ERROR: policy "p1" for table "dob_t1" d CREATE POLICY p1 ON dob_t1 TO regress_rls_dob_role1,regress_rls_dob_role2 USING (true); DROP OWNED BY regress_rls_dob_role1; DROP POLICY p1 ON dob_t1; -- should succeed +CREATE POLICY p1 ON dob_t2 TO regress_rls_dob_role1,regress_rls_dob_role2 USING (true); +DROP OWNED BY 
regress_rls_dob_role1; +DROP POLICY p1 ON dob_t2; -- should succeed DROP USER regress_rls_dob_role1; DROP USER regress_rls_dob_role2; -- diff --git a/src/test/regress/sql/rowsecurity.sql b/src/test/regress/sql/rowsecurity.sql new file mode 100644 index 3ce9293..ba8fed4 --- a/src/test/regress/sql/rowsecurity.sql +++ b/src/test/regress/sql/rowsecurity.sql @@ -1740,6 +1740,7 @@ CREATE ROLE regress_rls_dob_role1; CREATE ROLE regress_rls_dob_role2; CREATE TABLE dob_t1 (c1 int); +CREATE TABLE dob_t2 (c1 int) PARTITION BY RANGE (c1); CREATE POLICY p1 ON dob_t1 TO regress_rls_dob_role1 USING (true); DROP OWNED BY regress_rls_dob_role1; @@ -1749,6 +1750,10 @@ CREATE POLICY p1 ON dob_t1 TO regress_rl DROP OWNED BY regress_rls_dob_role1; DROP POLICY p1 ON dob_t1; -- should succeed +CREATE POLICY p1 ON dob_t2 TO regress_rls_dob_role1,regress_rls_dob_role2 USING (true); +DROP OWNED BY regress_rls_dob_role1; +DROP POLICY p1 ON dob_t2; -- should succeed + DROP USER regress_rls_dob_role1; DROP USER regress_rls_dob_role2; diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c new file mode 100644 index a637551..cc09355 --- a/src/pl/plpgsql/src/pl_comp.c +++ b/src/pl/plpgsql/src/pl_comp.c @@ -1761,7 +1761,8 @@ plpgsql_parse_cwordtype(List *idents) classStruct->relkind != RELKIND_VIEW && classStruct->relkind != RELKIND_MATVIEW && classStruct->relkind != RELKIND_COMPOSITE_TYPE && - classStruct->relkind != RELKIND_FOREIGN_TABLE) + classStruct->relkind != RELKIND_FOREIGN_TABLE && + classStruct->relkind != RELKIND_PARTITIONED_TABLE) goto done; /* @@ -1987,7 +1988,8 @@ build_row_from_class(Oid classOid) classStruct->relkind != RELKIND_VIEW && classStruct->relkind != RELKIND_MATVIEW && classStruct->relkind != RELKIND_COMPOSITE_TYPE && - classStruct->relkind != RELKIND_FOREIGN_TABLE) + classStruct->relkind != RELKIND_FOREIGN_TABLE && + classStruct->relkind != RELKIND_PARTITIONED_TABLE) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("relation \"%s\" is not a 
table", relname))); diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out new file mode 100644 index 7ebbde6..35b83f7 --- a/src/test/regress/expected/plpgsql.out +++ b/src/test/regress/expected/plpgsql.out @@ -5979,3 +5979,48 @@ LINE 1: SELECT (SELECT string_agg(id || ^ QUERY: SELECT (SELECT string_agg(id || '=' || name, ',') FROM d) CONTEXT: PL/pgSQL function alter_table_under_transition_tables_upd_func() line 3 at RAISE +-- +-- Check type parsing and record fetching from partitioned tables +-- +CREATE TABLE partitioned_table (a int, b text) PARTITION BY LIST (a); +CREATE TABLE pt_p
Re: [HACKERS] PG10 Partitioned tables and relation_is_updatable()
On 13 June 2017 at 05:50, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Tue, Jun 13, 2017 at 12:03 AM, Dean Rasheed <dean.a.rash...@gmail.com> > wrote: >> My initial thought, looking at the patch, is that it might be better >> to have all the macros in one file to make them easier to maintain. > > Right now the macros are listed just below relkind enum in pg_class.h. > Is that a good place or do you think, we should list them in a > separate file? > Yeah, I wondered about putting them in a separate file, but I think just below the relkind enum is probably the best place, so that people changing that enum immediately see the first set of related things to be updated. >> Barring objections, I'll push my original patch and work up patches >> for the other couple of issues I found. > > No objections, the patch is good to go as is. Sorry for high-jacking > this thread. > No worries. I missed that other thread initially, so it was useful to link the 2 threads together. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG10 Partitioned tables and relation_is_updatable()
On 12 June 2017 at 17:51, Joe Conway wrote: > On 06/12/2017 07:40 AM, Joe Conway wrote: >> On 06/12/2017 01:49 AM, Amit Langote wrote: >>> As he mentioned in his reply, Ashutosh's proposal to abstract away the >>> relkind checks is interesting in this regard. >>> >> I have not looked at Ashutosh's patch yet, but it sounds like a good >> idea to me. > > After looking I remain convinced - +1 in general. > Yes, I think this will probably help, but I worry that it will turn into quite a large and invasive patch, and there are a number of design choices to be made over the naming and precise set of macros. Is this really PG10 material? My initial thought, looking at the patch, is that it might be better to have all the macros in one file to make them easier to maintain. > One thing I would change -- in many cases there are ERROR messages > enumerating the relkinds. E.g.: > 8<- > if (!RELKIND_CAN_HAVE_COLUMN_COMMENT(relation->rd_rel->relkind)) > ereport(ERROR, > (errcode(ERRCODE_WRONG_OBJECT_TYPE), > errmsg("\"%s\" is not a table, view, materialized view, > composite type, or foreign table", > 8<- > > I think those messages should also be rewritten to make them less > relkind specific and more clear, otherwise we still risk getting out of > sync in the future. Maybe something like this: > 8<- > "\"%s\" is not a kind of relation that can have column comments" > 8<- > Thoughts? > -1. I think the existing error messages provide useful information that you'd be removing. If you're worried about the error messages getting out of sync, then wouldn't it be better to centralise them along with the corresponding macros? > In any case, unless another committer wants it, and assuming no > complaints with the concept, I'd be happy to take this. > OK, have at it. Barring objections, I'll push my original patch and work up patches for the other couple of issues I found. 
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?
On 11 June 2017 at 20:19, Tom Lane wrote: >> The standard way of doing this is to calculate the "standard error" of >> the sample proportion - see, for example [3], [4]: >> SE = sqrt(p*(1-p)/n) >> Note, however, that this formula assumes that the sample size n is >> small compared to the population size N, which is not necessarily the >> case. This can be taken into account by applying the "finite >> population correction" (see, for example [5]), which involves >> multiplying by an additional factor: >> SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1)) > > It's been a long time since college statistics, but that wikipedia article > reminds me that the binomial distribution isn't really the right thing for > our problem anyway. We're doing sampling without replacement, so that the > correct model is the hypergeometric distribution. Yes that's right. > The article points out > that the binomial distribution is a good approximation as long as n << N. > Can this FPC factor be justified as converting binomial estimates into > hypergeometric ones, or is it ad hoc? No, it's not just ad hoc. It comes from the variance of the hypergeometric distribution [1] divided by the variance of a binomial distribution [2] with p=K/N, in the notation of those articles. This is actually a very widely used formula, used in fields like analysis of survey data, which is inherently sampling without replacement (assuming the questioners don't survey the same people more than once!). Regards, Dean [1] https://en.wikipedia.org/wiki/Hypergeometric_distribution [2] https://en.wikipedia.org/wiki/Binomial_distribution -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
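Dean's answer above -- that the finite population correction is (the square root of) the hypergeometric variance divided by the binomial variance -- is easy to check numerically. Here is a minimal Python sketch; the function and variable names, and the example values of N, K and n, are mine, chosen purely for illustration:

```python
import math

def binomial_var(n, p):
    # Variance of a binomial sample count: n * p * (1 - p)
    return n * p * (1 - p)

def hypergeometric_var(N, K, n):
    # Variance of a hypergeometric sample count: drawing n rows without
    # replacement from a population of N rows, K of which match
    p = K / N
    return n * p * (1 - p) * (N - n) / (N - 1)

# Illustrative values: table size N, matching rows K, sample size n
N, K, n = 1_000_000, 2000, 30_000

fpc = math.sqrt((N - n) / (N - 1))
ratio = math.sqrt(hypergeometric_var(N, K, n) / binomial_var(n, K / N))

# The finite population correction is exactly sqrt(var_hyper / var_binom)
assert abs(ratio - fpc) < 1e-12

# Sampling the whole table leaves no sampling error at all
assert hypergeometric_var(N, K, N) == 0.0
```

Note that the ratio is (N-n)/(N-1) independently of K and p, which is why the same correction factor applies whatever the candidate value's frequency is.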
Re: [HACKERS] PG10 Partitioned tables and relation_is_updatable()
On 11 June 2017 at 16:59, Joe Conway wrote: > On 06/11/2017 08:55 AM, Joe Conway wrote: >> Yeah, I noticed the same while working on the RLS related patch. I did >> not see anything else in rewriteHandler.c but it is probably worth >> looking wider for other omissions. > > So in particular: > > #define RelationIsUsedAsCatalogTable(relation) \ > ((relation)->rd_options && \ > ((relation)->rd_rel->relkind == RELKIND_RELATION || \ > (relation)->rd_rel->relkind == RELKIND_MATVIEW) ? \ > ((StdRdOptions *) (relation)->rd_options)->user_catalog_table : false) > 8< > > Does RELKIND_PARTITIONED_TABLE apply there? > Hmm, a quick experiment shows that it won't allow a partitioned table to be used as a user catalog table, although I'm not sure if that's intentional. If it is, it doesn't appear to be documented. I found another RLS-related omission -- RemoveRoleFromObjectPolicy() doesn't work for partitioned tables, so DROP OWNED BY will fail if there are any partitioned tables with RLS. I also found another couple of omissions in PL/pgSQL -- plpgsql_parse_cwordtype() and build_row_from_class(), so for example %rowtype doesn't work for partitioned tables. That's it for my quick once-over the code-base, but I'm not confident that will have caught everything. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Make ANALYZE more selective about what is a "most common value"?
On 05/06/17 09:30, Tom Lane wrote: > First, I think we need a larger hard floor on the number of occurrences > of a value that're required to make ANALYZE decide it is a "most common > value"... > > Second, the code also has a rule that potential MCVs need to have an > estimated frequency at least 25% larger than what it thinks the "average" > value's frequency is. A rule of that general form seems like a good idea, > but I now think the 25% threshold is far too small to do anything useful. > In particular, in any case like this where there are more distinct values > than there are sample rows, the "average frequency" estimate will > correspond to less than one occurrence in the sample, so that this rule is > totally useless to filter anything that we would otherwise consider as an > MCV. I wonder if we shouldn't make it be "at least double the estimated > average frequency". > I think we should attempt to come up with a more principled approach to this, taking into account the table and sample sizes. Here's what I found, after a bit of research: A common initial rule of thumb is that the value should occur at least 10 times in the sample - see, for example [1], [2]. The idea behind that goes as follows: Suppose that N is the population size (total number of rows in the table), and n is the sample size, and that some particular candidate value appears x times in the sample. Then the "sample proportion" is given by p = x/n. Taking a different sample would, in general, give a different value for x, and hence for the sample proportion, so repeatedly taking different random samples of the table would lead to a distribution of values of p, and in general, this distribution would be a binomial distribution, since x is the count of sampled rows matching a yes/no question (does the row match the candidate value?). 
The idea behind the rule of thumb above is that if n is large enough, and x is not too close to 0 or n, then the binomial distribution of p can be reasonably approximated by a normal distribution. There are various rules of thumb for this to be true, but this one is actually one of the stronger ones, as can be seen in [2]. Technically the rule should be: x >= 10 and x <= n-10 but it's probably not worth bothering with the second test, since we're looking for MCVs. Note that this says nothing about the margin of error of p, just that it is reasonable to treat it as having a normal distribution, which then allows the margin of error to be analysed using standard techniques. The standard way of doing this is to calculate the "standard error" of the sample proportion - see, for example [3], [4]: SE = sqrt(p*(1-p)/n) Note, however, that this formula assumes that the sample size n is small compared to the population size N, which is not necessarily the case. This can be taken into account by applying the "finite population correction" (see, for example [5]), which involves multiplying by an additional factor: SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1)) This correction factor becomes 0 when n=N (whole table sampled => no error) and it approaches 1 when n is much smaller than N. Having computed the standard error of the sample proportion, it is then possible to establish a confidence interval around p. 
For example, one could say that there is a 95% probability that the total population proportion lies in the range [ pmin = p-2*SE, pmax = p+2*SE ] This gives rise to the possibility of writing another rule for candidate MCVs: If there are Nd distinct values in the table, so that the average frequency of occurrence of any particular value is 1/Nd, then the test pmin > 1/Nd would imply that there is a 97.5% probability that the candidate value is more common than the average (since the 5% of the distribution of p outside the confidence interval is split evenly below pmin and above pmax). To take a concrete example, suppose that the table has N=1,000,000 rows, with Nd=500 distinct values, so that each value occurs on average 2000 times. If the sample size is 30,000 then the current rule that a value needs to have an estimated frequency 25% over the average means that a value would need to be seen 75 times to be considered as an MCV. If that rule were changed to "at least double the estimated average frequency", then a value would need to be seen at least 120 times. On the other hand, using the above rule based on a 95% confidence interval, a value would only need to be seen 78 times, in which case the estimated total number of occurrences of the value in the table would be 2600+/-579, and using the MCV estimate would almost certainly be better than not using it. Regards, Dean [1] https://onlinecourses.science.psu.edu/stat200/node/43 [2] https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation [3] http://mathworld.wolfram.com/SampleProportion.html [4] https://en.wikipedia.org/wiki/Population_proportion [5]
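The concrete example above can be reproduced directly. The following is a rough Python sketch, not PostgreSQL code; the helper name mcv_stats is mine, and the numbers are the ones from the example (N=1,000,000 rows, Nd=500 distinct values, sample size n=30,000):

```python
import math

def mcv_stats(N, Nd, n, x):
    """Sample proportion, standard error (with the finite population
    correction), and the resulting ~95% confidence interval for a value
    seen x times in a sample of n rows from a table of N rows."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))
    return p, se, (p - 2 * se, p + 2 * se)

N, Nd, n = 1_000_000, 500, 30_000
avg_freq = 1 / Nd          # average frequency of any one value

# A value seen 78 times in the sample passes the proposed pmin > 1/Nd test
p, se, (pmin, pmax) = mcv_stats(N, Nd, n, 78)
assert round(p * N) == 2600       # estimated total occurrences in the table
assert round(2 * se * N) == 579   # ... give or take 579 at ~95% confidence
assert pmin > avg_freq

# ... whereas 77 sightings is not quite enough
_, _, (pmin77, _) = mcv_stats(N, Nd, n, 77)
assert pmin77 <= avg_freq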
[HACKERS] PG10 Partitioned tables and relation_is_updatable()
Hi, It looks like relation_is_updatable() didn't get the message about partitioned tables. Thus, for example, information_schema.views and information_schema.columns report that simple views built on top of partitioned tables are non-updatable, which is wrong. Attached is a patch to fix this. I think this kind of omission is an easy mistake to make when adding a new relkind, so it might be worth having more pairs of eyes looking out for more of the same. I did a quick scan of the rewriter code (prompted by the recent similar omission for RLS on partitioned tables) and I didn't find any more problems there, but I haven't looked elsewhere yet. Regards, Dean teach-relation-is-updatable-about-partitioned-tables.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statistics "dependency"
On 23 April 2017 at 03:37, Bruce Momjian wrote: > In looking at the new multi-column statistics "dependency" option in > Postgres 10, I am quite confused by the term "dependency". Wouldn't > "correlation" be clearer and less confusing, as "column dependency" > already means something else? > Actually, the terms "dependency" and "correlation" are both quite broad terms that cover a whole range of different things, and hence could be misleading. The precise term for this is "functional dependency" [1], so if anything, the option name should be "functional_dependencies" or some shortening of that, keeping a part of each of those words. Regards, Dean [1] https://en.wikipedia.org/wiki/Functional_dependency -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 01:21, Tomas Vondra wrote: > On 04/21/2017 12:13 AM, Tom Lane wrote: >> >> Alvaro Herrera writes: >>> >>> Simon just pointed out that having the WITH clause appear in the middle >>> of the CREATE STATISTICS command looks odd; apparently somebody else >>> already complained on list about the same. Other commands put the WITH >>> clause at the end, so perhaps we should do likewise in the new command. >> >> +1 for WITH at the end; the existing syntax looks weird to me too. > > -1 from me > Yeah, I'm still marginally in favour of the current syntax because it's a bit more consistent with the CREATE VIEW syntax which puts the WITH (options) clause before the query, and assuming that multi-table and partial statistics support do get added in the future, the thing that the statistics get created on is going to look more like a query. OTOH, a few people have now commented that the syntax looks weird, and not just because of the placement of the WITH clause. I don't have any better ideas, but it's not too late to change if anyone else does... Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On 11 February 2017 at 01:17, Tomas Vondra wrote: > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > sloppy because the number of attributes in the statistics is currently > limited to 8, so the overflows are currently not an issue. But it doesn't > hurt to make it future-proof, in case we change that mostly artificial limit > sometime in the future. > Ah right, so it can't overflow at present, but it's neater to have an overflow-proof algorithm. Thinking about the exactness of the division steps is quite interesting. Actually, the order of the multiplying factors doesn't matter as long as the divisors are in increasing order. So in both my proposal: result = 1 for (i = 1; i <= k; i++) result = (result * (n-k+i)) / i; and David's proposal, which is equivalent but has the multiplying factors in the opposite order, equivalent to: result = 1 for (i = 1; i <= k; i++) result = (result * (n-i+1)) / i; the divisions are exact at each step. The first time through the loop it divides by 1 which is trivially exact. The second time it divides by 2, having multiplied by 2 consecutive factors, one of which is therefore guaranteed to be divisible by 2. The third time it divides by 3, having multiplied by 3 consecutive factors, one of which is therefore guaranteed to be divisible by 3, and so on. My approach originally seemed more logical to me because of the way it derives from the recurrence relation binomial(n, k) = binomial(n-1, k-1) * n / k, but they both work fine as long as they have suitable overflow checks. It's also interesting that descriptions of this algorithm tend to talk about setting k to min(k, n-k) at the start as an optimisation step, as I did in fact, whereas it's actually more than that -- it helps prevent unnecessary intermediate overflows when k > n/2. Of course, that's not a worry for the current use of this function, but it's good to have a robust algorithm. 
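The claim that both orderings divide exactly at every step is easy to verify by brute force. A small Python sketch (using arbitrary-precision integers, so only the divisibility property is being exercised; the function name is mine):

```python
def divisions_all_exact(n, k, reverse=False):
    """Check that the running product is divisible by i at every step,
    with the multiplying factors taken in increasing (Dean's) or
    decreasing (David's) order and the divisors always increasing."""
    result = 1
    for i in range(1, k + 1):
        factor = (n - i + 1) if reverse else (n - k + i)
        result *= factor
        if result % i != 0:
            return False           # a division would truncate here
        result //= i
    return True

# Both orderings are exact at each step, for every n and k checked
assert all(divisions_all_exact(n, k) and divisions_all_exact(n, k, reverse=True)
           for n in range(30) for k in range(n + 1))
```

This matches the consecutive-factors argument above: after i steps, either ordering has multiplied i consecutive integers, whose product is always divisible by i!.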
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On 8 February 2017 at 16:09, David Fetter wrote: > Combinations are n!/(k! * (n-k)!), so computing those is more > along the lines of: > > unsigned long long > choose(unsigned long long n, unsigned long long k) { > if (k > n) { > return 0; > } > unsigned long long r = 1; > for (unsigned long long d = 1; d <= k; ++d) { > r *= n--; > r /= d; > } > return r; > } > > which greatly reduces the chance of overflow. > Hmm, but that doesn't actually prevent overflows, since it can overflow in the multiplication step, and there is no protection against that. In the algorithm I presented, the inputs and the intermediate result are kept below INT_MAX, so the multiplication step cannot overflow the 64-bit integer, and it will only raise an overflow error if the actual result won't fit in a 32-bit int. Actually a crucial part of that, which I failed to mention previously, is the first step replacing k with min(k, n-k). This is necessary for inputs like (100,99), which should return 100, and which must be computed as 100 choose 1, not 100 choose 99, otherwise it will overflow internally before getting to the final result. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v19)
On 6 February 2017 at 21:26, Alvaro Herrera wrote: > Tomas Vondra wrote: >> On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > >> > Nearby, some auxiliary functions such as n_choose_k and >> > num_combinations are not documented. What is it that they do? I'd >> > move these at the end of the file, keeping the important entry points >> > at the top of the file. >> >> I'd say n-choose-k is a pretty widely known term from combinatorics. The >> comment would essentially say just 'this is n-choose-k' which seems rather >> pointless. So as much as I dislike the self-documenting code, this actually >> seems like a good case of that. > > Actually, we do have such comments all over the place. I knew this as > "n sobre k", so the English name doesn't immediately ring a bell with me > until I look it up; I think the function comment could just say > "n_choose_k -- this function returns the binomial coefficient". > One of the things you have to watch out for when writing code to compute binomial coefficients is integer overflow, since the numerator and denominator get large very quickly. For example, the current code will overflow for n=13, k=12, which really isn't that large. This can be avoided by computing the product in reverse and using a larger datatype like a 64-bit integer to store a single intermediate result. The point about multiplying the terms in reverse is that it guarantees that each intermediate result is an exact integer (a smaller binomial coefficient), so there is no need to track separate numerators and denominators, and you avoid huge intermediate factorials. Here's what that looks like in pseudo-code: binomial(int n, int k): # Save computational effort by using the symmetry of the binomial # coefficients k = min(k, n-k); # Compute the result using binomial(n, k) = binomial(n-1, k-1) * n / k, # starting from binomial(n-k, 0) = 1, and computing the sequence # binomial(n-k+1, 1), binomial(n-k+2, 2), ... # # Note that each intermediate result is an exact integer. 
int64 result = 1; for (int i = 1; i <= k; i++) { result = (result * (n-k+i)) / i; if (result > INT_MAX) Raise overflow error } return (int) result; Note also that I think num_combinations(n) is just an expensive way of calculating 2^n - n - 1. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
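The pseudo-code above translates into a small runnable sketch; this one is in Python rather than C, so its integers can't actually overflow, and the INT_MAX check below only mimics the guard described. The reading of num_combinations(n) as "the number of subsets of size two or more" is inferred from the 2^n - n - 1 remark, not taken from the patch itself.

```python
INT_MAX = 2**31 - 1

def choose(n, k):
    """Binomial coefficient via binomial(n, k) = binomial(n-1, k-1) * n / k,
    raising an error if the result won't fit in a 32-bit int."""
    if k < 0 or k > n:
        return 0
    # Symmetry of the binomial coefficients: also prevents internal
    # overflow for inputs like (100, 99)
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        result = (result * (n - k + i)) // i   # exact at every step
        if result > INT_MAX:
            raise OverflowError("binomial coefficient out of range")
    return result

assert choose(13, 12) == 13        # the case that overflowed the old coding
assert choose(100, 99) == 100      # needs the min(k, n-k) step
assert choose(30, 15) == 155117520

# num_combinations(n), read as the number of subsets of size >= 2, is just
# 2^n - n - 1: all subsets minus the n singletons and the empty set
def num_combinations(n):
    return sum(choose(n, k) for k in range(2, n + 1))

assert all(num_combinations(n) == 2**n - n - 1 for n in range(2, 20))
```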
Re: [HACKERS] Improving RLS planning
On 29 December 2016 at 15:55, Tom Lane <t...@sss.pgh.pa.us> wrote: > Dean Rasheed <dean.a.rash...@gmail.com> writes: >> On 28 December 2016 at 19:12, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> [ getting back to this patch finally... ] I made the suggested change >>> to that test case, and what I see is a whole lot of "NOTICE: snooped value >>> = whatever" outputs. >>> >>> I'd leave it as shown in the attached diff fragment, except that I'm >>> a bit worried about possible platform dependency of the output. The >>> hashing occurring in the subplans shouldn't affect output order, but >>> I'm not sure if we want a test output like this or not. Thoughts? > >> How about replacing "a = 3" with "a < 7 AND a != 6". That then >> exercises more of the possible types of behaviour for quals: The "a < >> 7" qual is pushed down and used as an index condition. The "a != 6" >> qual is pushed down and used as a filter, because it's cheap and >> leakproof. The leakproof() qual isn't pushed down on cost grounds. The >> snoop() qual isn't pushed down on security grounds. Both snoop() and >> leakproof() are used as filters, along with "a != 6", and a SB subplan >> qual. "a != 6" is executed first because it has a security_level of 0, >> and is cheaper than the subplan. snoop() is executed later, despite >> being cheaper than the other filter quals, because it has a higher >> security_level, and leakproof() is executed last because it has the >> same security level as snoop() but is more expensive. > > Will do, although I think that the next test case (the one with "a = 8") > already shows most of those behaviors. > Except that it doesn't have a cheap leakproof qual like "a != 6", not handled automatically by the index, that order_qual_clauses() can assign security_level = 0 to, and then move to the start of the list. I think it's probably worth having a clause like that in one of the tests, but it could perhaps be added to the "a = 8" test, if you wanted to drop the "a = 3" test. 
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improving RLS planning
On 28 December 2016 at 19:12, Tom Lane <t...@sss.pgh.pa.us> wrote: > Stephen Frost <sfr...@snowman.net> writes: >> * Dean Rasheed (dean.a.rash...@gmail.com) wrote: >>> Hmm. I've not read any of the new code yet, but the fact that this >>> test now reduces to a one-time filter makes it effectively useless as >>> a test of qual evaluation order because it has deduced that it doesn't >>> need to evaluate them. I would suggest replacing the qual with >>> something that can't be reduced, perhaps "2*a = 6". > >> That's a good thought, I agree. > > [ getting back to this patch finally... ] I made the suggested change > to that test case, and what I see is a whole lot of "NOTICE: snooped value > = whatever" outputs. > > I'd leave it as shown in the attached diff fragment, except that I'm > a bit worried about possible platform dependency of the output. The > hashing occurring in the subplans shouldn't affect output order, but > I'm not sure if we want a test output like this or not. Thoughts? > How about replacing "a = 3" with "a < 7 AND a != 6". That then exercises more of the possible types of behaviour for quals: The "a < 7" qual is pushed down and used as an index condition. The "a != 6" qual is pushed down and used as a filter, because it's cheap and leakproof. The leakproof() qual isn't pushed down on cost grounds. The snoop() qual isn't pushed down on security grounds. Both snoop() and leakproof() are used as filters, along with "a != 6", and a SB subplan qual. "a != 6" is executed first because it has a security_level of 0, and is cheaper than the subplan. snoop() is executed later, despite being cheaper than the other filter quals, because it has a higher security_level, and leakproof() is executed last because it has the same security level as snoop() but is more expensive. Thus the test is more likely to highlight any future changes to the pushdown/ordering rules. 
The output is also neater, because nothing ends up being printed by snoop(), although of course it is OK for values greater than 5 to be printed. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE OR REPLACE VIEW bug
On 17 December 2016 at 15:42, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
> It seems that there is a bug in CREATE OR REPLACE VIEW...
>
> DefineView()/DefineVirtualRelation() will need a little re-jigging to
> do things in the required order.

...and the required order for existing views is

1. Add any new columns
2. Add rules to store the new query
3. Update the view options

because 2 will fail if the view's columns don't match the query's
columns.

Attached is a patch enforcing this order and adding some comments to
make it clear why the order matters here.

Barring objections I'll back-patch this to 9.4 where WCO was added.

Regards,
Dean

diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
new file mode 100644
index c6b0e4f..414507f
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -59,15 +59,13 @@ validateWithCheckOption(char *value)
 /*-
  * DefineVirtualRelation
  *
- * Create the "view" relation. `DefineRelation' does all the work,
- * we just provide the correct arguments ... at least when we're
- * creating a view. If we're updating an existing view, we have to
- * work harder.
+ * Create a view relation and use the rules system to store the query
+ * for the view.
  *-
  */
 static ObjectAddress
 DefineVirtualRelation(RangeVar *relation, List *tlist, bool replace,
-                      List *options)
+                      List *options, Query *viewParse)
 {
     Oid         viewOid;
     LOCKMODE    lockmode;
@@ -162,18 +160,13 @@ DefineVirtualRelation(RangeVar *relation
         checkViewTupleDesc(descriptor, rel->rd_att);

         /*
-         * The new options list replaces the existing options list, even if
-         * it's empty.
-         */
-        atcmd = makeNode(AlterTableCmd);
-        atcmd->subtype = AT_ReplaceRelOptions;
-        atcmd->def = (Node *) options;
-        atcmds = lappend(atcmds, atcmd);
-
-        /*
          * If new attributes have been added, we must add pg_attribute entries
          * for them. It is convenient (although overkill) to use the ALTER
          * TABLE ADD COLUMN infrastructure for this.
+         *
+         * Note that we must do this before updating the query for the view,
+         * since the rules system requires that the correct view columns be in
+         * place when defining the new rules.
          */
         if (list_length(attrList) > rel->rd_att->natts)
         {
@@ -192,9 +185,38 @@ DefineVirtualRelation(RangeVar *relation
             atcmd->def = (Node *) lfirst(c);
             atcmds = lappend(atcmds, atcmd);
         }
+
+        AlterTableInternal(viewOid, atcmds, true);
+
+        /* Make the new view columns visible */
+        CommandCounterIncrement();
     }

-    /* OK, let's do it. */
+    /*
+     * Update the query for the view.
+     *
+     * Note that we must do this before updating the view options, because
+     * the new options may not be compatible with the old view query (for
+     * example if we attempt to add the WITH CHECK OPTION, we require that
+     * the new view be automatically updatable, but the old view may not
+     * have been).
+     */
+    StoreViewQuery(viewOid, viewParse, replace);
+
+    /* Make the new view query visible */
+    CommandCounterIncrement();
+
+    /*
+     * Finally update the view options.
+     *
+     * The new options list replaces the existing options list, even if
+     * it's empty.
+     */
+    atcmd = makeNode(AlterTableCmd);
+    atcmd->subtype = AT_ReplaceRelOptions;
+    atcmd->def = (Node *) options;
+    atcmds = list_make1(atcmd);
+    AlterTableInternal(viewOid, atcmds, true);

     ObjectAddressSet(address, RelationRelationId, viewOid);
@@ -211,7 +233,7 @@ DefineVirtualRelation(RangeVar *relation
     ObjectAddress address;

     /*
-     * now set the parameters for keys/inheritance etc. All of these are
+     * Set the parameters for keys/inheritance etc. All of these are
      * uninteresting for views...
      */
     createStmt->relation = relation;
@@ -224,13 +246,20 @@ DefineVirtualRelation(RangeVar *relation
     createStmt->if_not_exists = false;

     /*
-     * finally create the relation (this will error out if there's an
-     * existing view, so we don't need more code to complain if "replace"
-     * is false).
+     * Create the relation (this will error out if there's an existing
+     * view, so we don't need more code to complain if "replace" is
+     * false).
      */
     address = DefineRelation(createStmt, RELKIND_VIEW, InvalidOid,
                              NULL, NULL);
     Assert(address.objectId != InvalidOid);
+
+    /* Make the new view relation visible */
+    CommandCounterIncrement();
+
+    /* Store the query for the view */
+    StoreViewQuery(address.objectId, viewParse, replace);
+
     return address;
 }
 }
@@ -530,16 +559,7 @@ DefineView(ViewStmt *stmt, const char *q
      * aborted.
      */
     address = DefineVirtualRelation(view, viewParse->targetList,
-                                    stmt->replace, stmt->options);
-
-    /*
-     * The relation we have just created is not visibl
[HACKERS] CREATE OR REPLACE VIEW bug
It seems that there is a bug in CREATE OR REPLACE VIEW's handling of
WITH CHECK OPTION (noticed while thinking about the recent change to
pg_dump's handling of circular dependencies in views -- d8c05af).

If you use CREATE OR REPLACE VIEW on a view that isn't auto-updatable
and turn it into one that is, and at the same time attempt to add a
WITH CHECK OPTION (which is exactly what pg_dump will now do), it fails:

CREATE TABLE t1 (a int);
CREATE VIEW v1 AS SELECT null::int AS a;
CREATE OR REPLACE VIEW v1 AS SELECT * FROM t1 WHERE a > 0 WITH CHECK OPTION;
ERROR:  WITH CHECK OPTION is supported only on automatically updatable views
HINT:  Views that do not select from a single table or view are not automatically updatable.

The problem is that before updating the view's query, DefineView()
calls DefineVirtualRelation(), which attempts to add the new check
option to the existing view via the ALTER VIEW mechanism, and that
fails because the new check option isn't valid against the old view
query.

So if we're going to use the ALTER VIEW mechanism to update the view's
options, which is probably the most convenient way to do it,
DefineView()/DefineVirtualRelation() will need a little re-jigging to
do things in the required order. I'll try to knock up a patch to do
that.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
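The ordering problem can be illustrated with a toy model (Python, with hypothetical names -- this is not the actual PostgreSQL code): options that depend on the view's query, like WITH CHECK OPTION, must be validated against the *new* query, so applying them before the query is replaced fails even when the final state would be legal.

```python
# Toy model (not PostgreSQL code) of the ordering bug described above.
class View:
    def __init__(self, query, auto_updatable):
        self.query = query
        self.auto_updatable = auto_updatable
        self.check_option = False

    def set_options(self, check_option):
        # WITH CHECK OPTION is only valid on an auto-updatable view,
        # so this check must see the *new* query's updatability.
        if check_option and not self.auto_updatable:
            raise ValueError("WITH CHECK OPTION is supported only on "
                             "automatically updatable views")
        self.check_option = check_option

    def replace(self, query, auto_updatable, check_option):
        # The fix: store the new query first, then apply the options.
        self.query = query
        self.auto_updatable = auto_updatable
        self.set_options(check_option)

v = View("SELECT null::int AS a", auto_updatable=False)
# Applying the option against the old, non-updatable query would fail;
# replacing the query first succeeds:
v.replace("SELECT * FROM t1 WHERE a > 0", auto_updatable=True,
          check_option=True)
```

Calling `set_options(True)` on the original view raises the error, mirroring the CREATE OR REPLACE VIEW failure shown above.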
Re: [HACKERS] Add support for restrictive RLS policies
Stephen,

I looked through this in a little more detail and I found one other
issue: the documentation for the system catalog table pg_policy and
the view pg_policies needs to be updated to include the new columns
that have been added.

Other than that, it all looks good to me, subject to the previous
comments.

Regards,
Dean
Re: [HACKERS] Add support for restrictive RLS policies
On 1 December 2016 at 14:38, Stephen Frost <sfr...@snowman.net> wrote:
> * Dean Rasheed (dean.a.rash...@gmail.com) wrote:
>> In get_policies_for_relation() ...
>> ... I think it should sort the restrictive policies by name
>
> Hmmm, is it really the case that the quals will always end up being
> evaluated in that order though?  Isn't order_qual_clauses() going to end
> up changing the order based on the relative cost?  If the cost is the
> same it should maintain the order, but even that could change in the
> future based on the comments, no?  In short, I'm not entirely sure that
> we actually want to be required to always evaluate the quals in order of
> policy name and we might get complaints if we happen to make that work
> today and it ends up being changed later.
>

No, this isn't about the quals that get put into the WHERE clause of
the resulting queries. As you say, order_qual_clauses() is going to
re-order those anyway.

This is about the WithCheckOptions that get generated for UPDATEs and
INSERTs, and having those checked in a predictable order. The main
advantage of that is to guarantee a predictable error message from
self tests that attempt to insert invalid data.

This is basically the same as what was done for CHECK constraints in
e5f455f59fed0632371cddacddd79895b148dc07.

Regards,
Dean
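The point about predictable errors can be sketched as follows (Python, hypothetical helper names -- not the actual PostgreSQL code): if the WITH CHECK expressions are checked in policy-name order, the first violation reported is always the alphabetically-first violated policy, so a self test gets a deterministic error message.

```python
# Sketch: check WITH CHECK expressions in policy-name order so the
# reported violation is deterministic. Policy names and predicates
# here are invented for illustration.
policies = [("p_b", lambda row: row["x"] > 0),
            ("p_a", lambda row: row["x"] % 2 == 0)]

def check_new_row(row):
    for name, check in sorted(policies):   # sort by policy name
        if not check(row):
            raise ValueError(f'new row violates policy "{name}"')

check_new_row({"x": 4})   # passes both policies, no error
```

A row violating both policies (e.g. `{"x": -3}`) always reports `p_a`, never `p_b`, regardless of the order the policies were created in.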
Re: [HACKERS] Add support for restrictive RLS policies
On 30 November 2016 at 21:54, Stephen Frost wrote:
> Unless there's further comments, I'll plan to push this sometime
> tomorrow.
>

Sorry I didn't have time to look at this properly. I was intending to,
but my day job just keeps getting in the way. I do have a couple of
comments relating to the documentation and one relating to the code:

-   row-level security policy.
+   row-level security policy. Note that only the set of roles which the
+   policy applies to and the USING and
+   WITH CHECK expressions are able to be changed using
+   ALTER POLICY. To change other properties of a policy,
+   such as the command it is applied for or if it is a permissive or
+   restrictive policy, the policy must be dropped and recreated.

This note reads a little awkwardly to me. I think I would write it as:

    Note that ALTER POLICY only allows the set of roles to which the
    policy applies and the USING and WITH CHECK expressions to be
    modified. To change other properties of a policy, such as the
    command to which it applies or whether it is permissive or
    restrictive, the policy must be dropped and recreated.

+PERMISSIVE
+
+    Specify that the policy is to be created as a "permissive" policy.
+    All "permissive" policies which are applicable to a given query will
+    be combined together using the boolean "OR" operator. By creating
+    "permissive" policies, administrators can add to the set of records
+    which can be accessed. Policies are PERMISSIVE by default.
+
+RESTRICTIVE
+
+    Specify that the policy is to be created as a "restrictive" policy.
+    All "restrictive" policies which are applicable to a given query will
+    be combined together using the boolean "AND" operator. By creating
+    "restrictive" policies, administrators can reduce the set of records
+    which can be accessed as all "restrictive" policies must be passed for
+    each record.

I don't think this or anywhere else makes it entirely clear that the
user needs to have at least one permissive policy in addition to any
restrictive policies for this to be useful. I think this section is
probably a good place to mention that, since it's probably the first
place people will read about restrictive policies.

I think it also needs to be spelled out exactly how a mix of
permissive and restrictive policies are combined, because there is
more than one way to combine a set of quals with ANDs and ORs
(although only one way really makes sense in this context). So perhaps
an additional note along the lines of:

    Note that there needs to be at least one permissive policy to
    grant access to records before restrictive policies can be
    usefully used to reduce that access. If only restrictive policies
    exist, then no records will be accessible. When a mix of
    permissive and restrictive policies are present, a record is only
    accessible if at least one of the permissive policies passes, in
    addition to all the restrictive policies.

Also, I don't think it's necessary to keep quoting "restrictive" and
"permissive" here.

In get_policies_for_relation(), after the loop that does this:

-    *permissive_policies = lappend(*permissive_policies, policy);
+    {
+        if (policy->permissive)
+            *permissive_policies = lappend(*permissive_policies, policy);
+        else
+            *restrictive_policies = lappend(*restrictive_policies, policy);
+    }

I think it should sort the restrictive policies by name, for the same
reason that hook-restrictive policies are sorted -- so that the WITH
CHECK expressions are checked in a well-defined order (which was
chosen to be consistent with the order of checking of other similar
things, like CHECK constraints and triggers).

I also think that this should be a separate sort step from the sort
for hook policies, inserted just after the loop above, so that the
order is all internal policies sorted by name, followed by all hook
policies sorted by name, to be consistent with the existing principle
that hook policies are applied after internal policies.

Regards,
Dean
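The combination rule proposed in the suggested note (at least one permissive policy must pass, plus all restrictive policies) can be sketched as follows -- a Python toy model with hypothetical names, not the actual PostgreSQL implementation:

```python
# Sketch of combining permissive (OR-ed) and restrictive (AND-ed)
# row-level security policies, per the note suggested above.
def row_visible(row, permissive, restrictive):
    if not any(p(row) for p in permissive):
        return False            # no permissive policy grants access
    return all(r(row) for r in restrictive)

perm = [lambda r: r["owner"] == "alice"]    # combined with OR
restr = [lambda r: not r["archived"]]       # combined with AND

print(row_visible({"owner": "alice", "archived": False}, perm, restr))  # True
print(row_visible({"owner": "alice", "archived": True}, perm, restr))   # False
# With only restrictive policies, no rows are accessible at all:
print(row_visible({"owner": "alice", "archived": False}, [], restr))    # False
```

Note how the empty-permissive case falls out naturally: `any()` over an empty list is false, matching the behaviour that restrictive policies alone grant nothing.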
Re: [HACKERS] Improving RLS planning
On 28 November 2016 at 23:55, Stephen Frost wrote:
>> diff --git a/src/test/regress/expected/updatable_views.out
>> b/src/test/regress/expected/updatable_views.out
> [...]
>> --- 2104,2114 ----
>>
>>   EXPLAIN (VERBOSE, COSTS OFF)
>>   UPDATE v1 SET a=100 WHERE snoop(a) AND leakproof(a) AND a = 3;
>> !        QUERY PLAN
>> ! --
>> !  Result
>> !    One-Time Filter: false
>> ! (2 rows)
>
> Perhaps Dean recalls something more specific, but reviewing this test
> and the others around it, I believe it was simply to see what happens in
> a case which doesn't pass the security barrier view constraint and to
> cover the same cases with the UPDATE as were in the SELECTs above it.  I
> don't see it being an issue that it now results in a one-time filter:
> false result.
>

Hmm. I've not read any of the new code yet, but the fact that this
test now reduces to a one-time filter makes it effectively useless as
a test of qual evaluation order, because it has deduced that it
doesn't need to evaluate them. I would suggest replacing the qual with
something that can't be reduced, perhaps "2*a = 6".

In addition, I think that the tests on this view are probably no
longer adequate for the purpose of validating that the qual evaluation
order is safe. With the old implementation, the subquery scans in the
plans made it pretty clear that it was safe, and likely to remain safe
with variants of those queries, but that's not so obvious with the new
plans.

Maybe some additional quals could be added to the view definition,
perhaps based on the other view columns, to verify that the outer
leaky qual always gets evaluated after the security barrier quals,
regardless of cost. Or perhaps that's something that's better proved
with an all-new set of tests, but it does seem to me that the new
implementation has a higher risk (or at least introduces different
risks) of unsafe evaluation orders that warrant some additional
testing.
Regards,
Dean
Re: [HACKERS] Improving RLS planning
On 10 November 2016 at 17:12, Tom Lane wrote:
> Yeah, I think we'd be best off to avoid the bare term "security".
> It's probably too late to change the RTE field name "securityQuals",
> but maybe we could uniformly call those "security barrier quals" in
> the comments.  Then the basic terminology is that we have security
> barrier views and row-level security both implemented on top of
> security barrier quals, and we should be careful to use the right
> one of those three terms in comments/documentation.
>

+1 for that terminology and no renaming of fields.

Regards,
Dean
Re: [HACKERS] Improving RLS planning
On 8 November 2016 at 16:46, Robert Haas wrote:
> On Thu, Nov 3, 2016 at 6:23 PM, Tom Lane wrote:
>>> I think that ordering might be sub-optimal if you had a mix of
>>> leakproof quals and security quals and the cost of some security quals
>>> were significantly higher than the cost of some other quals.  Perhaps
>>> all leakproof quals should be assigned security_level 0, to allow them
>>> to be checked earlier if they have a lower cost (whether or not they
>>> are security quals), and only leaky quals would have a security_level
>>> greater than zero.  Rule 1 would then not need to check whether the
>>> qual was leakproof, and you probably wouldn't need the separate
>>> "leakproof" bool field on RestrictInfo.
>>
>> Hm, but it would also force leakproof quals to be evaluated in front
>> of potentially-cheaper leaky quals, whether or not that's semantically
>> necessary.
>>

True. That's also what currently happens with RLS and SB views because
leakproof quals are pushed down into subqueries without considering
their cost. It would be nice to do better than that.

>> I experimented with ignoring security_level altogether for leakproof
>> quals, but I couldn't make it work properly, because that didn't lead to
>> a comparison rule that satisfies transitivity.  For instance, consider
>> three quals:
>> A: cost 1, security_level 1, leaky
>> B: cost 2, security_level 1, leakproof
>> C: cost 3, security_level 0, leakproof
>> A should sort before B, since same security_level and lower cost;
>> B should sort before C, since lower cost and leakproof;
>> but A must sort after C, since higher security_level and leaky.
>
> Yeah, this is pretty thorny.  IIUC, all leaky quals of a given
> security level must be evaluated before any quals of the next higher
> security level, or we have a security problem.  Beyond that, we'd
> *prefer* to evaluate cheaper quals first (though perhaps we ought to
> be also thinking about how selective they are) but that's "just" a
> matter of how good the query plan is.  So in this example, security
> dictates that C must precede A, but that's it.  We can pick between
> C-A-B, C-B-A, and B-C-A based on cost.  C-B-A is clearly inferior to
> either of the other two, but it's less obvious whether C-A-B or B-C-A
> is better.  If you expect each predicate to have a selectivity of 50%,
> then C-A-B costs 3+(0.5*1)+(0.25*2) = 4 while B-C-A costs
> 2+(0.5*3)+(0.25*1) = 3.75, so B-C-A is better.  But now make the cost
> of B and C 18 and 20 while keeping the cost of A at 1.  Now C-A-B
> costs 20+(0.5*1)+(0.25*18) = 25 while B-C-A costs 18+(0.5*20)+(0.25*1)
> = 28.25, so now C-A-B is better.
>
> So I think any attempt to come up with a transitive comparison rule is
> doomed.  We could do something like: sort by cost then security level;
> afterwards, allow leakproof quals to migrate forward as many positions
> as is possible without passing a qual that is either higher-cost or
> (non-leakproof and lower security level).  So in the above example we
> would start by sorting them like C-A-B and then check whether B can
> move forward; it can't, so we're done.  If all operators were
> leakproof, this would essentially turn into an insertion-sort that
> orders them strictly by cost, whereas if they're all leaky, it orders
> strictly by security level and then by cost.  With a mix of leaky and
> non-leaky operators you get something in the middle.
>
> I'm not sure that this is practically better than the hack you
> proposed, but I wanted to take the time to comment on the theory here,
> as I see it anyway.
>

Yes, I think you're right. It doesn't look possible to invent a
transitive comparison rule.

I thought perhaps the rule could be to only "push down" a leakproof
qual (change it to a lower security_level) if there were more
expensive quals at the lower level, but as you point out, this doesn't
guarantee cheaper execution.

Regards,
Dean
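The expected-cost arithmetic quoted above can be written out as a small sketch (Python, illustrative only): the cost of an evaluation order is each qual's cost weighted by the fraction of rows that survive the quals evaluated before it.

```python
# Expected cost of evaluating quals in a given order, reproducing the
# arithmetic from the quoted discussion above.
def expected_cost(quals):
    """quals: list of (cost, selectivity) pairs, in evaluation order."""
    surviving = 1.0   # fraction of rows reaching the next qual
    total = 0.0
    for cost, selectivity in quals:
        total += surviving * cost
        surviving *= selectivity
    return total

# A, B, C with costs 1, 2, 3 and selectivity 0.5 each:
A, B, C = (1, 0.5), (2, 0.5), (3, 0.5)
print(expected_cost([C, A, B]))   # 4.0
print(expected_cost([B, C, A]))   # 3.75 -> B-C-A is better here

# Make B and C cost 18 and 20; the preference flips:
A, B, C = (1, 0.5), (18, 0.5), (20, 0.5)
print(expected_cost([C, A, B]))   # 25.0 -> now C-A-B is better
print(expected_cost([B, C, A]))   # 28.25
```

This is why no cost-based tiebreak between security-legal orderings can be decided locally: the better order depends on the actual costs, not just the pairwise comparisons.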
Re: [HACKERS] Improving RLS planning
On 8 November 2016 at 14:45, Tom Lane wrote:
> Stephen Frost writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> * Since the planner is now depending on Query.hasRowSecurity to be set
>>> whenever there are any securityQuals, I put in an Assert about that,
>>> and promptly found three places in prepjointree.c and the rewriter where
>>> we'd been failing to set it.  I have not looked to see if these represent
>>> live bugs in existing releases, but they might.  Or am I misunderstanding
>>> what the flag is supposed to mean?
>
>> They're independent, actually.  securityQuals can be set via either
>> security barrier view or from RLS, while hasRowSecurity is specifically
>> for the RLS case.  The reason for the distinction is that changing your
>> role isn't going to impact security barrier views at all, while it could
>> impact what RLS policies are used.  See extract_query_dependencies().

Right. securityQuals was added for updatable SB views, which pre-dates RLS.

> OK.  In that case I'll need to adjust the patch so that the planner keeps
> its own flag about whether the query contains any securityQuals; that's
> easy enough.  But I'm still suspicious that the three places I found may
> represent bugs in the management of Query.hasRowSecurity.
>

I don't believe that there are any existing bugs here, but I think 1
or possibly 2 of the 3 places you changed should be kept on robustness
grounds (see below).

Query.hasRowSecurity is only used for invalidation of cached plans,
when the current role or environment changes, causing a change to the
set of policies that need to be applied, and thus requiring that the
query be re-planned. This happens in extract_query_dependencies(),
which walks the query tree and will find any Query.hasRowSecurity
flags and set PlannerGlobal.dependsOnRole, which is sufficient for its
intended purpose.

Regarding the 3 places you mention...

1). rewriteRuleAction() doesn't need to set Query.hasRowSecurity
because it's called during "Step 1" of the rewriter, where non-SELECT
rules are expanded, but RLS expansion doesn't happen until later, at
the end of "Step 2", after SELECT rules are expanded. That said, you
could argue that copying the flag in rewriteRuleAction() makes the
code more bulletproof, even though it is expected to always be false
at that point.

2). pull_up_simple_subquery() technically doesn't need to set
Query.hasRowSecurity because nothing in the planner refers to it, and
plancache.c will have already recorded the fact that the original
query had RLS. However, this seems like a bug waiting to happen, and
it really ought to be copying the flag in case we later add code that
does look at the flag during the planning process.

3). rewriteTargetView() should not set the flag, because the flag is
only for RLS, not for SB views, and we don't want cached plans for SB
views to be invalidated.

On a related note, I think it's worth establishing a terminology
convention for code and comments in this whole area. In the existing
code, or in my head at least, there's a convention that a term that
contains the words "row" or "policy" together with "security" refers
specifically to RLS, not to SB views. For example:

* Row-level security (or just row security) for the name of the feature itself
* row_security -- the configuration setting
* get_row_security_policies()
* Query.hasRowSecurity
* rowsecurity.c

On the other hand, terms that contain just the word "security" without
the words "row" or "policy" have a broader scope and may refer to
either RLS or SB views. For example:

* RangeTblEntry.security_barrier
* RangeTblEntry.securityQuals
* expand_security_quals()
* prepsecurity.c
* The new security_level field

It's a pretty fine distinction, and I don't know how others have been
thinking about this, but I think that it's helpful to make the
distinction, and there are at least a couple of places in the patch
that use RLS-specific terminology for what could also be a SB view.

Regards,
Dean
Re: [HACKERS] Improving RLS planning
On 25 October 2016 at 22:58, Tom Lane wrote:
> The alternative I'm now thinking about pursuing is to get rid of the
> conversion of RLS quals to subqueries.  Instead, we can label individual
> qual clauses with security precedence markings.  Concretely, suppose we
> add an "int security_level" field to struct RestrictInfo.  The semantics
> of this would be that a qual with a lower security_level value must be
> evaluated before a qual with a higher security_level value, unless the
> latter qual is leakproof.  (It would likely also behoove us to add a
> "leakproof" bool field to struct RestrictInfo, to avoid duplicate
> leakproof-ness checks on quals.  But that's just an optimization.)
>

+1 for this approach. It looks like it could potentially be much
simpler. There's some ugly code in the inheritance planner (and
probably one or two other places) that it might be possible to chop
out, which would probably also speed up planning times.

> In the initial implementation, quals coming from a RangeTblEntry's
> securityQuals field would have security_level 0, quals coming from
> anywhere else would have security_level 1; except that if we know
> there are no security quals anywhere (ie not Query->hasRowSecurity),
> we could give all quals security_level 0.  (I think this exception
> may be worth making because there's no need to test leakproofness
> for a qual with security level 0; it could never be a candidate
> for security delay anyway.)
>

Note that the securityQuals list represents nested levels of security
barriers (e.g., nested SB views), so you'd have to actually assign
security_level 0 to the first security qual, security_level 1 to the
second security qual, and so on. Then the quals coming from anywhere
else would have to have a security_level one greater than the maximum
of all the other security levels.

> Having done that much, I think all we need in order to get rid of
> RLS subqueries, and just stick RLS quals into their relation's
> baserestrictinfo list, are two rules:
>
> 1. When selecting potential indexquals, a RestrictInfo can be considered
> for indexqual use only if it is leakproof or has security_level <= the
> minimum among the table's baserestrictinfo clauses.
>
> 2. In order_qual_clauses, sort first by security_level and second by cost.
>

I think that ordering might be sub-optimal if you had a mix of
leakproof quals and security quals and the cost of some security quals
were significantly higher than the cost of some other quals. Perhaps
all leakproof quals should be assigned security_level 0, to allow them
to be checked earlier if they have a lower cost (whether or not they
are security quals), and only leaky quals would have a security_level
greater than zero. Rule 1 would then not need to check whether the
qual was leakproof, and you probably wouldn't need the separate
"leakproof" bool field on RestrictInfo.

Regards,
Dean
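Rule 2, together with the tweak suggested in the reply (treating leakproof quals as security_level 0 so that cheap leakproof quals can run early), can be sketched as follows -- a Python illustration with invented qual names, not the actual planner code:

```python
# Sketch of ordering quals by (effective security level, cost), where
# leakproof quals are treated as level 0 because they can safely run
# before any security barrier qual.
from dataclasses import dataclass

@dataclass
class Qual:
    name: str
    cost: float
    security_level: int
    leakproof: bool

def order_quals(quals):
    def key(q):
        level = 0 if q.leakproof else q.security_level
        return (level, q.cost)
    return [q.name for q in sorted(quals, key=key)]

quals = [
    Qual("leaky_filter", cost=1.0, security_level=1, leakproof=False),
    Qual("cheap_proof", cost=0.5, security_level=1, leakproof=True),
    Qual("rls_qual", cost=2.0, security_level=0, leakproof=False),
]
print(order_quals(quals))  # ['cheap_proof', 'rls_qual', 'leaky_filter']
```

Note how `cheap_proof` jumps ahead of the security barrier qual despite originating at level 1 -- exactly the behaviour Tom's later example shows cannot always be reconciled with a transitive comparison, which is why the thread goes on to discuss the limits of this scheme.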
Re: [HACKERS] multivariate statistics (v19)
On 4 October 2016 at 09:15, Heikki Linnakangas wrote:
> However, for tables and views, each object you store in those views is a
> "table" or "view", but with this thing, the object you store is
> "statistics". Would you have a catalog table called "pg_scissor"?
>

No, probably not (unless it was storing individual scissor blades).
However, in this case, we have related pre-existing catalog tables, so...

> We call the current system table "pg_statistic", though. I agree we should
> call it pg_mv_statistic, in singular, to follow the example of pg_statistic.
>
> Of course, the user-friendly system view on top of that is called
> "pg_stats", just to confuse things more :-).
>

I agree. Given where we are, with a pg_statistic table and a pg_stats
view, I think the least worst solution is to have a pg_statistic_ext
table, and then maybe a pg_stats_ext view.

>> It doesn't seem like the end of the world that it doesn't
>> match the user-facing syntax. A bigger concern is the use of "mv" in
>> the name, because as has already been pointed out, this table may also
>> in the future be used to store univariate expression and partial
>> statistics, so I think we should drop the "mv" and go with something
>> like pg_statistic_ext, or some other more general name.
>
> Also, "mv" makes me think of materialized views, which is completely
> unrelated to this.
>

Yeah, I hadn't thought of that.

Regards,
Dean
Re: [HACKERS] multivariate statistics (v19)
On 30 September 2016 at 12:10, Heikki Linnakangas wrote:
> I fear that using "statistics" as the name of the new object might get a bit
> awkward. "statistics" is a plural, but we use it as the name of a single
> object, like "pants" or "scissors". Not sure I have any better ideas though.
> "estimator"? "statistics collection"? Or perhaps it should be singular,
> "statistic". I note that you actually called the system table
> "pg_mv_statistic", in singular.
>

I think it's OK. The functional dependency is a single statistic, but
MCV lists and histograms are multiple statistics (multiple facts about
the data sampled), so in general when you create one of these new
objects, you are creating multiple statistics about the data. Also I
find "CREATE STATISTIC" just sounds a bit clumsy compared to "CREATE
STATISTICS".

The convention for naming system catalogs seems to be to use the
singular for tables and plural for views, so I guess we should stick
with that. It doesn't seem like the end of the world that it doesn't
match the user-facing syntax. A bigger concern is the use of "mv" in
the name, because as has already been pointed out, this table may also
in the future be used to store univariate expression and partial
statistics, so I think we should drop the "mv" and go with something
like pg_statistic_ext, or some other more general name.

Regards,
Dean
Re: [HACKERS] multivariate statistics (v19)
On 4 October 2016 at 04:25, Michael Paquier wrote:
> OK. A second thing was related to the use of schemas in the new system
> catalogs. As mentioned in [1], those could be removed.
> [1]:
> https://www.postgresql.org/message-id/cab7npqtu40q5_nsghvomjfbyh1hdtqmbfdj+kwfjspam35b...@mail.gmail.com
>

That doesn't work, because if the intention is to be able to one day
support statistics across multiple tables, you can't assume that the
statistics are in the same schema as the table.

In fact, if multi-table statistics are to be allowed in the future, I
think you want to move away from thinking of statistics as depending
on and referring to a single table, and handle them more like views --
i.e., store a pg_node_tree representing the from_clause and add
multiple dependencies at statistics creation time. That was what I was
getting at upthread when I suggested the alternate syntax, and it also
answers Tomas' question about how JOIN might one day be supported.

Of course, if we don't think that we will ever support multi-table
statistics, that all goes away, and you may as well make the
statistics name local to the table, but I think that's a bit limiting.
One way or the other, I think this is a question that needs to be
answered now. My vote is to leave expansion room to support
multi-table statistics in the future.

Regards,
Dean
Re: [HACKERS] multivariate statistics (v19)
On 3 August 2016 at 02:58, Tomas Vondra wrote:
> Attached is v19 of the "multivariate stats" patch series

Hi,

I started looking at this -- just at a very high level -- I've not
read much of the detail yet, but here are some initial review
comments.

I think the overall infrastructure approach for CREATE STATISTICS
makes sense, and I agree with other suggestions upthread that it would
be useful to be able to build statistics on arbitrary expressions.
Although that doesn't need to be part of this patch, it's useful to
keep it in mind as a possible future extension of this initial design.

I can imagine it being useful to be able to create user-defined
statistics on an arbitrary list of expressions, and I think that would
include univariate as well as multivariate statistics. Perhaps that's
something to take into account in the naming of things, e.g., as David
Rowley suggested, something like pg_statistic_ext, rather than
pg_mv_statistic.

I also like the idea that this might one day be extended to support
statistics across multiple tables, although I think that might be
challenging to achieve -- you'd need a method of taking a random
sample of rows from a join between 2 or more tables. However, if the
intention is to be able to support that one day, I think that needs to
be accounted for in the syntax now -- specifically, I think it will be
too limiting to only support things extending the current syntax of
the form table1(col1, col2, ...), table2(col1, col2, ...), because
that precludes building statistics on an expression referring to
columns from more than one table.

So I think we should plan further ahead and use a syntax giving
greater flexibility in the future, for example something structured
more like a query (like CREATE VIEW):

CREATE STATISTICS name
  [ WITH (options) ]
  ON expression [, ...]
  FROM table [, ...]
  WHERE condition

where the first version of the patch would only support expressions
that are simple column references, and would require at least 2 such
columns from a single table with no WHERE clause, i.e.:

CREATE STATISTICS name
  [ WITH (options) ]
  ON column1, column2 [, ...]
  FROM table

For multi-table statistics, a WHERE clause would typically be needed
to specify how the tables are expected to be joined, but potentially
such a clause might also be useful in single-table statistics, to
build partial statistics on a commonly queried subset of the table,
just like a partial index.

Of course, I'm not suggesting that the current patch do any of that --
it's big enough as it is. I'm just throwing out possible future
directions this might go in, so that we don't get painted into a
corner when designing the syntax for the current patch.

Regarding the statistics themselves, I read the description of soft
functional dependencies, and I'm somewhat skeptical about that
algorithm. I don't like the arbitrary thresholds or the sudden jump
from independence to dependence and clause reduction. As others have
said, I think this should account for a continuous spectrum of
dependence from fully independent to fully dependent, and combine
clause selectivities in a way based on the degree of dependence. For
example, if you computed an estimate for the fraction 'f' of the
table's rows for which a -> b, then it might be reasonable to combine
the selectivities using

P(a,b) = P(a) * (f + (1-f) * P(b))

Of course, having just a single number that tells you the columns are
correlated tells you nothing about whether the clauses on those
columns are consistent with that correlation. For example, in the
following table

CREATE TABLE t(a int, b int);
INSERT INTO t SELECT x/10, ((x/10)*789)%100 FROM generate_series(0,999) g(x);

'b' is functionally dependent on 'a' (and vice versa), but if you
query the rows with a<50 and with b<50, those clauses behave
essentially independently, because they're not consistent with the
functional dependence between 'a' and 'b', so the best way to combine
their selectivities is just to multiply them, as we currently do.

So whilst it may be interesting to determine that 'b' is functionally
dependent on 'a', it's not obvious whether that fact by itself should
be used in the selectivity estimates. Perhaps it should, on the
grounds that it's best to attempt to use all the available
information, but only if there are no more detailed statistics
available. In any case, knowing that there is a correlation can be
used as an indicator that it may be worthwhile to build more detailed
multivariate statistics like a MCV list or a histogram on those
columns.

Looking at the ndistinct coefficient 'q', I think it would be better
if the recorded statistic were just the estimate for
ndistinct(a,b,...) rather than a ratio of ndistinct values. That's a
more fundamental statistic, and it's easier to document and easier to
interpret. Also, I don't believe that the coefficient 'q' is the right
number to use for clause estimation:

Looking at
Re: [HACKERS] CREATE POLICY bug ?
[Please reply to the list, not just to me, so that others can benefit from and contribute to the discussion]

On 31 August 2016 at 11:52, Andrea Adami wrote:
> Thnaks Dean, i did further investigations:
> i set the owner of the view to: "mana...@scuola247.it" with:
> ALTER TABLE public.policy_view OWNER TO "mana...@scuola247.it";
> and i thinking to see from the select:
> select * from policy_view
> the rows: 1,2,3
> then
> set role 'mana...@scuola247.it';
> select * from policy_view;
> return rows 1,2,3 as expected but:
> set role 'teac...@scuola247.it';
> select * from policy_view;
> returns rows 4,5 and
> set role 'postgres'
> select * from policy_view
> return nothing ...
> what you thinking about ?
>
> Andrea

That's correct. With the table owned by postgres and the view owned by "mana...@scuola247.it", access to the table via the view is subject to the policies that apply to "mana...@scuola247.it". So regardless of who the current user is, when selecting from the view, the policy "standard" will be applied, and that will limit the visible rows to those for which usr = current_user.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS related docs
On 28 August 2016 at 21:23, Joe Conway wrote:
> Apologies for the delay, but new patch attached. Assuming no more
> comments, will commit this, backpatched to 9.5, in a day or two.
>

Looking at this again, I think there is something fishy about these dump/restore flags. If you do pg_dump --enable-row-security, then row_security is turned on during the dump and only the user-visible portions of the tables are dumped. But why does such a dump emit "SET row_security = on;" as part of the dump? There doesn't appear to be any reason for having row_security turned on during the restore just because it was on during the dump. The INSERT policies may well be different from the SELECT policies, and so this may lead to a dump that cannot be restored. ISTM that row_security should be off inside the dump, and only enabled during restore if the user explicitly asks for it, regardless of what setting was used to produce the dump.

Also, isn't it the case that --enable-row-security during pg_restore is only relevant when performing a data-only restore (like --disable-triggers)? Otherwise, it looks to me as though the restore will create the tables, restore the data, and then only at the end restore the table policies and enable row level security on the tables. So it looks like the flag would have no effect (and a COPY-format dump would work fine) for a non-data-only dump. I never really looked at the RLS dump/restore code. Am I missing something?

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CREATE POLICY bug ?
On 20 August 2016 at 03:15, Andrea Adami wrote:
> when i run the query: "select * from public.policy_view"
> the ouput is the same (all rows) for all users
> i'm doing some mistakes or this is a bug ?
>

No, it looks correct to me. When going through a view, the policies and permission checks that apply are those that would apply to the view's owner, which in this case is postgres, so no policies are applied. Or, quoting from the notes in the CREATE POLICY documentation:

As with normal queries and views, permission checks and policies for the tables which are referenced by a view will use the view owner's rights and any policies which apply to the view owner.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bogus ANALYZE results for an otherwise-unique column with many nulls
On 5 August 2016 at 21:48, Tom Lane wrote:
> OK, thanks. What shall we do about Andreas' request to back-patch this?
> I'm personally willing to do it, but there is the old bugaboo of "maybe
> it will destabilize a plan that someone is happy with".
>

My inclination would be to back-patch it because arguably it's a bug-fix -- at the very least the old behaviour didn't match the docs for stadistinct:

The number of distinct nonnull data values in the column. A value greater than zero is the actual number of distinct values. A value less than zero is the negative of a multiplier for the number of rows in the table; for example, a column in which values appear about twice on the average could be represented by stadistinct = -0.5.

Additionally, I think that example is misleading because it's only really true if there are no null values in the column. Perhaps it would help to have a more explicit example to illustrate how nulls affect stadistinct, for example:

... for example, a column in which about 80% of the values are nonnull and each nonnull value appears about twice on average could be represented by stadistinct = -0.4.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
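The arithmetic behind the proposed -0.4 wording can be sketched as follows (a hypothetical helper for illustration, not PostgreSQL code):

```python
def scaled_stadistinct(n_rows, null_frac, avg_dups):
    """Return the negative (scaled) stadistinct for a column where a fraction
    null_frac of the rows are NULL and each nonnull value appears avg_dups
    times on average: distinct = nonnull_rows / avg_dups, and a negative
    stadistinct stores -(distinct / total_rows)."""
    nonnull_rows = n_rows * (1.0 - null_frac)
    n_distinct = nonnull_rows / avg_dups
    return -n_distinct / n_rows

# No nulls, each value appearing about twice: the existing docs' -0.5 example.
print(scaled_stadistinct(1000, 0.0, 2.0))   # -0.5
# 80% nonnull, each nonnull value appearing about twice: -0.4, as proposed.
print(scaled_stadistinct(1000, 0.2, 2.0))
```

This makes explicit why the docs' -0.5 example silently assumes no nulls: the multiplier is scaled by the total row count, nulls included.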
Re: [HACKERS] Combining hash values
On 1 August 2016 at 08:19, Greg Stark wrote:
> Surely combining multiple hashes is the same problem as hashing a block of
> memory? Shouldn't we just use the same algorithm as hash_any()?
>

Yes, I imagine that should work well. I suspect that Thomas's proposal would also probably work well, as would a number of other hashing algorithms with reasonable pedigree, such as the one used for array hashing. I don't have any particular preference, but I do know that what usually turns out to be disastrous is an arbitrary made-up formula like rotating and xor'ing. The last thing we should attempt to do is invent our own hashing algorithms.

On that subject, while looking at hashfunc.c, I spotted that hashint8() has a very obvious deficiency, which causes disastrous performance with certain inputs:

create table foo (a bigint);
insert into foo select (x*2^32)+x from generate_series(1,100) g(x);
analyse foo;
select * from foo f1, foo f2 where f1.a=f2.a;

With random values that query (using a hash join) takes around 2 seconds on my machine, but with the values above I estimate it will take 20 hours (although I didn't wait that long!). The reason is pretty obvious -- all the bigint values above hash to the same value. I'd suggest using hash_uint32() for values that fit in a 32-bit integer and hash_any() otherwise.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
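The collision is visible without running PostgreSQL at all: hashint8() folds the two 32-bit halves of the value together with XOR before hashing, so every value of the form (x*2^32)+x folds to the same input. A Python sketch of just that folding step (the subsequent hash_uint32() call is omitted; the function name is mine):

```python
def hashint8_fold(val):
    """Sketch of the folding step in hashfunc.c's hashint8(): XOR the low
    32-bit half with the high half (complemented for negative values) before
    the result is fed to hash_uint32()."""
    lo = val & 0xFFFFFFFF
    hi = (val >> 32) & 0xFFFFFFFF
    return lo ^ (hi if val >= 0 else ~hi & 0xFFFFFFFF)

# Mirroring the SELECT above: every (x*2^32)+x value folds to 0, so they all
# hash identically and the hash join degenerates to quadratic behaviour.
folded = {hashint8_fold((x << 32) + x) for x in range(1, 101)}
print(folded)   # {0}
```

With all keys in one hash bucket, the join does O(n^2) comparisons, which is where the pathological runtime comes from.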
Re: [HACKERS] Optimizing numeric SUM() aggregate
On 27 July 2016 at 10:17, Andrew Borodin wrote:
>> if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
> Worth noting that (INT_MAX - INT_MAX / NBASE) / (NBASE - 1) == INT_MAX
> / NBASE for any NBASE > 1

Interesting. I think it's clearer the way it is in mul_var() though, because the intention is to avoid letting digits exceed INT_MAX - INT_MAX/NBASE, so that there is no danger of overflow in the carry propagation step. The long form makes that clearer (and is just as efficient, since these expressions are constants that will be folded at compile time).

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
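The quoted identity is easy to sanity-check at PostgreSQL's actual base, NBASE = 10000, with a 32-bit INT_MAX (plain Python integer division standing in for C's truncating division):

```python
INT_MAX = 2**31 - 1
NBASE = 10000          # PostgreSQL's numeric base

# The long form used in mul_var()'s carry-propagation guard ...
long_form = (INT_MAX - INT_MAX // NBASE) // (NBASE - 1)
# ... and the shortened form from the quoted observation.
short_form = INT_MAX // NBASE

print(long_form, short_form)   # 214748 214748
```

So roughly 214748 digit additions can be accumulated between carry-propagation passes at this base.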
Re: [HACKERS] Optimizing numeric SUM() aggregate
On 27 July 2016 at 09:57, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
> it could be
> coded using something like
>
> accum->maxdig += NBASE - 1;
> if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
> Propagate carries...
>

Oops, that's not quite right because maxdig actually represents the max possible value divided by NBASE-1 in mul_var(), so actually it ought to have been accum->maxdig++ in the first line.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing numeric SUM() aggregate
On 27 July 2016 at 07:33, Andrew Borodin wrote:
>> I think we could do carry every 0x7FFFFFFF / 10000 accumulation, couldn't we?
>
> I feel that I have to elaborate a bit. Probably my calculations are wrong.
>
> Lets assume we already have accumulated INT_MAX of digits in
> previous-place accumulator. That's almost overflow, but that's not
> overflow. Carring that accumulator to currents gives us INT_MAX/10000
> carried sum.
> So in current-place accumulator we can accumulate: ( INT_MAX - INT_MAX
> / 10000 ) / 9999, where 9999 is max value dropped in current-place
> accumulator on each addition.
> That is INT_MAX * (9999/10000) / 9999, or simply INT_MAX / 10000.
>
> If we use unsigned 32-bit integer that is 429496. Which is 43 times
> less frequent carring.
>
> As a bonus, we get rid of the const in the code (:
>
> Please correct me if I'm wrong.
>

This is basically the same problem that has already been solved in mul_var() and other places in numeric.c, so in this case it could be coded using something like

accum->maxdig += NBASE - 1;
if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
    Propagate carries...

I agree that the new code should avoid explicitly referring to constants like 10000, and I don't think there is any reason for this new code to assume NBASE=10000.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
On 30 May 2016 at 15:44, Andrew Gierth <and...@tao11.riddles.org.uk> wrote: >>>>>> "Dean" == Dean Rasheed <dean.a.rash...@gmail.com> writes: > > Dean> That may be so, but we already support FILTER for all windows > Dean> functions as well as aggregates: > > Not so: > > "If FILTER is specified, then only the input rows for which the > filter_clause evaluates to true are fed to the window function; other > rows are discarded. Only window functions that are aggregates accept a > FILTER clause." > Ah, yes. It's all coming back to me now. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
On 23 May 2016 at 17:01, Jeff Davis wrote:
> On Fri, May 20, 2016 at 1:41 PM, David G. Johnston wrote:
>> How does the relatively new FILTER clause play into this, if at all?
>
> My interpretation of the standard is that FILTER is not allowable for
> a window function, and IGNORE|RESPECT NULLS is not allowable for an
> ordinary aggregate.
>

That may be so, but we already support FILTER for all window functions as well as aggregates:

https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES
https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

so to be clear, what we're talking about here is just supporting SQL standard syntax for window functions, rather than adding any new functionality, right?

> So if we support IGNORE|RESPECT NULLS for anything other than a window
> function, we have to come up with our own semantics.
>

Given that we can already do this using FILTER for aggregates, and that IGNORE|RESPECT NULLS for aggregates is not part of the SQL standard, I see no reason to support it.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RLS related docs
On 25 May 2016 at 02:04, Joe Conway wrote:
> Please see attached two proposed patches for the docs related to RLS:
>
> 1) Correction to pg_restore
> 2) Additional mentions that "COPY FROM" does not allow RLS to be enabled
>
> Comments?
>

The pg_restore change looks good -- that was clearly wrong. Also, +1 for the new note in pg_dump.

For COPY, I think perhaps it would be more logical to put the new note immediately after the third note which describes the privileges required, since it's kind of related, and then we can talk about the RLS policies required, e.g.:

If row-level security is enabled for the table, COPY table TO is internally converted to COPY (SELECT * FROM table) TO, and the relevant security policies are applied. Currently, COPY FROM is not supported for tables with row-level security.

> Related question: I believe
>
> COPY tbl TO ...
>
> is internally converted to
>
> COPY (select * FROM tbl) TO ...
>
> when RLS is involved. Do we want to document that?
>

I think so, yes, because that makes it clearer what policies will be applied.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
On 4 May 2016 at 13:23, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> On 5/4/16 3:21 AM, Dean Rasheed wrote:
>> Well, appendStringLiteralAH() takes both, so that sets a precedent.
> Works for me. Could you supply an updated patch with a static function
> instead of a macro? Then I think this is good to go.
>
> (bonus points for some tests)
>

OK, pushed that way. I didn't get round to adding any tests though. It strikes me that the most likely bugs in this area are bugs of omission, like this and the missing PARALLEL SAFE recently fixed for functions. Ideally tests would be able to spot those kinds of issues, but it's not obvious how to write such tests.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More inaccurate results from numeric pow()
On 2 May 2016 at 18:38, Tom Lane wrote:
> I don't much care for the hardwired magic number here, especially since
> exp_var() does not have its limit expressed as "6000" but as
> "NUMERIC_MAX_RESULT_SCALE * 3". I think you should rephrase the limit
> to use that expression, and also add something like this in exp_var():

OK, that makes sense. Done that way.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
On 4 May 2016 at 01:35, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 5/3/16 3:10 PM, Dean Rasheed wrote: >> On 3 May 2016 at 16:52, Peter Eisentraut >>> >>> I would change appendReloptionsArrayAH() to a function and keep AH as the >>> first argument (similar to other functions that take such a handle). >> >> I can understand changing it to a function, but I don't think AH >> should be the first argument. All other append*() functions that >> append to a buffer have the buffer as the first argument, including >> the appendStringLiteralAH() macro on which this is based. > > Well, all the functions that take archive handles have that as the first > argument, so how do we consolidate that? > Well, appendStringLiteralAH() takes both, so that sets a precedent. And I think that makes sense too. The functions that take an archive handle as their first argument are mostly functions whose primary concern is to operate on the archive in some way. All the append*() functions that take a buffer as their first argument are primarily concerned with operating on the buffer. I'd say appendStringLiteralAH() and appendReloptionsArrayAH() fall very much into that second category. They only take an archive handle to get the encoding and std_strings settings controlling *how* they operate on the buffer. The main purpose of those append*() functions is to append to a buffer, so it makes sense that that is their first argument. All the append*() functions are consistent in their argument ordering, including those that also take an archive handle, so I think appendReloptionsArrayAH() should follow that pattern. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
On 3 May 2016 at 16:52, Peter Eisentraut wrote:
> I would change appendReloptionsArrayAH() to a function and keep AH as the
> first argument (similar to other functions that take such a handle).

I can understand changing it to a function, but I don't think AH should be the first argument. All other append*() functions that append to a buffer have the buffer as the first argument, including the appendStringLiteralAH() macro on which this is based.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More inaccurate results from numeric pow()
On 2 May 2016 at 19:40, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, May 2, 2016 at 1:02 PM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> Doing some more testing of the numeric code patched in [1] I noticed
>> another case where the result is inaccurate -- computing 0.12 ^
>> -2345.6 gives a very large number containing 2162 digits, but only the
>> first 2006 correct, while the last 156 digits are wrong.
>
> Just out of curiosity, how can you tell? Where do you get alternate
> output to compare against?
>

The easiest way is to use bc, although it's pretty slow for this kind of thing.

> Also, I wonder what we think the contract with the user is in cases
> like this.

My expectation is that the numeric computations should generally produce results that are correct in all, or nearly all, digits returned. This is commonly expressed in terms of ULPs -- https://en.wikipedia.org/wiki/Unit_in_the_last_place. An error of a few ULPs may be OK, but I don't think hundreds of ULPs is OK.

Actually I think this particular issue is mostly of academic interest, but we went to some effort to get accurate results in the previous patch, and this is just closing a loophole to hopefully complete that work.

> Surely, if we were dealing with floating point numbers, nobody would
> expect a calculation like this to be accurate beyond the first n digits,
> where n is surely much less than 2006. I like the fact that numeric has
> a lot more precision than any built-in floating point type, but does it
> have to get every digit in front of the decimal point exactly right no
> matter how many there are?
>

I would say it should come close. Otherwise we're just returning a lot of noise. Note that there is a limit to how many digits we will ever return, so this is manageable.
> rhaas=# select tan(pi()::numeric/2), tan(pi()/2);
>        tan       |         tan
> ----------------+----------------------
>  618986325617924 | 1.63312393531954e+16
> (1 row)
>

That doesn't prove anything, since we haven't implemented tan(numeric) (and I don't plan to), so the above is actually

select tan((pi()::numeric/2)::float8), tan(pi()/2);

and it's not surprising that the 2 results are wildly different (and infinitely far away from the correct answer). It's using tan(float8) in both cases, just with slightly different float8 inputs. There's not really any reason to expect particularly accurate results from float8 functions in cases like this.

Regards,
Dean

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
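The float8 side of this is reproducible in any IEEE-double environment: the argument passed to tan() is the double closest to pi/2, not pi/2 itself, so the result is a huge finite number, and nudging the input by a single ulp changes the answer completely, even flipping its sign (Python doubles behave like float8 here; math.nextafter needs Python 3.9+):

```python
import math

# math.pi/2 lands slightly below the true pi/2, so tan() is huge and positive.
t1 = math.tan(math.pi / 2)

# One ulp later the argument lands on the other side of pi/2, and tan()
# swings to a huge negative value -- neither figure carries real information.
t2 = math.tan(math.nextafter(math.pi, 4.0) / 2)

print(t1, t2)
```

This is why comparing the two psql outputs above tells you nothing about either implementation's accuracy.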
[HACKERS] More inaccurate results from numeric pow()
Doing some more testing of the numeric code patched in [1] I noticed another case where the result is inaccurate -- computing 0.12 ^ -2345.6 gives a very large number containing 2162 digits, but only the first 2006 are correct, while the last 156 digits are wrong.

The reason is this code in power_var():

/* limit to something that won't cause integer overflow */
val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
val = Min(val, NUMERIC_MAX_RESULT_SCALE);

where "val" is the approximate decimal result weight. Here NUMERIC_MAX_RESULT_SCALE is 2000, so it's clamping the estimated result weight to 2000, and therefore reducing the rscale in the subsequent calculations, causing the loss of precision at around 2000 digits.

In fact it's possible to predict exactly how large we need to allow "val" to become, since the final result is computed using exp_var(), which accepts inputs up to 6000, so the result weight "val" can be up to around log10(exp(6000)) ~= 2606 before the final result causes an overflow.

The obvious fix would be to modify the clamping limits. I think a better answer though is to replace the clamping code with an overflow test, immediately throwing an error if "val" is outside the allowed range, per the attached patch. This has the advantage that it avoids some expensive computations in the case where the result will end up overflowing, but more importantly it means that power_var() isn't so critically dependent on the limits of exp_var() -- if someone in the future increased the limits of exp_var() without touching power_var(), and power_var() clamped to the old range, the problem would resurface. By doing an overflow test in power_var() instead of clamping "val", it will either compute an accurate result or throw an overflow error early on. There should be no possibility of it returning an inaccurate result.
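For reference, the 2606 figure falls straight out of that bound, since the result weight is the base-10 logarithm of the largest value exp_var() can produce:

```python
import math

# exp_var() accepts inputs up to 6000, so the largest decimal result weight
# power_var() can produce without overflowing is log10(exp(6000)).
max_weight = 6000 / math.log(10)
print(max_weight)   # ~2605.77
```

Any guard placed a little above this value rejects only results that were certain to overflow anyway.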
Regards,
Dean

[1] http://www.postgresql.org/message-id/CAEZATCV7w+8iB=07dj8q0zihxqt1semcqutek+4_rogc_zq...@mail.gmail.com

diff --git a/src/backend/utils/adt/numeric.c b/src/backend/utils/adt/numeric.c
index 3ba373a..b4a2829 100644
--- a/src/backend/utils/adt/numeric.c
+++ b/src/backend/utils/adt/numeric.c
@@ -8008,9 +8008,16 @@ power_var(NumericVar *base, NumericVar *
 	val *= 0.434294481903252;	/* approximate decimal result weight */

-	/* limit to something that won't cause integer overflow */
-	val = Max(val, -NUMERIC_MAX_RESULT_SCALE);
-	val = Min(val, NUMERIC_MAX_RESULT_SCALE);
+	/*
+	 * Apply a crude overflow test so we can exit early if the result is sure
+	 * to overflow.  exp_var() supports inputs up to 6000, so we know that the
+	 * result will overflow if the approximate decimal result weight exceeds
+	 * log10(exp(6000)) ~= 2610.
+	 */
+	if (Abs(val) > 2610)
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("value overflows numeric format")));

 	/* choose the result scale */
 	rscale = NUMERIC_MIN_SIG_DIGITS - (int) val;
diff --git a/src/test/regress/expected/numeric_big.out b/src/test/regress/expected/numeric_big.out
index 1b3608a..8bb09c9 100644
--- a/src/test/regress/expected/numeric_big.out
+++ b/src/test/regress/expected/numeric_big.out
@@ -1075,6 +1075,20 @@ SELECT b, p, bc_result, b^p AS power, b^
  27.234987 | 20.230957 | 108142427112079606637962972621.121293 | 108142427112079606637962972621.121293 | 0.00
 (41 rows)

+-- Inputs close to overflow
+--
+-- bc(1) results computed with a scale of 2700 and truncated to 4 decimal
+-- places.
+WITH t(b, p, bc_result) AS (VALUES
+(0.12, -2829.8369,
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
On 27 April 2016 at 08:36, Dean Rasheed <dean.a.rash...@gmail.com> wrote: > Here is a rough patch based on the way pg_dump handles this. It still > needs a bit of polishing -- in particular I think fmtReloptionsArray() > (copied from pg_dump) should probably be moved to string_utils.c so > that it can be shared between pg_dump and psql. Also, I'm not sure > that's the best name for it -- I think appendReloptionsArray() is a > more accurate description of what is does. > Here are updated patches doing that --- the first moves fmtReloptionsArray() from pg_dump.c to fe_utils/string_utils.c so that it can be shared by pg_dump and psql, and renames it to appendReloptionsArray(). The second patch fixes the actual psql bug. Regards, Dean From a01c897bf2945f8ebef39bf3c58bb14e1739abf9 Mon Sep 17 00:00:00 2001 From: Dean Rasheed <dean.a.rash...@gmail.com> Date: Mon, 2 May 2016 11:27:15 +0100 Subject: [PATCH 1/2] Move and rename fmtReloptionsArray(). Move fmtReloptionsArray() from pg_dump.c to string_utils.c so that it is available to other frontend code. In particular psql's \ev and \sv commands need it to handle view reloptions. Also rename the function to appendReloptionsArray(), which is a more accurate description of what it does. --- src/bin/pg_dump/pg_backup.h | 7 src/bin/pg_dump/pg_dump.c | 80 +++-- src/fe_utils/string_utils.c | 72 + src/include/fe_utils/string_utils.h | 3 ++ 4 files changed, 88 insertions(+), 74 deletions(-) diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h index 83f6029..6b162a4 100644 --- a/src/bin/pg_dump/pg_backup.h +++ b/src/bin/pg_dump/pg_backup.h @@ -288,4 +288,11 @@ extern int archprintf(Archive *AH, const char *fmt,...) 
pg_attribute_printf(2, 3 #define appendStringLiteralAH(buf,str,AH) \ appendStringLiteral(buf, str, (AH)->encoding, (AH)->std_strings) +#define appendReloptionsArrayAH(buf,reloptions,prefix,AH) \ + do { \ + if (!appendReloptionsArray(buf, reloptions, prefix, \ + (AH)->encoding, (AH)->std_strings)) \ + write_msg(NULL, "WARNING: could not parse reloptions array\n"); \ + } while (0) + #endif /* PG_BACKUP_H */ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index d3f5157..3005bf5 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -261,8 +261,6 @@ static void binary_upgrade_extension_member(PQExpBuffer upgrade_buffer, static const char *getAttrName(int attrnum, TableInfo *tblInfo); static const char *fmtCopyColumnList(const TableInfo *ti, PQExpBuffer buffer); static bool nonemptyReloptions(const char *reloptions); -static void fmtReloptionsArray(Archive *fout, PQExpBuffer buffer, - const char *reloptions, const char *prefix); static char *get_synchronized_snapshot(Archive *fout); static PGresult *ExecuteSqlQueryForSingleRow(Archive *fout, char *query); static void setupDumpWorker(Archive *AHX); @@ -15046,7 +15044,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) if (nonemptyReloptions(tbinfo->reloptions)) { appendPQExpBufferStr(q, " WITH ("); - fmtReloptionsArray(fout, q, tbinfo->reloptions, ""); + appendReloptionsArrayAH(q, tbinfo->reloptions, "", fout); appendPQExpBufferChar(q, ')'); } result = createViewAsClause(fout, tbinfo); @@ -15301,13 +15299,14 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo) if (nonemptyReloptions(tbinfo->reloptions)) { addcomma = true; -fmtReloptionsArray(fout, q, tbinfo->reloptions, ""); +appendReloptionsArrayAH(q, tbinfo->reloptions, "", fout); } if (nonemptyReloptions(tbinfo->toast_reloptions)) { if (addcomma) appendPQExpBufferStr(q, ", "); -fmtReloptionsArray(fout, q, tbinfo->toast_reloptions, "toast."); +appendReloptionsArrayAH(q, tbinfo->toast_reloptions, "toast.", + fout); } 
appendPQExpBufferChar(q, ')'); } @@ -15908,7 +15907,7 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo) if (nonemptyReloptions(indxinfo->indreloptions)) { appendPQExpBufferStr(q, " WITH ("); -fmtReloptionsArray(fout, q, indxinfo->indreloptions, ""); +appendReloptionsArrayAH(q, indxinfo->indreloptions, "", fout); appendPQExpBufferChar(q, ')'); } @@ -16809,7 +16808,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo) { appendPQExpBuffer(cmd, "ALTER VIEW %s SET (", fmtId(tbinfo->dobj.name)); - fmtReloptionsArray(fout, cmd, rinfo->reloptions, ""); + appendReloptionsArrayAH(cmd, rinfo->reloptions, "", fout); appendPQExpBufferStr(cmd, ");\n"); } @@ -17704,73 +17703,6 @@ nonemptyReloptions(const char *reloptions) } /* - * Format a reloptions array and append it to the
Re: [HACKERS] psql :: support for \ev viewname and \sv viewname
[Looking back over old threads] On 22 July 2015 at 22:00, Dean Rasheed <dean.a.rash...@gmail.com> wrote: > This appears to be missing support for view options (WITH CHECK OPTION > and security_barrier), so editing a view with either of those options > will cause them to be stripped off. It seems like this issue was never addressed, and it needs to be fixed for 9.6. Here is a rough patch based on the way pg_dump handles this. It still needs a bit of polishing -- in particular I think fmtReloptionsArray() (copied from pg_dump) should probably be moved to string_utils.c so that it can be shared between pg_dump and psql. Also, I'm not sure that's the best name for it -- I think appendReloptionsArray() is a more accurate description of what is does. Regards, Dean diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c new file mode 100644 index 4fa7760..96bc64d --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -72,6 +72,7 @@ static bool lookup_object_oid(EditableOb Oid *obj_oid); static bool get_create_object_cmd(EditableObjectType obj_type, Oid oid, PQExpBuffer buf); +static bool fmtReloptionsArray(PQExpBuffer buffer, const char *reloptions); static int strip_lineno_from_objdesc(char *obj); static int count_lines_in_buf(PQExpBuffer buf); static void print_with_linenumbers(FILE *output, char *lines, @@ -3274,12 +3275,51 @@ get_create_object_cmd(EditableObjectType * CREATE for ourselves. We must fully qualify the view name to * ensure the right view gets replaced. Also, check relation kind * to be sure it's a view. + * + * Starting with 9.2, views may have reloptions (security_barrier) + * and from 9.4 onwards they may also have WITH [LOCAL|CASCADED] + * CHECK OPTION. These are not part of the view definition + * returned by pg_get_viewdef() and so need to be retrieved + * separately. Materialized views (introduced in 9.3) may have + * arbitrary storage parameter reloptions. 
*/ - printfPQExpBuffer(query, - "SELECT nspname, relname, relkind, pg_catalog.pg_get_viewdef(c.oid, true) FROM " - "pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n " - "ON c.relnamespace = n.oid WHERE c.oid = %u", - oid); + if (pset.sversion >= 90400) + { +printfPQExpBuffer(query, + "SELECT nspname, relname, relkind, " + "pg_catalog.pg_get_viewdef(c.oid, true), " + "array_remove(array_remove(c.reloptions,'check_option=local'),'check_option=cascaded') AS reloptions, " + "CASE WHEN 'check_option=local' = ANY (c.reloptions) THEN 'LOCAL'::text " + "WHEN 'check_option=cascaded' = ANY (c.reloptions) THEN 'CASCADED'::text ELSE NULL END AS checkoption " + "FROM pg_catalog.pg_class c " + "LEFT JOIN pg_catalog.pg_namespace n " +"ON c.relnamespace = n.oid WHERE c.oid = %u", + oid); + } + else if (pset.sversion >= 90200) + { +printfPQExpBuffer(query, + "SELECT nspname, relname, relkind, " + "pg_catalog.pg_get_viewdef(c.oid, true), " + "c.reloptions AS reloptions, " + "NULL AS checkoption " + "FROM pg_catalog.pg_class c " + "LEFT JOIN pg_catalog.pg_namespace n " +"ON c.relnamespace = n.oid WHERE c.oid = %u", + oid); + } + else + { +printfPQExpBuffer(query, + "SELECT nspname, relname, relkind, " + "pg_catalog.pg_get_viewdef(c.oid, true), " + "NULL AS reloptions, " + "NULL AS checkoption " + "FROM pg_catalog.pg_class c " + "LEFT JOIN pg_catalog.pg_namespace n " +"ON c.relnamespace = n.oid WHERE c.oid = %u", + oid); + } break; } @@ -3304,6 +3344,8 @@ get_create_object_cmd(EditableObjectType char *relname = PQgetvalue(res, 0, 1); char *relkind = PQgetvalue(res, 0, 2); char *viewdef = PQgetvalue(res, 0, 3); + char *reloptions = PQgetvalue(res, 0, 4); + char *checkoption = PQgetvalue(res, 0, 5); /* * If the backend ever supports CREATE OR REPLACE @@ -3328,11 +3370,30 @@ get_create_object_cmd(EditableObjectType break; } appendPQExpBuffer(buf, "%s.", fmtId(nspname)); - appendPQExpBuffer(buf, "%s AS\n", fmtId(relname)); - appendPQExpBufferStr(buf, viewdef); + 
appendPQExpBufferStr(buf, fmtId(relname)); + + /* reloptions, if not an empty array "{}" */ + if (reloptions != NULL && strlen(reloptions) > 2) + { + appendPQExpBufferStr(buf, "\n WITH ("); + if (!fmtReloptionsArray(buf, reloptions)) + { + psql_error("Could not parse relopti
Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On 26 April 2016 at 16:26, Tom Lane wrote: > The next time somebody proposes that we can get exact results out of > floating-point arithmetic, I'm going to run away screaming. > Yeah, I think I will have the same reaction. Thanks for all your hard work getting this to work. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
On 26 April 2016 at 04:48, Andres Freund wrote: > No, I think we got to do this in all branches. I was just wondering > about how to fix vm_extend(). Which I do think we got to fix, even in > the back-branches. > I think replacing CacheInvalidateSmgr() with CacheInvalidateRelcache() in vm_extend() is probably the safer thing to do, and ought to be relatively harmless. It means that an index-only scan won't be notified of VM extension until the end of the other transaction, which might lead to extra heap fetches, but I think that's unlikely to have any performance impact because it ought to be a fairly rare event, and if it was another transaction adding tuples, they wouldn't be all visible before it was committed anyway, so the extra heap fetches would be required in any case. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On 26 April 2016 at 04:25, Tom Lane wrote: > In short, these tests suggest that we need a coding pattern like > this: > > volatile float8 asin_x = asin(x); > return (asin_x / asin_0_5) * 30.0; > > We could drop the "volatile" by adopting -ffloat-store, but that > doesn't seem like a better answer, because -ffloat-store would > pessimize a lot of code elsewhere. (It causes a whole lot of > changes in float.c, for sure.) Agreed. That looks like the least hacky way of solving the problem. I think it's more readable when the logic is kept local, and it's preferable to avoid any compiler-specific options or global flags that would affect other code. 6b1a213bbd6599228b2b67f7552ff7cc378797bf by itself looks like a worthwhile improvement that ought to be more robust, even though it didn't fix this problem. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On 19 April 2016 at 14:38, Tom Lane wrote: > Yeah, what I was thinking of printing is something like > > asind(x), > asind(x) IN (-90,-30,0,30,90) AS asind_exact, > ... > > with extra_float_digits=3. The point of this is not necessarily to give > any extra information, though it might, but for failures to be more easily > interpretable. If I'd forgotten how the test worked just a few months > after committing it, how likely is it that some random user faced with a > similar failure would understand what they were seeing? > > Also, though I agree that it might not help much to know whether the > output is 45.0001 or 44., our thoughts would > be trending in quite a different direction if it turns out that the > output is radically wrong, or even a NaN. The existing test cannot > exclude that possibility. > OK, that sounds like it would be a useful improvement to the tests. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.
On 19 April 2016 at 05:16, Noah Misch wrote: > On Mon, Apr 18, 2016 at 11:56:55PM -0400, Tom Lane wrote: >> Noah Misch writes: >> > On Mon, Apr 18, 2016 at 10:22:59PM -0400, Tom Lane wrote: >> >> We could alternatively set extra_float_digits to its max value and hope >> >> that off-by-one-in-the-last-place values would get printed as something >> >> visibly different from the exact result. I'm not sure I want to trust >> >> that that works reliably; but maybe it would be worth printing the >> >> result both ways, just to provide additional info when there's a failure. >> >> > We'd have an independent problem if extra_float_digits=3 prints the same >> > digits for distinguishable float values, so I wouldn't mind relying on it >> > not >> > to do that. But can we expect the extra_float_digits=3 representation of >> > those particular values to be the same for every implementation? >> >> Hm? The expected answer is exact (30, 45, or whatever) in each case. >> If we get some residual low-order digits then it's a failure, so we don't >> need to worry about whether it's the same failure everywhere. > > Does something forbid snprintf implementations from printing '45'::float8 as > 45.0001 under extra_float_digits=3? I'm not sure it's really worth having the test output something like 45.0001 since that extra detail doesn't really seem particularly useful beyond the fact that the result wasn't exactly 45. Also you'd have to be careful how you modified the test, since it's possible that 45.0001 might be printed as '45' even under extra_float_digits=3 and so there'd be a risk of the test passing when it ought to fail, unless you also printed something else out to indicate exactness. Thinking about the actual failure in this case (asind(0.5) not returning exactly 30) a couple of theories spring to mind.
One is that the compiler is being more aggressive over function inlining, so init_degree_constants() is being inlined, and then asin_0_5 is being evaluated at compile time rather than runtime, giving a slightly different result, as happened in the original version of this patch. Another theory is that the compiler is performing an unsafe ordering rearrangement and transforming (asin(x) / asin_0_5) * 30.0 into asin(x) * (30.0 / asin_0_5). This might happen for example under something like -funsafe-math-optimizations. I did a search for other people encountering similar problems and I came across the following quite interesting example in the Gnu Scientific Library's implementation of log1p() -- https://github.com/ampl/gsl/blob/master/sys/log1p.c. The reason the code is now written in that way is in response to this bug: https://lists.gnu.org/archive/html/bug-gsl/2007-07/msg8.html with an overly aggressive compiler. So using that technique, we might try using a volatile local variable to enforce the desired evaluation order, e.g.: volatile double tmp; tmp = asin(x) / asin_0_5; return tmp * 30.0; A similar trick could be used to protect against init_degree_constants() being inlined, by writing it as volatile double one_half = 0.5; asin_0_5 = asin(one_half); since then the compiler wouldn't be able to assume that one_half was still equal to 0.5, and wouldn't be able to optimise away the runtime evaluation of asin() in favour of a compile-time constant. These are both somewhat unconventional uses of volatile, but I think they stand a better chance of being more portable, and also more future-proof against compilers that might in the future make more advanced code inlining and rearrangement choices. Of course this is all pure speculation at this stage, but it seems like it's worth a try. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
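For what it's worth, the reason the ratio-based approach can give exact answers at the special points — provided the stated evaluation order is actually what gets executed — can be sketched outside of C (Python here, which has no compile-time constant folding to defend against; the function and variable names are illustrative, not PostgreSQL's):

```python
import math

def asind(x):
    """Arc-sine in degrees, scaled so the special input x = 0.5 gives
    an exact answer: asin(x) / asin_0_5 is then exactly 1.0, so the
    result is exactly 30.0."""
    asin_0_5 = math.asin(0.5)  # computed at run time, like init_degree_constants()
    return (math.asin(x) / asin_0_5) * 30.0

# asind(0.5) == 30.0 exactly, by construction.  If a compiler instead
# rewrites the expression as asin(x) * (30.0 / asin_0_5), or folds
# asin(0.5) into a compile-time constant with different rounding, that
# exactness is lost -- which is what the volatile tricks above aim to
# prevent in the C code.
```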
Re: [HACKERS] improving GROUP BY estimation
On 31 March 2016 at 22:02, Tom Lanewrote: > I'm just concerned about what happens when the Dellera paper stops being > available. I don't mind including that URL as a backup to the written-out > argument I just suggested. > Here's an updated patch with references to both papers, and a more detailed justification for the formula, along with the other changes discussed. Note that although equation (2) in the Dell'Era paper looks different from the Yao formula, it's actually the same. Regards, Dean diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c new file mode 100644 index a6555e9..99f5f7c --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -3439,9 +3439,51 @@ estimate_num_groups(PlannerInfo *root, L reldistinct = clamp; /* - * Multiply by restriction selectivity. + * Update the estimate based on the restriction selectivity, + * guarding against division by zero when reldistinct is zero. + * Also skip this if we know that we are returning all rows. */ - reldistinct *= rel->rows / rel->tuples; + if (reldistinct > 0 && rel->rows < rel->tuples) + { +/* + * Given a table containing N rows with n distinct values in a + * uniform distribution, if we select p rows at random then + * the expected number of distinct values selected is + * + * n * (1 - product((N-N/n-i)/(N-i), i=0..p-1)) + * + * = n * (1 - (N-N/n)! / (N-N/n-p)! * (N-p)! / N!) + * + * See "Approximating block accesses in database + * organizations", S. B. Yao, Communications of the ACM, + * Volume 20 Issue 4, April 1977 Pages 260-261. + * + * Alternatively, re-arranging the terms from the factorials, + * this may be written as + * + * n * (1 - product((N-p-i)/(N-i), i=0..N/n-1)) + * + * This form of the formula is more efficient to compute in + * the common case where p is larger than N/n. 
Additionally, + * as pointed out by Dell'Era, if i << N for all terms in the + * product, it can be approximated by + * + * n * (1 - ((N-p)/N)^(N/n)) + * + * See "Expected distinct values when selecting from a bag + * without replacement", Alberto Dell'Era, + * http://www.adellera.it/investigations/distinct_balls/. + * + * The condition i << N is equivalent to n >> 1, so this is a + * good approximation when the number of distinct values in + * the table is large. It turns out that this formula also + * works well even when n is small. + */ +reldistinct *= + (1 - pow((rel->tuples - rel->rows) / rel->tuples, + rel->tuples / reldistinct)); + } + reldistinct = clamp_row_est(reldistinct); /* * Update estimate of total distinct groups. diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out new file mode 100644 index de64ca7..0fc93d9 --- a/src/test/regress/expected/subselect.out +++ b/src/test/regress/expected/subselect.out @@ -807,27 +807,24 @@ select * from int4_tbl where explain (verbose, costs off) select * from int4_tbl o where (f1, f1) in (select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1); - QUERY PLAN --- - Hash Join + QUERY PLAN + + Hash Semi Join Output: o.f1 Hash Cond: (o.f1 = "ANY_subquery".f1) -> Seq Scan on public.int4_tbl o Output: o.f1 -> Hash Output: "ANY_subquery".f1, "ANY_subquery".g - -> HashAggregate + -> Subquery Scan on "ANY_subquery" Output: "ANY_subquery".f1, "ANY_subquery".g - Group Key: "ANY_subquery".f1, "ANY_subquery".g - -> Subquery Scan on "ANY_subquery" - Output: "ANY_subquery".f1, "ANY_subquery".g - Filter: ("ANY_subquery".f1 = "ANY_subquery".g) - -> HashAggregate - Output: i.f1, (generate_series(1, 2) / 10) - Group Key: i.f1 - -> Seq Scan on public.int4_tbl i - Output: i.f1 -(18 rows) + Filter: ("ANY_subquery".f1 = "ANY_subquery".g) + -> HashAggregate + Output: i.f1, (generate_series(1, 2) / 10) + Group Key: i.f1 + -> Seq Scan on public.int4_tbl i + Output: i.f1 +(15 rows) select * 
from int4_tbl o where (f1, f1) in (select f1, generate_series(1,2) / 10 g from int4_tbl i group by f1); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improving GROUP BY estimation
On 31 March 2016 at 21:40, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Another minor gripe is the use of a random URL as justification. This >>> code will still be around when that URL exists nowhere but the Wayback >>> Machine. Can't we find a more formal citation to use? > >> The article text refers to this 1977 S. B. Yao paper "Approximating >> block accesses in database organizations" which doesn't appear to be >> available online, except behind ACM's paywall at >> http://dl.acm.org/citation.cfm?id=359475 > > Well, a CACM citation is perfectly fine by my lights (especially one > that's that far back and therefore certainly patent-free ...) > > Let's use something like this: > > See "Approximating block accesses in database organizations", S. B. Yao, > Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261 > Sounds good. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improving GROUP BY estimation
On 31 March 2016 at 20:18, Tom Lane wrote: > Also, I wonder if it'd be a good idea to provide a guard against division > by zero --- we know rel->tuples > 0 at this point, but I'm less sure that > reldistinct can't be zero. In the same vein, I'm worried about the first > argument of pow() being slightly negative due to roundoff error, leading > to a NaN result. Yeah, that makes sense. In fact, if we only apply the adjustment when reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it is guaranteed to be non-negative. If rel->rows >= rel->tuples (not sure if it can be greater), then we just want the original reldistinct. > Maybe we should also consider clamping the final reldistinct estimate to > an integer with clamp_row_est(). The existing code doesn't do that but > it seems like a good idea on general principles. OK, that seems sensible. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
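Putting those guards together, the updated computation would look something like the following (a Python sketch of the logic, not the selfuncs.c patch itself; clamp_row_est() is approximated here by rounding and clamping to at least 1):

```python
def adjusted_ndistinct(tuples, rows, reldistinct):
    """Estimate the number of distinct values among 'rows' selected from
    a table of 'tuples' rows containing 'reldistinct' distinct values.

    Guards discussed above: skip the adjustment when reldistinct is zero
    (avoiding division by zero) or when all rows are returned, and write
    the pow() base as (tuples - rows) / tuples so that roundoff error
    can never make it negative."""
    if reldistinct > 0 and rows < tuples:
        reldistinct *= 1.0 - ((tuples - rows) / tuples) ** (tuples / reldistinct)
    # rough stand-in for clamp_row_est(): round, and clamp to >= 1
    return max(1, round(reldistinct))
```

With rows >= tuples the original reldistinct comes back unchanged (modulo clamping), which is the desired behaviour when the restriction lets every row through.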
Re: [HACKERS] improving GROUP BY estimation
On 30 March 2016 at 14:03, Tomas Vondra wrote: > Attached is v4 of the patch Thanks, I think this is good to go, except that I think we need to use pow() rather than powl() because AIUI powl() is new to C99, and so won't necessarily be available on all supported platforms. I don't think we need worry about loss of precision, since that would only be an issue if rel->rows / rel->tuples were smaller than maybe 10^-14 or so, and it seems unlikely we'll get anywhere near that any time soon. I think this is a good, well thought-out change, so unless anyone objects I'll push it (probably this weekend). Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Add generate_series(date,date) and generate_series(date,date,integer)
On 16 March 2016 at 23:32, David Steele wrote: > On 3/10/16 1:24 PM, Corey Huinker wrote: > >> New patch for Alvaro's consideration. >> >> Very minor changes since the last time, the explanations below are >> literally longer than the changes: >> - Rebased, though I don't think any of the files had changed in the >> mean time >> - Removed infinity checks/errors and the test cases to match >> - Amended documentation to add 'days' after 'step' as suggested >> - Did not add a period as suggested, to remain consistent with other >> descriptions in the same sgml table >> - Altered test case and documentation of 7 day step example so that >> the generated dates do not land evenly on the end date. A reader >> might incorrectly infer that the end date must be in the result set, >> when it doesn't have to be. >> - Removed unnecessary indentation that existed purely due to >> following of other generate_series implementations > > > As far as I can see overall support is in favor of this patch although it is > not overwhelming by any means. > > I think in this case it comes down to a committer's judgement so I have > marked this "ready for committer" and passed the buck on to Álvaro. > So I was pretty much "meh" on this patch too, because I'm not convinced it actually saves much typing, if any. However, I now realise that it introduces a backwards-compatibility breakage. Today it is possible to type SELECT * FROM generate_series('01-01-2000'::date, '01-04-2000', '7 days'); and it works with just that one cast. However, this patch breaks that. Now I'm not saying that I have used the above construct. Probably not in fact, but maybe my work colleagues have. I honestly can't say, but the thought of having to grep through thousands of lines of application code to check isn't particularly appealing. So I'm afraid it's -1 from me.
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improving GROUP BY estimation
On 18 March 2016 at 00:37, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: >> On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote: >>> I think that a better formula to use would be >>> >>> reldistinct *= (1 - powl(1 - rel->rows / rel->tuples, rel->tuples / >>> reldistinct)) > > Attached is a v3 of the patch using this formula instead of the original > one. Interestingly, that apparently reduces the number of regression tests > that get broken to a single one. > Cool. > I'm not sure whether we need to provide a link to the PDF the formula comes > from - perhaps we should? > Probably a better URL to give is http://www.adellera.it/investigations/distinct_balls/ which has a link to the PDF version of the paper and also some supporting material. However, while that paper is in general very clear, I don't think it gives a very clear explanation of that particular formula, and it doesn't explain what it represents. It merely says that if "i" can be ignored "for some reason (e.g. i << Nr)", then that formula is an approximation to the exact "without replacement" formula, which is the subject of that paper. But actually, that formula is the exact formula for the expected number of distinct values when selecting *with replacement* from a collection where each of the distinct values occurs an equal number of times. So I think we should say that. Perhaps something along the lines of: /* * Update the estimate based on the restriction selectivity. * * This uses the formula for the expected number of distinct values * when selecting with replacement from a collection where each of * the distinct values occurs an equal number of times (a uniform * distribution of values). This is a very close approximation to * the more correct method of selecting without replacement when * the number of distinct values is quite large --- for example, * see http://www.adellera.it/investigations/distinct_balls/.
* Additionally, it also turns out to work very well even when the * number of distinct values is small. */ Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
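To see how close the "with replacement, equal frequencies" formula is to Yao's exact "without replacement" expectation, the two can be compared numerically (an illustrative sketch; the function names are mine, and n is assumed to divide N so that each value has exactly N/n copies):

```python
from math import prod

def yao_exact(N, n, p):
    """Exact expected number of distinct values when p of N rows are
    drawn without replacement and each of n distinct values occurs
    N/n times (Yao's formula, in product form)."""
    q = N // n  # copies of each value
    return n * (1.0 - prod((N - q - i) / (N - i) for i in range(p)))

def approx(N, n, p):
    """The approximation discussed above: treat each of the N/n copies
    of a value independently (the 'with replacement' simplification),
    so a value escapes the sample with probability (1 - p/N)**(N/n)."""
    return n * (1.0 - (1.0 - p / N) ** (N / n))

# The two agree very closely when n is large, e.g. for N=10000, n=100,
# p=1000 both give roughly 99.997 expected distinct values; and both
# return exactly n when every row is selected (p == N).
```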
Re: [HACKERS] improving GROUP BY estimation
On 4 March 2016 at 13:10, Tomas Vondra wrote: > The risk associated with over-estimation is that we may switch from > HashAggregate to GroupAggregate, and generally select paths better > suited for large number of rows. Not great, because the plan may not be > optimal and thus run much slower - but that usually only happens on > small tables, because on large tables we would probably end up using > those same paths anyway. > > With under-estimation, the main risks are much graver, as we may end up > using HashAggregate only to get killed by OOM. When we're lucky not to > hit that, in my experience this often leads to a cascade of NestedLoop > nodes processing orders of magnitude more tuples than expected. Awful. > > So I think the risk is asymmetric, and that's why I like the new > estimator more, because it tends to over-estimate, but in a very > reasonable way. > Agreed. Under-estimating is worse than over-estimating. - reldistinct *= rel->rows / rel->tuples; + reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows)); Looking at this, I agree that this new formula from [1] is generally better than the old one in most (but not all) cases. One problem with it is that it no longer takes into account rel->tuples, which means that when returning all the tuples from the table, it no longer gives an estimate equal to the total number of distinct values in the table. In fact it tends to underestimate when returning nearly all the rows from the table. The old formula is clearly not a good general-purpose estimate. However, there is one case where it does return an accurate estimate -- when all the values in the underlying table are distinct. In this case the new formula consistently produces underestimates, while the old formula works well.
For example: CREATE TABLE t AS SELECT (1 * random())::int a, i::int b FROM generate_series(1,100) s(i); ANALYSE t; EXPLAIN ANALYSE SELECT a FROM t GROUP BY a; So there are clearly around 1 million distinct values for "a", but the new formula gives an estimate of around 630,000. That's not a massive underestimate, but it's sufficiently obvious and visible that it would be good to do better if we can. I think that a better formula to use would be reldistinct *= (1 - powl(1 - rel->rows / rel->tuples, rel->tuples / reldistinct)) This comes from [2], which gives a general formula for selection without replacement, and then gives the above as an approximation to the uniform distribution case. It's not really made clear in that paper, but that approximation is actually the "with replacement" approximation, but starting from different initial assumptions to give a formula that has better boundary conditions and special-case behaviour, as described below. For the record, here's a quick analysis of how the 2 formulae come about: Assume there are: N rows in the table n distinct values (n <= N) p rows are chosen at random (p <= N) so the problem is to estimate how many distinct values there will be in the p rows chosen. Both approaches work by first estimating the probability that some particular value x is *not* chosen. [1] starts from the assumption that each row of the table has a probability of 1/n of being equal to x. So the probability that x is not the first value chosen is P(not x, 1) = 1 - 1/n Then, if the value chosen first is replaced, the probability that x is not the second value chosen is the same.
The "with replacement" approximation makes each choice an independent probability, so the overall probability that x is not in any of the p rows chosen is just the product of the p individual probabilities, which is just P(not x, p) = (1 - 1/n)^p Then of course the probability that x *is* chosen at least once in the p rows is P(x, p) = 1 - (1 - 1/n)^p Finally, since there are n possible values of x, and they're all equally likely in the table, the expected number of distinct values is D(p) = n * (1 - (1 - 1/n)^p) The flaw in this approach is that for large p the "with replacement" approximation gets worse and worse, and the probabilities P(x, p) systematically under-estimate the actual probability that x is chosen, which increases as more values not equal to x are chosen. By the time the last value is chosen P(x, p=N) should actually be 1. [2] starts from a different initial assumption (uniform distribution case) -- for any given value x, assume that the table contains N/n instances of x (ignoring the fact that that might not be an integer). Note that with this assumption there is guaranteed to be at least one instance of x, which is not the case with the above approach. Now consider the first instance of x in the table. If we choose p rows from the table, the probability that that first instance of x is not chosen is P(not x[1], p) = 1 - p / N Making the same "with replacement" simplification, the probability that the
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 18 February 2016 at 10:05, Dean Rasheed <dean.a.rash...@gmail.com> wrote: > OK, I'll add a check for that. > Thanks for testing. > Pushed, with a catversion bump. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 18 February 2016 at 02:00, Vitaly Burovoy wrote: >> + else >> + have_digits = false; > Does it worth to move it to the declaration and remove that "else" block? > + bool have_digits = false; OK, that's probably a bit neater. > Is it important to use: >> + ObjectIdGetDatum(InvalidOid), >> + Int32GetDatum(-1))); > instead of "0, -1"? Previously I left immediate constants because I > found the same thing in jsonb.c (rows 363, 774)... I think it's preferable to use the macros because they make it clearer what datatypes are being passed around. I think that's the more common style, but the JSONB code is not technically wrong either. >> + if (pg_strcasecmp(strptr, "bytes") == 0) >> + else if >> + else if >> + else if > It seems little ugly for me. In that case (not using values from GUC) > I'd create a static array for it and do a small cycle via it. Would it > better? That's a matter of personal preference. I prefer the values inline because then the values are closer to where they're being used, which for me makes it easier to follow (see e.g., convert_priv_string()). In guc.c they're in lookup tables because they're referred to from multiple places, and because it needs to switch between tables based on context, neither of which is true here. If there is a need to re-use these values elsewhere in the future it can be refactored, but right now that feels like an over-engineered solution for this problem. >> + NumericGetDatum(mul_num), > Two more space for indentation. pgindent pulled that back. > Also I've found that with allowing exponent there is a weird result > (we tried to avoid "type numeric" in the error message): > SELECT > pg_size_bytes('123e1000 > kB'); > ERROR: invalid input syntax for type numeric: > "123e1000" OK, I'll add a check for that. Thanks for testing. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
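The intended behaviour being reviewed here — a number, optional whitespace, then an optional case-insensitive unit with binary (power-of-2) multipliers — can be sketched outside of C like so (a hypothetical Python re-implementation for illustration only; the real function is C, uses numeric arithmetic, and rejects oversized exponents as discussed above):

```python
from decimal import Decimal

# Binary (power-of-2) multipliers, matching the unit set in the patch
UNITS = {'bytes': 1, 'kb': 1024, 'mb': 1024 ** 2,
         'gb': 1024 ** 3, 'tb': 1024 ** 4}

def size_bytes(text):
    """Parse a human-readable size such as '1.5 GB' into bytes."""
    s = text.strip()
    # scan the leading number: sign, digits, decimal point, exponent
    i = 0
    while i < len(s) and (s[i].isdigit() or s[i] in '+-.eE'):
        i += 1
    num = s[:i]
    unit = s[i:].strip().lower() or 'bytes'  # no unit means bytes
    if unit not in UNITS:
        raise ValueError('invalid size unit in %r' % text)
    return int(Decimal(num) * UNITS[unit])
```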
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
On 17 February 2016 at 02:32, Jim Nasby <jim.na...@bluetreble.com> wrote: > On 2/11/16 4:21 AM, Dean Rasheed wrote: >> >> Thinking about this some more though, perhaps *sorting* the columns is >> the wrong way to be thinking about it. Perhaps a better approach would >> be to allow the columns to be *listed* (optionally, using a separate >> query). Something like the following (don't get too hung up on the >> syntax): >> >> SELECT name, >> to_char(date, 'Mon') AS month, >> sum(amount) AS amount >> FROM invoices >> GROUP BY 1,2 >> ORDER BY name >> \crosstabview cols = (select to_char(d, 'Mon') from >> generate_series('2000-01-01'::date, '2000-12-01', '1 month') d) > > > My concern with that is that often you don't know what the columns will be, > because you don't know what exact data the query will produce. So to use > this syntax you'd have to re-create a huge chunk of the original query. :( > Yeah, that's a reasonable concern. On the flip side, one of the advantages of the above syntax is that you have absolute control over the columns, whereas with the sort-based syntax you might find some columns missing (e.g., if there were no invoices in August) and that could lead to confusion parsing the results. I'm not totally opposed to specifying a column sort order in psql, and perhaps there's a way to support both 'cols' and 'col_order' options in psql, since there are different situations where one or the other might be more useful. What I am opposed to is specifying the row order in psql, because IMO that's something that should be done entirely in the SQL query. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [patch] Proposal for \crosstabview in psql
On 15 February 2016 at 14:08, Daniel Verite <dan...@manitou-mail.org> wrote: > Dean Rasheed wrote: > >> My biggest problem is with the sorting, for all the reasons discussed >> above. There is absolutely no reason for \crosstabview to be >> re-sorting rows -- they should just be left in the original query >> result order > > Normal top-down display: > > select v,to_char(d,'Mon') as m, c from v_data order by d; > > v | m | c > +-+- > v2 | Jan | bar > v1 | Apr | foo > v1 | Jul | baz > v0 | Jul | qux > > At this point, it seems to me that it's perfectly reasonable for our user > to expect the possibility of sorting additionally by "v" , without > changing the query and without changing the order of the horizontal > header: > > \crosstabview +v m c > > v | Jan | Apr | Jul > +-+-+- > v0 | | | qux > v1 | | foo | baz > v2 | bar | | > I don't find that example particularly compelling. If I want to sort the rows coming out of a query, my first thought is always going to be to add/adjust the query's ORDER BY clause, not use some weird +/- psql syntax. The crux of the problem here is that in a pivoted query resultset SQL can be used to control the order of the rows or the columns, but not both at the same time. IMO it is more natural to use SQL to control the order of the rows. The columns are the result of the psql pivoting, so it's reasonable to control them via psql options. A couple of other points to bear in mind: The number of columns is always going to be quite limited (at most 1600, and usually far less than that), whereas the number of rows could be arbitrarily large. So sorting the rows client-side in the way that you are could get very inefficient, whereas that's not such a problem for the columns. The column values are non-NULL, so they require a more limited set of sort options, whereas the rows could be anything, and people will want all the sort options to be available. 
Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
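To make the "rows in query order, columns pivoted client-side" position concrete, here is a minimal pivot sketch in Python (illustrative only — it mirrors the example data from the quoted message, not psql's implementation):

```python
def crosstab(rows):
    # Pivot (row_key, col_key, value) tuples in the spirit of the
    # \crosstabview discussion: both row order and column order follow
    # first appearance in the result set (i.e. the query's ORDER BY);
    # the client does no re-sorting of its own.
    row_keys, col_keys, cells = [], [], {}
    for r, c, v in rows:
        if r not in row_keys:
            row_keys.append(r)
        if c not in col_keys:
            col_keys.append(c)
        cells[(r, c)] = v
    return [[r] + [cells.get((r, c), '') for c in col_keys]
            for r in row_keys]

# Using the v_data example from the quoted message:
data = [('v2', 'Jan', 'bar'), ('v1', 'Apr', 'foo'),
        ('v1', 'Jul', 'baz'), ('v0', 'Jul', 'qux')]
# crosstab(data) keeps rows in the order v2, v1, v0 and columns in the
# order Jan, Apr, Jul, with empty cells where no value exists.
```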
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 17 February 2016 at 00:39, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote: > On 2/16/16, Vitaly Burovoy <vitaly.buro...@gmail.com> wrote: >> On 2/16/16, Dean Rasheed <dean.a.rash...@gmail.com> wrote: >>> Fixing that in parse_memory_unit() would be messy because it assumes a >>> base unit of kB, so it would require a negative multiplier, and >>> pg_size_bytes() would have to be taught to divide by the magnitude of >>> negative multipliers in the same way that guc.c does. > > Now parse_memory_unit returns -1024 for bytes as divider, constant > "bytes" has moved there. > Add new memory_units_bytes_hint which differs from an original > memory_units_int by "bytes" size unit. > Allow hintmsg be NULL and if so, skip setting dereferenced variable to > memory_units_bytes_hint. > I think that approach is getting more and more unwieldy, and it simply isn't worth the effort just to share a few values from the unit conversion table, especially given that the set of supported units differs between the two places. >>> ISTM that it would be far less code, and much simpler and more >>> readable to just parse the supported units directly in >>> pg_size_bytes(), rather than trying to share code with guc.c, when the >>> supported units are actually different and may well diverge further in >>> the future. >> I've gone with this approach and it is indeed far less code, and much simpler and easier to read. This will also make it easier to maintain/extend in the future. I've made a few minor tweaks to the docs, and added a note to make it clear that the units in these functions work in powers of 2 not 10. I also took the opportunity to tidy up the number scanning code somewhat (I was tempted to rip it out entirely, since it feels awfully close to duplicating the numeric code, but it's probably worth it for the better error message). Additionally, note that I replaced strcasecmp() with pg_strcasecmp(), since AIUI the former is not available on all supported platforms. 
Barring objections, and subject to some more testing, I intend to commit this version.

Regards,
Dean

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index f9eea76..60f117a
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17767,6 +17767,9 @@ postgres=# SELECT * FROM pg_xlogfile_nam
     pg_relation_size
+    pg_size_bytes
+
     pg_size_pretty
@@ -17838,6 +17841,15 @@ postgres=# SELECT * FROM pg_xlogfile_nam
+    pg_size_bytes(text)
+    bigint
+
+    Converts a size in human-readable format with size units into bytes
+
     pg_size_pretty(bigint)
     text
@@ -17968,11 +17980,27 @@ postgres=# SELECT * FROM pg_xlogfile_nam
     pg_size_pretty can be used to format the result of one of
-    the other functions in a human-readable way, using kB, MB, GB or TB as
-    appropriate.
+    the other functions in a human-readable way, using bytes, kB, MB, GB or TB
+    as appropriate.
+
+    pg_size_bytes can be used to get the size in bytes from a
+    string in human-readable format. The input may have units of bytes, kB,
+    MB, GB or TB, and is parsed case-insensitively. If no units are specified,
+    bytes are assumed.
+
+    The units kB, MB, GB and TB used by the functions
+    pg_size_pretty and pg_size_bytes are defined
+    using powers of 2 rather than powers of 10, so 1kB is 1024 bytes, 1MB is
+    1024^2 = 1048576 bytes, and so on.
+
     The functions above that operate on tables or indexes accept a
     regclass argument, which is simply the OID of the table or index in the
     pg_class system catalog.
You do not have to look up

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2084692..91260cd
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -700,6 +700,145 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS)
 }

 /*
+ * Convert a human-readable size to a size in bytes
+ */
+Datum
+pg_size_bytes(PG_FUNCTION_ARGS)
+{
+	text	   *arg = PG_GETARG_TEXT_PP(0);
+	char	   *str,
+			   *strptr,
+			   *endptr;
+	bool		have_digits;
+	char		saved_char;
+	Numeric		num;
+	int64		result;
+
+	str = text_to_cstring(arg);
+
+	/* Skip leading whitespace */
+	strptr = str;
+	while (isspace((unsigned char) *strptr))
+		strptr++;
+
+	/* Check that we have a valid number and determine where it ends */
+	endptr = strptr;
+
+	/* Part (1): sign */
+	if (*endptr == '-' || *endptr == '+')
+		endptr++;
+
+	/* Part (2): main digit string */
+	if (isdigit((unsigned char) *endptr))
+	{
+		have_digits = true;
+		do
+			endptr++;
+		while (isdigit((unsigned char) *endptr));
+	}
+	else
+		have_digits = false;
+
+	/*
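The parsing behaviour discussed in this thread (optional leading sign and digit string, optional trailing unit parsed case-insensitively, binary multipliers, bytes as the default) can be sketched in Python. This is a hypothetical re-implementation for illustration only, not the committed C code:

```python
# Hypothetical Python sketch of pg_size_bytes()-style parsing:
# scan an optional sign and digit string, then map the trailing
# unit (case-insensitively) through a binary (powers-of-2) table.
UNITS = {"bytes": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def size_bytes(text):
    """Parse a human-readable size such as '1.5 MB' into bytes."""
    s = text.strip()
    i = 0
    if i < len(s) and s[i] in "+-":                        # part (1): sign
        i += 1
    while i < len(s) and (s[i].isdigit() or s[i] == "."):  # part (2): digits
        i += 1
    number, unit = s[:i], s[i:].strip().lower()
    if not number.strip("+-"):
        raise ValueError(f'invalid size: "{text}"')
    unit = unit or "bytes"                                 # no unit means bytes
    if unit not in UNITS:
        raise ValueError(f'invalid size unit: "{unit}"')
    return int(round(float(number) * UNITS[unit]))

print(size_bytes("1kB"))     # 1024
print(size_bytes("1.5 MB"))  # 1572864
```

Note how "bytes" sits in the same lookup table as the other units here, which is exactly what the guc.c-sharing approach made awkward (guc.c assumes a base unit of kB).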
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 16 February 2016 at 05:01, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 2016-02-15 10:16 GMT+01:00 Dean Rasheed <dean.a.rash...@gmail.com>:
>> Is there any reason not to make
>> pg_size_bytes() return numeric?
>
> This is a question. I have not a strong opinion about it. There are no any
> technical objection - the result will be +/- same. But you will enforce
> Numeric into outer expression evaluation.
>
> The result will not be used together with function pg_size_pretty, but much
> more with functions pg_relation_size, pg_relation_size, .. and these
> functions doesn't return Numeric. These functions returns bigint. Bigint is
> much more natural type for this purpose.
>
> Is there any use case for Numeric?

[Shrug] I don't really have a strong opinion about it either, but it seemed that maybe the function might have some wider uses beyond just comparing object sizes, and since it's already computing the numeric size, it might as well just return it.

Looking at the rest of the patch, I think there are other things that need tidying up -- the unit parsing code for one. This is going to some effort to reuse the memory_unit_conversion_table from guc.c, but the result is that it has added quite a bit of new code, and now the responsibility for parsing different units is handled by different functions in different files, which IMO is quite messy. Worse, the error message is wrong and misleading:

select pg_size_bytes('10 bytes'); -- OK

select pg_size_bytes('10 gallons');
ERROR:  invalid size: "10 gallons"
DETAIL:  Invalid size unit "gallons"
HINT:  Valid units for this parameter are "kB", "MB", "GB", and "TB".

which fails to mention that "bytes" is also a valid unit.

Fixing that in parse_memory_unit() would be messy because it assumes a base unit of kB, so it would require a negative multiplier, and pg_size_bytes() would have to be taught to divide by the magnitude of negative multipliers in the same way that guc.c does.
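The misleading-hint problem above goes away once the function owns its unit table: the hint can then simply enumerate exactly the units it accepts. A hypothetical Python sketch of that idea (not the actual C implementation):

```python
# Hypothetical sketch: keep the unit list local to the size parser,
# so the HINT message can enumerate every accepted unit -
# including "bytes", which the guc.c-derived hint omitted.
VALID_UNITS = ["bytes", "kB", "MB", "GB", "TB"]

def unit_hint():
    """Build a hint string listing all accepted units."""
    quoted = ['"%s"' % u for u in VALID_UNITS]
    return "Valid units for this parameter are %s, and %s." % (
        ", ".join(quoted[:-1]), quoted[-1])

print(unit_hint())
# Valid units for this parameter are "bytes", "kB", "MB", "GB", and "TB".
```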
ISTM that it would be far less code, and much simpler and more readable, to just parse the supported units directly in pg_size_bytes(), rather than trying to share code with guc.c, when the supported units are actually different and may well diverge further in the future.

I'll try to hack something up, and see what it looks like.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] custom function for converting human readable sizes to bytes
> On 12/02/16 10:19, Dean Rasheed wrote:
>> This seems like a reasonable first patch for me as a committer, so
>> I'll take it unless anyone else was planning to do so.

So looking at this, it seems that for the most part pg_size_bytes() will parse any output produced by pg_size_pretty(). The exception is that there are 2 versions of pg_size_pretty(), one that takes bigint and one that takes numeric, whereas pg_size_bytes() returns bigint, so it can't handle all inputs.

Is there any reason not to make pg_size_bytes() return numeric? It would still be compatible with the example use cases, but it would be a better inverse of both variants of pg_size_pretty() and would be more future-proof. It already works internally using numeric, so it's a trivial change to make now, but impossible to change in the future without introducing a new function with a different name, which is messy.

Thoughts?

Regards,
Dean
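The range issue being raised here can be illustrated with a small Python sketch (hypothetical, not PostgreSQL code): pg_size_pretty(numeric) accepts values beyond the bigint range, so parsing its output back into a 64-bit integer can overflow, whereas a numeric result cannot.

```python
INT64_MAX = 2**63 - 1  # PostgreSQL bigint range

def parse_tb(text, as_bigint=True):
    """Toy parser for '<n> TB' inputs, to demonstrate the range issue only."""
    number, unit = text.split()
    assert unit == "TB"
    result = int(number) * 1024**4  # TB -> bytes, binary units
    if as_bigint and result > INT64_MAX:
        raise OverflowError("bigint out of range")
    return result

# 10 million TB exceeds the ~8 EiB bigint limit, but a numeric-returning
# function could still represent it exactly:
print(parse_tb("10000000 TB", as_bigint=False))
```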
Re: [HACKERS] custom function for converting human readable sizes to bytes
On 12 February 2016 at 06:25, Pavel Stehule wrote:
> Thank you for review and other work

This seems like a reasonable first patch for me as a committer, so I'll take it unless anyone else was planning to do so.

Regards,
Dean
Re: [HACKERS] max_parallel_degree context level
On 11 February 2016 at 13:18, Robert Haas wrote:
> On Thu, Feb 11, 2016 at 7:40 AM, Thom Brown wrote:
>> As it currently stands, max_parallel_degree is set to a superuser
>> context
>
> I don't have a clue why it's like that. It seems like it should be
> PGC_USERSET just like, say, work_mem. I think that's just brain fade
> on my part, and I think the current setting will be really
> inconvenient for unprivileged users: as it is, they have no way to
> turn parallel query off. Unless somebody objects, I'll go change
> that.

+1. I would want it to be user settable.

Regards,
Dean