Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Hm. That would behave less than desirably if getObjectClass() could >> return a value that wasn't a member of the enum, but it's hard to >> credit that happening. I guess I'd vote for removing the default: >> case from all of the places that have "switch (getObjectClass(object))", >> as forgetting to add an entry seems like a much larger hazard. > Ignoring the one in alter.c's AlterObjectNamespace_oid, which only > handles a small subset of the enum values, that gives the following > warnings: > /pgsql/source/master/src/backend/catalog/dependency.c: In function > 'doDeletion': > /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: > enumeration value 'OCLASS_ROLE' not handled in switch [-Wswitch] > switch (getObjectClass(object)) > ^ > /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: > enumeration value 'OCLASS_DATABASE' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: > enumeration value 'OCLASS_TBLSPACE' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: > enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch] Hm. I think it's intentional that shared objects are not handled there, but I wonder whether the lack of SUBSCRIPTION represents a bug. Anyway I think it would be fine to add those to the switch as explicit error cases. > /pgsql/source/master/src/backend/commands/tablecmds.c: In function > 'ATExecAlterColumnType': > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_AM' not handled in switch [-Wswitch] >switch (getObjectClass(&foundObject)) >^ > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_EVENT_TRIGGER' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_PUBLICATION' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_PUBLICATION_REL' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch] > /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: > enumeration value 'OCLASS_TRANSFORM' not handled in switch [-Wswitch] All of those ought to go into the category that prints "unexpected object depending on column", I think; certainly "unrecognized object class" is not a better report, and the fact that we've evidently not touched this switch in the last few additions of object types provides fodder for the idea that the default: is unhelpful. (Not to mention that not having OCLASS_STATISTIC_EXT here is an outright bug as of today...) So losing the default: case here seems good to me. Alternatively, we could drop the explicit listing of oclasses we're not expecting to see, and let the default: case be "unexpected object depending on column". Treating the default separately is only of value if we'd expect getObjectDescription to fail, but there's no good reason for this code to be passing judgment on whether that might happen. What do we want this to do about statistics objects? We could either throw a feature-not-implemented error or try to update the stats object for the new column type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Oh, I see your patch also fixes missing code in getObjectDescription(). > >> We need that. Is there a decent way to get the compiler to warn about > >> that oversight? > > > We could remove the default clause. That results in the expected > > warning, and no others (i.e. the switch was already complete): > > > /pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: > > enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch] > > Hm. That would behave less than desirably if getObjectClass() could > return a value that wasn't a member of the enum, but it's hard to > credit that happening. I guess I'd vote for removing the default: > case from all of the places that have "switch (getObjectClass(object))", > as forgetting to add an entry seems like a much larger hazard. Ignoring the one in alter.c's AlterObjectNamespace_oid, which only handles a small subset of the enum values, that gives the following warnings: /pgsql/source/master/src/backend/catalog/dependency.c: In function 'doDeletion': /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_ROLE' not handled in switch [-Wswitch] switch (getObjectClass(object)) ^ /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_DATABASE' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_TBLSPACE' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/catalog/dependency.c:1106:2: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c: In function 'ATExecAlterColumnType': /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_AM' not handled in switch [-Wswitch] switch (getObjectClass(&foundObject)) ^ /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_EVENT_TRIGGER' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_PUBLICATION_REL' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_SUBSCRIPTION' not handled in switch [-Wswitch] /pgsql/source/master/src/backend/commands/tablecmds.c:9112:3: warning: enumeration value 'OCLASS_TRANSFORM' not handled in switch [-Wswitch] -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Oh, I see your patch also fixes missing code in getObjectDescription(). >> We need that. Is there a decent way to get the compiler to warn about >> that oversight? > We could remove the default clause. That results in the expected > warning, and no others (i.e. the switch was already complete): > /pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: > enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch] Hm. That would behave less than desirably if getObjectClass() could return a value that wasn't a member of the enum, but it's hard to credit that happening. I guess I'd vote for removing the default: case from all of the places that have "switch (getObjectClass(object))", as forgetting to add an entry seems like a much larger hazard. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Oh, sorry --- I already pushed something about this. > > > That's fine, thanks. > > Oh, I see your patch also fixes missing code in getObjectDescription(). > We need that. Is there a decent way to get the compiler to warn about > that oversight? We could remove the default clause. That results in the expected warning, and no others (i.e. the switch was already complete): /pgsql/source/master/src/backend/catalog/objectaddress.c:2657:2: warning: enumeration value 'OCLASS_STATISTIC_EXT' not handled in switch [-Wswitch] -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Oh, sorry --- I already pushed something about this. > That's fine, thanks. Oh, I see your patch also fixes missing code in getObjectDescription(). We need that. Is there a decent way to get the compiler to warn about that oversight? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Although I've not done anything about it here, I'm not happy about the > >> handling of dependencies for stats objects. > > > Here are two patches regarding handling of dependencies. > > Oh, sorry --- I already pushed something about this. That's fine, thanks. I'm glad that this thread got you interested in look over the extended statistics stuff. Thanks for the input. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Although I've not done anything about it here, I'm not happy about the >> handling of dependencies for stats objects. > Here are two patches regarding handling of dependencies. Oh, sorry --- I already pushed something about this. > The first one > implements your first suggestion: add a NORMAL dependency on each > column, and do away with RemoveStatisticsExt. This works well and > should uncontroversial. No, I wanted an AUTO dependency there, for exactly the reason you mention: > If we only do this, then DROP TABLE needs to say CASCADE if there's a > statistics object in the table. I don't think we really want that, do we? A stats object can't be of any value if the underlying table is gone. Perhaps that calculus would change for cross-table stats but I don't see why. > This seems pointless to me, so the > second patch throws in an additional dependency on the table as a whole, > AUTO this time, so that if the table is dropped, the statistics goes > away without requiring cascade. There is no point in forcing CASCADE > for this case, ISTM. This works well too, but I admit there might be > resistance to doing it. OTOH this is how CREATE INDEX works. Uh, no it isn't. Indexes have auto dependencies on the individual columns they index, and *not* on the table as a whole. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Although I've not done anything about it here, I'm not happy about the > handling of dependencies for stats objects. I do not think that cloning > RemoveStatistics as RemoveStatisticsExt is sane at all. The former is > meant to deal with cleaning up pg_statistic rows that we know will be > there, so there's no real need to expend pg_depend overhead to track them. > For objects that are only loosely connected, the right thing is to use > the dependency system; in particular, this design has no real chance of > working well with cross-table stats. Also, it's really silly to have > *both* this hard-wired mechanism and a pg_depend entry; the latter is > surely redundant if you have the former. IMO we should revert > RemoveStatisticsExt and instead handle things by making stats objects > auto-dependent on the individual column(s) they reference (not the whole > table). > > I'm also of the opinion that having an AUTO dependency, rather than > a NORMAL dependency, on the stats object's schema is the wrong semantics. > There isn't any other case where you can drop a non-empty schema without > saying CASCADE, and I'm mystified why this case should act that way. Here are two patches regarding handling of dependencies. The first one implements your first suggestion: add a NORMAL dependency on each column, and do away with RemoveStatisticsExt. This works well and should uncontroversial. If we only do this, then DROP TABLE needs to say CASCADE if there's a statistics object in the table. This seems pointless to me, so the second patch throws in an additional dependency on the table as a whole, AUTO this time, so that if the table is dropped, the statistics goes away without requiring cascade. There is no point in forcing CASCADE for this case, ISTM. This works well too, but I admit there might be resistance to doing it. OTOH this is how CREATE INDEX works. (I considered what would happen with a stats object covering multiple tables. I think it's reasonable to drop the stats too in that case, under the rationale that the object is no longer useful. Not really sure about this.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 06fad759888d5060f9b4cf4d4f4fdec777350027 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Fri, 12 May 2017 17:16:26 -0300 Subject: [PATCH 1/2] fix dependencies problem --- src/backend/catalog/heap.c | 74 - src/backend/catalog/objectaddress.c | 20 + src/backend/commands/statscmds.c| 18 src/test/regress/expected/stats_ext.out | 13 -- src/test/regress/sql/stats_ext.sql | 9 ++-- 5 files changed, 45 insertions(+), 89 deletions(-) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index ab3d83f29b..ba63939022 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1615,10 +1615,7 @@ RemoveAttributeById(Oid relid, AttrNumber attnum) heap_close(attr_rel, RowExclusiveLock); if (attnum > 0) - { RemoveStatistics(relid, attnum); - RemoveStatisticsExt(relid, attnum); - } relation_close(rel, NoLock); } @@ -1873,7 +1870,6 @@ heap_drop_with_catalog(Oid relid) * delete statistics */ RemoveStatistics(relid, 0); - RemoveStatisticsExt(relid, 0); /* * delete attribute tuples @@ -2784,76 +2780,6 @@ RemoveStatistics(Oid relid, AttrNumber attnum) heap_close(pgstatistic, RowExclusiveLock); } - -/* - * RemoveStatisticsExt --- remove entries in pg_statistic_ext for a relation - * - * If attnum is zero, remove all entries for rel; else remove only the - * one(s) involving that column. - */ -void -RemoveStatisticsExt(Oid relid, AttrNumber attnum) -{ - Relationpgstatisticext; - SysScanDesc scan; - ScanKeyData key; - HeapTuple tuple; - - /* -* Scan pg_statistic_ext to delete relevant tuples -*/ - pgstatisticext = heap_open(StatisticExtRelationId, RowExclusiveLock); - - ScanKeyInit(&key, - Anum_pg_statistic_ext_stxrelid, - BTEqualStrategyNumber, F_OIDEQ, - ObjectIdGetDatum(relid)); - - scan = systable_beginscan(pgstatisticext, - StatisticExtRelidIndexId, - true, NULL, 1, &key); - - while (HeapTupleIsValid(tuple = systable_getnext(scan))) - { - booldelete = false; - - if (attnum == 0) - delete = true; - else if (attnum != 0) - { - Form_pg_statistic_ext staForm; - int i; - - /* -
Re: [HACKERS] WITH clause in CREATE STATISTICS
I wrote: > Tomas Vondra writes: >> On 5/12/17 4:46 AM, Tom Lane wrote: >>> Although I've not done anything about it here, I'm not happy about the >>> handling of dependencies for stats objects. >> Yeah, it's a bit frankensteinian ... > I'm prepared to create a fix for that, but it'd be easier to commit the > current patch first, to avoid merge conflicts. Done. I noted that we weren't making an owner dependency either :-(. Also, we might want to think about letting stats objects be extension members sometime. As things stand, if an extension script does CREATE STATISTICS, the stats object will just be "loose" rather than bound into the extension. We still have docs and nomenclature issues to work on. Anyone want to take point on those? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> I'm prepared to create a fix for that, but it'd be easier to commit the > >> current patch first, to avoid merge conflicts. > > > It seems we're mostly in agreement regarding the parts I was touching. > > Do you want to push your version of the patch? > > It's still almost all your work, so I figured you should push it. OK, I'll do that shortly then. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> I'm prepared to create a fix for that, but it'd be easier to commit the >> current patch first, to avoid merge conflicts. > It seems we're mostly in agreement regarding the parts I was touching. > Do you want to push your version of the patch? It's still almost all your work, so I figured you should push it. >> Hm. OK, but then that example is pretty misleading, because it leads >> the reader to suppose that the planner can tell the difference between >> the selectivities of the two queries. Maybe what's lacking is an >> explanation of how you'd use this statistics type. > I think we should remove the complex example from that page, and instead > refer the reader to chapters 14 and 69. Seems reasonable. If we did add enough material there to be a standalone description, it would be duplicative probably. However, that does put the onus on the other chapters to offer a complete definition, rather than leaving details to the man page as we do in many other cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Tomas Vondra writes: > >> Although I've not done anything about it here, I'm not happy about the > >> handling of dependencies for stats objects. > > > Yeah, it's a bit frankensteinian ... > > I'm prepared to create a fix for that, but it'd be easier to commit the > current patch first, to avoid merge conflicts. It seems we're mostly in agreement regarding the parts I was touching. Do you want to push your version of the patch? > >> Lastly, I tried the example given in the CREATE STATISTICS reference page, > >> and it doesn't seem to work. > > > I assume you're talking about the functional dependencies and in that > > case that's expected behavior, because f. dependencies do assume the > > conditions are "consistent" with the functional dependencies. > > Hm. OK, but then that example is pretty misleading, because it leads > the reader to suppose that the planner can tell the difference between > the selectivities of the two queries. Maybe what's lacking is an > explanation of how you'd use this statistics type. I think we should remove the complex example from that page, and instead refer the reader to chapters 14 and 69. The CREATE STATISTICS page can just give some overview of what the syntax looks like, and all explanations of complex topics such as "what does it mean for a query to be consistent with the functional dependencies" can be given at length elsewhere. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tomas Vondra writes: > On 5/12/17 4:46 AM, Tom Lane wrote: >> If people agree that that reads better, we should make an effort to >> propagate that terminology elsewhere in the docs, and into error messages, >> objectaddress.c output, etc. > I'm fine with the 'statistics object' wording. I've been struggling with > this bit while working on the patch, and I agree it sounds better and > getting it consistent is definitely worthwhile. OK. We can work on that as a followon patch. >> Although I've not done anything about it here, I'm not happy about the >> handling of dependencies for stats objects. > Yeah, it's a bit frankensteinian ... I'm prepared to create a fix for that, but it'd be easier to commit the current patch first, to avoid merge conflicts. >> Lastly, I tried the example given in the CREATE STATISTICS reference page, >> and it doesn't seem to work. > I assume you're talking about the functional dependencies and in that > case that's expected behavior, because f. dependencies do assume the > conditions are "consistent" with the functional dependencies. Hm. OK, but then that example is pretty misleading, because it leads the reader to suppose that the planner can tell the difference between the selectivities of the two queries. Maybe what's lacking is an explanation of how you'd use this statistics type. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 5/12/17 4:46 AM, Tom Lane wrote: Alvaro Herrera writes: Hmm, yeah, makes sense. Here's a patch for this approach. ... Also, while reading the docs changes, I got rather ill from the inconsistent and not very grammatical handling of "statistics" as a noun, particularly the inability to decide from one sentence to the next if that is singular or plural. In the attached, I fixed this in the ref/*_statistics.sgml files by always saying "statistics object" instead. If people agree that that reads better, we should make an effort to propagate that terminology elsewhere in the docs, and into error messages, objectaddress.c output, etc. I'm fine with the 'statistics object' wording. I've been struggling with this bit while working on the patch, and I agree it sounds better and getting it consistent is definitely worthwhile. > Although I've not done anything about it here, I'm not happy about the handling of dependencies for stats objects. I do not think that cloning RemoveStatistics as RemoveStatisticsExt is sane at all. The former is meant to deal with cleaning up pg_statistic rows that we know will be there, so there's no real need to expend pg_depend overhead to track them. For objects that are only loosely connected, the right thing is to use the dependency system; in particular, this design has no real chance of working well with cross-table stats. Also, it's really silly to have *both* this hard-wired mechanism and a pg_depend entry; the latter is surely redundant if you have the former. IMO we should revert RemoveStatisticsExt and instead handle things by making stats objects auto-dependent on the individual column(s) they reference (not the whole table). I'm also of the opinion that having an AUTO dependency, rather than a NORMAL dependency, on the stats object's schema is the wrong semantics. There isn't any other case where you can drop a non-empty schema without saying CASCADE, and I'm mystified why this case should act that way. Yeah, it's a bit frankensteinian ... > Lastly, I tried the example given in the CREATE STATISTICS reference page, and it doesn't seem to work. Without the stats object, the two queries are both estimated at one matching row, whereas they really produce 100 and 0 rows respectively. With the stats object, they seem to both get estimated at ~100 rows, which is a considerable improvement for one case but a considerable disimprovement for the other. If this is not broken, I'd like to know why not. What's the point of an expensive extended- stats mechanism if it can't get this right? I assume you're talking about the functional dependencies and in that case that's expected behavior, because f. dependencies do assume the conditions are "consistent" with the functional dependencies. This is an inherent limitation of functional dependencies, and perhaps a price for the simplicity of this statistics type. If your application executes queries that are likely not consistent with this, don't use functional dependencies. The simplicity is why dependencies were implemented first, mostly to introduce all the infrastructure. The other statistics types (MCV, histograms) don't have this limitation, but those did not make it into pg10. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Hmm, yeah, makes sense. Here's a patch for this approach. I did some code review on this patch. The attached update makes the following hopefully-uncontroversial changes: * fix inconsistencies in the set of fields in CreateStatsStmt * get rid of struct CreateStatsArgument, which seems to be a leftover * tighten up parse checks in CreateStatistics(), and improve comments * add missing check for ownership of target table; the documentation says this is checked, and surely it is a security bug that it's not * adjust regression test to compensate for the above, because it fails otherwise * change end of regression test to suppress cascade drop messages; it's astonishing that this test hasn't caused intermittent buildfarm failures due to unstable drop order I did not review the rest of the regression test changes, nor the tab-complete changes. I can however report that DROP STATISTICS doesn't successfully complete any stats object names. Also, while reading the docs changes, I got rather ill from the inconsistent and not very grammatical handling of "statistics" as a noun, particularly the inability to decide from one sentence to the next if that is singular or plural. In the attached, I fixed this in the ref/*_statistics.sgml files by always saying "statistics object" instead. If people agree that that reads better, we should make an effort to propagate that terminology elsewhere in the docs, and into error messages, objectaddress.c output, etc. Although I've not done anything about it here, I'm not happy about the handling of dependencies for stats objects. I do not think that cloning RemoveStatistics as RemoveStatisticsExt is sane at all. The former is meant to deal with cleaning up pg_statistic rows that we know will be there, so there's no real need to expend pg_depend overhead to track them. For objects that are only loosely connected, the right thing is to use the dependency system; in particular, this design has no real chance of working well with cross-table stats. Also, it's really silly to have *both* this hard-wired mechanism and a pg_depend entry; the latter is surely redundant if you have the former. IMO we should revert RemoveStatisticsExt and instead handle things by making stats objects auto-dependent on the individual column(s) they reference (not the whole table). I'm also of the opinion that having an AUTO dependency, rather than a NORMAL dependency, on the stats object's schema is the wrong semantics. There isn't any other case where you can drop a non-empty schema without saying CASCADE, and I'm mystified why this case should act that way. Lastly, I tried the example given in the CREATE STATISTICS reference page, and it doesn't seem to work. Without the stats object, the two queries are both estimated at one matching row, whereas they really produce 100 and 0 rows respectively. With the stats object, they seem to both get estimated at ~100 rows, which is a considerable improvement for one case but a considerable disimprovement for the other. If this is not broken, I'd like to know why not. What's the point of an expensive extended- stats mechanism if it can't get this right? regards, tom lane diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index b10b734..32e17ee 100644 *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** WHERE tablename = 'road'; *** 1132,1139 To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts WITH (dependencies) !ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext --- 1132,1139 To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts (dependencies) !ON zip, city FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext *** EXPLAIN (ANALYZE, TIMING OFF) SELECT * F *** 1219,1226 Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 WITH (ndistinct) !ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd FROM pg_statistic_ext --- 1219,1226 Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 (ndistinct) !ON zip, state, city FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd FROM pg_statistic_ext diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index 11580bf..ef847b9 100644 *** a/doc/src/sgml/planstats.sgml --- b/doc/src/sgml/planstats.sgml *** EXPLAIN (ANALYZE, TIMING OFF) SELECT * F *** 526,532 multivariate s
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > BTW the new castNode() family of macros don't work with Value nodes > (because the tags are different depending on what's stored, but each > type does not have its own struct. Oh well.) Yeah. Value nodes are pretty much of a wart --- it's not clear to me that they add anything compared to just having a separate node type for each of the possible subclasses. It's never bothered anyone enough to try to replace them, though. I'll take a look at the patch later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Alvaro Herrera writes: > > Tom Lane wrote: > >> Have you thought further about the upthread suggestion to just borrow > >> SELECT's syntax lock stock and barrel? > > > Bison seems to like the productions below. Is this what you had in > > mind? These mostly mimic joined_table and table_ref, stripping out the > > rules that we don't need. > > I'd suggest just using the from_list production and then complaining > at runtime if what you get is too complicated. Otherwise, you have > to maintain a duplicate set of productions, and you're going to be > unable to throw anything more informative than "syntax error" when > somebody tries to exceed the implementation limits. Hmm, yeah, makes sense. Here's a patch for this approach. I ended up using on ON again for the list of columns. I suppose the checks in CreateStatistics() could still be improved, but I'd go ahead and push this tomorrow morning and we can hammer those details later on, if it's still needed. Better avoid shipping beta with outdated grammar ... BTW the new castNode() family of macros don't work with Value nodes (because the tags are different depending on what's stored, but each type does not have its own struct. Oh well.) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index b10b734b90..32e17ee5f8 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1132,8 +1132,8 @@ WHERE tablename = 'road'; To inspect functional dependencies on a statistics stts, you may do this: -CREATE STATISTICS stts WITH (dependencies) - ON (zip, city) FROM zipcodes; +CREATE STATISTICS stts (dependencies) + ON zip, city FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies FROM pg_statistic_ext @@ -1219,8 +1219,8 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10; Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: -CREATE STATISTICS stts2 WITH (ndistinct) - ON (zip, state, city) FROM zipcodes; +CREATE STATISTICS stts2 (ndistinct) + ON zip, state, city FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd FROM pg_statistic_ext diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index 11580bfd22..ef847b9633 100644 --- a/doc/src/sgml/planstats.sgml +++ b/doc/src/sgml/planstats.sgml @@ -526,7 +526,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; multivariate statistics on the two columns: -CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t; +CREATE STATISTICS stts (dependencies) ON a, b FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN @@ -569,7 +569,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; calculation, the estimate is much improved: DROP STATISTICS stts; -CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t; +CREATE STATISTICS stts (dependencies, ndistinct) ON a, b FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml index edbcf5840b..84ff52df04 100644 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -22,8 +22,8 @@ PostgreSQL documentation CREATE STATISTICS [ IF NOT EXISTS ] statistics_name -WITH ( option [= value] [, ... ] ) -ON ( column_name, column_name [, ...]) +[ ( statistic_type [, ... ] ) ] +ON column_name, column_name [, ...] FROM table_name @@ -75,6 +75,19 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na +statistic_type + + + A statistic type to be enabled for this statistics. Currently + supported types are ndistinct, which enables + n-distinct coefficient tracking, + and dependencies, which enables functional + dependencies. + + + + + column_name @@ -94,42 +107,6 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na - - - Parameters - - - statistics parameters - - - -The WITH clause can specify options -for the statistics. Available options are listed below. - - - - - -dependencies (boolean) - - - Enables functional dependencies for the statistics. - - - - - -ndistinct (boolean) - - - Enables ndistinct coefficients for the statistics. - - - - - - - @@ -158,7 +135,7 @@ CREATE TABLE t1 ( INSERT INTO t1 SELECT i/100, i/500
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Have you thought further about the upthread suggestion to just borrow >> SELECT's syntax lock stock and barrel? > Bison seems to like the productions below. Is this what you had in > mind? These mostly mimic joined_table and table_ref, stripping out the > rules that we don't need. I'd suggest just using the from_list production and then complaining at runtime if what you get is too complicated. Otherwise, you have to maintain a duplicate set of productions, and you're going to be unable to throw anything more informative than "syntax error" when somebody tries to exceed the implementation limits. > Note that I had to put back the FOR keyword in there; without the FOR, > that is "CREATE STATS any_name opt_name_list expr_list FROM" there was > one shift/reduce conflict, which I tried to solve by splitting > opt_name_list using two rules (one with "'(' name_list ')'" and one > without), but that gave rise to two reduce/reduce conflicts, and I didn't > see any way to deal with them. That's fine. I was going to suggest using ON after the stats type list just because it seemed to read better. FOR is about as good. > (Note that I ended up realizing that stats_type_list is unnecessary > since we can use name_list.) Check. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Have you thought further about the upthread suggestion to just borrow > SELECT's syntax lock stock and barrel? That is, it'd look something > like > > CREATE STATISTICS name [(list of stats types)] expression-list FROM ... > [ WHERE ... ] [ WITH (options) ] > > This would certainly support any FROM stuff we might want later. > Yeah, you'd need to reject any features you weren't supporting at > execution, but doing that would allow issuing a friendlier message than > "syntax error" anyway. > > If you don't have the cycles for that, I'd be willing to look into it. Bison seems to like the productions below. Is this what you had in mind? These mostly mimic joined_table and table_ref, stripping out the rules that we don't need. Note that I had to put back the FOR keyword in there; without the FOR, that is "CREATE STATS any_name opt_name_list expr_list FROM" there was one shift/reduce conflict, which I tried to solve by splitting opt_name_list using two rules (one with "'(' name_list ')'" and one without), but that gave rise to two reduce/reduce conflicts, and I didn't see any way to deal with them. (Note that I ended up realizing that stats_type_list is unnecessary since we can use name_list.) CreateStatsStmt: CREATE opt_if_not_exists STATISTICS any_name opt_name_list FOR expr_list FROM stats_joined_table { CreateStatsStmt *n = makeNode(CreateStatsStmt); n->defnames = $4; n->stat_types = $6; n->if_not_exists = $2; stmt->relation = $9; stmt->keys = $7; $$ = (Node *)n; } ; stats_joined_table: '(' stats_joined_table ')' { } | stats_table_ref CROSS JOIN stats_table_ref { } | stats_table_ref join_type JOIN stats_table_ref join_qual { } | stats_table_ref JOIN stats_table_ref join_qual { } | stats_table_ref NATURAL join_type JOIN stats_table_ref { } | stats_table_ref NATURAL JOIN table_ref { } ; stats_table_ref: relation_expr opt_alias_clause { } | stats_joined_table { } | '(' stats_joined_table ')' alias_clause { } ; -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Have you thought further about the upthread suggestion to just borrow > SELECT's syntax lock stock and barrel? That is, it'd look something > like > > CREATE STATISTICS name [(list of stats types)] expression-list FROM ... > [ WHERE ... ] [ WITH (options) ] > > This would certainly support any FROM stuff we might want later. > Yeah, you'd need to reject any features you weren't supporting at > execution, but doing that would allow issuing a friendlier message than > "syntax error" anyway. Hmm, no, I hadn't looked at this angle. > If you don't have the cycles for that, I'd be willing to look into it. I'll give it a try today, and pass the baton if I'm unable to. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Tom Lane wrote: >> Hmm ... I'm not sure that I buy that particular argument. If you're >> concerned that the grammar could not handle "FROM x JOIN y USING (z)", >> wouldn't it also have a problem with "FROM x JOIN y ON (z)"? > Tomas spent some time trying to shoehorn the whole join syntax into the > FROM clause, but stopped once he realized that the joined_table > production uses table_ref, which allow things like TABLESAMPLE, SRFs, > LATERAL, etc which presumably we don't want to accept in CREATE STATS. > I didn't look into it any further. But because of the other > considerations, I did end up changing the ON to FOR. Have you thought further about the upthread suggestion to just borrow SELECT's syntax lock stock and barrel? That is, it'd look something like CREATE STATISTICS name [(list of stats types)] expression-list FROM ... [ WHERE ... ] [ WITH (options) ] This would certainly support any FROM stuff we might want later. Yeah, you'd need to reject any features you weren't supporting at execution, but doing that would allow issuing a friendlier message than "syntax error" anyway. If you don't have the cycles for that, I'd be willing to look into it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Tom Lane wrote: > Hmm ... I'm not sure that I buy that particular argument. If you're > concerned that the grammar could not handle "FROM x JOIN y USING (z)", > wouldn't it also have a problem with "FROM x JOIN y ON (z)"? > > It might work anyway, since the grammar should know whether ON or USING > is needed to complete the JOIN clause. But I think you'd better check > whether the complete join syntax works there, even if we're not going > to support it now. Tomas spent some time trying to shoehorn the whole join syntax into the FROM clause, but stopped once he realized that the joined_table production uses table_ref, which allow things like TABLESAMPLE, SRFs, LATERAL, etc which presumably we don't want to accept in CREATE STATS. I didn't look into it any further. But because of the other considerations, I did end up changing the ON to FOR. So the attached is the final version which I intend to push shortly. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** *** 1132,1138 WHERE tablename = 'road'; To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts WITH (dependencies) ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies --- 1132,1138 To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts (dependencies) ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies *** *** 1219,1225 EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10; Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 WITH (ndistinct) ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd --- 1219,1225 Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 (ndistinct) ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd *** a/doc/src/sgml/planstats.sgml --- b/doc/src/sgml/planstats.sgml *** *** 526,532 EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; multivariate statistics on the two columns: ! CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN --- 526,532 multivariate statistics on the two columns: ! CREATE STATISTICS stts (dependencies) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN *** *** 569,575 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; calculation, the estimate is much improved: DROP STATISTICS stts; ! CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN --- 569,575 calculation, the estimate is much improved: DROP STATISTICS stts; ! CREATE STATISTICS stts (dependencies, ndistinct) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN *** a/doc/src/sgml/ref/create_statistics.sgml --- b/doc/src/sgml/ref/create_statistics.sgml *** *** 22,29 PostgreSQL documentation CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ! WITH ( option [= value] [, ... ] ) ! ON ( column_name, column_name [, ...]) FROM table_name --- 22,29 CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ! [ ( statistic_type [, ... ] ) ] ! FOR ( column_name, column_name [, ...]) FROM table_name *** *** 75,80 CREATE STATISTICS [ IF NOT EXISTS ] statistics_na --- 75,93 + statistic_type + + + A statistic type to be enabled for this statistics. Currently + supported types are ndistinct, which enables + n-distinct coefficient tracking, + and dependencies, which enables functional + dependencies. + + + + + column_name *** *** 94,135 CREATE STATISTICS [ IF NOT EXISTS ] statistics_na - - -Parameters - - - st
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 04.05.2017 23:13, Tom Lane wrote: I'm not against what you've done here, because I had no love for USING in this context anyway; it conveys approximately nothing to the mind about what is in the list it's introducing. But I'm concerned whether we're boxing ourselves in by using ON. Actually, "ON" doesn't seem all that mnemonic either. Maybe "FOR" would be a good substitute, if it turns out that "ON" has a problem? The whole syntax reminds me of a regular SELECT clause. So, SELECT? Also considering the most generic form of statistic support mentioned in [1], one could even thing about allowing aggregates, windowing functions etc, aka the full SELECT clause in the future. Sven [1] https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Here's a patch implementing this idea. From gram.y's comment, the > support syntax is now: > > /* >* >*QUERY : > ! *CREATE STATISTICS stats_name [(stat types)] arguments > ! > ! *where 'arguments' can be one or more of: > ! *{ ON (columns) > ! * | FROM relations > ! * | WITH (options) > ! * | WHERE expression } > Note that I removed the USING keyword in the stat types list, and also > made it mandatory that that list appears immediately after the new stats > name. This should make it possible to have USING in the relation list > (the FROM clause), if we allow explicit multiple relations with join > syntax there. The other options can appear in any order. Hmm ... I'm not sure that I buy that particular argument. If you're concerned that the grammar could not handle "FROM x JOIN y USING (z)", wouldn't it also have a problem with "FROM x JOIN y ON (z)"? It might work anyway, since the grammar should know whether ON or USING is needed to complete the JOIN clause. But I think you'd better check whether the complete join syntax works there, even if we're not going to support it now. I'm not against what you've done here, because I had no love for USING in this context anyway; it conveys approximately nothing to the mind about what is in the list it's introducing. But I'm concerned whether we're boxing ourselves in by using ON. Actually, "ON" doesn't seem all that mnemonic either. Maybe "FOR" would be a good substitute, if it turns out that "ON" has a problem? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Here's a patch implementing this idea. From gram.y's comment, the support syntax is now: /* * *QUERY : ! *CREATE STATISTICS stats_name [(stat types)] arguments ! ! *where 'arguments' can be one or more of: ! *{ ON (columns) ! * | FROM relations ! * | WITH (options) ! * | WHERE expression } Note that I removed the USING keyword in the stat types list, and also made it mandatory that that list appears immediately after the new stats name. This should make it possible to have USING in the relation list (the FROM clause), if we allow explicit multiple relations with join syntax there. The other options can appear in any order. Also, both WITH and WHERE are accepted by the grammar, but immediately throw "feature not implemented" error at parse time. I was on the fence about adding copy/equal/out support for the new StatisticArgument node; it seems pointless because that node does not leave gram.y anyway. Unless there are objections, I'll push this tomorrow. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services *** a/doc/src/sgml/perform.sgml --- b/doc/src/sgml/perform.sgml *** *** 1132,1138 WHERE tablename = 'road'; To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts WITH (dependencies) ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies --- 1132,1138 To inspect functional dependencies on a statistics stts, you may do this: ! CREATE STATISTICS stts (dependencies) ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies *** *** 1219,1225 EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10; Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 WITH (ndistinct) ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd --- 1219,1225 Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: ! CREATE STATISTICS stts2 (ndistinct) ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd *** a/doc/src/sgml/planstats.sgml --- b/doc/src/sgml/planstats.sgml *** *** 526,532 EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; multivariate statistics on the two columns: ! CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN --- 526,532 multivariate statistics on the two columns: ! CREATE STATISTICS stts (dependencies) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN *** *** 569,575 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; calculation, the estimate is much improved: DROP STATISTICS stts; ! CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN --- 569,575 calculation, the estimate is much improved: DROP STATISTICS stts; ! CREATE STATISTICS stts (dependencies, ndistinct) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN *** a/doc/src/sgml/ref/create_statistics.sgml --- b/doc/src/sgml/ref/create_statistics.sgml *** *** 22,28 PostgreSQL documentation CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ! WITH ( option [= value] [, ... ] ) ON ( column_name, column_name [, ...]) FROM table_name --- 22,28 CREATE STATISTICS [ IF NOT EXISTS ] statistics_name ! [ ( statistic_type [, ... ] ) ] ON ( column_name, column_name [, ...]) FROM table_name *** *** 75,80 CREATE STATISTICS [ IF NOT EXISTS ] statistics_na --- 75,93 + statistic_type + + + A statistic type to be enabled for this statistics. Currently + supported types are ndistinct, which enables + n-distinct coefficient tracking, + and dependen
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 3 May 2017 at 23:31, Andrew Dunstan wrote: >> It also seems like we don't need to have *both* fully-reserved keywords >> introducing each clause *and* parentheses around the lists. Maybe >> dropping the parens around the stats-types list and the column-names >> list would help to declutter? (But I'd keep parens around the WITH >> options, for consistency with other statements.) +1 >> One other point is that as long as we've got reserved keywords introducing >> each clause, there isn't actually an implementation reason why we couldn't >> accept the clauses in any order. Not sure I want to document it that way, >> but it might not be a bad thing if the grammar was forgiving about whether >> you write the USING or ON part first ... > > +1 for allowing arbitrary order of clauses. +1 > I would document it with the > USING clause at the end, and have that be what psql supports and pg_dump > produces. Since there are no WITH options now we should leave that out > until it's required. Let's record the target syntax in parser comments so we can just slot things in when needed later, without rediscussion. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 5/3/17 11:36 PM, Alvaro Herrera wrote: Andrew Dunstan wrote: On 05/03/2017 04:42 PM, Tom Lane wrote: One other point is that as long as we've got reserved keywords introducing each clause, there isn't actually an implementation reason why we couldn't accept the clauses in any order. Not sure I want to document it that way, but it might not be a bad thing if the grammar was forgiving about whether you write the USING or ON part first ... +1 for allowing arbitrary order of clauses. I would document it with the USING clause at the end, and have that be what psql supports and pg_dump produces. Since there are no WITH options now we should leave that out until it's required. Ok, sounds good to me. Unless there are objections I'm going to have a shot at implementing this. Thanks for the discussion. Works for me. Do you also plan to remove the parentheses for the USING clause? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Andrew Dunstan wrote: > On 05/03/2017 04:42 PM, Tom Lane wrote: > > One other point is that as long as we've got reserved keywords introducing > > each clause, there isn't actually an implementation reason why we couldn't > > accept the clauses in any order. Not sure I want to document it that way, > > but it might not be a bad thing if the grammar was forgiving about whether > > you write the USING or ON part first ... > > +1 for allowing arbitrary order of clauses. I would document it with the > USING clause at the end, and have that be what psql supports and pg_dump > produces. Since there are no WITH options now we should leave that out > until it's required. Ok, sounds good to me. Unless there are objections I'm going to have a shot at implementing this. Thanks for the discussion. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 05/03/2017 04:42 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Yawn. So, we have not achieved the stated goal which was to get rid of >> the ugly clause in the middle of the command; moreover we have gained >> *yet another* clause in the middle of the command. Is this really an >> improvement? We're trading this >> CREATE STATISTICS s1 WITH (dependencies, ndistinct, options) ON (a, b) >> FROM t1 WHERE partial-stuff; >> for this: >> CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON >> (a, b) FROM t1 WHERE partial-stuff; >> Can we decide by a show of hands, please, whether we're really on board >> with this change? > That seems totally messy :-( > > I can't see any good reason why "WITH (options)" can't be at the end of > the query. WITH is a fully reserved word, there is not going to be any > danger of a parse problem from having it follow the FROM or WHERE clauses. > And the end is where we have other instances of "WITH (options)". > > It also seems like we don't need to have *both* fully-reserved keywords > introducing each clause *and* parentheses around the lists. Maybe > dropping the parens around the stats-types list and the column-names > list would help to declutter? (But I'd keep parens around the WITH > options, for consistency with other statements.) > > One other point is that as long as we've got reserved keywords introducing > each clause, there isn't actually an implementation reason why we couldn't > accept the clauses in any order. Not sure I want to document it that way, > but it might not be a bad thing if the grammar was forgiving about whether > you write the USING or ON part first ... > > +1 for allowing arbitrary order of clauses. I would document it with the USING clause at the end, and have that be what psql supports and pg_dump produces. Since there are no WITH options now we should leave that out until it's required. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Yawn. So, we have not achieved the stated goal which was to get rid of > the ugly clause in the middle of the command; moreover we have gained > *yet another* clause in the middle of the command. Is this really an > improvement? We're trading this > CREATE STATISTICS s1 WITH (dependencies, ndistinct, options) ON (a, b) > FROM t1 WHERE partial-stuff; > for this: > CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON > (a, b) FROM t1 WHERE partial-stuff; > Can we decide by a show of hands, please, whether we're really on board > with this change? That seems totally messy :-( I can't see any good reason why "WITH (options)" can't be at the end of the query. WITH is a fully reserved word, there is not going to be any danger of a parse problem from having it follow the FROM or WHERE clauses. And the end is where we have other instances of "WITH (options)". It also seems like we don't need to have *both* fully-reserved keywords introducing each clause *and* parentheses around the lists. Maybe dropping the parens around the stats-types list and the column-names list would help to declutter? (But I'd keep parens around the WITH options, for consistency with other statements.) One other point is that as long as we've got reserved keywords introducing each clause, there isn't actually an implementation reason why we couldn't accept the clauses in any order. Not sure I want to document it that way, but it might not be a bad thing if the grammar was forgiving about whether you write the USING or ON part first ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Stephen Frost wrote: > > Here I add one, which seems to work for me. Pushed it -- I also added another one which specifies options, to test WITH handling in ruleutils. > > Considering that Stephen missed a terminating semicolon for test with > > create_order 96 (the last one prior to my commit) in commit > > 31a8b77a9244, I propose that we change whatever is concatenating those > > strings append a terminating semicolon. (Surely we don't care about two > > tests stanzas writing a single SQL command by omitting the semicolon > > terminator.) > > Whoops, sorry about that. Yes, we could pretty easily add that. The > create SQL is built up at the bottom of 002_pg_dump.pl: > > $create_sql .= $tests{$test}->{create_sql}; Okay, I added it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera wrote: > Simon Riggs wrote: > > > 2. > > USING keyword, no brackets > > CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 > > WHERE partial-stuff; > > > > and if there are options, use the WITH for the optional parameters like this > > CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON > > (a, b) FROM t1 WHERE partial-stuff; > > > > > > I think I like (2) > > OK, sounds sensible. Yawn. So, we have not achieved the stated goal which was to get rid of the ugly clause in the middle of the command; moreover we have gained *yet another* clause in the middle of the command. Is this really an improvement? We're trading this CREATE STATISTICS s1 WITH (dependencies, ndistinct, options) ON (a, b) FROM t1 WHERE partial-stuff; for this: CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON (a, b) FROM t1 WHERE partial-stuff; Can we decide by a show of hands, please, whether we're really on board with this change? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro, * Alvaro Herrera (alvhe...@2ndquadrant.com) wrote: > Alvaro Herrera wrote: > > > In the meantime, I noticed that pg_dump support for extstats is not > > covered, which I'll go fix next. > > Here I add one, which seems to work for me. > > Considering that Stephen missed a terminating semicolon for test with > create_order 96 (the last one prior to my commit) in commit > 31a8b77a9244, I propose that we change whatever is concatenating those > strings append a terminating semicolon. (Surely we don't care about two > tests stanzas writing a single SQL command by omitting the semicolon > terminator.) Whoops, sorry about that. Yes, we could pretty easily add that. The create SQL is built up at the bottom of 002_pg_dump.pl: $create_sql .= $tests{$test}->{create_sql}; > I wonder if there's any rationale to the create_order numbers. Surely > we only care for objects that depend on others. Yes, it was just a way to manage those dependencies. If there's value in doing something more complicated then we could certainly do that, but I'm not sure why it would be necessary to add that complexity. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera wrote: > In the meantime, I noticed that pg_dump support for extstats is not > covered, which I'll go fix next. Here I add one, which seems to work for me. Considering that Stephen missed a terminating semicolon for test with create_order 96 (the last one prior to my commit) in commit 31a8b77a9244, I propose that we change whatever is concatenating those strings append a terminating semicolon. (Surely we don't care about two tests stanzas writing a single SQL command by omitting the semicolon terminator.) I wonder if there's any rationale to the create_order numbers. Surely we only care for objects that depend on others. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 1dcb16de656d48c1004d4f4a12475438618a5c72 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Wed, 3 May 2017 14:18:22 -0300 Subject: [PATCH] Add pg_dump tests for CREATE STATISTICS --- src/bin/pg_dump/t/002_pg_dump.pl | 36 +++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl index ccd0ed6a53..79108cd331 100644 --- a/src/bin/pg_dump/t/002_pg_dump.pl +++ b/src/bin/pg_dump/t/002_pg_dump.pl @@ -1053,7 +1053,7 @@ my %tests = ( all_runs => 1, catch_all => 'ALTER TABLE ... commands', create_order => 96, - create_sql => 'ALTER TABLE dump_test.test_table CLUSTER ON test_table_pkey', + create_sql => 'ALTER TABLE dump_test.test_table CLUSTER ON test_table_pkey;', regexp=> qr/^ \QALTER TABLE test_table CLUSTER ON test_table_pkey;\E\n /xm, @@ -4920,6 +4920,40 @@ qr/CREATE TRANSFORM FOR integer LANGUAGE sql \(FROM SQL WITH FUNCTION pg_catalog role => 1, section_post_data=> 1, }, }, + 'CREATE STATISTICS extended_stats_no_using' => { + all_runs => 1, + catch_all=> 'CREATE ... commands', + create_order => 97, + create_sql => 'CREATE STATISTICS dump_test.test_ext_stats_no_using + ON (col1, col2) FROM dump_test.test_fifth_table;', + regexp => qr/^ + \QCREATE STATISTICS dump_test.test_ext_stats_no_using ON (col1, col2) FROM test_fifth_table;\E + /xms, + like => { + binary_upgrade => 1, + clean => 1, + clean_if_exists => 1, + createdb=> 1, + defaults=> 1, + exclude_test_table => 1, + exclude_test_table_data => 1, + no_blobs=> 1, + no_privs=> 1, + no_owner=> 1, + only_dump_test_schema => 1, + pg_dumpall_dbprivs => 1, + schema_only => 1, + section_post_data => 1, + test_schema_plus_blobs => 1, + with_oids => 1, }, + unlike => { + exclude_dump_test_schema => 1, + only_dump_test_table => 1, + pg_dumpall_globals => 1, + pg_dumpall_globals_clean => 1, + role => 1, + section_pre_data => 1, }, }, + 'CREATE SEQUENCE test_table_col1_seq' => { all_runs => 1, catch_all => 'CREATE ... commands', -- 2.11.0 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Simon Riggs wrote: > 2. > USING keyword, no brackets > CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 > WHERE partial-stuff; > > and if there are options, use the WITH for the optional parameters like this > CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON > (a, b) FROM t1 WHERE partial-stuff; > > > I think I like (2) OK, sounds sensible. Note that the USING list is also optional -- if you don't specify it, we default to creating all stat types. Also note that we currently don't have any option other than stat types, so the WITH would always be empty -- in other words we should remove it until we implement some option. (I can readily see two possible options to implement for pg11, so the omission of the WITH clause would be temporary: 1. sample size to use instead of the per-column values 2. whether to forcibly collect stats for all column for this stat object even if the column has gotten a SET STATISTICS 0 Surely there can be others.) Patch attached that adds the USING clause replacing the WITH clause, which is also optional and only accepts statistic types (it doesn't accept "foo = OFF" anymore, as it seems pointless, but I'm open to accepting it if people care about it.) (This patch removes WITH, but I verified that bison accepts having both. The second attached reversed patch is what I used for removal.) In the meantime, I noticed that pg_dump support for extstats is not covered, which I'll go fix next. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index b10b734b90..3406b7a1cd 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1132,7 +1132,7 @@ WHERE tablename = 'road'; To inspect functional dependencies on a statistics stts, you may do this: -CREATE STATISTICS stts WITH (dependencies) +CREATE STATISTICS stts USING (dependencies) ON (zip, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxname, stxkeys, stxdependencies @@ -1219,7 +1219,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10; Continuing the above example, the n-distinct coefficients in a ZIP code table may look like the following: -CREATE STATISTICS stts2 WITH (ndistinct) +CREATE STATISTICS stts2 USING (ndistinct) ON (zip, state, city) FROM zipcodes; ANALYZE zipcodes; SELECT stxkeys AS k, stxndistinct AS nd diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml index f4430eb23c..16c433c3a2 100644 --- a/doc/src/sgml/planstats.sgml +++ b/doc/src/sgml/planstats.sgml @@ -526,7 +526,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; multivariate statistics on the two columns: -CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t; +CREATE STATISTICS stts USING (dependencies) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1; QUERY PLAN @@ -569,7 +569,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; calculation, the estimate is much improved: DROP STATISTICS stts; -CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t; +CREATE STATISTICS stts USING (dependencies, ndistinct) ON (a, b) FROM t; ANALYZE t; EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b; QUERY PLAN diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml index edbcf5840b..ff6ed0668f 100644 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -22,7 +22,7 @@ PostgreSQL documentation CREATE STATISTICS [ IF NOT EXISTS ] statistics_name -WITH ( option [= value] [, ... ] ) +USING ( statistic_type [, ... ] ) ON ( column_name, column_name [, ...]) FROM table_name @@ -103,14 +103,14 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na -The WITH clause can specify options -for the statistics. Available options are listed below. +The USING clause can specify types of statistics +to be enabled. Available types are listed below. -dependencies (boolean) +dependencies Enables functional dependencies for the statistics. @@ -119,7 +119,7 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na -ndistinct (boolean) +ndistinct Enables ndistinct coefficients for the statistics. diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c index 0b9c33e30a..f4d1712091 100644 --- a/src/backend/commands/statscmds.c +++ b/src/backend/commands/statscmds.c @@ -194,23 +194,22 @@ CreateStatistics(CreateStatsStmt *stmt) stxkeys = bui
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 22 April 2017 at 21:30, Simon Riggs wrote: > CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 > WHERE partial-stuff; +1 Seems much more CREATE INDEX like, and that's pretty good given that most of the complaints so far were about it bearing enough resemblance to CREATE INDEX -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
2017-04-22 11:30 GMT+02:00 Simon Riggs : > On 21 April 2017 at 01:21, Tomas Vondra > wrote: > > On 04/21/2017 12:13 AM, Tom Lane wrote: > >> > >> Alvaro Herrera writes: > >>> > >>> Simon just pointed out that having the WITH clause appear in the middle > >>> of the CREATE STATISTICS command looks odd; apparently somebody else > >>> already complained on list about the same. Other commands put the WITH > >>> clause at the end, so perhaps we should do likewise in the new command. > >> > >> > >>> Here's a patch to implement that. I verified that if I change > >>> qualified_name to qualified_name_list, bison does not complain about > >>> conflicts, so this new syntax should support extension to multiple > >>> relations without a problem. > >> > >> > >> Yeah, WITH is fully reserved, so as long as the clause looks like > >> WITH ( stuff... ) you're pretty much gonna be able to drop it > >> wherever you want. > >> > >>> Discuss. > >> > >> > >> +1 for WITH at the end; the existing syntax looks weird to me too. > >> > > > > -1 from me > > > > I like the current syntax more, and WHERE ... WITH seems a bit weird to > me. > > But more importantly, one thing Dean probably considered when proposing > the > > current syntax was that we may add support for partial statistics, pretty > > much like partial indexes. And we don't allow WITH at the end (after > WHERE) > > for indexes: > > > > test=# create index on t (a) where a < 100 with (fillfactor=10); > > ERROR: syntax error at or near "with" > > LINE 1: create index on t (a) where a < 100 with (fillfactor=10); > > ^ > > test=# create index on t (a) with (fillfactor=10) where a < 100; > > OK, didn't know about WHERE clause; makes sense. > > Currently WITH is supported in two places, which feels definitely > wrong. The WITH clause is used elsewhere to provide optional > parameters, and if there are none present it is optional. > > OK, so lets try... > > 1. > No keyword at all, just list of statistics we store (i.e. just lose the > WITH) > CREATE STATISTICS s1 (dependencies, ndistinct) ON (a, b) FROM t1 WHERE > partial-stuff; > > and if there are options, use the WITH for the optional parameters like > this > CREATE STATISTICS s1 (dependencies, ndistinct) WITH (options) ON (a, > b) FROM t1 WHERE partial-stuff; > > 2. > USING keyword, no brackets > CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 > WHERE partial-stuff; > > and if there are options, use the WITH for the optional parameters like > this > CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON > (a, b) FROM t1 WHERE partial-stuff; > > > I think I like (2) > +1 Regards Pavel > > -- > Simon Riggshttp://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 01:21, Tomas Vondra wrote: > On 04/21/2017 12:13 AM, Tom Lane wrote: >> >> Alvaro Herrera writes: >>> >>> Simon just pointed out that having the WITH clause appear in the middle >>> of the CREATE STATISTICS command looks odd; apparently somebody else >>> already complained on list about the same. Other commands put the WITH >>> clause at the end, so perhaps we should do likewise in the new command. >> >> >>> Here's a patch to implement that. I verified that if I change >>> qualified_name to qualified_name_list, bison does not complain about >>> conflicts, so this new syntax should support extension to multiple >>> relations without a problem. >> >> >> Yeah, WITH is fully reserved, so as long as the clause looks like >> WITH ( stuff... ) you're pretty much gonna be able to drop it >> wherever you want. >> >>> Discuss. >> >> >> +1 for WITH at the end; the existing syntax looks weird to me too. >> > > -1 from me > > I like the current syntax more, and WHERE ... WITH seems a bit weird to me. > But more importantly, one thing Dean probably considered when proposing the > current syntax was that we may add support for partial statistics, pretty > much like partial indexes. And we don't allow WITH at the end (after WHERE) > for indexes: > > test=# create index on t (a) where a < 100 with (fillfactor=10); > ERROR: syntax error at or near "with" > LINE 1: create index on t (a) where a < 100 with (fillfactor=10); > ^ > test=# create index on t (a) with (fillfactor=10) where a < 100; OK, didn't know about WHERE clause; makes sense. Currently WITH is supported in two places, which feels definitely wrong. The WITH clause is used elsewhere to provide optional parameters, and if there are none present it is optional. OK, so lets try... 1. No keyword at all, just list of statistics we store (i.e. just lose the WITH) CREATE STATISTICS s1 (dependencies, ndistinct) ON (a, b) FROM t1 WHERE partial-stuff; and if there are options, use the WITH for the optional parameters like this CREATE STATISTICS s1 (dependencies, ndistinct) WITH (options) ON (a, b) FROM t1 WHERE partial-stuff; 2. USING keyword, no brackets CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 WHERE partial-stuff; and if there are options, use the WITH for the optional parameters like this CREATE STATISTICS s1 USING (dependencies, ndistinct) WITH (options) ON (a, b) FROM t1 WHERE partial-stuff; I think I like (2) -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 01:21, Tomas Vondra wrote: > On 04/21/2017 12:13 AM, Tom Lane wrote: >> >> Alvaro Herrera writes: >>> >>> Simon just pointed out that having the WITH clause appear in the middle >>> of the CREATE STATISTICS command looks odd; apparently somebody else >>> already complained on list about the same. Other commands put the WITH >>> clause at the end, so perhaps we should do likewise in the new command. >> >> +1 for WITH at the end; the existing syntax looks weird to me too. > > -1 from me > Yeah, I'm still marginally in favour of the current syntax because it's a bit more consistent with the CREATE VIEW syntax which puts the WITH (options) clause before the query, and assuming that multi-table and partial statistics support do get added in the future, the thing that the statistics get created on is going to look more like a query. OTOH, a few people have now commented that the syntax looks weird, and not just because of the placement of the WITH clause. I don't have any better ideas, but it's not too late to change if anyone else does... Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Moin, On Fri, April 21, 2017 7:04 am, David Rowley wrote: > On 21 April 2017 at 22:30, Tels wrote: >> I'd rather see: >> >> CREATE STATISTICS stats_name ON table(col); >> >> as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It >> could also be extended to both more columns, expressions or other tables >> like so: >> >> CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b); >> >> and even: >> >> CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options) >> WHERE t2.a > 4; > > How would you express a join condition with that syntax? > >> This looks easy to remember, since it compares to: >> >> CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4; >> >> Or am I'm missing something? > > Sadly yes, you are, and it's not the first time. > > I seem to remember mentioning this to you already in [1]. > > Please, can you read over [2], it mentions exactly what you're > proposing and why it's not any good. > > [1] > https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com > [2] > https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com Ah, ok, thank you, somehow I missed your answer the first time. So, just ignore me :) Best wishes, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 22:30, Tels wrote: > I'd rather see: > > CREATE STATISTICS stats_name ON table(col); > > as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It > could also be extended to both more columns, expressions or other tables > like so: > > CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b); > > and even: > > CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options) > WHERE t2.a > 4; How would you express a join condition with that syntax? > This looks easy to remember, since it compares to: > > CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4; > > Or am I'm missing something? Sadly yes, you are, and it's not the first time. I seem to remember mentioning this to you already in [1]. Please, can you read over [2], it mentions exactly what you're proposing and why it's not any good. [1] https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com [2] https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Moin, On Thu, April 20, 2017 8:21 pm, Tomas Vondra wrote: > On 04/21/2017 12:13 AM, Tom Lane wrote: >> Alvaro Herrera writes: >>> Simon just pointed out that having the WITH clause appear in the middle >>> of the CREATE STATISTICS command looks odd; apparently somebody else >>> already complained on list about the same. Other commands put the WITH >>> clause at the end, so perhaps we should do likewise in the new command. >> >>> Here's a patch to implement that. I verified that if I change >>> qualified_name to qualified_name_list, bison does not complain about >>> conflicts, so this new syntax should support extension to multiple >>> relations without a problem. >> >> Yeah, WITH is fully reserved, so as long as the clause looks like >> WITH ( stuff... ) you're pretty much gonna be able to drop it >> wherever you want. >> >>> Discuss. >> >> +1 for WITH at the end; the existing syntax looks weird to me too. >> > > -1 from me > > I like the current syntax more, and WHERE ... WITH seems a bit weird to > me. But more importantly, one thing Dean probably considered when > proposing the current syntax was that we may add support for partial > statistics, pretty much like partial indexes. And we don't allow WITH at > the end (after WHERE) for indexes: > > test=# create index on t (a) where a < 100 with (fillfactor=10); > ERROR: syntax error at or near "with" > LINE 1: create index on t (a) where a < 100 with (fillfactor=10); > ^ > test=# create index on t (a) with (fillfactor=10) where a < 100; While I'm not sure about where to put the WITH, so to speak, I do favour consistency. So I'm inclinded to keep the syntax like for the "create index". More importantly however, I'd rather see the syntax on the "ON (column) FROM relname" to be changed to match the existing examples. (Already wrote this to Simon, not sure if my email made it to the list) So instead: CREATE STATISTICS stats_name ON (columns) FROM relname I'd rather see: CREATE STATISTICS stats_name ON table(col); as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It could also be extended to both more columns, expressions or other tables like so: CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b); and even: CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options) WHERE t2.a > 4; This looks easy to remember, since it compares to: CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4; Or am I'm missing something? Regards, Tels -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 04/21/2017 12:13 AM, Tom Lane wrote: Alvaro Herrera writes: Simon just pointed out that having the WITH clause appear in the middle of the CREATE STATISTICS command looks odd; apparently somebody else already complained on list about the same. Other commands put the WITH clause at the end, so perhaps we should do likewise in the new command. Here's a patch to implement that. I verified that if I change qualified_name to qualified_name_list, bison does not complain about conflicts, so this new syntax should support extension to multiple relations without a problem. Yeah, WITH is fully reserved, so as long as the clause looks like WITH ( stuff... ) you're pretty much gonna be able to drop it wherever you want. Discuss. +1 for WITH at the end; the existing syntax looks weird to me too. -1 from me I like the current syntax more, and WHERE ... WITH seems a bit weird to me. But more importantly, one thing Dean probably considered when proposing the current syntax was that we may add support for partial statistics, pretty much like partial indexes. And we don't allow WITH at the end (after WHERE) for indexes: test=# create index on t (a) where a < 100 with (fillfactor=10); ERROR: syntax error at or near "with" LINE 1: create index on t (a) where a < 100 with (fillfactor=10); ^ test=# create index on t (a) with (fillfactor=10) where a < 100; regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
Alvaro Herrera writes: > Simon just pointed out that having the WITH clause appear in the middle > of the CREATE STATISTICS command looks odd; apparently somebody else > already complained on list about the same. Other commands put the WITH > clause at the end, so perhaps we should do likewise in the new command. > Here's a patch to implement that. I verified that if I change > qualified_name to qualified_name_list, bison does not complain about > conflicts, so this new syntax should support extension to multiple > relations without a problem. Yeah, WITH is fully reserved, so as long as the clause looks like WITH ( stuff... ) you're pretty much gonna be able to drop it wherever you want. > Discuss. +1 for WITH at the end; the existing syntax looks weird to me too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WITH clause in CREATE STATISTICS
Simon just pointed out that having the WITH clause appear in the middle of the CREATE STATISTICS command looks odd; apparently somebody else already complained on list about the same. Other commands put the WITH clause at the end, so perhaps we should do likewise in the new command. Here's a patch to implement that. I verified that if I change qualified_name to qualified_name_list, bison does not complain about conflicts, so this new syntax should support extension to multiple relations without a problem. Discuss. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services commit 237909c440054e97c09dc6e7711a4ca92892465b[m Author: Alvaro Herrera AuthorDate: Thu Apr 20 18:19:06 2017 -0300 CommitDate: Thu Apr 20 18:19:06 2017 -0300 Put WITH clause at the end diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml index edbcf5840b..910d6be8ab 100644 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -22,9 +22,9 @@ PostgreSQL documentation CREATE STATISTICS [ IF NOT EXISTS ] statistics_name -WITH ( option [= value] [, ... ] ) ON ( column_name, column_name [, ...]) FROM table_name +WITH ( option [= value] [, ... ] ) @@ -158,7 +158,7 @@ CREATE TABLE t1 ( INSERT INTO t1 SELECT i/100, i/500 FROM generate_series(1,100) s(i); -CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t1; +CREATE STATISTICS s1 ON (a, b) FROM t1 WITH (dependencies); ANALYZE t1; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 89d2836c49..92e9aa8b28 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -3847,28 +3847,28 @@ ExistingIndex: USING INDEX index_name { $$ = $3; } /* * * QUERY : - * CREATE STATISTICS stats_name WITH (options) ON (columns) FROM relname + * CREATE STATISTICS stats_name ON (columns) FROM relname WITH (options) * */ -CreateStatsStmt: CREATE STATISTICS any_name opt_reloptions ON '(' columnList ')' FROM qualified_name +CreateStatsStmt: CREATE STATISTICS any_name ON '(' columnList ')' FROM qualified_name opt_reloptions { CreateStatsStmt *n = makeNode(CreateStatsStmt); n->defnames = $3; - n->relation = $10; - n->keys = $7; - n->options = $4; + n->relation = $9; + n->keys = $6; + n->options = $10; n->if_not_exists = false; $$ = (Node *)n; } - | CREATE STATISTICS IF_P NOT EXISTS any_name opt_reloptions ON '(' columnList ')' FROM qualified_name + | CREATE STATISTICS IF_P NOT EXISTS any_name ON '(' columnList ')' FROM qualified_name opt_reloptions { CreateStatsStmt *n = makeNode(CreateStatsStmt); n->defnames = $6; - n->relation = $13; - n->keys = $10; - n->options = $7; + n->relation = $12; + n->keys = $9; + n->options = $13; n->if_not_exists = true; $$ = (Node *)n; } diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out index 0d6f65e604..7dc6011e6b 100644 --- a/src/test/regress/expected/stats_ext.out +++ b/src/test/regress/expected/stats_ext.out @@ -389,7 +389,7 @@ EXPLAIN (COSTS OFF) (2 rows) -- create statistics -CREATE STATISTICS func_deps_stat WITH (dependencies) ON (a, b, c) FROM functional_dependencies; +CREATE STATISTICS func_deps_stat ON (a, b, c) FROM functional_dependencies WITH