Re: [HACKERS] Casting issues with domains
On 11/12/2014 00:46, Tom Lane wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> As far as that goes, I think the OP was unhappy about the performance
>>> of the information_schema views, which in our implementation do
>>> exactly that so that the exposed types of the view columns conform to
>>> the SQL standard, even though the underlying catalogs use PG-centric
>>> types. I don't believe that that's the only reason why the
>>> performance of the information_schema views tends to be sucky, but
>>> it's certainly a reason.
>
>> Is that schema too edge case to justify some functional indexes on
>> the cast values on the underlying catalogs? (I'm inclined to think
>> so, but it seemed like a question worth putting out there.)
>
> We don't support functional indexes on system catalogs, so whether
> it'd be justified is sorta moot. On the whole though I'm inclined to
> agree that the information_schema views aren't used enough to justify
> adding overhead to system-catalog updates, even if the pieces for that
> all existed.
>
>> Or, since these particular domains are known, is there any sane way
>> to special-case these to allow the underlying types to work?
>
> I don't particularly care for a kluge solution here. I notice that
> recent versions of the SQL spec contain the notion of a distinct type,
> which is a user-defined type that is representationally identical to
> some base type but has its own name, and comes equipped with
> assignment-grade casts to and from the base type (which in PG terms
> would be binary-compatible casts, though the spec doesn't require
> that). It seems like this might be intended to be the sort of
> zero-cost type alias we were talking about, except that the SQL
> committee seems to have got it wrong by not specifying the
> cast-to-base-type as being implicit. Which ISTM you would want so that
> operators/functions on the base type would apply automatically to the
> distinct type.
>
> But perhaps we could extend the spec with some option to CREATE TYPE
> to allow the cast to come out that way.
>
> Or in short, maybe we should try to replace the domains used in the
> current information_schema with distinct types.

That's interesting and could easily solve the problem.

To give some context: for some reason, Drupal queries the
information_schema views before displaying some pages, and our customer
has many tables (approx 6 tables, organised à la Oracle with one schema
per database user). The seq scan against pg_class takes ~50ms, while
the very same scan without the cast takes less than 1ms.

Here is an example of the queries used:

SELECT column_name, data_type, column_default
  FROM information_schema.columns
 WHERE table_schema = 'one_schema'
   AND table_name = 'one_table'
   AND (data_type = 'bytea'
        OR (numeric_precision IS NOT NULL
            AND column_default::text LIKE '%nextval%'));

Regards

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
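Until something like distinct types exists, one workaround is to query pg_catalog directly, which avoids the sql_identifier casts entirely. The sketch below is a rough, simplified equivalent of the information_schema.columns lookup above; it deliberately ignores privilege filtering and some data_type spelling details that the standard view handles, so treat it as an approximation rather than a drop-in replacement:

```sql
-- Approximate pg_catalog equivalent of the information_schema query,
-- using the uncast "name" columns so catalog indexes can be used.
SELECT a.attname                                AS column_name,
       format_type(a.atttypid, a.atttypmod)     AS data_type,
       pg_get_expr(d.adbin, d.adrelid)          AS column_default
  FROM pg_attribute a
  JOIN pg_class c      ON c.oid = a.attrelid
  JOIN pg_namespace n  ON n.oid = c.relnamespace
  LEFT JOIN pg_attrdef d
         ON d.adrelid = a.attrelid AND d.adnum = a.attnum
 WHERE n.nspname = 'one_schema'
   AND c.relname = 'one_table'
   AND a.attnum > 0
   AND NOT a.attisdropped;
```

Note that format_type() spells types differently from the standard data_type column ('character varying' vs the typmod-qualified form, etc.), so the bytea/numeric_precision filters would need adapting.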
Re: [HACKERS] Casting issues with domains
Thomas Reiss thomas.re...@dalibo.com wrote:

> postgres=# create table test1 (a text);
> CREATE TABLE
> postgres=# insert into test1 select generate_series(1,10);
> INSERT 0 10
> postgres=# create index idx1 on test1(a);
> CREATE INDEX
> postgres=# analyze test1;
> ANALYZE
> postgres=# explain select * from test1 where a = 'toto';
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
>    Index Cond: (a = 'toto'::text)
> (2 rows)
>
> Now we create a tstdom domain and cast the a column to tstdom in the
> view definition:
>
> postgres=# create domain tstdom as text;
> CREATE DOMAIN
> postgres=# create view test2 as select a::tstdom from test1;
> CREATE VIEW
> postgres=# explain select * from test2 where a='toto';
>                         QUERY PLAN
> ----------------------------------------------------------
>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>    Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)
>
> As you can see, a is cast to tstdom and then again to text. These
> casts prevent the optimizer from choosing an index scan to retrieve
> the data. The casts are, however, strictly equivalent and should not
> prevent the optimizer from using indexes.

You can create an index to be used for searching using the domain.
Following the steps in your example, you can run this:

postgres=# create index idx2 on test1 ((a::tstdom));
CREATE INDEX
postgres=# vacuum analyze test1;
VACUUM
postgres=# explain select * from test2 where a='toto';
                       QUERY PLAN
------------------------------------------------------------------
 Index Scan using idx2 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

It's even easier if a is defined to be a member of the domain in the
original table:

postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create table test1 (a tstdom);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,10);
INSERT 0 10
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
                              QUERY PLAN
----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)
postgres=# create view test2 as select a::tstdom from test1;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
                              QUERY PLAN
----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)

It's kinda hard for me to visualize where it makes sense to define the
original table column as the bare type but use a domain when
referencing it in the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Casting issues with domains
Kevin Grittner kgri...@ymail.com writes:
> It's kinda hard for me to visualize where it makes sense to define the
> original table column as the bare type but use a domain when
> referencing it in the view.

As far as that goes, I think the OP was unhappy about the performance
of the information_schema views, which in our implementation do exactly
that so that the exposed types of the view columns conform to the SQL
standard, even though the underlying catalogs use PG-centric types. I
don't believe that that's the only reason why the performance of the
information_schema views tends to be sucky, but it's certainly a
reason.

            regards, tom lane
Re: [HACKERS] Casting issues with domains
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kgri...@ymail.com writes:
>> It's kinda hard for me to visualize where it makes sense to define
>> the original table column as the bare type but use a domain when
>> referencing it in the view.
>
> As far as that goes, I think the OP was unhappy about the performance
> of the information_schema views, which in our implementation do
> exactly that so that the exposed types of the view columns conform to
> the SQL standard, even though the underlying catalogs use PG-centric
> types. I don't believe that that's the only reason why the
> performance of the information_schema views tends to be sucky, but
> it's certainly a reason.

Is that schema too edge case to justify some functional indexes on the
cast values on the underlying catalogs? (I'm inclined to think so, but
it seemed like a question worth putting out there.)

Or, since these particular domains are known, is there any sane way to
special-case these to allow the underlying types to work?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Casting issues with domains
Kevin Grittner kgri...@ymail.com writes:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> As far as that goes, I think the OP was unhappy about the performance
>> of the information_schema views, which in our implementation do
>> exactly that so that the exposed types of the view columns conform to
>> the SQL standard, even though the underlying catalogs use PG-centric
>> types. I don't believe that that's the only reason why the
>> performance of the information_schema views tends to be sucky, but
>> it's certainly a reason.
>
> Is that schema too edge case to justify some functional indexes on the
> cast values on the underlying catalogs? (I'm inclined to think so, but
> it seemed like a question worth putting out there.)

We don't support functional indexes on system catalogs, so whether it'd
be justified is sorta moot. On the whole though I'm inclined to agree
that the information_schema views aren't used enough to justify adding
overhead to system-catalog updates, even if the pieces for that all
existed.

> Or, since these particular domains are known, is there any sane way to
> special-case these to allow the underlying types to work?

I don't particularly care for a kluge solution here. I notice that
recent versions of the SQL spec contain the notion of a distinct type,
which is a user-defined type that is representationally identical to
some base type but has its own name, and comes equipped with
assignment-grade casts to and from the base type (which in PG terms
would be binary-compatible casts, though the spec doesn't require
that). It seems like this might be intended to be the sort of zero-cost
type alias we were talking about, except that the SQL committee seems
to have got it wrong by not specifying the cast-to-base-type as being
implicit. Which ISTM you would want so that operators/functions on the
base type would apply automatically to the distinct type.

But perhaps we could extend the spec with some option to CREATE TYPE to
allow the cast to come out that way.

Or in short, maybe we should try to replace the domains used in the
current information_schema with distinct types.

            regards, tom lane
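For reference, the SQL standard's distinct-type syntax looks like the first statement below; PostgreSQL does not currently accept it. The second statement is purely invented syntax to illustrate the kind of CREATE TYPE option being discussed, not anything implemented or proposed in patch form:

```sql
-- SQL-standard distinct type: same representation as the base type,
-- separate name, assignment-grade casts to/from the base type.
-- Not currently supported by PostgreSQL.
CREATE TYPE sql_identifier AS name FINAL;

-- Hypothetical extension sketched above: make the cast back to the
-- base type implicit, so base-type operators and indexes apply
-- automatically. Invented syntax, for illustration only.
-- CREATE TYPE sql_identifier AS name FINAL (CAST TO BASE AS IMPLICIT);
```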
Re: [HACKERS] Casting issues with domains
On 08/12/2014 16:18, Tom Lane wrote:
> Thomas Reiss thomas.re...@dalibo.com writes:
>> postgres=# explain select * from test2 where a='toto';
>>                         QUERY PLAN
>> ----------------------------------------------------------
>>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>>    Filter: (((a)::tstdom)::text = 'toto'::text)
>> (2 rows)
>
>> As you can see, a is cast to tstdom and then again to text. These
>> casts prevent the optimizer from choosing an index scan to retrieve
>> the data. The casts are, however, strictly equivalent and should not
>> prevent the optimizer from using indexes.
>
> No, they are not equivalent. The optimizer can't simply drop the
> cast-to-domain, because that cast might result in a runtime error due
> to a domain CHECK constraint violation. (This is true even if no such
> constraint exists at planning time, unfortunately. If we had a
> mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe
> the no-constraints case could be handled better, but we don't; and
> adding one would also imply adding more locks around domain usage, so
> it's not all that attractive to do it.)
>
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

I agree regarding a feature for zero-cost aliases. It would, for
example, ease access to the catalogs via the information_schema.

Thanks for your answer. There's some room for improvement for sure, but
it's not as easy as it seems.

Regards,
Thomas
[HACKERS] Casting issues with domains
Hello all,

We experienced some casting issues with domains. We hit the problem
while querying the information_schema, btw, but here is a simpler test
case:

postgres=# create table test1 (a text);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,10);
INSERT 0 10
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
                              QUERY PLAN
----------------------------------------------------------------------
 Index Only Scan using idx1 on test1  (cost=0.29..8.31 rows=1 width=5)
   Index Cond: (a = 'toto'::text)
(2 rows)

Now we create a tstdom domain and cast the a column to tstdom in the
view definition:

postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create view test2 as select a::tstdom from test1;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
                        QUERY PLAN
----------------------------------------------------------
 Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
   Filter: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

As you can see, a is cast to tstdom and then again to text. These casts
prevent the optimizer from choosing an index scan to retrieve the data.
The casts are, however, strictly equivalent and should not prevent the
optimizer from using indexes.

Also, the same problem appears in the information_schema views, as
every object name is cast to information_schema.sql_identifier. Even
though this domain is declared over name, no index will be used because
of this cast.

Shouldn't the planner simplify the casts when it's possible?

Regards,
Thomas
Re: [HACKERS] Casting issues with domains
Thomas Reiss thomas.re...@dalibo.com writes:
> postgres=# explain select * from test2 where a='toto';
>                         QUERY PLAN
> ----------------------------------------------------------
>  Seq Scan on test1  (cost=0.00..1693.00 rows=500 width=5)
>    Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)
>
> As you can see, a is cast to tstdom and then again to text. These
> casts prevent the optimizer from choosing an index scan to retrieve
> the data. The casts are, however, strictly equivalent and should not
> prevent the optimizer from using indexes.

No, they are not equivalent. The optimizer can't simply drop the
cast-to-domain, because that cast might result in a runtime error due
to a domain CHECK constraint violation. (This is true even if no such
constraint exists at planning time, unfortunately. If we had a
mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
no-constraints case could be handled better, but we don't; and adding
one would also imply adding more locks around domain usage, so it's not
all that attractive to do it.)

The short answer is that SQL domains are not zero-cost type aliases.
Perhaps there would be value in having a feature that *is* a zero-cost
alias, but it wouldn't be a domain.

            regards, tom lane
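A small example of why the cast-to-domain cannot simply be dropped: once the domain carries a CHECK constraint, the cast is what enforces it, so eliminating the cast would change the query's observable behavior. This sketch reuses the tstdom/test1/test2 names from the test case, with a constraint added for illustration:

```sql
-- Domain with a constraint; the underlying column is plain text,
-- so it can hold values the domain would reject.
CREATE DOMAIN tstdom AS text CHECK (value <> 'boom');
CREATE TABLE test1 (a text);
INSERT INTO test1 VALUES ('boom');   -- accepted: column is bare text

CREATE VIEW test2 AS SELECT a::tstdom FROM test1;

-- Selecting through the view applies the cast, which must raise:
--   ERROR:  value for domain tstdom violates check constraint "tstdom_check"
-- If the planner dropped the "equivalent" cast, this would instead
-- silently return the row.
SELECT * FROM test2;
```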
Re: [HACKERS] Casting issues with domains
On 12/8/14, 9:18 AM, Tom Lane wrote:
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

Note that you can actually re-use the support functions of one data
type to create a new one. So if you wanted a special type called
document that actually behaved the same as text, you could do that
fairly easily (though not as easily as creating a domain).

If we were going to expend energy here, I suspect it would be more
useful to look at ways of creating new types without requiring C. C
isn't an option in many (even most) environments in today's cloud
world, aside from the intimidation factor. There are comments in the
code that hypothesize about making cstring a full type; that might be
all that's needed.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
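For what it's worth, the pattern described above (the same one the citext type historically used) looks roughly like this. It is only a sketch: LANGUAGE internal requires superuser privileges, and a production-quality type would also need its own operators, operator classes, and collation handling, which is exactly where the extra work over a domain comes in:

```sql
-- Shell type, so the I/O function signatures can reference it.
CREATE TYPE document;

-- Reuse text's input/output functions; "document" shares text's
-- on-disk representation.
CREATE FUNCTION document_in(cstring) RETURNS document
    AS 'textin' LANGUAGE internal IMMUTABLE STRICT;
CREATE FUNCTION document_out(document) RETURNS cstring
    AS 'textout' LANGUAGE internal IMMUTABLE STRICT;

-- Fill in the shell type.
CREATE TYPE document (
    INPUT          = document_in,
    OUTPUT         = document_out,
    INTERNALLENGTH = VARIABLE,
    STORAGE        = extended,
    CATEGORY       = 'S'
);

-- Binary-compatible casts: document values are bit-for-bit text
-- values, so no conversion function is needed.
CREATE CAST (document AS text) WITHOUT FUNCTION AS IMPLICIT;
CREATE CAST (text AS document) WITHOUT FUNCTION AS ASSIGNMENT;
```

With the implicit cast to text, expressions like document_col = 'toto' resolve through text's operators, which is close to the "distinct type with implicit base cast" behavior discussed earlier in the thread.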