Re: [GENERAL] Scalar subquery
Thanks, Tom. Can this be clarified in the docs? They currently present the scalar subquery as one of the kinds of expressions, and it is somewhat counter-intuitive that an expression may sometimes not respect its own degree of volatility.

On Wed, Sep 1, 2010 at 2:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> Vyacheslav Kalinin v...@mgcp.com writes: I just got my hands on mysql (5.0.something) and it does not cache the scalar subquery result. So... now I'm completely puzzled whether this is a bug, a desired result or just a loosely standardized thing.

It's loosely standardized. AFAICS, the spec doesn't address the detailed semantics of subqueries at all, except in wording to this effect:

    Each subquery in the search condition is effectively executed for
    each row of T and the results used in the application of the search
    condition to the given row of T. If any executed subquery contains
    an outer reference to a column of T, the reference is to the value
    of that column in the given row of T.

There is wording like this for subqueries in WHERE and HAVING, but I haven't found anything at all that mentions the behavior for subqueries in the SELECT target list. In any case, the fact that they said "effectively executed" and not simply "executed" seems to be meant to leave implementors a lot of wiggle room. In particular, there isn't any wording that I can find suggesting that the presence of volatile (or in the spec's classification, "nondeterministic") functions ought to affect the behavior.

PG's interpretation is that if there is no outer reference in a subquery, it's okay to implement it as an initplan, meaning it gets evaluated at most once per call of the containing query. We don't pay attention to whether there are volatile functions in there.

regards, tom lane
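Tom's initplan explanation boils down to this: an uncorrelated scalar subquery may be evaluated just once, and introducing any outer reference forces per-row evaluation. A minimal sketch restating the thread's own finding (the `i = i` clause is a deliberately redundant outer reference, as in the original report):

```sql
select i,
       (select random())              as cached,   -- initplan: evaluated at most once
       (select random() where i = i)  as per_row   -- outer reference: re-evaluated per row
from generate_series(1, 3) i;
```

Per the examples earlier in the thread, the first column repeats one value across all three rows while the second differs row by row.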
Re: [GENERAL] Scalar subquery
I just got my hands on mysql (5.0.something) and it does not cache the scalar subquery result. So... now I'm completely puzzled whether this is a bug, a desired result or just a loosely standardized thing. Help anyone?

On Fri, Aug 27, 2010 at 5:41 PM, Vyacheslav Kalinin v...@mgcp.com wrote:

Hi, Apparently a scalar subquery, when used as part of a SELECT statement and when it does not depend on outer query columns, is executed only once per statement, e.g.:

postgres=# select i, (select random()) rand from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.992319826036692
 2 | 0.992319826036692
 3 | 0.992319826036692
(3 rows)

(Though the term "depend" is subtle, compare these:

postgres=# select i, (select random() + case when false then i else 0 end) rand
           from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.806265413761139
 2 | 0.806265413761139
 3 | 0.806265413761139
(3 rows)

postgres=# select i, (select random() where i=i) rand from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.426443862728775
 2 | 0.133071997668594
 3 | 0.751982506364584
(3 rows)

postgres=# select i, (select random() where i=i or i is null) rand
           from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.320982406847179
 2 | 0.996762252878398
 3 | 0.076554249972105
(3 rows)

Looks like the dependence is not there anymore if PG is smart enough to simplify boolean expressions.)

Anyway, as some older PG versions and Oracle behave similarly, I suppose this result is expected and desired (correct?), but unfortunately not well-documented (did I miss it mentioned?). Can anyone shed some light on this and/or perhaps update the docs?

P.S. I got bitten by a statement like this: select (select nextval('someseq') * a + b from somefunc()), col1, ... with a and b being OUT parameters of somefunc().
[GENERAL] Scalar subquery
Hi, Apparently a scalar subquery, when used as part of a SELECT statement and when it does not depend on outer query columns, is executed only once per statement, e.g.:

postgres=# select i, (select random()) rand from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.992319826036692
 2 | 0.992319826036692
 3 | 0.992319826036692
(3 rows)

(Though the term "depend" is subtle, compare these:

postgres=# select i, (select random() + case when false then i else 0 end) rand
           from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.806265413761139
 2 | 0.806265413761139
 3 | 0.806265413761139
(3 rows)

postgres=# select i, (select random() where i=i) rand from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.426443862728775
 2 | 0.133071997668594
 3 | 0.751982506364584
(3 rows)

postgres=# select i, (select random() where i=i or i is null) rand
           from generate_series(1, 3) i;
 i |       rand
---+-------------------
 1 | 0.320982406847179
 2 | 0.996762252878398
 3 | 0.076554249972105
(3 rows)

Looks like the dependence is not there anymore if PG is smart enough to simplify boolean expressions.)

Anyway, as some older PG versions and Oracle behave similarly, I suppose this result is expected and desired (correct?), but unfortunately not well-documented (did I miss it mentioned?). Can anyone shed some light on this and/or perhaps update the docs?

P.S. I got bitten by a statement like this: select (select nextval('someseq') * a + b from somefunc()), col1, ... with a and b being OUT parameters of somefunc().
Re: [GENERAL] Incorrect FTS query results with GIN index
Great, thank you! I assume this one goes into 8.4.3, right?

2010/1/18 Teodor Sigaev teo...@sigaev.ru:

> Basically, I started testing prefix matching in FTS and got into trouble. Self-contained example follows:

Thank you, fixed. The reason was an incorrect optimization of the GIN scan: GIN reuses the scan result for an equal key, but the key comparison didn't take the difference in scan strategies into account.

-- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
[GENERAL] Incorrect FTS query results with GIN index
Hello, Basically, I started testing prefix matching in FTS and got into trouble. Self-contained example follows:

postgres=# select version();
                                        version
---------------------------------------------------------------------------------------
 PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704
 (Red Hat 4.1.2-44), 32-bit

postgres=# create table test (id int, data text, tsvd tsvector);
postgres=# insert into test (id, data) values
             (1, 'hot stuff is here'),
             (2, 'light is hotter than dark'),
             (3, 'nothing is that hottie');
postgres=# update test set tsvd = to_tsvector('english', data);
postgres=# select * from test;
 id |           data            |             tsvd
----+---------------------------+-------------------------------
  1 | hot stuff is here         | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie    | 'hotti':4 'noth':1
(3 rows)

Now let's play with queries:

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot');
 id |       data        |       tsvd
----+-------------------+-------------------
  1 | hot stuff is here | 'hot':1 'stuff':2
(1 row)

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:*');
 id |           data            |             tsvd
----+---------------------------+-------------------------------
  1 | hot stuff is here         | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie    | 'hotti':4 'noth':1
(3 rows)

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* | hot');
 id |           data            |             tsvd
----+---------------------------+-------------------------------
  1 | hot stuff is here         | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie    | 'hotti':4 'noth':1
(3 rows)

Looks good so far. Let's introduce an index:

postgres=# create index ix_test on test using gin(tsvd);
CREATE INDEX
postgres=# set enable_seqscan to off;
SET

The first two queries return the same row sets, but look at the third one:

postgres=# explain select * from test where tsvd @@ to_tsquery('english', 'hot:* | hot');
                               QUERY PLAN
--------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.26..8.28 rows=1 width=68)
   Recheck Cond: (tsvd @@ to_tsquery('english', 'hot:* | hot'::text))
   ->  Bitmap Index Scan on ix_test  (cost=0.00..4.26 rows=1 width=0)
         Index Cond: (tsvd @@ to_tsquery('english', 'hot:* | hot'::text))

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* | hot');
 id |       data        |       tsvd
----+-------------------+-------------------
  1 | hot stuff is here | 'hot':1 'stuff':2
(1 row)

WTH? Apparently the prefixed part of the query stopped working. Interestingly, the bug doesn't show up with GiST:

postgres=# drop index ix_test;
DROP INDEX
postgres=# create index ix_test on test using gist(tsvd);
CREATE INDEX
postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* | hot');
 id |           data            |             tsvd
----+---------------------------+-------------------------------
  1 | hot stuff is here         | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie    | 'hotti':4 'noth':1
(3 rows)
Re: [GENERAL] How can I calculate differences between values
With 8.4's analytic capabilities you can do this:

select *
from (select userid,
             data - lag(data) over (partition by userid order by data) as diff
      from foo) q
where diff is not null;

On Tue, Nov 10, 2009 at 5:40 PM, A B gentosa...@gmail.com wrote:

Hello there! I have a table foo (userid integer, data integer) with the constraint unique(userid, data). Now I wish to select one userid and calculate the differences between the data values (when they are sorted). For example, if the table contains:

4, 100
5, 200
5, 210
5, 231

I want the values

5, 10
5, 21

What should happen to the (4, 100) record, you may ask; I will try to exclude that case by requiring each userid to have at least two data values. The question is now: is there a clever and efficient way of calculating these differences in data values? The only thing I can think of is picking a data value, doing select min(data) from foo where userid=5 and data > 200, then calculating the difference, doing what I want with it, and repeating this process with the last selected data value. I guess that will work, but I'm curious, are there other ways?

-- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
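A minimal reproduction of the suggested query (8.4 or later), using the table and values from the question:

```sql
create table foo (userid integer, data integer, unique (userid, data));
insert into foo values (4, 100), (5, 200), (5, 210), (5, 231);

-- difference between each data value and the previous one per userid;
-- the first row of each userid has no predecessor, so lag() yields NULL
-- and the outer filter drops it
select *
from (select userid,
             data - lag(data) over (partition by userid order by data) as diff
      from foo) q
where diff is not null;
-- yields (5, 10) and (5, 21), as asked for in the question
```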
Re: [GENERAL] 'a' = any( $${'a','x'} )
Try

select 'a' = any( $${a,x}$$ );

or

select 'a' = any( array['a','x'] );

In array literal syntax double quotes, not single quotes, are the quoting mechanism, so in $${'a','x'}$$ the first element is literally 'a' including the quote characters, which does not equal a.

On Sat, Aug 8, 2009 at 3:08 PM, Jasen Betts ja...@xnet.co.nz wrote:

jasen=# select 'a' = any( $${'a','x'}$$ ) ;
 ?column?
----------
 f
(1 row)

I expect it to be true instead. What am I doing wrong? PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2. jasen.
Re: [GENERAL] Postgres Pg_connect PHP
On Thu, Jun 11, 2009 at 4:36 PM, Dimitri Fontaine dfonta...@hi-media.com wrote:

> Vyacheslav Kalinin v...@mgcp.com writes: $conn = pg_pconnect("dbname=foo"); Please reconsider and use plain pg_connect(). -- dim

Uh, I just copied/pasted that part from somewhere in the PHP manual; personally I tend to use plain pg_connect() and pgbouncer for pooling when needed. For those curious why pg_pconnect() is not the best option for connection pooling: discussion on this topic rises on this list from time to time. Check this out, for instance: http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php
Re: [GENERAL] Postgres Pg_connect PHP
however the COPY command requires superuser and I really don't want to open up the DB from a remote system with Superuser access. COPY FROM STDIN does not need superuser privileges.
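The distinction, as a sketch (file path and table name bar are illustrative; bar is the table used later in this thread): server-side COPY from a file is executed by the backend process and needs superuser, while COPY FROM STDIN streams the data over the client connection with the privileges of the connecting role.

```sql
-- needs superuser: the backend itself opens the file on the server
copy bar from '/var/lib/pgsql/file.dat';

-- does not need superuser: the client streams the data over the connection
copy bar from stdin;
```

In psql the client-side form can also be written as `\copy bar from 'file.dat'`, which wraps COPY FROM STDIN under the hood.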
Re: [GENERAL] Postgres Pg_connect PHP
On Tue, Jun 9, 2009 at 10:35 PM, Tory M Blue tmb...@gmail.com wrote:

> Thanks guys, the problem with copy from or to is that it creates an array and thus puts a load of stuff in memory; it's possible the file will get huge and I can't take that memory hit. I'll look again and see if I missed something. Thanks, Tory

It's not hard to do some streaming with copy (untested):

$conn = pg_pconnect("dbname=foo");
$fd = fopen('file.dat', 'r');
while (!feof($fd)) {
    pg_put_line($conn, fgets($fd));
}
fclose($fd);
pg_put_line($conn, "\\.\n");
pg_end_copy($conn);

http://ru.php.net/manual/en/function.pg-put-line.php
Re: [GENERAL] Postgres Pg_connect PHP
Forgot the COPY command in my previous reply:

$conn = pg_pconnect("dbname=foo");
$fd = fopen('file.dat', 'r');
pg_query($conn, "copy bar from stdin");
while (!feof($fd)) {
    pg_put_line($conn, fgets($fd));
}
fclose($fd);
pg_put_line($conn, "\\.\n");
pg_end_copy($conn);
Re: [GENERAL] Duplicate key issue in a transaction block
On Mon, Jun 8, 2009 at 8:33 PM, Bill Moran wmo...@potentialtech.com wrote:

> Perhaps you want to take an exclusive lock on the table? The operation you describe seems to suggest that you'd want to guarantee exclusive write access to the table.

An exclusive table lock is a bit excessive IMO. Locking the particular group should be good enough, though it is not quite straightforward to achieve. I'd use advisory locks, or would lock a row in a parent group table (if such a table exists; if not, it might be worth making one) referenced by the rows in question.
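A sketch of the advisory-lock approach suggested above, with an assumed group id of 42 standing in for whatever key identifies the group: sessions inserting into the same group serialize on the lock, while different groups proceed concurrently.

```sql
begin;
-- take an advisory lock keyed by the group id; pg_advisory_lock blocks
-- until any other session holding the same key releases it
select pg_advisory_lock(42);

-- ...check for duplicates and insert rows for group 42 here...

select pg_advisory_unlock(42);
commit;
```

The row-locking variant would instead do `select 1 from groups where group_id = 42 for update` inside the transaction (assuming a parent groups table exists); that lock is released automatically at commit, which is harder to get wrong than paired lock/unlock calls.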
Re: [GENERAL] Read a CLOB data from an Oracle table and INSERT it into a BYTEA column in Postgres using jdbc?
You are fetching an Oracle LOB locator. I am not a guru of OCI let alone JDBC but AFAIK you should create a LOB descriptor, bind it to the locator and load the LOB data that you can insert into postgres.
Re: [GENERAL] pg_toast_temp_xx AND pg_temp_xx SCHEMAS
> They're just there to hold temporary tables (so that the names of such tables don't conflict with any non-temporary tables). The reason they seem to accumulate is we only create one when needed, and there's one for each concurrently executing backend if it creates any temp tables.

Hm, do they get garbage-collected in any way? I have several such schemas that won't disappear even though no session but mine is active. On a side note: I can drop temporary tables of other sessions via DROP TABLE pg_temp_xx.table (although I can't seem to affect table data seen from the other session). Are there any security issues/gotchas here?
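To see which of these per-backend schemas currently exist, one can query the catalogs directly (a sketch; between uses the schemas are just empty catalog entries):

```sql
-- temp and toast-temp schemas, one pair per backend slot that has
-- ever needed temporary tables
select nspname
from pg_namespace
where nspname like 'pg_temp_%'
   or nspname like 'pg_toast_temp_%'
order by nspname;
```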
[GENERAL] FTS on patterns
Hello, In our application we need a pattern based text search, that is if user supplies 'foo bar' he receives only documents containing words that begin with 'foo' or 'bar', including themselves (something similar to MySQL '+foo* +bar*' text search query). Currently we use separate table for unique words in document and do LIKE 'foo%' search on that table, combining the results afterwards, and I am not really happy with that solution because of its speed and overcomplication of the queries involved. Does anyone know of a better approach? Could this be achieved with Postgres full text search?
Re: [GENERAL] FTS on patterns
On Tue, Aug 5, 2008 at 6:53 PM, Tom Lane [EMAIL PROTECTED] wrote: There's support for prefix text search in CVS HEAD, but not in any released version ... regards, tom lane Oh, does that mean it won't be released until 8.4?
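For reference, the prefix-matching syntax that eventually shipped appends `:*` to a lexeme in a tsquery. A sketch of the use case from the original question (documents containing words beginning with 'foo' and 'bar', by analogy with the MySQL '+foo* +bar*' query); the sample text is made up:

```sql
select to_tsvector('english', 'football barometer')
       @@ to_tsquery('english', 'foo:* & bar:*');
-- matches: both lexemes have a word starting with the given prefixes
```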
Re: [GENERAL] Inaccurate row count estimation
Thanks for the reply, Tom.

> After tracing through this I see that the problem is that we don't have statistics for inheritance trees, and so you're getting a default estimate for the selectivity of the join condition.

I might be wrong, but I suspect that the inheritance is not the only reason here. If I change the table definitions to:

create table pinfo_p00 (
  pid integer,
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check (pid > 0 and pid < 10)
);
create table pinfo_p01 (
  pid integer,
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check (pid >= 10 and pid < 20)
);

and create a view pinfo, or just do a query with a subselect:

explain analyze
select *
from contacts c
left join (select * from pinfo_p00
           union all
           select * from pinfo_p01) pi on (pi.pid = c.cpid)
where c.pid = 200;

the row-count estimate doesn't seem to be different:

                              QUERY PLAN
----------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.56..514.25 rows=3896 width=16) (actual time=0.125..3.976 rows=40 loops=1)
   Join Filter: (pinfo_p00.pid = c.cpid)
   ->  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12) (actual time=0.069..0.421 rows=40 loops=1)
         Recheck Cond: (pid = 200)
         ->  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39 width=0) (actual time=0.042..0.042 rows=40 loops=1)
               Index Cond: (pid = 200)
   ->  Append  (cost=0.00..10.59 rows=2 width=4) (actual time=0.033..0.061 rows=1 loops=40)
         ->  Index Scan using pk_pinfo_p00 on pinfo_p00  (cost=0.00..5.29 rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
               Index Cond: (pinfo_p00.pid = c.cpid)
         ->  Index Scan using pk_pinfo_p01 on pinfo_p01  (cost=0.00..5.29 rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
               Index Cond: (pinfo_p01.pid = c.cpid)
 Total runtime: 4.341 ms

It scares me a bit, as it seems that an innocent-looking combination of unions and joins can destroy the subsequent plan completely.
Re: [GENERAL] Inaccurate row count estimation
This is 8.3.0. Here is the reproduce code:

create table contacts (
  cid integer primary key,
  pid integer not null,
  cpid integer
);
create index ix_contacts_pid on contacts (pid);
create index ix_contacts_cpid on contacts (cpid);

create table pinfo (
  pid integer,
  constraint pk_pinfo primary key (pid)
);
create table pinfo_p00 (
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check (pid > 0 and pid < 10)
) inherits (pinfo);
create table pinfo_p01 (
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check (pid >= 10 and pid < 20)
) inherits (pinfo);

insert into pinfo_p00 (pid) select i from generate_series(10, 1) i;
insert into pinfo_p01 (pid) select i from generate_series(100010, 11) i;

create sequence contacts_seq start 100;
insert into contacts (cid, pid, cpid)
select nextval('contacts_seq'), p, cp.pid
from generate_series(100, 1000) p,
     ((select pid from pinfo_p00 order by random() limit 20)
      union all
      (select pid from pinfo_p01 order by random() limit 20)) cp;

analyze contacts;
analyze pinfo;
analyze pinfo_p00;
analyze pinfo_p01;

explain analyze
select *
from contacts c
left join pinfo pi on (pi.pid = c.cpid)
where c.pid = 200;

                              QUERY PLAN
----------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.56..569.17 rows=4364 width=16) (actual time=0.150..4.595 rows=40 loops=1)
   Join Filter: (pi.pid = c.cpid)
   ->  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12) (actual time=0.067..0.441 rows=40 loops=1)
         Recheck Cond: (pid = 200)
         ->  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39 width=0) (actual time=0.041..0.041 rows=40 loops=1)
               Index Cond: (pid = 200)
   ->  Append  (cost=0.00..11.98 rows=3 width=4) (actual time=0.048..0.076 rows=1 loops=40)
         ->  Index Scan using pk_pinfo on pinfo pi  (cost=0.00..1.40 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=40)
               Index Cond: (pi.pid = c.cpid)
         ->  Index Scan using pk_pinfo_p00 on pinfo_p00 pi  (cost=0.00..5.29 rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
               Index Cond: (pi.pid = c.cpid)
         ->  Index Scan using pk_pinfo_p01 on pinfo_p01 pi  (cost=0.00..5.29 rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
               Index Cond: (pi.pid = c.cpid)
 Total runtime: 4.941 ms
[GENERAL] Inaccurate row count estimation
Hello, Consider two tables:

contacts: cid integer primary key, pid integer not null, cpid integer, ...
pinfo:    pid integer, ...

pinfo is a parent table with two partitions, pinfo_p00 and pinfo_p01; all three have primary keys on pid, and the partitions have proper constraints that guarantee pid uniqueness across them. Now here's the part of the query:

select * from contacts c left join pinfo pi on (pi.pid = c.cpid)

                              QUERY PLAN
----------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..444.90 rows=1515 width=408) (actual time=0.108..5.561 rows=44 loops=1)
   Join Filter: (pi.pid = c.cpid)
   ->  Index Scan using ix_contacts_pid on contacts c  (cost=0.00..14.84 rows=50 width=26) (actual time=0.038..0.425 rows=44 loops=1)
         Index Cond: (pid = 167)
   ->  Append  (cost=0.00..8.56 rows=3 width=386) (actual time=0.067..0.090 rows=1 loops=44)
         ->  Index Scan using pk_pinfo on pinfo pi  (cost=0.00..1.15 rows=1 width=386) (actual time=0.008..0.008 rows=0 loops=44)
               Index Cond: (pi.pid = c.cpid)
         ->  Index Scan using pk_pinfo_p00 on pinfo_p00 pi  (cost=0.00..3.23 rows=1 width=46) (actual time=0.011..0.014 rows=0 loops=44)
               Index Cond: (pi.pid = c.cpid)
         ->  Index Scan using pk_pinfo_p01 on pinfo_p01 pi  (cost=0.00..4.19 rows=1 width=46) (actual time=0.011..0.014 rows=0 loops=44)
               Index Cond: (pi.pid = c.cpid)

How come the outermost join expects 1515 rows, given the row estimates of the inner and outer parts of the nested loop?
Re: [GENERAL] syntax error with execute
> EXECUTE 'INSERT INTO '||tablename||' ('||fields||') VALUES ('||vals||') RETURNING currval('''||seqname||''') INTO newid'

You probably meant to do:

EXECUTE 'INSERT INTO '||tablename||' ('||fields||') VALUES ('||vals||')
         RETURNING currval('''||seqname||''')' INTO newid;

Note where the last quote goes: INTO newid must sit outside the dynamic string, as part of the EXECUTE statement itself.
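Put together as a complete pl/pgsql sketch (the function and argument names are made up for illustration, and quote_literal() is used instead of hand-doubled quotes):

```sql
create or replace function insert_dynamic(tablename text, fields text,
                                          vals text, seqname text)
returns bigint as $$
declare
  newid bigint;
begin
  -- the RETURNING clause stays inside the dynamic string;
  -- INTO belongs to EXECUTE, outside the quoted query text
  execute 'insert into ' || tablename || ' (' || fields || ')'
       || ' values (' || vals || ')'
       || ' returning currval(' || quote_literal(seqname) || ')'
  into newid;
  return newid;
end;
$$ language plpgsql;
```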
[GENERAL] Couple of question on functions
Hello, Reading the manual recently I came across this ( http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html ):

> Because of the snapshotting behavior of MVCC (see Chapter 13) a function containing only SELECT commands can safely be marked STABLE, even if it selects from tables that might be undergoing modifications by concurrent queries. PostgreSQL will execute a STABLE function using the snapshot established for the calling query, and so it will see a fixed view of the database throughout that query. Also note that the current_timestamp family of functions qualify as stable, since their values do not change within a transaction.

It struck me that it might not be all that safe to mark a SELECT-only function STABLE vs VOLATILE (or vice versa). Consider an example:

create table t1 (id int);

create or replace function f1() returns void as $$
declare
  i int;
begin
  select count(*) into i from t1;
  raise notice '%', i;
  -- waste some time
  for i in 1..7 loop end loop;
  select count(*) into i from t1;
  raise notice '%', i;
end;
$$ language 'plpgsql';

Now in the first connection do:

select f1();

While the execution is in the loop, which takes a while, do in another connection:

insert into t1 values (1);

The function returns with the following notices:

NOTICE: 0
NOTICE: 1

Should I change the volatility type of f1() to STABLE and run the above again, I would get:

NOTICE: 1
NOTICE: 1

It looks like at least plpgsql functions use the most recent snapshot on each call to the SPI manager instead of that of the calling query, so since the default transaction isolation level in postgres is READ COMMITTED, concurrent transactions may affect the result of a pure-reader VOLATILE function. I wonder if a function in any language (including SQL and C) would behave the same way?

Another thing I've recently discovered is that SQL functions seem to be unexpectedly slow to call.
Example:

create or replace function f2sql(int) returns int as $$
  select case when $1 < 10 then 1
              when 10 <= $1 and $1 < 50 then 2
              when $1 >= 50 then 3
         end;
$$ language 'sql' immutable;

create or replace function f2plpgsql(int) returns int as $$
begin
  return case when $1 < 10 then 1
              when 10 <= $1 and $1 < 50 then 2
              when $1 >= 50 then 3
         end;
end;
$$ language 'plpgsql' immutable;

These two functions do exactly the same calculation on their input and differ only in the language used. Now I write a query involving them and wrap it into another function (so that I could use PERFORM to avoid possible overhead on fetching results to the client, to cache the plan, and to measure the time in a more precise manner):

create or replace function f3() returns void as $$
declare
  st timestamp;
begin
  st := clock_timestamp();
  perform f2sql(trunc(100*random())::int) + f2sql(trunc(100*random())::int)
        + f2sql(trunc(100*random())::int) + f2sql(trunc(100*random())::int)
        + f2sql(trunc(100*random())::int) + f2sql(trunc(100*random())::int)
        + f2sql(trunc(100*random())::int) + f2sql(trunc(100*random())::int)
        + f2sql(trunc(100*random())::int) + f2sql(trunc(100*random())::int)
  from generate_series(1, 10);
  raise notice '%', clock_timestamp() - st;
end;
$$ language 'plpgsql';

create or replace function f4() returns void as $$
declare
  st timestamp;
begin
  st := clock_timestamp();
  perform f2plpgsql(trunc(100*random())::int) + f2plpgsql(trunc(100*random())::int)
        + f2plpgsql(trunc(100*random())::int) + f2plpgsql(trunc(100*random())::int)
        + f2plpgsql(trunc(100*random())::int) + f2plpgsql(trunc(100*random())::int)
        + f2plpgsql(trunc(100*random())::int) + f2plpgsql(trunc(100*random())::int)
        + f2plpgsql(trunc(100*random())::int) + f2plpgsql(trunc(100*random())::int)
  from generate_series(1, 10);
  raise notice '%', clock_timestamp() - st;
end;
$$ language 'plpgsql';

Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec, which is a notable difference, especially considering that the SQL function is likely to be inlined. Do I miss something?
Re: [GENERAL] Couple of question on functions
> You might think it's likely, but in fact no inlining will happen because you have a volatile argument that's used more than once inside the function body. Inlining would change the results.

Yeah, right, apparently I overestimated the chances of inlining; thanks for the hint, Tom. In fact, in my project the performance problem was caused by the fact that the analogue of f2sql() was declared SECURITY DEFINER and thus could not be inlined :-/. Uhm, does anyone else think inlining tips are worth mentioning in the docs, or is it just me?
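A sketch of the inlining condition Tom describes: a simple SQL function is only inlined when substituting its body cannot change the results, so a volatile argument must not end up evaluated more than once. The function below is illustrative, not from the thread.

```sql
-- The parameter is used exactly once in the body, so a call like
-- half((100 * random())::int) can be inlined into the caller.
create or replace function half(int) returns int as $$
  select $1 / 2;
$$ language sql immutable;

-- By contrast, in f2sql above $1 appears in several CASE branches, so
-- inlining a volatile argument such as random() would evaluate it more
-- than once and change the results; hence no inlining happens there.
select half((100 * random())::int);
```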
[GENERAL] Regular expression
Hello, Case-insensitive pattern matching gives strange results for non-ASCII characters (such as UTF-8 encoded Cyrillic letters):

test=# select 'б' ~* 'Б';
 ?column?
----------
 f
(1 row)

('б' and 'Б' are the lower and upper case variants of Cyrillic 'B'.) At the same time:

test=# select 'б' ilike 'Б';
 ?column?
----------
 t
(1 row)

(PG 8.3 on Linux, UTF-8 locale.) Also, what could be the reason that Cyrillic letters are not treated by the regexp engine as part of the [:alpha:], [:alnum:], \w etc. classes? Or were they never meant to be?
[GENERAL] Column order
Hello, It is often convenient to have the columns of a table in a certain order (as shown by psql or most GUI database explorers; it also affects the behavior of INSERTs without an explicit column list): the most significant columns first, semantically close columns grouped together, etc., while the columns might have been added to the table in a different order during development. So, the question is: is it acceptable to play with pg_attribute's attnum and set it to the needed value, or is recreating the table the only way? On a related note: must the columns necessarily be numbered from 1 up with a step of 1? Thanks, Viatcheslav
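A common workaround that avoids touching pg_attribute at all is to keep the base table as-is and expose the desired column order through a view (table and column names here are hypothetical):

```sql
-- base table, columns in whatever order development produced
create table accounts (created_at timestamp, balance numeric, id int);

-- the presentation order lives in the view; psql, GUI explorers and
-- SELECT * against the view all see id first
create view accounts_v as
  select id, balance, created_at from accounts;
```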
[GENERAL] Subplan and index usage
Consider the following case, which is an almost exact snapshot of part of our schema:

Table cities
    Column    |          Type          | Modifiers
--------------+------------------------+-----------
 ficity_id    | integer                | not null
 ficountry_id | integer                |
 firegion_id  | integer                |
 fsname       | character varying(100) |
 fsname_ru    | character varying(200) |
Indexes:
    pk_geocities PRIMARY KEY, btree (ficity_id)
    idx_cities_name btree (lower(fsname::text) varchar_pattern_ops)
    idx_cities_name_ru btree (lower(fsname_ru::text) varchar_pattern_ops)
    idx_geocities_country_id btree (ficountry_id)
    idx_geocities_region_id btree (firegion_id)
Foreign-key constraints:
    fk_geocities_country_id FOREIGN KEY (ficountry_id) REFERENCES countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
    fk_geocities_region_id FOREIGN KEY (firegion_id) REFERENCES regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE

Table cities_name_words
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 ficity_id | integer                | not null
 fsword    | character varying(200) | not null
Indexes:
    idx_cities_name_words_city_id btree (ficity_id)
    idx_cities_name_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
    fk_cities_name_words_city_id FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE

Table cities_name_ru_words
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 ficity_id | integer                | not null
 fsword    | character varying(200) | not null
Indexes:
    idx_cities_name_ru_words_city_id btree (ficity_id)
    idx_cities_name_ru_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
    fk_cities_name_ru_words_city_id FOREIGN KEY (ficity_id) REFERENCES cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE

This is part of a geo location database. The purpose of cities_name_words and cities_name_ru_words is to facilitate indexing on separate words in a city name: they contain the words of fsname or fsname_ru, respectively, of the corresponding record in cities if it has more than one word.
Cities has about 19 records, cities_name_words about 8 and cities_name_ru_words about 5000. Now the query for a city by name looks like this:

select *
from cities
where ( ficity_id in ( select ficity_id from cities_name_words
                       where fsword like 'novgorod%'
                       union
                       select ficity_id from cities_name_ru_words
                       where fsword like 'novgorod%' )
        or lower(fsname) like 'novgorod%'
        or lower(fsname_ru) like 'novgorod%' )

                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on cities  (cost=16.63..5949.26 rows=95014 width=60)
   Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
   SubPlan
     ->  Unique  (cost=16.61..16.62 rows=2 width=4)
           ->  Sort  (cost=16.61..16.62 rows=2 width=4)
                 Sort Key: cities_name_words.ficity_id
                 ->  Append  (cost=0.00..16.60 rows=2 width=4)
                       ->  Index Scan using idx_cities_name_words_word on cities_name_words  (cost=0.00..8.31 rows=1 width=4)
                             Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                             Filter: ((fsword)::text ~~ 'novgorod%'::text)
                       ->  Index Scan using idx_cities_name_ru_words_word on cities_name_ru_words  (cost=0.00..8.27 rows=1 width=4)
                             Index Cond: (((fsword)::text ~>=~ 'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
                             Filter: ((fsword)::text ~~ 'novgorod%'::text)

Notice how it uses the proper indexes in the subplan yet goes for a sequential scan on the main table.
If the where-conditions are applied separately, it uses the indexes as expected:

select *
from cities
where ( lower(fsname) like 'novgorod%'
        or lower(fsname_ru) like 'novgorod%' )

                              QUERY PLAN
----------------------------------------------------------------------
 Bitmap Heap Scan on cities  (cost=8.57..12.59 rows=1 width=60)
   Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
   Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
   ->  BitmapOr  (cost=8.57..8.57 rows=1 width=0)
         ->  Bitmap Index Scan on idx_cities_name  (cost=0.00..4.29 rows=1 width=0)
               Index Cond: ((lower((fsname)::text) ~>=~ 'novgorod'::text) AND (lower((fsname)::text) ~<~ 'novgoroe'::text))
         ->  Bitmap Index Scan on idx_cities_name_ru  (cost=0.00..4.28 rows=1 width=0)
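One way to sidestep the OR-of-a-subplan pattern shown above is to split the condition into branches and combine them with UNION, so each branch can pick its own index; UNION also removes duplicate cities matched by more than one branch. A sketch against the tables from this message (not tested against this schema):

```sql
select c.* from cities c
where lower(c.fsname) like 'novgorod%'
union
select c.* from cities c
where lower(c.fsname_ru) like 'novgorod%'
union
select c.*
from cities c
join (select ficity_id from cities_name_words    where fsword like 'novgorod%'
      union
      select ficity_id from cities_name_ru_words where fsword like 'novgorod%') w
  on w.ficity_id = c.ficity_id;
```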
Re: [GENERAL] request help forming query
Given that (type_id, group_id) is the PK of widgets it is possible to avoid self-join: select group_id from widgets where type_id = A or type_id = B group by group_id having count(1) = 2; ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
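A minimal sketch of the suggested query, with sample data standing in for the question's A and B (here type_id 1 and 2); the primary key on (type_id, group_id) is what makes count(*) a reliable membership count, since no (type, group) pair can repeat:

```sql
create table widgets (type_id int, group_id int,
                      primary key (type_id, group_id));
insert into widgets values (1, 10), (2, 10), (1, 20);

-- groups that contain both type 1 and type 2
select group_id
from widgets
where type_id = 1 or type_id = 2
group by group_id
having count(*) = 2;
-- only group 10 qualifies
```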
Re: [GENERAL] How to return a large String with C
> Yes, I know, but the backend does not allow for a bigger buffer. Trying to use an 80K (char[81920]) buffer did not work and returns:
> INFO: string-size : 48015
> INFO: +++ server closed the connection unexpectedly. This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded.
> The surprising thing is that the data can be displayed using elog but not returned with a string pointer.

You cannot just return a pointer to a stack (function-local) buffer: it gets freed on return from the function. You must explicitly palloc the required memory chunk.
Re: [GENERAL] problem with ORDER BY
I am guessing that collation rules are not applied to bytea columns, so one can compare text according to the C locale anyway by populating a bytea column, or just by doing something like:

select * from url order by byteain(textout(url));
Re: [GENERAL] aggregate hash function
Most implementations of md5 internally consist of three functions: md5_init, which initializes the internal context; md5_update, which accepts portions of data and processes them; and md5_final, which finalizes the hash and releases the context. These map roughly onto an aggregate's internal functions (SFUNC and FINALFUNC; md5_init would probably be called on the first actual input). Since performance is important for you, the functions should be written in a low-level language such as C; to me it doesn't look difficult to take some C md5 module and adapt it to be an aggregate... though it's not like I would do this easily myself :)
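The wiring described above would look roughly like this; md5_agg_update and md5_agg_final are hypothetical C-language functions (they do not exist in stock PostgreSQL) whose internal state carries the md5 context:

```sql
-- Skeleton only: assumes md5_agg_update / md5_agg_final have already been
-- created as C functions over an internal-type state holding the context.
create aggregate md5_agg (bytea) (
  sfunc     = md5_agg_update,  -- md5_init folded into the first call
  stype     = internal,
  finalfunc = md5_agg_final    -- md5_final: returns the finished digest
);

-- intended usage, streaming rows through the hash in one pass:
-- select md5_agg(payload) from big_table;
```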
[GENERAL] Prepared statement's plan
Hello, I would appreciate it if someone explained to me how exactly prepared parametrized statements are planned, i.e. what kind of assumptions the planner makes about param values, selectivity, expected row count etc. that affect in particular whether indexes will be used or not. For instance, consider the following case:

create table t1 (id serial, val int);
insert into t1 (val) select trunc(10*random()) from generate_series(1, 1000000);
create index idx_t1 on t1(val);
analyze t1;

-- 1.
prepare stmt (int) as select * from t1 where val > $1;
explain execute stmt(10);
QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=32 width=8)
  Filter: (val > $1)

-- 2.
prepare stmt (int, int) as select * from t1 where val > $1 and val < $2;
explain execute stmt(2, 3);
QUERY PLAN
Bitmap Heap Scan on t1  (cost=151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

-- 3.
prepare stmt (int) as select * from t1 where val = $1 or $1 is null;
explain execute stmt(2);
QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's the weirdest behavior: where did the 5013-row assumption come from? Why use a seq scan then? I should mention that the planner refuses to use anything but a seq scan here even if I explicitly disable it with set enable_seqscan to off. In general, I wonder if one could get somewhat predictable planner behavior in such cases, since we have a lot of code written in plpgsql and the patterns above are pretty common there. Thanks, Viatcheslav
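The semantics of the `val = $1 or $1 is null` pattern (though not PostgreSQL's planning of it) can be sketched with Python's sqlite3, using a named parameter in place of $1 and an invented miniature table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t1 (val) VALUES (?)",
                 [(v % 10,) for v in range(100)])  # vals 0..9, 10 each

query = "SELECT count(*) FROM t1 WHERE val = :v OR :v IS NULL"

# With a concrete value the predicate filters normally...
(matching,) = conn.execute(query, {"v": 2}).fetchone()
print(matching)    # 10

# ...while a NULL parameter makes the OR arm constantly true, so every
# row qualifies. A planner preparing one generic plan cannot know at
# prepare time which of these two very different cases it will face.
(everything,) = conn.execute(query, {"v": None}).fetchone()
print(everything)  # 100
```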
Re: [GENERAL] Prepared statement's planning
On Jan 15, 2008 7:58 PM, Tom Lane [EMAIL PROTECTED] wrote:

> If that really is the behavior you want, and not a typo

It is; most of the parameters passed to a plpgsql function are in fact parts of the filter, and if a certain filter item is null it is considered unknown and we don't want it to affect the result in that case. The query pattern itself is a shortcut to cover both cases in one expression, which works fine when applied to secondary fields but delivers trouble when used on key indexed fields that might influence planning decisions. Thanks for the union idea; other ways around it I could think of are dynamic queries or a bunch of plpgsql 'if-elsif's.

> It can see that it's got a range constraint on 'val', but not exactly how wide the range is, so the selectivity estimate is DEFAULT_RANGE_INEQ_SEL which is hardwired at 0.005.

0.005 * 1,000,000 = 5000. Probably this (parametrized query planning) is something worth mentioning in the docs one day.

> As for the rowcount estimate, I think it's using DEFAULT_UNK_SEL (which also happens to be 0.005) as the selectivity of an IS NULL test with a non-Var target. That's a bit useless in this case, since for any one call of the query it's either going to be constant true or constant false, but there's not any obvious better way to do it.

Hm.. it could build union'ed plans for every possible value of the unknown boolean expression, similar to your example's, though this could make plans grow huge.
Re: [GENERAL] Understanding Aliases
> SELECT-list output aliases name the *output* columns of the SELECT. Those output columns are not available to the SELECT's computation clauses

Then it's unclear how the first query could work.
Re: [GENERAL] Pg_catalog reference
It is not possible to attach triggers to system tables; hence it is impossible to reference a system table in an FK constraint. Obviously, users that don't exist in the database cannot modify anything; you can use the session_user and current_user functions for logging purposes, see http://www.postgresql.org/docs/8.2/interactive/functions-info.html
Re: [GENERAL] storage size of bit data type..
> It needs to store the number of bits present as well

Couldn't that be reduced to one byte saying how many bits of the last byte are significant?

> Only in the sense that numeric also has to store some metadata as well, like the weight and display precision.

Is it really necessary to store the display precision when it can be taken from the table column definition?
Re: [GENERAL] Dynamic expressions set in order by clause
You could write a set of expressions that yield the proper order-by field in one case and, say, null in another. Say we want to order by columns col1, col2 desc if the first function argument is '1' and by col3 desc, col1, col2 if it is '2'; this can be achieved as:

select col1, col2, col3
from mytable
where ...
order by
  case when $1 = '1' then col1 end,
  case when $1 = '1' then col2 else col3 end desc,
  case when $1 = '2' then col1 end,
  case when $1 = '2' then col2 end;

This works as follows when $1 = '1':

select col1, col2, col3 from mytable where ... order by col1, col2 desc, null, null;

and when $1 = '2':

select col1, col2, col3 from mytable where ... order by null, col3 desc, col1, col2;
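The same CASE-in-ORDER-BY trick can be tried end to end with Python's sqlite3, a toy table (invented for illustration), and a named `:mode` parameter standing in for $1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)",
                 [(1, 9, 5), (2, 8, 7), (1, 7, 6)])

# Each CASE arm yields a real sort key in one mode and NULL in the
# other, so a single ORDER BY list serves both orderings.
query = """
    SELECT col1, col2, col3 FROM mytable
    ORDER BY
      CASE WHEN :mode = '1' THEN col1 END,
      CASE WHEN :mode = '1' THEN col2 ELSE col3 END DESC,
      CASE WHEN :mode = '2' THEN col1 END,
      CASE WHEN :mode = '2' THEN col2 END
"""

# mode '1': effectively ORDER BY col1, col2 DESC
by_col1_col2desc = conn.execute(query, {"mode": "1"}).fetchall()

# mode '2': effectively ORDER BY col3 DESC, col1, col2
by_col3desc_col1_col2 = conn.execute(query, {"mode": "2"}).fetchall()

print(by_col1_col2desc)
print(by_col3desc_col1_col2)
```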
Re: [GENERAL] strange infinite loop in plpgsql
Isn't EXTRACT(dow FROM day) IN (0,6) always true, thus making the select return a row every time?

On 11/10/07, rihad [EMAIL PROTECTED] wrote: I've been reading the online docs, but... code like this somehow ends up in an infinite loop:

CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$
DECLARE
    timeout int;
    day date;
BEGIN
    day := current_date + 1;
    LOOP
        SELECT date+1 INTO day FROM days
         WHERE date=day OR EXTRACT(dow FROM day) IN (0,6);
        EXIT WHEN NOT FOUND;
        timeout := timeout + 86400;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

It's Saturday at our place, and the days table has only one record, for tomorrow's date. I hope it's been a very very long day for me :-) Thank you for your help.
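What the function appears to be aiming at (skipping weekends and listed dates while accumulating a per-day timeout) terminates cleanly when written with an explicit loop condition. A sketch in Python, with an invented skip-days set; note Python's weekday() convention (Sat=5, Sun=6), unlike PostgreSQL's dow (Sun=0, Sat=6):

```python
from datetime import date, timedelta

def seconds_until_business_day(start, skip_days):
    """Advance from the day after `start` past weekends and listed
    dates, accumulating 86400 seconds per skipped day."""
    day = start + timedelta(days=1)
    # The plpgsql version never initializes timeout, so in plpgsql
    # NULL + 86400 would stay NULL; here we start from 0 explicitly.
    timeout = 0
    # weekday(): Monday=0 .. Sunday=6, so 5 and 6 are the weekend.
    while day.weekday() in (5, 6) or day in skip_days:
        day += timedelta(days=1)
        timeout += 86400
    return timeout

# Friday 2007-11-09: Saturday and Sunday are skipped -> two days.
print(seconds_until_business_day(date(2007, 11, 9), set()))  # 172800
```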
Re: [GENERAL] Extracting Index Creation Script
If you know the index name then:

SELECT pg_get_indexdef('your_index_name'::regclass)

will do. In case you want a full list of indexes for a certain table:

SELECT c2.relname, pg_get_indexdef(i.indexrelid, 0, true), c2.reltablespace
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
WHERE c.oid = 'your_table_name'::regclass
  AND c.oid = i.indrelid
  AND i.indexrelid = c2.oid

If you have more questions of that kind, try starting psql with the -E option, which enables display of the internal queries (this is what I did).
Re: [GENERAL] (Never?) Kill Postmaster?
Postgres starts a separate process for each connection, so you can just `kill` (not `kill -9`) the unwanted process. Alternatively, you can run `select pg_cancel_backend(pid)` from another session.
Re: [GENERAL] Polymorphic functions' weird behavior
Tom Lane wrote:

> The fix is pretty easy if you need it now

Thanks for the update, Tom. Another little issue with that function: I cannot make it accept NULL as an argument. It won't normally accept NULL, reporting an appropriate error that it cannot determine the argument type; nor will it accept NULL if I make the function STRICT; nor can I handle it inside the function, since the function isn't actually called. Thus I always get the error, which is not desirable behavior. Any way around it?

regards, Viatcheslav