Re: [GENERAL] Scalar subquery

2010-09-01 Thread Vyacheslav Kalinin
Thanks, Tom

Can this be clarified in the docs? It is stated there now that a scalar
subquery is one of the kinds of expressions, and it is somewhat
counter-intuitive that an expression may sometimes not respect its own
degree of volatility.

On Wed, Sep 1, 2010 at 2:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Vyacheslav Kalinin v...@mgcp.com writes:
  I just got my hands on mysql (5.0.something) and it does not cache the
  scalar subquery result.
  So... now I'm completely puzzled whether this is a bug, a desired result,
  or just a loosely standardized thing.

 It's loosely standardized.

 AFAICS, the spec doesn't address the detailed semantics of subqueries at
 all, except in wording to this effect:

  Each subquery in the search condition is effectively
  executed for each row of T and the results used in the ap-
  plication of the search condition to the given row of T.
  If any executed subquery contains an outer reference to a
  column of T, the reference is to the value of that column in
  the given row of T.

 There is wording like this for subqueries in WHERE and HAVING, but I
 haven't found anything at all that mentions the behavior for subqueries
 in the SELECT targetlist.  In any case, the fact that they said
 "effectively executed" and not simply "executed" seems to be meant to
 leave implementors a lot of wiggle room.

 In particular, there isn't any wording that I can find suggesting
 that the presence of volatile (or in the spec's classification,
 nondeterministic) functions ought to affect the behavior.

 PG's interpretation is that if there is no outer reference in a
 subquery, it's okay to implement it as an initplan, meaning it gets
 evaluated at most once per call of the containing query.  We don't
 pay attention to whether there are volatile functions in there.

regards, tom lane



Re: [GENERAL] Scalar subquery

2010-08-31 Thread Vyacheslav Kalinin
I just got my hands on mysql (5.0.something) and it does not cache the
scalar subquery result.
So... now I'm completely puzzled whether this is a bug, a desired result or
just a loosely standardized thing.
Help anyone?

On Fri, Aug 27, 2010 at 5:41 PM, Vyacheslav Kalinin v...@mgcp.com wrote:

 Hi,

 Apparently a scalar subquery, when used as part of a SELECT statement and
 when it does not depend on outer query columns, is executed only once per
 statement, e.g.:

 postgres=# select i, (select random()) rand from generate_series(1, 3) i;
  i |   rand
 ---+---
  1 | 0.992319826036692
  2 | 0.992319826036692
  3 | 0.992319826036692

 (Though the term "depend" is subtle, compare these:

 postgres=# select i, (select random() + case when false then i else 0 end )
 rand from generate_series(1, 3) i;
  i |   rand
 ---+---
  1 | 0.806265413761139
  2 | 0.806265413761139
  3 | 0.806265413761139
 (3 rows)


 postgres=# select i, (select random() where i=i ) rand from
 generate_series(1, 3) i;
  i |   rand
 ---+---
  1 | 0.426443862728775
  2 | 0.133071997668594
  3 | 0.751982506364584
 (3 rows)


 postgres=# select i, (select random() where i=i or i is null ) rand from
 generate_series(1, 3) i;
  i |   rand
 ---+---
  1 | 0.320982406847179
  2 | 0.996762252878398
  3 | 0.076554249972105
 (3 rows)

 Looks like the dependence is gone when PG is smart enough to simplify
 the boolean expression.)

 Anyway, as some older PG versions and Oracle behave similarly, I suppose
 this result is expected and desired (correct?), but unfortunately it is
 not well documented (did I miss it being mentioned?).
 Can anyone shed some light on this and/or perhaps update the docs?

 P.S.
 I got bitten by a statement like this:
   select (select nextval('someseq') * a + b from somefunc()), col1, 
 with a and b being OUT parameters of somefunc().






[GENERAL] Scalar subquery

2010-08-27 Thread Vyacheslav Kalinin
Hi,

Apparently a scalar subquery, when used as part of a SELECT statement and
when it does not depend on outer query columns, is executed only once per
statement, e.g.:

postgres=# select i, (select random()) rand from generate_series(1, 3) i;
 i |   rand
---+---
 1 | 0.992319826036692
 2 | 0.992319826036692
 3 | 0.992319826036692

(Though the term "depend" is subtle, compare these:

postgres=# select i, (select random() + case when false then i else 0 end )
rand from generate_series(1, 3) i;
 i |   rand
---+---
 1 | 0.806265413761139
 2 | 0.806265413761139
 3 | 0.806265413761139
(3 rows)


postgres=# select i, (select random() where i=i ) rand from
generate_series(1, 3) i;
 i |   rand
---+---
 1 | 0.426443862728775
 2 | 0.133071997668594
 3 | 0.751982506364584
(3 rows)


postgres=# select i, (select random() where i=i or i is null ) rand from
generate_series(1, 3) i;
 i |   rand
---+---
 1 | 0.320982406847179
 2 | 0.996762252878398
 3 | 0.076554249972105
(3 rows)

Looks like the dependence is gone when PG is smart enough to simplify the
boolean expression.)

Anyway, as some older PG versions and Oracle behave similarly, I suppose
this result is expected and desired (correct?), but unfortunately it is
not well documented (did I miss it being mentioned?).
Can anyone shed some light on this and/or perhaps update the docs?

P.S.
I got bitten by a statement like this:
  select (select nextval('someseq') * a + b from somefunc()), col1, 
with a and b being OUT parameters of somefunc().


Re: [GENERAL] Incorrect FTS query results with GIN index

2010-01-18 Thread Vyacheslav Kalinin
Great, thank you!
I assume this one goes into 8.4.3, right?

2010/1/18 Teodor Sigaev teo...@sigaev.ru

 Basically, I started testing prefix matching in FTS and got into
 troubles. Self-contained example follows:


 Thank you, fixed. The reason was an incorrect optimization of the GIN scan:
 GIN reuses the scan result for equal keys, but the key comparison didn't
 take the difference in scan strategies into account.



 --
 Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW:
 http://www.sigaev.ru/



[GENERAL] Incorrect FTS query results with GIN index

2010-01-15 Thread Vyacheslav Kalinin
Hello,

Basically, I started testing prefix matching in FTS and got into troubles.
Self-contained example follows:


postgres=# select version();
  version

 PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-44), 32-bit


postgres=# create table test (id int, data text, tsvd tsvector);
postgres=# insert into test (id, data) values (1, 'hot stuff is here'), (2,
'light is hotter than dark'), (3, 'nothing is that hottie');
postgres=# update test set tsvd = to_tsvector('english', data);
postgres=# select * from test;
 id |   data| tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie| 'hotti':4 'noth':1
(3 rows)



Now let's play with queries:

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot');
 id |   data|   tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
(1 row)

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:*');
 id |   data| tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie| 'hotti':4 'noth':1
(3 rows)

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* |
hot');
 id |   data| tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie| 'hotti':4 'noth':1
(3 rows)



Looks good so far. Let's introduce an index:

postgres=# create index ix_test on test using gin(tsvd);
CREATE INDEX
postgres=# set enable_seqscan to off;
SET


First two queries result in the same row sets, but look at the third one:

postgres=# explain select * from test where tsvd @@
to_tsquery('english', 'hot:* | hot');
  QUERY PLAN
--
 Bitmap Heap Scan on test  (cost=4.26..8.28 rows=1 width=68)
   Recheck Cond: (tsvd @@ to_tsquery('english', 'hot:* | hot'::text))
   -  Bitmap Index Scan on ix_test  (cost=0.00..4.26 rows=1 width=0)
 Index Cond: (tsvd @@ to_tsquery('english', 'hot:* | hot'::text))

postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* |
hot');
 id |   data|   tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
(1 row)


WTH? Apparently the prefixed part of the query stopped working.
Interesting that the bug doesn't show up with GiST:

postgres=# drop index ix_test;
DROP INDEX
postgres=# create index ix_test on test using gist(tsvd);
CREATE INDEX
postgres=# select * from test where tsvd @@ to_tsquery('english', 'hot:* |
hot');
 id |   data| tsvd
+---+---
  1 | hot stuff is here | 'hot':1 'stuff':2
  2 | light is hotter than dark | 'dark':5 'hotter':3 'light':1
  3 | nothing is that hottie| 'hotti':4 'noth':1
(3 rows)


Re: [GENERAL] How can I calculate differences between values

2009-11-10 Thread Vyacheslav Kalinin
With 8.4's analytic capabilities you can do this:

select * from (
  select userid, data - lag(data) over (partition by userid order by data)
diff
from foo) q
  where diff is not null;
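For reference, a quick check against the sample data from the question (a
sketch; table definition and values as given there):

```sql
create table foo (userid integer, data integer, unique (userid, data));
insert into foo values (4, 100), (5, 200), (5, 210), (5, 231);

select * from (
  select userid,
         data - lag(data) over (partition by userid order by data) as diff
    from foo) q
 where diff is not null;
--  userid | diff
-- --------+------
--       5 |   10
--       5 |   21
```

lag(data) is null for the first row of each partition, so the single row
for userid 4 is filtered out by the outer WHERE, as desired.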

On Tue, Nov 10, 2009 at 5:40 PM, A B gentosa...@gmail.com wrote:

 Hello there!

 I have a table foo(userid integer, data integer) with the
 constraint unique(userid, data).

  Now I wish to select one userid and calculate the differences between
 the data values (when they are sorted).
 For example
 if the table contains:

 4, 100
 5, 200
 5, 210
 5, 231


 I want the values

 5,10
 5,21

 What should happen to the 4,100 record, you may ask; I will try to
 exclude that case by requiring each userid to have at least two data
 values.

 The question is now: is there a clever and efficient way of
 calculating these differences in data values?

 The only thing I can think of is picking a data value, doing
 select min(data) from foo where userid=5 and data > 200
 and then calculating the difference and doing what I want with it, then
 repeating this process with the last selected data value.
 I guess that will work, but I'm curious, are there other ways?

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general



Re: [GENERAL] 'a' = any( $${'a','x'} )

2009-08-08 Thread Vyacheslav Kalinin
Try

select 'a' = any( $${a,x}$$ ) ;

or

select 'a' = any( array['a','x'] ) ;
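For comparison, the results one should expect (in the original query the
array elements include the single quotes themselves, which is why the match
fails):

```sql
select 'a' = any( $${'a','x'}$$::text[] );  -- f: elements are 'a' and 'x', quotes included
select 'a' = any( $${a,x}$$::text[] );      -- t: elements are a and x
select 'a' = any( array['a','x'] );         -- t
```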


On Sat, Aug 8, 2009 at 3:08 PM, Jasen Betts ja...@xnet.co.nz wrote:

 jasen=# select 'a' = any( $${'a','x'}$$ ) ;
  ?column?
  --
  f
  (1 row)

 I expect it to be true instead. what am I doing wrong?


 PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian
 4.3.2-1.1) 4.3.2


 jasen.




Re: [GENERAL] Postgres Pg_connect PHP

2009-06-11 Thread Vyacheslav Kalinin
On Thu, Jun 11, 2009 at 4:36 PM, Dimitri Fontaine dfonta...@hi-media.com wrote:

 Vyacheslav Kalinin v...@mgcp.com writes:

   $conn = pg_pconnect("dbname=foo");

 Please reconsider and use plain pg_connect().
 --
 dim

Uh, I just copied/pasted that part from somewhere in the PHP manual;
personally I tend to use plain pg_connect and pgbouncer for pooling when
needed. For those curious why pconnect is not the best option for
connection pooling: discussion on this topic rises on this list from time
to time; check this out, for instance:
http://archives.postgresql.org/pgsql-general/2007-08/msg00660.php


Re: [GENERAL] Postgres Pg_connect PHP

2009-06-09 Thread Vyacheslav Kalinin
 however the COPY
 command requires superuser and I really don't want to open up the DB
 from a remote system with Superuser access.

COPY FROM STDIN does not need superuser privileges.
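For example, psql's \copy meta-command runs COPY FROM STDIN under the hood,
so it needs only INSERT privilege on the target table (a sketch with
hypothetical names):

```sql
-- COPY tab FROM '/path/file' reads the server's filesystem and needs superuser;
-- COPY ... FROM STDIN streams the data over the client connection and does not.
\copy sometable from 'file.dat'
```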


Re: [GENERAL] Postgres Pg_connect PHP

2009-06-09 Thread Vyacheslav Kalinin
On Tue, Jun 9, 2009 at 10:35 PM, Tory M Blue tmb...@gmail.com wrote:


 Thanks guys, the problem with copy from or to is that it creates an
 array and thus puts a load of stuff in memory, it's possible the file
 will get huge and I can't take that memory hit.

 I'll look again and see if I missed something

 Thanks
 Tory


It's not hard to do some streaming with copy (untested):

  $conn = pg_pconnect("dbname=foo");
  $fd = fopen('file.dat', 'r');
  while (!feof($fd)) {
    pg_put_line($conn, fgets($fd));
  }
  fclose($fd);
  pg_put_line($conn, "\\.\n");
  pg_end_copy($conn);

http://ru.php.net/manual/en/function.pg-put-line.php


Re: [GENERAL] Postgres Pg_connect PHP

2009-06-09 Thread Vyacheslav Kalinin
Forgot about COPY command in my previous reply:

  $conn = pg_pconnect("dbname=foo");
  $fd = fopen('file.dat', 'r');
  pg_query($conn, "copy bar from stdin");
  while (!feof($fd)) {
    pg_put_line($conn, fgets($fd));
  }
  fclose($fd);
  pg_put_line($conn, "\\.\n");
  pg_end_copy($conn);


Re: [GENERAL] Duplicate key issue in a transaction block

2009-06-08 Thread Vyacheslav Kalinin
On Mon, Jun 8, 2009 at 8:33 PM, Bill Moran wmo...@potentialtech.com wrote:


 Perhaps you want to take an exclusive lock on the table?  The operation
 you describe seems to suggest that you'd want to guarantee exclusive
 write access to the table.


An exclusive table lock is a bit excessive IMO. Locking the particular
group should be good, though it is not quite straightforward to achieve.
I'd use advisory locks, or would lock a row in a parent group table (if
such a table exists; if not, it might be worth making one) referenced by
the rows in question.
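A sketch of the advisory-lock variant (the group id 42 and the surrounding
statements are hypothetical):

```sql
begin;
-- serialize writers of the same group without locking the whole table
select pg_advisory_lock(42);    -- 42 = the group being modified
-- ... check for duplicates within the group, then insert ...
select pg_advisory_unlock(42);  -- advisory locks are session-level in 8.x
commit;
```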


Re: [GENERAL] Read a CLOB data from an Oracle table and INSERT it into a BYTEA column in Postgres using jdbc?

2009-03-30 Thread Vyacheslav Kalinin
You are fetching an Oracle LOB locator. I am not a guru of OCI, let alone
JDBC, but AFAIK you should create a LOB descriptor, bind it to the locator,
and load the LOB data, which you can then insert into Postgres.


Re: [GENERAL] pg_toast_temp_xx AND pg_temp_xx SCHEMAS

2009-03-10 Thread Vyacheslav Kalinin
 They're just there to hold temporary tables (so that the names of such
 tables don't conflict with any non-temporary tables).  The reason they
 seem to accumulate is we only create one when needed, and there's one
 for each concurrently executing backend if it creates any temp tables.

Hm, do they get garbage-collected in any way? I have several such schemas
that won't disappear even though only one session is connected.
On a side note: I can drop temporary tables of other sessions via DROP
pg_temp_xx.table (although I can't seem to affect table data seen from the
other session). Are there any security issues/gotchas here?


[GENERAL] FTS on patterns

2008-08-05 Thread Vyacheslav Kalinin
Hello,

In our application we need pattern-based text search, that is, if a user
supplies 'foo bar' he receives only documents containing words that begin
with 'foo' or 'bar', including those words themselves (something similar to
the MySQL '+foo* +bar*' text search query). Currently we use a separate
table for the unique words in each document and do a LIKE 'foo%' search on
that table, combining the results afterwards, and I am not really happy
with that solution because of its speed and the overcomplication of the
queries involved. Does anyone know of a better approach? Could this be
achieved with Postgres full text search?


Re: [GENERAL] FTS on patterns

2008-08-05 Thread Vyacheslav Kalinin
On Tue, Aug 5, 2008 at 6:53 PM, Tom Lane [EMAIL PROTECTED] wrote:


 There's support for prefix text search in CVS HEAD, but not in any
 released version ...

regards, tom lane


Oh, does that mean it won't be released until 8.4?


Re: [GENERAL] Inaccurate row count estimation

2008-07-15 Thread Vyacheslav Kalinin
Thanks for the reply, Tom.

After tracing through this I see that the problem is that we don't have
 statistics for inheritance trees, and so you're getting a default
 estimate for the selectivity of the join condition.


I might be wrong, but I suspect that inheritance is not the only reason
here. If I change the table definitions to:

create table pinfo_p00 (
  pid integer,
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check(pid > 0 and pid < 10)
);

create table pinfo_p01 (
  pid integer,
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check(pid >= 10 and pid < 20)
);

and create a view pinfo, or just do a query with subselect:

explain analyze
select *
  from contacts c
  left join (
  select * from pinfo_p00
  union all
  select * from pinfo_p01
  ) pi on (pi.pid = c.cpid)
 where c.pid = 200 ;

the row-count estimate doesn't seem to be any different:

QUERY PLAN
Nested Loop Left Join  (cost=4.56..514.25 rows=3896 width=16) (actual
time=0.125..3.976 rows=40 loops=1)
  Join Filter: (pinfo_p00.pid = c.cpid)
  -  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12)
(actual time=0.069..0.421 rows=40 loops=1)
Recheck Cond: (pid = 200)
-  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39
width=0) (actual time=0.042..0.042 rows=40 loops=1)
  Index Cond: (pid = 200)
  -  Append  (cost=0.00..10.59 rows=2 width=4) (actual time=0.033..0.061
rows=1 loops=40)
-  Index Scan using pk_pinfo_p00 on pinfo_p00  (cost=0.00..5.29
rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
  Index Cond: (pinfo_p00.pid = c.cpid)
-  Index Scan using pk_pinfo_p01 on pinfo_p01  (cost=0.00..5.29
rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
  Index Cond: (pinfo_p01.pid = c.cpid)
Total runtime: 4.341 ms

It scares me a bit, as it seems that an innocent-looking combination of
UNIONs and JOINs can destroy the subsequent plan completely.


Re: [GENERAL] Inaccurate row count estimation

2008-07-10 Thread Vyacheslav Kalinin
This is 8.3.0.

Here is the reproduce code:

create table contacts (
  cid integer primary key,
  pid integer not null,
  cpid integer
);

create index ix_contacts_pid on contacts (pid);
create index ix_contacts_cpid on contacts (cpid);

create table pinfo (
  pid integer,
  constraint pk_pinfo primary key (pid)
);

create table pinfo_p00 (
  constraint pk_pinfo_p00 primary key (pid),
  constraint cc_pinfo_p00_pid check(pid > 0 and pid < 10)
) inherits (pinfo);

create table pinfo_p01 (
  constraint pk_pinfo_p01 primary key (pid),
  constraint cc_pinfo_p01_pid check(pid >= 10 and pid < 20)
) inherits (pinfo);

insert into pinfo_p00 (pid)
select i from generate_series(10, 1) i;

insert into pinfo_p01 (pid)
select i from generate_series(100010, 11) i;

create sequence contacts_seq start 100;

insert into contacts (cid, pid, cpid)
select  nextval('contacts_seq'), p, cp.pid
  from generate_series(100, 1000) p,
  (
   (select pid from pinfo_p00 order by random() limit 20)
   union all
   (select pid from pinfo_p01 order by random() limit 20)
  ) cp

analyze contacts;
analyze pinfo;
analyze pinfo_p00;
analyze pinfo_p01;

explain analyze
select *
  from contacts c
  left join pinfo pi on (pi.pid = c.cpid)
 where c.pid = 200 ;

QUERY PLAN
Nested Loop Left Join  (cost=4.56..569.17 rows=4364 width=16) (actual
time=0.150..4.595 rows=40 loops=1)
  Join Filter: (pi.pid = c.cpid)
  -  Bitmap Heap Scan on contacts c  (cost=4.56..100.34 rows=39 width=12)
(actual time=0.067..0.441 rows=40 loops=1)
Recheck Cond: (pid = 200)
-  Bitmap Index Scan on ix_contacts_pid  (cost=0.00..4.55 rows=39
width=0) (actual time=0.041..0.041 rows=40 loops=1)
  Index Cond: (pid = 200)
  -  Append  (cost=0.00..11.98 rows=3 width=4) (actual time=0.048..0.076
rows=1 loops=40)
-  Index Scan using pk_pinfo on pinfo pi  (cost=0.00..1.40 rows=1
width=4) (actual time=0.008..0.008 rows=0 loops=40)
  Index Cond: (pi.pid = c.cpid)
-  Index Scan using pk_pinfo_p00 on pinfo_p00 pi  (cost=0.00..5.29
rows=1 width=4) (actual time=0.012..0.015 rows=0 loops=40)
  Index Cond: (pi.pid = c.cpid)
-  Index Scan using pk_pinfo_p01 on pinfo_p01 pi  (cost=0.00..5.29
rows=1 width=4) (actual time=0.011..0.015 rows=0 loops=40)
  Index Cond: (pi.pid = c.cpid)
Total runtime: 4.941 ms


[GENERAL] Inaccurate row count estimation

2008-07-09 Thread Vyacheslav Kalinin
Hello,

Consider two tables:

contacts:
  cid integer primary key,
  pid integer not null,
  cpid integer
  ...

pinfo:
  pid integer,
  ...

pinfo is a parent table with two partitions, pinfo_p00 and pinfo_p01; all
three have primary keys on pid, and the partitions have proper constraints
that guarantee pid uniqueness across them.

Now here's the part of the query:

select *
  from contacts c
  left join pinfo pi on (pi.pid = c.cpid)

QUERY PLAN
-  Nested Loop Left Join  (cost=0.00..444.90 rows=1515
width=408) (actual time=0.108..5.561 rows=44 loops=1)
  Join Filter: (pi.pid = c.cpid)
  -  Index Scan using ix_contacts_pid on contacts
c  (cost=0.00..14.84 rows=50 width=26) (actual time=0.038..0.425 rows=44
loops=1)
Index Cond: (pid = 167)
  -  Append  (cost=0.00..8.56 rows=3 width=386)
(actual time=0.067..0.090 rows=1 loops=44)
-  Index Scan using pk_pinfo on pinfo pi
(cost=0.00..1.15 rows=1 width=386) (actual time=0.008..0.008 rows=0
loops=44)
  Index Cond: (pi.pid = c.cpid)
-  Index Scan using pk_pinfo_p00 on
pinfo_p00 pi  (cost=0.00..3.23 rows=1 width=46) (actual time=0.011..0.014
rows=0 loops=44)
  Index Cond: (pi.pid = c.cpid)
-  Index Scan using pk_pinfo_p01 on
pinfo_p01 pi  (cost=0.00..4.19 rows=1 width=46) (actual time=0.011..0.014
rows=0 loops=44)
  Index Cond: (pi.pid = c.cpid)

How come the outermost join expects 1515 rows, given the row estimates of
the inner and outer parts of the nested loop?


Re: [GENERAL] syntax error with execute

2008-05-30 Thread Vyacheslav Kalinin
  EXECUTE 'INSERT INTO '||tablename||' ('||fields||') VALUES
  ('||vals||') RETURNING currval('''||seqname||''') INTO newid'


You probably meant to do:

EXECUTE 'INSERT INTO '||tablename||' ('||fields||') VALUES
 ('||vals||') RETURNING currval('''||seqname||''')' INTO newid

Note where the last quote goes.
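On a related note, when splicing identifiers and literals into EXECUTE it
is safer to quote them; a sketch using the names from the question:

```sql
EXECUTE 'INSERT INTO ' || quote_ident(tablename)
     || ' (' || fields || ') VALUES (' || vals || ')'
     || ' RETURNING currval(' || quote_literal(seqname) || ')'
   INTO newid;
```

quote_ident protects against a table name that needs double quotes, and
quote_literal produces the properly escaped 'someseq' literal.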


[GENERAL] Couple of question on functions

2008-05-13 Thread Vyacheslav Kalinin
Hello,

Reading the manual recently I came across this
(http://www.postgresql.org/docs/8.3/interactive/xfunc-volatility.html):

 Because of the snapshotting behavior of MVCC (see Chapter 13,
 http://www.postgresql.org/docs/8.3/interactive/mvcc.html) a function
 containing only SELECT commands can safely be marked STABLE, even if it
 selects from tables that might be undergoing modifications by concurrent
 queries. PostgreSQL will execute a STABLE function using the snapshot
 established for the calling query, and so it will see a fixed view of the
 database throughout that query. Also note that the current_timestamp
 family of functions qualify as stable, since their values do not change
 within a transaction.


It struck me that it might not be all that safe to mark a SELECT-only
function STABLE vs VOLATILE (or vice versa). Consider an example:

create table t1(id int);

create or replace function f1() returns void as
$$
declare
  i int;
begin
select count(*) into i from t1;
raise notice '%', i;
-- waste some time
for i in 1..7 loop
end loop;
select count(*) into i from t1;
raise notice '%', i;
end;
$$
language 'plpgsql';

Now in first connection do:
select f1();

While the execution is in the loop which takes a while do in another
connection:
insert into t1 values (1);

The function returns with the following notices:
NOTICE:  0
NOTICE:  1

Should I change the volatility type of f1() to STABLE and run the above
again I would get:
NOTICE:  1
NOTICE:  1

It looks like at least plpgsql functions use the most recent snapshot on
each call to the SPI manager instead of that of the calling query, so since
the default transaction isolation level in Postgres is READ COMMITTED,
concurrent transactions may affect the result of a pure-reader VOLATILE
function. I wonder whether functions in any language (including SQL and C)
would behave in the same way?

Another thing I've recently discovered is that SQL functions seem to be
unexpectedly slow to call. Example:

create or replace function f2sql(int) returns int as
$$
select case when $1 < 10 then 1
            when 10 <= $1 and $1 < 50 then 2
            when $1 >= 50 then 3
       end;
$$
language 'sql' immutable;

create or replace function f2plpgsql(int) returns int as
$$
begin
return case when $1 < 10 then 1
            when 10 <= $1 and $1 < 50 then 2
            when $1 >= 50 then 3
       end;
end;
$$
language 'plpgsql' immutable;

These two functions do exactly the same calculation on their input and
differ only in the language used. Now I write a query involving them and
wrap it into another function (so that I could use PERFORM to avoid
possible overhead of fetching results to the client, to cache the plan, and
to measure the time more precisely):

create or replace function f3() returns void as
$$
declare
  st timestamp;
begin
st := clock_timestamp();
perform f2sql(trunc(100*random())::int) +
   f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int) +
f2sql(trunc(100*random())::int)
   from generate_series(1, 10);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;

create or replace function f4() returns void as
$$
declare
  st timestamp;
begin
st := clock_timestamp();
perform f2plpgsql(trunc(100*random())::int) +
   f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int) +
f2plpgsql(trunc(100*random())::int)
   from generate_series(1, 10);
raise notice '%', clock_timestamp() - st;
end;
$$
language 'plpgsql' ;

Now f4() reports 4.2 sec of runtime on average while f3() reports 10.3 sec,
which is a notable difference, especially considering that the SQL function
is likely to be inlined. Am I missing something?


Re: [GENERAL] Couple of question on functions

2008-05-13 Thread Vyacheslav Kalinin
 You might think it's likely, but in fact no inlining will happen
 because you have a volatile argument that's used more than once inside
 the function body.  Inlining would change the results.

Yeah, right, apparently I overestimated the chances of inlining; thanks for
the hint, Tom. In fact, in my project the performance problem was caused by
the fact that the analogue of f2sql() was declared SECURITY DEFINER and
thus could not be inlined :-/. Uhm, does anyone else think inlining tips
are worth mentioning in the docs, or is it just me?


[GENERAL] Regular expression

2008-04-26 Thread Vyacheslav Kalinin
Hello,

Case-insensitive pattern matching gives strange results for non-ASCII
characters (such as UTF-8 encoded Cyrillic letters):
test=# select 'б' ~* 'Б' ;
?column?
--
f
(1 row)
('б' and 'Б' are the lower and upper case variants of the Cyrillic 'B')

at the same time:
test=# select 'б' ilike 'Б' ;
?column?
--
t
(1 row)

(PG 8.3 on Linux, UTF-8 locale)

Also, what could be the reason that Cyrillic letters are not treated by the
regexp engine as part of the [:alpha:], [:alnum:], \w etc. classes? Or were
they never meant to be?


[GENERAL] Column order

2008-04-23 Thread Vyacheslav Kalinin
Hello,

It is often convenient to have the columns of a table in a certain order
(as shown by psql or most GUI database explorers; it also affects the
behavior of INSERTs without an explicit column list), so that the most
significant columns come first, semantically close columns are grouped,
etc., while the columns might be added to the table in a different order
during development. So, the question is: is it acceptable to play with
pg_attribute's attnum and set it to the needed value, or is recreating the
table the only way? On a related note: must the columns necessarily be
numbered from 1 up in steps of 1?

Thanks,
Viatcheslav


[GENERAL] Subplan and index usage

2008-03-12 Thread Vyacheslav Kalinin
Consider the following case which is almost exact snapshot of part of our
scheme:

   Table cities
Column|  Type  | Modifiers | Description
--++---+-
 ficity_id| integer| not null  |
 ficountry_id | integer|   |
 firegion_id  | integer|   |
 fsname   | character varying(100) |   |
 fsname_ru| character varying(200) |   |
Indexes:
pk_geocities PRIMARY KEY, btree (ficity_id)
idx_cities_name btree (lower(fsname::text) varchar_pattern_ops)
idx_cities_name_ru btree (lower(fsname_ru::text) varchar_pattern_ops)
idx_geocities_country_id btree (ficountry_id)
idx_geocities_region_id btree (firegion_id)
Foreign-key constraints:
fk_geocities_country_id FOREIGN KEY (ficountry_id) REFERENCES
countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
fk_geocities_region_id FOREIGN KEY (firegion_id) REFERENCES
regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE


Table cities_name_words
  Column   |  Type  | Modifiers | Description
---++---+-
 ficity_id | integer| not null  |
 fsword| character varying(200) | not null  |
Indexes:
idx_cities_name_words_city_id btree (ficity_id)
idx_cities_name_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
fk_cities_name_words_city_id FOREIGN KEY (ficity_id) REFERENCES
cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE


Table cities_name_ru_words
  Column   |  Type  | Modifiers | Description
---++---+-
 ficity_id | integer| not null  |
 fsword| character varying(200) | not null  |
Indexes:
idx_cities_name_ru_words_city_id btree (ficity_id)
idx_cities_name_ru_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
fk_cities_name_ru_words_city_id FOREIGN KEY (ficity_id) REFERENCES
cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE

This is part of a geolocation database. The purpose of cities_name_words
and cities_name_ru_words is to facilitate indexing on the separate words in
a city name: they contain the words of fsname or fsname_ru, respectively,
of the corresponding record in cities if it has more than one word. cities
has about 19 records, cities_name_words about 8 and cities_name_ru_words
about 5000. Now the query for a city by name looks like this:

select *
  from cities
 where  ( ficity_id in (
   select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
   select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
  )
  or lower(fsname) like 'novgorod%'
  or lower(fsname_ru) like 'novgorod%'
 )

QUERY PLAN
Seq Scan on cities  (cost=16.63..5949.26 rows=95014 width=60)
  Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text)
OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  SubPlan
-  Unique  (cost=16.61..16.62 rows=2 width=4)
  -  Sort  (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-  Append  (cost=0.00..16.60 rows=2 width=4)
  -  Index Scan using idx_cities_name_words_word on
cities_name_words  (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~
'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
  -  Index Scan using idx_cities_name_ru_words_word on
cities_name_ru_words  (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~>=~
'novgorod'::text) AND ((fsword)::text ~<~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)

Notice how it uses the proper indexes in the subplan yet goes for a
sequential scan on the main table. If the WHERE conditions are applied
separately, it uses the indexes as expected:

select *
  from cities
 where  ( lower(fsname) like 'novgorod%'
  or lower(fsname_ru) like 'novgorod%'
 )

QUERY PLAN
Bitmap Heap Scan on cities  (cost=8.57..12.59 rows=1 width=60)
  Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR
(lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR
(lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  -  BitmapOr  (cost=8.57..8.57 rows=1 width=0)
-  Bitmap Index Scan on idx_cities_name  (cost=0.00..4.29 rows=1
width=0)
  Index Cond: ((lower((fsname)::text) ~>=~ 'novgorod'::text) AND
(lower((fsname)::text) ~<~ 'novgoroe'::text))
-  Bitmap Index Scan on idx_cities_name_ru  (cost=0.00..4.28 rows=1
width=0)
  

Re: [GENERAL] request help forming query

2008-02-24 Thread Vyacheslav Kalinin
Given that (type_id, group_id) is the PK of widgets, it is possible to
avoid a self-join:

select group_id from widgets
 where type_id = A or type_id = B
 group by group_id
having count(1) = 2;
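The same trick can be sketched outside SQL. A hypothetical Python version with
made-up data (the pair list stands in for the widgets table; since each
(type_id, group_id) pair is unique, a group matching both predicates must
contribute exactly two rows):

```python
from collections import defaultdict

# Hypothetical data: (type_id, group_id) pairs, unique because they form the PK.
widgets = [(1, 10), (2, 10), (1, 20), (3, 20), (2, 30)]

A, B = 1, 2  # the two type_ids we are looking for

# Mirror the GROUP BY / HAVING count = 2 trick.
counts = defaultdict(int)
for type_id, group_id in widgets:
    if type_id in (A, B):
        counts[group_id] += 1

matching_groups = sorted(g for g, c in counts.items() if c == 2)
print(matching_groups)  # only group 10 contains both type A and type B
```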

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [GENERAL] How to return a large String with C

2008-02-17 Thread Vyacheslav Kalinin
Yes, I know, but the backend does not allow for a bigger buffer. Trying
 to use an 80K (char[81920]) buffer did not work and returns:
 INFO:  string-size : 48015
 INFO:  +++
 server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
 The connection to the server was lost. Attempting reset: Succeeded.


 The surprising thing is that the data can be displayed using elog but
 not returned with a string pointer.
 http://archives.postgresql.org/


You cannot just return a pointer to a stack (function-local) buffer; it
gets freed on return from the function. You must explicitly palloc the
required memory chunk.


Re: [GENERAL] problem with ORDER BY

2008-02-01 Thread Vyacheslav Kalinin
I am guessing that collation rules are not applied to bytea columns, so one
can compare text according to the C locale anyway by populating a bytea column
or just doing something like

select * from url order by byteain(textout(url))


Re: [GENERAL] aggregate hash function

2008-01-30 Thread Vyacheslav Kalinin
Most implementations of md5 internally consist of 3 functions: md5_init,
which initializes an internal context; md5_update, which accepts portions of
data and processes them; and md5_final, which finalizes the hash and
releases the context. These roughly correspond to an aggregate's internal
functions (SFUNC and FINALFUNC; md5_init would probably be called on the first
actual input). Since performance is important for you, the functions should be
written in a low-level language such as C; to me it doesn't look difficult to
take some C md5 module and adapt it to be an aggregate... though it's not like
I could do this easily myself :)
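For illustration only, the init/update/final flow described above maps
naturally onto Python's hashlib (a sketch of the idea; a real PostgreSQL
aggregate would implement SFUNC and FINALFUNC in C):

```python
import hashlib

# md5_init  ~ SFUNC on the first input row (create the running context)
# md5_update ~ SFUNC on each subsequent row (fold the row into the state)
# md5_final ~ FINALFUNC (close the context and emit the digest)

chunks = [b"hello ", b"aggregate ", b"world"]

ctx = hashlib.md5()          # init
for chunk in chunks:
    ctx.update(chunk)        # update, once per "row"
digest = ctx.hexdigest()     # final

# Feeding the data incrementally gives the same result as hashing it at once.
assert digest == hashlib.md5(b"hello aggregate world").hexdigest()
print(digest)
```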


[GENERAL] Prepared statement's plan

2008-01-16 Thread Vyacheslav Kalinin
Hello,

I would appreciate it if someone explained to me how exactly prepared
parametrized statements are planned, i.e. what kind of assumptions the planner
makes about param values, selectivity, expected row count, etc. that affect in
particular whether indexes will be used or not. For instance consider the
following
case:

create table t1 ( id serial, val int);

insert into t1 (val)
select trunc(10*random())
  from generate_series(1, 1000000);

create index idx_t1 on t1(val);

analyze t1;
--

1.
prepare stmt (int) as
 select * from t1 where val < $1;

explain execute stmt(10);

---
QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=333333 width=8)
 Filter: (val < $1)

2.
prepare stmt (int, int) as
 select * from t1 where val > $1 and val < $2;

explain execute stmt(2, 3);

---
QUERY PLAN
Bitmap Heap Scan on t1  (cost=151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

3.
prepare stmt (int) as
 select * from t1 where val = $1 or $1 is null;

explain execute stmt(2);

QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's the weirdest behavior: where did the 5013-row estimate come from? And
why use a seq scan then? I should mention that the planner refuses to use
anything but a seq scan here even if I explicitly disable it with set
enable_seqscan to off.

In general, I wonder if one could get somewhat predictable planner behavior
in such cases since we have a lot of code written in plpgsql and the
patterns above are pretty common there.

Thanks,
Viatcheslav


[GENERAL] Prepared statement's planning

2008-01-15 Thread Vyacheslav Kalinin
Hello,

I would appreciate it if someone explained to me how exactly prepared
parametrized statements are planned, i.e. what kind of assumptions the planner
makes about param values, selectivity, expected row count, etc. that affect in
particular whether indexes will be used or not. For instance consider the
following
case:

create table t1 ( id serial, val int);

insert into t1 (val)
select trunc(10*random())
  from generate_series(1, 1000000);

create index idx_t1 on t1(val);

analyze t1;
--

1.
prepare stmt (int) as
 select * from t1 where val < $1;

explain execute stmt(10);

---
QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=333333 width=8)
 Filter: (val < $1)

2.
prepare stmt (int, int) as
 select * from t1 where val > $1 and val < $2;

explain execute stmt(2, 3);

---
QUERY PLAN
Bitmap Heap Scan on t1  (cost=151.74..5307.59 rows=5000 width=8)
  Recheck Cond: ((val > $1) AND (val < $2))
  ->  Bitmap Index Scan on idx_t1  (cost=0.00..150.49 rows=5000 width=0)
        Index Cond: ((val > $1) AND (val < $2))

Hmm, why does it expect 5000 rows here? What influences this expectation?

3.
prepare stmt (int) as
 select * from t1 where val = $1 or $1 is null;

explain execute stmt(2);

QUERY PLAN
Seq Scan on t1  (cost=0.00..17401.94 rows=5013 width=8)
  Filter: ((val = $1) OR ($1 IS NULL))

That's the weirdest behavior: where did the 5013-row estimate come from? And
why use a seq scan then? I should mention that the planner refuses to use
anything but a seq scan here even if I explicitly disable it with set
enable_seqscan to off.

In general, I wonder if one could get somewhat predictable planner behavior
in such cases since we have a lot of code written in plpgsql and the
patterns above are pretty common there.

Thanks,
Viatcheslav


Re: [GENERAL] Prepared statement's planning

2008-01-15 Thread Vyacheslav Kalinin
On Jan 15, 2008 7:58 PM, Tom Lane [EMAIL PROTECTED] wrote:
 If that really is the behavior you want, and not a typo
It is; most of the parameters passed to a plpgsql function are in fact parts
of the filter, and if a certain filter item is null it is considered unknown
and we don't want it to affect the result in that case. The query pattern
itself is a shortcut that covers both cases in one expression; it works fine
when applied to secondary fields but causes trouble when used on key indexed
fields that can influence planning decisions. Thanks for the union idea; the
other ways around it I could think of are dynamic queries or a bunch of
plpgsql 'if/elsif's.

 It can see that it's got a range constraint on 'val', but not exactly
 how wide the range is, so the selectivity estimate is DEFAULT_RANGE_INEQ_SEL,
 which is hardwired at 0.005.  0.005 * 1000000 = 5000.
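The arithmetic behind the planner's guess can be reproduced directly. The
constant names below come from PostgreSQL's selfuncs.h; the table size of
1,000,000 rows is what 0.005 × rows = 5000 implies:

```python
# Constants hardwired in PostgreSQL (src/include/utils/selfuncs.h)
DEFAULT_RANGE_INEQ_SEL = 0.005  # range constraint with unknown bounds
DEFAULT_UNK_SEL = 0.005         # predicate the planner cannot estimate at all

table_rows = 1_000_000

# Case 2: "val > $1 AND val < $2" is a range of unknown width, so the
# planner falls back to the default range selectivity.
range_estimate = table_rows * DEFAULT_RANGE_INEQ_SEL
print(range_estimate)  # 5000.0, matching rows=5000 in the plan
```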

Probably this (parametrized query planning) is something worth mentioning
in the docs one day.

 As for the rowcount estimate, I think it's using DEFAULT_UNK_SEL (which
 also happens to be 0.005) as the selectivity of an IS NULL test with a
 non-Var target.  That's a bit useless in this case, since for any one
 call of the query it's either going to be constant true or constant
 false, but there's not any obvious better way to do it.

Hm.. It could build union'ed plans for every possible value of the unknown
boolean expression, similar to your example, though this could make plans
grow huge.


Re: [GENERAL] Understanding Aliases

2007-12-12 Thread Vyacheslav Kalinin
 SELECT-list output aliases name the *output* columns of the SELECT.
 Those output columns are not available to the SELECT's computation
 clauses

Then it's unclear how the first query could work.


Re: [GENERAL] Pg_catalog reference

2007-12-12 Thread Vyacheslav Kalinin
It is not possible to create triggers on system tables, hence it is
impossible to reference a system table in an FK constraint. Obviously, users
that don't exist in the database cannot modify anything; you can use the
session_user and current_user functions for logging purposes, see
http://www.postgresql.org/docs/8.2/interactive/functions-info.html


Re: [GENERAL] storage size of bit data type..

2007-12-07 Thread Vyacheslav Kalinin
 It needs to store the number of bits present as well

Couldn't that be reduced to 1 byte saying how many bits are significant in
the last byte?

 Only in the sense that numeric also has to store some metadata as well,
 like the weight and display precision.

Is it really necessary to store display precision when it can be taken from
the table column definition?


Re: [GENERAL] Dynamic expressions set in order by clause

2007-11-22 Thread Vyacheslav Kalinin
You could write a set of expressions that yield the proper ORDER BY field
in one case and, say, NULL in another. Say we want to order by
col1, col2 desc if the first function argument is '1' and by col3
desc, col1, col2 if it is '2'; this can be achieved as:

select col1, col2, col3 from mytable
 where ...
 order by case when $1 = '1' then col1 end,
  case when $1 = '1' then col2 else col3 end desc,
  case when $1 = '2' then col1 end,
  case when $1 = '2' then col2 end
;

This would work as follows when $1 = '1':
select col1, col2, col3 from mytable
 where ...
 order by col1, col2 desc, null, null;

and when $1 = '2':
select col1, col2, col3 from mytable
 where ...
 order by null, col3 desc, col1, col2
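The same trick can be mimicked in application code: each inactive sort slot
collapses to a constant, just as the inactive CASE expressions collapse to
NULL. A hypothetical Python sketch with made-up rows (numeric columns only,
since DESC is emulated here by negating the value):

```python
# Hypothetical rows of (col1, col2, col3)
rows = [(2, 5, 9), (1, 7, 3), (2, 1, 3), (1, 2, 9)]

def order_key(mode):
    # Inactive slots become the constant 0 so they never influence the order,
    # mirroring the NULLs produced by the SQL CASE expressions.
    def key(r):
        col1, col2, col3 = r
        if mode == '1':          # ORDER BY col1, col2 DESC
            return (col1, -col2, 0, 0)
        else:                    # ORDER BY col3 DESC, col1, col2
            return (0, -col3, col1, col2)
    return key

ordered1 = sorted(rows, key=order_key('1'))
ordered2 = sorted(rows, key=order_key('2'))
print(ordered1)
print(ordered2)
```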



Re: [GENERAL] strange infinite loop in plpgsql

2007-11-12 Thread Vyacheslav Kalinin
Isn't EXTRACT(dow FROM day) IN (0,6) always true here, thus making the select
return a row every time?

On 11/10/07, rihad [EMAIL PROTECTED] wrote:

 I've been reading the online docs, but... code like this somehow ends up
 in an indefinite loop:

 CREATE OR REPLACE FUNCTION foo() RETURNS int AS $$
 DECLARE
timeout int;
day date;
 BEGIN
  day := current_date + 1;
  LOOP
SELECT date+1 INTO day FROM days WHERE date=day OR EXTRACT(dow
 FROM day) IN (0,6);
EXIT WHEN NOT FOUND;
timeout := timeout + 86400;
  END LOOP;
 END; $$ LANGUAGE plpgsql;

 It's Saturday at our place, and the days table has only one record for
 tomorrow's date.

 I hope it's been a very very long day for me :-) Thank you for your help.


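For comparison, the logic rihad seems to be after (advance to the next day
that is neither a weekend nor listed in the days table) can be sketched in
Python; the days_table contents here are hypothetical:

```python
from datetime import date, timedelta

# Hypothetical stand-in for the "days" table: dates to skip over.
days_table = {date(2007, 11, 12)}

def next_working_day(start: date) -> date:
    """Advance past weekends and any date listed in days_table."""
    day = start + timedelta(days=1)
    # Python's weekday(): Monday=0 .. Sunday=6, unlike SQL's dow
    # (Sunday=0, Saturday=6), so >= 5 means Saturday or Sunday.
    while day.weekday() >= 5 or day in days_table:
        day += timedelta(days=1)
    return day

# Starting on Saturday 2007-11-10: skip Sunday the 11th and the listed
# Monday the 12th, landing on Tuesday the 13th.
print(next_working_day(date(2007, 11, 10)))  # 2007-11-13
```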


Re: [GENERAL] Extracting Index Creation Script

2007-10-29 Thread Vyacheslav Kalinin
If you know the index name then:
SELECT pg_get_indexdef('your_index_name'::regclass)
will do.

In case you want a full list of indices for a certain table:
SELECT c2.relname, pg_get_indexdef(i.indexrelid, 0, true), c2.reltablespace
  FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i
 WHERE c.oid = 'your_table_name'::regclass AND c.oid = i.indrelid AND
i.indexrelid = c2.oid

If you have more questions of that kind, try starting psql with the -E option,
which displays the internal queries psql issues (this is what I did).


Re: [GENERAL] (Never?) Kill Postmaster?

2007-10-29 Thread Vyacheslav Kalinin
Postgres starts a separate process for each connection, so you can just `kill`
(not `kill -9`) the unwanted process. Or you can do `select
pg_cancel_backend(pid)` from another session.


Re: [GENERAL] Polymorphic functions' weird behavior

2007-07-31 Thread Vyacheslav Kalinin

Tom Lane wrote:

The fix is pretty easy if you need it now


Thanks for the update, Tom.
Another little issue with that function: I cannot make it accept NULL as an
argument. It won't normally accept NULL, reporting the appropriate error that
it cannot determine the argument type; nor will it accept it if I make the
function STRICT; nor can I handle it inside the function, since it isn't
actually called. Thus I always get the error, which is not desirable
behavior. Any way around it?


regards, Viatcheslav
