[BUGS] BUG #8408: Not exactly correct error messages for failed check in presence of dropped columns.
The following bug has been logged on the website:

Bug reference: 8408
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.4
Operating system: Any
Description:

Here is a simple test case:

create table test_columns (id serial primary key, value int check(value < 10));
NOTICE: CREATE TABLE will create implicit sequence "test_columns_id_seq" for serial column "test_columns.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_columns_pkey" for table "test_columns"
CREATE TABLE
insert into test_columns(value) values (100);
ERROR: new row for relation "test_columns" violates check constraint "test_columns_value_check"
DETAIL: Failing row contains (1, 100).
--it's ok...
--now let's drop the value column and add a new one
alter table test_columns drop column value;
ALTER TABLE
alter table test_columns add column value int check (value < 10);
ALTER TABLE
insert into test_columns(value) values (100);
ERROR: new row for relation "test_columns" violates check constraint "test_columns_value_check"
DETAIL: Failing row contains (2, null, 100).
-- now (2, null, 100) sounds weird...

And yes, I completely understand where this NULL came from and why this happened. However, I think the error message should filter out dropped columns and their strange null values (because if a table has a lot of dropped columns, this error message quickly becomes completely unreadable).

Regards,
Maksym

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #8176: problem with the ALTER TYPE name RENAME TO new_name [ CASCADE | RESTRICT ] syntax
The following bug has been logged on the website:

Bug reference: 8176
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.4
Operating system: Any
Description:

Hi,

It seems that either the documentation is wrong, or the [ CASCADE | RESTRICT ] feature of ALTER TYPE ... RENAME TO isn't implemented.

Test case:

create type test as (qqq integer);
CREATE TYPE
alter type test rename to test1 RESTRICT;
ERROR: syntax error at or near "RESTRICT"
LINE 1: alter type test rename to test1 RESTRICT;
alter type test rename to test1 CASCADE;
ERROR: syntax error at or near "CASCADE"
LINE 1: alter type test rename to test1 CASCADE;

However:

[local]:5432 postgres@postgres=# alter type test rename to test1;
ALTER TYPE
[BUGS] BUG #7920: Sequence rename leaves stale value for sequence_name
The following bug has been logged on the website:

Bug reference: 7920
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.3
Operating system: Linux
Description:

sequence_name is left stale after a sequence rename. The test case shows the same problem on versions 9.0, 9.1, and 9.2:

[postgres]=# create sequence qqq;
[postgres]=# SELECT sequence_name FROM qqq;
 sequence_name
---------------
 qqq

[postgres]=# alter sequence qqq rename to lalala;
--surprise
[postgres]=# SELECT sequence_name FROM lalala;
 sequence_name
---------------
 qqq

pg_dump -F p -s postgres | grep qqq
--empty
Re: [BUGS] BUG #7920: Sequence rename leaves stale value for sequence_name
I don't find this particularly surprising. Nothing looks at that field in sequences; IMO there is no point in having the name inside at all. Do you need that for some use case, or did you just happen to notice it? I personally don't see any way to nicely fix that. We could add code to also change the contents, but currently that's generic code. Or we could just remove the column in the next release?

Well, this story began with one very badly named sequence. After a database structure audit, the developers were asked to rename this sequence to something more appropriate. And when they performed alter ... rename, they found that the name was still the same. After that they came to me with questions. The task is to remove that very bad name from the production database altogether. It seems that the easiest way is to drop the sequence and create a new sequence.

+1 for just removing the column in the next release

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage."
Re: [BUGS] BUG #7703: Error with partitioning with distinct over aggregate
On Tue, Nov 27, 2012 at 5:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes:
test=# create table test (id serial);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
test=# create index test_id_key on test(id);
CREATE INDEX
test=# create table test_part_1 (like test including all) INHERITS (test);
NOTICE: merging column "id" with inherited definition
CREATE TABLE
test=# explain analyze select distinct min(id) from test;
ERROR: failed to re-find MinMaxAggInfo record

Fixed, thanks for the report.

regards, tom lane

Thank you very much.

PS: I know that the query sounds stupid, but most CMS/ORMs unfortunately can produce an unlimited amount of strange/stupid queries.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
[BUGS] BUG #7703: Error with partitioning with distinct over aggregate
The following bug has been logged on the website:

Bug reference: 7703
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Any
Description:

Simple test case:

test=# create table test (id serial);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
test=# create index test_id_key on test(id);
CREATE INDEX
test=# create table test_part_1 (like test including all) INHERITS (test);
NOTICE: merging column "id" with inherited definition
CREATE TABLE
test=# explain analyze select distinct min(id) from test;
ERROR: failed to re-find MinMaxAggInfo record

PS: Without partitioning, or without the index over id, there is no error. The tested 9.0.* versions are not affected by the bug, but 9.2.1 and 9.1.6 are.
[BUGS] BUG #7612: Wrong result with join between two values () set
The following bug has been logged on the website:

Bug reference: 7612
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Linux
Description:

A join between two values() sets could produce wrong results.

Test case:

Correct answer:

SELECT a.val, b.val
FROM (VALUES( (2), (1) )) AS a (val)
JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
 val | val
-----+-----
   2 |   2
(1 row)

Now just change the position of (2) and (1) in a(val):

Wrong answer:

SELECT a.val, b.val
FROM (VALUES( (1), (2) )) AS a (val)
JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
 val | val
-----+-----
(0 rows)

explain (analyze, verbose) results of both queries:

mboguk=# explain (analyze, verbose) SELECT a.val, b.val FROM (VALUES( (2), (1) )) AS a (val) JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..0.04 rows=1 width=8) (actual time=0.070..0.118 rows=1 loops=1)
   Output: *VALUES*.column1, *VALUES*.column1
   Join Filter: (*VALUES*.column1 = *VALUES*.column1)
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.016..0.027 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.013..0.024 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
 Total runtime: 0.209 ms
(8 rows)

mboguk=# explain (analyze, verbose) SELECT a.val, b.val FROM (VALUES( (1), (2) )) AS a (val) JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..0.04 rows=1 width=8) (actual time=0.056..0.056 rows=0 loops=1)
   Output: *VALUES*.column1, *VALUES*.column1
   Join Filter: (*VALUES*.column1 = *VALUES*.column1)
   Rows Removed by Join Filter: 1
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.008..0.013 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.007..0.013 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
 Total runtime: 0.100 ms
(9 rows)
Re: [BUGS] BUG #7612: Wrong result with join between two values () set
On 10/19/12, Vik Reykja vikrey...@gmail.com wrote:
On Thu, Oct 18, 2012 at 5:40 PM, maxim.bo...@gmail.com wrote:
Join between two values() sets could produce wrong results:

This is not a bug; your test case produces correct results. Your VALUES clauses are each producing one row with two columns, called "val" and "column2". You are joining on "val", and so when you switch the values to put 2 in a different column, no results are found.

Is it possible you didn't intend to put the parentheses immediately following the VALUES keyword? I think that is the case, because your parentheses around the numbers are superfluous.

Oops, sorry for the noise. I should be more careful.
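The distinction Vik points out, between one row of two columns and two rows of one column, can be demonstrated outside PostgreSQL as well. A small sketch using Python's built-in sqlite3 (which follows the same VALUES row-constructor rules and names the columns column1, column2, ...):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# VALUES ((1), (2)) -- ONE row with two columns, as in the bug report;
# the inner parentheses are just parenthesized scalars:
one_row = con.execute("SELECT * FROM (VALUES ((1), (2)))").fetchall()

# VALUES (1), (2) -- TWO rows of one column, which the report intended:
two_rows = con.execute("SELECT * FROM (VALUES (1), (2))").fetchall()

# With the intended two-row form, the join finds the match no matter
# which order the values are listed in:
matched = con.execute(
    "SELECT a.column1, b.column1"
    " FROM (VALUES (1), (2)) AS a"
    " JOIN (VALUES (2), (42)) AS b ON a.column1 = b.column1"
).fetchall()
```

Here `one_row` is a single `(1, 2)` tuple, `two_rows` holds `(1,)` and `(2,)`, and `matched` contains the single joined pair.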
[BUGS] BUG #7580: repeatable test case for the BUG #7579
The following bug has been logged on the website:

Bug reference: 7580
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Linux
Description:

create table t1 as select id from generate_series(1,10) as g(id);
create table t2 as select id from generate_series(1,10) as g(id);
alter table t1 add primary key (id);
alter table t2 add primary key (id);
analyze t1;
analyze t2;
explain select *
from t1
left outer join (
    select distinct t2.id from t2 where t2.id > 100
) as _t on _t.id = t1.id
where (t1.id < 100 or (t1.id > 100 and _t.id IS NOT NULL));

The connection to the server was lost.
Re: [BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)
The test database had a somewhat unusual tablespace layout: the main tablespace partition was mounted inside the data directory of the old cluster... E.g.:

data directory - /var/lib/postgresql/9.2/main
main tablespace (another partition mount point) - /var/lib/postgresql/9.2/main/largedb

Can you show us the data directory path of the old and new clusters?

--old-datadir=/var/lib/postgresql/9.0/main
--new-datadir=/var/lib/postgresql/9.2/main

The second partition used as a tablespace was mounted as: /var/lib/postgresql/9.0/main/largedb

pg_upgrade really doesn't know what is inside that old cluster, so it just deletes everything under the data directory.

Hmm... maybe it's a good idea to try the opposite way: the default directory and file layout of a PostgreSQL data directory is well documented and almost never changes. Maybe instead of rm -rf on the whole data directory, rm -rf only the files and directories which surely belong to PostgreSQL? Something along the lines of:

1) rm -rf base global pg_clog pg_multixact ... and so on
2) produce a warning if any unusual files are left in the data directory after that (but do not delete them)
3) delete the data directory itself only if that directory is completely empty after steps 1 and 2

PS: I know that solution will not be completely error-proof, but it will prevent the most probable data-loss scenarios. So it's better than nothing.

PS: I also think deleting postgresql.conf and pg_hba.conf from the old data directory is the wrong move too... if the admin forgets to copy pg_hba.conf to the new cluster, those settings could be lost forever after delete_old_cluster.sh.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
http://www.postgresql-consulting.com/
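The three-step cleanup proposed above can be sketched in a few lines. This is purely illustrative, not pg_upgrade's actual code: the function name `delete_old_cluster` and the (deliberately partial) `KNOWN_ENTRIES` list are assumptions, and the real set of entries varies by server version. It also skips mount points, mimicking rm's --one-file-system behaviour:

```python
import os
import shutil

# Hypothetical, partial list of entries known to belong to a PostgreSQL
# data directory -- illustrative only; the real set varies by version.
KNOWN_ENTRIES = {"base", "global", "pg_clog", "pg_multixact", "pg_xlog",
                 "pg_hba.conf", "postgresql.conf", "PG_VERSION"}

def delete_old_cluster(datadir):
    """Sketch of the proposed safer cleanup:
    1) remove only entries known to belong to PostgreSQL,
    2) warn about anything unexpected and leave it in place,
    3) remove the data directory itself only if it is empty afterwards.
    Mount points are always skipped (the --one-file-system idea)."""
    leftovers = []
    for name in sorted(os.listdir(datadir)):
        path = os.path.join(datadir, name)
        if os.path.ismount(path):
            print(f"WARNING: {path} is on another file system, skipping")
            leftovers.append(name)
        elif name in KNOWN_ENTRIES:
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        else:
            print(f"WARNING: unexpected entry left in place: {path}")
            leftovers.append(name)
    if not leftovers:
        os.rmdir(datadir)
    return leftovers
```

With this shape, a tablespace mounted at largedb inside the old data directory would survive the cleanup, and the data directory itself would be left in place with a warning instead of being recursively removed.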
[BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)
The following bug has been logged on the website:

Bug reference: 7573
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.0
Operating system: Linux
Description:

Hi,

Today, while performing a migration of a test database (with no critical data... and that was a good thing), I found a very nasty corner case with using delete_old_cluster.sh after pg_upgrade.

The test database had a somewhat unusual tablespace layout: the main tablespace partition was mounted inside the data directory of the old cluster... E.g.:

data directory - /var/lib/postgresql/9.2/main
main tablespace (another partition mount point) - /var/lib/postgresql/9.2/main/largedb

Now the funny part: the migration was successful, but after a few days I decided to clear the old cluster data... I checked the content of delete_old_cluster.sh but found nothing suspicious... just one string:

rm -rf /var/lib/postgresql/9.2/main

Well, I know I should have been more careful, but as a result that command deleted the whole tablespace data on the other partition, including the 9.2 version tablespace. That was a surprise...

Maybe it is a good idea to add:

--one-file-system
    when removing a hierarchy recursively, skip any directory that is on a file system different from that of the corresponding command line argument

to the rm call in that script. However, it is a Linux-only feature.

PS: Yes, I know that keeping any foreign data inside the PostgreSQL data directory is a bad idea.
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
Does that mean that file was damaged during rsync?

Not necessarily. When did you initially set up that cluster? Normally the file should get zeroed out before it's being used. If the cluster was copied improperly (i.e. no pg_start/stop_backup or such) it could easily happen. But I would guess that you did it properly?

That server was the master before a planned switchover to the replica server was performed. Procedure:

1) stop former master
2) trigger replica failover (thus the server becomes the new master)

There is an unusual part because of human error by a system administrator:

2.1) start former master
2.2) stop former master

3) empty the pg_xlog directory on the former master
4) pg_start_backup on the new master
5) re-rsync datafiles to the former master using: rsync -aq --delete --exclude pg_xlog /db/data [former master]:/db/
6) pg_stop_backup on the new master
7) start former master as a replica...

That procedure has been used more than 10 times without any problem in the past. Maybe I'm missing something in it.

Could you show pg_controldata output for primary/replica? To see how new that multixact offset is.
On the failing replica:

[pgsql@db1 ~/tmp/postgresql-9.0.8]$ pg_controldata /db/data_back
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5687760461964967531
Database cluster state:               in archive recovery
pg_control last modified:             Wed Aug 22 13:47:35 2012
Latest checkpoint location:           925/2D41CAA0
Prior checkpoint location:            925/2D41CAA0
Latest checkpoint's REDO location:    925/2C0004C8
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/1768511856
Latest checkpoint's NextOID:          59880938
Latest checkpoint's NextMultiXactId:  361788644
Latest checkpoint's NextMultiOffset:  822580965
Latest checkpoint's oldestXID:        1569182525
Latest checkpoint's oldestXID's DB:   16424
Latest checkpoint's oldestActiveXID:  1768498919
Time of latest checkpoint:            Tue Aug 21 08:21:42 2012
Minimum recovery ending location:     925/2C0004C8
Backup start location:                925/2C0004C8
Current wal_level setting:            hot_standby
Current max_connections setting:      300
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   2048
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

On the master there is no sense looking into it, because the master is 1 day ahead of that replica. It's a production system. What should I examine next?

If you could give me a DEBUG3 log from startup till it crashes it would be helpful. Please #define MULTIXACT_DEBUG on top of multixact.c before doing so. That's probably going to be too big to go through the mailing list.

I see nothing new after I rebuilt postgres with that define ( /usr/local/pgsql/tmp/postgresql/bin/postgres -d 3 -D /db/data_back/ ).
Just:

LOCATION: readRecoveryCommandFile, xlog.c:5269
DEBUG: 0: primary_conninfo = 'host=10.12.22.19 port=5433 user=replica password=acilper'
LOCATION: readRecoveryCommandFile, xlog.c:5276
DEBUG: 0: trigger_file = '/db/data/failover'
LOCATION: readRecoveryCommandFile, xlog.c:5283
LOG: 0: entering standby mode
LOCATION: StartupXLOG, xlog.c:5827
DEBUG: 0: checkpoint record is at 925/2D41CAA0
LOCATION: StartupXLOG, xlog.c:5894
DEBUG: 0: redo record is at 925/2C0004C8; shutdown FALSE
LOCATION: StartupXLOG, xlog.c:5929
DEBUG: 0: next transaction ID: 0/1768511856; next OID: 59880938
LOCATION: StartupXLOG, xlog.c:5933
DEBUG: 0: next MultiXactId: 361788644; next MultiXactOffset: 822580965
LOCATION: StartupXLOG, xlog.c:5936
DEBUG: 0: oldest unfrozen transaction ID: 1569182525, in database 16424
LOCATION: StartupXLOG, xlog.c:5939
DEBUG: 0: MultiXact: setting next multi to 361788644 offset 822580965
LOCATION: MultiXactSetNextMXact, multixact.c:1668
DEBUG: 0: transaction ID wrap limit is 371172, limited by database with OID 16424
LOCATION: SetTransactionIdLimit, varsup.c:334
WARNING: 01000: pid of startup is: 16919, sleeping for 60s
LOCATION: StartupXLOG, xlog.c:5983
DEBUG: 0: initializing for hot standby
LOCATION: StartupXLOG, xlog.c:6073
LOG: 0: redo starts at 925/2C0004C8
LOCATION: StartupXLOG, xlog.c:6202
FATAL: XX000: could not access status of transaction 361788644
DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
LOCATION: SlruReportIOError, slru.c:840

If
[BUGS] BUG #7500: hot-standby replica crash after an initial rsync
The following bug has been logged on the website:

Bug reference: 7500
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.8
Operating system: FreeBSD
Description:

Hi,

Reinitialization of the replica after failover is a procedure I have performed many times in this installation, and I never had any issues. But today, after following the same pg_start_backup/rsync procedure, the replica doesn't start, with the following error:

2012-08-21 11:17:07.777 MSK 34329 @ from [vxid: txid:0] []LOG: entering standby mode
2012-08-21 11:17:07.780 MSK 34329 @ from [vxid:1/0 txid:0] []LOG: redo starts at 925/2C0004C8
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []FATAL: could not access status of transaction 361788644
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-21 11:17:07.791 MSK 34327 @ from [vxid: txid:0] []LOG: startup process (PID 34329) exited with exit code 1
2012-08-21 11:17:07.791 MSK 34327 @ from [vxid: txid:0] []LOG: terminating any other active server processes

I have kept all the database files for future investigation. What should I look into first?
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
I have kept all those database files for future investigation. What should I look into first?

Could you reproduce the error with log_error_verbosity=verbose? Or even better, provide a backtrace with gdb?

Here is the log with log_error_verbosity=verbose:

2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 0: database system was interrupted while in recovery at log time 2012-08-21 08:21:42 MSK
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: StartupXLOG, xlog.c:5757
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 0: creating missing WAL directory "pg_xlog/archive_status"
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: ValidateXLOGDirectoryStructure, xlog.c:3452
2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOG: 0: entering standby mode
2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOCATION: StartupXLOG, xlog.c:5827
2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOG: 0: streaming replication successfully connected to primary
2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOCATION: libpqrcv_connect, libpqwalreceiver.c:169
2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOG: 0: redo starts at 925/2C0004C8
2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: StartupXLOG, xlog.c:6199
2012-08-21 14:04:20.160 MSK 33950 @ from [vxid:1/0 txid:0] []FATAL: XX000: could not access status of transaction 361788644
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: SlruReportIOError, slru.c:840
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 0: startup process (PID 33950) exited with exit code 1
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: LogChildExit, postmaster.c:2840
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 0: terminating any other active server processes
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: HandleChildCrash, postmaster.c:2674

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
Hi Andres,

I have some problems with debugging the startup process with gdb... I followed the next sequence of commands (and got no useful results):

You're debugging the postmaster that way. The easiest way would be to just attach to the startup process with gdb -p. Not sure if you can manage that timing-wise without manually adding sleeps.

It seems impossible without an added sleep. Too fast for me. Where should I add the sleep call in the PostgreSQL source? (The process management part of PostgreSQL is completely unknown to me.)

Otherwise you probably will have to work with "set follow-fork-mode ask" in gdb. Getting to the right child isn't easy though.

Unfortunately, gdb under FreeBSD doesn't support "set follow-fork-mode".

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
On Wed, Aug 22, 2012 at 6:08 AM, Andres Freund and...@2ndquadrant.com wrote:
On Tuesday, August 21, 2012 03:30:44 PM Maxim Boguk wrote:
Hi Andres,

I would add something akin to:

elog(WARNING, "pid of startup is: %d, sleeping for 10s", getpid());
sleep(10);

Hi Andres,

Here it is.

gdb backtrace:

#0  0x000800bd1ab0 in exit () from /lib/libc.so.7
#1  0x005d77f6 in proc_exit (code=1) at ipc.c:137
#2  0x00697948 in errfinish (dummy=<optimized out>) at elog.c:501
#3  0x00488956 in SlruReportIOError (ctl=<optimized out>, pageno=<optimized out>, xid=361788644) at slru.c:857
#4  0x0048956a in SimpleLruReadPage (ctl=0x95bf80, pageno=176654, write_ok=1 '\001', xid=361788644) at slru.c:427
#5  0x00489f7f in RecordNewMultiXact (multi=361788644, offset=822580965, nxids=2, xids=0x800e5302c) at multixact.c:764
#6  0x0048a126 in multixact_redo (lsn=..., record=0x800e53000) at multixact.c:2017
#7  0x00484bea in StartupXLOG () at xlog.c:6277
#8  0x004870a8 in StartupProcessMain () at xlog.c:9335
#9  0x0049cab2 in AuxiliaryProcessMain (argc=2, argv=0x7fffe450) at bootstrap.c:412
#10 0x005ba926 in StartChildProcess (type=StartupProcess) at postmaster.c:4442
#11 0x005bccb2 in PostmasterMain (argc=5, argv=0x7fffea38) at postmaster.c:1093
#12 0x00566f3b in main (argc=5, argv=0x7fffea38) at main.c:188

relevant database log:

2012-08-22 07:47:21.330 MSK 18155 @ from [vxid:1/0 txid:0] []LOG: 0: redo starts at 925/2C0004C8
2012-08-22 07:47:21.330 MSK 18155 @ from [vxid:1/0 txid:0] []LOCATION: StartupXLOG, xlog.c:6202
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []FATAL: XX000: could not access status of transaction 361788644
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []LOCATION: SlruReportIOError, slru.c:840

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
On Wed, Aug 22, 2012 at 1:50 PM, Maxim Boguk maxim.bo...@gmail.com wrote:
[quoted backtrace and log snipped; same as the previous message]

And finally, a gdb backtrace on a build without optimization ( ./configure --prefix=/usr/local/pgsql/tmp/postgresql/ --enable-debug --enable-cassert CFLAGS='-O0' ):

#0  0x000800d42ab0 in exit () from /lib/libc.so.7
#1  0x006aafc6 in proc_exit (code=1) at ipc.c:137
#2  0x007bcbb1 in errfinish (dummy=0) at elog.c:501
#3  0x004b7065 in SlruReportIOError (ctl=0xad9740, pageno=176654, xid=361788644) at slru.c:836
#4  0x004b6353 in SimpleLruReadPage (ctl=0xad9740, pageno=176654, write_ok=1 '\001', xid=361788644) at slru.c:427
#5  0x004b85dd in RecordNewMultiXact (multi=361788644, offset=822580965, nxids=2, xids=0x800f5302c) at multixact.c:764
#6  0x004b9dd1 in multixact_redo (lsn=..., record=0x800f53000) at multixact.c:2017
#7  0x004aef0c in StartupXLOG () at xlog.c:6277
#8  0x004b446d in StartupProcessMain () at xlog.c:9335
#9  0x004d9e36 in AuxiliaryProcessMain (argc=2, argv=0x7fffe8b0) at bootstrap.c:412
#10 0x00683398 in StartChildProcess (type=StartupProcess) at postmaster.c:4442
#11 0x0067ef60 in PostmasterMain (argc=5, argv=0x7fffea38) at postmaster.c:1093
#12 0x005fbf5a in main (argc=5, argv=0x7fffea38) at main.c:188

It seems to be failing around the function SlruPhysicalReadPage, at this block:

errno = 0;
if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
{
    slru_errcause = SLRU_READ_FAILED;
    slru_errno = errno;
    close(fd);
    return false;
}

It seems that it read fewer than BLCKSZ bytes. And that is not an error condition for the read call (from man 2 read: "Note that reading less than size bytes is not an error.").
The actual size of the pg_multixact/offsets/1590 file is:

[pgsql@db1 ~/tmp/postgresql-9.0.8]$ ls -la /db/data_back/pg_multixact/offsets/1590
-rw------- 1 pgsql pgsql 81920 Aug 21 07:46 /db/data_back/pg_multixact/offsets/1590

Now it seems that the (lseek(fd, (off_t) offset, SEEK_SET) < 0) check passes even though it moves the seek position outside the actual file size. And after that, the read call could actually read only 0 bytes from it. Does that mean the file was damaged during rsync?

What should I examine next?

Kind Regards,
Maksym
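For what it's worth, the failing segment name and file offset in the error can be reproduced from the multixact number using the SLRU constants of that era (8 KB pages, 4-byte MultiXactOffset entries, 32 pages per segment), and the short-read behaviour described above can be demonstrated directly. A sketch (not PostgreSQL code):

```python
import os
import tempfile

BLCKSZ = 8192
ENTRIES_PER_PAGE = BLCKSZ // 4    # 4-byte MultiXactOffset entries per page
PAGES_PER_SEGMENT = 32            # SLRU pages per segment file

multi = 361788644
pageno = multi // ENTRIES_PER_PAGE              # 176654, as in the backtrace
segment = f"{pageno // PAGES_PER_SEGMENT:04X}"  # "1590", the failing file
offset = (pageno % PAGES_PER_SEGMENT) * BLCKSZ  # 114688, as in the DETAIL line

# Reading after seeking past end-of-file returns 0 bytes and leaves errno
# untouched, which is where "Unknown error: 0" in the DETAIL line comes from.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 81920)        # same size as the short offsets file above
os.lseek(fd, offset, os.SEEK_SET)  # succeeds even though offset > file size
data = os.read(fd, BLCKSZ)         # b"" -- a short read, not an error
os.close(fd)
os.remove(path)
```

So the derived offset 114688 lies beyond the 81920-byte file, which is consistent with the short read rather than a damaged-but-full-sized segment.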
[BUGS] BUG #6736: Hot-standby replica crashed after failover:
The following bug has been logged on the website:

Bug reference: 6736
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.4
Operating system: FreeBSD
Description:

Hi,

Maybe it was fixed in more recent releases, but I cannot find related entries in the release notes.

The next sequence leads to the standby crash after failover:

1) stop master (wait until full stop)
2) initiate failover

The hot standby crashed during failover; however, it successfully started after a manual start (and seems to work without any further errors).

In the hot-standby logs:

2012-07-13 17:06:08.199 MSK 58208 @ from [vxid: txid:0] []FATAL: replication terminated by primary server
2012-07-13 17:06:08.788 MSK 67891 @ from [vxid: txid:0] []FATAL: could not connect to the primary server: server closed the connection unexpectedly
2012-07-13 17:06:19.521 MSK 67912 @ from [vxid: txid:0] []FATAL: terminating walreceiver process due to administrator command
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 2026 of relation base/14482918/32905335 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 7327 of relation base/14482918/32905343 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 7299 of relation base/14482918/32905343 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 2040 of relation base/14482918/32905335 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 1761 of relation base/16424/33266809 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 1750 of relation base/16424/33266809 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []PANIC: WAL contains references to invalid pages
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []WARNING: terminating connection because of crash of another server process
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []HINT: In a moment you should be able to reconnect to the database and repeat your command.
2012-07-13 17:07:27.072 MSK 68607 @ from [vxid: txid:0] []HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Is that something which was fixed already?

PS: An update to 9.0.8 is planned for next week.
[BUGS] BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
The following bug has been logged on the website:

Bug reference: 6725
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.7
Operating system: Debian Linux

Description:

Hi,

On one of my production databases the hot-standby server started crashing repeatedly after a restart. Here is the log:

2012-07-10 03:30:20 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.262 ms statement: checkpoint;
2012-07-10 03:30:24 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.207 ms statement: checkpoint;
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: received fast shutdown request
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: aborting any active transactions
2012-07-10 03:30:41 MSK 25312 @ from [vxid: txid:0] [] FATAL: terminating walreceiver process due to administrator command
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: shutting down
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: database system is shut down
2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: database system was shut down in recovery at 2012-07-10 03:30:41 MSK
2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: entering standby mode
2012-07-10 03:30:45 MSK 12202 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: startup process (PID 12202) was terminated by signal 6: Aborted
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: terminating any other active server processes
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: database system was interrupted while in recovery at log time 2012-07-10 02:17:21 MSK
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: entering standby mode
2012-07-10 03:33:37 MSK 13071 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: startup process (PID 13071) was terminated by signal 6: Aborted
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: terminating any other active server processes

and so on. I am almost sure there is no disk corruption or any such error.

Now some additional information from the master server:

1) 1663/62450/27663577 corresponds to the pg_default tablespace and the wsfilm database. However, no entry with relfilenode=27663577 was found in pg_class. At the same time:

ls -la ./9.0/main/base/62450/27663577
-rw--- 1 postgres postgres 0 Jul 10 02:29 ./9.0/main/base/62450/27663577

In the wsfilm db only one GIN index exists:

wsfilm=# SELECT * from pg_indexes where indexdef like '%gin%';
-[ RECORD 1 ]---
schemaname | public
tablename  | movie
indexname  | movie_fti_idx
tablespace |
indexdef   | CREATE INDEX movie_fti_idx ON movie USING gin (fti)

But it has a different relfilenode:

wsfilm=# SELECT relfilenode from pg_class where relname='movie_fti_idx';
-[ RECORD 1 ]-
relfilenode | 27677714

2) Just a few minutes before that, the index was rebuilt on the master database:

2012-07-10 02:29:50 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324338 txid:0] [idle] LOG: statement: CREATE INDEX CONCURRENTLY pgcompactor_tmp1436 ON movie USING gin (fti);
2012-07-10 02:29:52 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324343 txid:0] [idle] LOG: statement: BEGIN; DROP INDEX public.movie_fti_idx; ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx; END;

My theory is that WAL replay is trying to replay GIN index information for an already dropped index.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
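The suspected trigger can be written out as a short reproduction sketch. The table and index names follow the log excerpt above; that this sequence alone reproduces the PANIC on an affected 9.0.x standby is the reporter's theory, not something confirmed in this message:

```sql
-- Sketch of the rebuild-and-swap pattern from the master's log: build a
-- new GIN index concurrently, then atomically drop the old one and rename
-- the new one into place. The theory is that a standby replaying WAL that
-- still references the dropped index's metapage hits
-- "PANIC: GIN metapage disappeared".
CREATE INDEX CONCURRENTLY pgcompactor_tmp1436 ON movie USING gin (fti);

BEGIN;
DROP INDEX public.movie_fti_idx;
ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx;
COMMIT;
```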
Re: [BUGS] BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared

This is fixed in current minor releases.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=268ca4f57ea2dc3bf0dfb0f5c17fbda269b5d462

regards, tom lane

Ooops, sorry, I should have checked the release notes of the latest version before posting new bugs. Thank you.

Kind Regards,
Maksym
[BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
The following bug has been logged on the website:

Bug reference: 6662
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.3
Operating system: Linux

Description:

I managed to create a simple self-contained test case for 6658.

create table test as select * from generate_series(1,10) as g(val);
create index test_val_special on test((val || ''));
analyze test;
select count(*) from test;

--ok
--index scan
explain analyze SELECT val FROM test WHERE (val || '')='something';

QUERY PLAN
---
Index Scan using test_val_special on test (cost=0.01..8.29 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
  Index Cond: (((val)::text || ''::text) = 'something'::text)
Total runtime: 0.038 ms

--not ok
--seq scan on 9.1.3
explain analyze SELECT val FROM ( SELECT val, (val || '') AS search_string FROM test ) AS t1 WHERE search_string='something';

QUERY PLAN
--
Subquery Scan on t1 (cost=0.00..3443.00 rows=500 width=4) (actual time=47.076..47.076 rows=0 loops=1)
  Filter: (t1.search_string = 'something'::text)
  -> Seq Scan on test (cost=0.00..2193.00 rows=10 width=4) (actual time=0.012..34.949 rows=10 loops=1)
Total runtime: 47.091 ms

--the same second query on 8.3.13
--plan correct
QUERY PLAN
---
Index Scan using test_val_special on test (cost=0.01..8.29 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
  Index Cond: (((val)::text || ''::text) = 'something'::text)
Total runtime: 0.018 ms

Kind Regards,
Maksym

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
Hi,

Thank you very much for the answer. Explicit casting resolved the issue. Just a single question: if anytextcat() and textanycat() are marked volatile, why does the database allow creating an index on a supposedly volatile expression:

create index test_val_special on test((val || ''));

?

On Wed, May 23, 2012 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes: I managed to create a simple self-contained test case for 6658.

This works fine in HEAD. The reason it doesn't work fine in 9.1 (or 9.0) is that in those branches, anytextcat() and textanycat() are marked volatile, for reasons that were good at the time but were superseded by later policy changes. So you have a subselect containing a volatile output expression, which prevents flattening of the subselect, so the restriction clause doesn't get pushed down to where it could be used with the index. For some history see

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c82d931dd180965a9a0c06acc764404f91de8170
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=aab353a60b95aadc00f81da0c6d99bde696c4b75
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3db6524fe63f0598dcb2b307bb422bc126f2b15d

If this is a big problem for you, you could safely adjust the provolatile marking of those functions by hand in 9.1. It might be better just to include an explicit cast to text, though, instead of relying on the assumption that the system will let you concat an integer directly to a text string.

regards, tom lane

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
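Tom's suggested explicit-cast workaround can be sketched concretely. With the cast, the concatenation goes through the plain text || text operator rather than the anytextcat()/textanycat() wrappers that were marked volatile in 9.0/9.1, so the subselect can be flattened and the condition pushed down to the index:

```sql
-- Workaround sketch for 9.0/9.1: cast the integer column to text explicitly
-- so the expression is textcat(text, text), which is immutable.
CREATE INDEX test_val_special ON test ((val::text || ''));

-- With the same cast in the subquery, the planner can flatten it and use
-- the expression index instead of a seq scan plus filter.
SELECT val
FROM (SELECT val, (val::text || '') AS search_string FROM test) AS t1
WHERE search_string = 'something';
```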
Re: [BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
On Wed, May 23, 2012 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: if anytextcat() and textanycat() are marked volatile, why does the database allow creating an index on a supposedly volatile expression: create index test_val_special on test((val || ''));

CREATE INDEX inlines anytextcat (which is just a SQL function) before making the volatility test, cf commit 5a86e5e19. The reason the subselect flattening code has difficulty with this case is that that happens before function inlining does, so the expression still looks volatile even though it really isn't.

regards, tom lane

Thank you very much for the detailed explanation and all the help.

Kind Regards,
Maksym
[BUGS] BUG #6658: Database do not push down static condition to subquery.
The following bug has been logged on the website: Bug reference: 6658 Logged by: Maxim Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.1.3 Operating system: Linux Description: Hi, I have 2 queries which have the same plan/same performance on 8.3 but on 9.1.3 fist one doesn't perform well because condition doesn't pushed down to subquery: --slow on 9.1.3 but fast on 8.3.13 --helper just shell view db=# explain (analyze,verbose) select id from (SELECT id, regexp_replace(btrim(((COALESCE(NULLIF(start_number, 0) || ' '::text, ''::text) || address) || ' '::text) || suburb) || ' '::text) || state) || ' '::text) || postcode), ' +'::text, ' '::text, 'g'::text) AS search_string FROM helper WHERE start_number IS NOT NULL AND start_alpha IS NOT NULL AND end_number IS NULL ) AS _t WHERE search_string='something'; QUERY PLAN Subquery Scan on _t (cost=7715.15..2141451.98 rows=55630 width=4) (actual time=4132.959..4132.959 rows=0 loops=1) Output: _t.id Filter: (_t.search_string = 'something'::text) - Bitmap Heap Scan on data_table (cost=7715.15..1557337.03 rows=11125999 width=45) (actual time=92.290..4066.518 rows=403344 loops=1) Output: id, regexp_replace(btrimCOALESCE(((NULLIF(start_number, 0))::text || ' '::text), ''::text) || upper(regexp_replace(btrim(((COALESCE((name || ' '::text), ''::text) || COALESCE((extension || ' '::text), ''::text)) || COALESCE(direction, ''::text))), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(locality_name, ''::text)), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(state_code, ''::text)), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(post_code, ''::text)), ' +'::text, ' '::text, 'g'::text, ' +'::text, ' '::text, 'g'::text) Recheck Cond: ((start_number IS NOT NULL) AND (upper((start_alpha)::text) IS NOT NULL) AND (end_number IS NULL)) - Bitmap Index Scan on ix_search_full5 (cost=0.00..4933.65 
rows=11125999 width=0) (actual time=59.118..59.118 rows=403344 loops=1) Total runtime: 4133.001 ms (8 rows) --fast on both 9.1.3 and 8.3.13 --flat version (search_string condition manually inlined) db=# explain (analyze, verbose) SELECT id FROM helper WHERE start_number IS NOT NULL AND start_alpha IS NOT NULL AND end_number IS NULL AND regexp_replace(btrim(((COALESCE(NULLIF(start_number, 0) || ' '::text, ''::text) || address) || ' '::text) || suburb) || ' '::text) || state) || ' '::text) || postcode), ' +'::text, ' '::text, 'g'::text) = 'something'; QUERY PLAN
[BUGS] BUG #6513: explain (analyze, buffers) and toasted fields
The following bug has been logged on the website:

Bug reference: 6513
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.7
Operating system: Linux

Description:

I got hit by this bug while exploring the reasons for a very slow production query, and I lost a lot of time trying to reproduce the problem query on the production server with EXPLAIN ANALYZE. Finally I found that I need a workaround to get EXPLAIN to perform the actual work with the toasted data. So here goes the bug report:

EXPLAIN does not take into account that some result fields will require detoasting, and produces a wrong buffers result.

Test case:

pgsql=# drop table if exists test;
DROP TABLE
Time: 277.926 ms
pgsql=# CREATE TABLE test (id integer primary key, value text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index test_pkey for table test
CREATE TABLE
Time: 3.812 ms

--populate the table with a large toasted field
pgsql=# INSERT INTO test select i,(select array_agg(random()) from generate_series (1,1000))::text from generate_series (1,1000) as g(i);
INSERT 0 1000
Time: 3176.286 ms

--wrong explain value, at least for the buffers value, and what's even worse, a wrong (100x less than actual) runtime
--(Buffers: shared hit=7) - wrong value
pgsql=# EXPLAIN (analyze, costs, buffers) select * from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..10.70 rows=1000 width=36) (actual time=0.020..1.656 rows=1000 loops=1)
  Buffers: shared hit=7
Total runtime: 3.252 ms
(3 rows)
Time: 3.704 ms

--again wrong
pgsql=# EXPLAIN (analyze, costs, buffers) select value from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..10.70 rows=1000 width=32) (actual time=0.013..1.625 rows=1000 loops=1)
  Buffers: shared hit=7
Total runtime: 3.141 ms
(3 rows)
Time: 3.428 ms

--force detoasting to be taken into account via || with the toasted field
--result close to reality (Buffers: shared hit=4700 - the real value)
pgsql=# EXPLAIN (analyze, costs, buffers) select value||'a' from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..13.20 rows=1000 width=32) (actual time=0.264..187.855 rows=1000 loops=1)
  Buffers: shared hit=4700
Total runtime: 189.696 ms
(3 rows)
Time: 190.001 ms

--actual timings of both queries are the same
pgsql=# \o /dev/null
pgsql=# select * from test;
Time: 219.845 ms
pgsql=# \o /dev/null
pgsql=# select value||'a' from test;
Time: 221.599 ms

Having correct buffer hit/read values could be critical when toasted fields are likely to be read from the HDD.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
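The `|| 'a'` trick generalizes: any expression that must read the whole stored value forces detoasting, so the TOAST accesses show up in the buffer counts. A sketch, assuming the same test table as above (length() should behave like the concatenation trick, since it has to detoast the value to count its characters):

```sql
-- Hypothetical variant of the workaround above: length(value) forces
-- detoasting, so Buffers should reflect the TOAST reads as well, without
-- materializing a modified copy of the string the way value || 'a' does.
EXPLAIN (ANALYZE, COSTS, BUFFERS) SELECT length(value) FROM test;
```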
[BUGS] BUG #6422: User without any privileges on a table can lock the table from other users in some cases
The following bug has been logged on the website:

Bug reference: 6422
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.2
Operating system: Linux

Description:

Hi.

Unfortunately I was hit by this problem in a real project. During an active pg_dump (or any long transaction keeping a lock on a table), any user with access to the database can lock the table from all access, even if that user has absolutely no rights on that table.

Test scenario:

1) CREATE ROLE dummy with login password 'something';
2) start pg_dump on the database, or any long transaction involving the table news
3) now:

psql -U dummy -d testdb
=> SELECT * from news;
ERROR: permission denied for relation news

That is perfectly ok...

sports=> begin;
BEGIN
sports=> lock table news;

oops... the table is locked from everyone's access until the statement is canceled or pg_dump finishes (and in reality, a large web site was down for 3 minutes until my intervention). The same can be done with any "alter table news ... ;" statement.

Again, I am not sure whether this is an actual bug or a feature. It seems Postgres tries to take a lock on the table before checking grants. Maybe it's possible to check grants before taking any heavy lock on the table? This behavior could be very troublesome in a multi-user database environment.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
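A partial mitigation, sketched below, is to bound how long untrusted roles may spend in a statement, so a LOCK TABLE queued behind pg_dump gets cancelled instead of blocking other sessions indefinitely. This does not fix the underlying lock-before-permission-check ordering; the role name is from the test scenario above, and the timeout value is arbitrary:

```sql
-- Cap statement runtime for the untrusted role. statement_timeout also
-- counts time spent waiting for locks, so a LOCK TABLE issued by "dummy"
-- that queues behind a long transaction is cancelled after 5 seconds,
-- releasing the sessions queued behind it.
ALTER ROLE dummy SET statement_timeout = '5s';
```

On 9.3 and later, `lock_timeout` can bound the lock wait specifically without limiting legitimate long-running statements.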
Re: [BUGS] BUG #6393: cluster sometime fail under heavy concurrent write load
Hi.

The test case:

The first stage - preparation:

CREATE TABLE test (id serial primary key, flag integer, value text);
cluster test_pkey on test;
insert into test select nextval('test_id_seq') as id, 0 as flag,(SELECT array(select random() from generate_series(1,1000)))::text as value from generate_series(1,1000) as gs(id);
cluster test;

The second stage - test:

Console one:

postgres@db10:~$ while true; do psql -d test_db -c "begin; update test set flag=(random()*1000)::integer where id IN (select (random()*1000)::integer from generate_series(1,30)); select pg_sleep(1); commit;"; done
COMMIT
COMMIT
COMMIT
...

Console two:

postgres@db10:~$ while true; do psql -d test_db -c "cluster test;"; done
CLUSTER
ERROR: duplicate key value violates unique constraint "pg_toast_119812558_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811577, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812564_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811601, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812570_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811494, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812576_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811552, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812582_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811525, 0) already exists.

PS: no such effect happens on 9.0.4. I think it can be related to the TOAST-related changes in 9.0.6.

On Thu, Jan 12, 2012 at 2:24 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

Excerpts from maxim.boguk's message of mar ene 10 23:00:59 -0300 2012: The following bug has been logged on the website: Bug reference: 6393 Logged by: Maxim Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.0.6 Operating system: Linux Ubuntu Description: I have a heavy write-load table under PostgreSQL 9.0.6 and sometimes (not always, but with more than a 50% chance) I get the following error during cluster: db=# cluster public.enqueued_mail; ERROR: duplicate key value violates unique constraint "pg_toast_119685646_index" DETAIL: Key (chunk_id, chunk_seq)=(119685590, 0) already exists. The chunk_id is different each time. No uncommon datatypes exist in the table. Currently I am working on a reproducible test case (it seems to require 2-3 open write transactions on the table).

I don't see how this can be done at all, given that cluster grabs an exclusive lock on the table in question. A better example illustrating what you're really doing would be useful.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com LinkedIn profile: http://nz.linkedin.com/in/maximboguk If they can send one man to the moon... why can't they send them all? МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
[BUGS] BUG #6393: cluster sometime fail under heavy concurrent write load
The following bug has been logged on the website:

Bug reference: 6393
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.6
Operating system: Linux Ubuntu

Description:

I have a table under heavy write load on PostgreSQL 9.0.6, and sometimes (not always, but with more than a 50% chance) I get the following error during cluster:

db=# cluster public.enqueued_mail;
ERROR: duplicate key value violates unique constraint "pg_toast_119685646_index"
DETAIL: Key (chunk_id, chunk_seq)=(119685590, 0) already exists.

The chunk_id is different each time. No uncommon datatypes exist in the table. I am currently working on a reproducible test case (it seems to require 2-3 open write transactions on the table).

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #6359: excessively inlining subquery leads to slow queries
The following bug has been logged on the website:

Bug reference: 6359
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.2
Operating system: Ubuntu linux

Description:

Sometimes Postgres inlines a subquery even if that produces a slower plan (and that slow plan has a higher actual cost than the non-inlined plan).

Test case:

drop table if exists t1;
drop table if exists t2;
create table t1 as select id from generate_series(1,1) as g(id);
create table t2 as select id from generate_series(1,1000) as g(id);
alter table t1 add primary key (id);
alter table t2 add primary key (id);
analyze t1;
analyze t2;

--fast non-inlined plan
explain (verbose, analyze) select id, t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id FROM (
  select t1.id, (select t2.id from t2 where t2.id=t1.id) as t2_id from t1
  offset 0
) as t;

QUERY PLAN
-
Subquery Scan on t (cost=0.00..0.65 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
  Output: t.id, ((((((((((((((((t.t2_id + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id)
  -> Limit (cost=0.00..0.60 rows=1 width=4) (actual time=0.053..0.056 rows=1 loops=1)
       Output: t1.id, ((SubPlan 1))
       -> Seq Scan on public.t1 (cost=0.00..0.60 rows=1 width=4) (actual time=0.052..0.053 rows=1 loops=1)
            Output: t1.id, (SubPlan 1)
            SubPlan 1
              -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
                   Output: t2.id
                   Index Cond: (t2.id = t1.id)
Total runtime: 0.161 ms
(11 rows)

--slow inlined plan
explain (verbose, analyze) select id, t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id FROM (
  select t1.id, (select t2.id from t2 where t2.id=t1.id) as t2_id from t1
  -- offset 0
) as t;

QUERY PLAN
--
Seq Scan on public.t1 (cost=0.00..8.44 rows=1 width=4) (actual time=0.180..0.181 rows=1 loops=1)
  Output: t1.id, ((((((((((((((((SubPlan 1) + (SubPlan 2)) + (SubPlan 3)) + (SubPlan 4)) + (SubPlan 5)) + (SubPlan 6)) + (SubPlan 7)) + (SubPlan 8)) + (SubPlan 9)) + (SubPlan 10)) + (SubPlan 11)) + (SubPlan 12)) + (SubPlan 13)) + (SubPlan 14)) + (SubPlan 15)) + (SubPlan 16)) + (SubPlan 17))
  SubPlan 1
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 2
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 3
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 4
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 5
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 6
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 7
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 8
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4)
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
On Thu, Dec 15, 2011 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes: EXPLAIN analyze select * from applicant_adv_subscription aas where aas.user_id in (5112699,7995496) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id );

I'm hoping to fix this type of case with the generalized inner indexscan work that I've been nattering about for a year or two now. What you need to make this fast, given that resume and resume_view_history are both large, is to push the current value of aas.user_id down into the table scan of resume --- and because the join and semijoin can't be reordered, that's not possible with the planner's current simpleminded idea of what an inner indexscan can be. The other example you show manages to luck out and get a good plan due to transitive propagation of equality conditions, but that's a narrow special case. Any other form of constraint whatsoever on aas is going to end up with the crummy plan where the whole lower join gets computed.

regards, tom lane

Thank you very much for the information. Rewriting the query did the trick and resolved the performance issues. Do you plan to include the generalized inner indexscan mechanics in the 9.2 version?

-- Maxim Boguk
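The thread does not show the exact rewrite that resolved the issue. One plausible form, given Tom's note that transitive propagation of equality conditions is what rescues the single-value case, is to repeat the constant list inside the EXISTS so the constraint reaches the inner scan directly:

```sql
-- Sketch of a possible rewrite (not the confirmed one from the thread):
-- duplicating the IN list inside the correlated subquery lets the planner
-- constrain resume.user_id with the index instead of computing the whole
-- resume/resume_view_history join.
SELECT aas.*
FROM applicant_adv_subscription aas
WHERE aas.user_id IN (5112699, 7995496)
  AND EXISTS (
    SELECT 1
    FROM resume
    JOIN resume_view_history USING (resume_id)
    WHERE resume.user_id = aas.user_id
      AND resume.user_id IN (5112699, 7995496)
  );
```

On 9.3 and later, LATERAL provides a general mechanism for this kind of pushdown without manually repeating the predicate.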
[BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
The following bug has been logged on the website: Bug reference: 6335 Logged by: Maksym Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.0.4 Operating system: Linux Ubuntu Description: I was explored reasons of high DB load and I localized the next problem query: That is correct version: EXPLAIN ANALYZE select * from applicant_adv_subscription aas where aas.user_id in (5112699) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id ); QUERY PLAN -- Nested Loop Semi Join (cost=0.00..20.51 rows=1 width=65) (actual time=0.031..0.032 rows=1 loops=1) - Index Scan using applicant_adv_subscription_user_id_key on applicant_adv_subscription aas (cost=0.00..0.02 rows=1 width=65) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 5112699) - Nested Loop (cost=0.00..20.49 rows=3118 width=4) (actual time=0.018..0.018 rows=1 loops=1) - Index Scan using resume_user_id_key on resume (cost=0.00..0.13 rows=18 width=8) (actual time=0.008..0.008 rows=1 loops=1) Index Cond: (user_id = 5112699) - Index Scan using resume_view_history_fk73b63ccd36b06a5 on resume_view_history (cost=0.00..0.95 rows=173 width=4) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (resume_view_history.resume_id = resume.resume_id) Total runtime: 0.080 ms But once I add second value into IN list plan become completely screwed: EXPLAIN analyze select * from applicant_adv_subscription aas where aas.user_id in (5112699,7995496) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id ); QUERY PLAN - Nested Loop Semi Join (cost=38967.39..735076.65 rows=2 width=65) (actual time=14656.388..389866.211 rows=1 loops=1) Join Filter: (aas.user_id = resume.user_id) - Bitmap Heap Scan on applicant_adv_subscription aas (cost=0.02..0.04 rows=2 width=65) (actual time=0.018..0.021 rows=2 loops=1) Recheck Cond: (user_id = ANY ('{5112699,7995496}'::integer[])) - Bitmap Index Scan on 
applicant_adv_subscription_user_id_key (cost=0.00..0.02 rows=2 width=0) (actual time=0.014..0.014 rows=2 loops=1) Index Cond: (user_id = ANY ('{5112699,7995496}'::integer[])) - Hash Join (cost=38967.36..726839.23 rows=272203680 width=4) (actual time=13267.456..182842.841 rows=136136926 loops=2) Hash Cond: (resume_view_history.resume_id = resume.resume_id) - Seq Scan on resume_view_history (cost=0.00..282228.92 rows=272203680 width=4) (actual time=0.004..25574.666 rows=136161776 loops=2) - Hash (cost=21737.05..21737.05 rows=16110150 width=8) (actual time=13260.145..13260.145 rows=16114222 loops=2) Buckets: 2097152 Batches: 2 Memory Usage: 314776kB - Seq Scan on resume (cost=0.00..21737.05 rows=16110150 width=8) (actual time=0.005..8839.480 rows=16114222 loops=2) Total runtime: 389866.374 ms Ooops. Changes in *_cost settings have no effect. Disabling seq_scan/merge_join/hash_join doest not help as well. Nothing special about tables, all required indexes on the place. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
Here goes a self-contained test case. I tested it on 9.1.2, 9.1.1, 9.0.5, 9.0.4, and 8.4.7; all of them are affected by the problem:

select version();

drop table if exists test1;
drop table if exists test2;
drop table if exists test3;

CREATE TABLE test1 AS SELECT user_id FROM generate_series(1,100) AS gs(user_id);
CREATE TABLE test2 AS SELECT user_id,user_id AS resume_id FROM generate_series(1,100) AS gs(user_id);
CREATE TABLE test3 AS SELECT user_id AS resume_id FROM generate_series(1,100) AS gs(user_id);

create index test1_user_id_key on test1(user_id);
create index test2_user_id_key on test2(user_id);
create index test2_resume_id_key on test2(resume_id);
create index test3_resume_id_key on test3(resume_id);

analyze test1;
analyze test2;
analyze test3;

--good
EXPLAIN ANALYZE select * from test1 where test1.user_id in (100) and exists ( SELECT * from test2 join test3 using (resume_id) where test2.user_id = test1.user_id );

--bad
EXPLAIN ANALYZE select * from test1 where test1.user_id in (100, 101) and exists ( SELECT * from test2 join test3 using (resume_id) where test2.user_id = test1.user_id );

On Thu, Dec 15, 2011 at 1:00 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

Excerpts from maxim.boguk's message of mié dic 14 08:09:38 -0300 2011: But once I add a second value into the IN list the plan becomes completely screwed:

See here: http://archives.postgresql.org/message-id/1309918036-sup-4...@alvh.no-ip.org

Perhaps it's a similar problem. Maybe you'd get enthused enough to try to fix the problem?

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com LinkedIn profile: http://nz.linkedin.com/in/maximboguk If they can send one man to the moon... why can't they send them all? МойКруг: http://mboguk.moikrug.ru/ Strength breaks straw, but not everything in our life is straw, and strength is far from everything.
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
On Thu, Dec 15, 2011 at 12:00 PM, bricklen <brick...@gmail.com> wrote:
> On Wed, Dec 14, 2011 at 4:53 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
>> Here goes a self-contained test case. I tested it on 9.1.2, 9.1.1, 9.0.5, 9.0.4, 8.4.7.
>
> I just tested on 9.1.2 and see the same issue.

--bad
EXPLAIN ANALYZE
select * from test1
where test1.user_id in (100, 101)
  and exists (SELECT * from test2 join test3 using (resume_id)
              where test2.user_id = test1.user_id);

> Setting enable_hashjoin to false pushes it back to a good plan again.

Could you show EXPLAIN ANALYZE of the good plan please? If you are getting a plan like:

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=3.78..91844.51 rows=2 width=4) (actual time=0.634..0.750 rows=2 loops=1)
   Join Filter: (test1.user_id = test2.user_id)
   ->  Bitmap Heap Scan on test1  (cost=2.54..4.57 rows=2 width=4) (actual time=0.013..0.015 rows=2 loops=1)
         Recheck Cond: (user_id = ANY ('{100,101}'::integer[]))
         ->  Bitmap Index Scan on test1_user_id_key  (cost=0.00..2.54 rows=2 width=0) (actual time=0.009..0.009 rows=2 loops=1)
               Index Cond: (user_id = ANY ('{100,101}'::integer[]))
   ->  Materialize  (cost=1.23..64339.94 rows=100 width=4) (actual time=0.018..0.305 rows=100 loops=2)
         ->  Merge Join  (cost=1.23..59339.94 rows=100 width=4) (actual time=0.032..0.446 rows=101 loops=1)
               Merge Cond: (test2.resume_id = test3.resume_id)
               ->  Index Scan using test2_resume_id_key on test2  (cost=0.00..22170.28 rows=100 width=8) (actual time=0.009..0.076 rows=101 loops=1)
               ->  Index Scan using test3_resume_id_key on test3  (cost=0.00..22170.28 rows=100 width=4) (actual time=0.007..0.075 rows=101 loops=1)
 Total runtime: 0.785 ms

then try using high values for the user_id:

EXPLAIN ANALYZE
select * from test1
where test1.user_id in (9, 91)
  and exists (SELECT * from test2 join test3 using (resume_id)
              where test2.user_id = test1.user_id);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=3.78..91844.51 rows=2 width=4) (actual time=432.266..4457.799 rows=2 loops=1)
   Join Filter: (test1.user_id = test2.user_id)
   ->  Bitmap Heap Scan on test1  (cost=2.54..4.57 rows=2 width=4) (actual time=0.057..0.063 rows=2 loops=1)
         Recheck Cond: (user_id = ANY ('{9,91}'::integer[]))
         ->  Bitmap Index Scan on test1_user_id_key  (cost=0.00..2.54 rows=2 width=0) (actual time=0.050..0.050 rows=2 loops=1)
               Index Cond: (user_id = ANY ('{9,91}'::integer[]))
   ->  Materialize  (cost=1.23..64339.94 rows=100 width=4) (actual time=0.011..1942.046 rows=495000 loops=2)
         ->  Merge Join  (cost=1.23..59339.94 rows=100 width=4) (actual time=0.018..2805.842 rows=91 loops=1)
               Merge Cond: (test2.resume_id = test3.resume_id)
               ->  Index Scan using test2_resume_id_key on test2  (cost=0.00..22170.28 rows=100 width=8) (actual time=0.007..571.851 rows=91 loops=1)
               ->  Index Scan using test3_resume_id_key on test3  (cost=0.00..22170.28 rows=100 width=4) (actual time=0.006..594.484 rows=91 loops=1)
 Total runtime: 4467.887 ms

-- Maxim Boguk
Senior Postgresql DBA.
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?
MoiKrug: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
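[Editor's note: the self-contained test case referenced above is not reproduced in this message. The sketch below is a hypothetical reconstruction of the kind of schema that produces this plan shape; only the table and index names are taken from the plans, while the row counts and data distribution are assumptions.]

```sql
-- Hypothetical reconstruction, not the reporter's original script.
create table test1 (user_id integer);
create unique index test1_user_id_key on test1 (user_id);
create table test2 (user_id integer, resume_id integer);
create index test2_resume_id_key on test2 (resume_id);
create table test3 (resume_id integer);
create index test3_resume_id_key on test3 (resume_id);

insert into test1 select s from generate_series(1, 1000000) s;
insert into test2 select s, s from generate_series(1, 1000000) s;
insert into test3 select s from generate_series(1, 1000000) s;
analyze test1; analyze test2; analyze test3;
```

The pathology visible in the second plan is the Materialize over the merge join: for each outer row the semi-join scans the materialized join output until it finds a matching user_id, so high user_id values force it to step over hundreds of thousands of rows (rows=495000 per loop) before a match is found.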
Re: [BUGS] BUG #6326: SELECT ARRAY(SELECT ...) syntax and array[] results doesn't work
On Mon, Dec 5, 2011 at 2:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> maxim.bo...@gmail.com writes:
>> SELECT ARRAY(SELECT ...) doesn't work when the subselect returns an array.
>
> Is that syntax supposed to work with anyarray types? No.
>
> regards, tom lane

Hi. Thank you very much for the answer. Now the second problem (possibly related to the first, because the error is the same): array_agg doesn't work with anyarray either:

sports=# select array_agg(val) FROM (SELECT array[1,2]::integer[] as val UNION ALL SELECT array[3,4]::integer[]) as t;
ERROR:  could not find array type for data type integer[]

PS: I am trying to create a two-dimensional integer array from the results of a query like:

select val1::integer, val2::integer from somequery

to get an integer[][] with content like [[val1-1,val1-2], [val2-1,val2-2], [val3-1,val3-2], ...]. Is there any way to perform that task with reasonable efficiency?

-- Maxim Boguk
Senior Postgresql DBA.
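[Editor's note: a commonly used workaround on servers of this era is a custom aggregate over array_cat, which concatenates array inputs instead of trying to build an "array of arrays". This is a sketch, not something suggested in this thread, and `array_cat_agg` is a hypothetical name.]

```sql
-- Sketch of a workaround for aggregating rows into a 2-D array.
CREATE AGGREGATE array_cat_agg(anyarray) (
    SFUNC = array_cat,
    STYPE = anyarray
);

-- Wrapping each row in ARRAY[ARRAY[...]] keeps the result two-dimensional:
SELECT array_cat_agg(ARRAY[ARRAY[val1, val2]])
FROM (VALUES (1, 2), (3, 4), (5, 6)) AS t(val1, val2);
-- expected shape: {{1,2},{3,4},{5,6}}
```

Because array_cat concatenates along the first dimension when its inputs have matching inner dimensions, each one-row 2-D array is appended as a new "row" of the result.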
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Mon, Nov 28, 2011 at 6:02 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Nov 25, 2011 at 6:33 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Maxim Boguk <maxim.bo...@gmail.com> writes:
>>> I know GIST on intarray[] does not have that problem. Very likely the problem is limited to intarray[] GIN indexes only (but I am going to test some other not-well-known GIN indexes tomorrow). Broken FTS indexes on Hot Standby would have been noticed years ago.
>>
>> You might think that, but you'd be wrong :-(.
>
> Yes, that did sound ominous.
>
>> ginRedoUpdateMetapage is failing to restore the contents of the pending-list correctly, which means this is broken for all types of GIN indexes. Will fix.
>
> Great detective work Tom, as ever, much appreciated.

Thank you very much. Will that fix be included in the next minor releases (especially 9.1.2)?

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Fri, Nov 25, 2011 at 11:17 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Thu, Nov 24, 2011 at 11:12 PM, Maksym Boguk <maxim.bo...@gmail.com> wrote:
>> postgres=# SELECT * from test where sections '{2000}';
>>  id | sections
>> ----+----------
>> (0 rows)
>
> Ooops. Can you see if this is just intarray or if there are other failing cases? It would be good to get more info on this before I start investigating. Thanks

I know GIST on intarray[] does not have that problem. Very likely the problem is limited to intarray[] GIN indexes only (but I am going to test some other not-well-known GIN indexes tomorrow). Broken FTS indexes on Hot Standby would have been noticed years ago, and I have never heard such reports.
Re: [BUGS] BUG #6296: High level backend-write activity on db
On Wed, Nov 16, 2011 at 4:07 AM, Maksym Boguk <maxim.bo...@gmail.com> wrote:
>> Again, I am not sure whether it is an actual bug or something else going wrong with my database. One of my databases just started producing 256 WAL segments in 5 minutes instead of the average 100-200 WAL segments per hour. At the same time, write and read activity on the DB stayed the same (according to pg_stat_database data and historical graphs). No anti-wraparound vacuum or similar activity is going on. The situation has stayed the same for the last few days. I found some other anomalies:
>>
>> 1) 5 minutes after SELECT pg_stat_reset_shared('bgwriter');
>>
>> mirtesen=# SELECT * from pg_stat_bgwriter;
>>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
>> -------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
>>                  0 |               2 |             171675 |           804 |                0 |         1010429 |       1131106
>> (1 row)
>>
>> i.e. almost every new buffer allocated by a backend leads to that backend writing out a dirty page (and that situation does not change over time).
>
> This sure sounds like a ring buffer is being used, suggesting VACUUM or COPY IN activity. What does pg_stat_activity say?
>
> -- Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I am very sorry; it was a false alert. Monitoring was based on sum() over pg_stat_all_tables, but the problem was a buggy cron script which was looping every minute over:

create table something_tmp as query (like 20M entries);
-- and an instant after:
drop table something_tmp;

As a result of that instant drop, the insert volumes were missed by the monitoring.

PS: xlogdump is a really great tool, and it was a good help in locating and fixing the problem.

-- Maxim Boguk
Senior Postgresql DBA.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6288: Is ALTER ROLE set client_encoding broken in 9.1?
Thank you very much for the information. I lost some brain cells fighting with that problem yesterday. However, personally I think this behavior is rather non-intuitive:

postgres@db13:~$ psql -U slony -d test -c 'show client_encoding'
 client_encoding
-----------------
 UTF8
(1 row)

postgres@db13:~$ psql -U slony -d test -c 'show client_encoding' | head -10
 client_encoding
-----------------
 KOI8R
(1 row)

On Fri, Nov 11, 2011 at 1:55 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maksym Boguk <maxim.bo...@gmail.com> writes:
>> Description: Is ALTER ROLE set client_encoding broken in 9.1?
>
> No, but I think psql prefers to set its encoding according to its locale environment these days.
>
> regards, tom lane

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6267: Wrong results in pg_database_size
> Maksym Boguk <maxim.bo...@gmail.com> writes:
>> Somehow pg_database_size is producing results which are 2x larger than reality.
>
> I can't reproduce any such problem here. I suspect you overlooked some tablespaces in your manual du commands.
>
> regards, tom lane

No other tablespaces exist, as can be seen in the output of:

SELECT oid,spcname,spclocation,pg_size_pretty(pg_tablespace_size(spcname)) from pg_tablespace;

  oid  |  spcname   |  spclocation  | pg_size_pretty
-------+------------+---------------+----------------
  1663 | pg_default |               | 21 MB
  1664 | pg_global  |               | 1092 kB
 16443 | main       | /db/base/main | 124 GB

However, I seem to have found the reason for such strange behavior. It is a duplicate link in pg_tblspc:

postgres@db:~$ ls -la /var/lib/postgresql/9.0/main/pg_tblspc/
total 8
drwx------  2 postgres postgres 4096 Jun  3 04:08 .
drwx------ 12 postgres postgres 4096 Sep 28 17:08 ..
lrwxrwxrwx  1 postgres postgres   13 Jun  6 14:39 16384 -> /db/base/main
lrwxrwxrwx  1 postgres postgres   13 Jun  3 04:08 16443 -> /db/base/main

I have no idea how this could have happened. No crashes/power-offs and so on have happened on that database for more than a year. Can the second (16384 -> /db/base/main) link be safely deleted?

-- Maxim Boguk
Senior Postgresql DBA.
[BUGS] BUG #6097: Server crash when enabling custom_variable_classes
The following bug has been logged online:

Bug reference: 6097
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0, 9.1beta
Operating system: Linux/FreeBSD

Description: Server crash when enabling custom_variable_classes

Details:
I found a crash bug in the custom_variable_classes handling. The test case is simple: start the server with an empty custom_variable_classes, then add to the config file:

custom_variable_classes = 'something'    # list of custom variable class names
something.location = 'ru'

and then perform select pg_reload_conf();

Oops... you no longer have a working server (but client connections stay alive). The same crash happens with 9.0 and 9.1beta. Adding just custom_variable_classes = '' and pg_reload_conf() does not crash the server, but after I added .location = 'ru' to the config and performed a second pg_reload_conf(), the server crashed again. The same happens if the server is started with just custom_variable_classes = '': after I added .location = 'ru' and performed pg_reload_conf(), the server crashed. However, with both options enabled the server starts and works without any issues.

PS: it is exceptionally strange that no one happened to be hit by this problem before.
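[Editor's note: collecting the reproduction recipe above into one config fragment; this only restates the reporter's description, and 'something' is the reporter's own example class name.]

```
# postgresql.conf additions that trigger the crash on reload,
# when added after the server was started without them:
custom_variable_classes = 'something'    # list of custom variable class names
something.location = 'ru'
```

After editing the file, `select pg_reload_conf();` from any session triggers the crash; starting the server with both lines already present works fine.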
Re: [BUGS] BUG #6088: copy to stdout cannot be stopped with kill(pid) or pg_terminate_backend
> ..., table_name, pk_id, old_row, new_row) TO stdout; with pg_terminate_backend(procpid) or kill (procpid).

Which FreeBSD are you on? Version and architecture. Can you obtain a backtrace of the backend you've sent the kill signal to, after you've sent the signal? Given the rest of what you've said, I'll bet it's blocked in the kernel waiting to send data down a socket. If the kernel's socket writes aren't interrupted by signals, there's nothing much Pg can do about it.

> At the same time copy to stdout completely ignores the fact that the other side (pg_dump itself) was killed a long time ago.

That's got to be a buffering issue, surely. Once the TCP send buffers fill up, the backend would block in write() as the OS tries to send enough data to make room in the send buffer for the next write. If your OS keeps big buffers, Pg might've buffered many megabytes of data before you killed the client side. It might keep on trying to send that data for quite a long time if it doesn't get notified that the other peer (pg_dump in this case) has vanished.

How were you running pg_dump? ssh was involved, so I'm wondering if you were running pg_dump over an ssh tcp tunnel. Was that it? Or were you running pg_dump on the server side and dumping the SQL text down the ssh connection to the client? If the latter, then unless you killed ssh as well the OS would merrily keep on sending the already-buffered data from before pg_dump was killed; you'd have to kill sshd or the client side of ssh to actually stop the data flow.

To help with this sort of thing, reduce your send and receive buffer sizes. You'll pay a price in terms of throughput, but you'll gain responsiveness. Without more information, it's hard to know if this is anything except a buffering issue. Possibly one that doesn't even have anything to do with Pg - it could easily just be sshd writes being buffered.

-- Craig Ringer

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6088: copy to stdout cannot be stopped with kill(pid) or pg_terminate_backend
After some experiments I found that not only COPY TO STDOUT but also a simple SELECT count(*) FROM changes cannot be stopped. However, the SELECT count(*) finishes successfully, and no long IO locks or waits were found in the ktrace results. Because the situation seemed unusual, I tried to find what is unusual about the table itself.

Table size:

SELECT pg_size_pretty(pg_relation_size('changes'));
 pg_size_pretty
----------------
 152 GB

However, and here is the unusual part: the first 135GB of the table is completely dead/empty space without a single live tuple. (How that happened is another story; in short, it is the result of many unsuccessful attempts by Slony to populate the table on a slave server: begin; delete from changes; copy table from stdin; get an error (cluster-wide statement timeout); abort; repeated N times overnight.)

So (and here goes pure theory): while the code performs a seq scan over a large empty space, it does not check for interrupts while looping over the completely dead/empty area.

PS: just to compare, the backtrace of the unstoppable count(*):

#0  0x000801961b5c in read () from /lib/libc.so.7
#1  0x00658968 in FileRead (file=7, buffer=0xc8edc6ae0 "╫\023", amount=8192) at fd.c:1151
#2  0x0066b2b2 in mdread (reln=0xdd08895e0, forknum=MAIN_FORKNUM, blocknum=762045, buffer=0xc8edc6ae0 "╫\023") at md.c:618
#3  0x0066ce5a in smgrread (reln=0xdd08895e0, forknum=MAIN_FORKNUM, blocknum=762045, buffer=0xc8edc6ae0 "╫\023") at smgr.c:401
#4  0x00653495 in ReadBuffer_common (smgr=0xdd08895e0, isLocalBuf=0 '\000', forkNum=MAIN_FORKNUM, blockNum=762045, mode=RBM_NORMAL, strategy=0xdd08b0410, hit=0x7fffde53 "") at bufmgr.c:426
#5  0x00652eea in ReadBufferExtended (reln=0xdd0849f38, forkNum=MAIN_FORKNUM, blockNum=762045, mode=RBM_NORMAL, strategy=0xdd08b0410) at bufmgr.c:240
#6  0x00468ec9 in heapgetpage (scan=0xdd08b, page=762045) at heapam.c:221
#7  0x0046a706 in heapgettup_pagemode (scan=0xdd08b, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:808
#8  0x0046b006 in heap_getnext (scan=0xdd08b, direction=ForwardScanDirection) at heapam.c:1320
--part before here is the same as I see in the copy to stdout backtrace
#9  0x005a4e5a in SeqNext (node=0xdd08afa10) at nodeSeqscan.c:93
#10 0x00593a10 in ExecScan (node=0xdd08afa10, accessMtd=0x5a4d40 <SeqNext>) at execScan.c:68
#11 0x005a4eba in ExecSeqScan (node=0xdd08afa10) at nodeSeqscan.c:131
#12 0x00588f85 in ExecProcNode (node=0xdd08afa10) at execProcnode.c:363
#13 0x00599cec in agg_retrieve_direct (aggstate=0xdd08af550) at nodeAgg.c:945
#14 0x00599b10 in ExecAgg (node=0xdd08af550) at nodeAgg.c:837
#15 0x00589093 in ExecProcNode (node=0xdd08af550) at execProcnode.c:431
#16 0x00586794 in ExecutePlan (estate=0xdd08af030, planstate=0xdd08af550, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0xdd08ab790) at execMain.c:1504
#17 0x005849a5 in standard_ExecutorRun (queryDesc=0xdd0830c30, direction=ForwardScanDirection, count=0) at execMain.c:309
#18 0x005848a5 in ExecutorRun (queryDesc=0xdd0830c30, direction=ForwardScanDirection, count=0) at execMain.c:258
#19 0x00674e18 in PortalRunSelect (portal=0xdd0885030, forward=1 '\001', count=0, dest=0xdd08ab790) at pquery.c:953
#20 0x00674ab9 in PortalRun (portal=0xdd0885030, count=9223372036854775807, isTopLevel=1 '\001', dest=0xdd08ab790, altdest=0xdd08ab790, completionTag=0x7fffe570 "") at pquery.c:779
#21 0x0066f0e6 in exec_simple_query (query_string=0x8037e7030 "SELECT count(*) from changes\n;") at postgres.c:997
#22 0x00672f72 in PostgresMain (argc=4, argv=0x803732930, username=0x803732900 "pgsql") at postgres.c:3676
#23 0x0063b5cd in BackendRun (port=0x803703c00) at postmaster.c:3467
#24 0x0063ab2a in BackendStartup (port=0x803703c00) at postmaster.c:3081
#25 0x006382dc in ServerLoop () at postmaster.c:1387
#26 0x00637abb in PostmasterMain (argc=3, argv=0x7fffeb88) at postmaster.c:1040
#27 0x005c0b5a in main (argc=3, argv=0x7fffeb88) at main.c:188

On Sat, Jul 2, 2011 at 10:00 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> Hi and thanks for responding...
While performing my tests I used pg_dump via a local socket (pg_dump -F c -Z 0 -t changes -a db >/dev/null). Now:

billing=# SELECT * from db_activity;
       age       | datname | procpid | usename | waiting |          query_start          | client_addr | client_port |                          current_query
-----------------+---------+---------+---------+---------+-------------------------------+-------------+-------------+------------------------------------------------------------------
 00:03:07.609308 | billing |   36353 | pgsql   | f       | 2011-07-02 15:50:21.508394+04 |             |          -1 | COPY bill.changes (id, cdate, mdate, status, table_name, pk_id, old_row, new_row
[BUGS] BUG #6087: Unnest with multidimensional arrays
The following bug has been logged online:

Bug reference: 6087
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0
Operating system: Linux

Description: Unnest with multidimensional arrays

Details:
I am not sure whether it is an actual bug or intended behaviour. In the documentation unnest is described as: "expand an array to a set of rows". However, in the case of multidimensional arrays the documentation is misleading or the actual behaviour is wrong.

Test case:

postgres=# select unnest(array[[1,2], [3,4], [5,6]]);
 unnest
--------
      1
      2
      3
      4
      5
      6
(6 rows)

I think the correct result should be:

 unnest
--------
 {1,2}
 {3,4}
 {5,6}
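[Editor's note: unnest does fully flatten multidimensional arrays by design. A common workaround on servers of this era (a sketch, not from this report) is to slice along the first dimension with generate_subscripts; note that each slice remains two-dimensional.]

```sql
-- Sketch: emit one first-dimension slice per row.
SELECT (array[[1,2],[3,4],[5,6]])[s:s] AS slice
FROM generate_subscripts(array[[1,2],[3,4],[5,6]], 1) AS s;
--  slice
-- ---------
--  {{1,2}}
--  {{3,4}}
--  {{5,6}}
```

The slice syntax arr[s:s] keeps the array's original dimensionality, which is why each row is {{1,2}} rather than the {1,2} the reporter expected.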
[BUGS] BUG #5946: Long exclusive lock taken by vacuum (not full)
The following bug has been logged online:

Bug reference: 5946
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4
Operating system: Linux

Description: Long exclusive lock taken by vacuum (not full)

Details:
From the documentation I know that vacuum (without full) can truncate empty pages from the end of a relation if they are free and vacuum successfully grabbed an exclusive lock for a short time. However, I wasn't ready to learn that a 'short exclusive lock' can last 10-20 minutes in some cases. In my case vacuum tried to truncate the last 10-15GB of a 100GB relation, and each of the 3 times it cost 10+ minutes of service downtime (because that table was completely locked). Is that correct behaviour? Is there any way to speed up that process, or at least to allow read-only queries during that time?

PS: no excessive disk IO was observed during those 10+ minute locks.
Re: [BUGS] BUG #5946: Long exclusive lock taken by vacuum (not full)
On Sat, Mar 26, 2011 at 4:17 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.bo...@gmail.com> writes:
>> In my case vacuum tried to truncate the last 10-15GB of a 100GB relation, and each of the 3 times it cost 10+ minutes of service downtime (because that table was completely locked). Is that correct behaviour? Is there any way to speed up that process, or at least to allow read-only queries during that time?
>
> Use autovacuum --- if there's something that wants to access the table, autovac will get kicked off the lock. (Of course, the table may never get truncated then, but maybe you don't care.)
>
> regards, tom lane

Thank you for the idea. Can having lots of empty pages at the end of the table have any negative impact on database performance (assuming I have plenty of free disk space)? In my case this 100GB table is going to be reduced to 20GB of actual data located at the start of the table, so I worry about the possible negative impact of keeping an extra 80GB of free space at the end of the table.

Regards, Maxim
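[Editor's note: a hedged sketch, not from this thread, of how one can estimate the empty tail that vacuum would try to truncate; `some_table` is a placeholder name, and the 8192-byte page size is the default build-time assumption.]

```sql
-- Compare the highest page that still holds a live tuple against the
-- physical relation size; the difference is the truncatable tail.
-- The ctid::text::point cast is a well-known trick for extracting the
-- page number from a tuple's (page,offset) ctid.
SELECT max((ctid::text::point)[0])::bigint + 1 AS last_used_page,
       pg_relation_size('some_table') / 8192   AS total_pages
FROM some_table;
```

On a table like the one described here, last_used_page would be far below total_pages, which is exactly the situation where vacuum attempts the long tail truncation under its exclusive lock.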
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Hi

Yes, there is one BEFORE UPDATE trigger on the relation:

Triggers:
    _billing_main_denyaccess_71 BEFORE INSERT OR DELETE OR UPDATE ON ctrl_servers FOR EACH ROW EXECUTE PROCEDURE _billing_main.denyaccess('_billing_main')

However, that trigger should not fire at all, because the stored procedure works with set local session_replication_role to 'replica';

But the tuple becomes virtual in these lines:

2027    newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
2028                                    tupleid, tuple);
2029
2030    if (newtuple == NULL)        /* do nothing */
2031        return;
2032
2033    if (newtuple != tuple)       /* modified by Trigger(s) */
2034    {
2035        /*
2036         * Put the modified tuple into a slot for convenience of routines
2037         * below.  We assume the tuple was allocated in per-tuple memory
2038         * context, and therefore will go away by itself.  The tuple table
2039         * slot should not try to clear it.
2040         */
2041        TupleTableSlot *newslot = estate->es_trig_tuple_slot;
2042
2043        if (newslot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
2044            ExecSetSlotDescriptor(newslot, slot->tts_tupleDescriptor);
2045        ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
2046        slot = newslot;
2047        tuple = newtuple;
2048    }

To be more exact, at the line:

2047        tuple = newtuple;

What I didn't understand at first is why that code path fails only once per 1000-10000 calls. I have an idea that

2033    if (newtuple != tuple)       /* modified by Trigger(s) */

can sometimes return true even if no triggers have really been run. After more code digging I think I found the reason. It looks like this happens in this part of the ExecBRUpdateTriggers code:

02273    /*
02274     * In READ COMMITTED isolation level it's possible that newtuple was
02275     * changed due to concurrent update.  In that case we have a raw subplan
02276     * output tuple and need to run it through the junk filter.
02277     */
02278    if (newSlot != NULL)
02279        intuple = newtuple = ExecRemoveJunk(relinfo->ri_junkFilter, newSlot);

The rarity of the situation becomes understandable (it is a very, very rare case when the tuple gets changed in such a way), and this really does create a virtual tuple. It looks like the tuple materialization code needs to be put here as well.

PS: again, maybe this is just handwaving... I have only just started learning the PostgreSQL code. Any help from more skilled persons would be great.

On Mon, Feb 21, 2011 at 10:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.bo...@gmail.com> writes:
>> I tracked the virtual tuple from heaptuple.c::slot_getattr down to execMain.c::ExecUpdate, and I think somewhere along this way the virtual tuple needs to be materialized into a physical tuple.
>
> That happens at the top of ExecUpdate:
>
>     /*
>      * get the heap tuple out of the tuple table slot, making sure we have a
>      * writable copy
>      */
>     tuple = ExecMaterializeSlot(slot);
>
> I don't see any code path there that could re-virtualize the slot. Do you have any triggers or constraints on the target table? Because those are about the only things that influence this code path at all...
>
>> Again I am not sure, but the ExecProcessReturning function seems a good candidate to perform tuple materialization.
>
> If ExecProcessReturning were given a virtual slot, materializing it would do no good at all for this problem, because the system columns in the result would be garbage. It has to receive an already materialized slot that contains the right information beforehand. The only reason that retrieving ctid can work at all is that when heap_update is called, tuple is pointing at the slot's contained tuple and so setting tuple->t_self changes the contents of the slot.
That whole function is completely dependent on the slot contents being physical not virtual, and I don't see anything there that would break it.

regards, tom lane

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
On Tue, Feb 22, 2011 at 1:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> Ugh. That quick little ExecRemoveJunk is a lot more dangerous than it looks.
>
> I had actually looked at this before, but concluded it was OK because I couldn't reproduce the problem with a trigger in place. I guess I wasn't unlucky enough to have the chance pointer equality occur. On closer inspection, the palloc collision is actually extremely likely, because what will happen is we'll pfree the old tuple and immediately palloc the new one, and if the new one is of sufficiently similar size that it needs the same size of alloc chunk, it's *guaranteed* to get that same chunk back, because of the LIFO free-chunk chains in aset.c.
>
> The reason that the problem is hard to reproduce is that most triggers (certainly those written in plpgsql) will give back newly allocated tuples even when you return the unmodified NEW tuple. The only way to expose the problem is for ExecBRUpdateTriggers' trigger-calling loop to not replace the newtuple, and the easiest way for that to happen is if all the triggers are disabled. So that's why you're seeing it when fooling with the replication-role setting. I was able to reproduce the failure by creating a trigger with a false WHEN condition, and of course there are other ways to prevent a trigger from being called too.
>
> regards, tom lane

Thank you very much for fixing that issue. I apologize for not providing the critical details at the start of the discussion (until yesterday they seemed unimportant to me); I will try to provide all possible details in the future if I find anything else like this. Now I can start digging into the next strange problem, described here: http://archives.postgresql.org/pgsql-admin/2011-01/msg00055.php

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Today I performed some more digging with the code and gdb. I am not sure, but maybe I have an idea of what happens here: RETURNING ctid fails when the code implementing RETURNING happens to get a virtual tuple as input. And that of course fails, because a virtual tuple does not have any system columns (such as ctid). (It seems no one hit this error before because no one tried to use system columns in a RETURNING list.)

I tracked the virtual tuple from heaptuple.c::slot_getattr down to execMain.c::ExecUpdate, and I think somewhere along this way the virtual tuple needs to be materialized into a physical tuple. I have zero experience with reporting such results, so I will try my best. The path of that virtual tuple in the problem case looks like:

*(slot_getattr::slot)
*(ExecEvalVar::econtext->ecxt_scantuple)
*(ExecTargetList::econtext->ecxt_scantuple)
*(ExecProject::econtext->ecxt_scantuple)
*(ExecProject::projInfo->pi_exprContext->ecxt_scantuple)
*(ExecProcessReturning::projectReturning->pi_exprContext->ecxt_scantuple)
*(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple)

All these variables contained the same TupleTableSlot structure:

(gdb) print *(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple)
$29 = {
  type = T_TupleTableSlot,
  tts_isempty = 0 '\000',
  tts_shouldFree = 0 '\000',
  tts_shouldFreeMin = 0 '\000',
  tts_slow = 0 '\000',
  tts_tuple = 0x0,
  tts_tupleDescriptor = 0xdd09d1030,
  tts_mcxt = 0xdd08da2e0,
  tts_buffer = 0,
  tts_nvalid = 11,
  tts_values = 0xdd09d1840,
  tts_isnull = 0xdd09d18d0 "",
  tts_mintuple = 0x0,
  tts_minhdr = {
    t_len = 0,
    t_self = {
      ip_blkid = {
        bi_hi = 0,
        bi_lo = 0
      },
      ip_posid = 44192
    },
    t_tableOid = 13,
    t_data = 0x10
  },
  tts_off = 59333634048
}

Again, I am not sure, but the ExecProcessReturning function seems a good candidate to perform tuple materialization.

PS: Maybe this all seems like just a lot of handwaving, but I have only just started studying gdb and the PostgreSQL source code.

PPS: I must say thanks to the authors of http://doxygen.postgresql.org. It is a great tool and saved me a lot of time today.

On Sun, Feb 20, 2011 at 12:01 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Maxim Boguk <maxim.bo...@gmail.com> writes:
>>> I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.
>>
>> Now that you know what query is causing the problem:
>>
>> #9 0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0 "") at pquery.c:196
>>
>> it shouldn't be too hard to prepare a self-contained test case so that others can look at this.
>>
>> regards, tom lane
>
> Sorry, but it's very unlikely I can create a self-contained test case. That error doesn't happen on a stand-alone idle server. It only happens at random times on a table with a very high update rate in a production environment, and it happens once per 1-30 minutes of calling my stored procedure in an endless loop (i.e. once per 1000+ calls). I was long ago sure about the exact place where that error happens (because my stored procedure contains only one real SQL query), but because I can't create a self-contained test case I was asked to use gdb and try to get a backtrace at the point of error. So now I can examine and provide variable values at that point to give a clue about what happens here.

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
    ...count=9223372036854775807, isTopLevel=1 '\001', dest=0xdd08996f0, altdest=0xdd08996f0, completionTag=0x7fffe570 "") at pquery.c:779
#35 0x0066f0e6 in exec_simple_query (query_string=0x8037e5030 "\nSELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 13, 1)\n") at postgres.c:997
#36 0x00672f72 in PostgresMain (argc=4, argv=0x803732930, username=0x803732900 "pgsql") at postgres.c:3676
#37 0x0063b5cd in BackendRun (port=0x803703c00) at postmaster.c:3467
#38 0x0063ab2a in BackendStartup (port=0x803703c00) at postmaster.c:3081
#39 0x006382dc in ServerLoop () at postmaster.c:1387
#40 0x00637abb in PostmasterMain (argc=3, argv=0x7fffeb88) at postmaster.c:1040
#41 0x005c0b5a in main (argc=3, argv=0x7fffeb88) at main.c:188

On Wed, Jan 5, 2011 at 4:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Dec 22, 2010 at 4:53 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
>> About the stack backtrace: I am not sure how to get it (and even worse, the database is a heavily loaded production DB, so I am not sure creating a core dump would be a good idea). Still, I would like to receive any help with creating a stack backtrace.
>
> Well, if you can reproduce this problem in a given session, you can use pg_backend_pid() at the start of the session to get the backend ID, and then if your build has debugging symbols (or separate debuginfos you can install), you could attach gdb to it, set a breakpoint at the lines in heaptuple.c that can throw this error, reproduce the problem, and then get a backtrace. But if it's happening randomly I don't have a lot of ideas. I do wonder if this could be related to using TIDs that don't actually exist in the table. That might be worth some experimentation.
>
> -- Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- Maxim Boguk
Senior Postgresql DBA.
МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
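Robert's recipe above can be sketched concretely. The breakpoint location is taken from the backtrace Maxim later obtains in this thread (heaptuple.c:1185 on non-stripped 8.4.7); installed debug symbols and shell access to the server are assumptions, and `<backend_pid>` is a placeholder:

```sql
-- In the session that reproduces the error, find the backend's PID first:
SELECT pg_backend_pid();

-- Then, from a shell on the database server, attach gdb to that backend:
--   gdb -p <backend_pid>
--   (gdb) break heaptuple.c:1185    -- the elog() raising "cannot extract
--                                   -- system attribute from virtual tuple"
--   (gdb) continue
--   ... reproduce the problem in the attached session ...
--   (gdb) bt                        -- print the stack backtrace
```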
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Just an idea from code examination: I think the problem is in the RETURNING ctid part of the query, and in this comment in slot_getattr:

/*
 * system attributes are handled by heap_getsysattr
 */

So maybe the RETURNING implementation somehow handles system attributes incorrectly in the returned values list.

On Sun, Feb 20, 2011 at 1:13 AM, Maxim Boguk maxim.bo...@gmail.com wrote:

Hi all, and hi Robert. I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and a working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.

Full backtrace:

Breakpoint 1, slot_getattr (slot=0xdd09ea468, attnum=-1, isnull=0xdd09b4a68) at heaptuple.c:1185
1185        elog(ERROR, "cannot extract system attribute from virtual tuple");
(gdb) bt
#0  slot_getattr (slot=0xdd09ea468, attnum=-1, isnull=0xdd09b4a68) at heaptuple.c:1185
#1  0x0058a5f8 in ExecEvalScalarVar (exprstate=0xdd09b4b60, econtext=0xdd09b4a80, isNull=0xdd09b4a68, isDone=0xdd09b4ce0) at execQual.c:787
#2  0x005933d7 in ExecTargetList (targetlist=0xdd09b4cb0, econtext=0xdd09b4a80, values=0xdd09b4a50, isnull=0xdd09b4a68, itemIsDone=0xdd09b4ce0, isDone=0x0) at execQual.c:5118
#3  0x005939a3 in ExecProject (projInfo=0xdd09b4be0, isDone=0x0) at execQual.c:5333
#4  0x005879c7 in ExecProcessReturning (projectReturning=0xdd09b4be0, tupleSlot=0xdd09ea468, planSlot=0xdd09ea3a8, dest=0xdd09de5c8) at execMain.c:2273
#5  0x005875fe in ExecUpdate (slot=0xdd09ea468, tupleid=0x7fffd2e0, planSlot=0xdd09ea3a8, dest=0xdd09de5c8, estate=0xdd09e6030) at execMain.c:2142
#6  0x00586c68 in ExecutePlan (estate=0xdd09e6030, planstate=0xdd09ea550, operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, dest=0xdd09de5c8) at execMain.c:1681
#7  0x005849a5 in standard_ExecutorRun (queryDesc=0xdd09de658, direction=ForwardScanDirection, count=0) at execMain.c:309
#8  0x005848a5 in ExecutorRun (queryDesc=0xdd09de658, direction=ForwardScanDirection, count=0) at execMain.c:258
#9  0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0) at pquery.c:196
#10 0x006754e5 in PortalRunMulti (portal=0xdd08c8140, isTopLevel=0 '\000', dest=0xdd09de5c8, altdest=0x9dfdc0, completionTag=0x7fffd4e0) at pquery.c:1269
#11 0x00675127 in FillPortalStore (portal=0xdd08c8140, isTopLevel=0 '\000') at pquery.c:1061
#12 0x006757f8 in PortalRunFetch (portal=0xdd08c8140, fdirection=FETCH_FORWARD, count=10, dest=0x9dfe40) at pquery.c:1397
#13 0x005b24e1 in _SPI_cursor_operation (portal=0xdd08c8140, direction=FETCH_FORWARD, count=10, dest=0x9dfe40) at spi.c:2088
#14 0x005b11a8 in SPI_cursor_fetch (portal=0xdd08c8140, forward=1 '\001', count=10) at spi.c:1258
#15 0x000dd0b1e199 in exec_for_query (estate=0x7fffda50, stmt=0xdd098bfc8, portal=0xdd08c8140, prefetch_ok=1 '\001') at pl_exec.c:4262
#16 0x000dd0b1bc43 in exec_stmt_dynfors (estate=0x7fffda50, stmt=0xdd098bfc8) at pl_exec.c:3107
#17 0x000dd0b1881a in exec_stmt (estate=0x7fffda50, stmt=0xdd098bfc8) at pl_exec.c:1308
#18 0x000dd0b1859a in exec_stmts (estate=0x7fffda50, stmts=0xdd098bc28) at pl_exec.c:1203
#19 0x000dd0b18f38 in exec_stmt_while (estate=0x7fffda50, stmt=0xdd098d6f8) at pl_exec.c:1608
#20 0x000dd0b18733 in exec_stmt (estate=0x7fffda50, stmt=0xdd098d6f8) at pl_exec.c:1264
#21 0x000dd0b1859a in exec_stmts (estate=0x7fffda50, stmts=0xdd098a650) at pl_exec.c:1203
#22 0x000dd0b183df in exec_stmt_block (estate=0x7fffda50, block=0xdd098d778) at pl_exec.c:1140
#23 0x000dd0b16a29 in plpgsql_exec_function (func=0xdd087abd8, fcinfo=0x7fffdcf0) at pl_exec.c:315
#24 0x000dd0b1260f in plpgsql_call_handler (fcinfo=0x7fffdcf0) at pl_handler.c:95
#25 0x0058c902 in ExecMakeTableFunctionResult (funcexpr=0xdd08ca730, econtext=0xdd08ca460, expectedDesc=0xdd08ca5f0, randomAccess=0 '\000') at execQual.c:2016
#26 0x005a3e7d in FunctionNext (node=0xdd08ca350) at nodeFunctionscan.c:64
#27 0x00593a10 in ExecScan (node=0xdd08ca350, accessMtd=0x5a3e10 <FunctionNext>) at execScan.c:68
#28 0x005a3eea in ExecFunctionScan (node=0xdd08ca350) at nodeFunctionscan.c:98
#29 0x00588fdf in ExecProcNode (node=0xdd08ca350) at execProcnode.c:385
#30 0x00586794 in ExecutePlan (estate=0xdd08ca030, planstate=0xdd08ca350, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and a working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.

Now that you know what query is causing the problem:

#9 0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0) at pquery.c:196

it shouldn't be too hard to prepare a self-contained test case so that others can look at this.

regards, tom lane

Sorry, but it's very unlikely that I can create a self-contained test case. The error doesn't happen on a standalone idle server. It only happens at random times, on a table with a very high update rate in a production environment, and it takes 1-30 minutes of calling my stored procedure in an endless loop (i.e. once per 1000+ calls). I was sure long ago about the exact place where the error happens (because my stored procedure contains only one real SQL query), but since I can't create a self-contained test case, I was asked to use gdb and try to get a backtrace at the point of the error. So now I can examine and provide variable values at that point, to give a clue about what happens here.
Re: [BUGS] BUG #5885: Strange rows estimation for left join
On Wed, Feb 16, 2011 at 6:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: The test case looks like:

create table references ( attr_id integer, reference integer, object_id integer );
insert into references select 100*(random()), 10*(random()^10), 100*(random()) from generate_series(1,1000);
create index xif01references on references ( reference, attr_id );
create index xif02references on references ( object_id, attr_id, reference );
analyze references;
explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;

I don't believe there's actually anything very wrong here. The large-looking estimate for the join size is not out of line: if you try different values for object_id you will find that some produce more rows than that and some produce less. If we had cross-column stats we could maybe derive a better estimate, but as-is you're getting an estimate that is probably about right on the average, depending on whether the particular object_id matches to more common or less common reference values. The thing that looks funny is that the inner indexscan rowcount estimate is so small, which is because that's being done on the assumption that the passed-in rs.reference value is random. It's not really --- it's more likely to be one of the more common reference values --- which is something that's correctly accounted for in the join size estimate but not in the inner indexscan estimate.

regards, tom lane

Thank you very much for the answer. Am I correct in my assumption that the estimated row counts on the two sides of the join aren't related to the estimated resulting row count of the join, because they are calculated independently? If that assumption is correct, which values are used to choose between a nested loop and merge/hash joins: the estimated resulting join rows, or the estimated row counts on each side of the join? I'm asking because in some cases these two values can lead to different plans.

PS: I just counted how many questions I have asked on the mailing lists about the PostgreSQL planner. Looks like it's time for me to dust off my memory of C and start reading the planner code myself. :)

Kind Regards, Maxim
Re: [BUGS] BUG #5885: Strange rows estimation for left join
Hi. The test case looks like:

create table references ( attr_id integer, reference integer, object_id integer );
insert into references select 100*(random()), 10*(random()^10), 100*(random()) from generate_series(1,1000);
create index xif01references on references ( reference, attr_id );
create index xif02references on references ( object_id, attr_id, reference );
analyze references;
explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;

On my system (8.4.4) it produces the following results:

postgres=# explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
QUERY PLAN
Nested Loop Left Join (cost=0.00..7.53 rows=107283 width=24)
  -> Index Scan using xif02references on references rs (cost=0.00..0.58 rows=11 width=12)
       Index Cond: (object_id = 1000)
  -> Index Scan using xif01references on references vm (cost=0.00..0.53 rows=8 width=12)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 10))

(again, 11 rows * 8 rows << 107283 rows)

postgres=# explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
QUERY PLAN
Nested Loop Left Join (cost=0.00..7.53 rows=107283 width=24) (actual time=0.077..733.810 rows=117011 loops=1)
  -> Index Scan using xif02references on references rs (cost=0.00..0.58 rows=11 width=12) (actual time=0.036..0.079 rows=10 loops=1)
       Index Cond: (object_id = 1000)
  -> Index Scan using xif01references on references vm (cost=0.00..0.53 rows=8 width=12) (actual time=0.028..37.242 rows=11701 loops=10)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 10))

On Tue, Feb 15, 2011 at 4:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: I found that strange effect while helping with a slow query on a Russian PostgreSQL online forum.

Please try to put together a self-contained test case for this. I could not reproduce such a weird result here, but that probably just means there's something strange about your data distribution.

regards, tom lane
[BUGS] BUG #5885: Strange rows estimation for left join
The following bug has been logged online:

Bug reference: 5885
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: Linux
Description: Strange rows estimation for left join

Details:

I found that strange effect while helping with a slow query on a Russian PostgreSQL online forum. To be short, the results look like:

explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 20084 where rs.object_id = 9129863193713091717;

Nested Loop Left Join (cost=0.00..1567.71 rows=2129718 width=112) (actual time=14.654..14.672 rows=2 loops=1)
  -> Index Scan using xif02references on references rs (cost=0.00..19.85 rows=11 width=56) (actual time=0.019..0.020 rows=2 loops=1)
       Index Cond: (object_id = 9129863193713091717::numeric)
  -> Index Scan using xif01references on references vm (cost=0.00..140.03 rows=55 width=56) (actual time=7.321..7.321 rows=0 loops=2)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 20084::numeric))
Total runtime: 14.758 ms

I.e. a six-orders-of-magnitude error in result row selectivity (2129718 vs 2).

Table description:

CREATE TABLE references (
    attr_id numeric(20,0) NOT NULL,
    reference numeric(20,0) NOT NULL,
    object_id numeric(20,0) NOT NULL
);
CREATE INDEX xif01references ON references USING btree (reference, attr_id);
CREATE INDEX xif02references ON references USING btree (object_id, attr_id, reference);

Yes, again an unfortunate EAV model.
Related data from pg_stats:

schemaname: public
tablename: references

attname: attr_id
null_frac: 0
avg_width: 10
n_distinct: 23
most_common_vals: {20084;20085;9127479646713033746;11;14;15;9127479093313033036;457955;9127306545213993525}
most_common_freqs: {0.234067;0.233433;0.227433;0.0998333;0.0476667;0.0449;0.0361;0.027;0.0247}

attname: reference
null_frac: 0
avg_width: 14
n_distinct: 56252
most_common_vals: {9129830887313872119;9129830887313872121;9129787365613945251;9129676282313943149;24332313945759;...}
most_common_freqs: {0.2497;0.0138333;0.0002;0.00017;0.00017;...}

attname: object_id
null_frac: 0
avg_width: 15
n_distinct: 1.23744e+06
most_common_vals: {9129846527513534962;9129846051413838763;9129846154413978095;9129846403613302858;9129846025513792413;...}
most_common_freqs: {0.0003;0.00027;0.00027;0.00023;0.0002;0.0002;...}

What seems very strange to me is the final estimate of 2M rows when 11 rows are estimated on the right side of the join and 55 rows on the left side (so the final row count should be at most 55*11 ~ 600 rows, even with the worst case of left join selectivity = 1).

Maybe that effect is already fixed in 8.4.5+ (but I have read the lists of changes since 8.4.4 and didn't find any related changes).

PS: Sorry if I completely misunderstood the planner mechanics of left join row estimation.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #5797: Strange bug with hstore
On Tue, Dec 21, 2010 at 7:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: The bad explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::text::hstore->'name';
QUERY PLAN
Seq Scan on domains (cost=0.00..7775.91 rows=1 width=230)
  Filter: ((name)::text = (('name=>somedomain'::text)::hstore -> 'name'::text))

(index not used)

The reason for this is that hstore fails to label its I/O functions as immutable (or even stable) in 8.4. This is fixed in 9.0.

regards, tom lane

Sorry for the late response, and thank you for the clarification. My research of that problem led me to the conclusion that it can be fixed on 8.4.4 with:

ALTER FUNCTION hstore_in(cstring) IMMUTABLE;

But is it really safe? Or is the text->hstore conversion in 8.4 really VOLATILE, as labeled in the hstore module?

Regards, Maxim
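For 8.4 installations that cannot upgrade, the change Maxim proposes can be sketched as follows. This is only a sketch: the table and column names come from the thread, and whether re-labeling a contrib function's volatility is safe is exactly the open question here (9.0, per Tom, ships hstore's I/O functions with the proper labels):

```sql
-- Re-label hstore's input function so the planner may fold the
-- text -> hstore cast at plan time instead of treating it as volatile.
ALTER FUNCTION hstore_in(cstring) IMMUTABLE;

-- With the cast chain foldable, the comparison against "name" becomes a
-- plain constant, so the index on domains(name) can be used again:
EXPLAIN SELECT * FROM domains
WHERE name = ('name=>somedomain'::text::hstore -> 'name');
```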
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
It happened again in a much simpler case (no Slony or deadlock message involved... no Slony at all on the database). The same error:

DBD::Pg::db selectall_arrayref failed: ERROR: cannot extract system attribute from virtual tuple
CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement

It happened while the function was working on a small table only 33 pages in size with 600 live tuples. What makes the situation even easier to track is that I know this table hasn't had any inserts or deletes for ages (only selects and updates), and I have a full log of database queries from the time of the problem.

Log of the call of the procedure:

2010-12-23 00:21:04 MSK pg...@billing 55717 ERROR: cannot extract system attribute from virtual tuple
2010-12-23 00:21:04 MSK pg...@billing 55717 CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement
2010-12-23 00:21:04 MSK pg...@billing 55717 STATEMENT: SELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 21, 1)

At the same time, the log of all actions changing the table ctrl_servers within +/- 3 seconds of the ERROR contains:

2010-12-23 00:21:03 MSK b...@billing 42078 LOG: duration: 0.736 ms statement: UPDATE ctrl_servers SET last_connect = 'now' WHERE ( id = '1514' )

Just one query.

The line where the ERROR happens:

FOR _new_tid IN EXECUTE 'UPDATE ONLY '
    || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
    || ' SET ' || arg_field || ' = ' || arg_field
    || ' WHERE ctid=ANY($1) RETURNING ctid'
    USING _current_tid_list
LOOP

It should be said that _current_tid_list at the start of the procedure call contained all possible tids for page 21 of the relation (not only live tuples, but all possible ones, in a range like {21,1} to {21,300+}).

My idea is that the error happens because of some rare race condition between a tid scan and concurrent updates in the same page (HOT is probably somehow involved, because both the field updated by the function and the field in the concurrent update query are non-indexed fields).

Currently I am working on creating a self-contained test case (unfortunately without much success so far). About the stack backtrace: I'm not sure how to get one (and, even worse, the database is a heavily loaded production DB, so I'm not sure creating a core dump would be a good idea). Still, I would like to receive any help with creating a stack backtrace.

On Wed, Dec 22, 2010 at 5:22 PM, Robert Haas robertmh...@gmail.com wrote:

On Mon, Dec 20, 2010 at 9:48 PM, Maxim Boguk maxim.bo...@gmail.com wrote: Can anyone enlighten me about what happens here?

That does look weird, but without a simple test case I think it's going to be hard to investigate this. Installing Slony and your code and building a 130GB database is more effort than I'm willing to put in... Any way you can get a stack backtrace at the point the error occurs?

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
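The dynamic UPDATE loop quoted in this thread can be wrapped in a self-contained function for experimentation. This is a hypothetical, heavily simplified reconstruction (the real __clear_table_tail belongs to the compacttable utility and is not shown in full here; the function name `touch_tids` is invented):

```sql
-- Hypothetical simplified sketch of the failing loop: a no-op update of
-- every tuple addressed by a list of ctids, returning the tuples' new ctids.
CREATE OR REPLACE FUNCTION touch_tids(arg_relnamespace name,
                                      arg_relname name,
                                      arg_field name,
                                      _current_tid_list tid[])
RETURNS SETOF tid AS $$
DECLARE
    _new_tid tid;
BEGIN
    FOR _new_tid IN EXECUTE 'UPDATE ONLY '
        || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
        || ' SET ' || quote_ident(arg_field) || ' = ' || quote_ident(arg_field)
        || ' WHERE ctid = ANY($1) RETURNING ctid'
        USING _current_tid_list
    LOOP
        -- updated tuples are relocated, so each one gets a fresh ctid
        RETURN NEXT _new_tid;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```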
[BUGS] BUG #5797: Strange bug with hstore
The following bug has been logged online:

Bug reference: 5797
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: Freebsd
Description: Strange bug with hstore

Details:

One day ago I analyzed a slow query for one of my clients and found a strange query plan. After some digging I localized something which I think is a bug. The bug can be seen in these two explains.

The good explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::hstore->'name';
QUERY PLAN
Index Scan using domains_name on domains (cost=0.00..0.29 rows=1 width=230)
  Index Cond: ((name)::text = 'somedomain'::text)

(index used)

The bad explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::text::hstore->'name';
QUERY PLAN
Seq Scan on domains (cost=0.00..7775.91 rows=1 width=230)
  Filter: ((name)::text = (('name=>somedomain'::text)::hstore -> 'name'::text))

(index not used)

As can be seen, no index was used in the second case. I tested some variants and found that conditions like field1 = other_field::text::hstore->'key' never use the index on field1. Of course, the client's case was much more complicated and contained 9 joins... but the troublesome part looked like:

billing=# EXPLAIN SELECT es.params FROM services es JOIN domains dm ON dm.name = (es.params::hstore)->'name' WHERE es.shortname = 'exchange_accepted_domain';
QUERY PLAN
Nested Loop (cost=0.00..27990293.56 rows=3289 width=32)
  Join Filter: ((dm.name)::text = ((es.params)::hstore -> 'name'::text))
  -> Index Scan using services_shortname on services es (cost=0.00..68.50 rows=2406 width=32)
       Index Cond: ((shortname)::text = 'exchange_accepted_domain'::text)
  -> Seq Scan on domains dm (cost=0.00..3918.31 rows=385760 width=15)
(5 rows)
[BUGS] BUG #5798: Some weird error with pl/pgsql procedure
The following bug has been logged online:

Bug reference: 5798
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: FreeBSD 7.2
Description: Some weird error with pl/pgsql procedure

Details:

While testing my table compactor utility (http://code.google.com/p/compacttable/), I sometimes get exceptionally weird errors from a pl/pgsql procedure:

2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] ERROR: cannot extract system attribute from virtual tuple
2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement
2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] STATEMENT: SELECT * from __clear_table_tail('vacancy_view', 'public', 'count_', 25053, 25)

The 50th line of the procedure is:

--trying update all tuples in current page range
FOR _new_tid IN EXECUTE 'UPDATE ONLY '
    || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
    || ' SET ' || arg_field || ' = ' || arg_field
    || ' WHERE ctid=ANY($1) RETURNING ctid'
    USING _current_tid_list
LOOP

This error usually happens together with another error from the Slony side (that one is easy to understand and not a problem at all):

2010-12-21 05:15:08 MSK 24679 sl...@hh from [local] [vxid:37/4336609 txid:20671079] [UPDATE] ERROR: could not serialize access due to concurrent update
2010-12-21 05:15:08 MSK 24679 sl...@hh from [local] [vxid:37/4336609 txid:20671079] [UPDATE] STATEMENT: update only public.vacancy_view set count_='1428' where vacancy_id='3568635';
...

I think there are some strange concurrency issues between pl/pgsql and Slony, and I don't understand the reason for the first error. It is not easy to reproduce, but it happened 3 times for me during test runs of my utility against a 130GB slave database.

Can anyone enlighten me about what happens here?
[BUGS] BUG #5649: strange (probably bugged) explain analyze output
The following bug has been logged online:

Bug reference: 5649
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: FreeBSD 7.2
Description: strange (probably bugged) explain analyze output

Details:

I'm not sure this is an actual bug, but it looks like a bug to me in the EXPLAIN ANALYZE output (or even in the plan itself, though that's unlikely). All tables were analyzed just before the test. All statistics data look reasonably good to me. The query plan is the same on the master and the slave servers, so it isn't a random error on a single server. Here are the details (the most simplified version of a very long analytical query where the effect can be observed):

EXPLAIN ANALYZE
SELECT *
FROM resstat_2010_08
JOIN services ON resstat_2010_08.service_id=services.id
JOIN service_types ON service_types.shortname = services.shortname
LEFT JOIN plan_s ON service_types.abonent_service=true AND plan_s.shortname=service_types.shortname AND resstat_2010_08.tarif_id=plan_s.tarif_id

QUERY PLAN
Merge Left Join (cost=85621.96..88523.79 rows=277396 width=572) (actual time=20128.316..93314.317 rows=272430 loops=1)
  Merge Cond: ((resstat_2010_08.tarif_id = plan_s.tarif_id) AND ((service_types.shortname)::text = (plan_s.shortname)::text))
  Join Filter: service_types.abonent_service
  -> Sort (cost=85302.62..85996.11 rows=277396 width=411) (actual time=20110.866..20517.657 rows=272159 loops=1)
       Sort Key: resstat_2010_08.tarif_id, service_types.shortname
       Sort Method: quicksort Memory: 147305kB
       -> Hash Join (cost=51.47..60223.82 rows=277396 width=411) (actual time=3.413..16929.164 rows=272159 loops=1)
            Hash Cond: ((services.shortname)::text = (service_types.shortname)::text)
            -> Merge Join (cost=42.97..56401.12 rows=277396 width=246) (actual time=1.967..15198.751 rows=272159 loops=1)
                 Merge Cond: (resstat_2010_08.service_id = services.id)
                 -> Index Scan using resstat_2010_08_service_id_key on resstat_2010_08 (cost=0.00..4260.35 rows=277396 width=83) (actual time=0.025..874.850 rows=277396 loops=1)
                 -> Index Scan using services_pkey on services (cost=0.00..42226.11 rows=2770621 width=163) (actual time=0.017..7966.282 rows=2812840 loops=1)
            -> Hash (cost=3.83..3.83 rows=374 width=165) (actual time=1.421..1.421 rows=374 loops=1)
                 -> Seq Scan on service_types (cost=0.00..3.83 rows=374 width=165) (actual time=0.016..0.605 rows=374 loops=1)
  -> Sort (cost=233.86..242.35 rows=3399 width=161) (actual time=15.138..26960.441 rows=19164919 loops=1)
       Sort Key: plan_s.tarif_id, plan_s.shortname
       Sort Method: quicksort Memory: 604kB
       -> Seq Scan on plan_s (cost=0.00..34.49 rows=3399 width=161) (actual time=0.021..5.129 rows=3399 loops=1)
Total runtime: 93715.497 ms

The whole plan looks reasonable to me except this part:

-> Sort (cost=233.86..242.35 rows=3399 width=161) (actual time=15.138..26960.441 rows=19164919 loops=1)
     Sort Key: plan_s.tarif_id, plan_s.shortname
     Sort Method: quicksort Memory: 604kB
     -> Seq Scan on plan_s (cost=0.00..34.49 rows=3399 width=161) (actual time=0.021..5.129 rows=3399 loops=1)

It looks very strange: in that part of the query, 3399 input rows were turned into 19164919 rows after the sort, and that sort took almost 27 seconds. One more strange thing is that the merge join over that result took almost 50 seconds (more than half of the total run time). I feel something is definitely wrong here, but I can't say what.

PS: all field types in the query except shortname are integer, and shortname has type character varying(32).
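One way a Sort node can report far more actual rows than its input holds is a merge join rescanning the inner side: when the outer side contains long runs of duplicate merge keys, the join restores its position and re-reads the matching inner run for every outer duplicate, and EXPLAIN ANALYZE counts each re-read row. A minimal sketch (hypothetical table names, not the thread's data):

```sql
-- Sketch: force a merge join between two tables whose join key is a single
-- duplicated value, then look at the inner Sort's actual row count.
CREATE TABLE outer_t AS SELECT 1 AS k FROM generate_series(1, 1000);
CREATE TABLE inner_t AS SELECT 1 AS k FROM generate_series(1, 100);
ANALYZE outer_t;
ANALYZE inner_t;
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT * FROM outer_t JOIN inner_t USING (k);
-- the Sort over inner_t should report actual rows on the order of
-- 1000 * 100, even though inner_t itself holds only 100 rows
```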
Re: [BUGS] BUG #5503: error in trigger function with dropped columns
Hi all. It looks like no one thinks this behavior is a bug. Then the documentation probably needs to be changed, because http://www.postgresql.org/docs/8.4/interactive/plpgsql-trigger.html says: "To alter the row to be stored, it is possible to replace single values directly in NEW and return the modified NEW, or to build a complete new record/row to return." But in reality, returning a record or row doesn't work in an insert trigger at all when the target table contains dropped columns.

Another interesting test:

CREATE TABLE test1 as select * from test;

Now the test1 table has the same structure as test, so let's try constructing a row instead of a record:

CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row test1%ROWTYPE;
BEGIN
    RAISE NOTICE 'NEW record = %', NEW;
    SELECT * INTO _row FROM test1 limit 1;
    RAISE NOTICE '_row record = %', _row;
    RETURN _row;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_trigger before insert on test for each row EXECUTE PROCEDURE test_function();

INSERT INTO test values (1);
NOTICE: NEW record = (1)
NOTICE: _row record = (1)
ERROR: returned row structure does not match the structure of the triggering table
DETAIL: Number of returned columns (1) does not match expected column count (3).
CONTEXT: PL/pgSQL function test_function during function exit

So I can't return a record, and I can't return a row from a table with the same structure either. And that's all because the trigger function somehow thinks it needs to return all columns of the table, including the dropped ones. If this behavior is not a bug, then the documentation should be changed (because "or to build a complete new record/row to return" will never work if the table contains dropped columns).

On Mon, Jun 14, 2010 at 11:20 AM, Maxim Boguk maxim.bo...@gmail.com wrote:

I see... but anyway, this bug makes it impossible to return a record value from a trigger on a table containing at least one dropped column, and even worse, such a trigger will work on a freshly loaded copy of the production database and would pass all possible tests, but on the production database it stops working. Moreover, a fully functional system can become broken if a single column is dropped from a table that has such a trigger. I.e. the functionality of such a trigger depends on the dropped-column history of the table, which is wrong (IMHO).

I tried another test trigger on a table with a dropped column, and got even funnier results (the trigger expects the returned record to contain all columns of the table, including the dropped ones, so I tried to construct such a record):

CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row record;
BEGIN
    RAISE NOTICE 'NEW record = %', NEW;
    SELECT *,2,3 INTO _row FROM test limit 1;
    RAISE NOTICE '_row record = %', _row;
    RETURN _row;
END;
$$ LANGUAGE plpgsql;

postgres=# insert into test values (1);
NOTICE: NEW record = (1)
NOTICE: _row record = (1,2,3)
ERROR: returned row structure does not match the structure of the triggering table
DETAIL: Returned type integer does not match expected type N/A (dropped column) in column "pg.dropped.2".
CONTEXT: PL/pgSQL function test_function during function exit

I think the changes in 9.0 mask the actual bug instead of fixing it. If I'm wrong, it would still be useful to know how to return a record from a trigger function in that case, because I can't make a working version at all.

On Mon, Jun 14, 2010 at 4:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maksym Boguk maxim.bo...@gmail.com writes: This bug is hard to describe, but in general, if a table contains dropped columns, you cannot return a record variable from a trigger function.

This is fixed for 9.0 ... or at least the specific test case you provide doesn't fail. We have not risked back-porting the change though, because there are other aspects of what the new code does that might cause people problems, eg http://archives.postgresql.org/pgsql-hackers/2010-03/msg00444.php http://archives.postgresql.org/message-id/6645.1267926...@sss.pgh.pa.us

regards, tom lane
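A workaround that sidesteps the whole problem on 8.4 is the first option the documentation mentions: assign into NEW field by field and return NEW, since NEW already carries the triggering table's exact row type, dropped columns included. A sketch, assuming the visible column of the test table is named `id` (the original table DDL isn't shown in the thread):

```sql
-- Workaround sketch: mutate NEW instead of building a fresh record.
CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row test1%ROWTYPE;
BEGIN
    SELECT * INTO _row FROM test1 LIMIT 1;
    NEW.id := _row.id;   -- copy over only the fields you need, one by one
    RETURN NEW;          -- NEW matches the table's row type exactly
END;
$$ LANGUAGE plpgsql;
```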
Re: [BUGS] BUG #5503: error in trigger function with dropped columns
I see... but anyway this bug does not allow use return record value from a trigger in table contained at least one dropped column, and even worse trigger will work on fresh loaded copy of production database and would pass all possible tests, but on production database it is stop working. Moreover, full functional system can become broken if single column dropped from table contained such trigger. E.g. functionality of such trigger depends of dropped column history of the table, which is wrong (IMHO). I was tried another test trigger on table with dropped column, and get even more funny results (trigger awaiting return record contained all rows from table include dropped so I tried construct such record): CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$ DECLARE _row record; BEGIN RAISE NOTICE 'NEW record = %', NEW; SELECT *,2,3 INTO _row FROM test limit 1; RAISE NOTICE '_row record = %', _row; RETURN _row; END; $$ LANGUAGE plpgsql; postgres=# insert into test values (1); NOTICE: NEW record = (1) NOTICE: _row record = (1,2,3) ERROR: returned row structure does not match the structure of the triggering table DETAIL: Returned type integer does not match expected type N/A (dropped column) in column pg.dropped.2. CONTEXT: PL/pgSQL function test_function during function exit I think changes in 9.0 now mask actual bug instead of fix it. If I was wrong, still would be useful to know how to use return record from trigger function in that case, because I can't make a working version at all. On Mon, Jun 14, 2010 at 4:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maksym Boguk maxim.bo...@gmail.com writes: This bug hard to describe. But in general if a table contained dropped columns you cannot use return record variable in trigger function. This is fixed for 9.0 ... or at least the specific test case you provide doesn't fail. 
We have not risked back-porting the change though, because there are other aspects of what the new code does that might cause people problems, eg
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00444.php
http://archives.postgresql.org/message-id/6645.1267926...@sss.pgh.pa.us

regards, tom lane

--
Maxim Boguk
Senior PostgreSQL DBA.
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #5328: GIN index with fastupdates=on provide wrong result on bitmap scan
The following bug has been logged online:

Bug reference: 5328
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.1
Operating system: Linux, kernel 2.6.18
Description: GIN index with fastupdate=on provides wrong results on bitmap scan

Details:

I cannot create a self-contained test case because a freshly loaded copy of the table from a dump does not show the effect. The bug itself is that a bitmap scan over a complex GIN index returns some extra, unrelated tuples. It is not a random single-server issue (the same problem exists on the master database and on both Londiste slaves).

Here is the query that shows the bug:

SELECT * FROM (
    SELECT file_type_did FROM
        file
    WHERE
        file.file_type_did='6903057'::bigint
        AND file.obj_status_did = 1
        AND (file.obj_tsvector @@ (make_tsquery('(море)', 'utf8_russian')))
    OFFSET 0
) AS t WHERE t.file_type_did <> '6903057'::bigint;

 file_type_did
---------------
       6906314
       6902260
       6906314
       6906314
       6902260
       6902260
...
       6902260
       6906314
(83 rows)

I.e. the inner query returned 83 rows with completely unrelated file_type_did values.

Now let me show the explain of the inner query:

db=# EXPLAIN SELECT file_type_did FROM
db-#     file
db-# WHERE
db-#     file.file_type_did='6903057'::bigint
db-#     AND file.obj_status_did = 1
db-#     AND (file.obj_tsvector @@ (make_tsquery('(море)', 'utf8_russian')))
db-# OFFSET 0;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=58.47..722.08 rows=5253 width=8)
   ->  Bitmap Heap Scan on file  (cost=58.47..722.08 rows=5253 width=8)
         Recheck Cond: ((file_type_did = 6903057::bigint) AND (obj_tsvector @@ '''море'''::tsquery) AND (obj_status_did = 1))
         ->  Bitmap Index Scan on i_file__photo_search  (cost=0.00..57.16 rows=5253 width=0)
               Index Cond: ((file_type_did = 6903057::bigint) AND (obj_tsvector @@ '''море'''::tsquery))

Description of the GIN index used:

i_file__photo_search gin (file_type_did, photo_vislvl, obj_tsvector) WHERE obj_status_did = 1

where the related table fields are:

 file_type_did | bigint   | not null
 photo_vislvl  | smallint | not null default 9
 obj_tsvector  | tsvector | not null default ''::tsvector

Additional information:
1) REINDEX INDEX i_file__photo_search; removes the problem for a short time (1-60 minutes, depending on user update activity on the table).
2) VACUUM file; has the same effect for about the same time.
3) An 8.3 server does not have this issue.

My idea is that the bug is related to the GIN fastupdate engine, which was introduced in version 8.4. I will run ALTER INDEX i_file__photo_search SET (fastupdate=off); and VACUUM file; on one of the slaves and check the results over the next 24 hours to confirm or refute this idea.
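The diagnostic step the report proposes can be sketched as the following session (a hedged sketch based on the statements named in the report; run on one replica, assuming the index and table names above):

```sql
-- Disable the GIN fast-update pending list for this index. New entries
-- will then go straight into the main index structure instead of the
-- unsorted pending list that fastupdate maintains.
ALTER INDEX i_file__photo_search SET (fastupdate = off);

-- Vacuum the table to flush any entries already sitting in the
-- pending list into the main GIN structure.
VACUUM file;

-- Re-run the failing query; if the hypothesis is right, the outer
-- query should now return zero rows, since every row truly matches
-- the inner file_type_did condition.
```

If the wrong results stop appearing with fastupdate off while the other replicas (with fastupdate on) keep misbehaving, that isolates the pending-list code path as the cause.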