[BUGS] BUG #8408: Not exactly correct error messages for failed check in presence of dropped columns.
The following bug has been logged on the website:

Bug reference: 8408
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.4
Operating system: Any
Description:

Here is a simple test case:

create table test_columns (id serial primary key, value int check(value < 10));
NOTICE: CREATE TABLE will create implicit sequence "test_columns_id_seq" for serial column "test_columns.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_columns_pkey" for table "test_columns"
CREATE TABLE
insert into test_columns(value) values (100);
ERROR: new row for relation "test_columns" violates check constraint "test_columns_value_check"
DETAIL: Failing row contains (1, 100).
--it's ok...
--now let's drop the value column and add a new one
alter table test_columns drop column value;
ALTER TABLE
alter table test_columns add column value int check (value < 10);
ALTER TABLE
insert into test_columns(value) values (100);
ERROR: new row for relation "test_columns" violates check constraint "test_columns_value_check"
DETAIL: Failing row contains (2, null, 100).
-- now (2, null, 100) sounds weird...

And yes, I completely understand where this NULL came from and why this happened. However, I think the error message should filter out dropped columns and their strange null values (because if a table has a lot of dropped columns, this error message quickly becomes completely unreadable).

Regards,
Maksym

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #8176: problem with the ALTER TYPE name RENAME TO new_name [ CASCADE | RESTRICT ] syntax
The following bug has been logged on the website:

Bug reference: 8176
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.4
Operating system: Any
Description:

Hi,

It seems that either the documentation is wrong, or the [ CASCADE | RESTRICT ] feature of ALTER TYPE ... RENAME TO isn't implemented.

Test case:

create type test as (qqq integer);
CREATE TYPE
alter type test rename to test1 RESTRICT;
ERROR: syntax error at or near "RESTRICT"
LINE 1: alter type test rename to test1 RESTRICT;
alter type test rename to test1 CASCADE;
ERROR: syntax error at or near "CASCADE"
LINE 1: alter type test rename to test1 CASCADE;

However:

[local]:5432 postgres@postgres=# alter type test rename to test1;
ALTER TYPE
[BUGS] BUG #7920: Sequence rename leaves stale value for sequence_name
The following bug has been logged on the website:

Bug reference: 7920
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.3
Operating system: Linux
Description:

sequence_name is left stale after a sequence rename. The test case shows the same problem on versions 9.0, 9.1, and 9.2:

[postgres]=# create sequence qqq;
[postgres]=# SELECT sequence_name FROM qqq;
 sequence_name
---------------
 qqq

[postgres]=# alter sequence qqq rename to lalala;
--surprise
[postgres]=# SELECT sequence_name FROM lalala;
 sequence_name
---------------
 qqq

pg_dump -F p -s postgres | grep qqq
--empty
Re: [BUGS] BUG #7920: Sequence rename leaves stale value for sequence_name
I don't find this particularly surprising. Nothing looks at that field in sequences; IMO there is no point in having the name inside at all. Do you need that for some use case, or did you just happen to notice it? I personally don't see any way to nicely fix that. We could add code to also change the contents, but currently that's generic code. Or we could just remove the column in the next release?

Well, this story began with one very badly named sequence. After a database structure audit, the developers were asked to rename this sequence to something more appropriate. And when they performed alter ... rename, they found that the name was still the same. After that they came to me with questions. The task is to remove that very bad name from the production database altogether. It seems that the easiest way is to drop the sequence and create a new sequence.

+1 for just removing the column in the next release

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
http://www.postgresql-consulting.com/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage."
Re: [BUGS] BUG #7703: Error with partitioning with distinct over aggregate
On Tue, Nov 27, 2012 at 5:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes:
test=# create table test (id serial);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
test=# create index test_id_key on test(id);
CREATE INDEX
test=# create table test_part_1 (like test including all) INHERITS (test);
NOTICE: merging column "id" with inherited definition
CREATE TABLE
test=# explain analyze select distinct min(id) from test;
ERROR: failed to re-find MinMaxAggInfo record

Fixed, thanks for the report.

regards, tom lane

Thank you very much.

PS: I know that the query sounds stupid, but most CMS/ORMs unfortunately can produce an unlimited amount of strange/stupid queries.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
[BUGS] BUG #7703: Error with partitioning with distinct over aggregate
The following bug has been logged on the website:

Bug reference: 7703
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Any
Description:

Simple test case:

test=# create table test (id serial);
NOTICE: CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
test=# create index test_id_key on test(id);
CREATE INDEX
test=# create table test_part_1 (like test including all) INHERITS (test);
NOTICE: merging column "id" with inherited definition
CREATE TABLE
test=# explain analyze select distinct min(id) from test;
ERROR: failed to re-find MinMaxAggInfo record

PS: Without partitioning, or without the index over id, there is no error. The tested 9.0.* versions are not affected by the bug, but 9.2.1 and 9.1.6 are.
[BUGS] BUG #7612: Wrong result with join between two values () set
The following bug has been logged on the website:

Bug reference: 7612
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Linux
Description:

A join between two values() sets could produce wrong results.

Test case:

Correct answer:

SELECT a.val, b.val
FROM (VALUES( (2), (1) )) AS a (val)
JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
 val | val
-----+-----
   2 |   2
(1 row)

Now just change the position of (2) and (1) in a(val):

Wrong answer:

SELECT a.val, b.val
FROM (VALUES( (1), (2) )) AS a (val)
JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
 val | val
-----+-----
(0 rows)

explain (analyze, verbose) results of both queries:

mboguk=# explain (analyze, verbose) SELECT a.val, b.val FROM (VALUES( (2), (1) )) AS a (val) JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..0.04 rows=1 width=8) (actual time=0.070..0.118 rows=1 loops=1)
   Output: *VALUES*.column1, *VALUES*.column1
   Join Filter: (*VALUES*.column1 = *VALUES*.column1)
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.016..0.027 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.013..0.024 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
 Total runtime: 0.209 ms
(8 rows)

mboguk=# explain (analyze, verbose) SELECT a.val, b.val FROM (VALUES( (1), (2) )) AS a (val) JOIN (VALUES( (2), (42) )) AS b (val) ON a.val = b.val;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop (cost=0.00..0.04 rows=1 width=8) (actual time=0.056..0.056 rows=0 loops=1)
   Output: *VALUES*.column1, *VALUES*.column1
   Join Filter: (*VALUES*.column1 = *VALUES*.column1)
   Rows Removed by Join Filter: 1
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.008..0.013 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
   -> Values Scan on *VALUES* (cost=0.00..0.01 rows=1 width=4) (actual time=0.007..0.013 rows=1 loops=1)
        Output: *VALUES*.column1, *VALUES*.column2
 Total runtime: 0.100 ms
(9 rows)
Re: [BUGS] BUG #7612: Wrong result with join between two values () set
On 10/19/12, Vik Reykja vikrey...@gmail.com wrote:
On Thu, Oct 18, 2012 at 5:40 PM, maxim.bo...@gmail.com wrote:
Join between two values() sets could produce wrong results:

This is not a bug; your test case produces correct results. Your VALUES clauses are each producing one row with two columns, called "val" and "column2". You are joining on "val", and so when you switch the values to put 2 in a different column, no results are found.

Is it possible you didn't intend to put the parentheses immediately following the VALUES keyword? I think that is the case, because your parentheses around the numbers are superfluous.

Oops, sorry for the noise. I should be more careful.
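The distinction Vik points out, between one row of two columns and two rows of one column, can be demonstrated outside PostgreSQL as well. A small sketch using Python's built-in sqlite3 (which follows the same VALUES row-constructor rules and names the columns column1, column2, ...):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# VALUES ((1), (2)) -- ONE row with two columns, as in the bug report;
# the inner parentheses are just parenthesized scalars:
one_row = con.execute("SELECT * FROM (VALUES ((1), (2)))").fetchall()

# VALUES (1), (2) -- TWO rows of one column, which the report intended:
two_rows = con.execute("SELECT * FROM (VALUES (1), (2))").fetchall()

# With the intended two-row form, the join finds the match no matter
# which order the values are listed in:
matched = con.execute(
    "SELECT a.column1, b.column1"
    " FROM (VALUES (1), (2)) AS a"
    " JOIN (VALUES (2), (42)) AS b ON a.column1 = b.column1"
).fetchall()
```

Here `one_row` is a single `(1, 2)` tuple, `two_rows` holds `(1,)` and `(2,)`, and `matched` contains the single joined pair.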
[BUGS] BUG #7580: repeatable test case for the BUG #7579
The following bug has been logged on the website:

Bug reference: 7580
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.1
Operating system: Linux
Description:

create table t1 as select id from generate_series(1,10) as g(id);
create table t2 as select id from generate_series(1,10) as g(id);
alter table t1 add primary key (id);
alter table t2 add primary key (id);
analyze t1;
analyze t2;
explain select *
from t1
left outer join (
    select distinct t2.id from t2 where t2.id > 100
) as _t on _t.id = t1.id
where (t1.id < 100 or (t1.id > 100 and _t.id IS NOT NULL));

The connection to the server was lost.
Re: [BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)
The test database had a somewhat unusual tablespace layout: the main tablespace partition was mounted inside the data directory of the old cluster... E.g.:

data directory - /var/lib/postgresql/9.2/main
main tablespace (another partition mount point) - /var/lib/postgresql/9.2/main/largedb

Can you show us the data directory path of the old and new clusters?

--old-datadir=/var/lib/postgresql/9.0/main
--new-datadir=/var/lib/postgresql/9.2/main

The second partition used as a tablespace was mounted as: /var/lib/postgresql/9.0/main/largedb

pg_upgrade really doesn't know what is inside that old cluster, so it just deletes everything under the data directory.

Hmm... maybe it's a good idea to try the opposite way: the default directory and file layout of a PostgreSQL data directory is well documented and almost never changes. Maybe instead of rm -rf on the whole data directory, rm -rf only the files and directories which surely belong to PostgreSQL? Something along the lines of:

1) rm -rf base global pg_clog pg_multixact ... and so on
2) produce a warning if any unusual files are left in the data directory after that (but do not delete them)
3) delete the data directory itself only if that directory is completely empty after steps 1 and 2

PS: I know that solution will not be completely error-proof, but it will prevent the most probable data-loss scenarios. So it's better than nothing.

PS: I also think deleting postgresql.conf and pg_hba.conf from the old data directory is the wrong move too... if the admin forgets to copy pg_hba.conf to the new cluster, those settings could be lost forever after delete_old_cluster.sh.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/
http://www.postgresql-consulting.com/
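The three-step cleanup proposed above can be sketched in a few lines. This is purely illustrative, not pg_upgrade's actual code: the function name `delete_old_cluster` and the (deliberately partial) `KNOWN_ENTRIES` list are assumptions, and the real set of entries varies by server version. It also skips mount points, mimicking rm's --one-file-system behaviour:

```python
import os
import shutil

# Hypothetical, partial list of entries known to belong to a PostgreSQL
# data directory -- illustrative only; the real set varies by version.
KNOWN_ENTRIES = {"base", "global", "pg_clog", "pg_multixact", "pg_xlog",
                 "pg_hba.conf", "postgresql.conf", "PG_VERSION"}

def delete_old_cluster(datadir):
    """Sketch of the proposed safer cleanup:
    1) remove only entries known to belong to PostgreSQL,
    2) warn about anything unexpected and leave it in place,
    3) remove the data directory itself only if it is empty afterwards.
    Mount points are always skipped (the --one-file-system idea)."""
    leftovers = []
    for name in sorted(os.listdir(datadir)):
        path = os.path.join(datadir, name)
        if os.path.ismount(path):
            print(f"WARNING: {path} is on another file system, skipping")
            leftovers.append(name)
        elif name in KNOWN_ENTRIES:
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
        else:
            print(f"WARNING: unexpected entry left in place: {path}")
            leftovers.append(name)
    if not leftovers:
        os.rmdir(datadir)
    return leftovers
```

With this shape, a tablespace mounted at largedb inside the old data directory would survive the cleanup, and the data directory itself would be left in place with a warning instead of being recursively removed.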
[BUGS] BUG #7573: data loss in corner case using delete_old_cluster.sh (pg_upgrade)
The following bug has been logged on the website:

Bug reference: 7573
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.2.0
Operating system: Linux
Description:

Hi,

Today, while performing a migration of a test database (with no critical data... and that was a good thing), I found a very nasty corner case with using delete_old_cluster.sh after pg_upgrade.

The test database had a somewhat unusual tablespace layout: the main tablespace partition was mounted inside the data directory of the old cluster... E.g.:

data directory - /var/lib/postgresql/9.2/main
main tablespace (another partition mount point) - /var/lib/postgresql/9.2/main/largedb

Now the funny part: the migration was successful, but after a few days I decided to clear the old cluster data... I checked the content of delete_old_cluster.sh but found nothing suspicious... just one string:

rm -rf /var/lib/postgresql/9.2/main

Well, I know I should have been more careful, but as a result that command deleted the whole tablespace data on the other partition, including the 9.2 version tablespace. That was a surprise...

Maybe it is a good idea to add:

--one-file-system
    when removing a hierarchy recursively, skip any directory that is on a file system different from that of the corresponding command line argument

to the rm call in that script. However, it is a Linux-only feature.

PS: Yes, I know that keeping any foreign data inside the PostgreSQL data directory is a bad idea.
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
Does that mean that file was damaged during rsync?

Not necessarily. When did you initially set up that cluster? Normally the file should get zeroed out before it's being used. If the cluster was copied improperly (i.e. no pg_start/stop_backup or such) it could easily happen. But I would guess that you did it properly?

That server was the master before a planned switchover to the replica server was performed. Procedure:

1) stop former master
2) trigger replica failover (thus the server becomes the new master)

There is an unusual part because of human error by a system administrator:

2.1) start former master
2.2) stop former master

3) empty the pg_xlog directory on the former master
4) pg_start_backup on the new master
5) re-rsync datafiles to the former master using: rsync -aq --delete --exclude pg_xlog /db/data [former master]:/db/
6) pg_stop_backup on the new master
7) start former master as a replica...

That procedure has been used more than 10 times without any problem in the past. Maybe I'm missing something in it.

Could you show pg_controldata output for primary/replica? To see how new that multixact offset is.
On the failing replica:

[pgsql@db1 ~/tmp/postgresql-9.0.8]$ pg_controldata /db/data_back
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5687760461964967531
Database cluster state:               in archive recovery
pg_control last modified:             Wed Aug 22 13:47:35 2012
Latest checkpoint location:           925/2D41CAA0
Prior checkpoint location:            925/2D41CAA0
Latest checkpoint's REDO location:    925/2C0004C8
Latest checkpoint's TimeLineID:       5
Latest checkpoint's NextXID:          0/1768511856
Latest checkpoint's NextOID:          59880938
Latest checkpoint's NextMultiXactId:  361788644
Latest checkpoint's NextMultiOffset:  822580965
Latest checkpoint's oldestXID:        1569182525
Latest checkpoint's oldestXID's DB:   16424
Latest checkpoint's oldestActiveXID:  1768498919
Time of latest checkpoint:            Tue Aug 21 08:21:42 2012
Minimum recovery ending location:     925/2C0004C8
Backup start location:                925/2C0004C8
Current wal_level setting:            hot_standby
Current max_connections setting:      300
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   2048
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

On the master there is no sense looking into it, because the master is 1 day ahead of that replica. It's a production system. What should I examine next?

If you could give me a DEBUG3 log from startup till it crashes it would be helpful. Please #define MULTIXACT_DEBUG on top of multixact.c before doing so. That's probably going to be too big to go through the mailing list.

I see nothing new after I rebuilt postgres with that define ( /usr/local/pgsql/tmp/postgresql/bin/postgres -d 3 -D /db/data_back/ ).
Just:

LOCATION: readRecoveryCommandFile, xlog.c:5269
DEBUG: 0: primary_conninfo = 'host=10.12.22.19 port=5433 user=replica password=acilper'
LOCATION: readRecoveryCommandFile, xlog.c:5276
DEBUG: 0: trigger_file = '/db/data/failover'
LOCATION: readRecoveryCommandFile, xlog.c:5283
LOG: 0: entering standby mode
LOCATION: StartupXLOG, xlog.c:5827
DEBUG: 0: checkpoint record is at 925/2D41CAA0
LOCATION: StartupXLOG, xlog.c:5894
DEBUG: 0: redo record is at 925/2C0004C8; shutdown FALSE
LOCATION: StartupXLOG, xlog.c:5929
DEBUG: 0: next transaction ID: 0/1768511856; next OID: 59880938
LOCATION: StartupXLOG, xlog.c:5933
DEBUG: 0: next MultiXactId: 361788644; next MultiXactOffset: 822580965
LOCATION: StartupXLOG, xlog.c:5936
DEBUG: 0: oldest unfrozen transaction ID: 1569182525, in database 16424
LOCATION: StartupXLOG, xlog.c:5939
DEBUG: 0: MultiXact: setting next multi to 361788644 offset 822580965
LOCATION: MultiXactSetNextMXact, multixact.c:1668
DEBUG: 0: transaction ID wrap limit is 371172, limited by database with OID 16424
LOCATION: SetTransactionIdLimit, varsup.c:334
WARNING: 01000: pid of startup is: 16919, sleeping for 60s
LOCATION: StartupXLOG, xlog.c:5983
DEBUG: 0: initializing for hot standby
LOCATION: StartupXLOG, xlog.c:6073
LOG: 0: redo starts at 925/2C0004C8
LOCATION: StartupXLOG, xlog.c:6202
FATAL: XX000: could not access status of transaction 361788644
DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
LOCATION: SlruReportIOError, slru.c:840

If
[BUGS] BUG #7500: hot-standby replica crash after an initial rsync
The following bug has been logged on the website:

Bug reference: 7500
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.8
Operating system: FreeBSD
Description:

Hi,

Reinitialization of the replica after failover is a procedure I have performed many times in this installation, and I never had any issues. But today, after following the same pg_start_backup/rsync procedure, the replica doesn't start, with the following error:

2012-08-21 11:17:07.777 MSK 34329 @ from [vxid: txid:0] []LOG: entering standby mode
2012-08-21 11:17:07.780 MSK 34329 @ from [vxid:1/0 txid:0] []LOG: redo starts at 925/2C0004C8
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []FATAL: could not access status of transaction 361788644
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-21 11:17:07.783 MSK 34329 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-21 11:17:07.791 MSK 34327 @ from [vxid: txid:0] []LOG: startup process (PID 34329) exited with exit code 1
2012-08-21 11:17:07.791 MSK 34327 @ from [vxid: txid:0] []LOG: terminating any other active server processes

I have kept all the database files for future investigation. What should I look into first?
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
I have kept all those database files for future investigation. What should I look into first?

Could you reproduce the error with log_error_verbosity=verbose? Or even better, provide a backtrace with gdb?

Here is the log with log_error_verbosity=verbose:

2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 0: database system was interrupted while in recovery at log time 2012-08-21 08:21:42 MSK
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: StartupXLOG, xlog.c:5757
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 0: creating missing WAL directory "pg_xlog/archive_status"
2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: ValidateXLOGDirectoryStructure, xlog.c:3452
2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOG: 0: entering standby mode
2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOCATION: StartupXLOG, xlog.c:5827
2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOG: 0: streaming replication successfully connected to primary
2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOCATION: libpqrcv_connect, libpqwalreceiver.c:169
2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOG: 0: redo starts at 925/2C0004C8
2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: StartupXLOG, xlog.c:6199
2012-08-21 14:04:20.160 MSK 33950 @ from [vxid:1/0 txid:0] []FATAL: XX000: could not access status of transaction 361788644
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: SlruReportIOError, slru.c:840
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 0: startup process (PID 33950) exited with exit code 1
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: LogChildExit, postmaster.c:2840
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 0: terminating any other active server processes
2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: HandleChildCrash, postmaster.c:2674

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
Hi Andres,

I have some problems with debugging the startup process with gdb... I followed the next sequence of commands (and got no useful results):

You're debugging the postmaster that way. The easiest way would be to just attach to the startup process with gdb -p. Not sure if you can manage that timing-wise without manually adding sleeps.

It seems impossible without an added sleep. Too fast for me. Where should I add the sleep call in the PostgreSQL source? (The process management part of PostgreSQL is completely unknown to me.)

Otherwise you probably will have to work with "set follow-fork-mode ask" in gdb. Getting to the right child isn't easy though.

Unfortunately, gdb under FreeBSD doesn't support "set follow-fork-mode".

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
On Wed, Aug 22, 2012 at 6:08 AM, Andres Freund and...@2ndquadrant.com wrote:
On Tuesday, August 21, 2012 03:30:44 PM Maxim Boguk wrote:
Hi Andres,

I would add something akin to:

elog(WARNING, "pid of startup is: %d, sleeping for 10s", getpid());
sleep(10);

Hi Andres,

Here it is.

gdb backtrace:

#0  0x000800bd1ab0 in exit () from /lib/libc.so.7
#1  0x005d77f6 in proc_exit (code=1) at ipc.c:137
#2  0x00697948 in errfinish (dummy=<optimized out>) at elog.c:501
#3  0x00488956 in SlruReportIOError (ctl=<optimized out>, pageno=<optimized out>, xid=361788644) at slru.c:857
#4  0x0048956a in SimpleLruReadPage (ctl=0x95bf80, pageno=176654, write_ok=1 '\001', xid=361788644) at slru.c:427
#5  0x00489f7f in RecordNewMultiXact (multi=361788644, offset=822580965, nxids=2, xids=0x800e5302c) at multixact.c:764
#6  0x0048a126 in multixact_redo (lsn=..., record=0x800e53000) at multixact.c:2017
#7  0x00484bea in StartupXLOG () at xlog.c:6277
#8  0x004870a8 in StartupProcessMain () at xlog.c:9335
#9  0x0049cab2 in AuxiliaryProcessMain (argc=2, argv=0x7fffe450) at bootstrap.c:412
#10 0x005ba926 in StartChildProcess (type=StartupProcess) at postmaster.c:4442
#11 0x005bccb2 in PostmasterMain (argc=5, argv=0x7fffea38) at postmaster.c:1093
#12 0x00566f3b in main (argc=5, argv=0x7fffea38) at main.c:188

relevant database log:

2012-08-22 07:47:21.330 MSK 18155 @ from [vxid:1/0 txid:0] []LOG: 0: redo starts at 925/2C0004C8
2012-08-22 07:47:21.330 MSK 18155 @ from [vxid:1/0 txid:0] []LOCATION: StartupXLOG, xlog.c:6202
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []FATAL: XX000: could not access status of transaction 361788644
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0.
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078
2012-08-22 07:47:21.384 MSK 18155 @ from [vxid:1/0 txid:0] []LOCATION: SlruReportIOError, slru.c:840

Kind Regards,
Maksym
Re: [BUGS] BUG #7500: hot-standby replica crash after an initial rsync
On Wed, Aug 22, 2012 at 1:50 PM, Maxim Boguk maxim.bo...@gmail.com wrote:
[quoted backtrace and log snipped; same as the previous message]

And finally, a gdb backtrace on a build without optimization ( ./configure --prefix=/usr/local/pgsql/tmp/postgresql/ --enable-debug --enable-cassert CFLAGS='-O0' ):

#0  0x000800d42ab0 in exit () from /lib/libc.so.7
#1  0x006aafc6 in proc_exit (code=1) at ipc.c:137
#2  0x007bcbb1 in errfinish (dummy=0) at elog.c:501
#3  0x004b7065 in SlruReportIOError (ctl=0xad9740, pageno=176654, xid=361788644) at slru.c:836
#4  0x004b6353 in SimpleLruReadPage (ctl=0xad9740, pageno=176654, write_ok=1 '\001', xid=361788644) at slru.c:427
#5  0x004b85dd in RecordNewMultiXact (multi=361788644, offset=822580965, nxids=2, xids=0x800f5302c) at multixact.c:764
#6  0x004b9dd1 in multixact_redo (lsn=..., record=0x800f53000) at multixact.c:2017
#7  0x004aef0c in StartupXLOG () at xlog.c:6277
#8  0x004b446d in StartupProcessMain () at xlog.c:9335
#9  0x004d9e36 in AuxiliaryProcessMain (argc=2, argv=0x7fffe8b0) at bootstrap.c:412
#10 0x00683398 in StartChildProcess (type=StartupProcess) at postmaster.c:4442
#11 0x0067ef60 in PostmasterMain (argc=5, argv=0x7fffea38) at postmaster.c:1093
#12 0x005fbf5a in main (argc=5, argv=0x7fffea38) at main.c:188

It seems to be failing around the function SlruPhysicalReadPage, at this block:

errno = 0;
if (read(fd, shared->page_buffer[slotno], BLCKSZ) != BLCKSZ)
{
    slru_errcause = SLRU_READ_FAILED;
    slru_errno = errno;
    close(fd);
    return false;
}

It seems that it read fewer than BLCKSZ bytes. And that is not an error condition for the read call (from man 2 read: "Note that reading less than size bytes is not an error.").
The actual size of the pg_multixact/offsets/1590 file is:

[pgsql@db1 ~/tmp/postgresql-9.0.8]$ ls -la /db/data_back/pg_multixact/offsets/1590
-rw------- 1 pgsql pgsql 81920 Aug 21 07:46 /db/data_back/pg_multixact/offsets/1590

Now it seems that the (lseek(fd, (off_t) offset, SEEK_SET) < 0) check passes even though it moves the seek position outside the actual file size. And after that, the read call could actually read only 0 bytes from it. Does that mean the file was damaged during rsync?

What should I examine next?

Kind Regards,
Maksym
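For what it's worth, the failing segment name and file offset in the error can be reproduced from the multixact number using the SLRU constants of that era (8 KB pages, 4-byte MultiXactOffset entries, 32 pages per segment), and the short-read behaviour described above can be demonstrated directly. A sketch (not PostgreSQL code):

```python
import os
import tempfile

BLCKSZ = 8192
ENTRIES_PER_PAGE = BLCKSZ // 4    # 4-byte MultiXactOffset entries per page
PAGES_PER_SEGMENT = 32            # SLRU pages per segment file

multi = 361788644
pageno = multi // ENTRIES_PER_PAGE              # 176654, as in the backtrace
segment = f"{pageno // PAGES_PER_SEGMENT:04X}"  # "1590", the failing file
offset = (pageno % PAGES_PER_SEGMENT) * BLCKSZ  # 114688, as in the DETAIL line

# Reading after seeking past end-of-file returns 0 bytes and leaves errno
# untouched, which is where "Unknown error: 0" in the DETAIL line comes from.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 81920)        # same size as the short offsets file above
os.lseek(fd, offset, os.SEEK_SET)  # succeeds even though offset > file size
data = os.read(fd, BLCKSZ)         # b"" -- a short read, not an error
os.close(fd)
os.remove(path)
```

So the derived offset 114688 lies beyond the 81920-byte file, which is consistent with the short read rather than a damaged-but-full-sized segment.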
[BUGS] BUG #6736: Hot-standby replica crashed after failover:
The following bug has been logged on the website:

Bug reference: 6736
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.4
Operating system: FreeBSD
Description:

Hi,

Maybe it was fixed in more recent releases, but I cannot find related entries in the release notes.

The next sequence leads to the standby crash after failover:

1) stop master (wait until full stop)
2) initiate failover

The hot standby crashed during failover; however, it successfully started after a manual start (and seems to work without any further errors).

In the hot-standby logs:

2012-07-13 17:06:08.199 MSK 58208 @ from [vxid: txid:0] []FATAL: replication terminated by primary server
2012-07-13 17:06:08.788 MSK 67891 @ from [vxid: txid:0] []FATAL: could not connect to the primary server: server closed the connection unexpectedly
2012-07-13 17:06:19.521 MSK 67912 @ from [vxid: txid:0] []FATAL: terminating walreceiver process due to administrator command
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 2026 of relation base/14482918/32905335 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 7327 of relation base/14482918/32905343 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 7299 of relation base/14482918/32905343 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 2040 of relation base/14482918/32905335 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 1761 of relation base/16424/33266809 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []WARNING: page 1750 of relation base/16424/33266809 was uninitialized
2012-07-13 17:06:19.732 MSK 79724 @ from [vxid:1/0 txid:0] []PANIC: WAL contains references to invalid pages
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []WARNING: terminating connection because of crash of another server process
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2012-07-13 17:06:19.895 MSK 67435 pgsql@pgsql from [local] [vxid:2/0 txid:0] []HINT: In a moment you should be able to reconnect to the database and repeat your command.
2012-07-13 17:07:27.072 MSK 68607 @ from [vxid: txid:0] []HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Is that something which was fixed already?

PS: An update to 9.0.8 is planned for next week.
[BUGS] BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
The following bug has been logged on the website:

Bug reference: 6725
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.7
Operating system: Debian Linux

Description:

Hi,

On one of my production databases the hot-standby server started crashing repeatedly after a restart. Here is the log:

2012-07-10 03:30:20 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.262 ms statement: checkpoint;
2012-07-10 03:30:24 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.207 ms statement: checkpoint;
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: received fast shutdown request
2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: aborting any active transactions
2012-07-10 03:30:41 MSK 25312 @ from [vxid: txid:0] [] FATAL: terminating walreceiver process due to administrator command
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: shutting down
2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: database system is shut down
2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: database system was shut down in recovery at 2012-07-10 03:30:41 MSK
2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: entering standby mode
2012-07-10 03:30:45 MSK 12202 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: startup process (PID 12202) was terminated by signal 6: Aborted
2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: terminating any other active server processes
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: database system was interrupted while in recovery at log time 2012-07-10 02:17:21 MSK
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: entering standby mode
2012-07-10 03:33:37 MSK 13071 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared
2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: startup process (PID 13071) was terminated by signal 6: Aborted
2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: terminating any other active server processes

and so on. I am almost sure there is no disk corruption or any such error.

Now some additional information from the master server:

1) 1663/62450/27663577 corresponds to the pg_default tablespace and the wsfilm database. However, no entry with relfilenode=27663577 was found in pg_class. At the same time:

ls -la ./9.0/main/base/62450/27663577
-rw--- 1 postgres postgres 0 Jul 10 02:29 ./9.0/main/base/62450/27663577

In the wsfilm db only one GIN index exists:

wsfilm=# SELECT * from pg_indexes where indexdef like '%gin%';
-[ RECORD 1 ]---
schemaname | public
tablename  | movie
indexname  | movie_fti_idx
tablespace |
indexdef   | CREATE INDEX movie_fti_idx ON movie USING gin (fti)

But it has a different relfilenode:

wsfilm=# SELECT relfilenode from pg_class where relname='movie_fti_idx';
-[ RECORD 1 ]-
relfilenode | 27677714

2) Just a few minutes before that, the index was rebuilt on the master database:

2012-07-10 02:29:50 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324338 txid:0] [idle] LOG: statement: CREATE INDEX CONCURRENTLY pgcompactor_tmp1436 ON movie USING gin (fti);
2012-07-10 02:29:52 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324343 txid:0] [idle] LOG: statement: BEGIN; DROP INDEX public.movie_fti_idx; ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx; END;

My theory is that WAL replay is trying to replay GIN index information for an already dropped index.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
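The suspected trigger can be written out as a short reproduction sketch. The table and index names follow the log excerpt above; that this sequence alone reproduces the PANIC on an affected 9.0.x standby is the reporter's theory, not something confirmed in this message:

```sql
-- Sketch of the rebuild-and-swap pattern from the master's log: build a
-- new GIN index concurrently, then atomically drop the old one and rename
-- the new one into place. The theory is that a standby replaying WAL that
-- still references the dropped index's metapage hits
-- "PANIC: GIN metapage disappeared".
CREATE INDEX CONCURRENTLY pgcompactor_tmp1436 ON movie USING gin (fti);

BEGIN;
DROP INDEX public.movie_fti_idx;
ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx;
COMMIT;
```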
Re: [BUGS] BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared

This is fixed in current minor releases.
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=268ca4f57ea2dc3bf0dfb0f5c17fbda269b5d462

regards, tom lane

Ooops, sorry, I should have checked the release notes of the latest version before posting new bugs. Thank you.

Kind Regards,
Maksym
[BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
The following bug has been logged on the website:

Bug reference: 6662
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.3
Operating system: Linux

Description:

I managed to create a simple self-contained test case for 6658.

create table test as select * from generate_series(1,10) as g(val);
create index test_val_special on test((val || ''));
analyze test;
select count(*) from test;

--ok
--index scan
explain analyze SELECT val FROM test WHERE (val || '')='something';

QUERY PLAN
---
Index Scan using test_val_special on test (cost=0.01..8.29 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
  Index Cond: (((val)::text || ''::text) = 'something'::text)
Total runtime: 0.038 ms

--not ok
--seq scan on 9.1.3
explain analyze SELECT val FROM ( SELECT val, (val || '') AS search_string FROM test ) AS t1 WHERE search_string='something';

QUERY PLAN
--
Subquery Scan on t1 (cost=0.00..3443.00 rows=500 width=4) (actual time=47.076..47.076 rows=0 loops=1)
  Filter: (t1.search_string = 'something'::text)
  -> Seq Scan on test (cost=0.00..2193.00 rows=10 width=4) (actual time=0.012..34.949 rows=10 loops=1)
Total runtime: 47.091 ms

--the same second query on 8.3.13
--plan correct
QUERY PLAN
---
Index Scan using test_val_special on test (cost=0.01..8.29 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
  Index Cond: (((val)::text || ''::text) = 'something'::text)
Total runtime: 0.018 ms

Kind Regards,
Maksym

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
Hi,

Thank you very much for the answer. Explicit casting resolved the issue. Just a single question: if anytextcat() and textanycat() are marked volatile, why does the database allow creating an index on a supposedly volatile expression:

create index test_val_special on test((val || ''));

?

On Wed, May 23, 2012 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes: I managed to create a simple self-contained test case for 6658.

This works fine in HEAD. The reason it doesn't work fine in 9.1 (or 9.0) is that in those branches, anytextcat() and textanycat() are marked volatile, for reasons that were good at the time but were superseded by later policy changes. So you have a subselect containing a volatile output expression, which prevents flattening of the subselect, so the restriction clause doesn't get pushed down to where it could be used with the index. For some history see

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c82d931dd180965a9a0c06acc764404f91de8170
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=aab353a60b95aadc00f81da0c6d99bde696c4b75
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3db6524fe63f0598dcb2b307bb422bc126f2b15d

If this is a big problem for you, you could safely adjust the provolatile marking of those functions by hand in 9.1. It might be better just to include an explicit cast to text, though, instead of relying on the assumption that the system will let you concat an integer directly to a text string.

regards, tom lane

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
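Tom's suggested explicit-cast workaround can be sketched concretely. With the cast, the concatenation goes through the plain text || text operator rather than the anytextcat()/textanycat() wrappers that were marked volatile in 9.0/9.1, so the subselect can be flattened and the condition pushed down to the index:

```sql
-- Workaround sketch for 9.0/9.1: cast the integer column to text explicitly
-- so the expression is textcat(text, text), which is immutable.
CREATE INDEX test_val_special ON test ((val::text || ''));

-- With the same cast in the subquery, the planner can flatten it and use
-- the expression index instead of a seq scan plus filter.
SELECT val
FROM (SELECT val, (val::text || '') AS search_string FROM test) AS t1
WHERE search_string = 'something';
```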
Re: [BUGS] BUG #6662: Database do not push condition to subquery, test case for bug 6658
On Wed, May 23, 2012 at 12:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: if anytextcat() and textanycat() are marked volatile, why does the database allow creating an index on a supposedly volatile expression: create index test_val_special on test((val || ''));

CREATE INDEX inlines anytextcat (which is just a SQL function) before making the volatility test, cf commit 5a86e5e19. The reason the subselect flattening code has difficulty with this case is that that happens before function inlining does, so the expression still looks volatile even though it really isn't.

regards, tom lane

Thank you very much for the detailed explanation and all the help.

Kind Regards,
Maksym
[BUGS] BUG #6658: Database do not push down static condition to subquery.
The following bug has been logged on the website: Bug reference: 6658 Logged by: Maxim Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.1.3 Operating system: Linux Description: Hi, I have 2 queries which have the same plan/same performance on 8.3 but on 9.1.3 fist one doesn't perform well because condition doesn't pushed down to subquery: --slow on 9.1.3 but fast on 8.3.13 --helper just shell view db=# explain (analyze,verbose) select id from (SELECT id, regexp_replace(btrim(((COALESCE(NULLIF(start_number, 0) || ' '::text, ''::text) || address) || ' '::text) || suburb) || ' '::text) || state) || ' '::text) || postcode), ' +'::text, ' '::text, 'g'::text) AS search_string FROM helper WHERE start_number IS NOT NULL AND start_alpha IS NOT NULL AND end_number IS NULL ) AS _t WHERE search_string='something'; QUERY PLAN Subquery Scan on _t (cost=7715.15..2141451.98 rows=55630 width=4) (actual time=4132.959..4132.959 rows=0 loops=1) Output: _t.id Filter: (_t.search_string = 'something'::text) - Bitmap Heap Scan on data_table (cost=7715.15..1557337.03 rows=11125999 width=45) (actual time=92.290..4066.518 rows=403344 loops=1) Output: id, regexp_replace(btrimCOALESCE(((NULLIF(start_number, 0))::text || ' '::text), ''::text) || upper(regexp_replace(btrim(((COALESCE((name || ' '::text), ''::text) || COALESCE((extension || ' '::text), ''::text)) || COALESCE(direction, ''::text))), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(locality_name, ''::text)), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(state_code, ''::text)), ' +'::text, ' '::text, 'g'::text))) || ' '::text) || upper(regexp_replace(btrim(COALESCE(post_code, ''::text)), ' +'::text, ' '::text, 'g'::text, ' +'::text, ' '::text, 'g'::text) Recheck Cond: ((start_number IS NOT NULL) AND (upper((start_alpha)::text) IS NOT NULL) AND (end_number IS NULL)) - Bitmap Index Scan on ix_search_full5 (cost=0.00..4933.65 
rows=11125999 width=0) (actual time=59.118..59.118 rows=403344 loops=1) Total runtime: 4133.001 ms (8 rows) --fast on both 9.1.3 and 8.3.13 --flat version (search_string condition manually inlined) db=# explain (analyze, verbose) SELECT id FROM helper WHERE start_number IS NOT NULL AND start_alpha IS NOT NULL AND end_number IS NULL AND regexp_replace(btrim(((COALESCE(NULLIF(start_number, 0) || ' '::text, ''::text) || address) || ' '::text) || suburb) || ' '::text) || state) || ' '::text) || postcode), ' +'::text, ' '::text, 'g'::text) = 'something'; QUERY PLAN
[BUGS] BUG #6513: explain (analyze, buffers) and toasted fields
The following bug has been logged on the website:

Bug reference: 6513
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.7
Operating system: Linux

Description:

I got hit by this bug while exploring the reasons for a very slow production query, and I lost a lot of time trying to reproduce the problem query on the production server with EXPLAIN ANALYZE. Finally I found that I need a workaround to get EXPLAIN to perform the actual work with the toasted data. So here goes the bug report:

EXPLAIN does not take into account that some result fields will require detoasting, and produces a wrong buffers result.

Test case:

pgsql=# drop table if exists test;
DROP TABLE
Time: 277.926 ms
pgsql=# CREATE TABLE test (id integer primary key, value text);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index test_pkey for table test
CREATE TABLE
Time: 3.812 ms

--populate the table with a large toasted field
pgsql=# INSERT INTO test select i,(select array_agg(random()) from generate_series (1,1000))::text from generate_series (1,1000) as g(i);
INSERT 0 1000
Time: 3176.286 ms

--wrong explain value, at least for the buffers value, and what's even worse, a wrong (100x less than actual) runtime
--(Buffers: shared hit=7) - wrong value
pgsql=# EXPLAIN (analyze, costs, buffers) select * from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..10.70 rows=1000 width=36) (actual time=0.020..1.656 rows=1000 loops=1)
  Buffers: shared hit=7
Total runtime: 3.252 ms
(3 rows)
Time: 3.704 ms

--again wrong
pgsql=# EXPLAIN (analyze, costs, buffers) select value from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..10.70 rows=1000 width=32) (actual time=0.013..1.625 rows=1000 loops=1)
  Buffers: shared hit=7
Total runtime: 3.141 ms
(3 rows)
Time: 3.428 ms

--force detoasting to be taken into account via || with the toasted field
--result close to reality (Buffers: shared hit=4700 - the real value)
pgsql=# EXPLAIN (analyze, costs, buffers) select value||'a' from test;

QUERY PLAN
--
Seq Scan on test (cost=0.00..13.20 rows=1000 width=32) (actual time=0.264..187.855 rows=1000 loops=1)
  Buffers: shared hit=4700
Total runtime: 189.696 ms
(3 rows)
Time: 190.001 ms

--actual timings of both queries are the same
pgsql=# \o /dev/null
pgsql=# select * from test;
Time: 219.845 ms
pgsql=# \o /dev/null
pgsql=# select value||'a' from test;
Time: 221.599 ms

Having correct buffer hit/read values could be critical when toasted fields are likely to be read from the HDD.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
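The `|| 'a'` trick generalizes: any expression that must read the whole stored value forces detoasting, so the TOAST accesses show up in the buffer counts. A sketch, assuming the same test table as above (length() should behave like the concatenation trick, since it has to detoast the value to count its characters):

```sql
-- Hypothetical variant of the workaround above: length(value) forces
-- detoasting, so Buffers should reflect the TOAST reads as well, without
-- materializing a modified copy of the string the way value || 'a' does.
EXPLAIN (ANALYZE, COSTS, BUFFERS) SELECT length(value) FROM test;
```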
[BUGS] BUG #6422: User without any privileges on a table can lock the table from other users in some cases
The following bug has been logged on the website:

Bug reference: 6422
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.2
Operating system: Linux

Description:

Hi.

Unfortunately I was hit by this problem in a real project. During an active pg_dump (or any long transaction keeping a lock on a table), any user with access to the database can lock the table from all access, even if that user has absolutely no rights on that table.

Test scenario:

1) CREATE ROLE dummy with login password 'something';
2) start pg_dump on the database, or any long transaction involving the table news
3) now:

psql -U dummy -d testdb
=> SELECT * from news;
ERROR: permission denied for relation news

That is perfectly ok...

sports=> begin;
BEGIN
sports=> lock table news;

oops... the table is locked from everyone's access until the statement is canceled or pg_dump finishes (and in reality, a large web site was down for 3 minutes until my intervention). The same can be done with any "alter table news ... ;" statement.

Again, I am not sure whether this is an actual bug or a feature. It seems Postgres tries to take a lock on the table before checking grants. Maybe it's possible to check grants before taking any heavy lock on the table? This behavior could be very troublesome in a multi-user database environment.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
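A partial mitigation, sketched below, is to bound how long untrusted roles may spend in a statement, so a LOCK TABLE queued behind pg_dump gets cancelled instead of blocking other sessions indefinitely. This does not fix the underlying lock-before-permission-check ordering; the role name is from the test scenario above, and the timeout value is arbitrary:

```sql
-- Cap statement runtime for the untrusted role. statement_timeout also
-- counts time spent waiting for locks, so a LOCK TABLE issued by "dummy"
-- that queues behind a long transaction is cancelled after 5 seconds,
-- releasing the sessions queued behind it.
ALTER ROLE dummy SET statement_timeout = '5s';
```

On 9.3 and later, `lock_timeout` can bound the lock wait specifically without limiting legitimate long-running statements.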
Re: [BUGS] BUG #6393: cluster sometime fail under heavy concurrent write load
Hi.

The test case:

The first stage - preparation:

CREATE TABLE test (id serial primary key, flag integer, value text);
cluster test_pkey on test;
insert into test select nextval('test_id_seq') as id, 0 as flag,(SELECT array(select random() from generate_series(1,1000)))::text as value from generate_series(1,1000) as gs(id);
cluster test;

The second stage - test:

Console one:

postgres@db10:~$ while true; do psql -d test_db -c "begin; update test set flag=(random()*1000)::integer where id IN (select (random()*1000)::integer from generate_series(1,30)); select pg_sleep(1); commit;"; done
COMMIT
COMMIT
COMMIT
...

Console two:

postgres@db10:~$ while true; do psql -d test_db -c "cluster test;"; done
CLUSTER
ERROR: duplicate key value violates unique constraint "pg_toast_119812558_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811577, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812564_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811601, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812570_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811494, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812576_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811552, 0) already exists.
ERROR: duplicate key value violates unique constraint "pg_toast_119812582_index"
DETAIL: Key (chunk_id, chunk_seq)=(119811525, 0) already exists.

PS: no such effect happens on 9.0.4. I think it can be related to the TOAST-related changes in 9.0.6.

On Thu, Jan 12, 2012 at 2:24 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

Excerpts from maxim.boguk's message of mar ene 10 23:00:59 -0300 2012: The following bug has been logged on the website: Bug reference: 6393 Logged by: Maxim Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.0.6 Operating system: Linux Ubuntu Description: I have a heavy write-load table under PostgreSQL 9.0.6 and sometimes (not always, but with more than a 50% chance) I get the following error during cluster: db=# cluster public.enqueued_mail; ERROR: duplicate key value violates unique constraint "pg_toast_119685646_index" DETAIL: Key (chunk_id, chunk_seq)=(119685590, 0) already exists. The chunk_id is different each time. No uncommon datatypes exist in the table. Currently I am working on a reproducible test case (it seems to require 2-3 open write transactions on the table).

I don't see how this can be done at all, given that cluster grabs an exclusive lock on the table in question. A better example illustrating what you're really doing would be useful.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com LinkedIn profile: http://nz.linkedin.com/in/maximboguk If they can send one man to the moon... why can't they send them all? МойКруг: http://mboguk.moikrug.ru/ People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage.
[BUGS] BUG #6393: cluster sometime fail under heavy concurrent write load
The following bug has been logged on the website:

Bug reference: 6393
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0.6
Operating system: Linux Ubuntu

Description:

I have a table under heavy write load on PostgreSQL 9.0.6, and sometimes (not always, but with more than a 50% chance) I get the following error during cluster:

db=# cluster public.enqueued_mail;
ERROR: duplicate key value violates unique constraint "pg_toast_119685646_index"
DETAIL: Key (chunk_id, chunk_seq)=(119685590, 0) already exists.

The chunk_id is different each time. No uncommon datatypes exist in the table. I am currently working on a reproducible test case (it seems to require 2-3 open write transactions on the table).

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #6359: excessively inlining subquery leads to slow queries
The following bug has been logged on the website:

Bug reference: 6359
Logged by: Maksym Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.1.2
Operating system: Ubuntu linux

Description:

Sometimes Postgres inlines a subquery even if that produces a slower plan (and that slow plan has a higher actual cost than the non-inlined plan).

Test case:

drop table if exists t1;
drop table if exists t2;
create table t1 as select id from generate_series(1,1) as g(id);
create table t2 as select id from generate_series(1,1000) as g(id);
alter table t1 add primary key (id);
alter table t2 add primary key (id);
analyze t1;
analyze t2;

--fast non-inlined plan
explain (verbose, analyze) select id, t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id FROM (
  select t1.id, (select t2.id from t2 where t2.id=t1.id) as t2_id from t1
  offset 0
) as t;

QUERY PLAN
-
Subquery Scan on t (cost=0.00..0.65 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
  Output: t.id, ((((((((((((((((t.t2_id + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id) + t.t2_id)
  -> Limit (cost=0.00..0.60 rows=1 width=4) (actual time=0.053..0.056 rows=1 loops=1)
       Output: t1.id, ((SubPlan 1))
       -> Seq Scan on public.t1 (cost=0.00..0.60 rows=1 width=4) (actual time=0.052..0.053 rows=1 loops=1)
            Output: t1.id, (SubPlan 1)
            SubPlan 1
              -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
                   Output: t2.id
                   Index Cond: (t2.id = t1.id)
Total runtime: 0.161 ms
(11 rows)

--slow inlined plan
explain (verbose, analyze) select id, t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id+t2_id FROM (
  select t1.id, (select t2.id from t2 where t2.id=t1.id) as t2_id from t1
  -- offset 0
) as t;

QUERY PLAN
--
Seq Scan on public.t1 (cost=0.00..8.44 rows=1 width=4) (actual time=0.180..0.181 rows=1 loops=1)
  Output: t1.id, ((((((((((((((((SubPlan 1) + (SubPlan 2)) + (SubPlan 3)) + (SubPlan 4)) + (SubPlan 5)) + (SubPlan 6)) + (SubPlan 7)) + (SubPlan 8)) + (SubPlan 9)) + (SubPlan 10)) + (SubPlan 11)) + (SubPlan 12)) + (SubPlan 13)) + (SubPlan 14)) + (SubPlan 15)) + (SubPlan 16)) + (SubPlan 17))
  SubPlan 1
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.025..0.028 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 2
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 3
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 4
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.005 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 5
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 6
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 7
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=1)
         Output: public.t2.id
         Index Cond: (public.t2.id = t1.id)
  SubPlan 8
    -> Index Scan using t2_pkey on public.t2 (cost=0.00..0.49 rows=1 width=4)
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
On Thu, Dec 15, 2011 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

maxim.bo...@gmail.com writes: EXPLAIN analyze select * from applicant_adv_subscription aas where aas.user_id in (5112699,7995496) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id );

I'm hoping to fix this type of case with the generalized inner indexscan work that I've been nattering about for a year or two now. What you need to make this fast, given that resume and resume_view_history are both large, is to push the current value of aas.user_id down into the table scan of resume --- and because the join and semijoin can't be reordered, that's not possible with the planner's current simpleminded idea of what an inner indexscan can be. The other example you show manages to luck out and get a good plan due to transitive propagation of equality conditions, but that's a narrow special case. Any other form of constraint whatsoever on aas is going to end up with the crummy plan where the whole lower join gets computed.

regards, tom lane

Thank you very much for the information. Rewriting the query did the trick and resolved the performance issues. Do you plan to include the generalized inner indexscan mechanics in the 9.2 version?

-- Maxim Boguk
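The thread does not show the exact rewrite that resolved the issue. One plausible form, given Tom's note that transitive propagation of equality conditions is what rescues the single-value case, is to repeat the constant list inside the EXISTS so the constraint reaches the inner scan directly:

```sql
-- Sketch of a possible rewrite (not the confirmed one from the thread):
-- duplicating the IN list inside the correlated subquery lets the planner
-- constrain resume.user_id with the index instead of computing the whole
-- resume/resume_view_history join.
SELECT aas.*
FROM applicant_adv_subscription aas
WHERE aas.user_id IN (5112699, 7995496)
  AND EXISTS (
    SELECT 1
    FROM resume
    JOIN resume_view_history USING (resume_id)
    WHERE resume.user_id = aas.user_id
      AND resume.user_id IN (5112699, 7995496)
  );
```

On 9.3 and later, LATERAL provides a general mechanism for this kind of pushdown without manually repeating the predicate.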
[BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
The following bug has been logged on the website: Bug reference: 6335 Logged by: Maksym Boguk Email address: maxim.bo...@gmail.com PostgreSQL version: 9.0.4 Operating system: Linux Ubuntu Description: I was explored reasons of high DB load and I localized the next problem query: That is correct version: EXPLAIN ANALYZE select * from applicant_adv_subscription aas where aas.user_id in (5112699) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id ); QUERY PLAN -- Nested Loop Semi Join (cost=0.00..20.51 rows=1 width=65) (actual time=0.031..0.032 rows=1 loops=1) - Index Scan using applicant_adv_subscription_user_id_key on applicant_adv_subscription aas (cost=0.00..0.02 rows=1 width=65) (actual time=0.011..0.012 rows=1 loops=1) Index Cond: (user_id = 5112699) - Nested Loop (cost=0.00..20.49 rows=3118 width=4) (actual time=0.018..0.018 rows=1 loops=1) - Index Scan using resume_user_id_key on resume (cost=0.00..0.13 rows=18 width=8) (actual time=0.008..0.008 rows=1 loops=1) Index Cond: (user_id = 5112699) - Index Scan using resume_view_history_fk73b63ccd36b06a5 on resume_view_history (cost=0.00..0.95 rows=173 width=4) (actual time=0.009..0.009 rows=1 loops=1) Index Cond: (resume_view_history.resume_id = resume.resume_id) Total runtime: 0.080 ms But once I add second value into IN list plan become completely screwed: EXPLAIN analyze select * from applicant_adv_subscription aas where aas.user_id in (5112699,7995496) and exists ( SELECT * from resume join resume_view_history using (resume_id) where resume.user_id = aas.user_id ); QUERY PLAN - Nested Loop Semi Join (cost=38967.39..735076.65 rows=2 width=65) (actual time=14656.388..389866.211 rows=1 loops=1) Join Filter: (aas.user_id = resume.user_id) - Bitmap Heap Scan on applicant_adv_subscription aas (cost=0.02..0.04 rows=2 width=65) (actual time=0.018..0.021 rows=2 loops=1) Recheck Cond: (user_id = ANY ('{5112699,7995496}'::integer[])) - Bitmap Index Scan on 
applicant_adv_subscription_user_id_key (cost=0.00..0.02 rows=2 width=0) (actual time=0.014..0.014 rows=2 loops=1) Index Cond: (user_id = ANY ('{5112699,7995496}'::integer[])) - Hash Join (cost=38967.36..726839.23 rows=272203680 width=4) (actual time=13267.456..182842.841 rows=136136926 loops=2) Hash Cond: (resume_view_history.resume_id = resume.resume_id) - Seq Scan on resume_view_history (cost=0.00..282228.92 rows=272203680 width=4) (actual time=0.004..25574.666 rows=136161776 loops=2) - Hash (cost=21737.05..21737.05 rows=16110150 width=8) (actual time=13260.145..13260.145 rows=16114222 loops=2) Buckets: 2097152 Batches: 2 Memory Usage: 314776kB - Seq Scan on resume (cost=0.00..21737.05 rows=16110150 width=8) (actual time=0.005..8839.480 rows=16114222 loops=2) Total runtime: 389866.374 ms Ooops. Changes in *_cost settings have no effect. Disabling seq_scan/merge_join/hash_join doest not help as well. Nothing special about tables, all required indexes on the place. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
Here goes a self-contained test case. I tested it on 9.1.2, 9.1.1, 9.0.5, 9.0.4, and 8.4.7; all of them are affected by the problem:

select version();

drop table if exists test1;
drop table if exists test2;
drop table if exists test3;

CREATE TABLE test1 AS SELECT user_id FROM generate_series(1,100) AS gs(user_id);
CREATE TABLE test2 AS SELECT user_id,user_id AS resume_id FROM generate_series(1,100) AS gs(user_id);
CREATE TABLE test3 AS SELECT user_id AS resume_id FROM generate_series(1,100) AS gs(user_id);

create index test1_user_id_key on test1(user_id);
create index test2_user_id_key on test2(user_id);
create index test2_resume_id_key on test2(resume_id);
create index test3_resume_id_key on test3(resume_id);

analyze test1;
analyze test2;
analyze test3;

--good
EXPLAIN ANALYZE select * from test1 where test1.user_id in (100) and exists ( SELECT * from test2 join test3 using (resume_id) where test2.user_id = test1.user_id );

--bad
EXPLAIN ANALYZE select * from test1 where test1.user_id in (100, 101) and exists ( SELECT * from test2 join test3 using (resume_id) where test2.user_id = test1.user_id );

On Thu, Dec 15, 2011 at 1:00 AM, Alvaro Herrera alvhe...@commandprompt.com wrote:

Excerpts from maxim.boguk's message of mié dic 14 08:09:38 -0300 2011: But once I add a second value into the IN list the plan becomes completely screwed:

See here: http://archives.postgresql.org/message-id/1309918036-sup-4...@alvh.no-ip.org

Perhaps it's a similar problem. Maybe you'd get enthused enough to try to fix the problem?

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Maxim Boguk Senior Postgresql DBA. Phone RU: +7 910 405 4718 Phone AU: +61 45 218 5678 Skype: maxim.boguk Jabber: maxim.bo...@gmail.com LinkedIn profile: http://nz.linkedin.com/in/maximboguk If they can send one man to the moon... why can't they send them all? МойКруг: http://mboguk.moikrug.ru/ Strength breaks straw, but not everything in our life is straw, and strength is far from everything.
Re: [BUGS] BUG #6335: Weird planner decision with exists (a join b) condition
On Thu, Dec 15, 2011 at 12:00 PM, bricklen <brick...@gmail.com> wrote:
> On Wed, Dec 14, 2011 at 4:53 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
>> Here goes a self-contained test case. I tested it on 9.1.2, 9.1.1, 9.0.5, 9.0.4, 8.4.7.
>
> I just tested on 9.1.2 and see the same issue.

--bad
EXPLAIN ANALYZE
select * from test1
where test1.user_id in (100, 101)
  and exists (SELECT * from test2 join test3 using (resume_id)
              where test2.user_id = test1.user_id);

> Setting enable_hashjoin to false pushes it back to a good plan again.

Could you show EXPLAIN ANALYZE of the good plan please? If you are getting a plan like:

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=3.78..91844.51 rows=2 width=4) (actual time=0.634..0.750 rows=2 loops=1)
   Join Filter: (test1.user_id = test2.user_id)
   ->  Bitmap Heap Scan on test1  (cost=2.54..4.57 rows=2 width=4) (actual time=0.013..0.015 rows=2 loops=1)
         Recheck Cond: (user_id = ANY ('{100,101}'::integer[]))
         ->  Bitmap Index Scan on test1_user_id_key  (cost=0.00..2.54 rows=2 width=0) (actual time=0.009..0.009 rows=2 loops=1)
               Index Cond: (user_id = ANY ('{100,101}'::integer[]))
   ->  Materialize  (cost=1.23..64339.94 rows=100 width=4) (actual time=0.018..0.305 rows=100 loops=2)
         ->  Merge Join  (cost=1.23..59339.94 rows=100 width=4) (actual time=0.032..0.446 rows=101 loops=1)
               Merge Cond: (test2.resume_id = test3.resume_id)
               ->  Index Scan using test2_resume_id_key on test2  (cost=0.00..22170.28 rows=100 width=8) (actual time=0.009..0.076 rows=101 loops=1)
               ->  Index Scan using test3_resume_id_key on test3  (cost=0.00..22170.28 rows=100 width=4) (actual time=0.007..0.075 rows=101 loops=1)
 Total runtime: 0.785 ms

then try using high values for the user_id:

EXPLAIN ANALYZE
select * from test1
where test1.user_id in (9, 91)
  and exists (SELECT * from test2 join test3 using (resume_id)
              where test2.user_id = test1.user_id);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=3.78..91844.51 rows=2 width=4) (actual time=432.266..4457.799 rows=2 loops=1)
   Join Filter: (test1.user_id = test2.user_id)
   ->  Bitmap Heap Scan on test1  (cost=2.54..4.57 rows=2 width=4) (actual time=0.057..0.063 rows=2 loops=1)
         Recheck Cond: (user_id = ANY ('{9,91}'::integer[]))
         ->  Bitmap Index Scan on test1_user_id_key  (cost=0.00..2.54 rows=2 width=0) (actual time=0.050..0.050 rows=2 loops=1)
               Index Cond: (user_id = ANY ('{9,91}'::integer[]))
   ->  Materialize  (cost=1.23..64339.94 rows=100 width=4) (actual time=0.011..1942.046 rows=495000 loops=2)
         ->  Merge Join  (cost=1.23..59339.94 rows=100 width=4) (actual time=0.018..2805.842 rows=91 loops=1)
               Merge Cond: (test2.resume_id = test3.resume_id)
               ->  Index Scan using test2_resume_id_key on test2  (cost=0.00..22170.28 rows=100 width=8) (actual time=0.007..571.851 rows=91 loops=1)
               ->  Index Scan using test3_resume_id_key on test3  (cost=0.00..22170.28 rows=100 width=4) (actual time=0.006..594.484 rows=91 loops=1)
 Total runtime: 4467.887 ms

-- Maxim Boguk
Senior Postgresql DBA.
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?
MoiKrug: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
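[Editor's note: the self-contained test case referenced above is not reproduced in this message. The sketch below is a hypothetical reconstruction of the kind of schema that produces this plan shape; only the table and index names are taken from the plans, while the row counts and data distribution are assumptions.]

```sql
-- Hypothetical reconstruction, not the reporter's original script.
create table test1 (user_id integer);
create unique index test1_user_id_key on test1 (user_id);
create table test2 (user_id integer, resume_id integer);
create index test2_resume_id_key on test2 (resume_id);
create table test3 (resume_id integer);
create index test3_resume_id_key on test3 (resume_id);

insert into test1 select s from generate_series(1, 1000000) s;
insert into test2 select s, s from generate_series(1, 1000000) s;
insert into test3 select s from generate_series(1, 1000000) s;
analyze test1; analyze test2; analyze test3;
```

The pathology visible in the second plan is the Materialize over the merge join: for each outer row the semi-join scans the materialized join output until it finds a matching user_id, so high user_id values force it to step over hundreds of thousands of rows (rows=495000 per loop) before a match is found.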
Re: [BUGS] BUG #6326: SELECT ARRAY(SELECT ...) syntax and array[] results doesn't work
On Mon, Dec 5, 2011 at 2:05 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> maxim.bo...@gmail.com writes:
>> SELECT ARRAY(SELECT ...) doesn't work when the subselect returns an array.
>
> Is that syntax supposed to work with anyarray types? No.
>
> regards, tom lane

Hi. Thank you very much for the answer. Now the second problem (possibly related to the first, because the error is the same): array_agg doesn't work with anyarray either:

sports=# select array_agg(val) FROM (SELECT array[1,2]::integer[] as val UNION ALL SELECT array[3,4]::integer[]) as t;
ERROR:  could not find array type for data type integer[]

PS: I am trying to create a two-dimensional integer array from the results of a query like:

select val1::integer, val2::integer from somequery

to get an integer[][] with content like [[val1-1,val1-2], [val2-1,val2-2], [val3-1,val3-2], ...]. Is there any way to perform that task with reasonable efficiency?

-- Maxim Boguk
Senior Postgresql DBA.
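[Editor's note: a commonly used workaround on servers of this era is a custom aggregate over array_cat, which concatenates array inputs instead of trying to build an "array of arrays". This is a sketch, not something suggested in this thread, and `array_cat_agg` is a hypothetical name.]

```sql
-- Sketch of a workaround for aggregating rows into a 2-D array.
CREATE AGGREGATE array_cat_agg(anyarray) (
    SFUNC = array_cat,
    STYPE = anyarray
);

-- Wrapping each row in ARRAY[ARRAY[...]] keeps the result two-dimensional:
SELECT array_cat_agg(ARRAY[ARRAY[val1, val2]])
FROM (VALUES (1, 2), (3, 4), (5, 6)) AS t(val1, val2);
-- expected shape: {{1,2},{3,4},{5,6}}
```

Because array_cat concatenates along the first dimension when its inputs have matching inner dimensions, each one-row 2-D array is appended as a new "row" of the result.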
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Mon, Nov 28, 2011 at 6:02 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Fri, Nov 25, 2011 at 6:33 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Maxim Boguk <maxim.bo...@gmail.com> writes:
>>> I know GIST on intarray[] does not have that problem. Very likely the problem is limited to intarray[] GIN indexes only (but I am going to test some other not-well-known GIN indexes tomorrow). Broken FTS indexes on Hot Standby would have been noticed years ago.
>>
>> You might think that, but you'd be wrong :-(.
>
> Yes, that did sound ominous.
>
>> ginRedoUpdateMetapage is failing to restore the contents of the pending-list correctly, which means this is broken for all types of GIN indexes. Will fix.
>
> Great detective work Tom, as ever, much appreciated.

Thank you very much. Will that fix be included in the next minor releases (especially 9.1.2)?

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6307: intarray extention gin index does not work with Hot standby
On Fri, Nov 25, 2011 at 11:17 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Thu, Nov 24, 2011 at 11:12 PM, Maksym Boguk <maxim.bo...@gmail.com> wrote:
>> postgres=# SELECT * from test where sections '{2000}';
>>  id | sections
>> ----+----------
>> (0 rows)
>
> Ooops. Can you see if this is just intarray or if there are other failing cases? It would be good to get more info on this before I start investigating. Thanks

I know GIST on intarray[] does not have that problem. Very likely the problem is limited to intarray[] GIN indexes only (but I am going to test some other not-well-known GIN indexes tomorrow). Broken FTS indexes on Hot Standby would have been noticed years ago, and I have never heard such reports.
Re: [BUGS] BUG #6296: High level backend-write activity on db
On Wed, Nov 16, 2011 at 4:07 AM, Maksym Boguk <maxim.bo...@gmail.com> wrote:
>> Again, I am not sure whether it is an actual bug or something else going wrong with my database. One of my databases just started producing 256 WAL segments in 5 minutes instead of the average 100-200 WAL segments per hour. At the same time, write and read activity on the DB stayed the same (according to pg_stat_database data and historical graphs). No anti-wraparound vacuum or similar activity is going on. The situation has stayed the same for the last few days. I found some other anomalies:
>>
>> 1) 5 minutes after SELECT pg_stat_reset_shared('bgwriter');
>>
>> mirtesen=# SELECT * from pg_stat_bgwriter;
>>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
>> -------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
>>                  0 |               2 |             171675 |           804 |                0 |         1010429 |       1131106
>> (1 row)
>>
>> i.e. almost every new buffer allocated by a backend leads to that backend writing out a dirty page (and that situation does not change over time).
>
> This sure sounds like a ring buffer is being used, suggesting VACUUM or COPY IN activity. What does pg_stat_activity say?
>
> -- Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I am very sorry; it was a false alert. Monitoring was based on sum() over pg_stat_all_tables, but the problem was a buggy cron script which was looping every minute over:

create table something_tmp as query (like 20M entries);
-- and an instant after:
drop table something_tmp;

As a result of that instant drop, the insert volumes were missed by the monitoring.

PS: xlogdump is a really great tool, and it was a good help in locating and fixing the problem.

-- Maxim Boguk
Senior Postgresql DBA.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6288: Is ALTER ROLE set client_encoding broken in 9.1?
Thank you very much for the information. I lost some brain cells fighting with that problem yesterday. However, personally I think this behavior is rather non-intuitive:

postgres@db13:~$ psql -U slony -d test -c 'show client_encoding'
 client_encoding
-----------------
 UTF8
(1 row)

postgres@db13:~$ psql -U slony -d test -c 'show client_encoding' | head -10
 client_encoding
-----------------
 KOI8R
(1 row)

On Fri, Nov 11, 2011 at 1:55 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maksym Boguk <maxim.bo...@gmail.com> writes:
>> Description: Is ALTER ROLE set client_encoding broken in 9.1?
>
> No, but I think psql prefers to set its encoding according to its locale environment these days.
>
> regards, tom lane

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6267: Wrong results in pg_database_size
> Maksym Boguk <maxim.bo...@gmail.com> writes:
>> Somehow pg_database_size is producing results which are 2x larger than reality.
>
> I can't reproduce any such problem here. I suspect you overlooked some tablespaces in your manual du commands.
>
> regards, tom lane

No other tablespaces exist, as can be seen in the output of:

SELECT oid,spcname,spclocation,pg_size_pretty(pg_tablespace_size(spcname)) from pg_tablespace;

  oid  |  spcname   |  spclocation  | pg_size_pretty
-------+------------+---------------+----------------
  1663 | pg_default |               | 21 MB
  1664 | pg_global  |               | 1092 kB
 16443 | main       | /db/base/main | 124 GB

However, I seem to have found the reason for such strange behavior. It is a duplicate link in pg_tblspc:

postgres@db:~$ ls -la /var/lib/postgresql/9.0/main/pg_tblspc/
total 8
drwx------  2 postgres postgres 4096 Jun  3 04:08 .
drwx------ 12 postgres postgres 4096 Sep 28 17:08 ..
lrwxrwxrwx  1 postgres postgres   13 Jun  6 14:39 16384 -> /db/base/main
lrwxrwxrwx  1 postgres postgres   13 Jun  3 04:08 16443 -> /db/base/main

I have no idea how this could have happened. No crashes/power-offs and so on have happened on that database for more than a year. Can the second (16384 -> /db/base/main) link be safely deleted?

-- Maxim Boguk
Senior Postgresql DBA.
[BUGS] BUG #6097: Server crash when enabling custom_variable_classes
The following bug has been logged online:

Bug reference: 6097
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0, 9.1beta
Operating system: Linux/FreeBSD

Description: Server crash when enabling custom_variable_classes

Details:
I found a crash bug in the custom_variable_classes handling. The test case is simple: start the server with an empty custom_variable_classes, then add to the config file:

custom_variable_classes = 'something'    # list of custom variable class names
something.location = 'ru'

and then perform select pg_reload_conf();

Oops... you no longer have a working server (but client connections stay alive). The same crash happens with 9.0 and 9.1beta. Adding just custom_variable_classes = '' and pg_reload_conf() does not crash the server, but after I added .location = 'ru' to the config and performed a second pg_reload_conf(), the server crashed again. The same happens if the server is started with just custom_variable_classes = '': after I added .location = 'ru' and performed pg_reload_conf(), the server crashed. However, with both options enabled the server starts and works without any issues.

PS: it is exceptionally strange that no one happened to be hit by this problem before.
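[Editor's note: collecting the reproduction recipe above into one config fragment; this only restates the reporter's description, and 'something' is the reporter's own example class name.]

```
# postgresql.conf additions that trigger the crash on reload,
# when added after the server was started without them:
custom_variable_classes = 'something'    # list of custom variable class names
something.location = 'ru'
```

After editing the file, `select pg_reload_conf();` from any session triggers the crash; starting the server with both lines already present works fine.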
Re: [BUGS] BUG #6088: copy to stdout cannot be stopped with kill(pid) or pg_terminate_backend
> ..., table_name, pk_id, old_row, new_row) TO stdout; with pg_terminate_backend(procpid) or kill (procpid).

Which FreeBSD are you on? Version and architecture. Can you obtain a backtrace of the backend you've sent the kill signal to, after you've sent the signal? Given the rest of what you've said, I'll bet it's blocked in the kernel waiting to send data down a socket. If the kernel's socket writes aren't interrupted by signals, there's nothing much Pg can do about it.

> At the same time copy to stdout completely ignores the fact that the other side (pg_dump itself) was killed a long time ago.

That's got to be a buffering issue, surely. Once the TCP send buffers fill up, the backend would block in write() as the OS tries to send enough data to make room in the send buffer for the next write. If your OS keeps big buffers, Pg might've buffered many megabytes of data before you killed the client side. It might keep on trying to send that data for quite a long time if it doesn't get notified that the other peer (pg_dump in this case) has vanished.

How were you running pg_dump? ssh was involved, so I'm wondering if you were running pg_dump over an ssh tcp tunnel. Was that it? Or were you running pg_dump on the server side and dumping the SQL text down the ssh connection to the client? If the latter, then unless you killed ssh as well the OS would merrily keep on sending the already-buffered data from before pg_dump was killed; you'd have to kill sshd or the client side of ssh to actually stop the data flow.

To help with this sort of thing, reduce your send and receive buffer sizes. You'll pay a price in terms of throughput, but you'll gain responsiveness. Without more information, it's hard to know if this is anything except a buffering issue. Possibly one that doesn't even have anything to do with Pg - it could easily just be sshd writes being buffered.

-- Craig Ringer

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #6088: copy to stdout cannot be stopped with kill(pid) or pg_terminate_backend
After some experiments I found that not only COPY TO STDOUT but also a simple SELECT count(*) FROM changes cannot be stopped. However, the SELECT count(*) finishes successfully, and no long IO locks or waits were found in the ktrace results. Because the situation seemed unusual, I tried to find what is unusual about the table itself.

Table size:

SELECT pg_size_pretty(pg_relation_size('changes'));
 pg_size_pretty
----------------
 152 GB

However, and here is the unusual part: the first 135GB of the table is completely dead/empty space without a single live tuple. (How that happened is another story; in short, it is the result of many unsuccessful attempts by Slony to populate the table on a slave server: begin; delete from changes; copy table from stdin; get an error (cluster-wide statement timeout); abort; repeated N times overnight.)

So (and here goes pure theory): while the code performs a seq scan over a large empty space, it does not check for interrupts while looping over the completely dead/empty area.

PS: just to compare, the backtrace of the unstoppable count(*):

#0  0x000801961b5c in read () from /lib/libc.so.7
#1  0x00658968 in FileRead (file=7, buffer=0xc8edc6ae0 "╫\023", amount=8192) at fd.c:1151
#2  0x0066b2b2 in mdread (reln=0xdd08895e0, forknum=MAIN_FORKNUM, blocknum=762045, buffer=0xc8edc6ae0 "╫\023") at md.c:618
#3  0x0066ce5a in smgrread (reln=0xdd08895e0, forknum=MAIN_FORKNUM, blocknum=762045, buffer=0xc8edc6ae0 "╫\023") at smgr.c:401
#4  0x00653495 in ReadBuffer_common (smgr=0xdd08895e0, isLocalBuf=0 '\000', forkNum=MAIN_FORKNUM, blockNum=762045, mode=RBM_NORMAL, strategy=0xdd08b0410, hit=0x7fffde53 "") at bufmgr.c:426
#5  0x00652eea in ReadBufferExtended (reln=0xdd0849f38, forkNum=MAIN_FORKNUM, blockNum=762045, mode=RBM_NORMAL, strategy=0xdd08b0410) at bufmgr.c:240
#6  0x00468ec9 in heapgetpage (scan=0xdd08b, page=762045) at heapam.c:221
#7  0x0046a706 in heapgettup_pagemode (scan=0xdd08b, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:808
#8  0x0046b006 in heap_getnext (scan=0xdd08b, direction=ForwardScanDirection) at heapam.c:1320
--part before here is the same as I see in the copy to stdout backtrace
#9  0x005a4e5a in SeqNext (node=0xdd08afa10) at nodeSeqscan.c:93
#10 0x00593a10 in ExecScan (node=0xdd08afa10, accessMtd=0x5a4d40 <SeqNext>) at execScan.c:68
#11 0x005a4eba in ExecSeqScan (node=0xdd08afa10) at nodeSeqscan.c:131
#12 0x00588f85 in ExecProcNode (node=0xdd08afa10) at execProcnode.c:363
#13 0x00599cec in agg_retrieve_direct (aggstate=0xdd08af550) at nodeAgg.c:945
#14 0x00599b10 in ExecAgg (node=0xdd08af550) at nodeAgg.c:837
#15 0x00589093 in ExecProcNode (node=0xdd08af550) at execProcnode.c:431
#16 0x00586794 in ExecutePlan (estate=0xdd08af030, planstate=0xdd08af550, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0xdd08ab790) at execMain.c:1504
#17 0x005849a5 in standard_ExecutorRun (queryDesc=0xdd0830c30, direction=ForwardScanDirection, count=0) at execMain.c:309
#18 0x005848a5 in ExecutorRun (queryDesc=0xdd0830c30, direction=ForwardScanDirection, count=0) at execMain.c:258
#19 0x00674e18 in PortalRunSelect (portal=0xdd0885030, forward=1 '\001', count=0, dest=0xdd08ab790) at pquery.c:953
#20 0x00674ab9 in PortalRun (portal=0xdd0885030, count=9223372036854775807, isTopLevel=1 '\001', dest=0xdd08ab790, altdest=0xdd08ab790, completionTag=0x7fffe570 "") at pquery.c:779
#21 0x0066f0e6 in exec_simple_query (query_string=0x8037e7030 "SELECT count(*) from changes\n;") at postgres.c:997
#22 0x00672f72 in PostgresMain (argc=4, argv=0x803732930, username=0x803732900 "pgsql") at postgres.c:3676
#23 0x0063b5cd in BackendRun (port=0x803703c00) at postmaster.c:3467
#24 0x0063ab2a in BackendStartup (port=0x803703c00) at postmaster.c:3081
#25 0x006382dc in ServerLoop () at postmaster.c:1387
#26 0x00637abb in PostmasterMain (argc=3, argv=0x7fffeb88) at postmaster.c:1040
#27 0x005c0b5a in main (argc=3, argv=0x7fffeb88) at main.c:188

On Sat, Jul 2, 2011 at 10:00 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> Hi and thanks for responding...
While performing my tests I used pg_dump via a local socket (pg_dump -F c -Z 0 -t changes -a db >/dev/null). Now:

billing=# SELECT * from db_activity;
       age       | datname | procpid | usename | waiting |          query_start          | client_addr | client_port |                          current_query
-----------------+---------+---------+---------+---------+-------------------------------+-------------+-------------+------------------------------------------------------------------
 00:03:07.609308 | billing |   36353 | pgsql   | f       | 2011-07-02 15:50:21.508394+04 |             |          -1 | COPY bill.changes (id, cdate, mdate, status, table_name, pk_id, old_row, new_row
[BUGS] BUG #6087: Unnest with multidimensional arrays
The following bug has been logged online:

Bug reference: 6087
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 9.0
Operating system: Linux

Description: Unnest with multidimensional arrays

Details:
I am not sure whether it is an actual bug or intended behaviour. In the documentation unnest is described as: "expand an array to a set of rows". However, in the case of multidimensional arrays the documentation is misleading or the actual behaviour is wrong.

Test case:

postgres=# select unnest(array[[1,2], [3,4], [5,6]]);
 unnest
--------
      1
      2
      3
      4
      5
      6
(6 rows)

I think the correct result should be:

 unnest
--------
 {1,2}
 {3,4}
 {5,6}
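[Editor's note: unnest does fully flatten multidimensional arrays by design. A common workaround on servers of this era (a sketch, not from this report) is to slice along the first dimension with generate_subscripts; note that each slice remains two-dimensional.]

```sql
-- Sketch: emit one first-dimension slice per row.
SELECT (array[[1,2],[3,4],[5,6]])[s:s] AS slice
FROM generate_subscripts(array[[1,2],[3,4],[5,6]], 1) AS s;
--  slice
-- ---------
--  {{1,2}}
--  {{3,4}}
--  {{5,6}}
```

The slice syntax arr[s:s] keeps the array's original dimensionality, which is why each row is {{1,2}} rather than the {1,2} the reporter expected.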
[BUGS] BUG #5946: Long exclusive lock taken by vacuum (not full)
The following bug has been logged online:

Bug reference: 5946
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4
Operating system: Linux

Description: Long exclusive lock taken by vacuum (not full)

Details:
From the documentation I know that vacuum (without full) can truncate empty pages from the end of a relation if they are free and vacuum successfully grabbed an exclusive lock for a short time. However, I wasn't ready to learn that a 'short exclusive lock' can last 10-20 minutes in some cases. In my case vacuum tried to truncate the last 10-15GB of a 100GB relation, and each of the 3 times it cost 10+ minutes of service downtime (because that table was completely locked). Is that correct behaviour? Is there any way to speed up that process, or at least to allow read-only queries during that time?

PS: no excessive disk IO was observed during those 10+ minute locks.
Re: [BUGS] BUG #5946: Long exclusive lock taken by vacuum (not full)
On Sat, Mar 26, 2011 at 4:17 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.bo...@gmail.com> writes:
>> In my case vacuum tried to truncate the last 10-15GB of a 100GB relation, and each of the 3 times it cost 10+ minutes of service downtime (because that table was completely locked). Is that correct behaviour? Is there any way to speed up that process, or at least to allow read-only queries during that time?
>
> Use autovacuum --- if there's something that wants to access the table, autovac will get kicked off the lock. (Of course, the table may never get truncated then, but maybe you don't care.)
>
> regards, tom lane

Thank you for the idea. Can having lots of empty pages at the end of the table have any negative impact on database performance (assuming I have plenty of free disk space)? In my case this 100GB table is going to be reduced to 20GB of actual data located at the start of the table, so I worry about the possible negative impact of keeping an extra 80GB of free space at the end of the table.

Regards, Maxim
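[Editor's note: a hedged sketch, not from this thread, of how one can estimate the empty tail that vacuum would try to truncate; `some_table` is a placeholder name, and the 8192-byte page size is the default build-time assumption.]

```sql
-- Compare the highest page that still holds a live tuple against the
-- physical relation size; the difference is the truncatable tail.
-- The ctid::text::point cast is a well-known trick for extracting the
-- page number from a tuple's (page,offset) ctid.
SELECT max((ctid::text::point)[0])::bigint + 1 AS last_used_page,
       pg_relation_size('some_table') / 8192   AS total_pages
FROM some_table;
```

On a table like the one described here, last_used_page would be far below total_pages, which is exactly the situation where vacuum attempts the long tail truncation under its exclusive lock.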
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Hi

Yes, there is one BEFORE UPDATE trigger on the relation:

Triggers:
    _billing_main_denyaccess_71 BEFORE INSERT OR DELETE OR UPDATE ON ctrl_servers FOR EACH ROW EXECUTE PROCEDURE _billing_main.denyaccess('_billing_main')

However, that trigger should not fire at all, because the stored procedure works with set local session_replication_role to 'replica';

But the tuple becomes virtual in these lines:

2027    newtuple = ExecBRUpdateTriggers(estate, resultRelInfo,
2028                                    tupleid, tuple);
2029
2030    if (newtuple == NULL)        /* do nothing */
2031        return;
2032
2033    if (newtuple != tuple)       /* modified by Trigger(s) */
2034    {
2035        /*
2036         * Put the modified tuple into a slot for convenience of routines
2037         * below.  We assume the tuple was allocated in per-tuple memory
2038         * context, and therefore will go away by itself.  The tuple table
2039         * slot should not try to clear it.
2040         */
2041        TupleTableSlot *newslot = estate->es_trig_tuple_slot;
2042
2043        if (newslot->tts_tupleDescriptor != slot->tts_tupleDescriptor)
2044            ExecSetSlotDescriptor(newslot, slot->tts_tupleDescriptor);
2045        ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
2046        slot = newslot;
2047        tuple = newtuple;
2048    }

To be more exact, at the line:

2047        tuple = newtuple;

What I didn't understand at first is why that code path fails only once per 1000-10000 calls. I have an idea that

2033    if (newtuple != tuple)       /* modified by Trigger(s) */

can sometimes return true even if no triggers have really been run. After more code digging I think I found the reason. It looks like this happens in this part of the ExecBRUpdateTriggers code:

02273    /*
02274     * In READ COMMITTED isolation level it's possible that newtuple was
02275     * changed due to concurrent update.  In that case we have a raw subplan
02276     * output tuple and need to run it through the junk filter.
02277     */
02278    if (newSlot != NULL)
02279        intuple = newtuple = ExecRemoveJunk(relinfo->ri_junkFilter, newSlot);

The rarity of the situation becomes understandable (it is a very, very rare case when the tuple gets changed in such a way), and this really does create a virtual tuple. It looks like the tuple materialization code needs to be put here as well.

PS: again, maybe this is just handwaving... I have only just started learning the PostgreSQL code. Any help from more skilled persons would be great.

On Mon, Feb 21, 2011 at 10:07 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.bo...@gmail.com> writes:
>> I tracked the virtual tuple from heaptuple.c::slot_getattr down to execMain.c::ExecUpdate, and I think somewhere along this way the virtual tuple needs to be materialized into a physical tuple.
>
> That happens at the top of ExecUpdate:
>
>     /*
>      * get the heap tuple out of the tuple table slot, making sure we have a
>      * writable copy
>      */
>     tuple = ExecMaterializeSlot(slot);
>
> I don't see any code path there that could re-virtualize the slot. Do you have any triggers or constraints on the target table? Because those are about the only things that influence this code path at all...
>
>> Again I am not sure, but the ExecProcessReturning function seems a good candidate to perform tuple materialization.
>
> If ExecProcessReturning were given a virtual slot, materializing it would do no good at all for this problem, because the system columns in the result would be garbage. It has to receive an already materialized slot that contains the right information beforehand. The only reason that retrieving ctid can work at all is that when heap_update is called, tuple is pointing at the slot's contained tuple and so setting tuple->t_self changes the contents of the slot.
That whole function is completely dependent on the slot contents being physical not virtual, and I don't see anything there that would break it.

regards, tom lane

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
On Tue, Feb 22, 2011 at 1:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
>> Ugh. That quick little ExecRemoveJunk is a lot more dangerous than it looks.
>
> I had actually looked at this before, but concluded it was OK because I couldn't reproduce the problem with a trigger in place. I guess I wasn't unlucky enough to have the chance pointer equality occur. On closer inspection, the palloc collision is actually extremely likely, because what will happen is we'll pfree the old tuple and immediately palloc the new one, and if the new one is of sufficiently similar size that it needs the same size of alloc chunk, it's *guaranteed* to get that same chunk back, because of the LIFO free-chunk chains in aset.c.
>
> The reason that the problem is hard to reproduce is that most triggers (certainly those written in plpgsql) will give back newly allocated tuples even when you return the unmodified NEW tuple. The only way to expose the problem is for ExecBRUpdateTriggers' trigger-calling loop to not replace the newtuple, and the easiest way for that to happen is if all the triggers are disabled. So that's why you're seeing it when fooling with the replication-role setting. I was able to reproduce the failure by creating a trigger with a false WHEN condition, and of course there are other ways to prevent a trigger from being called too.
>
> regards, tom lane

Thank you very much for fixing that issue. I apologize for not providing the critical details at the start of the discussion (until yesterday they seemed unimportant to me); I will try to provide all possible details in the future if I find anything else like this. Now I can start digging into the next strange problem, described here: http://archives.postgresql.org/pgsql-admin/2011-01/msg00055.php

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Today I performed some more digging with the code and gdb. I am not sure, but maybe I have an idea of what happens here: RETURNING ctid fails when the code implementing RETURNING happens to get a virtual tuple as input. And that of course fails, because a virtual tuple does not have any system columns (such as ctid). (It seems no one hit this error before because no one tried to use system columns in a RETURNING list.)

I tracked the virtual tuple from heaptuple.c::slot_getattr down to execMain.c::ExecUpdate, and I think somewhere along this way the virtual tuple needs to be materialized into a physical tuple. I have zero experience with reporting such results, so I will try my best. The path of that virtual tuple in the problem case looks like:

*(slot_getattr::slot)
*(ExecEvalVar::econtext->ecxt_scantuple)
*(ExecTargetList::econtext->ecxt_scantuple)
*(ExecProject::econtext->ecxt_scantuple)
*(ExecProject::projInfo->pi_exprContext->ecxt_scantuple)
*(ExecProcessReturning::projectReturning->pi_exprContext->ecxt_scantuple)
*(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple)

All these variables contained the same TupleTableSlot structure:

(gdb) print *(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scantuple)
$29 = {
  type = T_TupleTableSlot,
  tts_isempty = 0 '\000',
  tts_shouldFree = 0 '\000',
  tts_shouldFreeMin = 0 '\000',
  tts_slow = 0 '\000',
  tts_tuple = 0x0,
  tts_tupleDescriptor = 0xdd09d1030,
  tts_mcxt = 0xdd08da2e0,
  tts_buffer = 0,
  tts_nvalid = 11,
  tts_values = 0xdd09d1840,
  tts_isnull = 0xdd09d18d0 "",
  tts_mintuple = 0x0,
  tts_minhdr = {
    t_len = 0,
    t_self = {
      ip_blkid = {
        bi_hi = 0,
        bi_lo = 0
      },
      ip_posid = 44192
    },
    t_tableOid = 13,
    t_data = 0x10
  },
  tts_off = 59333634048
}

Again, I am not sure, but the ExecProcessReturning function seems a good candidate to perform tuple materialization.

PS: Maybe this all seems like just a lot of handwaving, but I have only just started studying gdb and the PostgreSQL source code.

PPS: I must say thanks to the authors of http://doxygen.postgresql.org. It is a great tool and saved me a lot of time today.

On Sun, Feb 20, 2011 at 12:01 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
> On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Maxim Boguk <maxim.bo...@gmail.com> writes:
>>> I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.
>>
>> Now that you know what query is causing the problem:
>>
>> #9 0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0 "") at pquery.c:196
>>
>> it shouldn't be too hard to prepare a self-contained test case so that others can look at this.
>>
>> regards, tom lane
>
> Sorry, but it's very unlikely I can create a self-contained test case. That error doesn't happen on a stand-alone idle server. It only happens at random times on a table with a very high update rate in a production environment, and it happens once per 1-30 minutes of calling my stored procedure in an endless loop (i.e. once per 1000+ calls). I was long ago sure about the exact place where that error happens (because my stored procedure contains only one real SQL query), but because I can't create a self-contained test case I was asked to use gdb and try to get a backtrace at the point of error. So now I can examine and provide variable values at that point to give a clue about what happens here.

-- Maxim Boguk
Senior Postgresql DBA.
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
    ...count=9223372036854775807, isTopLevel=1 '\001', dest=0xdd08996f0, altdest=0xdd08996f0, completionTag=0x7fffe570 "") at pquery.c:779
#35 0x0066f0e6 in exec_simple_query (query_string=0x8037e5030 "\nSELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 13, 1)\n") at postgres.c:997
#36 0x00672f72 in PostgresMain (argc=4, argv=0x803732930, username=0x803732900 "pgsql") at postgres.c:3676
#37 0x0063b5cd in BackendRun (port=0x803703c00) at postmaster.c:3467
#38 0x0063ab2a in BackendStartup (port=0x803703c00) at postmaster.c:3081
#39 0x006382dc in ServerLoop () at postmaster.c:1387
#40 0x00637abb in PostmasterMain (argc=3, argv=0x7fffeb88) at postmaster.c:1040
#41 0x005c0b5a in main (argc=3, argv=0x7fffeb88) at main.c:188

On Wed, Jan 5, 2011 at 4:28 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Dec 22, 2010 at 4:53 PM, Maxim Boguk <maxim.bo...@gmail.com> wrote:
>> About the stack backtrace: I am not sure how to get it (and even worse, the database is a heavily loaded production DB, so I am not sure creating a core dump would be a good idea). Still, I would like to receive any help with creating a stack backtrace.
>
> Well, if you can reproduce this problem in a given session, you can use pg_backend_pid() at the start of the session to get the backend ID, and then if your build has debugging symbols (or separate debuginfos you can install), you could attach gdb to it, set a breakpoint at the lines in heaptuple.c that can throw this error, reproduce the problem, and then get a backtrace. But if it's happening randomly I don't have a lot of ideas. I do wonder if this could be related to using TIDs that don't actually exist in the table. That might be worth some experimentation.
>
> -- Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- Maxim Boguk
Senior Postgresql DBA.
МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."
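Robert's recipe above can be sketched concretely. The breakpoint location is taken from the backtrace Maxim later obtains in this thread (heaptuple.c:1185 on non-stripped 8.4.7); installed debug symbols and shell access to the server are assumptions, and `<backend_pid>` is a placeholder:

```sql
-- In the session that reproduces the error, find the backend's PID first:
SELECT pg_backend_pid();

-- Then, from a shell on the database server, attach gdb to that backend:
--   gdb -p <backend_pid>
--   (gdb) break heaptuple.c:1185    -- the elog() raising "cannot extract
--                                   -- system attribute from virtual tuple"
--   (gdb) continue
--   ... reproduce the problem in the attached session ...
--   (gdb) bt                        -- print the stack backtrace
```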
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
Just an idea from code examination: I think the problem is in the RETURNING ctid part of the query, and in this comment in slot_getattr:

/*
 * system attributes are handled by heap_getsysattr
 */

So maybe the RETURNING implementation somehow handles system attributes incorrectly in the returned values list.

On Sun, Feb 20, 2011 at 1:13 AM, Maxim Boguk maxim.bo...@gmail.com wrote:

Hi all, and hi Robert. I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and a working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.

Full backtrace:

Breakpoint 1, slot_getattr (slot=0xdd09ea468, attnum=-1, isnull=0xdd09b4a68) at heaptuple.c:1185
1185        elog(ERROR, "cannot extract system attribute from virtual tuple");
(gdb) bt
#0  slot_getattr (slot=0xdd09ea468, attnum=-1, isnull=0xdd09b4a68) at heaptuple.c:1185
#1  0x0058a5f8 in ExecEvalScalarVar (exprstate=0xdd09b4b60, econtext=0xdd09b4a80, isNull=0xdd09b4a68, isDone=0xdd09b4ce0) at execQual.c:787
#2  0x005933d7 in ExecTargetList (targetlist=0xdd09b4cb0, econtext=0xdd09b4a80, values=0xdd09b4a50, isnull=0xdd09b4a68, itemIsDone=0xdd09b4ce0, isDone=0x0) at execQual.c:5118
#3  0x005939a3 in ExecProject (projInfo=0xdd09b4be0, isDone=0x0) at execQual.c:5333
#4  0x005879c7 in ExecProcessReturning (projectReturning=0xdd09b4be0, tupleSlot=0xdd09ea468, planSlot=0xdd09ea3a8, dest=0xdd09de5c8) at execMain.c:2273
#5  0x005875fe in ExecUpdate (slot=0xdd09ea468, tupleid=0x7fffd2e0, planSlot=0xdd09ea3a8, dest=0xdd09de5c8, estate=0xdd09e6030) at execMain.c:2142
#6  0x00586c68 in ExecutePlan (estate=0xdd09e6030, planstate=0xdd09ea550, operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection, dest=0xdd09de5c8) at execMain.c:1681
#7  0x005849a5 in standard_ExecutorRun (queryDesc=0xdd09de658, direction=ForwardScanDirection, count=0) at execMain.c:309
#8  0x005848a5 in ExecutorRun (queryDesc=0xdd09de658, direction=ForwardScanDirection, count=0) at execMain.c:258
#9  0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0) at pquery.c:196
#10 0x006754e5 in PortalRunMulti (portal=0xdd08c8140, isTopLevel=0 '\000', dest=0xdd09de5c8, altdest=0x9dfdc0, completionTag=0x7fffd4e0) at pquery.c:1269
#11 0x00675127 in FillPortalStore (portal=0xdd08c8140, isTopLevel=0 '\000') at pquery.c:1061
#12 0x006757f8 in PortalRunFetch (portal=0xdd08c8140, fdirection=FETCH_FORWARD, count=10, dest=0x9dfe40) at pquery.c:1397
#13 0x005b24e1 in _SPI_cursor_operation (portal=0xdd08c8140, direction=FETCH_FORWARD, count=10, dest=0x9dfe40) at spi.c:2088
#14 0x005b11a8 in SPI_cursor_fetch (portal=0xdd08c8140, forward=1 '\001', count=10) at spi.c:1258
#15 0x000dd0b1e199 in exec_for_query (estate=0x7fffda50, stmt=0xdd098bfc8, portal=0xdd08c8140, prefetch_ok=1 '\001') at pl_exec.c:4262
#16 0x000dd0b1bc43 in exec_stmt_dynfors (estate=0x7fffda50, stmt=0xdd098bfc8) at pl_exec.c:3107
#17 0x000dd0b1881a in exec_stmt (estate=0x7fffda50, stmt=0xdd098bfc8) at pl_exec.c:1308
#18 0x000dd0b1859a in exec_stmts (estate=0x7fffda50, stmts=0xdd098bc28) at pl_exec.c:1203
#19 0x000dd0b18f38 in exec_stmt_while (estate=0x7fffda50, stmt=0xdd098d6f8) at pl_exec.c:1608
#20 0x000dd0b18733 in exec_stmt (estate=0x7fffda50, stmt=0xdd098d6f8) at pl_exec.c:1264
#21 0x000dd0b1859a in exec_stmts (estate=0x7fffda50, stmts=0xdd098a650) at pl_exec.c:1203
#22 0x000dd0b183df in exec_stmt_block (estate=0x7fffda50, block=0xdd098d778) at pl_exec.c:1140
#23 0x000dd0b16a29 in plpgsql_exec_function (func=0xdd087abd8, fcinfo=0x7fffdcf0) at pl_exec.c:315
#24 0x000dd0b1260f in plpgsql_call_handler (fcinfo=0x7fffdcf0) at pl_handler.c:95
#25 0x0058c902 in ExecMakeTableFunctionResult (funcexpr=0xdd08ca730, econtext=0xdd08ca460, expectedDesc=0xdd08ca5f0, randomAccess=0 '\000') at execQual.c:2016
#26 0x005a3e7d in FunctionNext (node=0xdd08ca350) at nodeFunctionscan.c:64
#27 0x00593a10 in ExecScan (node=0xdd08ca350, accessMtd=0x5a3e10 <FunctionNext>) at execScan.c:68
#28 0x005a3eea in ExecFunctionScan (node=0xdd08ca350) at nodeFunctionscan.c:98
#29 0x00588fdf in ExecProcNode (node=0xdd08ca350) at execProcnode.c:385
#30 0x00586794 in ExecutePlan (estate=0xdd08ca030, planstate=0xdd08ca350, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: I finally managed to create a working slave server with non-stripped PostgreSQL 8.4.7 and a working gdb (sorry for such a long delay). And I can reproduce the situation quite easily now, so I managed to get a stack backtrace with that error. What do I need to do now? I have some experience with gdb now, but I have no idea where to look next.

Now that you know what query is causing the problem:

#9 0x00673d43 in ProcessQuery (plan=0xdd0998890, sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0, dest=0xdd09de5c8, completionTag=0x7fffd4e0) at pquery.c:196

it shouldn't be too hard to prepare a self-contained test case so that others can look at this.

regards, tom lane

Sorry, but it's very unlikely that I can create a self-contained test case. The error doesn't happen on a standalone idle server. It only happens at random times, on a table with a very high update rate in a production environment, and it takes 1-30 minutes of calling my stored procedure in an endless loop (i.e. once per 1000+ calls). I was sure long ago about the exact place where the error happens (because my stored procedure contains only one real SQL query), but since I can't create a self-contained test case, I was asked to use gdb and try to get a backtrace at the point of the error. So now I can examine and provide variable values at that point, to give a clue about what happens here.
Re: [BUGS] BUG #5885: Strange rows estimation for left join
On Wed, Feb 16, 2011 at 6:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: The test case looks like:

create table references ( attr_id integer, reference integer, object_id integer );
insert into references select 100*(random()), 10*(random()^10), 100*(random()) from generate_series(1,1000);
create index xif01references on references ( reference, attr_id );
create index xif02references on references ( object_id, attr_id, reference );
analyze references;
explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;

I don't believe there's actually anything very wrong here. The large-looking estimate for the join size is not out of line: if you try different values for object_id you will find that some produce more rows than that and some produce less. If we had cross-column stats we could maybe derive a better estimate, but as-is you're getting an estimate that is probably about right on the average, depending on whether the particular object_id matches to more common or less common reference values. The thing that looks funny is that the inner indexscan rowcount estimate is so small, which is because that's being done on the assumption that the passed-in rs.reference value is random. It's not really --- it's more likely to be one of the more common reference values --- which is something that's correctly accounted for in the join size estimate but not in the inner indexscan estimate.

regards, tom lane

Thank you very much for the answer. Am I correct in my assumption that the estimated row counts on the two sides of the join aren't related to the estimated resulting row count of the join, because they are calculated independently? If that assumption is correct, which values are used to choose between a nested loop and merge/hash joins: the estimated resulting join rows, or the estimated row counts on each side of the join? I'm asking because in some cases these two values can lead to different plans.

PS: I just counted how many questions I have asked on the mailing lists about the PostgreSQL planner. Looks like it's time for me to dust off my memory of C and start reading the planner code myself. :)

Kind Regards, Maxim
Re: [BUGS] BUG #5885: Strange rows estimation for left join
Hi. The test case looks like:

create table references ( attr_id integer, reference integer, object_id integer );
insert into references select 100*(random()), 10*(random()^10), 100*(random()) from generate_series(1,1000);
create index xif01references on references ( reference, attr_id );
create index xif02references on references ( object_id, attr_id, reference );
analyze references;
explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;

On my system (8.4.4) it produces the following results:

postgres=# explain select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
QUERY PLAN
Nested Loop Left Join (cost=0.00..7.53 rows=107283 width=24)
  -> Index Scan using xif02references on references rs (cost=0.00..0.58 rows=11 width=12)
       Index Cond: (object_id = 1000)
  -> Index Scan using xif01references on references vm (cost=0.00..0.53 rows=8 width=12)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 10))

(again, 11 rows * 8 rows << 107283 rows)

postgres=# explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 10 where rs.object_id = 1000;
QUERY PLAN
Nested Loop Left Join (cost=0.00..7.53 rows=107283 width=24) (actual time=0.077..733.810 rows=117011 loops=1)
  -> Index Scan using xif02references on references rs (cost=0.00..0.58 rows=11 width=12) (actual time=0.036..0.079 rows=10 loops=1)
       Index Cond: (object_id = 1000)
  -> Index Scan using xif01references on references vm (cost=0.00..0.53 rows=8 width=12) (actual time=0.028..37.242 rows=11701 loops=10)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 10))

On Tue, Feb 15, 2011 at 4:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: I found that strange effect while helping with a slow query on a Russian PostgreSQL online forum.

Please try to put together a self-contained test case for this. I could not reproduce such a weird result here, but that probably just means there's something strange about your data distribution.

regards, tom lane
[BUGS] BUG #5885: Strange rows estimation for left join
The following bug has been logged online:

Bug reference: 5885
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: Linux
Description: Strange rows estimation for left join

Details:

I found that strange effect while helping with a slow query on a Russian PostgreSQL online forum. To be short, the results look like:

explain analyze select * from references rs left join references vm on vm.reference = rs.reference and vm.attr_id = 20084 where rs.object_id = 9129863193713091717;

Nested Loop Left Join (cost=0.00..1567.71 rows=2129718 width=112) (actual time=14.654..14.672 rows=2 loops=1)
  -> Index Scan using xif02references on references rs (cost=0.00..19.85 rows=11 width=56) (actual time=0.019..0.020 rows=2 loops=1)
       Index Cond: (object_id = 9129863193713091717::numeric)
  -> Index Scan using xif01references on references vm (cost=0.00..140.03 rows=55 width=56) (actual time=7.321..7.321 rows=0 loops=2)
       Index Cond: ((vm.reference = rs.reference) AND (vm.attr_id = 20084::numeric))
Total runtime: 14.758 ms

I.e. a six-orders-of-magnitude error in result row selectivity (2129718 vs 2).

Table description:

CREATE TABLE references (
    attr_id numeric(20,0) NOT NULL,
    reference numeric(20,0) NOT NULL,
    object_id numeric(20,0) NOT NULL
);
CREATE INDEX xif01references ON references USING btree (reference, attr_id);
CREATE INDEX xif02references ON references USING btree (object_id, attr_id, reference);

Yes, again an unfortunate EAV model.
Related data from pg_stats:

schemaname: public
tablename: references

attname: attr_id
null_frac: 0
avg_width: 10
n_distinct: 23
most_common_vals: {20084;20085;9127479646713033746;11;14;15;9127479093313033036;457955;9127306545213993525}
most_common_freqs: {0.234067;0.233433;0.227433;0.0998333;0.0476667;0.0449;0.0361;0.027;0.0247}

attname: reference
null_frac: 0
avg_width: 14
n_distinct: 56252
most_common_vals: {9129830887313872119;9129830887313872121;9129787365613945251;9129676282313943149;24332313945759;...}
most_common_freqs: {0.2497;0.0138333;0.0002;0.00017;0.00017;...}

attname: object_id
null_frac: 0
avg_width: 15
n_distinct: 1.23744e+06
most_common_vals: {9129846527513534962;9129846051413838763;9129846154413978095;9129846403613302858;9129846025513792413;...}
most_common_freqs: {0.0003;0.00027;0.00027;0.00023;0.0002;0.0002;...}

What seems very strange to me is the final estimate of 2M rows when 11 rows are estimated on the right side of the join and 55 rows on the left side (so the final row count should be at most 55*11 ~ 600 rows, even with the worst case of left join selectivity = 1).

Maybe that effect is already fixed in 8.4.5+ (but I have read the lists of changes since 8.4.4 and didn't find any related changes).

PS: Sorry if I completely misunderstood the planner mechanics of left join row estimation.

-- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #5797: Strange bug with hstore
On Tue, Dec 21, 2010 at 7:48 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maxim Boguk maxim.bo...@gmail.com writes: The bad explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::text::hstore->'name';
QUERY PLAN
Seq Scan on domains (cost=0.00..7775.91 rows=1 width=230)
  Filter: ((name)::text = (('name=>somedomain'::text)::hstore -> 'name'::text))

(index not used)

The reason for this is that hstore fails to label its I/O functions as immutable (or even stable) in 8.4. This is fixed in 9.0.

regards, tom lane

Sorry for the late response, and thank you for the clarification. My research of that problem led me to the conclusion that it can be fixed on 8.4.4 with:

ALTER FUNCTION hstore_in(cstring) IMMUTABLE;

But is it really safe? Or is the text->hstore conversion in 8.4 really VOLATILE, as labeled in the hstore module?

Regards, Maxim
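For 8.4 installations that cannot upgrade, the change Maxim proposes can be sketched as follows. This is only a sketch: the table and column names come from the thread, and whether re-labeling a contrib function's volatility is safe is exactly the open question here (9.0, per Tom, ships hstore's I/O functions with the proper labels):

```sql
-- Re-label hstore's input function so the planner may fold the
-- text -> hstore cast at plan time instead of treating it as volatile.
ALTER FUNCTION hstore_in(cstring) IMMUTABLE;

-- With the cast chain foldable, the comparison against "name" becomes a
-- plain constant, so the index on domains(name) can be used again:
EXPLAIN SELECT * FROM domains
WHERE name = ('name=>somedomain'::text::hstore -> 'name');
```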
Re: [BUGS] BUG #5798: Some weird error with pl/pgsql procedure
It happened again in a much simpler case (no Slony or deadlock message involved... no Slony at all on the database). The same error:

DBD::Pg::db selectall_arrayref failed: ERROR: cannot extract system attribute from virtual tuple
CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement

It happened while the function was working on a small table only 33 pages in size with 600 live tuples. What makes the situation even easier to track is that I know this table hasn't had any inserts or deletes for ages (only selects and updates), and I have a full log of database queries from the time of the problem.

Log of the call of the procedure:

2010-12-23 00:21:04 MSK pg...@billing 55717 ERROR: cannot extract system attribute from virtual tuple
2010-12-23 00:21:04 MSK pg...@billing 55717 CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement
2010-12-23 00:21:04 MSK pg...@billing 55717 STATEMENT: SELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 21, 1)

At the same time, the log of all actions changing the table ctrl_servers within +/- 3 seconds of the ERROR contains:

2010-12-23 00:21:03 MSK b...@billing 42078 LOG: duration: 0.736 ms statement: UPDATE ctrl_servers SET last_connect = 'now' WHERE ( id = '1514' )

Just one query.

The line where the ERROR happens:

FOR _new_tid IN EXECUTE 'UPDATE ONLY '
    || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
    || ' SET ' || arg_field || ' = ' || arg_field
    || ' WHERE ctid=ANY($1) RETURNING ctid'
    USING _current_tid_list
LOOP

It should be said that _current_tid_list at the start of the procedure call contained all possible tids for page 21 of the relation (not only live tuples, but all possible ones, in a range like {21,1} to {21,300+}).

My idea is that the error happens because of some rare race condition between a tid scan and concurrent updates in the same page (HOT is probably somehow involved, because both the field updated by the function and the field in the concurrent update query are non-indexed fields).

Currently I am working on creating a self-contained test case (unfortunately without much success so far). About the stack backtrace: I'm not sure how to get one (and, even worse, the database is a heavily loaded production DB, so I'm not sure creating a core dump would be a good idea). Still, I would like to receive any help with creating a stack backtrace.

On Wed, Dec 22, 2010 at 5:22 PM, Robert Haas robertmh...@gmail.com wrote:

On Mon, Dec 20, 2010 at 9:48 PM, Maxim Boguk maxim.bo...@gmail.com wrote: Can anyone enlighten me about what happens here?

That does look weird, but without a simple test case I think it's going to be hard to investigate this. Installing Slony and your code and building a 130GB database is more effort than I'm willing to put in... Any way you can get a stack backtrace at the point the error occurs?

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
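The dynamic UPDATE loop quoted in this thread can be wrapped in a self-contained function for experimentation. This is a hypothetical, heavily simplified reconstruction (the real __clear_table_tail belongs to the compacttable utility and is not shown in full here; the function name `touch_tids` is invented):

```sql
-- Hypothetical simplified sketch of the failing loop: a no-op update of
-- every tuple addressed by a list of ctids, returning the tuples' new ctids.
CREATE OR REPLACE FUNCTION touch_tids(arg_relnamespace name,
                                      arg_relname name,
                                      arg_field name,
                                      _current_tid_list tid[])
RETURNS SETOF tid AS $$
DECLARE
    _new_tid tid;
BEGIN
    FOR _new_tid IN EXECUTE 'UPDATE ONLY '
        || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
        || ' SET ' || quote_ident(arg_field) || ' = ' || quote_ident(arg_field)
        || ' WHERE ctid = ANY($1) RETURNING ctid'
        USING _current_tid_list
    LOOP
        -- updated tuples are relocated, so each one gets a fresh ctid
        RETURN NEXT _new_tid;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```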
[BUGS] BUG #5797: Strange bug with hstore
The following bug has been logged online:

Bug reference: 5797
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: Freebsd
Description: Strange bug with hstore

Details:

One day ago I analyzed a slow query for one of my clients and found a strange query plan. After some digging I localized something which I think is a bug. The bug can be seen in these two explains.

The good explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::hstore->'name';
QUERY PLAN
Index Scan using domains_name on domains (cost=0.00..0.29 rows=1 width=230)
  Index Cond: ((name)::text = 'somedomain'::text)

(index used)

The bad explain:

billing=# EXPLAIN SELECT * from domains where name='name=>somedomain'::text::hstore->'name';
QUERY PLAN
Seq Scan on domains (cost=0.00..7775.91 rows=1 width=230)
  Filter: ((name)::text = (('name=>somedomain'::text)::hstore -> 'name'::text))

(index not used)

As can be seen, no index was used in the second case. I tested some variants and found that conditions like field1 = other_field::text::hstore->'key' never use the index on field1. Of course, the client's case was much more complicated and contained 9 joins... but the troublesome part looked like:

billing=# EXPLAIN SELECT es.params FROM services es JOIN domains dm ON dm.name = (es.params::hstore)->'name' WHERE es.shortname = 'exchange_accepted_domain';
QUERY PLAN
Nested Loop (cost=0.00..27990293.56 rows=3289 width=32)
  Join Filter: ((dm.name)::text = ((es.params)::hstore -> 'name'::text))
  -> Index Scan using services_shortname on services es (cost=0.00..68.50 rows=2406 width=32)
       Index Cond: ((shortname)::text = 'exchange_accepted_domain'::text)
  -> Seq Scan on domains dm (cost=0.00..3918.31 rows=385760 width=15)
(5 rows)
[BUGS] BUG #5798: Some weird error with pl/pgsql procedure
The following bug has been logged online:

Bug reference: 5798
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: FreeBSD 7.2
Description: Some weird error with pl/pgsql procedure

Details:

While testing my table compactor utility (http://code.google.com/p/compacttable/), I sometimes get exceptionally weird errors from a pl/pgsql procedure:

2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] ERROR: cannot extract system attribute from virtual tuple
2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] CONTEXT: PL/pgSQL function __clear_table_tail line 50 at FOR over EXECUTE statement
2010-12-21 05:17:19 MSK 17263 postg...@hh from 192.168.1.129 [vxid:22/1898656 txid:20671567] [SELECT] STATEMENT: SELECT * from __clear_table_tail('vacancy_view', 'public', 'count_', 25053, 25)

The 50th line of the procedure is:

--trying update all tuples in current page range
FOR _new_tid IN EXECUTE 'UPDATE ONLY '
    || quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname)
    || ' SET ' || arg_field || ' = ' || arg_field
    || ' WHERE ctid=ANY($1) RETURNING ctid'
    USING _current_tid_list
LOOP

This error usually happens together with another error from the Slony side (that one is easy to understand and not a problem at all):

2010-12-21 05:15:08 MSK 24679 sl...@hh from [local] [vxid:37/4336609 txid:20671079] [UPDATE] ERROR: could not serialize access due to concurrent update
2010-12-21 05:15:08 MSK 24679 sl...@hh from [local] [vxid:37/4336609 txid:20671079] [UPDATE] STATEMENT: update only public.vacancy_view set count_='1428' where vacancy_id='3568635';
...

I think there are some strange concurrency issues between pl/pgsql and Slony, and I don't understand the reason for the first error. It is not easy to reproduce, but it happened 3 times for me during test runs of my utility against a 130GB slave database.

Can anyone enlighten me about what happens here?
[BUGS] BUG #5649: strange (probably bugged) explain analyze output
The following bug has been logged online:

Bug reference: 5649
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.4
Operating system: FreeBSD 7.2
Description: strange (probably bugged) explain analyze output

Details:

I'm not sure this is an actual bug, but it looks like a bug to me in the EXPLAIN ANALYZE output (or even in the plan itself, though that's unlikely). All tables were analyzed just before the test. All statistics data look reasonably good to me. The query plan is the same on the master and the slave servers, so it isn't a random error on a single server. Here are the details (the most simplified version of a very long analytical query where the effect can be observed):

EXPLAIN ANALYZE
SELECT *
FROM resstat_2010_08
JOIN services ON resstat_2010_08.service_id=services.id
JOIN service_types ON service_types.shortname = services.shortname
LEFT JOIN plan_s ON service_types.abonent_service=true AND plan_s.shortname=service_types.shortname AND resstat_2010_08.tarif_id=plan_s.tarif_id

QUERY PLAN
Merge Left Join (cost=85621.96..88523.79 rows=277396 width=572) (actual time=20128.316..93314.317 rows=272430 loops=1)
  Merge Cond: ((resstat_2010_08.tarif_id = plan_s.tarif_id) AND ((service_types.shortname)::text = (plan_s.shortname)::text))
  Join Filter: service_types.abonent_service
  -> Sort (cost=85302.62..85996.11 rows=277396 width=411) (actual time=20110.866..20517.657 rows=272159 loops=1)
       Sort Key: resstat_2010_08.tarif_id, service_types.shortname
       Sort Method: quicksort Memory: 147305kB
       -> Hash Join (cost=51.47..60223.82 rows=277396 width=411) (actual time=3.413..16929.164 rows=272159 loops=1)
            Hash Cond: ((services.shortname)::text = (service_types.shortname)::text)
            -> Merge Join (cost=42.97..56401.12 rows=277396 width=246) (actual time=1.967..15198.751 rows=272159 loops=1)
                 Merge Cond: (resstat_2010_08.service_id = services.id)
                 -> Index Scan using resstat_2010_08_service_id_key on resstat_2010_08 (cost=0.00..4260.35 rows=277396 width=83) (actual time=0.025..874.850 rows=277396 loops=1)
                 -> Index Scan using services_pkey on services (cost=0.00..42226.11 rows=2770621 width=163) (actual time=0.017..7966.282 rows=2812840 loops=1)
            -> Hash (cost=3.83..3.83 rows=374 width=165) (actual time=1.421..1.421 rows=374 loops=1)
                 -> Seq Scan on service_types (cost=0.00..3.83 rows=374 width=165) (actual time=0.016..0.605 rows=374 loops=1)
  -> Sort (cost=233.86..242.35 rows=3399 width=161) (actual time=15.138..26960.441 rows=19164919 loops=1)
       Sort Key: plan_s.tarif_id, plan_s.shortname
       Sort Method: quicksort Memory: 604kB
       -> Seq Scan on plan_s (cost=0.00..34.49 rows=3399 width=161) (actual time=0.021..5.129 rows=3399 loops=1)
Total runtime: 93715.497 ms

The whole plan looks reasonable to me except this part:

-> Sort (cost=233.86..242.35 rows=3399 width=161) (actual time=15.138..26960.441 rows=19164919 loops=1)
     Sort Key: plan_s.tarif_id, plan_s.shortname
     Sort Method: quicksort Memory: 604kB
     -> Seq Scan on plan_s (cost=0.00..34.49 rows=3399 width=161) (actual time=0.021..5.129 rows=3399 loops=1)

It looks very strange: in that part of the query, 3399 input rows were turned into 19164919 rows after the sort, and that sort took almost 27 seconds. One more strange thing is that the merge join over that result took almost 50 seconds (more than half of the total run time). I feel something is definitely wrong here, but I can't say what.

PS: all field types in the query except shortname are integer, and shortname has type character varying(32).
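One way a Sort node can report far more actual rows than its input holds is a merge join rescanning the inner side: when the outer side contains long runs of duplicate merge keys, the join restores its position and re-reads the matching inner run for every outer duplicate, and EXPLAIN ANALYZE counts each re-read row. A minimal sketch (hypothetical table names, not the thread's data):

```sql
-- Sketch: force a merge join between two tables whose join key is a single
-- duplicated value, then look at the inner Sort's actual row count.
CREATE TABLE outer_t AS SELECT 1 AS k FROM generate_series(1, 1000);
CREATE TABLE inner_t AS SELECT 1 AS k FROM generate_series(1, 100);
ANALYZE outer_t;
ANALYZE inner_t;
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT * FROM outer_t JOIN inner_t USING (k);
-- the Sort over inner_t should report actual rows on the order of
-- 1000 * 100, even though inner_t itself holds only 100 rows
```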
Re: [BUGS] BUG #5503: error in trigger function with dropped columns
Hi all. It looks like no one thinks this behavior is a bug. Then the documentation probably needs to be changed, because http://www.postgresql.org/docs/8.4/interactive/plpgsql-trigger.html says: "To alter the row to be stored, it is possible to replace single values directly in NEW and return the modified NEW, or to build a complete new record/row to return." But in reality, returning a record or row doesn't work in an insert trigger at all when the target table contains dropped columns.

Another interesting test:

CREATE TABLE test1 as select * from test;

Now the test1 table has the same structure as test, so let's try constructing a row instead of a record:

CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row test1%ROWTYPE;
BEGIN
    RAISE NOTICE 'NEW record = %', NEW;
    SELECT * INTO _row FROM test1 limit 1;
    RAISE NOTICE '_row record = %', _row;
    RETURN _row;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_trigger before insert on test for each row EXECUTE PROCEDURE test_function();

INSERT INTO test values (1);
NOTICE: NEW record = (1)
NOTICE: _row record = (1)
ERROR: returned row structure does not match the structure of the triggering table
DETAIL: Number of returned columns (1) does not match expected column count (3).
CONTEXT: PL/pgSQL function test_function during function exit

So I can't return a record, and I can't return a row from a table with the same structure either. And that's all because the trigger function somehow thinks it needs to return all columns of the table, including the dropped ones. If this behavior is not a bug, then the documentation should be changed (because "or to build a complete new record/row to return" will never work if the table contains dropped columns).

On Mon, Jun 14, 2010 at 11:20 AM, Maxim Boguk maxim.bo...@gmail.com wrote:

I see... but anyway, this bug makes it impossible to return a record value from a trigger on a table containing at least one dropped column, and even worse, such a trigger will work on a freshly loaded copy of the production database and would pass all possible tests, but on the production database it stops working. Moreover, a fully functional system can become broken if a single column is dropped from a table that has such a trigger. I.e. the functionality of such a trigger depends on the dropped-column history of the table, which is wrong (IMHO).

I tried another test trigger on a table with a dropped column, and got even funnier results (the trigger expects the returned record to contain all columns of the table, including the dropped ones, so I tried to construct such a record):

CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row record;
BEGIN
    RAISE NOTICE 'NEW record = %', NEW;
    SELECT *,2,3 INTO _row FROM test limit 1;
    RAISE NOTICE '_row record = %', _row;
    RETURN _row;
END;
$$ LANGUAGE plpgsql;

postgres=# insert into test values (1);
NOTICE: NEW record = (1)
NOTICE: _row record = (1,2,3)
ERROR: returned row structure does not match the structure of the triggering table
DETAIL: Returned type integer does not match expected type N/A (dropped column) in column "pg.dropped.2".
CONTEXT: PL/pgSQL function test_function during function exit

I think the changes in 9.0 mask the actual bug instead of fixing it. If I'm wrong, it would still be useful to know how to return a record from a trigger function in that case, because I can't make a working version at all.

On Mon, Jun 14, 2010 at 4:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Maksym Boguk maxim.bo...@gmail.com writes: This bug is hard to describe, but in general, if a table contains dropped columns, you cannot return a record variable from a trigger function.

This is fixed for 9.0 ... or at least the specific test case you provide doesn't fail. We have not risked back-porting the change though, because there are other aspects of what the new code does that might cause people problems, eg http://archives.postgresql.org/pgsql-hackers/2010-03/msg00444.php http://archives.postgresql.org/message-id/6645.1267926...@sss.pgh.pa.us

regards, tom lane
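A workaround that sidesteps the whole problem on 8.4 is the first option the documentation mentions: assign into NEW field by field and return NEW, since NEW already carries the triggering table's exact row type, dropped columns included. A sketch, assuming the visible column of the test table is named `id` (the original table DDL isn't shown in the thread):

```sql
-- Workaround sketch: mutate NEW instead of building a fresh record.
CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$
DECLARE
    _row test1%ROWTYPE;
BEGIN
    SELECT * INTO _row FROM test1 LIMIT 1;
    NEW.id := _row.id;   -- copy over only the fields you need, one by one
    RETURN NEW;          -- NEW matches the table's row type exactly
END;
$$ LANGUAGE plpgsql;
```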
Re: [BUGS] BUG #5503: error in trigger function with dropped columns
I see... but anyway this bug does not allow use return record value from a trigger in table contained at least one dropped column, and even worse trigger will work on fresh loaded copy of production database and would pass all possible tests, but on production database it is stop working. Moreover, full functional system can become broken if single column dropped from table contained such trigger. E.g. functionality of such trigger depends of dropped column history of the table, which is wrong (IMHO). I was tried another test trigger on table with dropped column, and get even more funny results (trigger awaiting return record contained all rows from table include dropped so I tried construct such record): CREATE OR REPLACE FUNCTION test_function() RETURNS trigger AS $$ DECLARE _row record; BEGIN RAISE NOTICE 'NEW record = %', NEW; SELECT *,2,3 INTO _row FROM test limit 1; RAISE NOTICE '_row record = %', _row; RETURN _row; END; $$ LANGUAGE plpgsql; postgres=# insert into test values (1); NOTICE: NEW record = (1) NOTICE: _row record = (1,2,3) ERROR: returned row structure does not match the structure of the triggering table DETAIL: Returned type integer does not match expected type N/A (dropped column) in column pg.dropped.2. CONTEXT: PL/pgSQL function test_function during function exit I think changes in 9.0 now mask actual bug instead of fix it. If I was wrong, still would be useful to know how to use return record from trigger function in that case, because I can't make a working version at all. On Mon, Jun 14, 2010 at 4:09 AM, Tom Lane t...@sss.pgh.pa.us wrote: Maksym Boguk maxim.bo...@gmail.com writes: This bug hard to describe. But in general if a table contained dropped columns you cannot use return record variable in trigger function. This is fixed for 9.0 ... or at least the specific test case you provide doesn't fail. 
We have not risked back-porting the change though, because there are other aspects of what the new code does that might cause people problems, eg
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00444.php
http://archives.postgresql.org/message-id/6645.1267926...@sss.pgh.pa.us

regards, tom lane

--
Maxim Boguk
Senior PostgreSQL DBA.
Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
LinkedIn profile: http://nz.linkedin.com/in/maximboguk
МойКруг: http://mboguk.moikrug.ru/
"Strength breaks straw, but not everything in our life is straw, and strength is far from everything."

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #5328: GIN index with fastupdates=on provide wrong result on bitmap scan
The following bug has been logged online:

Bug reference: 5328
Logged by: Maxim Boguk
Email address: maxim.bo...@gmail.com
PostgreSQL version: 8.4.1
Operating system: Linux, kernel 2.6.18
Description: GIN index with fastupdate=on provides wrong results on bitmap scan

Details:

I cannot create a self-contained test case because a freshly loaded copy of the table from a dump does not show the effect. The bug itself is that a bitmap scan over a complex GIN index returns some extra, unrelated tuples. It is not a random single-server issue (the same problem exists on the master database and on both Londiste slaves).

Here is the query that shows the bug:

SELECT * FROM (
    SELECT file_type_did FROM
        file
    WHERE
        file.file_type_did='6903057'::bigint
        AND file.obj_status_did = 1
        AND (file.obj_tsvector @@ (make_tsquery('(море)', 'utf8_russian')))
    OFFSET 0
) AS t WHERE t.file_type_did <> '6903057'::bigint;

 file_type_did
---------------
       6906314
       6902260
       6906314
       6906314
       6902260
       6902260
...
       6902260
       6906314
(83 rows)

I.e. the inner query returned 83 rows with completely unrelated file_type_did values.

Now let me show the explain of the inner query:

db=# EXPLAIN SELECT file_type_did FROM
db-#     file
db-# WHERE
db-#     file.file_type_did='6903057'::bigint
db-#     AND file.obj_status_did = 1
db-#     AND (file.obj_tsvector @@ (make_tsquery('(море)', 'utf8_russian')))
db-# OFFSET 0;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=58.47..722.08 rows=5253 width=8)
   ->  Bitmap Heap Scan on file  (cost=58.47..722.08 rows=5253 width=8)
         Recheck Cond: ((file_type_did = 6903057::bigint) AND (obj_tsvector @@ '''море'''::tsquery) AND (obj_status_did = 1))
         ->  Bitmap Index Scan on i_file__photo_search  (cost=0.00..57.16 rows=5253 width=0)
               Index Cond: ((file_type_did = 6903057::bigint) AND (obj_tsvector @@ '''море'''::tsquery))

Description of the GIN index used:

i_file__photo_search gin (file_type_did, photo_vislvl, obj_tsvector) WHERE obj_status_did = 1

where the related table fields are:

 file_type_did | bigint   | not null
 photo_vislvl  | smallint | not null default 9
 obj_tsvector  | tsvector | not null default ''::tsvector

Additional information:
1) REINDEX INDEX i_file__photo_search; removes the problem for a short time (1-60 minutes, depending on user update activity on the table).
2) VACUUM file; has the same effect for about the same time.
3) An 8.3 server does not have this issue.

My idea is that the bug is related to the GIN fastupdate engine, which was introduced in version 8.4. I will run ALTER INDEX i_file__photo_search SET (fastupdate=off); and VACUUM file; on one of the slaves and check the results over the next 24 hours to confirm or refute this idea.
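The diagnostic step the report proposes can be sketched as the following session (a hedged sketch based on the statements named in the report; run on one replica, assuming the index and table names above):

```sql
-- Disable the GIN fast-update pending list for this index. New entries
-- will then go straight into the main index structure instead of the
-- unsorted pending list that fastupdate maintains.
ALTER INDEX i_file__photo_search SET (fastupdate = off);

-- Vacuum the table to flush any entries already sitting in the
-- pending list into the main GIN structure.
VACUUM file;

-- Re-run the failing query; if the hypothesis is right, the outer
-- query should now return zero rows, since every row truly matches
-- the inner file_type_did condition.
```

If the wrong results stop appearing with fastupdate off while the other replicas (with fastupdate on) keep misbehaving, that isolates the pending-list code path as the cause.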