Re: [BUGS] BUG #8496: psqlODBC driver does not work well via Excel
On 02.10.2013 14:57, manindra.sar...@brightnorth.co.uk wrote:
> Excel does not seem to respond with any data - but does give an idea that
> it has made some sort of a connection with the database. SQL commands
> fail from being executed.

I'm afraid you'll have to provide a lot more details for anyone to be able
to help you.

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8468: Create index on type tstzrange fail
On 24.09.2013 14:42, marian.kruc...@gmail.com wrote:
> CREATE INDEX ON tstzrange fail on 9.3.0 and 9.2.4 - default postgres
> configuration. It ate whole memory and was killed by oom.
>
> Example:
>
> postgres=# CREATE TABLE range_test AS
>     SELECT tstzrange(t, t+'1min') tr
>     FROM generate_series('2000-1-1'::TIMESTAMPTZ, '2010-1-1'::TIMESTAMPTZ, '1min') AS t1(t);
> SELECT 5260321
> postgres=# CREATE INDEX ON range_test(tr);
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> The connection to the server was lost. Attempting reset: Failed.

A-ha, the comparison function of range datatypes, range_cmp(), detoasts
its arguments, but fails to free the detoasted copies. Functions are
normally not required to free such temporary copies - the memory is
usually leaked into a short-lived memory context that will be quickly
free'd anyway - but B-tree comparison operators are expected to not leak.

Committed a fix, it will appear in the next minor releases. Thanks for the
report!

- Heikki
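The committed patch isn't quoted above; as a sketch of the usual pattern (not the actual commit), a B-tree support function that must not leak frees its detoasted argument copies before returning, along these lines:

```c
/* Sketch only, following standard fmgr conventions -- not the committed
 * patch. PG_FREE_IF_COPY() pfrees the argument only if detoasting
 * actually produced a palloc'd copy. */
Datum
range_cmp(PG_FUNCTION_ARGS)
{
	RangeType  *r1 = PG_GETARG_RANGE(0);	/* may detoast */
	RangeType  *r2 = PG_GETARG_RANGE(1);	/* may detoast */
	int			cmp;

	/* ... the actual comparison logic, unchanged ... */

	PG_FREE_IF_COPY(r1, 0);
	PG_FREE_IF_COPY(r2, 1);
	PG_RETURN_INT32(cmp);
}
```

The index build sorts millions of tuples in one long-lived context, calling the comparator O(n log n) times, which is why a per-call leak that is harmless elsewhere eats all memory here.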
[BUGS] Bogus pg_class.relminmxid value for pg_database
Right after a fresh initdb on a 9.3 or master database:

postgres=# select oid,relname,relminmxid from pg_class where relname='pg_database';
 oid  |   relname   | relminmxid
------+-------------+------------
 1262 | pg_database | 4244967297
(1 row)

That bogus value seems to appear already some time during initdb. The
relminmxid for all other tables is 1, which is correct.

- Heikki
Re: [BUGS] BUG #8450: pg_basebackup blocks until WAL archiving successful
On 13.09.2013 15:13, stu...@stuartbishop.net wrote:
> pg_basebackup blocks until all necessary WAL files have been archived by
> archive_command. This can take a few minutes under normal circumstances,
> and indefinitely if archive_command is failing.
>
> I would like to be able to disable this check, as I am streaming backups
> to a system that can time out my connection if it does not receive new
> data for a short while. This makes unattended backups problematic.

I can see why you'd want that, but it seems equally problematic to let
pg_basebackup return when the WAL files haven't been archived yet and you
therefore don't in fact have a valid, restorable backup yet.

Have you considered using the --xlog-method=stream option, to include the
WAL files in the backup? That will make your backups somewhat larger, as
the WAL files are included, but in that mode pg_basebackup won't wait for
the archival, and the backup will be restorable even if archive_command is
failing.

- Heikki
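For example (host, user, and target directory are placeholders), the streaming variant looks like this:

```shell
# Open a second replication connection to stream the WAL needed by the
# backup into pg_xlog/ inside the backup itself; pg_basebackup then does
# not have to wait for archive_command at the end.
pg_basebackup -h primary.example.com -U replication \
    -D /backups/base -X stream
```

`-X stream` is the short form of `--xlog-method=stream`; the resulting directory is restorable on its own, without consulting the WAL archive.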
Re: [BUGS] BUG #8465: major dump/reload problem
On 21.09.2013 20:16, jan.m...@inf-it.com wrote:
> today I tried to upgrade from 9.2 to 9.3 (pg_upgradecluster 9.2 main) and
> the upgrade of one of my databases failed with the following error:
> "ERROR: new row for relation ... violates check constraint ...". I
> created an example to reproduce this bug:
> http://www.inf-it.com/fixes/postgres-bugreport2/schema.sql

The problem is that when the database is dumped with pg_dump and reloaded,
the activity table is loaded first, and the codebook table second. The
check constraint checks that when a row is inserted into the activity
table, the corresponding row exists in the codebook table, which clearly
isn't true if the activity table is loaded first and the codebook table is
still empty. The system doesn't know about that dependency, since it's all
implemented in the PL/pgSQL code. With a constraint like that, you would
also get an unrestorable dump if you e.g. deleted a row from the codebook
table after loading the activities.

Usually you would implement a schema like that using foreign keys. That
would be less code, and the system would automatically get the dump order
correct. I would recommend that over a check constraint, if possible.

As a work-around, you can drop the constraints from the database before
upgrading, and restore them afterwards.

The problem isn't really related to upgrade per se, BTW. Running pg_dump +
restore even on the same version will give you the same error.

- Heikki
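A sketch of the foreign-key version (the column names are invented for illustration; the reporter's actual schema is in the linked file):

```sql
-- The FK both enforces "activity.code must exist in codebook" and tells
-- pg_dump about the dependency, so codebook data is dumped and restored
-- before activity data.
CREATE TABLE codebook (
    code text PRIMARY KEY
);

CREATE TABLE activity (
    id   serial PRIMARY KEY,
    code text NOT NULL REFERENCES codebook (code)
);
```

Unlike the trigger-backed check constraint, deleting a codebook row that is still referenced now fails up front instead of silently producing an unrestorable dump.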
Re: [BUGS] BUG #8453: uninitialized memory access in pg_receivexlog and other bugs
On 16.09.2013 22:59, Andrew Gierth wrote:
> "Heikki" == Heikki Linnakangas writes:
>
> Heikki> Attached is a patch to fix both of these issues. I'm too
> Heikki> tired right now to thoroughly test it and commit, so I'll get
> Heikki> back to this tomorrow. Meanwhile, please take a look and let
> Heikki> me know if you can see something wrong.
>
> A quick eyeball check looks ok; I'll see about reproducing the original
> scenario with this patch applied.

Committed, thanks for the report! If you still have a chance to try it
with the original scenario, please do.

- Heikki
Re: [BUGS] BUG #8453: uninitialized memory access in pg_receivexlog and other bugs
On 15.09.2013 15:02, and...@tao11.riddles.org.uk wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8453
> Logged by: Andrew Gierth
> Email address: and...@tao11.riddles.org.uk
> PostgreSQL version: 9.3.0
> Operating system: any
> Description:
>
> The first snprintf in writeTimeLineHistoryFile in receivelog.c accesses
> uninitialized data in the "path" variable, thus creating the .tmp file in
> a random place (usually the current dir, leading to unexpected EXDEV
> errors on the rename).

Ouch, that was a silly bug!

> Also, receivexlog is ignoring .partial and .history files when
> determining which timeline to start streaming from, which means that if
> there are two timeline changes that are not separated by a WAL segment
> switch, it will fail to operate due to attempting to start from a
> too-old timeline (for which xlogs are not available on the server).

There's nothing we can do with .history files here. The point is to find
out how far we have already received WAL, and the presence of a .history
file doesn't tell you anything about that.

There is a comment about .partial files though:

		/*
		 * Check if the filename looks like an xlog file, or a .partial file.
		 * Xlog files are always 24 characters, and .partial files are 32
		 * characters.
		 */
		if (strlen(dirent->d_name) != 24 ||
			strspn(dirent->d_name, "0123456789ABCDEF") != 24)
			continue;

The comment says that .partial files are taken into account, but the code
doesn't match the comment.

Attached is a patch to fix both of these issues. I'm too tired right now
to thoroughly test it and commit, so I'll get back to this tomorrow.
Meanwhile, please take a look and let me know if you can see something
wrong.
- Heikki

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index 787a395..ca89438 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -121,6 +121,7 @@ FindStreamingStart(uint32 *tli)
 	struct dirent *dirent;
 	XLogSegNo	high_segno = 0;
 	uint32		high_tli = 0;
+	bool		high_ispartial = false;
 
 	dir = opendir(basedir);
 	if (dir == NULL)
@@ -132,20 +133,33 @@
 	while ((dirent = readdir(dir)) != NULL)
 	{
-		char		fullpath[MAXPGPATH];
-		struct stat statbuf;
 		uint32		tli;
 		unsigned int log,
 					seg;
 		XLogSegNo	segno;
+		bool		ispartial;
 
 		/*
 		 * Check if the filename looks like an xlog file, or a .partial file.
 		 * Xlog files are always 24 characters, and .partial files are 32
 		 * characters.
 		 */
-		if (strlen(dirent->d_name) != 24 ||
-			strspn(dirent->d_name, "0123456789ABCDEF") != 24)
+		if (strlen(dirent->d_name) == 24)
+		{
+			if (strspn(dirent->d_name, "0123456789ABCDEF") != 24)
+				continue;
+			ispartial = false;
+		}
+		else if (strlen(dirent->d_name) == 32)
+		{
+			if (strspn(dirent->d_name, "0123456789ABCDEF") != 24)
+				continue;
+			if (strcmp(&dirent->d_name[24], ".partial") != 0)
+				continue;
+
+			ispartial = true;
+		}
+		else
 			continue;
@@ -160,31 +174,40 @@
 		}
 		segno = ((uint64) log) << 32 | seg;
 
-		/* Check if this is a completed segment or not */
-		snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
-		if (stat(fullpath, &statbuf) != 0)
+		/*
+		 * Check that the segment has the right size, if it's supposed to be
+		 * completed.
+		 */
+		if (!ispartial)
 		{
-			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
-					progname, fullpath, strerror(errno));
-			disconnect_and_exit(1);
-		}
+			struct stat statbuf;
+			char		fullpath[MAXPGPATH];
 
-		if (statbuf.st_size == XLOG_SEG_SIZE)
-		{
-			/* Completed segment */
-			if (segno > high_segno || (segno == high_segno && tli > high_tli))
+			snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
+			if (stat(fullpath, &statbuf) != 0)
+			{
+				fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+						progname, fullpath, strerror(errno));
+				disconnect_and_exit(1);
+			}
+
+			if (statbuf.st_size != XLOG_SEG_SIZE)
 			{
-				high_segno = segno;
-				high_tli = tli;
+				fprintf(stderr,
+						_("%s: segment file \"%s\" has incorrect size %d, skipping\n"),
+						progname, dirent->d_name, (int) statbuf.st_size);
 				continue;
 			}
 		}
-		else
+
+		/* Looks like a valid segment. Remember that we saw it */
+		if ((segno > high_segno) ||
+			(segno == high_segno && tli > high_tli) ||
+			(segno == high_segno && tli == high_tli && high_ispartial && !ispartial))
 		{
-			fprintf(stderr,
-					_("%s: segment file \"%s\" has incorrect size %d, skipping\n"),
-					progname, dirent->d_name, (int) statbuf.st_size);
-			continue;
+			high_segno = segno;
+			high_tli = tli;
+			high_ispartial = ispartial;
 		}
 	}
 
@@ -195,10 +218,12 @@ FindStreamingStart(uint32 *t
Re: [BUGS] BUG #8405: user can create a system table(eg, pg_class)
On 28.08.2013 05:01, syxj...@gmail.com wrote:
> postgres=# create table pg_class(a int);
> CREATE TABLE
> postgres=# insert into pg_class values (1);
> ERROR: null value in column "relnamespace" violates not-null constraint
> DETAIL: Failing row contains (1, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null, null, null, null, null,
> null, null, null, null, null, null, null, null).

That's not a bug. The table created is in the public schema, while all the
system tables are in the pg_catalog schema. Yes, it's possible to create a
table in another schema with the same name.

To avoid hijacking applications that access the system tables, pg_catalog
is implicitly in front of search_path, if it's not listed there
explicitly. So when you do the insert, it refers to pg_catalog.pg_class.

- Heikki
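You can see the name resolution at work by schema-qualifying the references; for instance:

```sql
CREATE TABLE public.pg_class (a int);

-- Unqualified name resolves to pg_catalog.pg_class (implicit front of
-- search_path), so this fails with the not-null error above:
INSERT INTO pg_class VALUES (1);

-- Qualified name hits the user table and succeeds:
INSERT INTO public.pg_class VALUES (1);
```

Listing pg_catalog explicitly after public in search_path would reverse the resolution order, but shadowing system catalog names is best avoided altogether.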
Re: [BUGS] BUG #7494: WAL replay speed depends heavily on the shared_buffers size
On 28.08.2013 02:28, Valentine Gogichashvili wrote:
> Running this sproc on the master:
>
> CREATE OR REPLACE FUNCTION public.f()
>  RETURNS integer
>  LANGUAGE plpgsql
> AS $function$
> begin
>   CREATE TEMP TABLE temp_table_to_test_replication AS
>     SELECT s.i as id from generate_series(1, 100) as s(i);
>   DROP TABLE temp_table_to_test_replication;
>   RETURN 1;
> end;
> $function$
>
> leads to writing of WAL files. Is it an expected behavior? Is it expected
> that WAL files are filled when the only thing, that sproc is supposed to
> do is to create and drop a temporary table. Are these catalog changes?

Yep, creating/dropping temp tables are catalog changes, which are
WAL-logged.

- Heikki
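This is easy to observe by comparing WAL positions around the call (using the 9.x function name; renamed to pg_current_wal_lsn() in later releases):

```sql
SELECT pg_current_xlog_location();   -- note the position
SELECT f();
SELECT pg_current_xlog_location();   -- position has advanced: the temp
                                     -- table's rows are not WAL-logged,
                                     -- but its pg_class/pg_attribute
                                     -- catalog entries are
```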
Re: [BUGS] BUG #8404: JDBC block hot standby replication?
On 28.08.2013 12:46, Valentine Gogichashvili wrote:
> Hello
>
> This is a well documented feature of Hot-Standby Replication.
>
> see:
> http://www.postgresql.org/docs/9.2/static/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-STANDBY

That explains how streaming delay and query cancellations work, but the
OP's question was why running the same statements over JDBC behaves
differently than running them over psql. I'm afraid I don't know the
answer.

One guess is that when you make the JDBC connection, you have asked for
repeatable read or serializable isolation level, while psql runs in read
committed mode. Running a "show transaction_isolation" in both would show
if that's the case. I'd also suggest doing "select * from
pg_stat_activity" to see if the session looks the same in both cases.

- Heikki
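That is, run the same two diagnostics in both the psql session and the JDBC session and compare:

```sql
-- 'read committed' in psql but 'repeatable read'/'serializable' over JDBC
-- would explain the difference in conflict behavior on the standby:
SHOW transaction_isolation;

-- Do the two sessions otherwise look alike (state, open transaction)?
SELECT * FROM pg_stat_activity;
```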
Re: [BUGS] BUG 5199:ERROR: cannot override frame clause of window xx
On 17.08.2013 12:37, Jov wrote:
> http://postgresql.1045698.n5.nabble.com/BUG-5199-Window-frame-clause-wrong-behaviour-td2131492.html
>
> tody I hit by this bug and can't find any doc about this, after google I
> find the bug 5199 reported 4 years ago. it is really nice to improve the
> error message or doc.

Agreed. Now that you've just run into it, could you suggest some docs
changes to explain it? A patch would be nice, or just type the suggested
paragraphs in a reply email and I can incorporate them into the docs.

- Heikki
Re: [BUGS] BUG #8387: Error while make of the source code
On 19.08.2013 13:33, nitishsaur...@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8387
> Logged by: Nitish
> Email address: nitishsaur...@gmail.com
> PostgreSQL version: 9.2.4
> Operating system: AIX7.1
> Description:
>
> Gives Error while compiling (configure goes well but while running make
> it gives error) the source code as shown below. Is there any
> compatibility issue between PostgreSQl 9.2.4 and AIX7.1 ?

Not that I'm aware of. Then again, there is no AIX 7.1 box in the
buildfarm; the closest thing is "grebe" running AIX 5.3 (see
http://buildfarm.postgresql.org/cgi-bin/show_status.pl).

- Heikki
Re: [JDBC] [BUGS] Incorrect response code after XA recovery
On 05.08.2013 17:58, Jeremy Whiting wrote:
> Hello Tom,
>
> A quick update on progress. A second PR was created to provide a patch.
>
> https://github.com/pgjdbc/pgjdbc/pull/76

Thanks. Looks good to me.

I wish the backend would throw a more specific error code for this; 42704
is used for many other errors as well. But at COMMIT/ROLLBACK PREPARED,
it's probably safe to assume that it means that the transaction does not
exist.

- Heikki
Re: [BUGS] BUG #8294: new timeline 6 forked off current dat abase system timeline 5 before current recovery point 0/100000
(Quoted pg_controldata output edited to highlight the important parts)

On 11.07.2013 06:26, dig...@126.com wrote:
> PostgreSQL 9.3 beta2 stream replication primary and standby cann't
> switchover.
> ...
> Primary :
> psql checkpont; pg_controldata
> ...
> Database cluster state:               in production
> Latest checkpoint location:           0/C60
> Prior checkpoint location:            0/B60
> Latest checkpoint's REDO location:    0/C28
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
>
> Standby :
> Database cluster state:               in archive recovery
> Latest checkpoint location:           0/B60
> Prior checkpoint location:            0/B60
> Latest checkpoint's REDO location:    0/B28
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/BF0
> Min recovery ending loc's timeline:   4

So at this point you have a primary, and a standby server following the
primary through streaming replication.

> Primary :
> pg_ctl stop
> waiting for server to shut down done
> server stopped
> pg_controldata
> Database cluster state:               shut down
> Latest checkpoint location:           0/D28
> Prior checkpoint location:            0/C60
> Latest checkpoint's REDO location:    0/D28
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/0
> Min recovery ending loc's timeline:   0
>
> standby :
> pg_controldata
> Database cluster state:               in archive recovery
> Latest checkpoint location:           0/B60
> Prior checkpoint location:            0/B60
> Latest checkpoint's REDO location:    0/B28
> Latest checkpoint's TimeLineID:       4
> Latest checkpoint's PrevTimeLineID:   4
> Minimum recovery ending location:     0/BF0
> Min recovery ending loc's timeline:   4
>
> pg_ctl promote
> server promoting
> pg_controldata
> Database cluster state:               in production
> Latest checkpoint location:           0/D90
> Prior checkpoint location:            0/B60
> Latest checkpoint's REDO location:    0/D58
> Latest checkpoint's TimeLineID:       5
> Latest checkpoint's PrevTimeLineID:   5

The primary is shut down, and the standby is promoted to become the new
master.
> primary :
> cd $PGDATA
> mv recovery.done recovery.conf
> pg_ctl start
>
> log :
> 2013-07-11 11:10:11.386 CST,,,14911,,51de2213.3a3f,1,,2013-07-11 11:10:11 CST,,0,LOG,0,"database system was shut down in recovery at 2013-07-11 11:09:51 CST""StartupXLOG, xlog.c:4895",""
> 2013-07-11 11:10:11.387 CST,,,14911,,51de2213.3a3f,2,,2013-07-11 11:10:11 CST,,0,LOG,0,"entering standby mode""StartupXLOG, xlog.c:4968",""
> 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,3,,2013-07-11 11:10:11 CST,1/0,0,LOG,0,"consistent recovery state reached at 0/D90""CheckRecoveryConsistency, xlog.c:6187",""
> 2013-07-11 11:10:11.391 CST,,,14911,,51de2213.3a3f,4,,2013-07-11 11:10:11 CST,1/0,0,LOG,0,"record with zero length at 0/D90""ReadRecord, xlog.c:3285",""
> 2013-07-11 11:10:11.392 CST,,,14909,,51de2213.3a3d,1,,2013-07-11 11:10:11 CST,,0,LOG,0,"database system is ready to accept read only connections""sigusr1_handler, postmaster.c:4658",""
> 2013-07-11 11:10:11.407 CST,,,14915,,51de2213.3a43,1,,2013-07-11 11:10:11 CST,,0,LOG,0,"fetching timeline history file for timeline 5 from primary server""WalRcvFetchTimeLineHistoryFiles, walreceiver.c:666",""
> 2013-07-11 11:10:11.411 CST,,,14915,,51de2213.3a43,2,,2013-07-11 11:10:11 CST,,0,LOG,0,"primary server contains no more WAL on requested timeline 4""WalReceiverMain, walreceiver.c:529",""
> 2013-07-11 11:10:11.411 CST,,,14911,,51de2213.3a3f,5,,2013-07-11 11:10:11 CST,1/0,0,LOG,0,"new timeline 5 forked off current database system timeline 4 before current recovery point 0/D90""rescanLatestTimeLine, xlog.c:3441",""

The old primary is restarted in standby-mode. It tries to connect to the
new primary, but it refuses to follow it, because there is some WAL
applied on the old primary that was not replicated to the new primary
before the switchover.

I believe this is the same issue that Fujii reported in June:
http://www.postgresql.org/message-id/CAHGQGwHLjEROTMtSWJd=xg_vfwre3ojwntysybdubrya6rr...@mail.gmail.com
This was fixed in commit 0b958f3efcfcc3d9b0e39d550b705a28763bc9e2 on June
25th:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0b958f3efcfcc3d9b0e39d550b705a28763bc9e2

Unfortunately that commit didn't make it into 9.3beta2, which was wrapped
just one day before that commit. Could you try again with a fresh checkout
from the REL9_3_STABLE branch? Or if you can't easily build from sources,
you can wait for the 9.3rc1 release, which should be available later this
week (http://www.postgresql.org/message-id/24973.1376419...@sss.pgh.pa.us).

Thanks for the report!

- Heikki
Re: [BUGS] Postgres crash? could not write to log file: No space left on device
On 26.06.2013 17:15, Tom Lane wrote:
> Heikki Linnakangas writes:
>> We've discussed retrying short writes before, and IIRC Tom has argued
>> that it shouldn't be necessary when writing to disk. Nevertheless, I
>> think we should retry in XLogWrite(). It can write much bigger chunks
>> than most write() calls, so there's more room for a short write to
>> happen there if it can happen at all. Secondly, it PANICs on failure,
>> so it would be nice to try a bit harder to avoid that.
>
> Seems reasonable. My concern about the idea in general was the
> impossibility of being sure we'd protected every single write() call.
> But if we can identify specific call sites that seem at more risk than
> most, I'm okay with adding extra logic there.

Committed a patch to add a retry loop to XLogWrite().

I noticed that FileWrite() has some additional Windows-specific code to
also retry on an ERROR_NO_SYSTEM_RESOURCES error. That's a bit scary,
because we don't check for that in any other write() calls in the backend.
If we really need to be prepared for that on Windows, I think that would
need to be in a wrapper function in src/port or src/backend/port. Would a
Windows-person like to comment on that?

- Heikki
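The committed patch isn't quoted in this thread; the general shape of such a write-retry loop (a sketch, not the actual XLogWrite() code) is:

```c
/* Sketch: keep calling write() until the whole buffer has been handed to
 * the kernel. A short write is not an error -- errno is only meaningful
 * when the return value is negative, and EINTR just means "try again". */
static bool
write_all(int fd, const char *buf, size_t nbytes)
{
	while (nbytes > 0)
	{
		ssize_t		written = write(fd, buf, nbytes);

		if (written < 0)
		{
			if (errno == EINTR)
				continue;
			return false;		/* a real error, e.g. ENOSPC */
		}
		buf += written;
		nbytes -= written;
	}
	return true;
}
```

The point made upthread is that XLogWrite() can hand the kernel far more than 8 kB in one call (up to all of wal_buffers), so a partial write is more plausible there than at the 8 kB-at-a-time data-file call sites.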
Re: [BUGS] BUG #8272: Unable to connect to diffrent schema from jdbc
On 01.07.2013 00:41, emes...@redhat.com wrote:
> Hi
>
> Postgres does not support currently defining the schema in teh connection
> parameters which makes it imposible to seprate the database to several
> schemas and connect to the right one from the application. There is
> already a thread discussing that and a suggested patch that is fixing
> that.
>
> See http://www.postgresql.org/message-id/4873f034.8010...@scharp.org

This is not a bug, but a feature request. I agree that would be a nice
feature. Patches are welcome, on the pgsql-jdbc mailing list.

As a work-around, you can set up a different db user for each schema, and
set search_path as a per-user setting in the server.

- Heikki
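Concretely, the work-around looks something like this (role, password, and schema names invented for illustration):

```sql
-- One login role per schema; the per-role setting takes effect at
-- connection time, so the application just picks the right credentials.
CREATE ROLE app_sales LOGIN PASSWORD 'secret';
ALTER ROLE app_sales SET search_path = sales;

CREATE ROLE app_billing LOGIN PASSWORD 'secret';
ALTER ROLE app_billing SET search_path = billing;
```

A connection made as app_sales then resolves all unqualified table names in the sales schema, with no JDBC-side configuration needed.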
Re: [BUGS] Postgres crash? could not write to log file: No space left on device
On 26.06.2013 15:21, Andres Freund wrote:
> On 2013-06-26 13:14:37 +0100, Greg Stark wrote:
>> On Wed, Jun 26, 2013 at 12:57 AM, Tom Lane wrote:
>>> (Though if it is, it's not apparent why such failures would only be
>>> manifesting on the pg_xlog files and not for anything else.)
>>
>> Well data files are only ever written to in 8k chunks. Maybe these
>> errors are only occuring on >8k xlog records such as records with
>> multiple full page images. I'm not sure how much we write for other
>> types of files but they won't be written to as frequently as xlog or
>> data files and might not cause errors that are as noticeable.
>
> We only write xlog in XLOG_BLCKSZ units - which is 8kb by default as
> well...

Actually, XLogWrite() writes multiple pages at once. If all wal_buffers
are dirty, it can try to write them all in one write() call.

We've discussed retrying short writes before, and IIRC Tom has argued that
it shouldn't be necessary when writing to disk. Nevertheless, I think we
should retry in XLogWrite(). It can write much bigger chunks than most
write() calls, so there's more room for a short write to happen there if
it can happen at all. Secondly, it PANICs on failure, so it would be nice
to try a bit harder to avoid that.

- Heikki
Re: [BUGS] [ODBC] Segmentation Fault in Postgres server when using psqlODBC
On 11.06.2013 19:04, Joshua Berry wrote:
> Hiroshi Inoue has developed the attached patch to correct the issue that
> was reported. More of the dialogue can be found in the pgsql-odbc list.

I tried to follow that thread over at pgsql-odbc, but couldn't quite
understand what the problem is. Did you have a test program to reproduce
it? Or failing that, what is the sequence of protocol messages that causes
the problem?

- Heikki
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
On 04.06.2013 15:27, Stephen Frost wrote:
> * Naoya Anzai (anzai-na...@mxu.nes.nec.co.jp) wrote:
>> I've found a memory-leak bug in PostgreSQL 9.1.9's background writer
>> process.
>
> This looks legit, but probably not the right approach to fixing it.
> Looks like it'd be better to work out a way to use a static variable to
> reuse the same memory, ala what GetRunningTransactionData() does, and
> avoid having to do allocation while holding all the locks (or at least,
> not very often).

I can't get too excited about the overhead of a single palloc here. It's
a fairly heavy operation anyway, and only runs once per checkpoint. And we
haven't heard any actual complaints of latency hiccups with
wal_level=hot_standby.

- Heikki
Re: [BUGS] COPY .... (FORMAT binary) syntax doesn't work
On 26.05.2013 04:31, Simon Riggs wrote:
> This works fine...
>
> COPY pgbench_accounts TO '/tmp/acc' BINARY;
>
> This new format does not
>
> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT BINARY);
> ERROR: syntax error at or near "BINARY" at character 47
>
> which looks like I've mistyped something. Until you realise that this
> statement gives a completely different error message.
>
> COPY pgbench_accounts FROM '/tmp/acc' (FORMAT anyname);
> ERROR: COPY format "anyname" not recognized
>
> and we also note that there are no examples in the docs, nor regression
> tests to cover this situation. So I conclude that this hasn't ever
> worked since it was introduced in 9.0.
>
> The cause is that there is an overlap between the old and the new COPY
> syntax, relating to the word BINARY. It's the grammar generating the
> error, not post parse analysis.

Hmm, the problem is that BINARY is a type_func_keyword, so it doesn't
match the ColId rule used to capture the format argument.

> My attempts to fix that look pretty ugly, so I'm not even going to post
> them. I can stop the error on binary by causing errors on csv and text,
> obviously not a fix. Any grammar based fix looks like it would restrict
> the list of formats, which breaks the orginal intention of the syntax
> change.

This seems to work:

--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2528,3 +2528,7 @@ copy_generic_opt_elem:
 			{
 				$$ = makeDefElem($1, $2);
 			}
+			| ColLabel BINARY
+			{
+				$$ = makeDefElem($1, (Node *) makeString("binary"));
+			}

Am I missing something?

- Heikki
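As a work-around until a grammar fix ships, quoting the format name should sidestep the keyword clash, since the generic option value can also be given as a string literal rather than a bare identifier:

```sql
-- 'binary' as a string literal never hits the BINARY keyword rule:
COPY pgbench_accounts FROM '/tmp/acc' (FORMAT 'binary');
```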
Re: [BUGS] BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 out of 1000 times.
On 21.05.2013 18:20, lal...@fhcrc.org wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8173
> Logged by: Lloyd Albin
> Email address: lal...@fhcrc.org
> PostgreSQL version: 9.2.4
> Operating system: SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we
> were expecting did not match what was given by the program. In looking
> over the revision notes from our current version of 9.0.12 through the
> 9.2.4, that we are testing, I believe that I have tracked down the issue
> to "Improve COPY performance by adding tuples to the heap in batches".
> When I looked at the patch code in
> http://www.postgresql.org/message-id/4e708759.40...@enterprisedb.com
> I found that you are inserting 1000 rows at a time. The problem is that
> on failure, you return either row 1000 or the last line, whichever comes
> first. This can be confusing as you will see in the demo code below.
>
> CREATE TABLE public.table1 (
>   key INTEGER,
>   PRIMARY KEY(key)
> );
>
> Create a csv file with only one column of data, numbered from 1 to 1008.
> Make two copies of the file and name them csv_test.csv and
> csv_test2.csv. Edit csv_test.csv and change the entry 1000 to 500. Edit
> csv_test2.csv and change the entry 900 to 500.
>
> On 9.0.12 Server
>
> COPY public.table1 FROM 'csv_test.csv';
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 900: "500"
>
> Both times the context gave us the correct information. Now try the same
> thing on 9.2.4 Server
>
> COPY public.table1 FROM 'csv_test.csv';
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "1000"
>
> As you can see, the second test returned the last line of the set of
> tuples being recorded, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv. Edit
> csv_test3.csv and remove all entries after 994.
>
> COPY public.table1 FROM 'csv_test3.csv';
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 995: ""
>
> If you are writing less than 1000 lines then it will return the line
> after the last line with a value of "".

Hmm, yeah, it's quite self-evident what's happening; the server reports
the last line that was *read*, not the line where the error happened.

Committed a fix for this. Unfortunately we only keep the last line read
buffered in text format, so after this you'll only get the line number,
not the content of that line:

postgres=# copy foo from '/tmp/foo';
ERROR: duplicate key value violates unique constraint "foo_pkey"
DETAIL: Key (id)=(4500) already exists.
CONTEXT: COPY foo, line 4500

Thanks for the report!

- Heikki
Re: [BUGS] BUG #8168: duplicated function signature
On 17.05.2013 18:53, Vladimir Jovanovic wrote:
> Hi Heikki,
>
> Here you can find attached .csv semicolon separated made by:
> "SELECT * FROM pg_proc WHERE proname LIKE 'sp_get_league_prediction".
> "\df sp_get_league_prediction" is also attached.
>
> Both functions are returning the same setof record:
>
> ---
> CREATE OR REPLACE FUNCTION sp_get_league_prediction(IN _id bigint,
>   IN _rank integer, IN _log_in_expectence double precision,
>   IN _feathers_gained integer, IN _tokens_all integer,
>   IN _tokens_active integer, IN _score integer)
>   RETURNS SETOF record AS
> $BODY$ ...
> ---
> CREATE OR REPLACE FUNCTION sp_get_league_prediction(_id bigint,
>   _rank integer, _log_in_expectence double precision,
>   _feathers_gained integer, _tokens_all integer,
>   _tokens_active integer, _score integer)
>   RETURNS SETOF record AS
> $BODY$ ...
> ---

No. One of the functions was created with something like the above. But
the other one takes no arguments, and *returns* a table with those
columns. Try "\ef sp_get_league_prediction()" to get a CREATE OR REPLACE
FUNCTION statement to recreate the latter; you will see that it looks
something like this:

CREATE OR REPLACE FUNCTION public.sp_get_league_prediction()
 RETURNS TABLE(id bigint, _rank integer, _log_in_expectence double precision,
               _feathers_gained integer, _tokens_all integer,
               _tokens_active integer, _score integer)
AS ...

- Heikki
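In other words, the two \df entries come from definitions shaped like these (simplified to two columns, with the bodies elided, for illustration):

```sql
-- Takes seven IN arguments and returns SETOF record:
CREATE FUNCTION sp_get_league_prediction(_id bigint, _rank integer, ...)
RETURNS SETOF record AS $BODY$ ... $BODY$ LANGUAGE plpgsql;

-- Takes NO arguments; what looks like a parameter list in \df output is
-- really the set of OUT columns, so the call signature is just "()":
CREATE FUNCTION sp_get_league_prediction()
RETURNS TABLE(_id bigint, _rank integer, ...) AS $BODY$ ... $BODY$
LANGUAGE plpgsql;
```

Only the IN-argument list participates in overload resolution, which is why the two can coexist despite listing the same names and types.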
Re: Bugfix patches in the next CF, was: Re: [BUGS] pg_ctl -D "/absolute/path" -m fast restart - doesn't work in some cases
On 17.05.2013 11:57, Boszormenyi Zoltan wrote: Sorry for diverting the thread but if bugfixes are urgent, you can also look at http://www.postgresql.org/message-id/cabuevewx2dqt45x1phhdzd_khy8hz5a9xo95xvghou2ztyn...@mail.gmail.com and the fix at https://commitfest.postgresql.org/action/patch_view?id=1085 The CF in progress at the time was already closed for new patches. [pg_basebackup with -R option and start standby have problems with escaped password] I replied to that now. Let's get that fixed before the next beta. There are other patches marked as bugfixes, like: https://commitfest.postgresql.org/action/patch_view?id=1087 [backend hangs at immediate shutdown] The question here is: do we want to take the risk and apply this at this late stage? On a cursory look, I don't understand it well enough to have an opinion. https://commitfest.postgresql.org/action/patch_view?id=1076 [fixing pg_ctl with relative paths] Seems reasonable, and I think we should apply this now, to 9.3. The issue with embedded quotes is pretty scary, though; maybe we should dig a bit deeper into that to get that fixed too. https://commitfest.postgresql.org/action/patch_view?id=1081 I don't know LDAP well enough to comment, but I think we should fix this now, in 9.3, if we want to do this at all. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8168: duplicated function signature
On 17.05.2013 13:31, vladimir.jovano...@aparteko.com wrote: I noticed that I have two functions with the same signature. sp_get_league_prediction(IN _id bigint, IN _rank integer, IN _log_in_expectence double precision, IN _feathers_gained integer, IN _tokens_all integer, IN _tokens_active integer, IN _score integer) sp_get_league_prediction(_id bigint, _rank integer, _log_in_expectence double precision, _feathers_gained integer, _tokens_all integer, _tokens_active integer, _score integer) In addition to higher-level checks, there is a unique index in the pg_proc catalog that should not let that happen, so to be honest, I find that hard to believe. Are you 100% sure they have the same signature? Are they in different schemas, perhaps? In psql, what does "\df sp_get_league_prediction" return? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8160: 9.3 Beta 1 Initdb doesn't work
On 14.05.2013 16:48, bnichol...@hp.com wrote: I've installed the 9.3 beta 1 packages (via pgdg repo and yum) and when I try to do an initdb it fails. /var/lib/pgsql/9.3/data has nothing in it when I attempt the initdb : #service postgresql-9.3 initdb Initializing database:[FAILED] # cat /var/lib/pgsql/9.3/pgstartup.log The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. fixing permissions on existing directory /var/lib/pgsql/9.3/data ... ok creating directory /var/lib/pgsql/9.3/data/pg_xlog ... ok initdb: could not create symbolic link "/var/lib/pgsql/9.3/data/pg_xlog": File exists There seems to be a bug in the init script. If you don't give a location for pg_xlog on the command line, it uses $PGDATA/pg_xlog, which confuses initdb: # If the xlog directory is specified just after the locale parameter, use it: if [ -z $INITDBXLOGDIR ] then INITDBXLOGSTRING=`echo $PGDATA/pg_xlog` else INITDBXLOGSTRING=`echo $INITDBXLOGDIR` fi > ... # Initialize the database $SU -l postgres -c "$PGENGINE/initdb --pgdata='$PGDATA' --xlogdir=$INITDBXLOGSTRING --auth='ident' $LOCALESTRING" >> "$PGLOG" 2>&1 < /dev/null Seems that if INITDBXLOGDIR is not given, it should just leave out --xlogdir. Devrim? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
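The fix suggested above amounts to building the --xlogdir option conditionally. A hedged shell sketch (variable names mirror the init script quoted in the message; the paths are stand-ins, and this is not the actual packaged script):

```shell
# Only pass --xlogdir when the admin actually configured a separate
# xlog directory; otherwise let initdb use its own default of
# $PGDATA/pg_xlog, avoiding the "File exists" symlink failure.
PGDATA=/var/lib/pgsql/9.3/data
INITDBXLOGDIR=""    # empty: admin gave no xlog location

if [ -n "$INITDBXLOGDIR" ]; then
    XLOGOPT="--xlogdir='$INITDBXLOGDIR'"
else
    XLOGOPT=""      # omit the flag entirely
fi

echo "initdb --pgdata='$PGDATA' $XLOGOPT"
```

With INITDBXLOGDIR unset, the assembled command line contains no --xlogdir at all, so initdb creates pg_xlog as a plain directory inside the data directory instead of trying to symlink it onto itself.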
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On 13.05.2013 17:09, Tom Lane wrote: Heikki Linnakangas writes: Would it be possible to be lenient, and also accept . as the decimal separator, when there is no ambiguity? Ie. when . is not the thousands separator. I originally coded it that way, but concluded that it was probably a waste of code space. How many locales can you point to where neither the decimal point nor thousands_sep is "."? On my laptop, there are eight locales that use "," as the decimal separator and " " as the thousands separator. $ grep -l "^thousands_sep.*U00A0" /usr/share/i18n/locales/* | xargs grep -l "^decimal_point.*U002C" /usr/share/i18n/locales/cs_CZ /usr/share/i18n/locales/et_EE /usr/share/i18n/locales/fi_FI /usr/share/i18n/locales/lv_LV /usr/share/i18n/locales/nb_NO /usr/share/i18n/locales/ru_RU /usr/share/i18n/locales/sk_SK /usr/share/i18n/locales/uk_UA Out of these, ru_RU actually uses "." as the LC_MONETARY decimal point, even though it uses "," as the LC_NUMERIC decimal point. I think that strengthens the argument for accepting both. I don't speak Russian, but if you pass a monetary value to TO_NUMBER in the ru_RU locale, using "." as the decimal separator, you would probably expect it to work. According to http://en.wikipedia.org/wiki/Decimal_separator#Examples_of_use, many countries accept either "1 234 567,89" or "1.234.567,89" style, but looking at the locale files installed on my system, the latter style is the one actually used (e.g. Germany). - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
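The lenient behavior under discussion can be sketched in a few lines. This is an illustrative Python model, not PostgreSQL's actual to_number() code; the key observation is that once the locale's thousands separator has been stripped, any remaining '.' is unambiguous as a decimal point:

```python
def lenient_to_number(s, decimal_point=",", thousands_sep="\u00a0"):
    """Parse a localized number leniently: the locale's own decimal
    point always works, and '.' is also accepted as a decimal point.
    Safe because grouping characters are stripped first -- when '.'
    IS the thousands separator, it gets stripped and stays strict."""
    s = s.replace(thousands_sep, "")    # drop grouping first
    s = s.replace(decimal_point, ".")   # locale decimal point -> '.'
    return float(s)

# Finnish/Russian-style locale: ',' decimal, NBSP grouping
print(lenient_to_number("1\u00a0234,56"))   # 1234.56
print(lenient_to_number("1234.56"))         # '.' also accepted: 1234.56
# German-style locale: ',' decimal, '.' grouping -- here '.' is
# grouping, so "1.234,56" still parses as 1234.56, not 1.234
print(lenient_to_number("1.234,56", thousands_sep="."))
```

Note how this matches the concern in the thread: leniency is only offered when '.' cannot be confused with the thousands separator, so the German-style case keeps its strict interpretation.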
Re: [BUGS] Inconsistency between TO_CHAR() and TO_NUMBER()
On 11.05.2013 01:17, Euler Taveira wrote: On 10-05-2013 13:09, Thomas Kellerer wrote: Tom Lane wrote on 10.05.2013 17:49: I looked into this, and find that the reason it misbehaves is that NUM_numpart_from_char() will treat a '.' as being a decimal point *without any regard to locale considerations*. So even if we have a locale-dependent format string and a locale that says '.' is a thousands separator, it does the wrong thing. It's a bit surprising nobody's complained of this before. I propose the attached patch. I'm slightly worried though about whether this might break any existing applications that are (incorrectly) depending on a D format specifier being able to match '.' regardless of locale. Perhaps we should only apply this to HEAD and not back-patch? +1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't always produce the "wrong" answer (only in some specific situation) and (c) it has been like that for years without a complain. For those reasons, it is better to continue with this "wrong" behavior in back branches than prevent important security updates to be applied (without applying a patch to preserve the "wrong" answer). This argument is only valid for legacy closed-source apps but seems to have more weight than the bug scenario. +1 for HEAD-only. The Finnish language and locale uses comma (,) as the decimal separator, and it's a real pain in the ass. And if something goes wrong there, it can be *really* subtle. I once had to debug an application where all prices were suddenly rounded down to the nearest euro. And it only happened on some servers (those with locale set to Finnish). It was not a PostgreSQL application, but it turned out to be a bug in the JDBC driver of another DBMS. Would it be possible to be lenient, and also accept . as the decimal separator, when there is no ambiguity? Ie. when . is not the thousands separator. 
- Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
On 08.04.2013 18:58, Jeff Bohmer wrote: On Apr 6, 2013, at 1:24 PM, Jeff Janes wrote: On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas wrote: Perhaps we should improve the documentation to make it more explicit that backup_label must be included in the backup. The docs already say that, though, so I suspect that people making this mistake have not read the docs very carefully anyway. I don't think the docs are very clear on that. They say "This file will of course be archived as a part of your backup dump file", but "will be" does not imply "must be". Elsewhere it emphasizes that the label you gave to pg_start_backup is written into the file, but doesn't really say what the file itself is there for. To me it seems to imply that the file is there for your convenience, to hold that label, and not as a critical part of the system. Patch attached, which I hope can be back-patched. I'll also add it to commitfest-Next. I think this documentation update would be helpful. Committed that. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8135: current_setting('DateStyle'); does not reflect User setting
On 05.05.2013 21:18, fr...@heuveltop.nl wrote: PostgreSQL version: 9.1.8 Not that it makes any difference for this issue, but you should upgrade to 9.1.9. After ALTER ROLE frank SET TimeZone = 'Europe/Amsterdam'; Where the TimeZone differs from the default timezone. SELECT current_setting('TimeZone'); Gives the correct answer. But this same doesn't work for DateStyle ALTER ROLE frank SET DateStyle = 'SQL, DMY'; Where the DateStyle differs from the default DateStyle SELECT current_setting('DateStyle'); Gives the system/database setting but not the user setting; while the setting does have its effect on the output of date's and timestamps. This might also effect other user settings, but I haven't found any yet. Works for me. Are you sure the value isn't being overridden by a per-database-and-role setting? You can use "select source from pg_settings where name='DateStyle'" to check where the currently effective value came from. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] ISSUE after upgrading to POSTGRES 8.4.8
On 30.04.2013 09:40, Bansal, Pradeep wrote: I have observed the below issue after upgrading to the above mentioned version of postgress. Any help is very much appreciated:- ISSUE:- === SEQUENCE STATEMENTS = SELECT nextval('"serial1"') PQstatus(m_connection) :- CONNECTION OK PQdb name : RAIDSAPROVDB : server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. = The above error is seen after postgress restart/fresh installation couple of times. While executing the same query thrice and so on it runs as expected. I am not able to understand that why it is getting failed initially. kindly let me know if this is an existing bug in the postgress or any fix available for the same. You didn't mention what version you upgraded from, but in any case, you should at least upgrade to the latest 8.4.x minor release, which is 8.4.17. There have been a lot of bugs fixed between 8.4.8 and 8.4.17. I don't know if any of them explain the issue you're seeing, but there's not much point debugging it further before you upgrade. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8091: No permissions on the table file causing recovery failure
On 24.04.2013 08:01, Hari Babu wrote: As the following raised bug is not received by the bugs mailing list. Forwarding the same to mailing list. http://www.postgresql.org/message-id/E1USmqv-0006X0-5X@wrigleys.postgresql.o rg Please check the above defect needs any handling? 1. create table. 2. change the file permissions. 3. Drop table. 4. Restart the server leads to recovery failure. I think the answer to that is "don't do that". There is an arbitrary number of ways you can make the system fail, if you mess with the files in the data directory. This is just one of them. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8106: Redundant function definition in contrib/cube/cube.c
On 23.04.2013 10:54, ams...@cam.ac.uk wrote: The following bug has been logged on the website: Bug reference: 8106 Logged by: Adrian Schreyer Email address: ams...@cam.ac.uk PostgreSQL version: 9.2.4 Operating system: Ubuntu 12.04 LTS Description: The cube.c file in the cube contrib module contains a prototype for the function Datum cube(PG_FUNCTION_ARGS). The prototype seems to be an artifact from an older version because the function is never defined and also never used in the code. Thanks, removed. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] postgres 8.4 PQexec hang on HP-UX
On 12.04.2013 09:41, Singh, Devendra wrote: Hi All, I hit a hang issue in postgres 8.4 query. Hit this issue multiple time on HP-UX. Below is the snapshot of the hang thread. lwpid : 600943 --- 0: c054ced0 : _poll_sys() + 0x30 (/usr/lib/hpux64/libc.so.1) 1: c0561560 : poll() + 0xe0 (/usr/lib/hpux64/libc.so.1) 2: c1f798e0 : pqSocketPoll() at fe-misc.c:1079 3: c1f790e0 : pqWait() at fe-misc.c:1024 4: c1f6ebb0 : PQgetResult() at fe-exec.c:1554 5: c1f6f0a0 : PQexec() at fe-exec.c:1735 Is it a known issue? Any possible workaround ? The first thing that comes to mind is that the query is simply taking a long time to execute, or is being blocked trying to acquire a lock that another process is holding in the server. You'll have to provide a lot more details if you want anyone to make a better guess than that. A simplified test program that reproduces the problem would be best. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
On 06.04.2013 01:02, Jeff Janes wrote: On Fri, Apr 5, 2013 at 12:27 PM, wrote: I use a custom base backup script to call pg_start/stop_backup() and make the backup with rsync. The restore_command in recovery.conf is never called by PG 9.2.4 during startup. I confirmed this by adding a "touch /tmp/restore_command.`date +%H:%M:%S`" line at the beginning of the shell script I use for my restore_command. No such files are created when starting PG 9.2.4. After downgrading back to 9.2.3, archive recovery works using the very same base backup, recovery.conf file, and restore_command. The log indicates that PG 9.2.3 begins recovery by pulling WAL files from the archive instead of pg_xlog: I can reproduce the behavior you report only if I remove the "backup_label" file from the restored data directory before I begin recovery. Of course, doing that renders the backup invalid, as without it recovery is very likely to begin from the wrong WAL recovery location. Yeah, if you use pg_start/stop_backup(), there definitely should be a backup_label present. But there is a point here, if you use an atomic filesystem snapshot instead of pg_start/stop_backup(), or just a plain copy of the data directory while the system is shut down. The problem in that case is that if pg_xlog is empty, we have no idea how far we need to recover until the system is consistent. Actually, if the system was shut down, then the system is consistent immediately and we could allow that, but the problem still remains for an online backup using an atomic filesystem snapshot. I don't think there's much we can do about that case. We could start up and recover all the WAL from the archive before we declare consistency, but that gets pretty complicated, and it would still not work if you tried to do that in a standby that uses streaming replication without a restore_command. 
So, I think what we need to do is to update the documentation to make it clear that you must not zap pg_xlog if you take a backup without pg_start/stop_backup(). The documentation that talks about filesystem snapshots and offline backups doesn't actually say that you can zap pg_xlog - that is only mentioned in the section on pg_start/stop_backup(). But perhaps that could be made more explicit. Or, must I now include pg_xlog files when taking base backups with 9.2.4, contrary to the documentation? You do not need to include pg_xlog, but you do need to include backup_label. And you always did need to include it--if you were not including it in the past, then you were playing with fire, and it is only due to luck that your database survived. Incidentally, I bumped into another custom backup script just a few weeks back that also excluded backup_label. I don't know what the author was thinking when he wrote that, but it seems to be a surprisingly common mistake. Maybe it's the "label" in the filename that makes people think it's not important. Perhaps we should improve the documentation to make it more explicit that backup_label must be included in the backup. The docs already say that, though, so I suspect that people making this mistake have not read the docs very carefully anyway. Perhaps a comment in the beginning of backup_label would help: # NOTE: This file MUST be included in the backup. Otherwise, the backup # is inconsistent, and restoring it may result in a corrupt database. Jeff B., assuming that you excluded backup_label from the backup for some reason, do you have any thoughts on what would've helped you to avoid that mistake? Would a comment like above have helped - did you look inside backup_label at any point? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
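Custom backup scripts of the kind discussed above could catch this class of mistake at backup time rather than at restore time. A hedged shell sketch (BACKUP_DIR and the label contents are stand-ins, not taken from any real script):

```shell
# Post-backup sanity check: refuse to declare an online base backup
# complete unless backup_label made it into the destination.
BACKUP_DIR=$(mktemp -d)

# Simulate the rsync/tar step copying backup_label into the backup
# (a real script would copy it from $PGDATA).
echo "LABEL: nightly base backup" > "$BACKUP_DIR/backup_label"

if [ ! -s "$BACKUP_DIR/backup_label" ]; then
    echo "FATAL: backup_label missing; this backup is not restorable" >&2
    exit 1
fi
echo "backup_label present: backup looks complete"
```

A check like this would have turned the silent "playing with fire" scenario into an immediate, loud failure of the backup job itself.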
[BUGS] pg_stat_statements doesn't work with --disable-spinlocks
pg_stat_statements (re-)initializes spinlocks as part of normal operation. With --disable-spinlocks, spinlocks are implemented using semaphores, and semaphores are expected to be created at postmaster startup. Hence, you get an assertion failure: postgres=# select * from pg_stat_statements ; The connection to the server was lost. Attempting reset: Failed. !> TRAP: FailedAssertion("!(!IsUnderPostmaster)", File: "pg_sema.c", Line: 326) Even if that worked, re-initializing a spinlock with SpinLockInit, like pg_stat_statements does, would always allocate a new semaphore, so you would run out very quickly. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Re: BUG #7969: Postgres Recovery Fatal With: "incorrect local pin count:2"
On 27.03.2013 21:04, Heikki Linnakangas wrote: On 27.03.2013 20:27, Josh Berkus wrote: Folks, So I'm a bit surprised that this bug report hasn't gotten a follow-up. Does this sound like the known 9.2.2 corruption issue, or is it potentially something else? It seems like a new issue. At a quick glance, I think there's a bug in heap_xlog_update, ie. the redo routine of a heap update. If the new tuple is put on a different page, and at redo, the new page doesn't exist (that's normal if it was later vacuumed away), heap_xlog_update leaks a pin on the old page. Here: { nbuffer = XLogReadBuffer(xlrec->target.node, ItemPointerGetBlockNumber(&(xlrec->newtid)), false); if (!BufferIsValid(nbuffer)) return; page = (Page) BufferGetPage(nbuffer); if (XLByteLE(lsn, PageGetLSN(page))) /* changes are applied */ { UnlockReleaseBuffer(nbuffer); if (BufferIsValid(obuffer)) UnlockReleaseBuffer(obuffer); return; } } Notice how in the first 'return' above, obuffer is not released. I'll try to create a reproducible test case for this, and fix.. Ok, here's how to reproduce it: create table foo (i int4 primary key); insert into foo select generate_series(1,1000); checkpoint; -- update a tuple from the first page, new tuple goes to last page update foo set i = 1 where i = 1; -- delete everything on pages > 1 delete from foo where i > 10; -- truncate the table, including the page the updated tuple went to vacuum verbose foo; pg_ctl stop -m immediate This bug was introduced by commit 8805ff6580621d0daee350826de5211d6bb36ec3, in 9.2.2 (and 9.1.7 and 9.0.11), which fixed multiple WAL replay issues with Hot Standby. Before that commit, replaying a heap update didn't try to keep both buffers locked at the same time, which is necessary for the correctness of hot standby. The patch fixed that, but missed releasing the old buffer in this corner case. I was not able to come up with a scenario with full_page_writes=on where this would fail, but I'm also not 100% sure it can't happen. 
I scanned through the commit, and couldn't see any other instances of this kind of a bug. heap_xlog_update is more complicated than other redo functions, with all the return statements inside it. It could use some refactoring, but for now, I'll commit the attached small fix. - Heikki diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 595dead..860fd20 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -5367,7 +5367,11 @@ newt:; ItemPointerGetBlockNumber(&(xlrec->newtid)), false); if (!BufferIsValid(nbuffer)) + { + if (BufferIsValid(obuffer)) +UnlockReleaseBuffer(obuffer); return; + } page = (Page) BufferGetPage(nbuffer); if (XLByteLE(lsn, PageGetLSN(page))) /* changes are applied */ -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
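The bug class fixed above is an early return that skips a resource release. In C every exit path has to release manually, as the patch does; as a toy illustration, here is a Python model of buffer pins (a hypothetical API, not PostgreSQL's code) showing the balanced pattern the fix restores:

```python
class Buffer:
    """Toy stand-in for a shared buffer with a local pin count."""
    def __init__(self):
        self.pin_count = 0

    def pin(self):
        self.pin_count += 1

    def release(self):
        self.pin_count -= 1

def redo_update(obuffer, new_page_exists):
    """Mimics the redo routine's shape: pin the old page, then maybe
    bail out early. The try/finally guarantees the pin is dropped on
    EVERY exit path -- the equivalent of the missing
    UnlockReleaseBuffer(obuffer) in the original code."""
    obuffer.pin()
    try:
        if not new_page_exists:
            return "skipped"   # early exit: 'finally' still releases
        return "applied"
    finally:
        obuffer.release()      # runs on every return

buf = Buffer()
redo_update(buf, new_page_exists=False)
print(buf.pin_count)   # 0: no pin leaked on the early-return path
```

The "incorrect local pin count: 2" failure in the report is exactly what happens when one of those paths forgets the release: the pin count no longer returns to zero.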
Re: [BUGS] Re: BUG #7969: Postgres Recovery Fatal With: "incorrect local pin count:2"
On 27.03.2013 20:27, Josh Berkus wrote: Folks, So I'm a bit surprised that this bug report hasn't gotten a follow-up. Does this sound like the known 9.2.2 corruption issue, or is it potentially something else? It seems like a new issue. At a quick glance, I think there's a bug in heap_xlog_update, ie. the redo routine of a heap update. If the new tuple is put on a different page, and at redo, the new page doesn't exist (that's normal if it was later vacuumed away), heap_xlog_update leaks a pin on the old page. Here: { nbuffer = XLogReadBuffer(xlrec->target.node, ItemPointerGetBlockNumber(&(xlrec->newtid)), false); if (!BufferIsValid(nbuffer)) return; page = (Page) BufferGetPage(nbuffer); if (XLByteLE(lsn, PageGetLSN(page)))/* changes are applied */ { UnlockReleaseBuffer(nbuffer); if (BufferIsValid(obuffer)) UnlockReleaseBuffer(obuffer); return; } } Notice how in the first 'return' above, obuffer is not released. I'll try to create a reproducible test case for this, and fix.. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #8000: ExclusiveLock on a simple SELECT ?
On 27.03.2013 15:07, roberto.menon...@netspa.it wrote: after upgrading to version 9.2.3 we are having a performance degradation. We are investigating the matter on several fronts. We've seen that Postgres (9.2.3) creates an ExclusiveLock even with a simple SELECT * FROM myschema.mytable. You mean like this: postgres=# create table mytable(i int4); CREATE TABLE postgres=# begin; BEGIN postgres=# select * from mytable; i --- (0 rows) postgres=# select * from pg_locks; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath relation | 12010 | 11069 | | | | | | | | 1/3 | 19811 | AccessShareLock | t | t relation | 12010 | 16482 | | | | | | | | 1/3 | 19811 | AccessShareLock | t | t virtualxid | | | | | 1/3 | | | | | 1/3 | 19811 | ExclusiveLock | t | t (3 rows) That last ExclusiveLock is on the transaction's own virtual transaction ID, not on the table. There is no change from previous versions here. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7753: Cannot promote out of hot standby
(cleaning up my inbox..) Did you ever figure out this one? On 12.12.2012 23:36, dan...@heroku.com wrote: The following bug has been logged on the website: Bug reference: 7753 Logged by: Daniel Farina Email address: dan...@heroku.com PostgreSQL version: 9.1.6 Operating system: Ubuntu 10.04 Description: Touching a trigger file will not cause promotion out of hot standby. Basically, an apparently normally-working hot-standby database will not leave hot standby. The database emitting WAL is version 9.1.4. Everything appears normal in the log (downloads and restoring of archived segments), and the server seems to take no notice of the trigger file. To force the issue, I introduced an error into the configuration of the restoration program to cause it to exit. Normally that's no problem; postgres would just keep on trying to restore a segment over and over until the error is fixed. Instead, the server crashes: [413-1] [COPPER] LOG: restored log file "0001034D0050" from archive wal_e.worker.s3_worker INFO MSG: completed download and decompression#012DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/0001034D0051.lzo" to "pg_xlog/RECOVERYXLOG" [414-1] [COPPER] LOG: restored log file "0001034D0051" from archive wal_e.worker.s3_worker INFO MSG: completed download and decompression#012DETAIL: Downloaded and decompressed "s3://archive-root/wal_005/0001034D0052.lzo" to "pg_xlog/RECOVERYXLOG" [415-1] [COPPER] LOG: restored log file "0001034D0052" from archive # I introduce the failure here wal_e.main ERRORMSG: no AWS_SECRET_ACCESS_KEY defined#012HINT: Define the environment variable AWS_SECRET_ACCESS_KEY. LOG: trigger file found: /etc/postgresql/wal-e.d/pull-env/STANDBY_OFF LOG: redo done at 34D/52248590 LOG: last completed transaction was at log time 2012-12-10 wal_e.main ERRORMSG: no AWS_SECRET_ACCESS_KEY defined#012HINT: Define the environment variable AWS_SECRET_ACCESS_KEY. 
PANIC: could not open file "pg_xlog/0001034D0052" (log file 845, segment 82): No such file or directory LOG: startup process (PID 7) was terminated by signal 6: Aborted LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. I can fix the configuration and restart the server, and everything is as fine as before. Next, I try removing recovery.conf and restarting the server as an alternative way of promoting...but, no avail; however, a slightly different error message: # Server begins starting LOG: loaded library "auto_explain" LOG: loaded library "pg_stat_statements" LOG: database system was interrupted while in recovery at log time 2012-12-10 15:20:03 UTC HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 
LOG: could not open file "pg_xlog/0001034E001A" (log file 846, segment 26): No such file or directory LOG: invalid primary checkpoint record LOG: could not open file "pg_xlog/0001034D00F2" (log file 845, segment 242): No such file or directory LOG: invalid secondary checkpoint record PANIC: could not locate a valid checkpoint record LOG: startup process (PID 7) was terminated by signal 6: Aborted LOG: aborting startup due to startup process failure main process (24284) terminated with status 1 pg_control looks like this around the same time, for reference: pg_control version number:903 Catalog version number: 201105231 Database cluster state: in archive recovery pg_control last modified: Wed 12 Dec 2012 09:22:30 PM UTC Latest checkpoint location: 351/1FE194C0 Prior checkpoint location:351/FD64A78 Latest checkpoint's REDO location:351/131848C8 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/652342033 Latest checkpoint's NextOID: 103224 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Latest checkpoint's oldestXID:455900714 Latest checkpoint's oldestXID's DB: 16385 Latest checkpoint's oldestActiveXID: 652311442 Time of latest checkpoint:Mon 10 Dec 2012 07:19:23 PM UTC Minimum recovery ending location: 351/4BFFFE20 Backup start location:0/0 Current wal_level setting:hot_standby Current max_connections setting: 500 Current m
Re: [BUGS] BUG #7986: base backup copy all files from tablespace, not only needed data
On 23.03.2013 16:39, eshkin...@gmail.com wrote: The following bug has been logged on the website: Bug reference: 7986 Logged by: Sergey Burladyan Email address: eshkin...@gmail.com PostgreSQL version: 9.2.3 Operating system: Debian GNU/Linux 7.0 (wheezy) Description: I have one tablespace dir with multiple versions: $ ls -la ~/inst/var/l/ total 16 drwx-- 4 seb seb 4096 Mar 23 18:26 . drwxr-xr-x 7 seb seb 4096 Mar 23 18:28 .. drwx-- 3 seb seb 4096 Mar 23 18:24 PG_9.2_201204301 drwx-- 3 seb seb 4096 Mar 23 18:26 PG_9.3_201303201 ../pg-dev-master/bin/pg_basebackup -Ft -D backup copies all subdirectories from the tablespace dir: $ tar -tf backup/16384.tar PG_9.2_201204301/ PG_9.2_201204301/12042/ PG_9.2_201204301/12042/16385 PG_9.3_201303201/ PG_9.3_201303201/12070/ PG_9.3_201303201/12070/16385 PG_9.3_201303201/12070/16390 PG_9.3_201303201/12070/16388 IMHO it must copy only its own version dir, PG_9.3_201303201 Agreed. Fixed, thanks for the report! - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
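The agreed fix boils down to sending only the subdirectory that matches the running server's own version. A toy Python sketch of that filtering (a hypothetical helper, not the pg_basebackup source):

```python
def tablespace_entries_to_send(entries, major_version, catalog_version):
    """Given the entries of a shared tablespace directory, keep only
    the subdirectory belonging to the running server, e.g.
    PG_9.3_201303201. Other versions' data in the same tablespace
    location is skipped instead of being bundled into the backup."""
    wanted = "PG_%s_%s" % (major_version, catalog_version)
    return [e for e in entries if e == wanted]

entries = ["PG_9.2_201204301", "PG_9.3_201303201"]
print(tablespace_entries_to_send(entries, "9.3", "201303201"))
# ['PG_9.3_201303201']
```

With this filter, the 9.2 cluster's files sharing the tablespace directory no longer leak into a 9.3 base backup, which is the behavior the report asked for.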
Re: [BUGS] BUG #7970: How 's list all members in a special tablespace ( not as pg_global & pg_default )
On 19.03.2013 10:19, sunny1...@yahoo.com.vn wrote: I tried this, which will get the list of tables belong to 'XYZ' tablespace. CREATE TABLESPACE TBS1 OWNER access LOCATION '/u03/tbs'; --> OK Create table public.SinhVien ( MaSV text, TenSV text, Sodt int ) tablespace TBS1; --> OK Insert into public.SinhVien (MaSV,TenSV,Sodt) Values ('001','Nguyen Van',123456789),('002','Nguyen Ha',987654321); --> OK select relname from pg_class where reltablespace=(select oid from pg_tablespace where spcname='TBS1'); relname -0 (0 rows) why's return 0 rows ? i don't know You didn't quote the tablespace name in the CREATE TABLESPACE statement, so it's actually called tbs1, in lower case. Try: select relname from pg_class where reltablespace=(select oid from pg_tablespace where spcname='tbs1'); - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
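The case-folding rule behind this answer can be sketched in Python (a simplified model of PostgreSQL's identifier handling, not the parser itself; real rules also cover doubled-quote escapes and length truncation):

```python
def fold_identifier(ident):
    """Approximate PostgreSQL identifier case-folding: double-quoted
    identifiers keep their case; unquoted identifiers fold to lower
    case before being stored in the catalogs."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1]
    return ident.lower()

print(fold_identifier("TBS1"))     # tbs1: why spcname='TBS1' matched nothing
print(fold_identifier('"TBS1"'))   # TBS1: quoting preserves the case
```

So CREATE TABLESPACE TBS1 stores the name as tbs1, and the catalog query only matches when it compares against that folded form.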
Re: [BUGS] 8.4: COPY continues after client disconnects
On 01.03.2013 17:15, Jon Nelson wrote: Using PostgreSQL 8.4.13 on ScientificLinux 6.3 (x86_64), I noticed that a pg_dump ran out of (local) disk space. However, the server was still using CPU and disk resources. An strace clearly showed this pattern: read() = 8192 sendto(...) = -1 EPIPE -- SIGPIPE (Broken pipe) @ The server does detect the broken pipe. It logs the following messages: 637 LOG: 08006: could not send data to client: Connection reset by peer 638 LOCATION: internal_flush, pqcomm.c:1108 639 STATEMENT: COPY ... to stdout; 640 LOG: 08006: could not send data to client: Broken pipe 641 LOCATION: internal_flush, pqcomm.c:1108 642 STATEMENT: COPY ... to stdout; This was fixed in version 9.2. Per release notes: Cancel the running query if the client gets disconnected (Florian Pflug) If the backend detects loss of client connection during a query, it will now cancel the query rather than attempting to finish it. COPY counts as a "running query". -- - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
On 19.02.2013 00:19, Maciek Sakrejda wrote: On Mon, Feb 18, 2013 at 12:57 AM, Heikki Linnakangas< hlinnakan...@vmware.com> wrote: On 16.02.2013 01:49, Daniel Farina wrote: I guess that means Ubuntu (and probably Debian?) libpq-dev breaks PG_VERSION_NUM for PGXS=1. That obviously needs to be fixed in debian. Meanwhile, Maciek, I'd suggest that you build PostgreSQL from sources, install it to some temporary location, and then build xlogdump against that. That worked, thanks. I have a working xlogdump. Any pointers as to what I should look for? This is the contents of the pg_xlog directory: total 49160 -rw--- 1 udrehggpif7kft postgres 16777216 Feb 15 00:00 0001003C0093 -rw--- 1 udrehggpif7kft postgres 16777216 Feb 15 00:47 0001003C0094 -rw--- 1 udrehggpif7kft postgres 16777216 Feb 15 00:49 0002003C0093 -rw--- 1 udrehggpif7kft postgres 56 Feb 15 00:49 0002.history drwx-- 2 udrehggpif7kft postgres 4096 Feb 15 00:49 archive_status I'd like to see the contents of the WAL, starting from the last checkpoint, up to the point where failover happened. In particular, any actions on the relation base/16385/16430, which caused the error. pg_controldata output on the base backup would also be interesting, as well as the contents of the backup_label file. How long did the standby run between the base backup and the failover? How many WAL segments? One more thing you could try to narrow down the error: restore from the base backup, and let it run up to the point of failover, but shut it down just before the failover with "pg_ctl stop -m fast". That should create a restartpoint, at the latest checkpoint record. Then restart, and perform failover. If it still throws the same error, we know that the WAL record that touched the page that doesn't exist was after the last checkpoint. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] pg_dumpall fails if a database name contains =
On 20.02.2013 16:19, Heikki Linnakangas wrote: ~/pgsql.92stable$ bin/createdb "foo=bar" ~/pgsql.92stable$ bin/pg_dumpall > /dev/null pg_dump: [archiver (db)] connection to database "(null)" failed: invalid connection option "foo" pg_dumpall: pg_dump failed on database "foo=bar", exiting There are two bugs above: 1. When pg_dumpall passes the database name to pg_dump as a command line argument, pg_dump interprets it as a connection string if it contains =. Fixed this by passing the database name to pg_dump as a connection string. That way pg_dump doesn't interpret the database name. So now pg_dumpall calls pg_dump like this: pg_dump ... "dbname='foo'" instead of just pg_dump ... "foo" 2. When you pass an invalid connection string to pg_dump, it passes a NULL pointer to printf when constructing the error message. It shows as "(null)" above, but would segfault on other platforms. Fixed by printing an empty string instead of passing NULL to fprintf. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
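The fix for the first bug hinges on quoting: any database name can be wrapped in a dbname='...' connection string, as long as single quotes and backslashes inside the name are escaped. A minimal sketch of that quoting rule, assuming libpq's quoting conventions for keyword/value connection strings; the function name is mine, not pg_dumpall's:

```c
#include <assert.h>
#include <string.h>

/* Sketch: wrap an arbitrary database name in dbname='...', escaping
 * single quotes and backslashes.  A name containing '=' then travels
 * to pg_dump unharmed instead of being parsed as connection options. */
static void dbname_to_connstr(char *dst, size_t dstlen, const char *name)
{
    size_t i;
    const char *p;

    (void) dstlen;              /* sketch: assumes dst is large enough */
    memcpy(dst, "dbname='", 8);
    i = 8;
    for (p = name; *p; p++)
    {
        if (*p == '\'' || *p == '\\')
            dst[i++] = '\\';    /* escape characters special in '...' */
        dst[i++] = *p;
    }
    dst[i++] = '\'';
    dst[i] = '\0';
}
```

With this, "foo=bar" becomes dbname='foo=bar', which libpq parses back to the literal database name.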
[BUGS] pg_dumpall fails if a database name contains =
~/pgsql.92stable$ bin/createdb "foo=bar" ~/pgsql.92stable$ bin/pg_dumpall > /dev/null pg_dump: [archiver (db)] connection to database "(null)" failed: invalid connection option "foo" pg_dumpall: pg_dump failed on database "foo=bar", exiting There are two bugs above: 1. When pg_dumpall passes the database name to pg_dump as a command line argument, pg_dump interprets it as a connection string if it contains =. 2. When you pass an invalid connection string to pg_dump, it passes a NULL pointer to printf when constructing the error message. It shows as "(null)" above, but would segfault on other platforms. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7890: wrong behaviour using pg_rotate_logfile() with parameter log_truncate_on_rotation = on
On 19.02.2013 14:31, Rafael Martinez wrote: In the way pg_rotate_logfile() and log_truncate_on_rotation = on work today, we have to stop postgres to truncate the log file if an unexpected situation happens and this is not always possible in a production system. If we need to run pg_rotate_logfile() manually in the middle of the month and we don't want to lose the data in the file that is going to be truncated, we will have to take a manual copy of it before running pg_rotate_logfile(). You can rm the log file, then do pg_rotate_logfile(). No need to stop the system. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
On 16.02.2013 01:49, Daniel Farina wrote: I guess that means Ubuntu (and probably Debian?) libpq-dev breaks PG_VERSION_NUM for PGXS=1. That obviously needs to be fixed in debian. Meanwhile, Maciek, I'd suggest that you build PostgreSQL from sources, install it to some temporary location, and then build xlogdump against that. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7883: "PANIC: WAL contains references to invalid pages" on replica recovery
On 15.02.2013 03:49, mac...@heroku.com wrote: The following bug has been logged on the website: Bug reference: 7883 Logged by: Maciek Sakrejda Email address: mac...@heroku.com PostgreSQL version: 9.1.8 Operating system: Ubuntu 12.04 64-bit Description: We ran into a customer database giving us the error above when replicating from 9.1.7 to 9.1.8 and attempting to fail over to the 9.1.8. I noticed several fixes to WAL replay in 9.1.8--could this be a factor in this case? ... Feb 15 00:49:16 [1305-1] [COPPER] LOG: archive recovery complete Feb 15 00:49:16 [1306-1] [COPPER] WARNING: page 37956 of relation base/16385/16430 was uninitialized Feb 15 00:49:16 [1307-1] [COPPER] PANIC: WAL contains references to invalid pages Hmm, that sure looks like the same issue Kyotaro HORIGUCHI reported (http://www.postgresql.org/message-id/20121206.130458.170549097.horiguchi.kyot...@lab.ntt.co.jp), but that was fixed in 9.1.8. Maybe there's some corner case where it's still not working. Did you keep a copy of the WAL involved? Any chance of running xlogdump on it, and posting the results, or just sending over the WAL files so I could take a look? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7865: Unexpected error code on insert of duplicate to composite primary key
On 11.02.2013 17:34, Matti Aarnio wrote: } catch (SQLException e) { int code = e.getErrorCode(); if (code == 2 // Derby || code == 23505) {// PostgreSQL, Oracle, ... System.out.println("Expected SQL duplicate insert indication status code: "+code) } else { System.out.println("Insert into example at "+this.jdbcUrl+ " resulted unexpected SQL Exception code: "+ code + " " + e.getMessage()); } Hmm, looking at the PSQLException source code, I don't think the driver has ever set the vendor-specific error code that getErrorCode() returns. I tested the snippet you posted with server 8.4 and 9.2, and with jdbc driver 8.4 and 8.2, and saw no difference; getErrorCode() always returned 0. You should be using getSQLState() instead. The "23505" sqlstate is defined by the SQL standard, so if the other DBMSs you're supporting follow the spec on that, you won't even need any vendor-specific code there. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7865: Unexpected error code on insert of duplicate to composite primary key
On 09.02.2013 22:25, matti.aar...@methics.fi wrote: CREATE TABLE example ( a TIMESTAMP NOT NULL, b VARCHAR(256) NOT NULL, c VARCHAR(256) NOT NULL, PRIMARY KEY(a,b,c) ); Inserting a duplicate record on this is returning an SQL Error, but the status code is 0 instead of expected 23505. This used to work fine in 8.x series, but is now causing trouble in 9.1.7, and 9.2.3. Works for me: postgres=# do $$ begin insert into example values ('2001-01-01', 'foo', 'bar'); insert into example values ('2001-01-01', 'foo', 'bar'); exception when others then raise notice 'caught %', sqlstate; end; $$; NOTICE: caught 23505 DO How exactly are you seeing the wrong status code? What client are you using? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7820: Extension uuid-ossp cannot be installed on Windows - getting syntax error
On 22.01.2013 15:31, jan-peter.seif...@gmx.de wrote: The statement: 'CREATE EXTENSION uuid-ossp' just gives me a syntax error: ERROR: syntax error at or near "-" LINE 1: CREATE EXTENSION uuid-ossp ^ ** Fehler ** ERROR: syntax error at or near "-" SQL Status:42601 Zeichen:22 Obviously "CREATE EXTENSION" expects underscores instead of hyphens. Try: CREATE EXTENSION "uuid-ossp" - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7811: strlen(NULL) cause psql crash
On 15.01.2013 21:13, Tom Lane wrote: Heikki Linnakangas writes: On 15.01.2013 20:29, Tom Lane wrote: But you already introduced "none" as a stand-alone (and probably almost untranslatable without context) string. That's better? I figured it would be. One little untranslated string in parens, versus the whole is untranslated. I'm happy to change it if you feel otherwise, though, I don't feel strongly about it myself. Well, I shouldn't be opining too strongly on translatability issues. Other people would have much-better-qualified opinions as to how well it'll read if "none" has to be translated by itself. But as to the behavior when the new message hasn't been translated yet: the only case where anyone would see the untranslated message is if they were in fact not connected, which we know is a seldom-exercised corner case (else this bug would've been noticed long ago). So it might not be worth arguing about. But ISTM that somebody whose English was weak might not grasp that "none" (untranslated) wasn't meant to be a name. Hmm, I wonder if an empty string would be better? It'd look a bit odd, but at least it would not mislead you to think you're connected to a database called "none". - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7811: strlen(NULL) cause psql crash
On 15.01.2013 20:29, Tom Lane wrote: Heikki Linnakangas writes: On 15.01.2013 20:10, Tom Lane wrote: I think that patch could use more thought. As is, it will print connect to new database (currently "none") which to me is claiming that we are connected to a database whose name is "none". The quotes should not be used in this situation, and in fact it would be better for translatability if the whole message text were specialized to this case. I'd like to see it reading more like connect to new database (currently no connection) Hmm, that'd introduce a new quite visible string into back-branches that needs to be translated, which I think we try to avoid. But you already introduced "none" as a stand-alone (and probably almost untranslatable without context) string. That's better? I figured it would be. One little untranslated string in parens, versus the whole is untranslated. I'm happy to change it if you feel otherwise, though, I don't feel strongly about it myself. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7811: strlen(NULL) cause psql crash
On 15.01.2013 20:10, Tom Lane wrote: Heikki Linnakangas writes: I committed a local fix to help.c to print "none" as the database name when not connected. I think that patch could use more thought. As is, it will print connect to new database (currently "none") which to me is claiming that we are connected to a database whose name is "none". The quotes should not be used in this situation, and in fact it would be better for translatability if the whole message text were specialized to this case. I'd like to see it reading more like connect to new database (currently no connection) Hmm, that'd introduce a new quite visible string into back-branches that needs to be translated, which I think we try to avoid. I'll change that in master, but back-branches? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7811: strlen(NULL) cause psql crash
On 15.01.2013 16:18, 1584171...@qq.com wrote: I give you a description about how to trigger this bug first: (1) start the server with the command "postgres -D pgdata" (2) start the client with the command "psql" (3) close the server (4) execute a query from the client "slect *from t; ". At this time, the client detected that it lost the connection with the server. (5) execute the following command from the client "\?", then the client will crash. I have found the reason which caused that. (1) When the client executes "slect *from t; ", it executes the function "ResetCancelConn()" at line 364 in src\bin\psql\common.c, and the function sets pset.db to NULL. (2) When the client executes "\?", it executes the function fprintf at line 254 in help.c. The value returned by PQdb(pset.db) is an argument of fprintf, and at this time PQdb returned NULL. (3) This NULL was finally passed to strlen at line 779 in snprintf.c through several simple function calls, so psql crashed. Thanks for the report and debugging! I have fixed the bug in the following way, which may not be the best: (1) add a string named strofnull, and in the function "dopr" in file src\port\snprintf.c char *strofnull="(null)"; (2) add an if statement before calling fmtstr at about line 720 in file src\port\snprintf.c if (strvalue==NULL) { strvalue=strofnull; } That'd change the behavior of all sprintf calls, not sure we want to go there. Might not be a bad idea to avoid crashes if there are more bugs like this, but one really should not pass NULL to sprintf to begin with. I committed a local fix to help.c to print "none" as the database name when not connected. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
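For illustration, here is the shape of the guard the reporter proposed, applied to a standalone helper rather than to snprintf.c itself. The function name is mine; as noted above, the fix actually committed instead made help.c avoid passing NULL in the first place:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the reporter's suggested belt-and-braces fix for %s
 * arguments: substitute a literal "(null)" before any strlen()/copy
 * can dereference a NULL pointer and crash. */
static void format_dbname(char *buf, size_t buflen, const char *dbname)
{
    if (dbname == NULL)
        dbname = "(null)";      /* guard: never hand NULL to %s */
    snprintf(buf, buflen, "currently \"%s\"", dbname);
}
```

The trade-off Heikki notes applies here too: a guard like this hides caller bugs, whereas fixing the caller keeps NULL-to-%s a detectable error.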
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 15.01.2013 10:14, Simon Riggs wrote: On 15 January 2013 05:12, Tomonari Katsumata wrote: We added a REPLICATION privilege onto user accounts to control access. Perhaps we should add a CASCADE privilege as well, so that we can control whether we can connect to a master and/or a standby. Syntax would be ALTER USER foo [MASTER | CASCADE] REPLICATION REPLICATION allows both master and cascaded replication (same as now) MASTER REPLICATION allows master only CASCADE REPLICATION allows cascaded replication only NOREPLICATION allows neither option Someone is working for it already ? If not yet, may I try to implement it ? Please do. It looks fairly short. To me, permissions don't feel like the right vehicle for controlling this. Not sure what to suggest instead, a new GUC perhaps. BTW, is there any reason to not allow streaming replication when hot_standby=off? A streaming replication connection doesn't execute any queries, so it doesn't need the system to be consistent. Another thing to consider is that "pg_basebackup -X stream" also uses streaming replication, so if you forbid cascade replication, you also forbid using "pg_basebackup -X stream" on the standby. At the protocol level, pg_basebackup streams the WAL just like a standby server does, so we cannot distinguish those two cases in the server. The client could tell the server which one it is, but using permissions to allow/forbid based on that would make no sense, as the client could lie about which one it is. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7807: "sign" function returns double
On 14.01.2013 16:21, a_ra...@yahoo.com wrote: In the documentation (http://www.postgresql.org/docs/9.1/static/functions-math.html) it is written that the return type is the same as the input, but a query like 'select sign(1::int)' returns a double It says that the input type is "dp or numeric", which means "double or numeric". So there are two overloaded functions called sign, sign(double), and sign(numeric). When the doc says that the return type is the same as input, it means that the return type is double, if you call sign(double), and numeric if you call sign(numeric). There is no sign(int) function, so the input "1::int" is cast to double, and the return type is also a double. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7805: Inconsistency, multiple versions
On 14.01.2013 11:37, geryd...@gmail.com wrote: The following bug has been logged on the website: Bug reference: 7805 Logged by: Gery Deft Email address: geryd...@gmail.com PostgreSQL version: 9.2.2 Operating system: Windows Server, Windows 7,Ubuntu linux Description: Hi, Versions: 9.2.*,8.4.* Our project uses PostgreSQL as main database system, but recently we found that PG have serious inconsistency problems. We have about 60 connections to the database, reading and writing it most of the time. It does not matter if we use inline SQL, stored procedures or pg_advisory_lock, after a few hundreds of thousand records the database will contain duplication in a key field. Example: CREATE TABLE ( id serial NOT NULL, name character varying(100), ... CONSTRAINT domain_pkey PRIMARY KEY (id), CONSTRAINT domain_name_key UNIQUE (name) - SELECT id, name,length(name) FROM where name like 'samedata%'; 3161132;"samedata";8 7821530;"samedata";8 1962653;"samedata";8 I made ASCII comparison and they are identical. It's hard to believe that there would be any fundamental bug like that in constraint checking, or we would've gotten plenty of reports of it before. There must be something peculiar about your environment, but without more details, it's impossible to guess what it might be. If you can put together a smaller test case to reproduce the issue and post that, that would make it possible for someone to diagnose the problem. If that's not feasible, please explain in more detail what does the table definition look like exactly, and how you're loading the data. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 11.01.2013 11:19, Simon Riggs wrote: On 11 January 2013 08:40, Heikki Linnakangas wrote: This makes me wonder if there should be a GUC to forbid cascading replication, though. If you don't want to do cascading replication (which is quite rare, I'd say), you could just disable it to avoid a situation like this. Connection from a standby is disabled by default. Don't enable it... It's controlled by hot_standby=on/off. You might well want to enable hot standby, to run queries on a standby, but disable cascading replication. You can also forbid replication connections using pg_hba.conf, but then you need to modify it to allow connections again after failover, when the standby becomes master. That's doable, but inconvenient. (just thinking out loud here..) If we were to have such a switch, it'd be nice to still allow base backups from the standby. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7803: Replication Problem(no master is there)
On 11.01.2013 06:09, katsumata.tomon...@po.ntts.co.jp wrote: The following bug has been logged on the website: Bug reference: 7803 Logged by: Tomonari Katsumata Email address: katsumata.tomon...@po.ntts.co.jp PostgreSQL version: 9.2.2 Operating system: RHEL 5.3 x86_64 Description: hi, I'm playing with Synchronous Replication on PostgreSQL 9.2.2. And I saw a strange behavior. Unless you left out something, the configuration you described actually sets up asynchronous replication. = [issues] two standbys are connected on each other, but no master is there. ... = I did not see the situation like above on PostgreSQL 9.1.7. Is this intended change? In 9.1, this scenario was impossible because you could not connect a standby to another standby. In 9.2, that's allowed. It's a new feature called "cascading replication", see http://www.postgresql.org/docs/9.2/static/warm-standby.html#CASCADING-REPLICATION. With that feature, it's indeed possible to form a cycle of standby servers connected to each other. There was just a long discussion on pgsql-hackers on whether we should try to detect that scenario [1], but the consensus seems to be that we should not. It would be difficult to implement such detection, and sometimes it's useful to have such a cycle, as a transient state at a failover, for example. So the bottom line is that this is an intended change, and the admin will just have to avoid doing that. This makes me wonder if there should be a GUC to forbid cascading replication, though. If you don't want to do cascading replication (which is quite rare, I'd say), you could just disable it to avoid a situation like this. [1] http://archives.postgresql.org/pgsql-hackers/2012-12/msg01134.php - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] WAL Receiver Segmentation Fault
On 28.12.2012 20:55, Phil Sorber wrote: Postgres 9.0.11 running as a hot standby. The master was restarted and the standby went into a segmentation fault loop. A hard stop/start fixed it. Here are pertinent logs with excess and identifying information removed: ... If there is any more info I can provide, let me know. This is a production DB so I won't be able to do any disruptive testing. Based on what I have seen so far, I think this would be difficult to replicate anyway. A stack trace would be nice. If you didn't get a core dump this time, it would be good to configure the system so that you get one next time it happens. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] pg_basebackup fails if a data file is removed
On 21.12.2012 15:30, Magnus Hagander wrote: On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas wrote: When pg_basebackup copies data files, it does basically this: if (lstat(pathbuf,&statbuf) != 0) { if (errno != ENOENT) ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat file or directory \"%s\": %m", pathbuf))); /* If the file went away while scanning, it's no error. */ continue; } ... sendFile(pathbuf, pathbuf + basepathlen + 1,&statbuf); There's a race condition there. If the file is removed after the lstat call, and before sendFile opens the file, the backup fails with an error. It's a fairly tight window, so it's difficult to run into by accident, but by putting a breakpoint with a debugger there it's quite easy to reproduce, by e.g doing a VACUUM FULL on the table about to be copied. A straightforward fix is to allow sendFile() to ignore ENOENT. Patch attached. Looks good to me. Ok, committed. Nice spot - don't tell me you actually ran into it during testing? :) Heh, no, eyeballing the code. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
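The race boils down to a classic check-then-use window: a file observed by lstat() during the directory scan may be gone by the time it is opened. A minimal sketch of the committed approach, with fopen() standing in for the server's file access; the function signature mirrors the patch, but the body is simplified:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Sketch of the fix: between the directory scan's lstat() and the
 * open here, the file may be removed (e.g. by a concurrent VACUUM
 * FULL).  With missing_ok, ENOENT means "skip this file", not
 * "fail the whole base backup". */
static bool send_file(const char *path, bool missing_ok)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
    {
        if (errno == ENOENT && missing_ok)
            return false;       /* vanished since the scan: not an error */
        return false;           /* sketch only: the real code ereports here */
    }
    /* ... stream the file contents to the client ... */
    fclose(fp);
    return true;                /* file existed and was sent */
}
```

The boolean return matters for the caller: as in the committed patch, the tar size accounting is only done when the file was actually sent.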
[BUGS] pg_basebackup fails if a data file is removed
When pg_basebackup copies data files, it does basically this: if (lstat(pathbuf, &statbuf) != 0) { if (errno != ENOENT) ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat file or directory \"%s\": %m", pathbuf))); /* If the file went away while scanning, it's no error. */ continue; } > ... > sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf); There's a race condition there. If the file is removed after the lstat call, and before sendFile opens the file, the backup fails with an error. It's a fairly tight window, so it's difficult to run into by accident, but by putting a breakpoint with a debugger there it's quite easy to reproduce, by e.g doing a VACUUM FULL on the table about to be copied. A straightforward fix is to allow sendFile() to ignore ENOENT. Patch attached. - Heikki diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 1b234c6..bc95215 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -44,8 +44,8 @@ typedef struct static int64 sendDir(char *path, int basepathlen, bool sizeonly); -static void sendFile(char *readfilename, char *tarfilename, - struct stat * statbuf); +static bool sendFile(char *readfilename, char *tarfilename, + struct stat * statbuf, bool missing_ok); static void sendFileWithContent(const char *filename, const char *content); static void _tarWriteHeader(const char *filename, const char *linktarget, struct stat * statbuf); @@ -194,7 +194,7 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir) XLOG_CONTROL_FILE))); } -sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf); +sendFile(XLOG_CONTROL_FILE, XLOG_CONTROL_FILE, &statbuf, false); } /* @@ -715,11 +715,18 @@ sendDir(char *path, int basepathlen, bool sizeonly) } else if (S_ISREG(statbuf.st_mode)) { - /* Add size, rounded up to 512byte block */ - size += ((statbuf.st_size + 511) & ~511); + bool sent = false; + if (!sizeonly) -sendFile(pathbuf, pathbuf + basepathlen + 1, 
&statbuf); - size += 512; /* Size of the header of the file */ +sent = sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf, +true); + + if (sent || sizeonly) + { +/* Add size, rounded up to 512byte block */ +size += ((statbuf.st_size + 511) & ~511); +size += 512; /* Size of the header of the file */ + } } else ereport(WARNING, @@ -779,9 +786,17 @@ _tarChecksum(char *header) return sum; } -/* Given the member, write the TAR header & send the file */ -static void -sendFile(char *readfilename, char *tarfilename, struct stat * statbuf) +/* + * Given the member, write the TAR header & send the file. + * + * If 'missing_ok' is true, will not throw an error if the file is not found. + * + * Returns true if the file was successfully sent, false if 'missing_ok', + * and the file did not exist. + */ +static bool +sendFile(char *readfilename, char *tarfilename, struct stat *statbuf, + bool missing_ok) { FILE *fp; char buf[TAR_SEND_SIZE]; @@ -791,9 +806,13 @@ sendFile(char *readfilename, char *tarfilename, struct stat * statbuf) fp = AllocateFile(readfilename, "rb"); if (fp == NULL) + { + if (errno == ENOENT && missing_ok) + return false; ereport(ERROR, (errcode_for_file_access(), errmsg("could not open file \"%s\": %m", readfilename))); + } /* * Some compilers will throw a warning knowing this test can never be true @@ -847,6 +866,8 @@ sendFile(char *readfilename, char *tarfilename, struct stat * statbuf) } FreeFile(fp); + + return true; } -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] Bug with temporary child of a permanent table after recovery
Spotted by accident while working on a patch: Open psql and do: CREATE TABLE uctest(f1 int, f2 text); -- Create a temporary child of the permanent table CREATE TEMP TABLE ucchild () inherits (uctest); In another terminal: pg_ctl stop -m immediate pg_ctl start psql (9.3devel) Type "help" for help. postgres=# SELECT * FROM uctest; ERROR: could not open file "base/12030/t2_16392": No such file or directory This goes back to 9.1. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] PITR potentially broken in 9.2
On 28.11.2012 15:47, Andres Freund wrote: I mean the label read by read_backup_label(). Jeff's mail indicated it had CHECKPOINT_LOCATION at 1/188D8120 but redo started at 1/CD89E48. That's correct. The checkpoint was at 1/188D8120, but its redo pointer was earlier, at 1/CD89E48, so that's where redo had to start. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] PITR potentially broken in 9.2
On 28.11.2012 15:26, Andres Freund wrote: Hm. Are you sure its actually reading your backup file? Its hard to say without DEBUG1 output but I would tentatively say its not reading it at all because the the "redo starts at ..." message indicates its not using the checkpoint location from the backup file. By backup file, you mean the backup history file? Since 9.0, recovery does not read the backup history file, it's for informational/debugging purposes only. All the information recovery needs is in the backup_label, and an end-of-backup WAL record marks the location where pg_stop_backup() was called, ie. how far the WAL must be replayed for the backup to be consistent. Can you reproduce the issue? If so, can you give an exact guide? If not, do you still have the datadir et al. from above? I just committed a fix for this, but if you can, it would still be nice if you could double-check that it now really works. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] PITR potentially broken in 9.2
On 28.11.2012 06:27, Noah Misch wrote: On Tue, Nov 27, 2012 at 10:08:12AM -0800, Jeff Janes wrote: Doing PITR in 9.2.1, the system claims that it reached a consistent recovery state immediately after redo starts. This leads to various mysterious failures, when it should instead throw a "requested recovery stop point is before consistent recovery point" error. (If you are unlucky, I think it might even silently start up in a corrupt state.) I observed a similar problem with 9.2. Despite a restore_command that failed every time, startup from a hot backup completed. At the time, I suspected a mistake on my part. I believe this was caused by this little typo/thinko: --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6763,7 +6763,7 @@ StartupXLOG(void) /* Pop the error context stack */ error_context_stack = errcontext.previous; - if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) && + if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) && XLByteLE(ControlFile->backupEndPoint, EndRecPtr)) { /* Normally, backupEndPoint is invalid, and we rely on seeing an end-of-backup WAL record to mark the location. backupEndPoint is only set when restoring from a backup that was taken from a standby, but thanks to the above, recovery incorrectly treats that as end-of-backup. Fixed, thanks for the report! - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [GENERAL] [BUGS] Prepared Statement Name Truncation
On 23.11.2012 17:53, Tom Lane wrote: Euler Taveira writes: On 22-11-2012 04:27, Pavel Stehule wrote: significantly larger catalog Less than 5% of catalog columns? I don't buy your argument. It's not about count, it's about size. For instance, pg_attribute currently requires 140 bytes per row (counting the tuple header and line pointer), so adding 64 bytes would represent 45% bloat. In a database with lots of tables that would be painful. We could avoid this problem if we were prepared to make type "name" be varlena, ... It would actually be nice to do that because it would *reduce* the amount of space and memory used for the catalogs in the typical case, where the attribute names are much smaller than 64 bytes. I received a complaint just the other day that our backend processes consume a lot of memory, even when idle; the catalog caches are a large part of that. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Fwd: race in pg_ctl start -w
On 18.10.2012 22:15, Alvaro Herrera wrote: Dave Vitek wrote: Heikki, It's happy about the overruns. It did flag an issue where the file descriptor can leak when the various early returns get taken. This is a common problem with static analysers; they don't realise we don't care about the leaked resource because the program is shortly going to terminate anyway. We (used to?) have plenty of false positives in initdb as reported in the Coverity scanner, for example. Actually, this was a real leak. I should've added close() calls in the cases where the file is empty, or fstat() fails. It doesn't matter when the function is called only once, but in the "pg_ctl start -w" case it's called repeatedly to poll for postmaster startup. Fixed, and I also changed it to not return the last line if it doesn't end in a newline, per Tom's suggestion. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 15.10.2012 19:31, Fujii Masao wrote: On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas wrote: On 15.10.2012 13:13, Heikki Linnakangas wrote: Oh, I didn't remember that we've documented the specific structs that we pass around. It's quite bogus anyway to explain the messages the way we do currently, as they are actually dependent on the underlying architecture's endianness and padding. I think we should refactor the protocol to not transmit raw structs, but use pq_sendint and friends to construct the messages. This was discussed earlier (see http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com), I think there's consensus that 9.3 would be a good time to do that as we changed the XLogRecPtr format anyway. This is what I came up with. The replication protocol is now architecture-independent. The WAL format itself is still architecture-dependent, of course, but this is useful if you want to e.g. use pg_receivexlog to back up a server that runs on a different platform. I chose the int64 format to transmit timestamps, even when compiled with --disable-integer-datetimes. Please review if you have the time.. Thanks for the patch! When I ran pg_receivexlog, I encountered the following error. Yeah, clearly I didn't test this nearly enough... I fixed the bugs you bumped into, new version attached. + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64); + hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char); These should be macros, to avoid calculation overhead? The compiler will calculate this at compilation time, it's going to be a constant at runtime. - Heikki diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 3d72a16..5a32517 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1366,7 +1366,8 @@ The commands accepted in walsender mode are: WAL data is sent as a series of CopyData messages.
(This allows other information to be intermixed; in particular the server can send an ErrorResponse message if it encounters a failure after beginning - to stream.) The payload in each CopyData message follows this format: + to stream.) The payload of each CopyData message from server to the + client contains a message of one of the following formats: @@ -1390,34 +1391,32 @@ The commands accepted in walsender mode are: - Byte8 + Int64 - The starting point of the WAL data in this message, given in - XLogRecPtr format. + The starting point of the WAL data in this message. - Byte8 + Int64 - The current end of WAL on the server, given in - XLogRecPtr format. + The current end of WAL on the server. - Byte8 + Int64 - The server's system clock at the time of transmission, - given in TimestampTz format. + The server's system clock at the time of transmission, as + microseconds since midnight on 2000-01-01. @@ -1445,25 +1444,12 @@ The commands accepted in walsender mode are: continuation records can be sent in different CopyData messages. - Note that all fields within the WAL data and the above-described header - will be in the sending server's native format. Endianness, and the - format for the timestamp, are unpredictable unless the receiver has - verified that the sender's system identifier matches its own - pg_control contents. - - If the WAL sender process is terminated normally (during postmaster shutdown), it will send a CommandComplete message before exiting. This might not happen during an abnormal shutdown, of course. - The receiving process can send replies back to the sender at any time, - using one of the following message formats (also in the payload of a - CopyData message): - - - @@ -1495,12 +1481,23 @@ The commands accepted in walsender mode are: - Byte8 + Int64 - The server's system clock at the time of transmission, - given in TimestampTz format. + The server's system clock at the time of transmission, as + microseconds since midnight on 2000-01-01. 
+ + + + + + Byte1 + + + + 1 means that the client should reply to this message as soon as + possible, to avoid a timeout disconnect. 0 otherwise. @@ -1512,6 +1509,12 @@ The commands accepted in walsender
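The "microseconds since midnight on 2000-01-01" timestamp format introduced by the patch above differs from Unix time only by a fixed offset, since the PostgreSQL epoch (2000-01-01) is 946684800 seconds after the Unix epoch (1970-01-01). A minimal conversion sketch (the function and macro names are mine, not from the patch):

```c
#include <stdint.h>

/*
 * Microseconds between the Unix epoch (1970-01-01 00:00:00 UTC) and the
 * PostgreSQL epoch (2000-01-01 00:00:00 UTC) used by protocol timestamps.
 */
#define PG_EPOCH_OFFSET_USECS INT64_C(946684800000000)

/* Protocol timestamp (usecs since 2000-01-01) -> usecs since Unix epoch. */
static int64_t
pg_timestamp_to_unix_usecs(int64_t pg_usecs)
{
	return pg_usecs + PG_EPOCH_OFFSET_USECS;
}

/* Usecs since Unix epoch -> protocol timestamp. */
static int64_t
unix_usecs_to_pg_timestamp(int64_t unix_usecs)
{
	return unix_usecs - PG_EPOCH_OFFSET_USECS;
}
```

A receiver on any platform can apply this offset without knowing the sender's architecture, which is the point of the int64 wire format.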
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 15.10.2012 13:13, Heikki Linnakangas wrote: On 13.10.2012 19:35, Fujii Masao wrote: ISTM you need to update the protocol.sgml because you added the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. Oh, I didn't remember that we've documented the specific structs that we pass around. It's quite bogus anyway to explain the messages the way we do currently, as they are actually dependent on the underlying architecture's endianness and padding. I think we should refactor the protocol to not transmit raw structs, but use pq_sendint and friends to construct the messages. This was discussed earlier (see http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com), I think there's consensus that 9.3 would be a good time to do that as we changed the XLogRecPtr format anyway. This is what I came up with. The replication protocol is now architecture-independent. The WAL format itself is still architecture-dependent, of course, but this is useful if you want to e.g. use pg_receivexlog to back up a server that runs on a different platform. I chose the int64 format to transmit timestamps, even when compiled with --disable-integer-datetimes. Please review if you have the time.. - Heikki diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 3d72a16..5a32517 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -1366,7 +1366,8 @@ The commands accepted in walsender mode are: WAL data is sent as a series of CopyData messages. (This allows other information to be intermixed; in particular the server can send an ErrorResponse message if it encounters a failure after beginning - to stream.) The payload in each CopyData message follows this format: + to stream.)
The payload of each CopyData message from server to the + client contains a message of one of the following formats: @@ -1390,34 +1391,32 @@ The commands accepted in walsender mode are: - Byte8 + Int64 - The starting point of the WAL data in this message, given in - XLogRecPtr format. + The starting point of the WAL data in this message. - Byte8 + Int64 - The current end of WAL on the server, given in - XLogRecPtr format. + The current end of WAL on the server. - Byte8 + Int64 - The server's system clock at the time of transmission, - given in TimestampTz format. + The server's system clock at the time of transmission, as + microseconds since midnight on 2000-01-01. @@ -1445,25 +1444,12 @@ The commands accepted in walsender mode are: continuation records can be sent in different CopyData messages. - Note that all fields within the WAL data and the above-described header - will be in the sending server's native format. Endianness, and the - format for the timestamp, are unpredictable unless the receiver has - verified that the sender's system identifier matches its own - pg_control contents. - - If the WAL sender process is terminated normally (during postmaster shutdown), it will send a CommandComplete message before exiting. This might not happen during an abnormal shutdown, of course. - The receiving process can send replies back to the sender at any time, - using one of the following message formats (also in the payload of a - CopyData message): - - - @@ -1495,12 +1481,23 @@ The commands accepted in walsender mode are: - Byte8 + Int64 - The server's system clock at the time of transmission, - given in TimestampTz format. + The server's system clock at the time of transmission, as + microseconds since midnight on 2000-01-01. + + + + + + Byte1 + + + + 1 means that the client should reply to this message as soon as + possible, to avoid a timeout disconnect. 0 otherwise. 
@@ -1512,6 +1509,12 @@ The commands accepted in walsender mode are: + The receiving process can send replies back to the sender at any time, + using one of the following message formats (also in the payload of a + CopyData message): + + + @@ -1532,45 +1535,56 @@ The commands accepted in walsender mode are: - Byte8 + Int64 The location of the last WAL byte + 1 rece
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 13.10.2012 19:35, Fujii Masao wrote: On Thu, Oct 11, 2012 at 11:52 PM, Heikki Linnakangas wrote: Ok, thanks. Committed. I found one typo. The attached patch fixes that typo. Thanks, fixed. ISTM you need to update the protocol.sgml because you added the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage. Oh, I didn't remember that we've documented the specific structs that we pass around. It's quite bogus anyway to explain the messages the way we do currently, as they are actually dependent on the underlying architecture's endianness and padding. I think we should refactor the protocol to not transmit raw structs, but use pq_sendint and friends to construct the messages. This was discussed earlier (see http://archives.postgresql.org/message-id/4fe2279c.2070...@enterprisedb.com), I think there's consensus that 9.3 would be a good time to do that as we changed the XLogRecPtr format anyway. I'll look into doing that.. Is it worth adding the same mechanism (send back the reply immediately if walsender requests a reply) into pg_basebackup and pg_receivexlog? Good catch. Yes, they should be taught about this too. I'll look into doing that too. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Fwd: race in pg_ctl start -w
On 11.10.2012 22:36, Tom Lane wrote: Heikki Linnakangas writes: Hmm, starting with 9.3, postmaster can not only create and append to the end of file, it can also inject a line in the middle, shifting the following lines forwards. In theory, if a new line is injected into the middle of the file between fgets() calls, readfile() could read part of the same line twice. Not sure what consequences that could have; pg_ctl might try to connect to wrong address or socket directory. Hm. IIRC, the postmaster is careful to write the whole thing in a single write() call, which in principle is atomic. Perhaps you're right that we'd better have pg_ctl read it in a single read() to ensure that it sees a consistent file state. Otherwise we're making assumptions about what sort of buffering underlies the stdio functions. Ok, changed it to slurp the whole file to memory with one read() call. Dave, did this silence the static analysis tool you used? - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Fwd: race in pg_ctl start -w
On 11.10.2012 20:29, Tom Lane wrote: Heikki Linnakangas writes: A straightforward fix would be to just allocate one large-enough buffer to begin with, e.g. 8k, and read the whole file in one go. I'll write up a patch for that. This makes the readfile function very usage-specific though. The fix I was thinking about was to modify the second loop to force it to fall out once the predetermined number of lines had been read. Or maybe we should use just one loop with realloc, instead of reading the file twice. Hmm, starting with 9.3, postmaster can not only create and append to the end of the file, it can also inject a line in the middle, shifting the following lines forwards. In theory, if a new line is injected into the middle of the file between fgets() calls, readfile() could read part of the same line twice. Not sure what consequences that could have; pg_ctl might try to connect to the wrong address or socket directory. Although in practice, fgets() is buffered, and the buffer is probably large enough to hold the whole file, so it probably gets slurped into memory as one unit anyway. Then again, I don't think read/write on a file is guaranteed to be atomic either, so I guess there's always the theoretical possibility of a partial read. This makes me a bit uncomfortable with the 9.3 change that postmaster.pid file is no longer strictly append-only (commit c9b0cbe9). Could we delay appending the socket directory and listen address information to the file until we know both, and then append both in one call after that? Gah, how can a trivial thing like this be so complicated.. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] Fwd: race in pg_ctl start -w
Forwarding this to pgsql-bugs, since this isn't a security issue, as pg_ctl can only be called by an admin. My replies inline. Original Message Subject: [pgsql-security] race in pg_ctl start -w Date: Thu, 11 Oct 2012 12:39:02 -0400 From: Dave Vitek To: Hi, I don't really think this is a security issue since pg_ctl isn't really an appealing target, but in theory you could hijack the process if you got very very lucky. Feel free to forward this to the bugs list if you decide it isn't sensitive. We recently ran into a crash when using pg_ctl to start the postgres daemon. The command was: pg_ctl start -w -t 600 -D It was a segv inside the body of malloc and didn't want to repro. The good news is, we develop a static analysis tool called CodeSonar for detecting bugs like this. It has been flagging the problem since 2009, but we've generally been ignoring reports in third-party code (there are a lot of them). We analyze postgres on a regular basis because it is used by CS's web UI. So, the problem is that there's a race condition in the readfile() function that pg_ctl uses to read postmaster.pid. It does essentially this:
1. Open postmaster.pid.
2. Read through the file one byte at a time, and count newlines.
3. Allocate buffers to hold that many lines.
4. Rewind and read the file again, this time copying the lines to the allocated buffers.
The race condition is that if another process (postmaster) changes the file between steps 2 and 4, so that there are now more lines, reading the file again can overrun the buffers allocated. In particular that can happen at postmaster startup, when postmaster initially creates the file empty, and then does a write() to fill it in. It seems very unlucky if you actually hit that race condition, but I guess it's not impossible. A straightforward fix would be to just allocate one large-enough buffer to begin with, e.g. 8k, and read the whole file in one go. I'll write up a patch for that.
It also flagged a very similar issue (looks like the code was copied & pasted) in initdb.c. I haven't looked into whether that one is as likely to be subject to a race as the pg_ctl one, but it could be problematic as well. I don't think it's a problem for initdb, because the files that it reads should not change on the fly. Nevertheless, I guess we might as well add a check there. Here are the bug reports: http://www.grammatech.com/products/codesonar/examples/pg_ctl_race.html http://www.grammatech.com/products/codesonar/examples/initdb_race.html I've just saved static HTML above, so none of the links, navigation features, or images will work. The files are posted on the website because they seemed a bit big for sending to a list. I haven't shared them with anyone else outside the company. - Dave - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 11.10.2012 13:17, Amit Kapila wrote: How does this look now? The Patch is fine and test results are also fine. Ok, thanks. Committed. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7597: exception 0xC0000005
On 11.10.2012 14:42, Craig Ringer wrote: On 10/11/2012 06:07 PM, Bo Thorbjørn Jensen wrote: Self-contained case attached. No crash here on version() = PostgreSQL 9.1.5 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5), 64-bit I can reproduce this on my Windows box with the script Bo provided, with a fresh checkout from git master branch. Stack trace looks like this:
. 0 Id: 92c.71c Suspend: 1 Teb: 07ff`fffde000 Unfrozen
Child-SP RetAddr Call Site
`0053eb80 0001`3f929a31 postgres!int8eq+0x12 [c:\postgresql\src\backend\utils\adt\int8.c @ 205]
`0053ebb0 0001`3f5321a0 postgres!FunctionCall2Coll+0xa1 [c:\postgresql\src\backend\utils\fmgr\fmgr.c @ 1326]
`0053efb0 0001`3f573bf3 postgres!execTuplesUnequal+0xe0 [c:\postgresql\src\backend\executor\execgrouping.c @ 168]
`0053f020 0001`3f572ad2 postgres!findPartialMatch+0xa3 [c:\postgresql\src\backend\executor\nodesubplan.c @ 592]
`0053f090 0001`3f57293b postgres!ExecHashSubPlan+0x172 [c:\postgresql\src\backend\executor\nodesubplan.c @ 156]
`0053f0e0 0001`3f541ccb postgres!ExecSubPlan+0xcb [c:\postgresql\src\backend\executor\nodesubplan.c @ 79]
`0053f120 0001`3f545c89 postgres!ExecEvalNot+0x5b [c:\postgresql\src\backend\executor\execqual.c @ 2670]
`0053f170 0001`3f54995b postgres!ExecQual+0x79 [c:\postgresql\src\backend\executor\execqual.c @ 5124]
`0053f1d0 0001`3f56fe71 postgres!ExecScan+0x1ab [c:\postgresql\src\backend\executor\execscan.c @ 195]
`0053f240 0001`3f539034 postgres!ExecSeqScan+0x21 [c:\postgresql\src\backend\executor\nodeseqscan.c @ 116]
`0053f270 0001`3f535ca0 postgres!ExecProcNode+0x114 [c:\postgresql\src\backend\executor\execprocnode.c @ 399]
`0053f2c0 0001`3f533dda postgres!ExecutePlan+0x60 [c:\postgresql\src\backend\executor\execmain.c @ 1394]
`0053f300 0001`3f533bc5 postgres!standard_ExecutorRun+0x20a [c:\postgresql\src\backend\executor\execmain.c @ 313]
`0053f380 0001`3f78fe80 postgres!ExecutorRun+0x45 [c:\postgresql\src\backend\executor\execmain.c @ 251]
`0053f3b0 0001`3f78facb postgres!PortalRunSelect+0x130 [c:\postgresql\src\backend\tcop\pquery.c @ 946]
`0053f420 0001`3f78a368 postgres!PortalRun+0x28b [c:\postgresql\src\backend\tcop\pquery.c @ 789]
`0053f5c0 0001`3f789277 postgres!exec_simple_query+0x4b8 [c:\postgresql\src\backend\tcop\postgres.c @ 1061]
`0053f700 0001`3f707b70 postgres!PostgresMain+0x7c7 [c:\postgresql\src\backend\tcop\postgres.c @ 3978]
`0053f900 0001`3f7065ae postgres!BackendRun+0x270 [c:\postgresql\src\backend\postmaster\postmaster.c @ 3672]
`0053f960 0001`3f59cba9 postgres!SubPostmasterMain+0x2be [c:\postgresql\src\backend\postmaster\postmaster.c @ 4174]
- Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
- --- 2468,2473 *** *** 2507,2512 include 'filename' --- 2496,2520 + + wal_receiver_timeout (integer) + +wal_receiver_timeout configuration parameter + + + + Terminate replication connections that are inactive longer + than the specified number of milliseconds. This is useful for + the receiving standby server to detect a primary node crash or network + outage. + A value of zero disables the timeout mechanism. This parameter + can only be set in + the postgresql.conf file or on the server command line. + The default value is 60 seconds. + + + + *** a/doc/src/sgml/release-9.1.sgml --- b/doc/src/sgml/release-9.1.sgml *** *** 3322,3328 Add !replication_timeout setting (Fujii Masao, Heikki Linnakangas) --- 3322,3328 Add !replication_timeout setting (Fujii Masao, Heikki Linnakangas) *** a/src/backend/replication/walreceiver.c --- b/src/backend/replication/walreceiver.c *** *** 55,60 --- 55,61 /* GUC variables */ int wal_receiver_status_interval; + int wal_receiver_timeout; bool hot_standby_feedback; /* libpqreceiver hooks to these when loaded */ *** *** 121,127 static void WalRcvDie(int code, Datum arg); static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len); static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr); static void XLogWalRcvFlush(bool dying); ! static void XLogWalRcvSendReply(void); static void XLogWalRcvSendHSFeedback(void); static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime); --- 122,128 static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len); static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr); static void XLogWalRcvFlush(bool dying); ! 
static void XLogWalRcvSendReply(bool force, bool requestReply); static void XLogWalRcvSendHSFeedback(void); static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime); *** *** 170,178 WalReceiverMain(void) { char conninfo[MAXCONNINFO]; XLogRecPtr startpoint; - /* use volatile pointer to prevent code rearrangement */ volatile WalRcvData *walrcv = WalRcv; /* * WalRcv should be set up already (if we are a backend, we inherit this --- 171,180 { char conninfo[MAXCONNINFO]; XLogRecPtr startpoint; /* use volatile pointer to prevent code rearrangement */ volatile WalRcvData *walrcv = WalRcv; + TimestampTz last_recv_timestamp; + bool ping_sent; /* * WalRcv should be set up already (if we are a backend, we inherit this *** *** 282,287 WalReceiverMain(void) --- 284,293 MemSet(&reply_message, 0, sizeof(reply_message)); MemSet(&feedback_message, 0, sizeof(feedback_message)); + /* Initialize the last recv timestamp */ + last_recv_timestamp = GetCurrentTimestamp(); + ping_sent = false; + /* Loop until end-of-streaming or error */ for (;;) { *** *** 316,330 WalReceiverMain(void) /* Wait a while for data to arrive */ if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len)) { /* Accept the received data, and process it */ XLogWalRcvProcessMsg(type, buf, len); /* Receive any more data we can without sleeping */ while (walrcv_receive(0, &type, &buf, &len)) XLogWalRcvProcessMsg(type, buf, len); /* Let the master know that we received some data. */ ! 
XLogWalRcvSendReply(); /* * If we've written some records, flush them to disk and let the --- 322,344 /* Wait a while for data to arrive */ if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len)) { + /* Something was received from master, so reset timeout */ + last_recv_timestamp = GetCurrentTimestamp(); + ping_sent = false; + /* Accept the received data, and process it */ XLogWalRcvProcessMsg(type, buf, len); /* Receive any more data we can without sleeping */ while (walrcv_receive(0, &type, &buf, &len)) + { + last_recv_timestamp = GetCurrentTimestamp(); + ping_sent = false; XLogWalRcvProcessMsg(type, buf, len); + } /* Let the master know that we received some data. */ ! XLogWalRcvSendReply(false, false); /* * If we've written some records, flush them to disk and let the *** *** 335,344 WalReceiverMain(void) else { /* ! * We didn't receive anything new, but send a status update to the ! * master anyway, to report any progress in applying WAL. */ ! XLogWalRcvSendRep
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 02.10.2012 10:36, Amit kapila wrote: On Monday, October 01, 2012 4:08 PM Heikki Linnakangas wrote: So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should figure out itself how often to send keepalive messages so that those timeouts are not reached. By this it implies that we should remove wal_receiver_status_interval. Currently it is also used in case of the reply message of data sent by sender, which contains till what point receiver has flushed. So if we remove this variable receiver might start sending that message sooner than required. Is that okay behavior? I guess we should keep that setting, then, so that you can get status updates more often than would be required for heartbeat purposes. In walsender, after half of walsender_timeout has elapsed and we haven't received anything from the client, the walsender process should send a "ping" message to the client. Whenever the client receives a Ping, it replies. The walreceiver does the same; when half of walreceiver_timeout has elapsed, send a Ping message to the server. Each Ping-Pong roundtrip resets the timer at both ends, regardless of which side initiated it, so if e.g. walsender_timeout < walreceiver_timeout, the client will never have to initiate a Ping message, because walsender will always reach the walsender_timeout/2 point first and initiate the heartbeat message. Just to clarify, walsender should reset timer after it gets reply from receiver of the message it sent. Right. walreceiver should reset timer after sending reply for heartbeat message. Similar to above, timers will be reset when receiver sends the heartbeat message. walreceiver should reset the timer when it *receives* any message from walsender.
If it sends the reply right away, I guess that's the same thing, but I'd phrase it so that it's the reception of a message from the other end that resets the timer. The Ping/Pong messages don't necessarily need to be new message types, we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately. Can't we make the decision to send reply immediately based on message type, because these message types will be unique. To clarify my understanding, 1. the heartbeat message from walsender side will be keepalive message ('k') and from walreceiver side it will be Hot Standby feedback message ('h'). 2. the reply message from walreceiver side will be current reply message ('r'). Yep. I wonder why we need separate message types for Hot Standby Feedback 'h' and Reply 'r', though. Seems it would be simpler to have just one message type that includes all the fields from both messages. 3. currently there is no reply kind of message from walsender, so do we need to introduce one new message for it or can use some existing message only? if new, do we need to send any additional information along with it, for existing messages can we use keepalive message itself as reply message but with an additional byte to indicate it is reply? Hmm, I think I'd prefer to use the existing Keepalive message 'k', with an additional flag. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
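The keepalive-with-flag layout settled on above ('k', then walEnd, send time, and a reply-requested byte) can be serialized portably, in the spirit of the pq_sendint refactoring discussed earlier in the thread. A sketch with hypothetical helper names (stand-ins for the real pq_sendint64 and friends), using explicit big-endian byte order instead of a raw struct copy:

```c
#include <stddef.h>
#include <stdint.h>

/* Write an int64 in network (big-endian) byte order. */
static void
put_int64_be(uint8_t *p, int64_t v)
{
	int			i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t) (v >> (56 - 8 * i));
}

/*
 * Sketch only: build a keepalive-style message.
 * Layout: 'k' + Int64 walEnd + Int64 sendTime + Byte1 replyRequested,
 * 1 + 8 + 8 + 1 = 18 bytes.  'out' must have room for 18 bytes.
 */
static size_t
build_keepalive(uint8_t *out, int64_t wal_end, int64_t send_time,
				int reply_requested)
{
	out[0] = 'k';
	put_int64_be(out + 1, wal_end);
	put_int64_be(out + 9, send_time);
	out[17] = reply_requested ? 1 : 0;
	return 18;
}
```

Because every field has a fixed width and byte order, a receiver never needs to know the sender's struct padding or endianness.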
Re: [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown
On 21.09.2012 14:18, Amit kapila wrote: On Tuesday, September 18, 2012 6:02 PM Fujii Masao wrote: On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila wrote: Approach-2 : Provide a variable wal_send_status_interval, such that if this is 0, then the current behavior would prevail and if its non-zero then KeepAlive message would be send maximum after that time. The modified code of WALSendLoop will be as follows: Which way you think is better or you have any other idea to handle. I think #2 is better because it's more intuitive to a user. Please find a patch attached for implementation of Approach-2. Hmm, I think we need to step back a bit. I've never liked the way replication_timeout works, where it's the user's responsibility to set wal_receiver_status_interval < replication_timeout. It's not very user-friendly. I'd rather not copy that same design to this walreceiver timeout. If there's two different timeouts like that, it's even worse, because it's easy to confuse the two. So let's think how this should ideally work from a user's point of view. I think there should be just two settings: walsender_timeout and walreceiver_timeout. walsender_timeout specifies how long a walsender will keep a connection open if it doesn't hear from the walreceiver, and walreceiver_timeout is the same for walreceiver. The system should figure out itself how often to send keepalive messages so that those timeouts are not reached. In walsender, after half of walsender_timeout has elapsed and we haven't received anything from the client, the walsender process should send a "ping" message to the client. Whenever the client receives a Ping, it replies. The walreceiver does the same; when half of walreceiver_timeout has elapsed, send a Ping message to the server. 
Each Ping-Pong roundtrip resets the timer at both ends, regardless of which side initiated it, so if e.g. walsender_timeout < walreceiver_timeout, the client will never have to initiate a Ping message, because walsender will always reach the walsender_timeout/2 point first and initiate the heartbeat message. The Ping/Pong messages don't necessarily need to be new message types, we can use the message types we currently have, perhaps with an additional flag attached to them, to request the other side to reply immediately. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
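The heartbeat rule described above can be boiled down to a tiny state machine (all names hypothetical, not the committed code): ping once when half the local timeout has elapsed without traffic, give up at the full timeout, and reset on receiving any message from the peer. Since receipt resets the clock on both sides, only the side with the smaller timeout normally ends up initiating pings.

```c
#include <stdint.h>

typedef struct
{
	int64_t		timeout_usecs;	/* e.g. walsender_timeout */
	int64_t		last_recv;		/* time of last message from the peer */
	int			ping_sent;		/* pinged since last_recv? */
} heartbeat_state;

/*
 * Sketch only.  Returns 0 = nothing to do, 1 = send a ping now,
 * -1 = full timeout elapsed, declare the peer dead.
 */
static int
heartbeat_check(heartbeat_state *hb, int64_t now)
{
	if (now - hb->last_recv >= hb->timeout_usecs)
		return -1;				/* no traffic for a full timeout */
	if (!hb->ping_sent && now - hb->last_recv >= hb->timeout_usecs / 2)
	{
		hb->ping_sent = 1;		/* ping once per quiet period, at timeout/2 */
		return 1;
	}
	return 0;
}

/* Reception of *any* message from the other end resets the timer. */
static void
heartbeat_message_received(heartbeat_state *hb, int64_t now)
{
	hb->last_recv = now;
	hb->ping_sent = 0;
}
```

Running one instance of this in walsender (with walsender_timeout) and one in walreceiver (with walreceiver_timeout) gives the behavior described: neither side needs to know the other's setting.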
Re: [BUGS] BUG #7559: syslogger doesn't close stdout and stderr
On 20.09.2012 00:05, Reinhard Max wrote: On Wed, 19 Sep 2012 at 12:39, Tom Lane wrote: reinh...@m4x.de writes: And what makes /dev/null necessarily the right substitute? Because it is what virtually all deamons have been doing for the last 3..4 decades? I don't think we should change this within Postgres. We removed logic associated with daemonization altogether in 9.2 Huh - why that? I believe Tom is referring to the removal of silent_mode in 9.2, see http://archives.postgresql.org/pgsql-general/2011-06/msg00796.php and http://archives.postgresql.org/pgsql-hackers/2011-06/msg02156.php. "We removed logic associated with daemonization" meant that the logic was removed from postmaster, because the preferred way to do it is by calling "pg_ctl start". pg_ctl redirects to /dev/null as you'd expect. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] BUG #6412: psql & fe-connect truncate passwords
On 15.02.2012 07:09, Andy Grimm wrote: Sorry that it's been a couple of weeks, but I have gotten around to working on a patch that addresses more of these concerns. The attached patch should 1) allow arbitrary length passwords to be read from a file via initdb --pwfile 2) allow the client to accept a password of arbitrary length at the password prompt 3) allow a password of arbitrary length in a pgpass file In #2 I say "allow the client to accept", because there's a pq_getmessage call in src/backend/libpq/auth.c which limits the password message length to 1000 characters. Changing that part of the code should allow longer passwords, but there may be other lurking backend issues after that, and I'm not concerned about going beyond 1000 at this point. Thanks for the patch. A few comments:
* Most of the simple_prompt() calls are for passwords, which now have no limit, but there are a few others. How about we remove the maxlen argument altogether, and just have it always return a malloc'd string that can be arbitrarily long. (maybe with a sanity-check limit within simple_prompt(), like 100k)
* .pg_service.conf handling still has a fixed limit on line length of 256 bytes. See parseServiceInfo() in fe-connect. I think we should lift that limit too, for the sake of consistency. You can pass a password in the service file, too.
* Missed a few simple_prompt() calls in contrib (oid2name, vacuumlo, pgbench)
- Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
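The "no maxlen argument, always return a malloc'd string, with a sanity cap" idea suggested above can be sketched like this (a hypothetical stand-alone reader, not the actual simple_prompt() patch): grow the buffer with realloc as input arrives, and stop at a generous limit such as 100k.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PROMPT_SANITY_LIMIT (100 * 1024)	/* cap against runaway input */

/*
 * Sketch only: read one line of arbitrary length from fp, without the
 * trailing newline.  Returns a malloc'd string (caller frees), or NULL
 * on allocation failure.
 */
static char *
read_line_unbounded(FILE *fp)
{
	size_t		cap = 64,
				len = 0;
	char	   *buf = malloc(cap);
	int			c;

	if (buf == NULL)
		return NULL;
	while ((c = fgetc(fp)) != EOF && c != '\n')
	{
		if (len + 1 >= cap)
		{
			char	   *nbuf;

			if (cap >= PROMPT_SANITY_LIMIT)
				break;			/* sanity check: truncate absurd input */
			cap *= 2;
			nbuf = realloc(buf, cap);
			if (nbuf == NULL)
			{
				free(buf);
				return NULL;
			}
			buf = nbuf;
		}
		buf[len++] = (char) c;
	}
	buf[len] = '\0';
	return buf;
}
```

The same grow-on-demand loop would apply to the .pg_service.conf line reading mentioned above, removing the fixed 256-byte limit.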
Re: [BUGS] BUG #7549: max_connections check should query master when starting standby
On 18.09.2012 09:46, Craig Ringer wrote:
> On 09/18/2012 07:57 AM, Fujii Masao wrote:
> If you change the max_connections on the master, you need to take a fresh backup from the master and start the standby from it.
>
> WTF, really?

No. It's enough to bump up max_connections on the standby, and restart.

- Heikki
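Concretely, the fix is a one-line configuration change on the standby followed by a restart; the value and the data directory path below are examples, not prescriptions:

```
# postgresql.conf on the standby: must be >= max_connections on the master
max_connections = 200

# then restart the standby:
#   pg_ctl -D /path/to/standby_data restart
```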
Re: [BUGS] BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
On 12.09.2012 22:03, Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:47 PM, wrote:
> The following bug has been logged on the website:
>
> Bug reference: 7533
> Logged by: Amit Kapila
> Email address: amit.kap...@huawei.com
> PostgreSQL version: 9.2.0
> Operating system: Suse
>
> Description:
> M host is primary, S host is standby and CS host is cascaded standby.
> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
> 2. Execute the command initdb on host M to create a fresh database.
> 3. Modify the configuration file postgresql.conf on host M like this:
>      listen_addresses = 'M'
>      port = 15210
>      wal_level = hot_standby
>      max_wal_senders = 4
>      hot_standby = on
> 4. Modify the configuration file pg_hba.conf on host M like this:
>      host replication repl M/24 md5
> 5. Start the server on host M as primary.
> 6. Connect one client to the primary server and create a user 'repl':
>      CREATE USER repl SUPERUSER PASSWORD '123';
> 7. Use the command pg_basebackup on host S to retrieve the database of the primary host:
>      pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
> 8. Copy recovery.conf.sample from the share folder of the package to the database folder of host S, then rename this file to recovery.conf.
> 9. Modify the file recovery.conf on host S as below:
>      standby_mode = on
>      primary_conninfo = 'host=M port=15210 user=repl password=123'
> 10. Modify the file postgresql.conf on host S as follows:
>      listen_addresses = 'S'
> 11. Start the server on host S as standby server.
> 12. Use the command pg_basebackup on host CS to retrieve the database of the standby host:
>      pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
> 13. Modify the file recovery.conf on host CS as below:
>      standby_mode = on
>      primary_conninfo = 'host=S port=15210 user=repl password=123'
> 14. Modify the file postgresql.conf on host S as follows:
>      listen_addresses = 'CS'
> 15. Start the server on host CS as cascaded standby server node.
> 16. Try to connect a client to host CS, but it gives this error:
>      FATAL: the database system is starting up
>
> This procedure didn't reproduce the problem in HEAD. But when I restarted the master server between steps 11 and 12, I was able to reproduce the problem.
>
> Observations related to the bug:
> In the above scenario it is observed that the startup process has read all data (in our defect scenario minRecoveryPoint is 5016220) up to the position 5016220, and then it checks for recovery consistency with the following condition in the function CheckRecoveryConsistency:
>
>      if (!reachedConsistency &&
>          XLByteLE(minRecoveryPoint, EndRecPtr) &&
>          XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>
> At this point the first two conditions are true, but the last condition is not, because the redo has not yet been applied and hence backupStartPoint has not been reset. So it does not signal the postmaster about the consistent state. After this it applies the redo, resets backupStartPoint, and then goes to read the next set of records. Since all records have already been read, it starts waiting for a new record from the standby node. But since no new record is coming from the standby node, it keeps waiting, and never gets a chance to recheck the recovery consistency level. And hence client connections are not allowed.
>
> If the cascaded standby starts recovery at a normal checkpoint record, this problem does not happen, because if wal_level is set to hot_standby, an XLOG_RUNNING_XACTS WAL record always follows the normal checkpoint record. So while the XLOG_RUNNING_XACTS record is being replayed, ControlFile->backupStartPoint can be reset, and then the cascaded standby can pass the consistency test. The problem happens when the cascaded standby starts recovery at a shutdown checkpoint record. In this case, no WAL record might follow the checkpoint one yet. So, after replaying the shutdown checkpoint record, the cascaded standby needs to wait for a new WAL record to appear before reaching the code block that resets ControlFile->backupStartPoint. The cascaded standby cannot reach a consistent state, and a client cannot connect to it, until new WAL has arrived.
>
> The attached patch will fix the problem. In this patch, if recovery begins at a shutdown checkpoint record, the ControlFile fields (like backupStartPoint) required for checking whether end-of-backup has been reached are not set at first. IOW, the cascaded standby considers the database consistent from the beginning. This is safe because a shutdown checkpoint record means that there is no running database activity at that point and the database is in a consistent state.

Hmm, I think the CheckRecoveryConsistency() call in the redo loop is misplaced. It's called after we get a record from ReadRecord, but *before* replaying it (rm_redo). Even if replaying record X makes the sy
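In outline, the issue is the ordering inside the main redo loop. A simplified pseudocode sketch (not the actual xlog.c code) of the reordering Heikki appears to be hinting at:

```
/* simplified sketch of the recovery main loop */
record = ReadRecord(...);
do
{
    rm_redo(record);                /* replay first; this may clear
                                     * ControlFile->backupStartPoint */
    CheckRecoveryConsistency();     /* then re-check consistency, so that a
                                     * shutdown checkpoint with no WAL after
                                     * it does not leave the standby stuck */
    record = ReadRecord(...);       /* may block waiting for new WAL */
} while (record != NULL);
```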
Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump
On 06.09.2012 13:07, Robert Haas wrote:
> On Thu, Sep 6, 2012 at 3:55 PM, Tom Lane wrote:
> Robert Haas writes:
> On Wed, Sep 5, 2012 at 9:57 AM, wrote:
> So it would be nice if there is an option to disable WAL logging while running pg_dump.
>
> pg_dump doesn't modify any data, so I don't see how it could be causing WAL logs to get generated.
>
> Doesn't hint-bit setting cause WAL traffic these days?
>
> I sure as heck don't think so.

It does not. HOT page pruning does, however. It could be that.

- Heikki
Re: [BUGS] ERROR - CREATE GIST INDEX on 9.2 beta3
On 15.08.2012 09:50, Heikki Linnakangas wrote:
> On 15.08.2012 01:02, Zdeněk Jílovec wrote:
> Hello, I use PostgreSQL 9.2beta3 with PostGIS 2.0.1 and if I try to create a GiST index on a column of type geometry(Point,2065) I get an error:
>
>     test=> CREATE INDEX places_point ON places USING GIST(def_point);
>     ERROR: failed to re-find parent for block 18097
>
> It works on 9.1.
>
> Hmm, I bet this is a bug in the new GiST buffering build code. There was an earlier bug that led to "failed to re-find parent" errors that I fixed back in May, but maybe I missed some corner case.

Zdeněk sent me the dump and instructions off-list, and I was able to reproduce and diagnose the bug. Many thanks for that!

It was indeed a corner case in the parent-tracking logic. During the build, we maintain a hash table of the parent of each page. The hash table is used to find the parent of a page when a page is split and we have to insert the downlinks of the new pages into the parent. In a regular GiST insertion, we always descend the tree from root to leaf, and we get the parent pointers from the stack. During a buffered build, we don't have such a stack available, because we can start the descent from a buffer in the middle of the tree. So we use the parent map instead.

However, the parent hash table does not track the immediate parents of leaf pages. That's not required, because even though we can begin the descent somewhere in the middle of the tree, when we descend to a leaf page we know the immediate parent where we came from. Not tracking the leaf level saves a considerable amount of memory. But just before we descend to the leaf page, we check if the downlink needs to be adjusted to accommodate the new tuple, and replace it with an updated tuple if so. The bug arises when updating the leaf's downlink splits the parent page, and the downlink is moved to a right sibling. When we then descend to the leaf page, our notion of the leaf's parent is incorrect: the real parent is somewhere to the right of where we think it is. In a normal index insert, that case is covered by the logic that moves right if the downlink is not found on the expected page. In the buffering build, we don't do that, because we think we know exactly what the parent of each page is.

I committed the attached patch to fix that. With the patch, when the downlink is updated in the parent page, the gistbufferinginserttuples() function returns the block where the updated tuple was placed, so that when we descend to the leaf, we know the parent of the leaf correctly.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***************
*** 148,163 **** gistinsert(PG_FUNCTION_ARGS)
--- 148,169 ----
   * pages are released; note that new tuple(s) are *not* on the root page
   * but in one of the new child pages.
   *
+  * If 'newblkno' is not NULL, returns the block number of page the first
+  * new/updated tuple was inserted to. Usually it's the given page, but could
+  * be its right sibling if the page was split.
+  *
   * Returns 'true' if the page was split, 'false' otherwise.
   */
  bool
  gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
                  Buffer buffer,
                  IndexTuple *itup, int ntup, OffsetNumber oldoffnum,
+                 BlockNumber *newblkno,
                  Buffer leftchildbuf,
                  List **splitinfo,
                  bool markfollowright)
  {
+     BlockNumber blkno = BufferGetBlockNumber(buffer);
      Page        page = BufferGetPage(buffer);
      bool        is_leaf = (GistPageIsLeaf(page)) ? true : false;
      XLogRecPtr  recptr;
***************
*** 199,205 ****
--- 205,210 ----
      BlockNumber oldrlink = InvalidBlockNumber;
      GistNSN     oldnsn = 0;
      SplitedPageLayout rootpg;
-     BlockNumber blkno = BufferGetBlockNumber(buffer);
      bool        is_rootsplit;

      is_rootsplit = (blkno == GIST_ROOT_BLKNO);
***************
*** 319,327 ****
      for (i = 0; i < ptr->block.num; i++)
      {
!         if (PageAddItem(ptr->page, (Item) data, IndexTupleSize((IndexTuple) data),
                          i + FirstOffsetNumber, false, false) == InvalidOffsetNumber)
              elog(ERROR, "failed to add item to index page in \"%s\"",
                   RelationGetRelationName(rel));
!         data += IndexTupleSize((IndexTuple) data);
      }

      /* Set up rightlinks */
--- 324,342 ----
      for (i = 0; i < ptr->block.num; i++)
      {
!         IndexTuple  thistup = (IndexTuple) data;
!
!         if (PageAddItem(ptr->page, (Item) data, IndexTupleSize(thistup),
                          i + FirstOffsetNumber, false, false) == InvalidOffsetNumber)
              elog(ERROR, "failed to add item to index page in \"%s\"",
                   RelationGetRelationName(rel));
!
!         /*
!          * If this is the first inserted/updated tuple, let the
Re: [BUGS] ERROR - CREATE GIST INDEX on 9.2 beta3
On 15.08.2012 01:02, Zdeněk Jílovec wrote:
> Hello, I use PostgreSQL 9.2beta3 with PostGIS 2.0.1 and if I try to create a GiST index on a column of type geometry(Point,2065) I get an error:
>
>     test=> CREATE INDEX places_point ON places USING GIST(def_point);
>     ERROR: failed to re-find parent for block 18097
>
> It works on 9.1.

Hmm, I bet this is a bug in the new GiST buffering build code. There was an earlier bug that led to "failed to re-find parent" errors that I fixed back in May, but maybe I missed some corner case.

> I can send a table dump (43 MB - bzip2).

Yes, please send the dump to me off-list, and I'll take a look.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 13.08.2012 15:22, Dave Page wrote:
> Heikki; please try forcing the delete, and let us know if it works OK. I've CC'd Sachin and Ashesh who can drop the old branch from the installer build machines for 8.x and 9.0/9.1.

Ah, I tried forcing the push, but it didn't help; I got a "[remote rejected] PRE_9_2 -> PRE_9_2 (non-fast-forward)" error. However, I succeeded by pushing it in two steps. First, I deleted the branch with "git push origin :PRE_9_2", and then pushed it again with "git push origin PRE_9_2". So, it's done now.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 13.08.2012 11:39, Dave Page wrote:
> On Mon, Aug 13, 2012 at 9:20 AM, Heikki Linnakangas wrote:
> Ok. So, why is the pldbgapi.control file not included? It's there in the repository, and "make install" copies it to share/extension.
>
> If it's there in the repo, then it shouldn't be. The PRE_9_2 branch of code is supposed to be how things looked *before* you started your work - it's the stable branch that we're building the < 9.2 installers from, and it certainly shouldn't have been "extensionised", or had any other changes made to it except for bug fixes.

Then that was a misunderstanding when we discussed this back in May (http://archives.postgresql.org/pgadmin-hackers/2012-05/msg00022.php). How do you prefer to resolve it now?

The extensionised code in the PRE_9_2 branch should work on 9.1, if you include all the right files in the installer, including the control file. Or, we can create a new "real" pre-9.2 branch at the point before I started any of this work, or at the point just before the commit that turned it into an extension. All the commits prior to that seem fairly harmless, but we would need to test that it works on all supported platforms and server versions.

> It cannot be extensionised - it's used on 8.x too.

The PRE_9_2 code works against older server versions if you run the .sql file manually. It just isn't an extension then, obviously.

> I think the best option is to drop the branch, and recreate it from the point immediately prior to your first change, and then cherry-pick any important bug fixes you may have made into the new branch.

Agreed. I'm not even going to cherry-pick any of the fixes from master - there's a non-zero risk that something is broken, and given the lack of complaints from the field about the bugs that were fixed, it's not worth it. The important thing is that the new code that will be included in the 9.2 installers has all the goodies, and that the old versions work as well as they used to.

I tried to drop and recreate the branch, but the server would not let me push that change, as it's a non-fast-forward update. Someone needs to temporarily remove the check for that from the git server - I don't think I have access to do that. Do you?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 13.08.2012 10:42, Dave Page wrote:
> On Mon, Aug 13, 2012 at 7:22 AM, Heikki Linnakangas wrote:
> On 09.08.2012 17:55, Dave Page wrote:
> On Thu, Aug 9, 2012 at 3:51 PM, Heikki Linnakangas wrote:
> Ok, I see. I just downloaded the 9.1.4 installer, and you're right, there is no pldbgapi.control file anywhere. Looking at the README.pldebugger file, the version of pldebugger that the installer includes certainly should include that file. Dave, is this a packaging issue? Which version of pldebugger was included in the 9.1.4 installers?
>
> It uses the head of the PRE_9_2 branch.
>
> Ok. So, why is the pldbgapi.control file not included? It's there in the repository, and "make install" copies it to share/extension.
>
> If it's there in the repo, then it shouldn't be. The PRE_9_2 branch of code is supposed to be how things looked *before* you started your work - it's the stable branch that we're building the < 9.2 installers from, and it certainly shouldn't have been "extensionised", or had any other changes made to it except for bug fixes.

Then that was a misunderstanding when we discussed this back in May (http://archives.postgresql.org/pgadmin-hackers/2012-05/msg00022.php). How do you prefer to resolve it now?

The extensionised code in the PRE_9_2 branch should work on 9.1, if you include all the right files in the installer, including the control file. Or, we can create a new "real" pre-9.2 branch at the point before I started any of this work, or at the point just before the commit that turned it into an extension. All the commits prior to that seem fairly harmless, but we would need to test that it works on all supported platforms and server versions.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 09.08.2012 17:55, Dave Page wrote:
> On Thu, Aug 9, 2012 at 3:51 PM, Heikki Linnakangas wrote:
> On 09.08.2012 17:07, kargor wrote:
> I use the binaries provided on the download page. I have not compiled anything - I'm a 'stupid' windows user...
>
> Ok, I see. I just downloaded the 9.1.4 installer, and you're right, there is no pldbgapi.control file anywhere. Looking at the README.pldebugger file, the version of pldebugger that the installer includes certainly should include that file. Dave, is this a packaging issue? Which version of pldebugger was included in the 9.1.4 installers?
>
> It uses the head of the PRE_9_2 branch.

Ok. So, why is the pldbgapi.control file not included? It's there in the repository, and "make install" copies it to share/extension.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 09.08.2012 17:07, kargor wrote:
> I use the binaries provided on the download page. I have not compiled anything - I'm a 'stupid' windows user...

Ok, I see. I just downloaded the 9.1.4 installer, and you're right, there is no pldbgapi.control file anywhere. Looking at the README.pldebugger file, the version of pldebugger that the installer includes certainly should include that file. Dave, is this a packaging issue? Which version of pldebugger was included in the 9.1.4 installers?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 09.08.2012 15:07, kargor wrote:
> CREATE EXTENSION pldbgapi;
> ERROR: could not open extension control file "F:/progs/9.1.4/share/extension/pldbgapi.control": No such file or directory

Did you use MinGW or MSVC to build it? Did you perform the install step, i.e. "make install"?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6722: Debugger broken?
On 08.08.2012 19:05, kargor wrote:
> It's possible to start the debugger, but it's not really easy. The control file from git has not worked, so I copied the sql file to install the functions.

How did it not work? What error message did you get? Did you follow the installation instructions in the README?

> I copied the debugging.dll from plugins into the libdir, too, but I think this is not necessary (in the conf file the dll from plugins will be loaded).

There is no reference to "debugging.dll" in the git repository, or in any old version of the debugger either, IIRC, so I wonder where that came from.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6710: txid_current() provides incorrect txid after server startup
On 27.06.2012 10:08, tar...@gmail.com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 6710
> Logged by: Tarvi Pillessaar
> Email address: tar...@gmail.com
> PostgreSQL version: 9.1.4
> Operating system: linux
>
> Description:
> This happens when the epoch is greater than 0. After a checkpoint it starts providing the correct txid. It seems that this regression is caused by the following commit: 20d98ab6e4110087d1816cd105a40fcc8ce0a307 "Correct epoch of txid_current() when executed on a Hot Standby server." When reverting this commit, txid_current() works as expected.

Fixed, thanks for the report!

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6698: sub-query with join producing out of memory in where clause
On 19.06.2012 04:01, armando.mirag...@stud-inf.unibz.it wrote:
> The following bug has been logged on the website:
>
> Bug reference: 6698
> Logged by: Armando Miraglia
> Email address: armando.mirag...@stud-inf.unibz.it
> PostgreSQL version: 9.1.2
> Operating system: Arch Linux/Ubuntu
>
> Description:
> Hi everybody! Fact: while I was trying to produce a C function I got an OOM, which RhodiumToad helped me to debug. The OOM is also reproducible with a standard query.
> Environment: I tested the POC using 9.1.2 but also 9.2devel compiled "by hand".
> Reproducibility:
> - limit the memory usage: ulimit -S -v 50
> - start postgresql: postgres -D ../data.ascii/
> - run the following query from psql:
>     SELECT * FROM generate_series(1,100) i
>     WHERE 100 <= (SELECT COUNT(*)
>                   FROM unnest(array(select j from generate_series(i-100,i+100) j)) u1
>                   JOIN unnest(array(select j from generate_series(i-100,i+100) j)) u2
>                   ON (u1.u1=u2.u2));
> Error:
> - psql side:
>     ERROR: out of memory
>     DETAIL: Failed on request of size 828.
> - server side:
>     ...
>     PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
>     PortalHeapMemory: 1024 total in 1 blocks; 824 free (0 chunks); 200 used
>     ExecutorState: 458358928 total in 67 blocks; 794136 free (15965 chunks); 457564792 used
>     accumArrayResult: 8192 total in 1 blocks; 5744 free (4 chunks); 2448 used
>     HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
>     HashBatchContext: 32768 total in 2 blocks; 8416 free (1 chunks); 24352 used
>     ...

This test case can be further reduced to:

    explain analyze SELECT * FROM generate_series(1,10) i
    WHERE (SELECT array(select repeat('a', 1) || i) u1) is not null;

We're leaking the array constructed on each row, in ExecSetParamPlan(). At line 1000 in nodeSubplan.c, we create a new array and store it as the value of the PARAM_EXEC parameter. But it's never free'd; the old value of the parameter is simply overwritten. I'm not sure what the correct fix is. I suppose we could pfree() the old value before overwriting it, but I'm not sure if that's safe, or if there might still be references to the old value somewhere in the executor.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
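For illustration only, the pfree() approach Heikki is unsure about would look something like the pseudocode below, inside ExecSetParamPlan(). The names are from the mail, the byval check is an assumption, and whether this is actually safe is exactly the open question he raises:

```
/* hypothetical sketch, not a committed fix */
if (!prm->isnull && !param_is_byval && DatumGetPointer(prm->value) != NULL)
    pfree(DatumGetPointer(prm->value));   /* free the previous row's array */
prm->value = newvalue;
prm->isnull = newisnull;
```

The risk is that some other executor node may still hold a pointer to the old value, in which case freeing it eagerly would turn a slow leak into a use-after-free.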
Re: [BUGS] BUG #6643: [PostgreSQL9.2beta1] COPY after changing fillfactor gets a PANIC.
On 16.05.2012 13:47, Heikki Linnakangas wrote:
> This sounds like a bug in the new page-at-a-time behavior in COPY. Can you send me a self-contained test, including the test data?

Never mind. After staring at the code for a while, I spotted the bug, and was able to reproduce it with a simpler case. It's quite easy to reproduce when you set the fillfactor even lower, like 10. The problem is with this line in the heap_multi_insert function:

    if (PageGetHeapFreeSpace(page) - saveFreeSpace < MAXALIGN(heaptup->t_len))

That doesn't work as intended, because the return value of PageGetHeapFreeSpace and saveFreeSpace are unsigned. When saveFreeSpace is larger than the amount of free space on the page, the left-hand side of that comparison is supposed to go negative, but it wraps around to a highly positive number because it's unsigned.

Fixed, thanks for the report!

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
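The unsigned-wraparound trap is easy to demonstrate in isolation. The sketch below is illustrative, not the committed fix: both operands are size_t, standing in for PostgreSQL's Size, and the buggy form wraps to a huge positive number when the reserved space exceeds the free space, so a full page looks roomy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy: freespace - reserved wraps around when reserved > freespace,
 * because both operands are unsigned, so the "not enough room" branch
 * is never taken on an over-reserved page. */
bool
fits_buggy(size_t freespace, size_t reserved, size_t tuplen)
{
    return !(freespace - reserved < tuplen);
}

/* Safe: never subtract below zero; check the reservation first. */
bool
fits_fixed(size_t freespace, size_t reserved, size_t tuplen)
{
    return freespace >= reserved && freespace - reserved >= tuplen;
}
```

With a page of 100 bytes free and 800 bytes reserved by the fillfactor, the buggy test still claims a 50-byte tuple fits, which is how the COPY code ended up trying to add a tuple to a page with no room and hitting the PANIC.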
Re: [BUGS] BUG #6643: [PostgreSQL9.2beta1] COPY after changing fillfactor gets a PANIC.
On 16.05.2012 13:39, katsumata.tomon...@po.ntts.co.jp wrote:
> Now, I'm testing PostgreSQL 9.2 Beta 1, and I have a problem. Steps to reproduce are below.
>
> 1. CREATE DATABASE
>     export LANG=C
>     initdb -D $PGDATA -E SQL_ASCII
>     pg_ctl start
>     createdb testdb
> 2. CREATE TABLE
>     psql -d testdb -f ./create_table_customer.sql
> 3. ALTER TABLE (fillfactor)
>     psql -d testdb -c "ALTER TABLE customer SET (fillfactor=90);"
> 4. LOAD DATA (please set the correct path to customer.data)
>     psql -d testdb -f ./customer.sql
>
> Then, I get a PANIC error:
>
>     BEGIN
>     TRUNCATE TABLE
>     PANIC: failed to add tuple to page
>     CONTEXT: COPY customer, line 296: "296>ALNGAT>alngat>EgBEEAyXVIAWBE>KCiDDFsqA8Kv>2586068>4067234479>ALNGAT@kuvkaEEyi>20100905>20101023>..."
>     STATEMENT: COPY customer FROM '/home/katsumata/work/2012/20120516_PG92beta1_bug1/copy_panic_dbt1/copy_panic/customer.data' WITH DELIMITER '>';
>     psql:./customer.sql:3: PANIC: failed to add tuple to page
>     CONTEXT: COPY customer, line 296: "296>ALNGAT>alngat>EgBEEAyXVIAWBE>KCiDDFsqA8Kv>2586068>4067234479>ALNGAT@kuvkaEEyi>20100905>20101023>..."
>     PANIC: failed to add tuple to page
>     CONTEXT: COPY customer, line 296: "296>ALNGAT>alngat>EgBEEAyXVIAWBE>KCiDDFsqA8Kv>2586068>4067234479>ALNGAT@kuvkaEEyi>20100905>20101023>..."
>     psql:./customer.sql:3: connection to server was lost
>
> If I skip the 3rd step (ALTER TABLE fillfactor), I don't get any error. And it's also OK on PostgreSQL 9.1.3. Are there any changes to this behavior?

This sounds like a bug in the new page-at-a-time behavior in COPY. Can you send me a self-contained test, including the test data?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6638: Casablanca timezone is wrong
On 14.05.2012 18:21, dch...@odotech.com wrote:
> For Africa/Casablanca we get a timezone equal to WET with no DST. This has not been true since 2010. See Wikipedia, for Morocco:
>     Time zone: WET (UTC+0)
>     Summer (DST): WEST (UTC+1) (May 2nd to August 7th)

PostgreSQL uses the timezone data from the so-called Olson library, see http://www.twinsun.com/tz/tz-link.htm. Looking at the upstream library, this is fixed in the most recent version (tzdata2012c). We will pick up that change in the next PostgreSQL minor release, i.e. 9.0.8 for the 9.0 series.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [BUGS] BUG #6629: Creating a gist index fails with "too many LWLocks taken"
On 11.05.2012 18:18, Simon Riggs wrote:
> On 11 May 2012 15:14, Heikki Linnakangas wrote:
> On 11.05.2012 16:56, Simon Riggs wrote:
> On 11 May 2012 11:07, Heikki Linnakangas wrote:
> I wonder if we should reserve a few of the lwlock "slots" for critical sections, to make this less likely to happen. Not only in this case, but in general. We haven't seen this problem often, but it would be quite trivial to reserve a few slots.
>
> Why reserve them solely for critical sections?
>
> Because if you run out of lwlocks in a critical section, you get a PANIC.
>
> Yes, but why reserve them solely for critical sections? If you have an escape hatch you use it, always.

Well, no, because a PANIC is much worse than an ERROR. Think of this as a spare oxygen tank while diving, rather than an escape hatch. A spare tank can save your life if you run out of oxygen while ascending, but if you run out of oxygen on the way down, you don't continue going down with just the spare tank.

Imagine a process that does something like this:

    for (i = 0; i < 99; i++)
        LWLockAcquire(foolock[i]);

    START_CRIT_SECTION();
    XLogInsert(...);
    END_CRIT_SECTION();

What happens at the moment is that XLogInsert hits the limit when it tries to acquire WALInsertLock, so you get a PANIC. If we reserved, say, 5 locks for critical sections, so that you could hold 95 locks while outside a critical section and 5 more within one, you would get an error earlier, outside the critical section, while acquiring the "foolocks". Or, if the number of foolocks acquired was less than 95, you would get no error at all. That avoids the PANIC.

You can argue for just raising the limit, but that just moves the problem around. It's still possible to hit the limit within a critical section, and PANIC. Likewise, if we lower the limit, that helps us find the problems earlier in the development cycle, but doesn't change the fact that if you run too close to the edge, you run out of locks within a critical section and PANIC.

Of course, nothing stops you from writing (bad) code that acquires a lot of locks within a critical section, in which case you're screwed anyway.

- Heikki
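The reservation idea can be sketched concretely. All names below are hypothetical illustrations (the real per-backend limit lives in lwlock.c, and the reserved count of 5 is just the number floated in the mail):

```c
#include <stdbool.h>

#define MAX_SIMUL_LWLOCKS   100   /* total slots per backend (illustrative) */
#define NUM_RESERVED_SLOTS    5   /* slots kept back for critical sections */

/* Return true if one more LWLock may be acquired. Outside a critical
 * section we stop NUM_RESERVED_SLOTS early, so the failure is an ERROR
 * raised while it is still safe to error out; inside a critical section
 * the spare slots may be used, avoiding the PANIC. */
bool
lwlock_slot_available(int num_held, bool in_crit_section)
{
    int limit = in_crit_section
        ? MAX_SIMUL_LWLOCKS
        : MAX_SIMUL_LWLOCKS - NUM_RESERVED_SLOTS;

    return num_held < limit;
}
```

This mirrors the oxygen-tank analogy: the last few slots exist only to get you safely through a critical section already in progress, not to let normal code climb closer to the limit.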