Re: [HACKERS] Function to know last log write timestamp
On 2014-08-11 12:42:06 +0900, Fujii Masao wrote: On Mon, Aug 11, 2014 at 10:48 AM, Tatsuo Ishii is...@postgresql.org wrote: On Mon, Aug 11, 2014 at 9:23 AM, Tatsuo Ishii is...@postgresql.org wrote: We can know the LSN of the last committed WAL record on the primary by using pg_current_xlog_location(), but it seems there's no API to know the time when the WAL record was created. I would like to know the standby delay by using pg_last_xact_replay_timestamp() together with such an API. If there's no such API, it would be useful to invent one, IMO. +1 I proposed that function before, but unfortunately it failed to be applied. But I still think that function is useful for calculating the replication delay. The past discussion is http://www.postgresql.org/message-id/cahgqgwf3zjfunej5ka683ku5rqubtswtqfq7g1x0g34o+jx...@mail.gmail.com I looked into the thread briefly and found Simon and Robert gave -1 for this because of a performance concern. I'm not sure if it's an actual performance penalty or not. Maybe we need to measure the penalty? I think that the performance penalty is negligibly small because the patch I posted before added only three stores to shared memory per commit/abort. Uh. It adds another atomic operation (the spinlock) to the commit path. That's surely *not* insignificant. At the very least the concurrency approach needs to be rethought. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nulls in GIN index
On 08/11/2014 01:19 AM, worthy7 wrote: http://www.postgresql.org/docs/9.1/static/gin-implementation.html As of PostgreSQL 9.1, NULL key values can be included in the index. Also, placeholder NULLs are included in the index for indexed items that are NULL or contain no keys according to extractValue. This allows searches that should find empty items to do so. How do I define an index that includes nulls then? You don't need to do anything special; any NULL values will be indexed automatically. - Heikki
Re: [HACKERS] Function to know last log write timestamp
On Mon, Aug 11, 2014 at 3:54 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-11 12:42:06 +0900, Fujii Masao wrote: On Mon, Aug 11, 2014 at 10:48 AM, Tatsuo Ishii is...@postgresql.org wrote: On Mon, Aug 11, 2014 at 9:23 AM, Tatsuo Ishii is...@postgresql.org wrote: We can know the LSN of the last committed WAL record on the primary by using pg_current_xlog_location(), but it seems there's no API to know the time when the WAL record was created. I would like to know the standby delay by using pg_last_xact_replay_timestamp() together with such an API. If there's no such API, it would be useful to invent one, IMO. +1 I proposed that function before, but unfortunately it failed to be applied. But I still think that function is useful for calculating the replication delay. The past discussion is http://www.postgresql.org/message-id/cahgqgwf3zjfunej5ka683ku5rqubtswtqfq7g1x0g34o+jx...@mail.gmail.com I looked into the thread briefly and found Simon and Robert gave -1 for this because of a performance concern. I'm not sure if it's an actual performance penalty or not. Maybe we need to measure the penalty? I think that the performance penalty is negligibly small because the patch I posted before added only three stores to shared memory per commit/abort. Uh. It adds another atomic operation (the spinlock) to the commit path. That's surely *not* insignificant. At the very least the concurrency approach needs to be rethought. No, the patch doesn't add a spinlock at all. What the commit path additionally does is: (1) increment the counter in shared memory, (2) set the timestamp of the last commit record in shared memory, (3) increment the counter in shared memory again. There is no extra spinlock. OTOH, when pg_last_xact_insert_timestamp reads the timestamp from the shared memory, it checks whether the counter values are the same or not before and after reading the timestamp. If they are not the same, it tries to read the timestamp again. 
This logic is necessary for reading a consistent timestamp value there. Regards, -- Fujii Masao
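The counter-bracketing protocol described above (often called a seqlock) can be sketched in C roughly as follows. The struct and function names are illustrative, not the actual pgstat code:

```c
#include <stdint.h>

/* Hypothetical shared-memory slot; the field names are made up. */
typedef struct
{
	volatile uint32_t changecount;		/* bumped before and after each write */
	volatile int64_t  last_commit_ts;	/* timestamp of last commit record */
} CommitTsSlot;

/*
 * Writer (commit path): two counter increments bracket the store.
 * While a write is in progress the counter is odd; a reader that sees
 * the same even value before and after its read got a consistent value.
 * On weakly ordered hardware write memory barriers would be needed
 * between these statements.
 */
static void
slot_write(CommitTsSlot *slot, int64_t ts)
{
	slot->changecount++;		/* now odd: write in progress */
	slot->last_commit_ts = ts;
	slot->changecount++;		/* even again: write complete */
}

/*
 * Reader (pg_last_xact_insert_timestamp): retry until the counter is
 * stable and even across the read.
 */
static int64_t
slot_read(CommitTsSlot *slot)
{
	uint32_t	before, after;
	int64_t		ts;

	do
	{
		before = slot->changecount;
		ts = slot->last_commit_ts;
		after = slot->changecount;
	} while (before != after || (before & 1));

	return ts;
}
```

The commit path thus pays only three plain stores to an already-local cacheline, with no lock acquisition; all retry cost is pushed onto the (rare) reader.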
Re: [HACKERS] Support for N synchronous standby servers
On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Mon, Aug 11, 2014 at 1:26 PM, Fujii Masao masao.fu...@gmail.com wrote: Thanks for updating the patch! Again I tested the feature and found something wrong. I set synchronous_standby_num to 2 and started three standbys. Two of them are included in synchronous_standby_names, i.e., they are synchronous standbys. That is, the other standby is always asynchronous. When I shut down one of the synchronous standbys and executed a write transaction, the transaction was successfully completed. So the transaction doesn't wait for two sync standbys in that case. Probably this is a bug. Well, that's working in my case :) Oh, that worked on my machine, too, this time... I did something wrong. Sorry for the noise. Regards, -- Fujii Masao
Re: [HACKERS] Function to know last log write timestamp
On 2014-08-11 16:20:41 +0900, Fujii Masao wrote: On Mon, Aug 11, 2014 at 3:54 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-08-11 12:42:06 +0900, Fujii Masao wrote: On Mon, Aug 11, 2014 at 10:48 AM, Tatsuo Ishii is...@postgresql.org wrote: On Mon, Aug 11, 2014 at 9:23 AM, Tatsuo Ishii is...@postgresql.org wrote: We can know the LSN of the last committed WAL record on the primary by using pg_current_xlog_location(), but it seems there's no API to know the time when the WAL record was created. I would like to know the standby delay by using pg_last_xact_replay_timestamp() together with such an API. If there's no such API, it would be useful to invent one, IMO. +1 I proposed that function before, but unfortunately it failed to be applied. But I still think that function is useful for calculating the replication delay. The past discussion is http://www.postgresql.org/message-id/cahgqgwf3zjfunej5ka683ku5rqubtswtqfq7g1x0g34o+jx...@mail.gmail.com I looked into the thread briefly and found Simon and Robert gave -1 for this because of a performance concern. I'm not sure if it's an actual performance penalty or not. Maybe we need to measure the penalty? I think that the performance penalty is negligibly small because the patch I posted before added only three stores to shared memory per commit/abort. Uh. It adds another atomic operation (the spinlock) to the commit path. That's surely *not* insignificant. At the very least the concurrency approach needs to be rethought. No, the patch doesn't add a spinlock at all. What the commit path additionally does is: (1) increment the counter in shared memory, (2) set the timestamp of the last commit record in shared memory, (3) increment the counter in shared memory again. There is no extra spinlock. Ah, I see. There's another patch somewhere down that thread (cahgqgwg4xfzjfyzabn5v__d3qpynnsgbph3nar6p40elivk...@mail.gmail.com). The patch in the message you linked to *does* use a spinlock though. 
OTOH, when pg_last_xact_insert_timestamp reads the timestamp from the shared memory, it checks whether the counter values are the same or not before and after reading the timestamp. If they are not the same, it tries to read the timestamp again. This logic is necessary for reading a consistent timestamp value there. Yea, that approach then just touches a cacheline that should already be local. I doubt that the implementation is correct on some more lenient platforms (missing write memory barrier), but that's not your fault. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Support for N synchronous standby servers
On Mon, Aug 11, 2014 at 4:26 PM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Aug 11, 2014 at 2:10 PM, Michael Paquier michael.paqu...@gmail.com wrote: Oh, that worked in my machine, too, this time... I did something wrong. Sorry for the noise. No problem, thanks for spending time testing. -- Michael
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote:

*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** 1066,1071 **** RecordTransactionCommit(void)
--- 1066,1074 ----
  		(void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata);
  	}
+
+ 	/* Save timestamp of latest transaction commit record */
+ 	pgstat_report_xact_end_timestamp(xactStopTimestamp);
  }

Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] nulls in GIN index
Perhaps I'm missing something. The table has 2 columns, text and ftstext. text: how are you ftstext: (nothing) Because how, are, and you are too common to be tsvectored. Which is fine. So if a user searches for how are you: select * from tbl_lines WHERE ftstext @@ plainto_tsquery('English', 'how are you') returns nothing. Which I somewhat understand, but I want it to return all the rows with nothing in the ftstext. plainto_tsquery('English', 'how are you') = '' and the ftstext of some rows is also = ''. So why doesn't the index return all these rows when a null string is searched? I think you can see what I'm trying to achieve; how do I do it?
Re: [HACKERS] [REVIEW] pg_last_xact_insert_timestamp
On Mon, Aug 11, 2014 at 4:46 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, On 2011-10-04 20:52:59 +0900, Fujii Masao wrote: *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** *** 1066,1071 RecordTransactionCommit(void) --- 1066,1074 (void) XLogInsert(RM_XACT_ID, XLOG_XACT_COMMIT_COMPACT, rdata); } + + /* Save timestamp of latest transaction commit record */ + pgstat_report_xact_end_timestamp(xactStopTimestamp); } Perhaps that pgstat_report() should instead be combined with the pgstat_report_xact_timestamp(0) in CommitTransaction()? Then the number of changecount increases and cacheline references would stay the same. The only thing that'd change would be a single additional assignment. Sounds like a good suggestion. While reading the patch again, I found it didn't handle the COMMIT/ABORT PREPARED case properly. According to commit e74e090, pg_last_xact_replay_timestamp() now returns the timestamp of COMMIT/ABORT PREPARED. pg_last_xact_insert_timestamp() is mainly expected to be used to calculate the replication delay, so it also needs to return that timestamp. But the patch didn't change the 2PC code at all. We need to add pgstat_report_xact_end_timestamp() into FinishPreparedTransaction(), RecordTransactionCommitPrepared() or RecordTransactionAbortPrepared(). Regards, -- Fujii Masao
Re: [HACKERS] psql: show only failed queries
On Sat, Aug 9, 2014 at 4:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2014-08-08 13:58 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Fri, Aug 8, 2014 at 4:31 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-08-07 7:10 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello, updated version of the patch in attachment Thanks! But ISTM you forgot to attach the patch. grr .. I am sorry No problem. Thanks for the patch! Attached is the revised version of the patch. +/* all psql known variables are included in list by default */ +for (known_varname = known_varnames; *known_varname; known_varname++) +varnames[nvars++] = pg_strdup(*known_varname); Don't we need to append both prefix and suffix to even known variables? ??? I am not sure if I understand well - probably system read-only variables such as DBNAME, USER, VERSION should not be there I had that question because complete_from_variables() is also called by the tab-completion of \echo : and it shows a half-baked variables list. So I thought probably we need to change complete_from_variables() more to fix the problem. I understand now. I fixed it. I have a question. \echo probably should not show empty known variables. The data for autocomplete for \echo should be the same as the result of \set Agreed. I think that only the variables having set values should be displayed in the \echo : case. So I modified complete_from_variables() so that the unset variables are not shown in that case but all the variables are shown in the tab-completion of \set. +else if (strcmp(prev2_wd, \\set) == 0) +{ +if (strcmp(prev_wd, AUTOCOMMIT) == 0) ISTM that some psql variables like IGNOREEOF are not there. Why not? yes, there is no completion for DBNAME, ENCODING, FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREEOFF, PROMPT*, USER, VERSION There are more reasons: * parameter is not an enum (string, number or both): FETCH_COUNT, PROMPT, HISTSIZE, .. 
* variable is pseudo read-only - changing it has no effect: DBNAME, ENCODING, VERSION So HISTCONTROL should be there because it doesn't have such reasons at all? yes ISTM that you forgot to add HISTCONTROL to your patch. So I just added that. I added the tab-completion for the \unset command. Like \echo :, only the variables having set values should be displayed in the \unset case. perfect I changed complete_from_variables() so that it checks the memory size of the variable array even when appending the known variables. If the memory size is not enough, it's dynamically increased. Practically this check would not be required because the initial size of the array is sufficiently larger than the number of the known variables. I added this as a safe-guard. Typo: IGNOREEOFF - IGNOREEOF I removed the value none from the value list of ECHO because it's not documented and a user might get confused when he or she sees the undocumented value none. Thought? isn't it better to fix the doc? I don't know any reason why we should not support none I'm OK with this. The attached patch adds the support of the none value both in the ECHO and HISTCONTROL variables (because HISTCONTROL had the same problem as ECHO had), and also adds the description of that value into the document. I looked at the code; you removed a check against duplicate varnames in the list. Is it ok? Oh, just revived that code. Regards, -- Fujii Masao *** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *** *** 2827,2833 bar they are sent to the server. The switch for this is option-e/option. If set to literalerrors/literal then only failed queries are displayed on standard error output. The switch ! for this is option-b/option. /para /listitem /varlistentry --- 2827,2835 they are sent to the server. The switch for this is option-e/option. If set to literalerrors/literal then only failed queries are displayed on standard error output. The switch ! for this is option-b/option. If unset, or if set to ! 
literalnone/literal (or any other value than those above) then ! no queries are displayed. /para /listitem /varlistentry *** *** 2892,2899 bar list. If set to a value of literalignoredups/literal, lines matching the previous history line are not entered. A value of literalignoreboth/literal combines the two options. If ! unset, or if set to any other value than those above, all lines ! read in interactive mode are saved on the history list. /para note para --- 2894,2902 list. If set to a
[HACKERS] ProcessUtilityHook DropStmt RenameStmt
Hi! I try to catch a DropStmt and convert it to a RenameStmt, like petere's pg_trashcan, but I don't want to create a new schema. I only want to rename the table in case of a DROP TABLE query. Is this possible with something like:

ProcessUtility(Node *parsetree, const char *queryString, ParamListInfo params, bool isTopLevel, DestReceiver *dest, char *completionTag)
{
    if (nodeTag(parsetree) == T_DropStmt)
    {
        DropStmt *stmt = (DropStmt *) parsetree;

        if (stmt->removeType == OBJECT_TABLE)
        {
            RenameStmt *newstmt = makeNode(RenameStmt);

            newstmt->objectType = stmt->removeType;
            newstmt->newname = new_name;
            parsetree = (Node *) newstmt;
        }
        (*prev_ProcessUtility) (parsetree, queryString, context, params, dest, completionTag);
    }
}

regards ge0has
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
Stephen Frost sfr...@snowman.net wrote: Our grace period for active backends after unclean exit of one of their peers is low, milliseconds to seconds. Our grace period for active backends after unclean exit of the postmaster is unconstrained. At least one of those policies has to be wrong. Like Andres and Robert, I pick the second one. Ditto for me. +1 In fact, I would say that is slightly understated. The grace period for active backends after unclean exit of one of their peers is low, milliseconds to seconds, *unless the postmaster has also crashed* -- in which case it is unconstrained. Why is the crash of a backend less serious if the postmaster has also crashed? Certainly it can't be considered surprising that, if the postmaster is crashing, other backends might also be crashing around the same time? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] nulls in GIN index
On Monday 11 August 2014 at 11:17:56, worthy7 worthy@gmail.com wrote: Perhaps I'm missing something. The table has 2 columns, text and ftstext. text: how are you ftstext: (nothing) Because how, are, and you are too common to be tsvectored. Which is fine. So if a user searches for how are you: select * from tbl_lines WHERE ftstext @@ plainto_tsquery('English', 'how are you') returns nothing. Which I somewhat understand, but I want it to return all the rows with nothing in the ftstext. plainto_tsquery('English', 'how are you') = '' and the ftstext of some rows is also = ''. So why doesn't the index return all these rows when a null string is searched? I think you can see what I'm trying to achieve; how do I do it? Use the 'simple' dictionary: my_fts_column @@ to_tsquery('simple', 'how are you') -- Andreas Joseph Krogh CTO / Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com www.visena.com
Re: [HACKERS] Hokey wrong versions of libpq in apt.postgresql.org
JD, * Joshua D. Drake (j...@commandprompt.com) wrote: The issue that I specifically ran into is that by using apt.postgresql.org in its default configuration, I can't add certain extensions (without jumping through hoops). Simple: Assume a running 9.2.9 from apt.postgresql.org apt-get install pgxnclient sudo pgxnclient install pg_repack Doesn't work. Because it is looking for libs in /var/lib/postgresql/9.2 not /var/lib/postgresql/9.3. Have you got any postgresql-server-dev package installed? If not, then pg_config is going to grab the info for the libpq-dev that's installed, but I doubt the extension is going to compile without the server-dev package installed anyway.. In any case, pgxnclient should probably be more intelligent when it's working under a Debian-based installation where multiple major versions of PG can be installed. Yes. I can get the 9.2 libpq but that isn't really the point is it? This is quite unexpected behavior from an operational perspective. It should just work but it doesn't because we are shipping from apt.postgresql.org a 9.3 version of libpq. I don't believe the 9.3 version of libpq is the issue here at all, see above.. Thanks, Stephen
Re: [HACKERS] Proposal: Incremental Backup
On Tue, Aug 5, 2014 at 8:04 PM, Simon Riggs si...@2ndquadrant.com wrote: To decide whether we need to re-copy the file, you read the file until we find a block with a later LSN. If we read the whole file without finding a later LSN then we don't need to re-copy. That means we read each file twice, which is slower, but the file is at most 1GB in size, which we can assume will be mostly in memory for the second read. That seems reasonable, although copying only the changed blocks doesn't seem like it would be a whole lot harder. Yes, you'd need a tool to copy those blocks back into the places where they need to go, but that's probably not a lot of work and the disk savings, in many cases, would be enormous. As Marco says, that can be optimized using filesystem timestamps instead. The idea of using filesystem timestamps gives me the creeps. Those aren't always very granular, and I don't know that (for example) they are crash-safe. Does every filesystem on every platform make sure that the mtime update hits the disk before the data? What about clock changes made manually by users, or automatically by ntpd? I recognize that there are people doing this today, because it's what we have, and it must not suck too much, because people are still doing it ... but I worry that if we do it this way, we'll end up with people saying PostgreSQL corrupted my data and will have no way of tracking the problem back to the filesystem or system clock event that was the true cause of the problem, so they'll just blame the database. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
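The read-until-a-newer-LSN check Simon describes could look roughly like the sketch below. It assumes, as PostgreSQL's page layout does, that each 8 KB page begins with its LSN (two 32-bit halves); the helper name and the flat uint64 LSN representation are illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLCKSZ 8192

/*
 * Scan a relation segment page by page and return 1 as soon as any
 * page carries an LSN newer than `since` (the file must be re-copied),
 * or 0 if every page is older (the file can be skipped).  The LSN is
 * read from the first 8 bytes of each page, high half then low half,
 * mirroring PageHeaderData's pd_lsn.
 */
static int
segment_needs_copy(FILE *fp, uint64_t since)
{
	char		page[BLCKSZ];

	while (fread(page, 1, BLCKSZ, fp) == BLCKSZ)
	{
		uint32_t	hi, lo;
		uint64_t	lsn;

		memcpy(&hi, page, 4);
		memcpy(&lo, page + 4, 4);
		lsn = ((uint64_t) hi << 32) | lo;

		if (lsn > since)
			return 1;		/* found a newer block: must re-copy */
	}
	return 0;				/* every block predates the backup: skip */
}
```

The early return means an actively updated segment is usually detected within its first few pages, so the "read each file twice" cost falls mainly on segments that turn out to be unchanged.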
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On 2014-08-10 18:36:18 -0400, Noah Misch wrote: [Due for a new subject line?] On Sat, Aug 09, 2014 at 08:16:01PM +0200, Andres Freund wrote: On 2014-08-09 14:09:36 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-08-09 14:00:49 -0400, Tom Lane wrote: I don't think it's anywhere near as black-and-white as you guys claim. What it comes down to is whether allowing existing transactions/sessions to finish is more important than allowing new sessions to start. Depending on the application, either could be more important. Nah. The current behaviour circumvents security measures we normally consider absolutely essential. If the postmaster died some bad shit went on. The likelihood of hitting corner case bugs where it's important that we react to a segfault/panic with a restart/crash replay is rather high. What's your point? Once a new postmaster starts, it *will* do a crash restart, because certainly no shutdown checkpoint ever happened. That's not saying much. For one, there can be online checkpoints in that time. So it's certainly not guaranteed (or even all that likely) that all the WAL since the incident is replayed. For another, it can be *hours* before all the backends finish. IIRC we'll continue to happily write WAL and everything after postmaster (and possibly some backends, corrupting shmem) have crashed. The bgwriter, checkpointer, backends will continue to write dirty buffers to disk. We'll IIRC continue to write checkpoints. Those are simply not things we should be doing after the postmaster has crashed, if we can avoid it at all. The basic support processes, including the checkpointer, exit promptly upon detecting a postmaster exit. Checkpoints cease. Only after finishing an 'in process' checkpoint though afaics. And only if no new checkpoint has been requested since. The latter because we don't even test for postmaster death if a latch has been set... I think it's similar for the bgwriter and such. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL
On Sat, Aug 9, 2014 at 2:00 PM, Tom Lane t...@sss.pgh.pa.us wrote: +1. I think the current behaviour is a seriously bad idea. I don't think it's anywhere near as black-and-white as you guys claim. What it comes down to is whether allowing existing transactions/sessions to finish is more important than allowing new sessions to start. Depending on the application, either could be more important. It's partly about that, and I think the answer is that being able to start new sessions is almost always more important; but it's also about the fact that the postmaster provides essential protections against data corruption, and running without those protections is a bad idea. If it's not a bad idea, then why do we need those protections ever? Why have we put so much effort into bullet-proofing them over the years? I mean, we could simply regard the unexpected end of a backend as being something that is probably OK and we'd usually be right; after all, a backend wouldn't crap out without releasing a critical spinlock very often. A lot of users would probably be very happy to be liberated from the tyranny of a server-wide restart every time a backend crashes, and 90% of the time nothing bad would happen. But clearly this is insanity, because every now and then something would go terribly wrong and there would be no automated way for the system to recover, and on even rarer occasions your data would get eaten. That is why it is right to think that the service provided by the postmaster is essential, not nice-to-have. Ideally we'd have some way to configure the behavior appropriately for a given installation; but short of that, it's unclear to me that unilaterally changing the system's bias is something our users would thank us for. I've not noticed a large groundswell of complaints about it (though this may just reflect that we've made the postmaster pretty darn robust, so that the case seldom comes up). I do think that's a large part of it. 
The postmaster doesn't get killed very often, and when it does, things are often messed up to a degree where the user's just going to reboot anyway. But I've encountered customers who managed to corrupt their database because backends didn't exit when the postmaster died, because it turns out that removing postmaster.pid defeats the shared memory interlocks that normally prevent starting a new postmaster, and the customer did that. And I've personally experienced at least one protracted outage that resulted from orphaned backends preventing 'pg_ctl restart' from working. If the postmaster weren't so reliable, I'm sure these kinds of problems would be a lot more common. But the fact that they're uncommon doesn't mean that the current behavior is the best one, and I'm convinced that it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks
Hi! So after 83 days, the regression tests on barnacle completed, and it smells like another memory leak in CacheMemoryContext, similar to those fixed in 078b2ed on May 18. Barnacle is one of those machines with -DCLOBBER_CACHE_RECURSIVELY, and the tests were running with a snapshot from May 19, so the memory leaks detected with only -DCLOBBER_CACHE_ALWAYS should be fixed there. I have a script in place looking for excessive memory usage of postgres backends - whenever a backend allocates more than 512MB of VSS, it does MemoryContextStats(TopMemoryContext) on it (from gdb). The results get logged into the postmaster log, and thus get to the buildfarm. See this: http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=barnacle&dt=2014-05-19%2011%3A25%3A03 Essentially, it looks like this: TopMemoryContext: 69984 total in 10 blocks; 6216 free (24 chunks); 63768 used TopTransactionContext: 8192 total in 1 blocks; 5768 free (23 chunks); 2424 used ... Relcache by OID: 8192 total in 1 blocks; 592 free (0 chunks); 7600 used CacheMemoryContext: 301981696 total in 45 blocks; 3701744 free (140 chunks); 298279952 used ... or like this: CacheMemoryContext: 1602215936 total in 200 blocks; 497336 free (138 chunks); 1601718600 used So probably another low-probability memory leak, happening apparently only in CLOBBER_CACHE_RECURSIVELY. regards Tomas
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied the code fragment from 9.5devel to 9.2.9. The attached patch is for 9.2.9. I didn't check 9.4 and other versions. Why wasn't the fix applied to 9.2? It was considered a performance improvement - i.e. a feature - rather than a bug fix, so it was only added to master. That was after the release of 9.2 and before the release of 9.3, so it's in newer branches but not older ones. Author: Heikki Linnakangas heikki.linnakan...@iki.fi Branch: master Release: REL9_3_BR [c9d7dbacd] 2013-01-29 10:43:33 +0200 Skip truncating ON COMMIT DELETE ROWS temp tables, if the transaction hasn't touched any temporary tables. We could try harder, and keep track of whether we've inserted to any temp tables, rather than accessed them, and which temp tables have been inserted to. But this is dead simple, and already covers many interesting scenarios. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Proposal: Incremental Backup
On Mon, Aug 11, 2014 at 12:27 PM, Robert Haas robertmh...@gmail.com wrote: As Marco says, that can be optimized using filesystem timestamps instead. The idea of using filesystem timestamps gives me the creeps. Those aren't always very granular, and I don't know that (for example) they are crash-safe. Does every filesystem on every platform make sure that the mtime update hits the disk before the data? What about clock changes made manually by users, or automatically by ntpd? I recognize that there are people doing this today, because it's what we have, and it must not suck too much, because people are still doing it ... but I worry that if we do it this way, we'll end up with people saying "PostgreSQL corrupted my data" and will have no way of tracking the problem back to the filesystem or system clock event that was the true cause of the problem, so they'll just blame the database. I have the same creeps. I only do it on a live system, after a first full rsync, where mtime persistence is not an issue, and where I know ntp updates have not happened. I had a problem once where a differential rsync with timestamps didn't work as expected, and corrupted a slave. It was a test system so I didn't care much at the time, but if it were a backup, I'd be quite pissed. Basically, mtimes aren't trustworthy across reboots. Granted, this was a very old system, debian 5 when it was new, IIRC, so it may be better now. But it does illustrate just how bad things can get when one trusts timestamps. This case was an old out-of-sync slave on a test setup that got de-synchronized, and I tried to re-synchronize it with a delta rsync to avoid the hours it would take to actually compare everything (about a day). One segment that was modified after the sync loss was not transferred, causing trouble at the slave, so I was forced to re-synchronize with a full rsync (delta, but without timestamps). 
This was either before pg_basebackup or before I heard of it ;-), but in any case, if it happened on a test system with little activity, you can be certain it can happen on a production system. So I now only trust mtime when there has been neither a reboot nor an ntpd running since the last mtime-less rsync. In those cases, the optimization works and helps a lot. But I doubt you'll take many incremental backups matching those conditions. Say what you will of anecdotal evidence, but the issue is quite clear theoretically as well: modifications to file segments that aren't reflected within mtime granularity. There are many reasons why mtime could lose precision. Being an old filesystem with second-precision timestamps is just one, but not the only one. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
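[Editor's note: the failure mode above is easy to model. The sketch below is a toy, not rsync's actual algorithm: a quick-check comparing (size, mtime), with mtimes truncated to the filesystem's granularity, silently misses a same-size rewrite that lands in the same timestamp tick.]

```python
# Toy model of a timestamp-based "has this file changed?" test. On a
# filesystem with coarse (e.g. 1-second) mtime granularity, a same-size
# rewrite within the same tick is indistinguishable from "unchanged".
def same_by_stat(size_a, mtime_a, size_b, mtime_b, granularity=1.0):
    """True if a quick-check using (size, truncated mtime) sees no change."""
    tick_a = int(mtime_a / granularity)
    tick_b = int(mtime_b / granularity)
    return size_a == size_b and tick_a == tick_b
```

With 1-second granularity, a 16MB WAL segment rewritten 0.7s after the last comparison point looks unchanged - exactly the untransferred-segment scenario described above. Comparing content (rsync --checksum) avoids this at the cost of reading everything.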
Re: [HACKERS] replication commands and log_statements
On Fri, Aug 8, 2014 at 1:04 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Aug 9, 2014 at 12:29 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:00 AM, Bruce Momjian br...@momjian.us wrote: On Fri, Aug 8, 2014 at 08:51:13AM -0400, Robert Haas wrote: On Thu, Aug 7, 2014 at 12:07 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-08-07 23:22:43 +0900, masao.fu...@gmail.com wrote: That is, we log replication commands only when log_statement is set to all. Neither a new parameter is introduced nor is log_statement redefined as a list. That sounds good to me. It sounds fairly unprincipled to me. I liked the idea of making log_statement a list, but if we aren't gonna do that, I think this should be a separate parameter. I am unclear whether there is enough demand for a separate replication logging parameter --- using log_statement=all made sense to me. Most people don't want to turn on log_statement=all because it produces too much log volume. See, for example: http://bonesmoses.org/2014/08/05/on-postgresql-logging-verbosity/ But logging replication commands is quite low-volume, so it is not hard to imagine someone wanting to log all replication commands but not all SQL statements. You can do that by executing ALTER ROLE <replication user> SET log_statement TO 'all'. If you don't use the replication user to execute SQL statements, no SQL statements are logged in that setting. If you have a user devoted to it, I suppose that's true. I still think it shouldn't get munged together like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql.auto.conf and reload
On Sun, Aug 10, 2014 at 3:54 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:41 AM, Fujii Masao masao.fu...@gmail.com wrote: Yep, right. ParseConfigFp() is not a good place to pick up data_directory. What about the attached patch which makes ProcessConfigFile() instead of ParseConfigFp() pick up data_directory just after the configuration file is parsed? I think this is a better way to handle it. A few comments about the patch: Thanks for the review! Attached is the updated version of the patch.

1. Can't we directly have the code by having else in the below loop?

	if (DataDir &&
		!ParseConfigFile(PG_AUTOCONF_FILENAME, NULL, false, 0, elevel,
						 &head, &tail))

Done.

2.
+	if (DataDir == NULL)
+	{
+		ConfigVariable *prev = NULL;
+		for (item = head; item;)
+		{
..
..
+	}

If data_directory is not present in the list, then can't we directly return and leave the other work in function ProcessConfigFile() for the second pass? Done.

3. I think the comments can be improved:

a.
+	  In this case,
+	 * we should not pick up all the settings except the data_directory
+	 * from postgresql.conf yet because they might be overwritten
+	 * with the settings in PG_AUTOCONF_FILENAME file which will be
+	 * read later.

It would be better if we change it as below: In this case, we shouldn't pick any settings except the data_directory from postgresql.conf because they might be overwritten with the settings in PG_AUTOCONF_FILENAME file which will be read later. Done.

b.
+	   Now only the data_directory needs to be picked up to
+	 * read PG_AUTOCONF_FILENAME file later.

This part of the comment seems to be repetitive, as you already mentioned some part of it in the first line as well as at one other location. Okay, I just removed that line.

While updating the patch, I found that the ConfigVariable removed from the list has fields whose memory was already allocated but never freed. So I included the code to free them in the patch. 
Regards,

-- Fujii Masao

*** a/src/backend/utils/misc/guc-file.l
--- b/src/backend/utils/misc/guc-file.l
***************
*** 45,50 ****
  static unsigned int ConfigFileLineno;
--- 45,52 ----
  static const char *GUC_flex_fatal_errmsg;
  static sigjmp_buf *GUC_flex_fatal_jmp;
+ static void FreeConfigVariable(ConfigVariable *item);
+
  /* flex fails to supply a prototype for yylex, so provide one */
  int GUC_yylex(void);
*************** ProcessConfigFile(GucContext context)
*** 151,164 ****
  	 * file is in the data directory, we can't read it until the DataDir has
  	 * been set.
  	 */
! 	if (DataDir &&
! 		!ParseConfigFile(PG_AUTOCONF_FILENAME, NULL, false, 0, elevel,
! 						 &head, &tail))
  	{
! 		/* Syntax error(s) detected in the file, so bail out */
! 		error = true;
! 		ErrorConfFile = PG_AUTOCONF_FILENAME;
! 		goto cleanup_list;
  	}

  	/*
--- 153,218 ----
  	 * file is in the data directory, we can't read it until the DataDir has
  	 * been set.
  	 */
! 	if (DataDir)
  	{
! 		if (!ParseConfigFile(PG_AUTOCONF_FILENAME, NULL, false, 0, elevel,
! 							 &head, &tail))
! 		{
! 			/* Syntax error(s) detected in the file, so bail out */
! 			error = true;
! 			ErrorConfFile = PG_AUTOCONF_FILENAME;
! 			goto cleanup_list;
! 		}
! 	}
! 	/*
! 	 * Pick up only the data_directory if DataDir is not set, which
! 	 * means that the configuration file is read for the first time and
! 	 * PG_AUTOCONF_FILENAME file cannot be read yet. In this case,
! 	 * we shouldn't pick any settings except the data_directory
! 	 * from postgresql.conf because they might be overwritten
! 	 * with the settings in PG_AUTOCONF_FILENAME file which will be
! 	 * read later. OTOH, since it's ensured that data_directory doesn't
! 	 * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten
! 	 * later.
! 	 */
! 	else
! 	{
! 		ConfigVariable *prev = NULL;
!
! 		for (item = head; item;)
! 		{
! 			ConfigVariable *ptr = item;
!
! 			item = item->next;
! 			if (strcmp(ptr->name, "data_directory") != 0)
! 			{
! 				if (prev == NULL)
! 					head = ptr->next;
! 				else
! 				{
! 					prev->next = ptr->next;
! 					/*
! 					 * On removing last item in list, we need to update tail
! 					 * to ensure that list will be maintained.
! 					 */
! 					if (prev->next == NULL)
! 						tail = prev;
! 				}
! 				FreeConfigVariable(ptr);
! 			}
! 			else
! 				prev = ptr;
! 		}
!
! 		/*
! 		 * Quick exit if data_directory is not present in list.
! 		 *
! 		 * Don't remember when we last successfully loaded the config file in
! 		 * this case because that time will be set soon by subsequent load of
! 		 * the config file.
! 		 */
! 		if (head == NULL)
! 			return;
  	}

  	/*
*************** ParseConfigFp(FILE *fp, const char *config_file, int depth, int elevel,
*** 677,683 ****
  				*tail_p = prev_item;
  			}
! 			pfree(cur_item);
  			break;
  		}
  	}
--- 731,737 ----
  				*tail_p = prev_item;
  			}
! 			FreeConfigVariable(cur_item);
  			break;
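[Editor's note: the fiddly part of the patch is the list surgery in the else branch - unlink everything except the data_directory item while keeping head and tail consistent. A Python rendering of the same pointer handling, purely as an illustration (not the C code):]

```python
# Sketch of the patch's singly-linked-list filtering: drop every node
# whose name != keep_name, freeing is implicit in Python; the interesting
# part is keeping head and tail correct while unlinking.
class ConfigVariable:
    def __init__(self, name, value):
        self.name, self.value, self.next = name, value, None

def keep_only(head, tail, keep_name):
    """Unlink every node whose name != keep_name; return (head, tail)."""
    prev = None
    item = head
    while item is not None:
        ptr, item = item, item.next      # save node, advance cursor
        if ptr.name != keep_name:
            if prev is None:
                head = ptr.next          # removing the current head
            else:
                prev.next = ptr.next
                if prev.next is None:    # removed the last item: fix tail
                    tail = prev
        else:
            prev = ptr
    if head is None:                     # nothing survived
        tail = None
    return head, tail
```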
Re: [HACKERS] psql output change in 9.4
On Fri, Aug 8, 2014 at 9:34 PM, Peter Eisentraut pete...@gmx.net wrote: This is 9.3:

peter=# \a
Output format is unaligned.
peter=# \a
Output format is aligned.
peter=# \x
Expanded display is on.
peter=# \x
Expanded display is off.

This is new in 9.4:

peter=# \a
Output format (format) is unaligned.
peter=# \a
Output format (format) is aligned.
peter=# \x
Expanded display (expanded) is on.
peter=# \x
Expanded display (expanded) is off.

What is the point of that change? I suppose it is so that you can use \pset without arguments to show all settings:

peter=# \pset
Border style (border) is 1.
Target width (columns) unset.
Expanded display (expanded) is off.
...

But those are unrelated features, and the changed output doesn't make any sense in the contexts I show above. I think this should be reverted, and the \pset output should be implemented separately.

Yes, the \pset patch (commit c64e68fd9f1132fec563fb5de53dc3bcccb5fc3b) caused this behavior change. I can't remember whether I noticed it at the time and thought it was a reasonable change, or whether I didn't notice it when committing. Either way, clarifying the name of the parameter which is being displayed does not seem like a particularly bad idea to me even in the contexts you mention. I've certainly run commands like \a and \t and then said to myself, "crap, which pset parameter does this correspond to?". And there was no easy way to figure it out. I think the output could justly be criticized for making it insufficiently clear that the parenthesized text is, in fact, the name of the pset parameter. We could write something like: Border style (parameter border) is 1. But I don't know whether that would be considered an improvement or just extra verbosity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 pg_restore --help changes
On Fri, Aug 8, 2014 at 9:44 PM, Peter Eisentraut pete...@gmx.net wrote: This is a valuable feature change, but I think the help output is unhelpful. The left side has a singular placeholder, but the right side talks about objects in plural. How do I actually specify multiple indexes? Is it --index=foo,bar? It's --index=foo --index=bar, but that's unclear from the help. I think it would be clearer to write something like

  -I, --index=NAME    restore named indexes (repeat for multiple)

or perhaps make a note at the bottom: The options -I, -n, -P, -t, -T, --section can be combined and specified multiple times to select multiple objects. Other ideas? I like the note at the bottom idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql output change in 9.4
Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 9:34 PM, Peter Eisentraut pete...@gmx.net wrote: What is the point of that change? I think the output could justly be criticized for making it insufficiently clear that the parenthesized text is, in fact, the name of the pset parameter. Quite; that wasn't apparent to me either. We could write something like: Border style (parameter border) is 1. How about Border style (\pset border) is 1. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Tue, Jul 15, 2014 at 5:14 AM, Michael Paquier michael.paqu...@gmail.com wrote: On Tue, Jul 15, 2014 at 8:46 PM, Magnus Hagander mag...@hagander.net wrote: On Thu, Jun 19, 2014 at 4:13 PM, MauMau maumau...@gmail.com wrote: From: Michael Paquier michael.paqu...@gmail.com You are right, I only thought about the MS scripts when working on this patch. Updated patch for a complete cleanup is attached (note I updated manually ./configure for test purposes and did not re-run autoconf). I marked this patch as ready for committer. I confirmed: I'm pretty sure this is not ready for committer due to: * The binaries can be built with MSVC 2008 Express. I didn't try to build on MinGW. Somebody needs to test it on mingw first :) That should be an easy test for someone who has a mingw install around, so better leave it at needs review so more people see it and can run those tests. I vaguely recall that I tested as well with MinGW when writing v2 of this patch after MauMau reviewed v1. Btw, just in case, I have just gotten my MinGW box working a bit more and re-tested it now. And it worked :) Of course someone other than the patch author with a MinGW environment at hand is invited to test. I guess I should have done that. While trying to test more recent stuff against mingw, I kept running into a problem which bisected down to this (a16bac36eca8158cbf78987e953). I am following the instructions at https://wiki.postgresql.org/wiki/Building_With_MinGW, using the mingw64 set of instructions and of course the git instructions. Except that I am running as my own user, not creating a dummy account, and I am using local hardware, not an amazon rental. 
This is the warning/error I get:

auth.c: In function 'ClientAuthentication':
auth.c:458:6: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration]
auth.c:458:6: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
auth.c:458:6: warning: format '%s' expects argument of type 'char *', but argument 2 has type 'int' [-Wformat]
auth.c:476:6: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
auth.c:476:6: warning: format '%s' expects argument of type 'char *', but argument 2 has type 'int' [-Wformat]
auth.c: In function 'CheckRADIUSAuth':
auth.c:2282:3: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
hba.c: In function 'parse_hba_line':
hba.c:1060:5: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration]
hba.c:1060:5: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
hba.c:1131:6: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
hba.c: In function 'parse_hba_auth_opt':
hba.c:1607:4: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
pqcomm.c: In function 'StreamServerPort':
pqcomm.c:334:4: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration]
pqcomm.c:334:4: warning: format '%s' expects argument of type 'char *', but argument 4 has type 'int' [-Wformat]
pqcomm.c:338:4: warning: format '%s' expects argument of type 'char *', but argument 3 has type 'int' [-Wformat]
mingwcompat.c:60:1: warning: 'RegisterWaitForSingleObject' redeclared without dllimport attribute: previous dllimport ignored [-Wattributes]
pgstat.c: In function 'pgstat_init':
pgstat.c:353:3: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration]
pgstat.c:353:3: warning: format '%s' expects argument of type 'char *', but argument 2 has type 'int' [-Wformat]
postmaster.c: In function 'BackendInitialize':
postmaster.c:3938:3: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration]
postmaster.c:3938:3: warning: format '%s' expects argument of type 'char *', but argument 2 has type 'int' [-Wformat]
libpq/auth.o:auth.c:(.text+0x1949): undefined reference to `gai_strerror'
libpq/auth.o:auth.c:(.text+0x19c4): undefined reference to `gai_strerror'
libpq/auth.o:auth.c:(.text+0x1b1a): undefined reference to `gai_strerror'
libpq/auth.o:auth.c:(.text+0x1cb4): undefined reference to `gai_strerror'
libpq/auth.o:auth.c:(.text+0x1cdc): undefined reference to `gai_strerror'
libpq/hba.o:hba.c:(.text+0x1fa0): more undefined references to `gai_strerror' follow
collect2.exe: error: ld returned 1 exit status
make[2]: *** [postgres] Error 1
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Any suggestions on what to try? This patch doesn't seem to explicitly mention gai_strerror at all, so it must be some indirect effect. Cheers, Jeff
Re: [HACKERS] replication commands and log_statements
* Robert Haas (robertmh...@gmail.com) wrote: On Fri, Aug 8, 2014 at 1:04 PM, Fujii Masao masao.fu...@gmail.com wrote: You can do that by executing ALTER ROLE <replication user> SET log_statement TO 'all'. If you don't use the replication user to execute SQL statements, no SQL statements are logged in that setting. If you have a user devoted to it, I suppose that's true. I still think it shouldn't get munged together like that. Folks *should* have a dedicated replication user, imv. That said, I agree with Robert in that I don't particularly like this recommendation for how to enable logging of replication commands. For one thing, it means having to remember to set the per-role GUC for every replication user which is created, and that's the kind of trivially-missed step that can get people into trouble. Thanks, Stephen
Re: [HACKERS] psql output change in 9.4
2014-08-11 19:52 GMT+02:00 Tom Lane t...@sss.pgh.pa.us: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 9:34 PM, Peter Eisentraut pete...@gmx.net wrote: What is the point of that change? I think the output could justly be criticized for making it insufficiently clear that the parenthesized text is, in fact, the name of the pset parameter. Quite; that wasn't apparent to me either. We could write something like: Border style (parameter border) is 1. How about Border style (\pset border) is 1. +1 Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On Sat, Aug 9, 2014 at 9:13 AM, Tomas Vondra t...@fuzzy.cz wrote: Adding the least-significant bit does not work, we need to get back to adding the most-significant one. Not sure what's the least complex way to do that, though. I'm thinking about computing the nbuckets limit (how many buckets may fit into work_mem):

tupsize = HJTUPLE_OVERHEAD +
          MAXALIGN(sizeof(MinimalTupleData)) +
          MAXALIGN(tupwidth);

nbuckets_bits_max = my_log2(work_mem / tupsize)

and then start with nbatches from this bit, like this:

  31            23     max                      0
   |------------|------|----------------------|
    <- batches    free        buckets ->

That doesn't seem unreasonable provided there's enough bit space, but, as you point out, the day might not be far off when there isn't. It also strikes me that when there's only 1 batch, the set of bits that map onto the batch number is zero-width, and one zero-width bit range is as good as another. In other words, if you're only planning to do one batch, you can easily grow the number of buckets on the fly. Growing the number of buckets only becomes difficult once you have more than one batch. And I mention that because, in the scenario mentioned in your original post (a hash table with a small number of buckets (8192) containing a large number of tuples [~3.3M]), presumably what happens is you start out thinking you are going to have one batch with 8K buckets. Then, as more tuples continue to pour in, you see the load factor rising and respond by bumping up the size of the hash table. Now eventually you realize, gosh, this isn't even going to fit into work_mem, so you need to start batching it. But by that point you've presumably done as much as you're going to do in terms of increasing the number of buckets; after that, you're just going to add more batches as necessary. So not being able to further increase the number of buckets once the number of batches is no longer 1 wouldn't be a problem in that case. 
Now if you start out planning for multiple batches, and then you discover you'd like to keep the same number of batches but have more buckets per batch, that's gonna be hard. It's not impossible, because you could steal some of the unused high-order bits above the number of batches, and then concatenate them with the low-order bits that you already had, but that seems like kind of an ugly kludge, and AFAICS it only helps if the new tuples are narrower by enough to justify the cost of splitting all the buckets. I won't say that couldn't happen, but I think it would be better to keep that complexity out of the patch unless we're absolutely certain it's justified. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
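[Editor's note: the sizing arithmetic Tomas quoted upthread can be mirrored in a few lines for concreteness. This is a simulation for illustration only; the overhead constants below are stand-ins, not the exact values of HJTUPLE_OVERHEAD or sizeof(MinimalTupleData).]

```python
# Mirror of the quoted sizing arithmetic: how many bucket bits could fit
# in work_mem. Overhead constants are assumptions, not the real C values.
MAXIMUM_ALIGNOF = 8
HJTUPLE_OVERHEAD = 16       # stand-in value
MINIMAL_TUPLE_SIZE = 24     # stand-in for sizeof(MinimalTupleData)

def maxalign(n):
    """Round n up to the next multiple of MAXIMUM_ALIGNOF."""
    return (n + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1)

def my_log2(num):
    """Smallest i with 2^i >= num, i.e. ceil(log2(num)), as in PostgreSQL's my_log2."""
    i = 0
    while (1 << i) < num:
        i += 1
    return i

def nbuckets_bits_max(work_mem_bytes, tupwidth):
    tupsize = HJTUPLE_OVERHEAD + maxalign(MINIMAL_TUPLE_SIZE) + maxalign(tupwidth)
    return my_log2(work_mem_bytes // tupsize)
```

With these stand-in constants, 1MB of work_mem and 40-byte tuples give tupsize = 80 and at most 2^14 buckets; whatever high-order hash bits remain above that are what is available for the batch number.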
Re: [HACKERS] psql output change in 9.4
On Mon, Aug 11, 2014 at 1:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 9:34 PM, Peter Eisentraut pete...@gmx.net wrote: What is the point of that change? I think the output could justly be criticized for making it insufficiently clear that the parenthesized text is, in fact, the name of the pset parameter. Quite; that wasn't apparent to me either. We could write something like: Border style (parameter border) is 1. How about Border style (\pset border) is 1. That would look just fine as a response to \a or \x, but I'm not sure it would look as good as a response to \pset, which prints out that line for every parameter (why does every line say \pset when the command I just typed is \pset?). However, I can certainly live with it if others prefer that to what I suggested. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
On Sat, Aug 9, 2014 at 3:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kgri...@ymail.com writes: Stephen Frost sfr...@snowman.net writes: Trying to move the header to the end just for the sake of this doesn't strike me as a good solution as it'll make things quite a bit more complicated. Why is that? How much harder would it be to add a single offset field to the front to point to the part we're shifting to the end? It is not all that unusual to put a directory at the end, like in the .zip file format. Yeah, I was wondering that too. Arguably, directory-at-the-end would be easier to work with for on-the-fly creation, not that we do any such thing at the moment. I think the main thing that's bugging Stephen is that doing that just to make pglz_compress happy seems like a kluge (and I have to agree). Here's a possibly more concrete thing to think about: we may very well someday want to support JSONB object field or array element extraction without reading all blocks of a large toasted JSONB value, if the value is stored external without compression. We already went to the trouble of creating analogous logic for substring extraction from a long uncompressed text or bytea value, so I think this is a plausible future desire. With the current format you could imagine grabbing the first TOAST chunk, and then if you see the header is longer than that you can grab the remainder of the header without any wasted I/O, and for the array-subscripting case you'd now have enough info to fetch the element value from the body of the JSONB without any wasted I/O. With directory-at-the-end you'd have to read the first chunk just to get the directory pointer, and this would most likely not give you any of the directory proper; but at least you'd know exactly how big the directory is before you go to read it in. The former case is probably slightly better. 
However, if you're doing an object key lookup rather than an array element fetch, neither of these formats is really friendly at all, because each binary-search probe probably requires bringing in one or two toast chunks from the body of the JSONB value so you can look at the key text. I'm not sure if there's a way to redesign the format to make that less painful/expensive --- but certainly, having the key texts scattered through the JSONB value doesn't seem like a great thing from this standpoint. I think that's a good point. On the general topic, I don't think it's reasonable to imagine that we're going to come up with a single heuristic that works well for every kind of input data. What pglz is doing - assuming that if the beginning of the data is incompressible then the rest probably is too - is fundamentally reasonable, notwithstanding the fact that it doesn't happen to work out well for JSONB. We might be able to tinker with that general strategy in some way that seems to fix this case and doesn't appear to break others, but there's some risk in that, and there's no obvious reason in my mind why PGLZ should be required to fly blind. So I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
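[Editor's note: the wasted-I/O comparison can be made concrete with a little chunk arithmetic. A sketch, assuming the usual ~1996-byte TOAST chunk payload on 8KB pages - treat that number as an assumption here:]

```python
# Which TOAST chunks must be read to fetch bytes [offset, offset+length)
# of an uncompressed external datum, assuming a fixed chunk payload size.
TOAST_MAX_CHUNK_SIZE = 1996  # typical payload per chunk on 8KB pages (assumption)

def chunks_needed(offset, length, chunk_size=TOAST_MAX_CHUNK_SIZE):
    """Return the list of chunk sequence numbers covering the byte range."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))
```

A header confined to chunk 0 costs one probe before the element fetch; a directory-at-the-end layout instead costs an initial read near the final chunk(s) to find the directory, which is the trade-off described above.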
Re: [HACKERS] jsonb format is pessimal for toast compression
Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term goal, but not sure if it's sane to push into 9.4. What we could conceivably do now is (a) add a datatype OID argument to toast_compress_datum, and (b) hard-wire the selection of a different compression-parameters struct if it's JSONBOID. The actual fix would then be to increase the first_success_by field of this alternate struct. I had been worrying about API-instability risks associated with (a), but on reflection it seems unlikely that any third-party code calls toast_compress_datum directly, and anyway it's not something we'd be back-patching to before 9.4. The main objection to (b) is that it wouldn't help for domains over jsonb (and no, I don't want to insert a getBaseType call there to fix that). A longer-term solution would be to make this some sort of type property that domains could inherit, like typstorage is already. (Somebody suggested dealing with this by adding more typstorage values, but I don't find that attractive; typstorage is known in too many places.) We'd need some thought about exactly what we want to expose, since the specific knobs that pglz_compress has today aren't necessarily good for the long term. This is all kinda ugly really, but since I'm not hearing brilliant ideas for redesigning jsonb's storage format, maybe this is the best we can do for now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
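[Editor's note: the first_success_by idea is easy to model with a toy compressor. The sketch below uses zlib as a stand-in for pglz and illustrates only the heuristic, not the real algorithm: give up unless the first first_success_by bytes show any savings. A jsonb-shaped datum - a near-incompressible offsets header followed by highly repetitive content - defeats a small probe window.]

```python
# Toy model of a "first_success_by" heuristic, with zlib standing in for
# pglz: abort unless compressing the probe window saves anything.
import random
import zlib

def try_compress(data, first_success_by):
    probe = data[:first_success_by]
    if len(zlib.compress(probe)) >= len(probe):
        return None                       # probe saved nothing: store raw
    out = zlib.compress(data)
    return out if len(out) < len(data) else None

# jsonb-shaped test datum: ~1KB of near-random bytes (think: offset
# array), then a highly repetitive body.
random.seed(42)
header = bytes(random.randrange(256) for _ in range(1024))
datum = header + b'{"key": "value"}' * 512
```

With a 1KB probe window the datum is stored uncompressed even though most of it is trivially compressible; widening the window (the alternate compression-parameters struct proposed above) is exactly the knob that fixes it.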
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote: I think that's a good point. I think that there may be something to be said for the current layout. Having adjacent keys and values could take better advantage of CPU cache characteristics. I've heard of approaches to improving B-Tree locality that forced keys and values to be adjacent on individual B-Tree pages [1], for example. I've heard of this more than once. And FWIW, I believe based on earlier research of user requirements in this area that very large jsonb datums are not considered all that compelling. Document database systems have considerable limitations here. On the general topic, I don't think it's reasonable to imagine that we're going to come up with a single heuristic that works well for every kind of input data. What pglz is doing - assuming that if the beginning of the data is incompressible then the rest probably is too - is fundamentally reasonable, nonwithstanding the fact that it doesn't happen to work out well for JSONB. We might be able to tinker with that general strategy in some way that seems to fix this case and doesn't appear to break others, but there's some risk in that, and there's no obvious reason in my mind why PGLZ should be require to fly blind. So I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. If there is to be any effort to make jsonb a more effective target for compression, I imagine that that would have to target redundancy between JSON documents. With idiomatic usage, we can expect plenty of it. [1] http://www.vldb.org/conf/1999/P7.pdf , We also forced each key and child pointer to be adjacent to each other physically -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: show only failed queries
2014-08-11 14:59 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Sat, Aug 9, 2014 at 4:16 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hi 2014-08-08 13:58 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Fri, Aug 8, 2014 at 4:31 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2014-08-07 7:10 GMT+02:00 Fujii Masao masao.fu...@gmail.com: On Thu, Aug 7, 2014 at 6:26 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello, updated version of the patch in attachment. Thanks! But ISTM you forgot to attach the patch. grr .. I am sorry No problem. Thanks for the patch! Attached is the revised version of the patch.

+ /* all psql known variables are included in list by default */
+ for (known_varname = known_varnames; *known_varname; known_varname++)
+     varnames[nvars++] = pg_strdup(*known_varname);

Don't we need to append both prefix and suffix to even known variables? ??? I am not sure if I understand well - probably system read-only variables such as DBNAME, USER, VERSION should not be there I had that question because complete_from_variables() is also called by the tab-completion of \echo : and it shows a half-baked variables list. So I thought probably we need to change complete_from_variables() more to fix the problem. I understand now. I fixed it. I have a question. \echo probably should not show empty known variables. The data for autocomplete for \echo should be the same as the result of \set Agreed. I think that only the variables having set values should be displayed in the \echo : case. So I modified complete_from_variables() so that the unset variables are not shown in that case but all the variables are shown in the tab-completion of \set.

+ else if (strcmp(prev2_wd, "\\set") == 0)
+ {
+     if (strcmp(prev_wd, "AUTOCOMMIT") == 0)

ISTM that some psql variables like IGNOREEOF are not there. Why not? 
yes, there is no completion for DBNAME, ENCODING, FETCH_COUNT, HISTCONTROL, HISTFILE, HISTSIZE, HOST, IGNOREEOFF, PROMPT*, USER, VERSION There are more reasons: * parameter is not an enum (string, number or both): FETCH_COUNT, PROMPT, HISTSIZE, .. * variable is pseudo read-only - changing it has no effect: DBNAME, ENCODING, VERSION So HISTCONTROL should be there because it doesn't have such reasons at all? yes ISTM that you forgot to add HISTCONTROL to your patch. So I just added that. I added the tab-completion for the \unset command. Like \echo :, only the variables having set values should be displayed in the \unset case. perfect I changed complete_from_variables() so that it checks the memory size of the variable array even when appending the known variables. If the memory size is not enough, it's dynamically increased. Practically this check would not be required because the initial size of the array is sufficiently larger than the number of the known variables. I added this as a safeguard. Typo: IGNOREEOFF -> IGNOREEOF I removed the value none from the value list of ECHO because it's not documented and a user might get confused when he or she sees the undocumented value none. Thought? isn't it better to fix the doc? I don't know any reason why we should not support none I'm OK with this. The attached patch adds support of the none value both in the ECHO and HISTCONTROL variables (because HISTCONTROL had the same problem as ECHO had), and also adds the description of that value into the document. I looked at the code; you removed the check against duplicate varnames in the list. Is it ok? Oh, just revived that code. yes, It is looking well regards Pavel Regards, -- Fujii Masao
Re: [HACKERS] 9.4 pg_restore --help changes
Robert Haas wrote: On Fri, Aug 8, 2014 at 9:44 PM, Peter Eisentraut pete...@gmx.net wrote: or perhaps make a note at the bottom The options -I, -n, -P, -t, -T, --section can be combined and specified multiple times to select multiple objects. Other ideas? I like the note at the bottom idea. +1 -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
* Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: ... I think it would be a better idea to arrange some method by which JSONB (and perhaps other data types) can provide compression hints to pglz. I agree with that as a long-term goal, but not sure if it's sane to push into 9.4. Agreed. What we could conceivably do now is (a) add a datatype OID argument to toast_compress_datum, and (b) hard-wire the selection of a different compression-parameters struct if it's JSONBOID. The actual fix would then be to increase the first_success_by field of this alternate struct. Isn't the offset to compressible data variable, though, depending on the number of keys, etc? Would we be increasing first_success_by based off of some function which inspects the object? I had been worrying about API-instability risks associated with (a), but on reflection it seems unlikely that any third-party code calls toast_compress_datum directly, and anyway it's not something we'd be back-patching to before 9.4. Agreed. The main objection to (b) is that it wouldn't help for domains over jsonb (and no, I don't want to insert a getBaseType call there to fix that). While not ideal, that seems like an acceptable compromise for 9.4 to me. A longer-term solution would be to make this some sort of type property that domains could inherit, like typstorage is already. (Somebody suggested dealing with this by adding more typstorage values, but I don't find that attractive; typstorage is known in too many places.) I think that was me, and having it be something which domains can inherit makes sense. Would we be able to use this approach to introduce type- (and domains inherited from that type) specific compression algorithms, perhaps? 
Or even get to a point where we could have a chunk-based compression scheme for certain types of objects (such as JSONB) where we keep track of which keys exist at which points in the compressed object, allowing us to skip to the specific chunk which contains the requested key, similar to what we do with uncompressed data? We'd need some thought about exactly what we want to expose, since the specific knobs that pglz_compress has today aren't necessarily good for the long term. Agreed. This is all kinda ugly really, but since I'm not hearing brilliant ideas for redesigning jsonb's storage format, maybe this is the best we can do for now. This would certainly be an improvement over what's going on now, and I love the idea of possibly being able to expand this in the future to do more. What I'd hate to see is having all of this and only ever using it to say skip ahead another 1k for JSONB. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] jsonb format is pessimal for toast compression
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: What we could conceivably do now is (a) add a datatype OID argument to toast_compress_datum, and (b) hard-wire the selection of a different compression-parameters struct if it's JSONBOID. The actual fix would then be to increase the first_success_by field of this alternate struct. Isn't the offset to compressible data variable, though, depending on the number of keys, etc? Would we be increasing first_success_by based off of some function which inspects the object? Given that this is a short-term hack, I'd be satisfied with setting it to INT_MAX. If we got more ambitious, we could consider improving the cutoff logic so that it gives up at x% of the object or n bytes, whichever comes first; but I'd want to see some hard evidence that that was useful before adding any more cycles to pglz_compress. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb format is pessimal for toast compression
* Peter Geoghegan (p...@heroku.com) wrote: If there is to be any effort to make jsonb a more effective target for compression, I imagine that that would have to target redundancy between JSON documents. With idiomatic usage, we can expect plenty of it. While I certainly agree, that's a rather different animal to address and doesn't hold a lot of relevance to the current problem. Or, to put it another way, I don't think anyone is going to be surprised that two rows containing the same data (even if they're inserted in the same transaction and have the same visibility information) are compressed together in some fashion. We've got a clear example of someone, quite reasonably, expecting their JSONB object to be compressed using the normal TOAST mechanism, and we're failing to do that in cases where it's actually a win to do so. That's the focus of this discussion and what needs to be addressed before 9.4 goes out. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] jsonb format is pessimal for toast compression
On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost sfr...@snowman.net wrote: We've got a clear example of someone, quite reasonably, expecting their JSONB object to be compressed using the normal TOAST mechanism, and we're failing to do that in cases where it's actually a win to do so. That's the focus of this discussion and what needs to be addressed before 9.4 goes out. Sure. I'm not trying to minimize that. We should fix it, certainly. However, it does bear considering that JSON data, with each document stored in a row, is not an effective target for TOAST compression in general, even as text. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 logical replication - walsender keepalive replies
On 07/14/2014 01:19 PM, Steve Singer wrote: On 07/06/2014 10:11 AM, Andres Freund wrote: Hi Steve, Right. I thought about this for a while, and I think we should change two things. For one, don't request replies here. It's simply not needed, as this isn't dealing with timeouts. For another, don't just check ->flush < sentPtr but also ->write < sentPtr. The reason we're sending these feedback messages is to inform the 'logical standby' that there's been WAL activity which it can't see because they don't correspond to anything that's logically decoded (e.g. vacuum stuff). Would that suit your needs? Greetings, Yes I think that will work for me. I tested with the attached patch that I think does what you describe and it seems okay. Any feedback on this? Do we want that change for 9.4, or do we want something else? Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 logical replication - walsender keepalive replies
On 2014-08-11 17:22:27 -0400, Steve Singer wrote: On 07/14/2014 01:19 PM, Steve Singer wrote: On 07/06/2014 10:11 AM, Andres Freund wrote: Hi Steve, Right. I thought about this for a while, and I think we should change two things. For one, don't request replies here. It's simply not needed, as this isn't dealing with timeouts. For another, don't just check ->flush < sentPtr but also ->write < sentPtr. The reason we're sending these feedback messages is to inform the 'logical standby' that there's been WAL activity which it can't see because they don't correspond to anything that's logically decoded (e.g. vacuum stuff). Would that suit your needs? Greetings, Yes I think that will work for me. I tested with the attached patch that I think does what you describe and it seems okay. Any feedback on this? Do we want that change for 9.4, or do we want something else? I plan to test and apply it in the next few days. Digging myself out from under stuff from before my holiday right now... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bad estimation together with large work_mem generates terrible slow hash joins
On 11.8.2014 20:25, Robert Haas wrote: On Sat, Aug 9, 2014 at 9:13 AM, Tomas Vondra t...@fuzzy.cz wrote: Adding the least-significant bit does not work, we need to get back to adding the most-significant one. Not sure what's the least complex way to do that, though. I'm thinking about computing the nbuckets limit (how many buckets may fit into work_mem):

tupsize = HJTUPLE_OVERHEAD + MAXALIGN(sizeof(MinimalTupleData)) + MAXALIGN(tupwidth);
nbuckets_bits_max = my_log2(work_mem / tupsize)

and then start with nbatches from this bit, like this:

 31           23           max                0
  | <- batches |    free    |    buckets ->   |

That doesn't seem unreasonable provided there's enough bit space, but, as you point out, the day might not be far off when there isn't. Thinking about this a bit more, I believe the danger is not that imminent. 32 bits mean the Hash node should handle 2^32 tuples in total (possibly split into multiple batches). It used to be 2^32 buckets thanks to NTUP_PER_BUCKET=10, but the patch lowers this to 1 so it's tuples now. But while we're a bit closer to exhausting the 32 bits, I think it's not really that big an issue. Not only is hashing a table with ~4e9 rows going to be a hellish experience, but if we really want to support it, we should probably switch to 64-bit hashes. Because adding some checks is not going to help - it may detect an issue, but it won't fix it. And actually, even if the two bit ranges overlap, it does not mean the hashjoin will stop working. There'll be a dependency between batches and buckets, causing non-uniform usage of the buckets, but it won't blow up. So I don't think we need to worry about exhausting the 32 bits for now. It also strikes me that when there's only 1 batch, the set of bits that map onto the batch number is zero-width, and one zero-width bit range is as good as another. In other words, if you're only planning to do one batch, you can easily grow the number of buckets on the fly. 
Growing the number of buckets only becomes difficult once you have more than one batch. Yes, that's true. It's also roughly what the patch does IMHO. If you know you'll need more batches, you can start with the maximum number of buckets right away (because you expect there's more data than work_mem, so shoot for nbuckets = my_log2(work_mem / tuple_size) or something like that). It's also true that if you're starting with a single batch, you can do whatever you want with nbuckets until you need to do (nbatches *= 2). It's also true that once you're done with the first batch, all the other batches will be just fine with the number of buckets, because the batches will be about equally sized. But I think this is pretty much what the patch does, in the end. And I mention that because, in the scenario mentioned in your original post (a hash table with a small number of buckets (8192) containing a large number of tuples [~3.3M]), presumably what happens is you start out thinking you are going to have one batch with 8K buckets. Then, as more tuples continue to pour in, you see the load factor rising and respond by bumping up the size of the hash table. Now eventually you realize, gosh, this isn't even going to fit into work_mem, so you need to start batching it. But by that point you've presumably done as much as you're going to do in terms of increasing the number of buckets; after that, you're just going to add more batches as necessary. So not being able to further increase the number of buckets once the number of batches is no longer 1 wouldn't be a problem in that case. Now if you start out planning for multiple batches, and then you discover you'd like to keep the same number of batches but have more buckets per batch, that's gonna be hard. 
It's not impossible, because you could steal some of the unused high-order bits above the number of batches, and then concatenate them with the low-order bits that you already had, but that seems like kind of an ugly kludge, and AFAICS it only helps if the new tuples are narrower by enough to justify the cost of splitting all the buckets. I won't say that couldn't happen, but I think it would be better to keep that complexity out of the patch unless we're absolutely certain it's justified. I'm definitely in favor of keeping the patch as simple as possible. I was considering reversing the bits of the hash, because that's pretty much the simplest solution. But I think you're right it might actually work like this:

* Are more batches needed?
  (yes) => just use nbuckets = my_log2(work_mem / tuple_size)
  (no)  => go ahead, until processing all tuples or hitting work_mem
     (work_mem)   => meh, use the same nbuckets as above
     (all tuples) => compute optimal nbuckets /
Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Robert Haas robertmh...@gmail.com writes: On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote: I've tracked down the real root cause. The fix is very simple. Please check the attached one-liner patch. I'd support back-porting that commit to 9.1 and 9.2 as a fix for this problem. As the commit message says, it's dead simple. While I have no great objection to back-porting Heikki's patch, it seems like a very large stretch to call this a root-cause fix. At best it's band-aiding one symptom in a rather fragile way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks
Tomas Vondra t...@fuzzy.cz writes: So after 83 days, the regression tests on barnacle completed, Hah, that's perseverance! and it smells like another memory leak in CacheMemoryContext, similar to those fixed in 078b2ed on May 18. Ugh, will look. See this: http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=barnacle&dt=2014-05-19%2011%3A25%3A03 Initially I was more worried about the crashes, but it looks like that's probably just a side-effect of the leak: [537a053d.1f4b:2] LOG: server process (PID 32110) was terminated by signal 9: Killed [537a053d.1f4b:3] DETAIL: Failed process was running: select pg_get_viewdef('vv3', true); [537a053d.1f4b:4] LOG: terminating any other active server processes [537a053d.1f52:2] WARNING: terminating connection because of crash of another server process ... [537a053d.1f4b:7] LOG: server process (PID 3882) was terminated by signal 9: Killed [537a053d.1f4b:8] DETAIL: Failed process was running: SELECT gid FROM pg_prepared_xacts; [537a053d.1f4b:9] LOG: terminating any other active server processes ... Evidently the OOM killer is at large on this machine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] failures on barnacle (CLOBBER_CACHE_RECURSIVELY) because of memory leaks
On 12.8.2014 02:05, Tom Lane wrote: Evidently the OOM killer is at large on this machine. Yes. It's a machine with only 8GB of RAM, and there are 3 VMs (LXC containers), with 2GB of RAM each. That's not much, but while it's mostly out of necessity, it's apparently a good way to catch leaks. Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Mon, Aug 11, 2014 at 11:02:48AM -0700, Jeff Janes wrote: While trying to test more recent stuff against mingw, I kept running into a problem which bisected down to this (a16bac36eca8158cbf78987e953). This is the warning/error I get: auth.c: In function 'ClientAuthentication': auth.c:458:6: warning: implicit declaration of function 'gai_strerror' [-Wimplicit-function-declaration] libpq/auth.o:auth.c:(.text+0x1949): undefined reference to `gai_strerror' libpq/auth.o:auth.c:(.text+0x19c4): undefined reference to `gai_strerror' libpq/auth.o:auth.c:(.text+0x1b1a): undefined reference to `gai_strerror' libpq/auth.o:auth.c:(.text+0x1cb4): undefined reference to `gai_strerror' libpq/auth.o:auth.c:(.text+0x1cdc): undefined reference to `gai_strerror' Any suggestions on what to try? This patch doesn't seem to explicitly mention gai_strerror at all, so it must be some indirect effect. I, too, encountered that. The patch let configure detect HAVE_GETADDRINFO under MinGW-w64; passing ac_cv_func_getaddrinfo=no on the configure command line is a workaround. Under !HAVE_GETADDRINFO, configure arranges to build src/port/getaddrinfo.c, and src/include/getaddrinfo.h injects the replacement gai_strerror() prototype. That understandably doesn't happen in a HAVE_GETADDRINFO build, yet src/include/port/win32/sys/socket.h does the following unconditionally: /* * we can't use the windows gai_strerror{AW} functions because * they are defined inline in the MS header files. So we'll use our * own */ #undef gai_strerror Somehow or other, we must bring these parts into agreement. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of versioning on Windows, take two
On Sun, Aug 10, 2014 at 9:31 PM, MauMau maumau...@gmail.com wrote: From: Michael Paquier michael.paqu...@gmail.com LINK : fatal error LNK1104: cannot open file '.\release\postgres\src_timezone_win32ver.obj' Oh yes, right. I don't really know how I missed this error when testing v1. Adding an explicit call to RemoveFile('src\timezone\win32ver.rc') for project postgres calms down the build. Is the attached working for you? Regards, -- Michael From f88aa01e1f6f17d2b1a0cba05fb9efb8fe06e45f Mon Sep 17 00:00:00 2001 From: Michael Paquier mich...@otacoo.com Date: Tue, 12 Aug 2014 10:02:32 +0900 Subject: [PATCH] Versioning support on MinGW and MSVC for remaining exe and dll files Windows versioning is added for the following files: - regress.dll (MSVC only) - isolationtester.exe - pg_isolation_regress.exe - pg_regress.exe - pg_regress_ecpg.exe - zic.exe --- src/interfaces/ecpg/test/Makefile | 1 + src/test/isolation/Makefile | 7 +-- src/test/regress/GNUmakefile | 7 +-- src/timezone/Makefile | 5 - src/tools/msvc/Mkvcbuild.pm | 7 +++ src/tools/msvc/clean.bat | 3 +++ 6 files changed, 25 insertions(+), 5 deletions(-) diff --git a/src/interfaces/ecpg/test/Makefile b/src/interfaces/ecpg/test/Makefile index 56f6a17..a2e0a83 100644 --- a/src/interfaces/ecpg/test/Makefile +++ b/src/interfaces/ecpg/test/Makefile @@ -14,6 +14,7 @@ override CPPFLAGS := \ $(CPPFLAGS) PGFILEDESC = ECPG Test - regression tests for ECPG +PGAPPICON = win32 # where to find psql for testing an existing installation PSQLDIR = $(bindir) diff --git a/src/test/isolation/Makefile b/src/test/isolation/Makefile index 26bcf22..77bc43d 100644 --- a/src/test/isolation/Makefile +++ b/src/test/isolation/Makefile @@ -6,12 +6,15 @@ subdir = src/test/isolation top_builddir = ../../.. 
include $(top_builddir)/src/Makefile.global +PGFILEDESC = pg_isolation_tester/isolationtester - isolation regression tests +PGAPPICON = win32 + # where to find psql for testing an existing installation PSQLDIR = $(bindir) override CPPFLAGS := -I$(srcdir) -I$(libpq_srcdir) -I$(srcdir)/../regress $(CPPFLAGS) -OBJS = specparse.o isolationtester.o +OBJS = specparse.o isolationtester.o $(WIN32RES) all: isolationtester$(X) pg_isolation_regress$(X) @@ -21,7 +24,7 @@ submake-regress: pg_regress.o: | submake-regress rm -f $@ $(LN_S) $(top_builddir)/src/test/regress/pg_regress.o . -pg_isolation_regress$(X): isolation_main.o pg_regress.o +pg_isolation_regress$(X): isolation_main.o pg_regress.o $(WIN32RES) $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@ isolationtester$(X): $(OBJS) | submake-libpq submake-libpgport diff --git a/src/test/regress/GNUmakefile b/src/test/regress/GNUmakefile index b084e0a..64c9924 100644 --- a/src/test/regress/GNUmakefile +++ b/src/test/regress/GNUmakefile @@ -14,6 +14,9 @@ subdir = src/test/regress top_builddir = ../../.. 
include $(top_builddir)/src/Makefile.global +PGFILEDESC = pg_regress - regression tests +PGAPPICON = win32 + # file with extra config for temp build TEMP_CONF = ifdef TEMP_CONFIG @@ -43,7 +46,7 @@ EXTRADEFS = '-DHOST_TUPLE=$(host_tuple)' \ all: pg_regress$(X) -pg_regress$(X): pg_regress.o pg_regress_main.o | submake-libpgport +pg_regress$(X): pg_regress.o pg_regress_main.o $(WIN32RES) | submake-libpgport $(CC) $(CFLAGS) $^ $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@ # dependencies ensure that path changes propagate @@ -177,7 +180,7 @@ bigcheck: all tablespace-setup clean distclean maintainer-clean: clean-lib # things built by `all' target rm -f $(OBJS) refint$(DLSUFFIX) autoinc$(DLSUFFIX) dummy_seclabel$(DLSUFFIX) - rm -f pg_regress_main.o pg_regress.o pg_regress$(X) + rm -f pg_regress_main.o pg_regress.o pg_regress$(X) $(WIN32RES) # things created by various check targets rm -f $(output_files) $(input_files) rm -rf testtablespace diff --git a/src/timezone/Makefile b/src/timezone/Makefile index ef739e9..ab65d22 100644 --- a/src/timezone/Makefile +++ b/src/timezone/Makefile @@ -12,11 +12,14 @@ subdir = src/timezone top_builddir = ../.. 
include $(top_builddir)/src/Makefile.global +PGFILEDESC = timezone - timezone database +PGAPPICON = win32 + # files to build into backend OBJS= localtime.o strftime.o pgtz.o # files needed to build zic utility program -ZICOBJS= zic.o ialloc.o scheck.o localtime.o +ZICOBJS= zic.o ialloc.o scheck.o localtime.o $(WIN32RES) # timezone data files TZDATA = africa antarctica asia australasia europe northamerica southamerica \ diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index e6fb3ec..8026543 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -112,6 +112,7 @@ sub mkvcbuild $postgres-AddFiles('src\backend\utils\misc', 'guc-file.l'); $postgres-AddFiles('src\backend\replication', 'repl_scanner.l', 'repl_gram.y'); + $postgres-RemoveFile('src\timezone\win32ver.rc'); $postgres-AddDefine('BUILDING_DLL');
Re: [HACKERS] psql: show only failed queries
On Tue, Aug 12, 2014 at 4:41 AM, Pavel Stehule pavel.steh...@gmail.com wrote: ... I looked at the code; you removed the check against duplicate varnames in the list. Is it ok? Oh, just revived that code. yes, It is looking well Ok, committed! Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql: show only failed queries
Oh, just revived that code. yes, It is looking well Ok, committed! super thank you very much Regards Pavel Regards, -- Fujii Masao
Re: [HACKERS] replication commands and log_statements
On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 8, 2014 at 1:04 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Aug 9, 2014 at 12:29 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Aug 8, 2014 at 11:00 AM, Bruce Momjian br...@momjian.us wrote: On Fri, Aug 8, 2014 at 08:51:13AM -0400, Robert Haas wrote: On Thu, Aug 7, 2014 at 12:07 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-08-07 23:22:43 +0900, masao.fu...@gmail.com wrote: That is, we log replication commands only when log_statement is set to all. Neither new parameter is introduced nor log_statement is redefined as a list. That sounds good to me. It sounds fairly unprincipled to me. I liked the idea of making log_statement a list, but if we aren't gonna do that, I think this should be a separate parameter. I am unclear there is enough demand for a separate replication logging parameter --- using log_statement=all made sense to me. Most people don't want to turn on log_statement=all because it produces too much log volume. See, for example: http://bonesmoses.org/2014/08/05/on-postgresql-logging-verbosity/ But logging replication commands is quite low-volume, so it is not hard to imagine someone wanting to log all replication commands but not all SQL statements. You can do that by executing ALTER ROLE replication user SET log_statement TO 'all'. If you don't use the replication user to execute SQL statements, no SQL statements are logged in that setting. If you have a user devoted to it, I suppose that's true. I still think it shouldn't get munged together like that. Why do we need to treat only replication commands as special ones and add new parameter to display them? I agree to add new parameter like log_replication to log the replication activity instead of the command itself, like checkpoint activity which can be enabled by log_checkpoints. But I'm not sure why logging of replication commands also needs to be controlled by separate parameter. 
And I still think that those who set log_statement to all expect all commands, including replication commands, to be logged. I'm afraid that a separate parameter covering only replication commands might confuse users. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgresql.auto.conf and reload
On Mon, Aug 11, 2014 at 11:08 PM, Fujii Masao masao.fu...@gmail.com wrote: While updating the patch, I found that the ConfigVariable removed from the list has fields whose memory was already allocated, but we forgot to free them. So I included the code to free them in the patch. Yes, that is right. ! /* ! * Pick up only the data_directory if DataDir is not set, which ! * means that the configuration file is read for the first time and ! * PG_AUTOCONF_FILENAME file cannot be read yet. In this case, ! * we shouldn't pick any settings except the data_directory ! * from postgresql.conf because they might be overwritten ! * with the settings in PG_AUTOCONF_FILENAME file which will be ! * read later. OTOH, since it's ensured that data_directory doesn't ! * exist in PG_AUTOCONF_FILENAME file, it will never be overwritten ! * later. ! */ ! else It is a bit inconvenient to read this part of the code; some other places in the same file put the comment inside the else block, which seems better. One example is below: else { /* * ordinary variable, append to list. For multiple items of * same parameter, retain only which comes later. */ I have marked this as Ready For Committer. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] replication commands and log_statements
On Tue, Aug 12, 2014 at 9:29 AM, Fujii Masao masao.fu...@gmail.com wrote:
> On Tue, Aug 12, 2014 at 2:34 AM, Robert Haas robertmh...@gmail.com wrote:
>> If you have a user devoted to it, I suppose that's true. I still
>> think it shouldn't get munged together like that.
>
> Why do we need to treat only replication commands as special ones and
> add a new parameter to display them?

One difference is that replication commands are internally generated
commands. Do we log such internally generated commands anywhere else
with log_statement = all?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Tue, Aug 12, 2014 at 9:43 AM, Noah Misch n...@leadboat.com wrote:
> Somehow or other, we must bring these parts into agreement.

It is interesting to see that with MinGW-32b getaddrinfo is still set
to no at the configure phase. What if we simply wrap the undef of
gai_strerror like in the patch attached? I just set up an environment
with MinGW-64 and I was able to build the code (tested as well with
MinGW-32, btw).

Another idea would be to enforce getaddrinfo to no at configure for
MinGW environments.

Thoughts?
-- 
Michael

From 284aff1932d085587aa98a6023d8a6bc4ab9a16a Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Mon, 11 Aug 2014 19:11:45 -0700
Subject: [PATCH] Fix build error of MinGW-64 caused by incorrect
 gai_strerror

Since commit a16bac3, configure detects HAVE_GETADDRINFO at the
configure phase in builds done with MinGW-64, resulting in an incorrect
definition of gai_strerror, as this is unconditionally undefined in
src/include/port/win32/sys/socket.h. This commit simply prevents
gai_strerror from being undefined when HAVE_GETADDRINFO is detected.

This error was first triggered with libpq, but no doubt it may have
been present in other places, libpq being only the first one to fail.
---
 src/include/port/win32/sys/socket.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/include/port/win32/sys/socket.h b/src/include/port/win32/sys/socket.h
index edaee6a..33e697a 100644
--- a/src/include/port/win32/sys/socket.h
+++ b/src/include/port/win32/sys/socket.h
@@ -28,6 +28,8 @@
  * they are defined inline in the MS header files. So we'll use our
  * own */
+#if !defined(HAVE_GETADDRINFO)
 #undef gai_strerror
+#endif
 
 #endif   /* WIN32_SYS_SOCKET_H */
-- 
2.0.4
Re: [HACKERS] 9.5: Memory-bounded HashAgg
On Mon, 2014-08-11 at 01:29 +0200, Tomas Vondra wrote:
> On 10.8.2014 23:26, Jeff Davis wrote:
>> This patch requires the Memory Accounting patch, or something
>> similar to track memory usage.
>
> I think the patch you sent actually includes the accounting patch. Is
> that on purpose, or by accident?

Accident, thank you.

> So once a group gets into memory, it stays there? That's going to
> work fine for aggregates with fixed-size state (int4, or generally
> state that gets allocated and does not grow), but I'm afraid for
> aggregates with growing state (for example array_agg and similar)
> that's not really a solution.

I agree in theory, but for now I'm just not handling that case at all
because there is other work that needs to be done first. For one thing,
we would need a way to save the transition state, and we don't really
have that. In the case of array_agg, the state is not serialized and
there's no generic way to ask it to serialize itself without
finalizing.

I'm open to ideas. Do you think my patch is going generally in the
right direction, and we can address this problem later; or do you think
we need a different approach entirely?

> While hacking on the hash join, I envisioned the hash aggregate might
> work in a very similar manner, i.e. something like this:
>
> * nbatches=1, nbits=0
> * when work_mem gets full => nbatches *= 2, nbits += 1
> * get rid of half the groups, using nbits from the hash
>   => dump the current states into 'states.batchno' file
>   => dump further tuples to 'tuples.batchno' file
> * continue until the end, or until work_mem gets full again

It would get a little messy with HashAgg. Hashjoin is dealing entirely
with tuples; HashAgg deals with tuples and groups.

Also, if the transition state is fixed-size (or even nearly so), it
makes no sense to remove groups from the hash table before they are
finished. We'd need to detect that somehow, and it seems almost like
two different algorithms (though maybe not a bad idea to use a
different algorithm for things like array_agg).
Not saying that it can't be done, but (unless you have an idea) it
requires quite a bit more work than what I did here.

> It also seems to me that the logic of the patch is about this:
>
> * try to look up the group in the hash table
> * found => call the transition function
> * not found
>   * enough space => call the transition function
>   * not enough space => tuple/group goes to a batch
>
> Which pretty much means all tuples need to do the lookup first. The
> nice thing about the hash-join approach is that you don't really need
> to do the lookup - you just need to peek at the hash to see whether
> the group belongs to the current batch (and if not, to which batch it
> should go).

That's an interesting point. I suspect that, in practice, the cost of
hashing the tuple is more expensive than (or at least not much cheaper
than) doing a failed lookup.

> For aggregates using 'internal' to pass pointers, that requires some
> help from the author - serialization/deserialization functions.

Ah, yes, this is what I was referring to earlier.

>> * EXPLAIN details for disk usage
>> * choose number of partitions intelligently
>
> What is the purpose of HASH_DISK_MAX_PARTITIONS? I mean, when we
> decide we need 2048 partitions, why should we use less if we believe
> it will get us over work_mem?

Because I suspect there are costs in having an extra file around that
I'm not accounting for directly. We are implicitly assuming that the
OS will keep around enough buffers for each BufFile to do sequential
writes when needed. If we create a zillion partitions, then either we
end up with random I/O or we push the memory burden into the OS buffer
cache.

We could try to model those costs explicitly to put some downward
pressure on the number of partitions we select, but I just chose to
cap it for now.

> For us, removing the sort is a big deal, because we're working with
> 100M rows regularly. It's more complicated though, because the sort
> is usually enforced by COUNT(DISTINCT) and that's not going to
> disappear because of this patch.
> But that's solvable with a custom aggregate.

I hope this offers you a good alternative. I'm not sure it will ever
beat sort for very high cardinality cases, but I hope it can beat sort
when the group size averages something higher than one. It will also be
safer, so the optimizer can be more aggressive about choosing HashAgg.

Thank you for taking a look so quickly!

Regards,
    Jeff Davis
Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows
On Tue, Aug 12, 2014 at 01:58:17PM +0900, Michael Paquier wrote:
> On Tue, Aug 12, 2014 at 9:43 AM, Noah Misch n...@leadboat.com wrote:
>> Somehow or other, we must bring these parts into agreement.
>
> It is interesting to see that with MinGW-32b getaddrinfo is still set
> to no at the configure phase. What if we simply wrap the undef of
> gai_strerror like in the patch attached? I just set up an environment
> with MinGW-64 and I was able to build the code (tested as well with
> MinGW-32, btw).

It's easy to devise something that fixes the build. What is the right
fix, and why?

Note that MinGW-w64 is available in both 32-bit and 64-bit flavors; it
is a fork of MinGW, which is always 32-bit. I experienced the problem
with 64-bit MinGW-w64; I don't know how 32-bit MinGW-w64 compares.

> Another idea would be to enforce getaddrinfo to no at configure for
> MinGW environments.

Consistency with the MSVC build is desirable: either HAVE_GETADDRINFO
for both or !HAVE_GETADDRINFO for both.

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com