Re: [HACKERS] pg_stat_transaction patch
Joel Jacobson j...@gluefinance.com wrote: I applied all the changes on 9.0beta manually and then it compiled without any assertion failures. I also changed the oids to a different unused range, since the ones I used before had been taken in 9.0beta1.

Thanks, but you still need to test your patch:
- You need to check your patch with make check, because it requires adjustments in the rules test; your pg_stat_transaction_function is the longest name in the system catalog.
- You need to configure postgres with --enable-cassert to enable internal validations. The attached test case failed with the following TRAPs:

TRAP: FailedAssertion(!(entry->trans == ((void *) 0)), File: "pgstat.c", Line: 715)
TRAP: FailedAssertion(!(tabstat->trans == trans), File: "pgstat.c", Line: 1758)

I suspect it is because get_tabstat_entry() for some reason returns NULL, in for example pg_stat_get_transaction_tuples_inserted(PG_FUNCTION_ARGS). Does the function look valid? If you can find the error in it, the other functions probably have the same problem.

For the above trap, we can see the comment:

/* Shouldn't have any pending transaction-dependent counts */

We don't expect to read stats entries during transactions. I'm not sure whether accessing transitional stats during a transaction is safe or not. We might need to go in other directions, for example:
- Use session stats instead of transaction stats. You can see the same information in the difference of the counters before and after the transaction.
- Export pgBufferUsage instead of relation counters. They are buffer counters for all relations, but we can obviously export them because they are just plain variables.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 24/05/10 22:49, Alvaro Herrera wrote: Excerpts from Josh Berkus's message of vie may 21 17:57:35 -0400 2010: Problem: currently, if your database has a large amount of cold data, such as 350GB of 3-year-old sales transactions, in 8.4 vacuum no longer needs to touch it thanks to the visibility map. However, every freeze_age transactions, very old pages need to be sucked into memory and rewritten just in order to freeze those pages. This can have a huge impact on system performance, and seems unjustified because the pages are not actually being used.

I think this is nonsense. If you have 3-year-old sales transactions, and your database has any interesting churn, the tuples in those pages have been frozen for a very long time *already*. The problem is vacuum reading them in so that it can verify there's nothing to do. If we want to avoid *reading* those pages, this solution is useless:

Suggested resolution: we would add a 4-byte field to the *page* header which would track the XID wraparound count.

because you still have to read the page. What's missing from the suggestion is that relfrozenxid and datfrozenxid also need to be expanded to 8 bytes. That way you effectively have 8-byte XIDs, which means that you never need to vacuum to avoid XID wraparound. You still need to freeze to truncate clog, though, but if you have the disk space, you can now do that every 100 billion transactions, for example, if you wish.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/25 12:19), Robert Haas wrote: On Mon, May 24, 2010 at 9:27 PM, Stephen Frost sfr...@snowman.net wrote: * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: We have two options; if the checker function takes the list of RangeTblEntry, it will be convenient for ExecCheckRTPerms(), but not for DoCopy(). Conversely, if the checker function takes the arguments in my patch, it will be convenient for DoCopy(), but not for ExecCheckRTPerms(). In my patch, it takes 6 arguments, but we can reference all of them from the given RangeTblEntry. On the other hand, if DoCopy() has to set up a pseudo RangeTblEntry to call the checker function, it needs to set up a similar or slightly larger number of variables.

I don't know that it's really all that difficult to set up an RTE in DoCopy or RI_Initial_Check(). In my opinion, those are the strange or corner cases - not the Executor code, through which all 'regular' DML is done. It makes me wonder if COPY shouldn't have been implemented using the Executor instead, but that's, again, a completely separate topic.

It wasn't, but it wants to play like it operates in the same kind of way as INSERT, so it needs to pick up the slack. I think this approach is definitely worth investigating. KaiGai, can you please work up what the patch would look like if we do it this way?

OK, the attached patch reworks it along those lines.

* ExecCheckRTEPerms() now takes a second argument that lets the caller choose the behavior on access violation. If the 'abort' argument is true, it raises an error using aclcheck_error() or ereport(); otherwise, it returns false immediately without performing the rest of the checks.

* DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms() with a locally built RangeTblEntry.
Thanks,
--
KaiGai Kohei kai...@ak.jp.nec.com

*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
***************
*** 21,26 ****
--- 21,27 ----
  #include <arpa/inet.h>
  #include "access/heapam.h"
+ #include "access/sysattr.h"
  #include "access/xact.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
***************
*** 37,43 ****
  #include "rewrite/rewriteHandler.h"
  #include "storage/fd.h"
  #include "tcop/tcopprot.h"
- #include "utils/acl.h"
  #include "utils/builtins.h"
  #include "utils/lsyscache.h"
  #include "utils/memutils.h"
--- 38,43 ----
***************
*** 725,733 **** DoCopy(const CopyStmt *stmt, const char *queryString)
  	List	   *force_notnull = NIL;
  	bool		force_quote_all = false;
  	bool		format_specified = false;
- 	AclMode		required_access = (is_from ? ACL_INSERT : ACL_SELECT);
- 	AclMode		relPerms;
- 	AclMode		remainingPerms;
  	ListCell   *option;
  	TupleDesc	tupDesc;
  	int			num_phys_attrs;
--- 725,730 ----
***************
*** 988,993 **** DoCopy(const CopyStmt *stmt, const char *queryString)
--- 985,995 ----
  	if (stmt->relation)
  	{
+ 		RangeTblEntry rte;
+ 		Bitmapset  *columnsSet = NULL;
+ 		List	   *attnums;
+ 		ListCell   *cur;
+
  		Assert(!stmt->query);
  		cstate->queryDesc = NULL;
***************
*** 998,1026 **** DoCopy(const CopyStmt *stmt, const char *queryString)
  		tupDesc = RelationGetDescr(cstate->rel);
  		/* Check relation permissions. */
! 		relPerms = pg_class_aclmask(RelationGetRelid(cstate->rel), GetUserId(),
! 									required_access, ACLMASK_ALL);
! 		remainingPerms = required_access & ~relPerms;
! 		if (remainingPerms != 0)
  		{
! 			/* We don't have table permissions, check per-column permissions */
! 			List	   *attnums;
! 			ListCell   *cur;
!
! 			attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 			foreach(cur, attnums)
! 			{
! 				int			attnum = lfirst_int(cur);
!
! 				if (pg_attribute_aclcheck(RelationGetRelid(cstate->rel),
! 										  attnum,
! 										  GetUserId(),
! 										  remainingPerms) != ACLCHECK_OK)
! 					aclcheck_error(ACLCHECK_NO_PRIV, ACL_KIND_CLASS,
! 								   RelationGetRelationName(cstate->rel));
! 			}
  		}
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
--- 1000,1025 ----
  		tupDesc = RelationGetDescr(cstate->rel);
  		/* Check relation permissions. */
! 		attnums = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
! 		foreach (cur, attnums)
  		{
! 			int		attnum = lfirst_int(cur) - FirstLowInvalidHeapAttributeNumber;
!
! 			columnsSet = bms_add_member(columnsSet, attnum);
  		}
+ 		memset(&rte, 0, sizeof(rte));
+ 		rte.type = T_RangeTblEntry;
+ 		rte.rtekind = RTE_RELATION;
+ 		rte.relid = RelationGetRelid(cstate->rel);
+ 		rte.requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
+ 		if (is_from)
+ 			rte.modifiedCols = columnsSet;
+ 		else
+ 			rte.selectedCols = columnsSet;
+
+ 		ExecCheckRTEPerms(&rte, true);
+
  		/* check read-only transaction */
  		if (XactReadOnly && is_from && !cstate->rel->rd_islocaltemp)
  			PreventCommandIfReadOnly("COPY FROM");
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
***************
*** 63,68
Re: [HACKERS] JSON manipulation functions
I started a wiki article for brainstorming the JSON API: http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made substantial changes to the draft of the API based on discussion here and on the #postgresql IRC channel. Is it alright to use the wiki for brainstorming, or should it stay on the mailing list or go somewhere else? I'll try not to spend too much time quibbling over the specifics as I tend to do. While the brainstorming is going on, I plan to start implementing the datatype by itself so I can establish an initial working codebase.
Re: [HACKERS] ROLLBACK TO SAVEPOINT
On May 25, 2010, at 6:08 , Sam Vilain wrote: http://www.postgresql.org/docs/8.4/static/sql-savepoint.html leads us to believe that if you roll back to the same savepoint name twice in a row, you might start walking back through the savepoints. I guess I missed the note on ROLLBACK TO SAVEPOINT that that is not how it works. Here is the section:

SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming.

I'm confused. The sentence in brackets, "Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT", implies that you *will* walk backwards through all the savepoints named "a" if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If that is not how it actually works, then this whole paragraph is wrong, I'd say.

best regards,
Florian Pflug
Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be
On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, OK, I think that makes sense. Would you care to propose a patch? Yep. Here is the patch. This patch distinguishes a normal shutdown from an unexpected exit while the server is in recovery. That is, when a smart or fast shutdown is requested during recovery, the bgwriter sets ControlFile->state to the newly introduced DB_SHUTDOWNED_IN_RECOVERY state. Is this patch worth applying for 9.0? If not, I'll add it to the next CF for 9.1.

Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] Hot Standby performance and deadlocking
Some performance problems have been reported on HS from two users: Erik and Stefan. The characteristics of those issues have been that performance is
* sporadically reduced, though mostly runs at full speed
* associated with reported context switch storms

So we're looking for something that doesn't always happen, but when it does it involves lots of processes and context switching. Unfortunately neither test reporter has been able to re-run tests, leaving me not much to go on. Though since I know the code well, I can focus in on likely suspects fairly easily; in this case I think I have a root cause.

Earlier this year I added deadlock detection into the Startup process when it waits for a buffer pin. The deadlock detection was simplified since it doesn't wait for deadlock_timeout before acting; it just immediately sends a signal to all active processes to resolve the deadlock, even if the buffer pin is released very soon afterwards. Heikki questioned this implementation at the time, though I said it was easier to start simple and add more code if problems arose and time allowed.

It's clear that with 100+ connections and reasonably frequent buffer pin waits, as would occur when accessing the same data blocks on both primary and standby, the current too-simple coding would cause performance issues, as Heikki implied. Certainly actual deadlocks are much rarer than buffer pin waits, so the current coding is wasteful.

The following patch adds some simple logic to make the Startup process wait for deadlock_timeout before it sends the deadlock resolution signals. It does that by refactoring the API of enable_standby_sig_alarm(), though it doesn't change other behaviour or add new features.

Viewpoints?

--
Simon Riggs
www.2ndQuadrant.com

*** a/src/backend/storage/ipc/standby.c
--- b/src/backend/storage/ipc/standby.c
***************
*** 388,399 **** ResolveRecoveryConflictWithBufferPin(void)
  	}
  	else if (MaxStandbyDelay > 0)
  	{
  		/*
! 		 * Send out a request to check for buffer pin deadlocks before we
! 		 * wait. This is fairly cheap, so no need to wait for deadlock timeout
! 		 * before trying to send it out.
  		 */
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
  	}
  	else
  	{
--- 388,402 ----
  	}
  	else if (MaxStandbyDelay > 0)
  	{
+ 		TimestampTz now = GetCurrentTimestamp();
+
  		/*
! 		 * Set timeout for deadlock check (only)
  		 */
! 		if (enable_standby_sig_alarm(now, now, true))
! 			sig_alarm_enabled = true;
! 		else
! 			elog(FATAL, "could not set timer for process wakeup");
  	}
  	else
  	{
***************
*** 410,443 **** ResolveRecoveryConflictWithBufferPin(void)
  	}
  	else
  	{
! 		TimestampTz fin_time;			/* Expected wake-up time by timer */
! 		long		timer_delay_secs;	/* Amount of time we set timer
! 										 * for */
! 		int			timer_delay_usecs;
!
! 		/*
! 		 * Send out a request to check for buffer pin deadlocks before we
! 		 * wait. This is fairly cheap, so no need to wait for deadlock
! 		 * timeout before trying to send it out.
! 		 */
! 		SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);

  		/*
! 		 * How much longer we should wait?
  		 */
! 		fin_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
!
! 		TimestampDifference(now, fin_time,
! 							&timer_delay_secs, &timer_delay_usecs);

  		/*
! 		 * It's possible that the difference is less than a microsecond;
! 		 * ensure we don't cancel, rather than set, the interrupt.
  		 */
! 		if (timer_delay_secs == 0 && timer_delay_usecs == 0)
! 			timer_delay_usecs = 1;
!
! 		if (enable_standby_sig_alarm(timer_delay_secs, timer_delay_usecs, fin_time))
  			sig_alarm_enabled = true;
  		else
  			elog(FATAL, "could not set timer for process wakeup");
--- 413,431 ----
  	}
  	else
  	{
! 		TimestampTz max_standby_time;

  		/*
! 		 * At what point in the future do we hit MaxStandbyDelay?
  		 */
! 		max_standby_time = TimestampTzPlusMilliseconds(then, MaxStandbyDelay);
! 		Assert(max_standby_time > now);

  		/*
! 		 * Wake up at MaxStandbyDelay, and check for deadlocks as well
! 		 * if we will be waiting longer than deadlock_timeout
  		 */
! 		if (enable_standby_sig_alarm(now, max_standby_time, false))
  			sig_alarm_enabled = true;
  		else
  			elog(FATAL, "could not set timer for process wakeup");

*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
***************
*** 85,90 ****
--- 85,91 ----
  static TimestampTz timeout_start_time;

  /* statement_fin_time is valid only if statement_timeout_active is true */
  static TimestampTz statement_fin_time;
+ static TimestampTz statement_fin_time2;	/* valid only in recovery */

  static void RemoveProcFromArray(int code, Datum arg);
***************
*** 1619,1641 **** handle_sig_alarm(SIGNAL_ARGS)
  	 * To avoid various edge cases, we must be careful to do nothing
  	 * when there is nothing to be done.  We also need to
Re: [HACKERS] ROLLBACK TO SAVEPOINT
On 25/05/10 13:03, Florian Pflug wrote: On May 25, 2010, at 6:08 , Sam Vilain wrote: http://www.postgresql.org/docs/8.4/static/sql-savepoint.html Lead us to believe that if you roll back to the same savepoint name twice in a row, that you might start walking back through the savepoints. I guess I missed the note on ROLLBACK TO SAVEPOINT that that is not how it works. Here is the section: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming. I'm confused. The sentence in brackets Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT implies that you *will* walk backwards through all the savepoints named a if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If that is not how it actually works, then this whole paragraph is wrong, I'd say.

Releasing the newer savepoint will cause the older one to again become accessible, as the doc says, but rolling back to a savepoint does not implicitly release it. You'll have to use RELEASE SAVEPOINT for that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)
On Tue, May 18, 2010 at 3:09 PM, Fujii Masao masao.fu...@gmail.com wrote: (2) pg_ctl -ms stop emits the following warning whenever there is a backup_label file in $PGDATA:

WARNING: online backup mode is active
Shutdown will not complete until pg_stop_backup() is called.

This warning doesn't fit the shutdown-during-recovery case. Since a smart shutdown might be requested by something other than pg_ctl, the warning should be emitted on the server side rather than the client, I think. How about moving the warning to the server side? Though I'm not sure whether this should be fixed for 9.0, I attached the patch (move_bkp_cancel_warning_v1.patch). Is this patch worth applying for 9.0? If not, I'll add it to the next CF for 9.1.

Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Regression testing for psql
* Robert Haas (robertmh...@gmail.com) wrote: Of course, if people want to suggest tests that just shouldn't be included, I can go through and strip things out. Well... I'm a little reluctant to believe that we should have 3.3M of tests for the entire backend and 5M of tests just for psql. Then, too, there's the fact that many of these tests fail on my machine because my username is not sfrost, and/or because of row-ordering differences on backslash commands without enough ORDER BY to fully determine the output order. Yeah, you know, I had fully intended to go grepping through the output last night to check for things like that, but my wife decided I needed sleep instead. :) Sorry about that. Still, it's more of a general proposal than something I think should be committed as-is. Should we try to deal with those kinds of differences, or just eliminate the tests which are dependent on username, etc? It definitely strikes me that there's a fair bit of code in psql we're not exercising in some fashion in the regression suite... :/ Thanks, Stephen
Re: [HACKERS] ExecutorCheckPerms() hook
KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: OK, the attached patch reworks it according to the way. I haven't looked at it yet, but the hook was added to ExecCheckRTPerms(), not RTE. This was for two main reasons: it seemed simpler to us, and it meant that any security module implemented would have access to essentially everything we know the query is going to use all at once (instead of on a per-range-table basis). That could be particularly useful if you wanted to, say, enforce a constraint that says no two tables of different labels shall ever be used in the same query at the same time (perhaps with some caveats on that, etc). Could you change this patch to use ExecCheckRTPerms() instead? * ExecCheckRTEPerms() becomes to take 2nd argument the caller to suggest behavior on access violation. The 'abort' argument is true, it raises an error using aclcheck_error() or ereport(). Otherwise, it returns false immediately without rest of checks. * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms() with locally built RangeTblEntry. Does this change fix the issue you had in RI_Initial_Check()? Thanks, Stephen
Re: [HACKERS] ROLLBACK TO SAVEPOINT
On May 25, 2010, at 12:18 , Heikki Linnakangas wrote: On 25/05/10 13:03, Florian Pflug wrote: On May 25, 2010, at 6:08 , Sam Vilain wrote: http://www.postgresql.org/docs/8.4/static/sql-savepoint.html Lead us to believe that if you roll back to the same savepoint name twice in a row, that you might start walking back through the savepoints. I guess I missed the note on ROLLBACK TO SAVEPOINT that that is not how it works. Here is the section: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming. I'm confused. The sentence in brackets Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT implies that you *will* walk backwards through all the savepoints named a if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If that is not how it actually works, then this whole paragraph is wrong, I'd say. Releasing the newer savepoint will cause the older one to again become accessible, as the doc says, but rolling back to a savepoint does not implicitly release it. You'll have to use RELEASE SAVEPOINT for that. Ah, now I get it. Thanks. Would changing Releasing the newer savepoint will cause ... to Explicitly releasing the newer savepoint or maybe even Explicitly releasing the newer savepoint with RELEASE SAVEPOINT will cause ... make things clearer? best regards, Florian Pflug
Re: [HACKERS] JSON manipulation functions
On Tue, May 25, 2010 at 5:37 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I started a wiki article for brainstorming the JSON API: http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made substantial changes to the draft of the API based on discussion here and on the #postgresql IRC channel. Is it alright to use the wiki for brainstorming, or should it stay on the mailing list or go somewhere else? Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. I'll try not to spend too much time quibbling over the specifics as I tend to do. While the brainstorming is going on, I plan to start implementing the datatype by itself so I can establish an initial working codebase. Sounds good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 3:21 , Tom Lane wrote: Florian Pflug f...@phlo.org writes: The subtle point here is whether you consider the view from the outside (in the sense of what a read-only transaction started at an arbitrary time can or cannot observe), or from the inside (what updating transactions can observe and might base their updates on). The former case is completely determined by the commit ordering of the transactions, while the latter is not - otherwise serializability wouldn't be such a hard problem. BTW, doesn't all this logic fall in a heap as soon as you consider read-committed transactions? Why would it? There's still a well defined point in time at which the transaction's effects become visible, and every other transaction commits either before that time or after that time. An observer started between two transactions sees the first's changes but not the second's. One can replace observing read-committed transactions by a series of smaller repeatable-read transactions, since the observers are read-only anyway. This of course says nothing about what state the updating transactions themselves see as the current state. For e.g. replication that is adequate, since you'd not replay the original commands but rather the effects they had in terms of physical tuple updates. On replay, the effects of a transaction therefore do not depend on the state the transaction sees. best regards, Florian Pflug
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) * | No equivalent of FETCH_COUNT is available at the libpq level, so I | assume that the interface I am using is smart enough not to send | gigabytes of data to FE. | | Where does the result set (GBs of data) reside after I call | PQexecPrepared? On BE, I hope? Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list? Thanks, -- Alex -- alex-goncha...@comcast.net
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports d...@csail.mit.edu: On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote: Replicating or recreating the whole predicate locking and conflict detection on slaves is not feasible for performance reasons. (I won't elaborate unless someone feels that's not intuitively obvious.) The only sane way I can see to have a slave database allow serializable behavior is to WAL-log the acquisition of a snapshot by a serializable transaction, and the rollback or commit, on the master, and to have the serializable snapshot build on a slave exclude any serializable transactions for which there are still concurrent serializable transactions. Yes, that does mean WAL-logging the snapshot acquisition even if the transaction doesn't yet have an xid, and WAL-logging the commit or rollback even if it never acquires an xid. One important observation is that any anomaly that occurs on the slave can be resolved by aborting a local read-only transaction. This is a good thing, because the alternatives are too horrible to consider. You could possibly cut the costs of predicate locking by having the master ship with each transaction the list of predicate locks it acquired. But you'd still have to track locks for read-only transactions, so maybe that's not a significant cost improvement. On the other hand, if you're willing to pay the price of serializability on the master, why not the slaves too? I don't understand the problem. As I understand it, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. Is anything wrong with that reasoning?
Nicolas
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
Alex Goncharov wrote: ,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) * | No equivalent of FETCH_COUNT is available at the libpq level, so I | assume that the interface I am using is smart enough not to send | gigabytes of data to FE. | | Where does the result set (GBs of data) reside after I call | PQexecPrepared? On BE, I hope? Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list? The straightforward answer is that the libpq frontend C library does not support something like the JDBC client's setFetchSize. The GBs of data are gathered at the site of the libpq client (the PGresult object is gathered/allocated while consuming result input from the backend). regards, Yeb Havinga
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
At 2010-05-25 07:35:34 -0400, alex-goncha...@comcast.net wrote: | Where does the result set (GBs of data) reside after I call | PQexecPrepared? On BE, I hope? Unless you explicitly declare and fetch from an SQL-level cursor, your many GBs of data are going to be transmitted to libpq, which will eat lots of memory. (The wire protocol does have something like cursors, but libpq does not use them, it retrieves the entire result set.) -- ams
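The cursor-based approach Abhijit describes looks roughly like the sketch below. The connection string, cursor name, table name, and batch size are placeholders, and since this needs a running server it is only a sketch, not a tested program; the libpq calls themselves (PQexec, PQntuples, PQgetvalue, etc.) are the standard ones:

```c
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Sketch: fetch a huge result set in batches through an SQL-level cursor,
 * so libpq never holds more than one batch of rows in memory at a time.
 */
int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");  /* placeholder conninfo */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* cursors only live inside a transaction block */
    res = PQexec(conn, "BEGIN");
    PQclear(res);
    res = PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM bigtable");
    PQclear(res);

    for (;;)
    {
        int     i, ntuples;

        res = PQexec(conn, "FETCH FORWARD 1000 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        ntuples = PQntuples(res);
        if (ntuples == 0)       /* cursor exhausted */
        {
            PQclear(res);
            break;
        }
        /* process this batch; as an example, print the first column */
        for (i = 0; i < ntuples; i++)
            printf("%s\n", PQgetvalue(res, i, 0));
        PQclear(res);           /* frees this batch before the next FETCH */
    }

    res = PQexec(conn, "CLOSE c");
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

This is the libpq-side equivalent of what psql's FETCH_COUNT does internally: memory use is bounded by the batch size rather than by the total result-set size.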
Re: [HACKERS] JSON manipulation functions
On Tue, May 25, 2010 at 12:57, Robert Haas robertmh...@gmail.com wrote: On Tue, May 25, 2010 at 5:37 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: I started a wiki article for brainstorming the JSON API: http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made substantial changes to the draft of the API based on discussion here and on the #postgresql IRC channel. Is it alright to use the wiki for brainstorming, or should it stay on the mailing list or go somewhere else? Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. The general idea that most people have been using, and that I think is correct, is to have the discussion here on the list, and then keep a summary of the current state of it on the wiki page so it's easier for someone entering the discussion to catch up on where it is. I'll try not to spend too much time quibbling over the specifics as I tend to do. While the brainstorming is going on, I plan to start implementing the datatype by itself so I can establish an initial working codebase. Sounds good. Agreed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] ExecutorCheckPerms() hook
KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: OK, the attached patch reworks it according to the way. Reviewing this patch, there are a whole slew of problems. #1: REALLY BIG ISSUE- Insufficient comment updates. You've changed function definitions in a pretty serious way as well as moved some code around such that some of the previous comments don't make sense. You have got to update comments when you're writing a patch. Indeed, the places I see changes in comments are when you've removed what appears to still be valid and appropriate comments, or places where you've added comments which are just blatantly wrong with the submitted patch. #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of this patch- don't, we're in feature-freeze right now and should not be adding hooks at this time. #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to utils/acl and instead added executor/executor.h to ri_triggers.c. I don't particularly like that. I admit that DoCopy() already knew about the executor, and if that were the only case outside of the executor where ExecCheckRTPerms() was getting called it'd probably be alright, but we already have another place that wants to use it, so let's move it to a more appropriate place. #4: As mentioned previously, the hook (which should be added in a separate patch anyway) makes more sense to me to be in ExecCheckRTPerms(), not ExecCheckRTEPerms(). This also means that we need to be calling ExecCheckRTPerms() from DoCopy and RI_Initial_Check(), to make sure that the hook gets called. To that end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c. Also, there should be a big comment about not using or calling ExecCheckRTEPerms() directly outside of ExecCheckRTPerms(), since the hook would then be skipped.
#5: In DoCopy, you can remove relPerms and remainingPerms, but I'd probably leave required_access up near the top and then just use it to set rte->required_access directly rather than moving that bit deep down into the function. #6: I haven't checked yet, but if there are other things in an RTE which would make sense in the DoCopy case, beyond just what's needed for the permissions checking, and which wouldn't be 'correct' with a NULL'd value, I would set those. Yes, we're building the RTE to check permissions, but we don't want someone downstream to be surprised when they make a change to something in the permissions checking and discover that a value in the RTE they expected to be there wasn't valid. Even more so, if there are function helpers which can be used to build an RTE, we should be using them. The same goes for RI_Initial_Check(). #7: I'd move the conditional if (is_from) into the foreach which is building the columnsSet and eliminate the need for columnsSet; I don't see that it's really adding much here. #8: When moving ExecCheckRTPerms(), you should rename it to be more like the other function calls in acl.h. Perhaps pg_rangetbl_aclcheck()? Also, it should return an actual AclResult instead of just true/false. Thanks, Stephen
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
,--- Abhijit Menon-Sen (Tue, 25 May 2010 17:26:18 +0530) * | Unless you explicitly declare and fetch from an SQL-level cursor, your | many GBs of data are going to be transmitted to libpq, which will eat | lots of memory. (The wire protocol does have something like cursors, | but libpq does not use them, it retrieves the entire result set.) ,--- Yeb Havinga (Tue, 25 May 2010 14:08:51 +0200) * | The GBs of data are gathered at the site of the libpq client (pgresult | object gathered/allocated while consuming result input from backend). `--* Thank you very much! -- Alex -- alex-goncha...@comcast.net -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade docs
On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote: Have you read the docs? It does mention the issue with /contrib and stuff. How do I document a limitation I don't know about? This is all very vague. Please suggest some wording. OK, here's an attempt. Please fact-check. -- General Limitations pg_upgrade relies on binary compatibility between the old and new on-disk formats, including the on-disk formats of individual data types. pg_upgrade attempts to detect cases in which the on-disk format has changed; for example, it verifies that the old and new clusters have the same value for --enable-integer-datetimes. However, there is no systematic way for pg_upgrade to detect problems of this type; it has hard-coded knowledge of the specific cases known to exist in core PostgreSQL, including /contrib. If third-party or user-defined data types or access methods are used, it is the user's responsibility to verify that the versions loaded into the old and new clusters use compatible on-disk formats. If they do not, pg_upgrade may appear to work but subsequently crash or silently corrupt data. pg_upgrade also relies on ABI compatibility between modules loaded into the old and new clusters. For example, if an SQL function in the old cluster is defined to call a particular C function, pg_upgrade will recreate the SQL function in the new cluster and will configure it to call the same C function. If no such C function can be found by the new cluster, pg_upgrade will simply fail. However, if a C function of the same name exists in the new cluster, but expects a different number of arguments or different types of arguments, then it is likely to crash the system when called. In the worst case, data corruption could result.
-- Also, the following sentence appears not to fit with our only to 9.0 policy: For Windows users, note that due to different integer datetimes settings used by the one-click installer and the MSI installer, it is only possible to upgrade from version 8.3 of the one-click distribution to version 8.4 of the one-click distribution. It is not possible to upgrade from the MSI installer to the one-click installer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
On 05/25/2010 07:35 AM, Alex Goncharov wrote: ,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) * | No equivalent of FETCH_COUNT is available at the libpq level, so I | assume that the interface I am using is smart enough not to send | gigabytes of data to FE. | | Where does the result set (GBs of data) reside after I call | PQexecPrepared? On BE, I hope? Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list? Issue multiple queries and make use of LIMIT/OFFSET. You'll have to go manual on this one. -- Andrew Chernow eSilo, LLC every bit counts http://www.esilo.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
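Andrew's LIMIT/OFFSET suggestion amounts to paginating the query yourself. A rough sketch of the pattern, in Python for illustration only — the `run_query` callable, table, and column names are stand-ins for whatever driver call you actually use:

```python
def fetch_paged(run_query, batch_size=1000):
    """Emulate psql's FETCH_COUNT by hand: repeatedly issue the same
    query with LIMIT/OFFSET so only one batch is ever held in client
    memory, instead of letting the client library absorb the whole
    result set at once."""
    offset = 0
    while True:
        rows = run_query(
            f"SELECT id, payload FROM big_table "
            f"ORDER BY id LIMIT {batch_size} OFFSET {offset}")
        if not rows:
            break
        yield from rows          # process each batch before the next
        offset += batch_size
```

Note the ORDER BY: without a stable ordering, successive LIMIT/OFFSET pages are not guaranteed to be disjoint.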
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms() with locally built RangeTblEntry. Maybe I missed it somewhere, but we still need to address the case where the user doesn't have those SELECT permissions that we're looking for in RI_Initial_Check(), right? KaiGai, your patch should be addressing that in a similar fashion.. Thanks, Stephen
[HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)
I've been reading the SQL/XML standard and discovered that it defines a function named XMLEXISTS that does exactly what the todo item xpath_exists defines. My original patch named the function as per the todo but I think using the function name from the standard is a better idea. So this patch is the same as before, but the function is now named XMLEXISTS instead of xpath_exists. Regards, -- Mike Fowler Registered Linux user: 379787

Index: src/backend/utils/adt/xml.c
===================================================================
RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.97
diff -c -r1.97 xml.c
*** src/backend/utils/adt/xml.c	3 Mar 2010 17:29:45 -	1.97
--- src/backend/utils/adt/xml.c	25 May 2010 14:02:33 -
***************
*** 3495,3497 ****
--- 3495,3668 ----
  	return 0;
  #endif
  }
+ 
+ /*
+  * Determines if the node specified by the supplied XPath exists
+  * in a given XML document, returning a boolean.
+  *
+  * It is up to the user to ensure that the XML passed is in fact
+  * an XML document - XPath doesn't work easily on fragments without
+  * a context node being known.
+  */
+ Datum
+ xmlexists(PG_FUNCTION_ARGS)
+ {
+ #ifdef USE_LIBXML
+ 	text	   *xpath_expr_text = PG_GETARG_TEXT_P(0);
+ 	xmltype    *data = PG_GETARG_XML_P(1);
+ 	ArrayType  *namespaces = PG_GETARG_ARRAYTYPE_P(2);
+ 	xmlParserCtxtPtr ctxt = NULL;
+ 	xmlDocPtr	doc = NULL;
+ 	xmlXPathContextPtr xpathctx = NULL;
+ 	xmlXPathCompExprPtr xpathcomp = NULL;
+ 	char	   *datastr;
+ 	int32		len;
+ 	int32		xpath_len;
+ 	xmlChar    *string;
+ 	xmlChar    *xpath_expr;
+ 	int			i;
+ 	int			ndim;
+ 	Datum	   *ns_names_uris;
+ 	bool	   *ns_names_uris_nulls;
+ 	int			ns_count;
+ 	int			result;
+ 
+ 	/*
+ 	 * Namespace mappings are passed as text[].  If an empty array is passed
+ 	 * (ndim = 0, 0-dimensional), then there are no namespace mappings.
+ 	 * Else, a 2-dimensional array with length of the second axis being equal
+ 	 * to 2 should be passed, i.e., every subarray contains 2 elements, the
+ 	 * first element defining the name, the second one the URI.  Example:
+ 	 * ARRAY[ARRAY['myns', 'http://example.com'], ARRAY['myns2',
+ 	 * 'http://example2.com']].
+ 	 */
+ 	ndim = ARR_NDIM(namespaces);
+ 	if (ndim != 0)
+ 	{
+ 		int		   *dims;
+ 
+ 		dims = ARR_DIMS(namespaces);
+ 
+ 		if (ndim != 2 || dims[1] != 2)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATA_EXCEPTION),
+ 					 errmsg("invalid array for XML namespace mapping"),
+ 					 errdetail("The array must be two-dimensional with length of the second axis equal to 2.")));
+ 
+ 		Assert(ARR_ELEMTYPE(namespaces) == TEXTOID);
+ 
+ 		deconstruct_array(namespaces, TEXTOID, -1, false, 'i',
+ 						  &ns_names_uris, &ns_names_uris_nulls,
+ 						  &ns_count);
+ 
+ 		Assert((ns_count % 2) == 0);	/* checked above */
+ 		ns_count /= 2;			/* count pairs only */
+ 	}
+ 	else
+ 	{
+ 		ns_names_uris = NULL;
+ 		ns_names_uris_nulls = NULL;
+ 		ns_count = 0;
+ 	}
+ 
+ 	datastr = VARDATA(data);
+ 	len = VARSIZE(data) - VARHDRSZ;
+ 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+ 	if (xpath_len == 0)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_DATA_EXCEPTION),
+ 				 errmsg("empty XPath expression")));
+ 
+ 	string = (xmlChar *) palloc((len + 1) * sizeof(xmlChar));
+ 	memcpy(string, datastr, len);
+ 	string[len] = '\0';
+ 
+ 	xpath_expr = (xmlChar *) palloc((xpath_len + 1) * sizeof(xmlChar));
+ 	memcpy(xpath_expr, VARDATA(xpath_expr_text), xpath_len);
+ 	xpath_expr[xpath_len] = '\0';
+ 
+ 	pg_xml_init();
+ 	xmlInitParser();
+ 
+ 	PG_TRY();
+ 	{
+ 		/*
+ 		 * redundant XML parsing (two parsings for the same value during one
+ 		 * command execution are possible)
+ 		 */
+ 		ctxt = xmlNewParserCtxt();
+ 		if (ctxt == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 						"could not allocate parser context");
+ 		doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+ 		if (doc == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INVALID_XML_DOCUMENT,
+ 						"could not parse XML document");
+ 		xpathctx = xmlXPathNewContext(doc);
+ 		if (xpathctx == NULL)
+ 			xml_ereport(ERROR, ERRCODE_OUT_OF_MEMORY,
+ 						"could not allocate XPath context");
+ 		xpathctx->node = xmlDocGetRootElement(doc);
+ 		if (xpathctx->node == NULL)
+ 			xml_ereport(ERROR, ERRCODE_INTERNAL_ERROR,
+ 						"could not find root XML element");
+ 
+ 		/* register namespaces, if any */
+ 		if (ns_count > 0)
+ 		{
+ 			for (i = 0; i < ns_count; i++)
+ 			{
+ 				char	   *ns_name;
+ 				char	   *ns_uri;
+ 
+ 				if (ns_names_uris_nulls[i * 2] ||
+ 					ns_names_uris_nulls[i * 2 + 1])
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 							 errmsg("neither namespace name nor URI may be null")));
+ 				ns_name = TextDatumGetCString(ns_names_uris[i * 2]);
+ 				ns_uri = TextDatumGetCString(ns_names_uris[i * 2 + 1]);
+ 				if (xmlXPathRegisterNs(xpathctx,
+ 									   (xmlChar *) ns_name,
+ 									   (xmlChar *) ns_uri) != 0)
+ 					ereport(ERROR,	/* is
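For readers skimming the patch (which is truncated in this archive): the namespace-mapping argument follows the same convention as the existing xpath() function — an empty array means no mappings, otherwise a two-dimensional array of name/URI pairs. A rough Python model of just that validation logic, with invented names, not PostgreSQL code:

```python
def split_namespace_pairs(namespaces):
    """Mirror the patch's checks: an empty array means no mappings;
    otherwise the array must be two-dimensional with the second axis
    of length 2, i.e. a list of [name, uri] pairs, neither null."""
    if not namespaces:                  # the C code's ndim == 0 case
        return []
    pairs = []
    for pair in namespaces:
        if len(pair) != 2:
            raise ValueError("invalid array for XML namespace mapping")
        name, uri = pair
        if name is None or uri is None:
            raise ValueError("neither namespace name nor URI may be null")
        pairs.append((name, uri))
    return pairs
```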
Re: [HACKERS] libpq, PQexecPrepared, data size sent to FE vs. FETCH_COUNT
Alex Goncharov wrote: ,--- I/Alex (Mon, 24 May 2010 12:25:18 -0400) * | No equivalent of FETCH_COUNT is available at the libpq level, so I | assume that the interface I am using is smart enough not to send | gigabytes of data to FE. | | Where does the result set (GBs of data) reside after I call | PQexecPrepared? On BE, I hope? Sorry for asking again... No sarcasm meant: is there no straightforward answer here? Or nobody is certain? Or a wrong list? You have been given the answer. Please re-read the replies, e.g. the one from Abhijit Menon-Sen. The data is saved on the client side before the call returns. If that uses too much memory, use a cursor. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
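The cursor workaround being recommended looks roughly like this at the SQL level; the sketch below drives it through a generic `execute` callable standing in for the libpq/driver calls (cursor and table names are made up for illustration):

```python
def fetch_via_cursor(execute, batch_size=1000):
    """Wrap the query in an SQL-level cursor so the backend keeps the
    result set and the client pulls it down one batch at a time,
    instead of the client library buffering all rows at once."""
    execute("BEGIN")
    execute("DECLARE bigres CURSOR FOR SELECT * FROM big_table")
    try:
        while True:
            rows = execute(f"FETCH {batch_size} FROM bigres")
            if not rows:
                break
            yield from rows
    finally:
        execute("CLOSE bigres")
        execute("COMMIT")
```

The key point from the thread: with a plain PQexecPrepared the whole result set lands in client memory before the call returns; with a cursor, only one FETCH batch is ever materialized on the client.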
Re: [HACKERS] JSON manipulation functions
Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. I'll try to explain it in one big nutshell: Instead of, for instance, json_to_number('5') and number_to_json(5), I propose changing it to from_json('5')::INT and to_json(5). Note how from_json simply returns TEXT containing the underlying value for the user to cast. I plan to make calling to_json/from_json with arrays or objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw an error for now, as implementing all the specifics of this could be quite distracting. If I'm not mistaken, json_object([content [AS name] [, ...]] | *) RETURNS json can't be implemented without augmenting the grammar (as was done with xmlforest), so I considered making it take a RECORD parameter like the hstore(RECORD) function does, as was suggested on IRC. However, this may be inadequate for selecting some columns but not others. Using examples from hstore: SELECT hstore(foo) FROM foo; => 'e=>2.71828, pi=>3.14159' -- this works, but what if we only want one field? SELECT hstore(pi) FROM foo; -- function type error SELECT hstore(row(pi)) FROM foo; => 'f1=>3.14159' -- field name is lost SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => 'f1=>3.14159' -- ugly, and field name is *still* lost To get (and set, which I overlooked before), use json_get and json_set. These take JSONPath expressions, but I don't plan to implement all sorts of fancy features during the summer.
However, I do plan to support some kind of parameter substitution so you can do this: json_get('[0,1,4,9,16,25]', '[%]' %% 2) => '4'::TEXT For this use case, though, it would be simpler to say: '[0,1,4,9,16,25]'::JSON -> 2 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
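A quick Python model of the proposed semantics as I read them — the function names come from the proposal, but the behavior shown is my interpretation of this email, not a settled API:

```python
import json

def to_json(value):
    # to_json(5) yields '5'; arrays/objects rejected for now per the proposal
    if isinstance(value, (list, dict)):
        raise ValueError("to_json on arrays/objects not yet supported")
    return json.dumps(value)

def from_json(text):
    # from_json('5') returns TEXT ('5') for the caller to cast to INT etc.
    value = json.loads(text)
    if isinstance(value, (list, dict)):
        raise ValueError("from_json on arrays/objects not yet supported")
    return str(value)

def json_get(text, index):
    # Models json_get('[0,1,4,9,16,25]', '[%]' %% 2) returning '4'
    return str(json.loads(text)[index])
```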
Re: [HACKERS] pg_upgrade docs
Robert Haas wrote: On Mon, May 24, 2010 at 11:35 PM, Bruce Momjian br...@momjian.us wrote: Have you read the docs? It does mention the issue with /contrib and stuff. How do I document a limitation I don't know about? This is all very vague. Please suggest some wording. OK, here's an attempt. Please fact-check. -- General Limitations pg_upgrade relies on binary compatibility between the old and new on-disk formats, including the on-disk formats of individual data types. pg_upgrade attempts to detect cases in which the on-disk format has changed; for example, it verifies that the old and new clusters have the same value for --enable-integer-datetimes. However, there is no systematic way for pg_upgrade to detect problems of this type; it has hard-coded knowledge of the specific cases known to exist in core PostgreSQL, including /contrib. If third-party or user-defined data types or access methods are used, it is the user's responsibility to verify that the versions loaded into the old and new clusters use compatible on-disk formats. If they do not, pg_upgrade may appear to work but subsequently crash or silently corrupt data. OK, I have added a mention of the issues above, in a more abbreviated format. pg_upgrade also relies on ABI compatibility between modules loaded into the old and new clusters. For example, if an SQL function in the old cluster is defined to call a particular C function, pg_upgrade will recreate the SQL function in the new cluster and will configure it to call the same C function. If no such C function can be found by the new cluster, pg_upgrade will simply fail. However, if a C function of the same name exists in the new cluster, but expects a different number of arguments or different types of arguments, then it is likely to crash the system when called. In the worst case, data corruption could result. These issues are not unique to pg_upgrade, and could happen even in a pg_dump restore.
Also, the following sentence appears not to fit with our "only to 9.0" policy: For Windows users, note that due to different integer datetimes settings used by the one-click installer and the MSI installer, it is only possible to upgrade from version 8.3 of the one-click distribution to version 8.4 of the one-click distribution. It is not possible to upgrade from the MSI installer to the one-click installer. Agreed. I added an "8.4 or later" mention. It is not worth calling it "9.0 or later" because then I would have to update this mention for every major release. Applied patch attached. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com

Index: doc/src/sgml/pgupgrade.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/pgupgrade.sgml,v
retrieving revision 1.10
diff -c -c -r1.10 pgupgrade.sgml
*** doc/src/sgml/pgupgrade.sgml	24 May 2010 17:43:39 -	1.10
--- doc/src/sgml/pgupgrade.sgml	25 May 2010 14:50:36 -
***************
*** 16,21 ****
--- 16,31 ----
     9.0.1 - 9.0.4.
    </para>
  
+  <para>
+   <application>pg_upgrade</> works because, though new features are
+   regularly added to Postgres major releases, the internal data storage
+   format rarely changes.  <application>pg_upgrade</> does its best to
+   make sure the old and new clusters are binary-compatible, e.g. by
+   checking for compatible compile-time settings.  It is important that
+   any external modules are also binary compatible, though this cannot
+   be checked by <application>pg_upgrade</>.
+  </para>
+ 
   <sect2>
    <title>Supported Versions</title>
***************
*** 440,446 ****
   <sect2>
    <title>Limitations in migrating <emphasis>from</> PostgreSQL 8.3</title>
- 
   <para>
    Upgrading from PostgreSQL 8.3 has additional restrictions not present
    when upgrading from later PostgreSQL releases.  For example,
--- 450,455 ----
***************
*** 502,509 ****
     For Windows users, note that due to different integer datetimes settings
     used by the one-click installer and the MSI installer, it is only
     possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 of the one-click distribution.  It is not possible to upgrade
!    from the MSI installer to the one-click installer.
    </para>
  
    <para>
--- 511,518 ----
     For Windows users, note that due to different integer datetimes settings
     used by the one-click installer and the MSI installer, it is only
     possible to upgrade from version 8.3 of the one-click distribution to
!    version 8.4 or later of the one-click distribution.  It is not
!    possible to upgrade from the MSI installer to the one-click installer.
    </para>
  
    <para>

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clearing psql`s input buffer after auto-reconnect
3. Have CheckConnection do longjmp(sigint_interrupt_jmp) after resetting ... Now #1 might be the best long-term solution but I have no particular appetite to tackle it, and #2 is just too ugly to contemplate. That leaves #3, which is a bit ugly in its own right but seems like the best fix we're likely to get. Comments, better ideas? I like #3. If this were a more common event I might lean towards #1 but it's not so #3 seems fine. -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201005251113 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard (was: Add xpath_exists Function)
On Tue, May 25, 2010 16:31, Mike Fowler wrote: I've been reading the SQL/XML standard and discovered that it defines a function named XMLEXISTS that does exactly what the todo item xpath_exists defines. My original patch named the function as per the todo but I think using the function name from the standard is a better idea. So this patch is the same as before, but the function is now named XMLEXISTS instead of xpath_exists. I tried this patch (cvs HEAD, applies without error), but get this error: [...] utils/adt/xml.o: In function `xmlexists': /var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend/utils/adt/xml.c:3639: undefined reference to `xmlXPathCompiledEvalToBoolean' collect2: ld returned 1 exit status make[2]: *** [postgres] Error 1 make[2]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src/backend' make[1]: *** [all] Error 2 make[1]: Leaving directory `/var/data1/pg_stuff/pg_sandbox/pgsql.xmlexists/src' make: *** [all] Error 2 ./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.xmlexists --with-pgport=6548 --quiet --enable-depend --enable-cassert --enable-debug --with-openssl --with-perl --with-libxml --with-libxslt centos 5.4 2.6.18-164.el5 x86_64 GNU/Linux libxml2.x86_64 2.6.26-2.1.2.8 installed libxml2-devel.x86_64 2.6.26-2.1.2.8 installed Erik Rijkers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Erik Rijkers wrote: libxml2.x86_64 2.6.26-2.1.2.8 installed libxml2-devel.x86_642.6.26-2.1.2.8 installed Thanks for testing my patch Erik. It turns out I've got libxml2 installed at version 2.7.5. Searching the gnome mailing lists, it turns out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next version from yours, 2.6.27 (see: http://mail.gnome.org/archives/xml/2006-October/msg00119.html). Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after its been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 +0900, Fujii Masao wrote: On Tue, May 25, 2010 at 10:29 AM, Josh Berkus j...@agliodbs.com wrote: I agree that #4 should be done last, but it will be needed, not in the least by your employer ;-) . I don't see any obvious way to make #4 compatible with any significant query load on the slave, but in general I'd think that users of #4 are far more concerned with 0% data loss than they are with getting the slave to run read queries. Since #2 and #3 are enough for 0% data loss, I think that such users would be more concerned about what results are visible in the standby. No? Please add #4 also. You can do that easily at the same time as #2 and #3, and it will leave me free to fix the perceived conflict problems. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Mon, 2010-05-24 at 22:20 +0900, Fujii Masao wrote: Second, we need to discuss about how to specify the synch level. There are three approaches: * Per standby Since the purpose, location and H/W resource often differ from one standby to another, specifying level per standby (i.e., we set the level in recovery.conf) is a straightforward approach, I think. For example, we can choose #3 for high-availability standby near the master, and choose #1 (async) for the disaster recovery standby remote. * Per transaction Define the PGC_USERSET option specifying the level and specify it on the master in response to the purpose of transaction. In this approach, for example, we can choose #4 for the transaction which should be visible on the standby as soon as a success of the commit has been returned to a client. We can also choose #1 for time-critical but not mission-critical transaction. * Mix Allow users to specify the level per standby and transaction at the same time, and then calculate the real level from them by using some algorithm. Which should we adopt for 9.1? I'd like to implement the per-standby approach at first since it's simple and seems to cover more use cases. Thought? -1 Synchronous replication implies that a commit should wait. This wait is experienced by the transaction, not by other parts of the system. If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Imagine having 2 standbys, 1 synch, 1 async. If the synch server goes down, performance will improve and robustness will have been lost. What good would that be? Imagine a standby connected over a long distance. DBA brings up standby in synch mode accidentally and the primary server hits massive performance problems without any way of directly controlling this. The worst aspect of standby-level controls is that nobody ever knows how safe a transaction is. 
There is no definition or test for us to check exactly how safe any particular transaction is. Also, the lack of safety occurs at the time when you least want it - when one of your servers is already down. So I call per-standby settings simple, and broken in multiple ways. Putting the control in the hands of the transaction owner (i.e. on the master) is exactly where the control should be. I personally like the idea of that being a USERSET, though could live with system-wide settings if need be. But the control must be on the *master* not on the standbys. The best parameter we can specify is the number of servers that we wish to wait for confirmation from. That is a definition that easily manages the complexity of having various servers up/down at any one time. It also survives misconfiguration more easily, as well as providing a workaround if replicating across a bursty network where we can't guarantee response times, even if the typical response time is good. (We've discussed this many times before over a period of years and I'm not really sure why we have to re-discuss this repeatedly just because people disagree. You don't mention the earlier discussions; I'm not sure why. If we want to follow the community process, then all previous discussions need to be taken into account, unless things have changed - which they haven't: same topic, same people, AFAICS.) -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
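To make the "wait for confirmation from k servers" idea concrete: if a commit returns once the first k acknowledgements arrive, commit latency is determined by the k-th fastest standby, so a slow or down standby cannot stall the master as long as k is less than the number of live standbys. A toy illustration (not actual PostgreSQL code):

```python
def commit_wait_time(standby_ack_times, k):
    """Latency of a commit that waits for the first k standby acks.
    k == 0 models fully asynchronous replication; k == len(...) means
    every standby, including the slowest, must confirm."""
    if k == 0:
        return 0.0
    acks = sorted(standby_ack_times)
    if k > len(acks):
        raise RuntimeError("not enough live standbys to satisfy quorum")
    return acks[k - 1]
```

With standbys acking at 1ms, 5ms and (during a WAN burst) 200ms, waiting for two acks costs 5ms; the bursty standby only matters if you demand all three.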
Re: [HACKERS] Synchronization levels in SR
On Mon, 2010-05-24 at 18:29 -0700, Josh Berkus wrote: If people agree that the above is our roadmap, implementing per-standby first makes sense, and then we can implement per-session GUC later. IMHO per-standby sounds simple, but is dangerously simplistic, explained on another part of the thread. We need to think clearly about failure modes and how they will be handled. Failure modes and edge cases completely govern the design here. All running smoothly isn't a major concern and so it appears that the user interface can be done various ways. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be
On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote: On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, OK, I think that makes sense. Would you care to propose a patch? Yep. Here is the patch. This patch distinguishes normal shutdown from unexpected exit, while the server is in recovery. That is, when smart or fast shutdown is requested during recovery, the bgwriter sets the ControlFile-state to new-introduced DB_SHUTDOWNED_IN_RECOVERY state. This patch is worth applying for 9.0? If not, I'll add it into the next CF for 9.1. Presumably Robert will be applying the patch? It seems to address the concern raised on the thread. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote: Synchronous replication implies that a commit should wait. This wait is experienced by the transaction, not by other parts of the system. If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] recovery getting interrupted is not so unusual as it used to be
On Tue, May 25, 2010 at 12:36 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2010-05-25 at 19:12 +0900, Fujii Masao wrote: On Mon, May 17, 2010 at 5:33 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, May 15, 2010 at 3:20 AM, Robert Haas robertmh...@gmail.com wrote: Hmm, OK, I think that makes sense. Would you care to propose a patch? Yep. Here is the patch. This patch distinguishes normal shutdown from unexpected exit, while the server is in recovery. That is, when smart or fast shutdown is requested during recovery, the bgwriter sets the ControlFile-state to new-introduced DB_SHUTDOWNED_IN_RECOVERY state. This patch is worth applying for 9.0? If not, I'll add it into the next CF for 9.1. Presumably Robert will be applying the patch? It seems to address the concern raised on the thread. Yes, I was planning to review it. But if you or someone else would like to cut in, that's OK too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote: Erik Rijkers wrote: libxml2.x86_64 2.6.26-2.1.2.8 installed libxml2-devel.x86_64 2.6.26-2.1.2.8 installed Thanks for testing my patch Erik. It turns out I've got libxml2 installed at version 2.7.5. Searching the gnome mailing lists, it turns out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next version from yours, 2.6.27 (see: http://mail.gnome.org/archives/xml/2006-October/msg00119.html). We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote: On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote: Synchronous replication implies that a commit should wait. This wait is experienced by the transaction, not by other parts of the system. If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. Uhh yeah, that is pretty much the standard use case. The next rack is only 50% of the equation. The next part is the disaster recovery rack over 100Mb (or even 10Mb) that is half way across the country. It is common, very common. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Confused about the buffer pool size
Hello All: In the code (costsize.c), I see that effective_cache_size is set to DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 But when I say show shared_buffers in psql I get, shared_buffers 28MB In postgresql.conf file, the following lines appear shared_buffers = 28MB # min 128kB # (change requires restart)#temp_buffers = 8MB # min 800kB So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. So should effective_cache_size be set to 3584 rather than the 16384? Thanks, MMK.
Re: [HACKERS] JSON manipulation functions
On Tue, May 25, 2010 at 10:52 AM, Joseph Adams joeyadams3.14...@gmail.com wrote: Well, I think it's fine to use the wiki for brainstorming, but before you change the design you probably need to talk about it here. You can't rely on everyone on -hackers to follow changes on a wiki page somewhere. It looks like the API has been overhauled pretty heavily since the last version we talked about here, and I'm not sure I understand it. I'll try to explain it in one big nutshell: Instead of, for instance, json_to_number('5') and number_to_json(5), I propose changing it to from_json('5')::INT and to_json(5). Note how from_json simply returns TEXT containing the underlying value for the user to cast. I plan to make calling to_json/from_json with arrays or objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw an error for now, as implementing all the specifics of this could be quite distracting. I don't see how that's an improvement over the previous design. It seems like it adds a lot of extra casting and removes useful list operations without any corresponding advantage. If I'm not mistaken, json_object([content [AS name] [, ...]] | *) RETURNS json can't be implemented without augmenting the grammar (as was done with xmlforest), so I considered making it take a RECORD parameter like the hstore(RECORD) function does, as was suggested on IRC. However, this may be inadequate for selecting some columns but not others. Using examples from hstore: SELECT hstore(foo) FROM foo; => 'e=>2.71828, pi=>3.14159' -- this works, but what if we only want one field? SELECT hstore(pi) FROM foo; -- function type error SELECT hstore(row(pi)) FROM foo; => 'f1=>3.14159' -- field name is lost SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => 'f1=>3.14159' -- ugly, and field name is *still* lost Yeah. I'm not sure what to do about this problem.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Synchronization levels in SR
Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. I think there may be a terminology issue here -- I took configure by standby to mean that *at the master* you would specify rules for each standby. I think Simon took it to mean that each standby would define the rules for replication to it. Maybe this issue can resolve gracefully with a bit of clarification? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Add _PG_init to PL language handler documentation
Howdy, This tiny doc patch adds _PG_init to the skeleton example code for a PL. The information is quite valuable to PL authors, who might miss it when it is described in the shared library documentation. This patch was based off of 6e2ba96 in the git mirror and a colorized diff can be viewed here: http://github.com/leto/postgres/commit/a9e265a7f55a0605fb4c6135f0f689c8b89e9623 Duke -- Jonathan Duke Leto jonat...@leto.net http://leto.net pginit.patch Description: Binary data
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 12:40 -0400, Robert Haas wrote: On Tue, May 25, 2010 at 12:28 PM, Simon Riggs si...@2ndquadrant.com wrote: Synchronous replication implies that a commit should wait. This wait is experienced by the transaction, not by other parts of the system. If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. The use case of machine in the next rack over and async rep to a server at a remote site *is* important, but you give no explanation as to why that implies per-standby is the solution to it. If you read the rest of my email, you'll see that I have explained the problems per-standby settings would cause. Please don't be so quick to claim it is me ignoring anything. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas wrote: On Tue, May 25, 2010 at 12:04 PM, Mike Fowler m...@mlfowler.com wrote: Erik Rijkers wrote: libxml2.x86_64 2.6.26-2.1.2.8 installed libxml2-devel.x86_642.6.26-2.1.2.8 installed Thanks for testing my patch Erik. It turns out I've got libxml2 installed at version 2.7.5. Searching the gnome mailing lists, it turns out xmlXPathCompiledEvalToBoolean was added (unbelievably) in the very next version from yours, 2.6.27 (see: http://mail.gnome.org/archives/xml/2006-October/msg00119.html). We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. I realise that my patch didn't update configure and configure.in, and indeed I didn't think of it when I responded to Erik (I'm too used to the Java world where people manage their own dependencies). I've now attached the updated patch which ups the check from version 2.6.23 to 2.6.27. Regards, -- Mike Fowler Registered Linux user: 379787 Index: configure === RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure,v retrieving revision 1.679 diff -c -r1.679 configure *** configure 13 May 2010 22:07:40 - 1.679 --- configure 25 May 2010 16:57:49 - *** *** 9079,9087 if test $with_libxml = yes ; then ! { $as_echo $as_me:$LINENO: checking for xmlSaveToBuffer in -lxml2 5 ! $as_echo_n checking for xmlSaveToBuffer in -lxml2... 6; } ! if test ${ac_cv_lib_xml2_xmlSaveToBuffer+set} = set; then $as_echo_n (cached) 6 else ac_check_lib_save_LIBS=$LIBS --- 9079,9087 if test $with_libxml = yes ; then ! { $as_echo $as_me:$LINENO: checking for xmlXPathCompiledEvalToBoolean in -lxml2 5 ! $as_echo_n checking for xmlXPathCompiledEvalToBoolean in -lxml2... 6; } ! if test ${ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean+set} = set; then $as_echo_n (cached) 6 else ac_check_lib_save_LIBS=$LIBS *** *** 9099,9109 #ifdef __cplusplus extern C #endif ! char xmlSaveToBuffer (); int main () { ! 
return xmlSaveToBuffer (); ; return 0; } --- 9099,9109 #ifdef __cplusplus extern C #endif ! char xmlXPathCompiledEvalToBoolean (); int main () { ! return xmlXPathCompiledEvalToBoolean (); ; return 0; } *** *** 9129,9140 test $cross_compiling = yes || $as_test_x conftest$ac_exeext }; then ! ac_cv_lib_xml2_xmlSaveToBuffer=yes else $as_echo $as_me: failed program was: 5 sed 's/^/| /' conftest.$ac_ext 5 ! ac_cv_lib_xml2_xmlSaveToBuffer=no fi rm -rf conftest.dSYM --- 9129,9140 test $cross_compiling = yes || $as_test_x conftest$ac_exeext }; then ! ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=yes else $as_echo $as_me: failed program was: 5 sed 's/^/| /' conftest.$ac_ext 5 ! ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean=no fi rm -rf conftest.dSYM *** *** 9142,9150 conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi ! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlSaveToBuffer 5 ! $as_echo $ac_cv_lib_xml2_xmlSaveToBuffer 6; } ! if test x$ac_cv_lib_xml2_xmlSaveToBuffer = xyes; then cat confdefs.h _ACEOF #define HAVE_LIBXML2 1 _ACEOF --- 9142,9150 conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi ! { $as_echo $as_me:$LINENO: result: $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 5 ! $as_echo $ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean 6; } ! if test x$ac_cv_lib_xml2_xmlXPathCompiledEvalToBoolean = xyes; then cat confdefs.h _ACEOF #define HAVE_LIBXML2 1 _ACEOF *** *** 9152,9159 LIBS=-lxml2 $LIBS else ! { { $as_echo $as_me:$LINENO: error: library 'xml2' (version = 2.6.23) is required for XML support 5 ! $as_echo $as_me: error: library 'xml2' (version = 2.6.23) is required for XML support 2;} { (exit 1); exit 1; }; } fi --- 9152,9159 LIBS=-lxml2 $LIBS else ! { { $as_echo $as_me:$LINENO: error: library 'xml2' (version = 2.6.27) is required for XML support 5 ! 
$as_echo $as_me: error: library 'xml2' (version = 2.6.27) is required for XML support 2;} { (exit 1); exit 1; }; } fi Index: configure.in === RCS file: /home/mfowler/cvsrepo/pgrepo/pgsql/configure.in,v retrieving revision 1.627 diff -c -r1.627 configure.in *** configure.in 13 May 2010 22:07:42 - 1.627 --- configure.in 25 May 2010 16:22:32 - *** *** 940,946 fi if test $with_libxml = yes ; then ! AC_CHECK_LIB(xml2, xmlSaveToBuffer, [], [AC_MSG_ERROR([library 'xml2' (version = 2.6.23) is required for XML support])]) fi if test $with_libxslt = yes ; then --- 940,946 fi if test $with_libxml = yes ; then ! AC_CHECK_LIB(xml2,
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. I think there may be a terminology issue here -- I took configure by standby to mean that *at the master* you would specify rules for each standby. I think Simon took it to mean that each standby would define the rules for replication to it. Maybe this issue can resolve gracefully with a bit of clarification? The use case of machine in the next rack over and async rep to a server at a remote site would require the settings server.nextrack = synch server.remotesite = async which leaves open the question of what happens when nextrack is down. In many cases, to give adequate performance in that situation people add an additional server, so the config becomes server.nextrack1 = synch server.nextrack2 = synch server.remotesite = async We then want to specify for performance reasons that we can get a reply from either nextrack1 or nextrack2, so it all still works safely and quickly if one of them is down. How can we express that rule concisely? With some difficulty. My suggestion is simply to have a single parameter (name unimportant) number_of_synch_servers_we_wait_for = N which is much easier to understand because it is phrased in terms of the guarantee given to the transaction, not in terms of what the admin thinks is the situation. 
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Excerpts from Heikki Linnakangas's message of mar may 25 04:41:30 -0400 2010: On 24/05/10 22:49, Alvaro Herrera wrote: I think this is nonsense. If you have 3-years-old sales transactions, and your database has any interesting churn, tuples in those pages have been frozen for a very long time *already*. What's missing from the suggestion is that relfrozenxid and datfrozenxid also need to be expanded to 8 bytes. That way you effectively have 8-byte XIDs, which means that you never need to vacuum to avoid XID wraparound. Hmm, so are we going to use the xid epoch more officially? That's an entirely new line of development; perhaps it opens new possibilities. This sounds like extending Xid to 64 bits, without having to store the high bits everywhere. Was this discussed in the PGCon devs meeting? -- Álvaro Herrera alvhe...@alvh.no-ip.org
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. I think there may be a terminology issue here -- I took configure by standby to mean that *at the master* you would specify rules for each standby. I think Simon took it to mean that each standby would define the rules for replication to it. Maybe this issue can resolve gracefully with a bit of clarification? The use case of machine in the next rack over and async rep to a server at a remote site would require the settings server.nextrack = synch server.remotesite = async which leaves open the question of what happens when nextrack is down. In many cases, to give adequate performance in that situation people add an additional server, so the config becomes server.nextrack1 = synch server.nextrack2 = synch server.remotesite = async We then want to specify for performance reasons that we can get a reply from either nextrack1 or nextrack2, so it all still works safely and quickly if one of them is down. How can we express that rule concisely? With some difficulty. Perhaps the difficulty here is that those still look like per-server settings to me. Just maybe with a different set of semantics. 
My suggestion is simply to have a single parameter (name unimportant) number_of_synch_servers_we_wait_for = N which is much easier to understand because it is phrased in terms of the guarantee given to the transaction, not in terms of what the admin thinks is the situation. So I agree that we need to talk about whether or not we want to do this. I'll give my opinion. I am not sure how useful this really is. Consider a master with two standbys. The master commits a transaction and waits for one of the two standbys, then acknowledges the commit back to the user. Then the master crashes. Now what? It's not immediately obvious which standby we should bring online as the primary, and if we guess wrong we could lose transactions thought to be committed. This is probably a solvable problem, with enough work: we can write a script to check the last LSN received by each of the two standbys and promote whichever one is further along. But... what happens if the master and one standby BOTH crash simultaneously? There's no way of knowing (until we get at least one of them back up) whether it's safe to promote the other standby. I like the idea of a quorum commit type feature where we promise the user that things are committed when enough servers have acknowledged the commit. But I think most people are not going to want that configuration unless we also provide some really good management tools that we don't have today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Confused about the buffer pool size
On 25/05/10 19:49, MMK wrote: Hello All: In the code (costsize.c), I see that effective_cache_size is set to DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 But when I say show shared_buffers in psql I get, shared_buffers 28MB In postgresql.conf file, the following lines appear shared_buffers = 28MB # min 128kB # (change requires restart)#temp_buffers = 8MB # min 800kB So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. So should effective_cache_size be set to 3584 rather than the 16384? No. Please see the manual for what effective_cache_size means: http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote: On Tue, May 25, 2010 at 1:10 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2010-05-25 at 11:52 -0500, Kevin Grittner wrote: Robert Haas robertmh...@gmail.com wrote: Simon Riggs si...@2ndquadrant.com wrote: If we define robustness at the standby level then robustness depends upon unseen administrators, as well as the current up/down state of standbys. This is action-at-a-distance in its worst form. Maybe, but I can't help thinking people are going to want some form of this. The case where someone wants to do sync rep to the machine in the next rack over and async rep to a server at a remote site seems too important to ignore. I think there may be a terminology issue here -- I took configure by standby to mean that *at the master* you would specify rules for each standby. I think Simon took it to mean that each standby would define the rules for replication to it. Maybe this issue can resolve gracefully with a bit of clarification? The use case of machine in the next rack over and async rep to a server at a remote site would require the settings server.nextrack = synch server.remotesite = async which leaves open the question of what happens when nextrack is down. In many cases, to give adequate performance in that situation people add an additional server, so the config becomes server.nextrack1 = synch server.nextrack2 = synch server.remotesite = async We then want to specify for performance reasons that we can get a reply from either nextrack1 or nextrack2, so it all still works safely and quickly if one of them is down. How can we express that rule concisely? With some difficulty. Perhaps the difficulty here is that those still look like per-server settings to me. Just maybe with a different set of semantics. (Those are the per-server settings.) 
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote: We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. Because we work hard to minimize our dependencies and make them as non-onerous as possible. At a minimum, I think it's fair to say that the burden is on you to justify why it's worth bumping the version number. If there is some major speed or performance advantage to using the newer API, maybe we'll consider it. But if it's just a few extra lines of code to work around it, then it's better to write those extra lines of code rather than potentially force users to upgrade packages they're otherwise happy with. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote: On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote: ... The best parameter we can specify is the number of servers that we wish to wait for confirmation from. That is a definition that easily manages the complexity of having various servers up/down at any one time. It also survives misconfiguration more easily, as well as providing a workaround if replicating across a bursty network where we can't guarantee response times, even if the typical response time is good. This may be an incredibly naive question, but what happens to the transaction on the master if the number of confirmations is not received? Is this intended to create a situation where the master effectively becomes unavailable for write operations when its synchronous slaves are unavailable? How we handle degraded mode is important, yes. Whatever parameters we choose, the problem will remain the same. Should we just ignore degraded mode and respond as if nothing bad had happened? Most people would say not. If we specify server1 = synch and server2 = async, we then also need to specify what happens if server1 is down. People might often specify: if (server1 == down) server2 = synch. So now we have 3 configuration settings, one quite complex. It's much easier to say you want to wait for N servers to respond, but don't care which they are. One parameter, simple and flexible. In both cases, we have to figure out what to do if we can't get either server to respond. In replication there is no such thing as "server down", just "server didn't reply in time X". So we need to define timeouts. So whatever we do, we need additional parameters to specify timeouts (including wait-forever as an option) and action-on-timeout: commit or rollback. -- Simon Riggs www.2ndQuadrant.com
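Put together, the parameters Simon enumerates might look something like the following postgresql.conf-style sketch. Every name here is hypothetical; none of these GUCs exist, and this only illustrates the shape of the configuration being argued for (quorum count, timeout, and action on timeout):

```ini
# Hypothetical sketch only; these GUC names are invented for illustration.
synch_rep_quorum = 1               # wait for N standbys to confirm, any of them
synch_rep_timeout = 30s            # how long a commit waits; -1 = wait forever
synch_rep_timeout_action = commit  # degraded mode behavior: 'commit' or 'rollback'
```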
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
On 05/25/2010 01:09 PM, Mike Fowler wrote: Why? 2.6.27 is almost 4 years old. RHEL 5 ships with 2.6.26. I imagine that supporting it is very desirable, regardless of its age, since that is unfortunately still the latest version of RHEL. -- m. tharp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 13:31 -0400, Robert Haas wrote: So I agree that we need to talk about whether or not we want to do this. I'll give my opinion. I am not sure how useful this really is. Consider a master with two standbys. The master commits a transaction and waits for one of the two standbys, then acknowledges the commit back to the user. Then the master crashes. Now what? It's not immediately obvious which standby we should bring online as the primary, and if we guess wrong we could lose transactions thought to be committed. This is probably a solvable problem, with enough work: we can write a script to check the last LSN received by each of the two standbys and promote whichever one is further along. But... what happens if the master and one standby BOTH crash simultaneously? There's no way of knowing (until we get at least one of them back up) whether it's safe to promote the other standby. Not much of a problem really, is it? If you have one server left out of 3, then you promote it OR you stay down - your choice. There is no "safe to promote" knowledge in *any* scenario; you never know what was on the primary, only what was received by the standby. If you have N standbys still up, you can pick which one using the algorithm you mention. Remember that the WAL is sequential, so it's not like the commit order of transactions will differ across servers if we use quorum commit. So not a problem. The multiple simultaneous case is fairly common for people that pick "synch to the server in the next rack", because there are 100 reasons why we'd take out both at the same time; ask JD. I like the idea of a quorum commit type feature where we promise the user that things are committed when enough servers have acknowledged the commit. But I think most people are not going to want that configuration unless we also provide some really good management tools that we don't have today. Good name. Management tools have nothing to do with this; completely orthogonal.
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Synchronization levels in SR
On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote: ... The best parameter we can specify is the number of servers that we wish to wait for confirmation from. That is a definition that easily manages the complexity of having various servers up/down at any one time. It also survives misconfiguration more easily, as well as providing a workaround if replicating across a bursty network where we can't guarantee response times, even if the typical response time is good. This may be an incredibly naive question, but what happens to the transaction on the master if the number of confirmations is not received? Is this intended to create a situation where the master effectively becomes unavailable for write operations when its synchronous slaves are unavailable? Alastair Bell Turner ^F5
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas wrote: On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote: We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. Because we work hard to minimize our dependencies and make them as non-onerous as possible. At a minimum, I think it's fair to say that the burden is on you to justify what it's worth bumping the version number. If there is some major speed or performance advantage to using the newer API, maybe we'll consider it. But if it's just a few extra lines of code to work around it, then it's better to write those extra lines of code rather than potentially force users to upgrade packages they're otherwise happy with. The real issue is what's going to be available on most of the platforms we build on. Unfortunately, 2.6.26 is what's on my CentOS 5.4 boxes, for example. I'm sure we don't want to make 9.1 not buildable with the installed libraries on still fairly current RedHat-derived platforms. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Add XMLEXISTS function from the SQL/XML standard
Robert Haas robertmh...@gmail.com writes: On Tue, May 25, 2010 at 1:09 PM, Mike Fowler m...@mlfowler.com wrote: We're unlikely to accept this patch if it changes the minimum version of libxml2 required to compile PostgreSQL Why? 2.6.27 is almost 4 years old. Because we work hard to minimize our dependencies and make them as non-onerous as possible. At a minimum, I think it's fair to say that the burden is on you to justify why it's worth bumping the version number. Yes. Increasing the minimum required version of some library is a Big Deal; we don't do it on a whim. And we definitely don't do it just because it's old. regards, tom lane
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Alvaro Herrera alvhe...@alvh.no-ip.org writes: This sounds like extending Xid to 64 bits, without having to store the high bits everywhere. Was this discussed in the PGCon devs meeting? Yeah, that's what it would amount to. It was not discussed at the dev meeting --- it was an idea that came up one evening at PGCon. I'm not sure whether this would imply having to widen xid to 64 bits internally. That could be a bit unpleasant as far as CPU and shared memory space go, although every year that goes by makes 32-bit machines less interesting as DB servers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. SSI would detect a potential conflict here, which we could resolve by aborting T3. (We could also abort T1, but if this is a replicated system this isn't always an option -- T3 might be running on the slave, so only the slave will know about the conflict, and it can't very well abort an update transaction on the master.) There's another example of a read-only transaction anomaly that could cause similar problems at http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think this one is easier to follow. Dan -- Dan R. K. 
Ports MIT CSAIL http://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
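[Editorial note] Dan's claim that "no serializable ordering allows this" can be checked mechanically. The following is an illustrative sketch (a simplified Python model of the two tables, not PostgreSQL code): it runs T1, T2, T3 in every serial order and shows that none reproduces what the interleaved schedule produced.

```python
from itertools import permutations

def serial_outcome(order):
    """Run T1, T2, T3 one after another; return (what T3 saw, final receipts)."""
    date, receipts, t3_view = 1, [], None
    for t in order:
        if t == "T1":                 # insert a receipt stamped with current_date
            receipts.append(date)
        elif t == "T2":               # increment current_date
            date += 1
        elif t == "T3":               # read-only: look at both tables
            t3_view = (date, tuple(receipts))
    return t3_view, tuple(receipts)

# The interleaved schedule from the example: T1 reads date=1, T2 bumps it
# to 2 and commits, T3 sees date=2 and no receipts, then T1 commits a
# receipt stamped with the old date 1.
observed = ((2, ()), (1,))

matching = [o for o in permutations(("T1", "T2", "T3"))
            if serial_outcome(o) == observed]
print(matching)   # [] -- no serial order reproduces what T3 observed
```

Note that T2, T3, T1 gets T3's view right but ends with the wrong receipt, which is exactly why the anomaly only appears once T1's commit is taken into account.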
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:18 , Dan Ports wrote: On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? 
best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
Excerpts from Jesper Krogh's message of Wed May 19 15:01:18 -0400 2010: But the distribution is very flat at the end, the last 128 values are exactly 1.00189e-05, which means that any term sitting outside the array would get an estimate of 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows I don't know if this is related, but tsvector stats are computed and stored per term, not per datum. This is different from all other datatypes. Maybe there's code somewhere that's assuming per-datum and coming up with the wrong estimates? Or maybe the tsvector-specific code contains a bug somewhere; maybe a rounding error? -- Álvaro Herrera alvhe...@alvh.no-ip.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug f...@phlo.org wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Not exactly. The eventually-persisted state is always consistent, but there can be a transitory committed state which would violate user-defined constraints or business rules *if viewed*. This is what I've been on about -- the commit sequence is not necessarily the same as the apparent order of execution. A read-only transaction, if run before the overlapping commits settle, can view a state which is not consistent with any serial order of execution, and might therefore break the rules. SSI detects that and rolls one of the transactions back if they're all running at serializable transaction isolation in a single SSI database, but the question is how to handle this when the read happens in a replica. Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? At least one of the transactions participating in the cycle does. There's no requirement that they all do. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Yes. Note that even while it's in an inconsistent state, you can still perform any query that doesn't observe the inconsistency -- hopefully most queries fall into this category. Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have thought that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? Right, the read-only transaction can't have incoming edges, but it can have outgoing edges. So it can't be the pivot itself (the transaction with both outgoing and incoming edges), but it can cause *another* transaction to be. In the example I gave, T3 (the r/o transaction) has an outgoing edge to T1, because it didn't see T1's concurrent update. T1 already had an outgoing edge to T2, so adding in this incoming edge from T3 creates the dangerous structure. Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
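[Editorial note] The dangerous structure Dan describes, a pivot with both an incoming and an outgoing rw-edge, amounts to a trivial graph check. A minimal sketch (illustrative Python, not the actual SSI implementation):

```python
def find_pivots(rw_edges):
    """Return transactions with both an incoming and an outgoing rw-edge.

    rw_edges is a set of (reader, writer) pairs: the reader did not see
    the writer's concurrent update, so it appears to execute first.
    """
    has_out = {src for src, _ in rw_edges}
    has_in = {dst for _, dst in rw_edges}
    return sorted(has_out & has_in)

# The example from the thread: T3 -> T1 (T3 missed T1's insert) and
# T1 -> T2 (T1 missed T2's date increment).  T1 is the pivot; the
# read-only T3 is not, but its edge is what completes the structure.
print(find_pivots({("T3", "T1"), ("T1", "T2")}))   # ['T1']
```

With only T3 -> T1 present there is no pivot, which matches Dan's point that the read-only transaction on its own is harmless.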
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com wrote: Have you ever looked at one of those queries, that Londiste or Slony issue against the provider DB in order to get all the log data that has been committed between two snapshots? Is that really the best you can think of? No, I admit I haven't. In fact, I was thinking primarily in terms of log-driven situations, like HS. What would be the best place for me to look to come up to speed on your use case? (I'm relatively sure that the issue isn't that there's no information to find, but that a sequential pass over all available information would take a *long* time.) I've been working through the issues on WAL-based replicas, and have some additional ideas and alternatives, but I'd like to see the big picture, including trigger-based replication, before posting. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
Simon Riggs wrote: How we handle degraded mode is important, yes. Whatever parameters we choose the problem will remain the same. Should we just ignore degraded mode and respond as if nothing bad had happened? Most people would say not. If we specify server1 = synch and server2 = async we then also need to specify what happens if server1 is down. People might often specify if (server1 == down) server2 = synch. I have a hard time imagining including async servers in the quorum. If an async server's vote is necessary to reach quorum due to a 'real' sync standby server failure, it would mean that the async-intended standby is now also in sync with the master transactions. IMHO this is a bad situation, since instead of the DBA getting the error "not enough sync standbys to reach quorum", he'll now get "database is slow" complaints, only to find out later that too many sync standby servers went south (under the assumption that async servers are mostly on links too slow to consider for sync standby). regards, Yeb Havinga -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
Hi, Simon Riggs si...@2ndquadrant.com writes: On Tue, 2010-05-25 at 19:08 +0200, Alastair Turner wrote: On Tue, May 25, 2010 at 6:28 PM, Simon Riggs si...@2ndquadrant.com wrote: The best parameter we can specify is the number of servers that we wish to wait for confirmation from. This may be an incredibly naive question, but what happens to the transaction on the master if the number of confirmations is not received? It's much easier to say you want to wait for N servers to respond, but don't care which they are. One parameter, simple and flexible. [...] So whatever we do, we need additional parameters to specify timeouts (including wait-forever as an option) and action-on-timeout: commit or rollback. I was preparing an email along the lines that we need each slave to declare its desired minimum level of synchronicity, and have the master filter that with what the transaction wants. Scratch that. Thinking about it some more, I see that Simon's proposal is both simpler and more effective: we already have Hot Standby and admin functions that tell us the last replayed LSN. The bigger LSN wins, so in case of failover we know which slave to choose. The only use case I can see for what I had in mind is to allow the user to choose which server is trusted to have accurate data or better read-only performance. But if the link is slow, the code will soon enough notice, mind you. I'm still not sure about my preference here, but I can see why Simon's proposal is simpler; it addresses all the concerns without forcing the servers into a non-optimal setup for a gain that is hard to see. Regards, -- dim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Confused about the buffer pool size
Hello Heikki: This is what the documentation says (see below). But it does not tell me anything about what the actual buffer size is. How do I know what the real buffer size is? I am using 8.4.4 and I am running only one query at a time. Cheers, MMK. Sets the planner's assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. When setting this parameter you should consider both PostgreSQL's shared buffers and the portion of the kernel's disk cache that will be used for PostgreSQL data files. Also, take into account the expected number of concurrent queries on different tables, since they will have to share the available space. This parameter has no effect on the size of shared memory allocated by PostgreSQL, nor does it reserve kernel disk cache; it is used only for estimation purposes. The default is 128 megabytes (128MB). --- On Tue, 5/25/10, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: From: Heikki Linnakangas heikki.linnakan...@enterprisedb.com Subject: Re: [HACKERS] Confused about the buffer pool size To: MMK bom...@yahoo.com Cc: PostgreSQL-development pgsql-hackers@postgresql.org Date: Tuesday, May 25, 2010, 11:36 AM On 25/05/10 19:49, MMK wrote: Hello All: In the code (costsize.c), I see that effective_cache_size is set to DEFAULT_EFFECTIVE_CACHE_SIZE. This is defined as follows in cost.h #define DEFAULT_EFFECTIVE_CACHE_SIZE 16384 But when I say show shared_buffers in psql I get, shared_buffers 28MB In postgresql.conf file, the following lines appear shared_buffers = 28MB # min 128kB # (change requires restart) #temp_buffers = 8MB # min 800kB So I am assuming that the buffer pool size is 28MB = 28 * 128 = 3584 8K pages. So should effective_cache_size be set to 3584 rather than the 16384? No.
Please see the manual for what effective_cache_size means: http://www.postgresql.org/docs/8.4/interactive/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
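[Editorial note] The unit arithmetic in the question does check out: both shared_buffers and effective_cache_size are counted internally in 8 kB pages, so 28 MB is 3584 pages and the old 16384-page default corresponds to the documented 128 MB. A throwaway sketch making the conversion explicit (hypothetical helper names, Python used only for illustration):

```python
PAGE_KB = 8  # PostgreSQL's default block size, in kB

def mb_to_pages(megabytes):
    """Convert a memory setting in MB to 8 kB pages."""
    return megabytes * 1024 // PAGE_KB

def pages_to_mb(pages):
    """Convert a page count back to MB."""
    return pages * PAGE_KB // 1024

print(mb_to_pages(28))      # 3584 pages, i.e. shared_buffers = 28MB
print(pages_to_mb(16384))   # 128 MB, i.e. DEFAULT_EFFECTIVE_CACHE_SIZE
```

The confusion in the thread is not about the conversion but about semantics: effective_cache_size is a planner estimate, not an allocation.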
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Dan Ports d...@csail.mit.edu: On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote: I don't understand the problem. According to me, in the context of SSI, a read-only slave can just map SERIALIZABLE to the technical implementation of REPEATABLE READ (i.e., the currently-existing SERIALIZABLE). The union of the transactions on the master and the slave(s) will still exhibit SERIALIZABLE behavior because the transactions on the slave cannot write anything and are therefore irrelevant. This, unfortunately, isn't true in SSI. Consider read-only transactions on a single node SSI database -- the situation is the same for read-only transactions that run on a slave. These transactions can be part of anomalies, so they need to be checked for conflicts and potentially aborted. Consider Kevin's favorite example, where one table contains the current date and the other is a list of receipts (initially empty). T1 inserts (select current_date) into receipts, but doesn't commit T2 increments current_date and commits T3 reads both current_date and the receipt table T1 commits T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. SSI would detect a potential conflict here, which we could resolve by aborting T3. (We could also abort T1, but if this is a replicated system this isn't always an option -- T3 might be running on the slave, so only the slave will know about the conflict, and it can't very well abort an update transaction on the master.) Ah, indeed. I made the same reasoning mistake as Florian (presumably) did: I didn't think of the fact that the read-only transaction doesn't need to be the pivot. 
Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On May 25, 2010, at 20:48 , Dan Ports wrote: On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote: Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? Yes. Note that even while it's in an inconsistent state, you can still perform any query that doesn't observe the inconsistency -- hopefully most queries fall into this category. Yeah, as long as you just walk by without looking, the database is happy ;-) Btw, I still don't get how this follows from the Cahill paper. For a transaction to lie on a dangerous circle, it needs incoming and outgoing edges in the conflict graph, right? But I'd have though that conflicts are always between a reader and a writer or between two writers. So how can a read-only transaction have incoming and outgoing edges? Right, the read-only transaction can't have incoming edges, but it can have outgoing edges. So it can't be the pivot itself (the transaction with both outgoing and incoming edges), but it can cause *another* transaction to be. In the example I gave, T3 (the r/o transaction) has an outgoing edge to T1, because it didn't see T1's concurrent update. T1 already had an outgoing edge to T2, so adding in this incoming edge from T3 creates the dangerous structure. Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug f...@phlo.org: On May 25, 2010, at 20:18 , Dan Ports wrote: T3, which is a read-only transaction, sees the incremented date and an empty list of receipts. But T1 later commits a new entry in the receipts table with the old date. No serializable ordering allows this. However, if T3 hadn't performed its read, there'd be no problem; we'd just serialize T1 before T2 and no one would be the wiser. Hm, so in fact SSI sometimes allows the database to be inconsistent, but only as long as nobody tries to observe it? I would not call this an inconsistent state: it would become inconsistent only after someone (e.g., T3) has observed it _and_ T1 commits. Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fwd: Hiding data in postgresql
Hi guys, (I tried the question in another forum first.) Does someone have any ideas on how I can hide data without the metadata noticing? To explain further, I would like to save some collection of data where the metadata does not see it. I am trying to do some security through obscurity. It is for research purposes. For example, can I populate a table with 1000 rows, but have the metadata only know of about 500 of them? Only on an export of a dump can you find all the data again. Or maybe make a hidden duplicate schema that can point to the hidden data? Does someone have any good ideas on how to achieve this or something similar? Kind regards Hector On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, does ANYONE have any tips on hiding data on a database server? This means that data is stored in places that are not necessarily picked up in the schema of the database. I am doing some research on databases and need some direction. Any help or direction will be highly appreciated. Kind regards Hector
Re: [HACKERS] Exposing the Xact commit order to the user
2010/5/25 Florian Pflug f...@phlo.org: Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... In the general case, wr dependencies also create "must be serialized before" edges. It seems that those edges can be discarded when finding a pivot, but if you want to go back to basics: ("→" means "must be serialized before".)

* T1 → T2, because T1 reads a version of a data element for which T2 later creates a newer version (rw between T1 and T2).
* T3 → T1, because T3 reads a version of a data element for which T1 later creates a newer version (rw between T3 and T1).
* T2 → T3, because T2 creates a version of a data element, which is then read by T3 (wr between T2 and T3).

(As you can see, those 3 edges form a cycle.) Nicolas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
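[Editorial note] Whether Nicolas's three edges really close a cycle, Florian's sufficient condition, as opposed to the merely necessary dangerous structure, can be checked with a plain depth-first search over the combined rw/wr graph. This is an illustrative sketch only; SSI's real algorithm deliberately avoids full cycle tracking because it would be too expensive.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph of (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    visited, on_stack = set(), set()

    def visit(node):
        if node in on_stack:
            return True               # back-edge: found a cycle
        if node in visited:
            return False
        visited.add(node)
        on_stack.add(node)
        if any(visit(n) for n in graph.get(node, [])):
            return True
        on_stack.remove(node)
        return False

    return any(visit(n) for n in list(graph))

# Nicolas's three edges: two rw-edges plus the wr-edge T2 -> T3.
print(has_cycle({("T1", "T2"), ("T3", "T1"), ("T2", "T3")}))   # True
# Drop the wr-edge and the remaining rw-edges alone form no cycle,
# which is Florian's point about a read-only transaction in isolation.
print(has_cycle({("T1", "T2"), ("T3", "T1")}))                 # False
```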
Re: [HACKERS] Exposing the Xact commit order to the user
Florian Pflug f...@phlo.org wrote: Hm, but for there to be an actual problem (and not a false positive), an actual dangerous circle has to exist in the dependency graph. The existence of a dangerous structure is just a necessary (but not sufficient) and easily checked-for condition for that, right? Now, if a read-only transaction only ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and hence any dangerous structure it is part of is a false positive. I guess my line of reasoning is flawed somehow, but I cannot figure out why... Here's why: We're tracking rw-dependencies, where the time-arrow showing effective order of execution points from the reader to the writer (since the reader sees a state prior to the write, it effectively executes before it). These are important because there have to be two such dependencies, one in to the pivot and one out from the pivot, for a problem to exist. (See various works by Dr. Alan Fekete, et al, for details.) But other dependencies can imply an order of execution. In particular, a wr-dependency, where a transaction *can* see data committed by another transaction, implies that the *writer* came first in the order of execution. In this example, the transaction which lists the receipts successfully reads the control table update, but is not able to read the receipt insert. This completes the cycle, making it a real anomaly and not a false positive. Note that the wr-dependency can actually exist outside the database, making it pretty much impossible to accurately tell a false positive from a true anomaly when the pivot exists and the transaction writing data which the pivot can't read commits first. For example, let's say that the update to the control table is committed from an application which, seeing that its update came back without error, proceeds to list the receipts for the old date in a subsequent transaction. 
You have a wr-dependency which is, in reality, quite real and solid with no way to notice it within the database engine. That's why the techniques used in SSI are pretty hard to improve upon beyond more detailed and accurate tracking of rw-conflicts. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote: In light of the proposed purging scheme, how would it be able to distinguish between those two cases (nothing there yet vs. was there but purged)? There is a difference between an empty result set and an exception. No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. Checking the age of the last entry in the file should be simple enough to determine whether the segment matches the max age configuration (if set). In the case of a superuser deciding what to purge, he would just call a function with a serial number identifying the obsolete segments. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] mergejoin null handling (was Re: [PERFORM] merge join killing performance)
Scott Marlowe scott.marl...@gmail.com writes: So, Tom, so you think it's possible that the planner isn't noticing all those nulls and thinks it'll just take a row or two to get to the value it needs to join on? I dug through this and have concluded that it's really an oversight in the patch I wrote some years ago in response to this: http://archives.postgresql.org/pgsql-performance/2005-05/msg00219.php That patch taught nodeMergejoin that a row containing a NULL key can't possibly match anything on the other side. However, its response to observing a NULL is just to advance to the next row of that input. What we should do, if the NULL is in the first merge column and the sort order is nulls-high, is realize that every following row in that input must also contain a NULL and so we can just terminate the mergejoin immediately. The original patch works well for cases where there are just a few nulls in one input and the important factor is to not read all the rest of the other input --- but it fails to cover the case where there are many nulls and the important factor is to not read all the rest of the nulls. The problem can be demonstrated if you modify the example given in the above-referenced message so that table t1 contains lots of nulls rather than just a few: explain analyze will show that all of t1 gets read by the mergejoin, and that's not necessary. I'm inclined to think this is a performance bug and should be back-patched, assuming the fix is simple (which I think it is, but haven't coded/tested yet). It'd probably be reasonable to go back to 8.3; before that, sorting nulls high versus nulls low was pretty poorly defined and so there'd be risk of breaking cases that gave the right answers before. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
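[Editorial note] Tom's proposed fix, terminating the whole join rather than merely skipping the row once a NULL appears in the first merge key under nulls-high ordering, can be sketched as follows. This is an illustrative Python model with unique keys per input; the real logic lives in nodeMergejoin.c and is considerably more involved.

```python
def merge_join(left, right):
    """Equality merge join of two sorted key lists.  None plays the role of
    NULL and sorts last (nulls-high).  Keys are assumed unique per input."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        l, r = left[i], right[j]
        if l is None or r is None:
            # A NULL key can't match anything, and with nulls sorted high
            # every remaining row on that side is NULL too.  Rather than
            # advancing past it one row at a time (the pre-fix behavior,
            # which still reads all the trailing NULLs), stop the join.
            break
        if l == r:
            out.append(l); i += 1; j += 1
        elif l < r:
            i += 1
        else:
            j += 1
    return out

# The trailing NULLs in the larger input are never touched:
print(merge_join([1, 2] + [None] * 5, [2, 3]))   # [2]
```

With many NULLs, as in the modified example from the referenced message, this is the difference between scanning the whole input and stopping at the first NULL.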
Re: [HACKERS] Fwd: Hiding data in postgresql
On Tue, May 25, 2010 at 3:39 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, (I tried the question in another forum first) Does someone have any ideas how I can hide data without the meta data noticing? To explain further, I would like to save some collection of data where the meta-data does not see it. I am trying to do some security through obscurity. It is for research purposes. For example, populate a table with 1000 rows, but the meta-data only knows of about 500 of them? Only on an export of a dump can you find all the data again. Or maybe to make a hidden duplicate schema that can point to the hidden data? Does someone have any good ideas on how to achieve this or something similar? Kind regards Hector On Mon, May 24, 2010 at 9:16 PM, Hector Beyers hqbey...@gmail.com wrote: Hi guys, does ANYONE have any tips on hiding data on a database server? This means that data is stored in places that is not necessarily picked up in the schema of the database. I am doing some research on databases and need some direction. Any help or direction will be highly appreciated. Kind regards Hector Not sure if this helpful, but be sure to know about views, which can be used to filter out rows of a table. 
Example:

CREATE TABLE foo (name TEXT, visible BOOL);
INSERT INTO foo VALUES ('two', true);
INSERT INTO foo VALUES ('three', true);
INSERT INTO foo VALUES ('four', false);
INSERT INTO foo VALUES ('five', true);
INSERT INTO foo VALUES ('six', false);
INSERT INTO foo VALUES ('seven', true);
INSERT INTO foo VALUES ('eight', false);
INSERT INTO foo VALUES ('nine', false);
INSERT INTO foo VALUES ('ten', false);
INSERT INTO foo VALUES ('eleven', true);
CREATE VIEW foo_view AS SELECT foo.name FROM foo WHERE visible = true;

=> SELECT * FROM foo;
  name  | visible
--------+---------
 two    | t
 three  | t
 four   | f
 five   | t
 six    | f
 seven  | t
 eight  | f
 nine   | f
 ten    | f
 eleven | t
(10 rows)

=> SELECT * FROM foo_view;
  name
--------
 two
 three
 five
 seven
 eleven
(5 rows)

Note that views are SELECT-only, but you can use CREATE RULE to simulate an updatable view. You may also want to read about Veil: http://veil.projects.postgresql.org/curdocs/main.html -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Jan Wieck janwi...@yahoo.com writes: No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful to not fall into the trap of assuming that they're exactly ordered.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronization levels in SR
On Tue, 2010-05-25 at 21:19 +0200, Yeb Havinga wrote: Simon Riggs wrote: How we handle degraded mode is important, yes. Whatever parameters we choose the problem will remain the same. Should we just ignore degraded mode and respond as if nothing bad had happened? Most people would say not. If we specify server1 = synch and server2 = async we then also need to specify what happens if server1 is down. People might often specify if (server1 == down) server2 = synch. I have a hard time imagining including async servers in the quorum. If an async servers vote is necessary to reach quorum due to a 'real' sync standby server failure, it would mean that the async-intended standby is now also in sync with the master transactions. IMHO this is a bad situation, since instead of the DBA getting the error: not enough sync standbys to reach quorum, he'll now get database is slow complaints, only to find out later that too much sync standby servers went south. (under the assumption that async servers are mostly on too slow links to consider for sync standby). Yeh, there's difficulty either way. We don't need to think of servers as being synch or async, more likely we would rate them in terms of typical synchronisation delay. So yeh, calling them fast and slow in terms of synchronisation delay makes sense. Some people with low xact rate and high need for protection might want to switch across to the slow server and keep running. If not, the max_synch_delay would trip and you would then select synch_failure_action = rollback. The realistic response is to add a second fast sync server, to allow you to stay up even when you lose one of the fast servers. That now gives you 4 servers and the failure modes start to get real complex. Specifying rules to achieve what you're after would be much harder. Some people might want that, but most people won't in the general case and if they did specify them they'd likely get them wrong. 
All of these issues show why I want to specify the synchronisation mode as a USERSET. That will allow us to specify more easily which parts of our application are important when the cluster is degraded and which data is so critical it must reach multiple servers. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
Robert Haas robertmh...@gmail.com wrote: maybe we should get serializable working and committed on one node first and then worry about how to distribute it. I think there might be other approaches to this problem Well, I've got two or three other ideas on how we can manage this for HS, but since I now realize that I've totally misunderstood the main use case for this (which is to support trigger-based replication), I'd like to be clear on something before letting it drop. The big question is, do such replicas need to support serializable access to the data modified by serializable transactions in the source database? That is, is there a need for such replicas to only see states which are possible in some serial order of execution of serializable transactions on the source database? Or to phrase the same question a third way, should there be a way to run queries on such replicas with confidence that what is viewed is consistent with user-defined constraints and business rules? If not, there's no intersection between this feature and SSI. If there is, I think we should think through at least a general strategy sooner, rather than later. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after it's been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then going back to the origin database to get the actual replication log you're looking for, is simpler and more efficient? I don't think so. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Alvaro, This sounds like extending Xid to 64 bits, without having to store the high bits everywhere. Was this discussed at the PGCon devs meeting? Essentially, yes. One of the main objections to raising XID to 64 bits has been the per-row overhead. But adding 4 bytes per page wouldn't be much of an impact. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
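The arithmetic behind the idea can be sketched as follows, under the assumption (mine, for illustration only) that the proposed 4-byte page field holds a wraparound count that combines with each tuple's 32-bit XID:

```python
# Sketch of the 4-bytes-per-page idea: a 32-bit wraparound count
# ("epoch") in the page header combines with the 32-bit XID stored in
# each tuple to give an effective 64-bit XID. Purely illustrative; the
# names and the needs_freeze rule are assumptions, not PostgreSQL code.

XID_BITS = 32

def effective_xid(page_epoch, tuple_xid):
    """Combine the page's wraparound count with a tuple's 32-bit XID."""
    return (page_epoch << XID_BITS) | tuple_xid

def needs_freeze(page_epoch, tuple_xid, next_xid64, max_age):
    """With 64-bit effective XIDs, freezing is no longer needed to
    avoid wraparound -- only to allow clog truncation, on whatever
    schedule (max_age) the available clog disk space permits."""
    return next_xid64 - effective_xid(page_epoch, tuple_xid) > max_age
```

This is why Heikki's point upthread matters: the comparison only works if relfrozenxid/datfrozenxid are widened as well, so both sides of the subtraction are 64-bit.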
Re: [HACKERS] Exposing the Xact commit order to the user
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after it's been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then going back to the origin database to get the actual replication log you're looking for, is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on the *origin*, using the exact same code that the current WALSender provides, plus some logic to read through the WAL records and extract commits/aborts. That seems much simpler than the proposal you outlined and, as SR shows, it's low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. 
-- Simon Riggs www.2ndQuadrant.com
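Conceptually, the extraction Simon describes is just a filter over the ordered WAL stream. A sketch, with an invented record layout (real WAL records are binary and would be read with the xlog reader in C, not as Python tuples):

```python
# Sketch of extracting commit order by scanning a WAL stream. The
# record shape (lsn, xid, rmgr, info) is invented for illustration;
# the point is only that commit order falls out of WAL write order,
# keyed by XLogRecPtr, with no separate event ticks needed.

def commit_order(wal_records):
    """Return (lsn, xid) pairs for commit records, in the order they
    were written to WAL -- the order the changes became visible."""
    return [(lsn, xid) for lsn, xid, rmgr, info in wal_records
            if rmgr == "XACT" and info == "COMMIT"]
```

Aborts and other record types simply fall through the filter, which is why the scan-the-whole-WAL cost that Jan objects to is the real issue, not the extraction logic itself.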
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
On 5/24/2010 9:30 AM, Heikki Linnakangas wrote: On 22/05/10 16:35, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: From a discussion at dinner at pgcon, I wanted to send this to the list for people to poke holes in it: Somebody (I think Joe or Heikki) poked a big hole in this last night at the Royal Oak. Me. Although the scheme would get rid of the need to replace old XIDs with FrozenXid, it does not get rid of the need to set hint bits before you can truncate CLOG. Hmm, we don't rely on setting hint bits to truncate CLOG anymore (http://archives.postgresql.org/pgsql-committers/2006-11/msg00026.php). It's the replacement of xids with FrozenXid that matters; the hint bits are really just hints. Doesn't change the conclusion, though: you still need to replace XIDs with FrozenXids to truncate the clog. Conceivably we could keep around more than 2^32 transactions in clog with this scheme, but then you need a lot more space for the clog. But perhaps it would be better to do that than to launch anti-wraparound vacuums, or to refuse more updates in the extreme cases. Correct. The problem actually is aborted transactions. Just because an XID is really old doesn't mean it was committed. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Confused about the buffer pool size
MMK, But it does not tell me anything about what the actual buffer size is. How do I know what the real buffer size is? I am using 8.4.4 and I am running only one query at a time. Please move this discussion to the pgsql-general or pgsql-performance lists. pgsql-hackers is for working on PostgreSQL code, and further questions on this list will probably not be answered. Other than that, I have no idea what you mean by buffer size, nor why you need to know it. I'd suggest starting your post on the other mailing list by explaining what specific problem you're trying to solve. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Idea for getting rid of VACUUM FREEZE on cold pages
Correct. The problem actually is aborted transactions. Just because an XID is really old doesn't mean it was committed. Yes, that's the main issue with my idea; XIDs which fell off the CLOG would become visible even if they'd aborted. Do we get a bit in the visibility map for a page which has aborted transaction rows on it? -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:50 PM, Simon Riggs wrote: On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote: On 5/25/2010 12:03 PM, Simon Riggs wrote: On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote: In some systems (data warehousing, replication), the order of commits is important, since that is the order in which changes have become visible. This information could theoretically be extracted from the WAL, but scanning the entire WAL just to extract this tidbit of information would be excruciatingly painful. I think it would be quite simple to read WAL. WALSender reads the WAL file after it's been flushed, so it would be simple for it to read a blob of WAL and then extract the commit order from it. Overall though, it would be easier and more efficient to *add* info to WAL and then do all this processing *after* WAL has been transported elsewhere. Extracting info with DDL triggers, normal triggers, commit order and everything else seems like too much work to me. Every other RDBMS has moved away from trigger-based replication and we should give that serious consideration also. Reading the entire WAL just to find all COMMIT records, then going back to the origin database to get the actual replication log you're looking for, is simpler and more efficient? I don't think so. Agreed, but I think I've not explained myself well enough. I proposed two completely separate ideas; the first one was this: If you must get commit order, get it from WAL on the *origin*, using the exact same code that the current WALSender provides, plus some logic to read through the WAL records and extract commits/aborts. That seems much simpler than the proposal you outlined and, as SR shows, it's low latency as well since commits write to WAL. No need to generate event ticks either, just use XLogRecPtrs as WALSender already does. I see no problem with integrating that into core, technically or philosophically. 
Which means that if I want to allow a consumer of that commit order data to go offline for three days or so to replicate the 5 requested low-volume tables, the origin needs to hang on to the entire WAL log from all 100 other high-volume tables? Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [spf:guess] Re: [HACKERS] ROLLBACK TO SAVEPOINT
Florian Pflug wrote: On May 25, 2010, at 12:18 , Heikki Linnakangas wrote: On 25/05/10 13:03, Florian Pflug wrote: On May 25, 2010, at 6:08 , Sam Vilain wrote: http://www.postgresql.org/docs/8.4/static/sql-savepoint.html led us to believe that if you roll back to the same savepoint name twice in a row, you might start walking back through the savepoints. I guess I missed the note on ROLLBACK TO SAVEPOINT that that is not how it works. Here is the section: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established. In PostgreSQL, the old savepoint is kept, though only the more recent one will be used when rolling back or releasing. (Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT.) Otherwise, SAVEPOINT is fully SQL conforming. I'm confused. The sentence in brackets Releasing the newer savepoint will cause the older one to again become accessible to ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT implies that you *will* walk backwards through all the savepoints named a if you repeatedly issue ROLLBACK TO SAVEPOINT a, no? If that is not how it actually works, then this whole paragraph is wrong, I'd say. Releasing the newer savepoint will cause the older one to again become accessible, as the doc says, but rolling back to a savepoint does not implicitly release it. You'll have to use RELEASE SAVEPOINT for that. Ah, now I get it. Thanks. Would changing Releasing the newer savepoint will cause ... to Explicitly releasing the newer savepoint or maybe even Explicitly releasing the newer savepoint with RELEASE SAVEPOINT will cause ... make things clearer? Yes, probably - your misreading matches my misreading of it :-) There is another way you can get there - releasing a savepoint taken before the re-used savepoint name will also release the savepoints after it. 
i.e.:

savepoint foo;
savepoint bar;
savepoint foo;
release savepoint bar;
release savepoint foo;

After the first release, the second 'foo' savepoint is gone. I think this is a key advantage in saving the old savepoints. Cheers, Sam
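The semantics being discussed can be modelled as a simple stack. This toy model (mine, not PostgreSQL code) reproduces both behaviours from the thread: ROLLBACK TO targets the most recent savepoint of a given name without destroying it, while RELEASE destroys the savepoint and everything established after it, re-exposing any older savepoint of the same name:

```python
class SavepointStack:
    """Toy model of PostgreSQL savepoint semantics as described in the
    docs: old savepoints with a reused name are kept; ROLLBACK TO uses
    the most recent one and does not release it; RELEASE destroys the
    target plus everything established after it."""

    def __init__(self):
        self.stack = []

    def savepoint(self, name):
        # Reusing a name does NOT destroy the older savepoint.
        self.stack.append(name)

    def _find(self, name):
        # The most recent savepoint with this name is the target.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == name:
                return i
        raise LookupError(name)

    def rollback_to(self, name):
        i = self._find(name)
        # Work after the savepoint is undone, but the savepoint itself
        # survives and can be rolled back to again.
        del self.stack[i + 1:]

    def release(self, name):
        i = self._find(name)
        # The savepoint and everything after it are destroyed,
        # re-exposing any older savepoint of the same name.
        del self.stack[i:]
```

Running Sam's sequence through this model shows the second 'foo' disappearing when 'bar' is released, exactly as described.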
Re: [HACKERS] tsvector pg_stats seems quite a bit off.
On 19/05/10 21:01, Jesper Krogh wrote: The document base is around 350,000 documents and I have set the statistics target on the tsvector column to 1000, since 100 seems way off. So for tsvectors the statistics target means more or less "at any time track at most 10 * target lexemes simultaneously", where "track" means keeping them in memory while going through the tuples being analysed. Remember that the measure is in lexemes, not whole tsvectors, and the factor of 10 is meant to approximate the average number of unique lexemes in a tsvector. If your documents are very large, this might not be a good approximation.

# ANALYZE verbose reference (document_tsvector);
INFO: analyzing reference
INFO: reference: scanned 14486 of 14486 pages, containing 350174 live rows and 6027 dead rows; 30 rows in sample, 350174 estimated total rows
ANALYZE

OK, so analyze examined almost all rows. Looking into most_common_freqs I find

# select count(unnest) from (select unnest(most_common_freqs) from pg_stats where attname = 'document_tsvector') as foo;
 count
-------
  2810
(1 row)

So the size of the most_common_freqs and most_common_vals rows in pg_statistics for tsvectors has an upper bound of stats-target * 10 (for the same reasons as mentioned before) and holds lexemes (not whole tsvectors). What also happens is that lexemes that were seen only once while going through the analysed set are discarded, so that's why you can actually get fewer entries in these arrays, even if your document set is big. But the distribution is very flat at the end; the last 128 values are exactly 1.00189e-05, which means that any term sitting outside the array would get an estimate of 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows Yeah, this might mean that you could try cranking up the stats target a lot, to make the set of simultaneously tracked lexemes larger (it will cost time and memory during analyse though). 
If the documents have completely different contents, what can happen is that almost all lexemes are only seen a few times and get removed during the pruning of the working set. I have seen similar behaviour while working on the typanalyze function for tsvectors. So far I have no idea if this is bad or good, so a couple of sample runs of stuff that is sitting outside the most_common_vals array: [gathered statistics suck] So the most_common_vals seems to contain a lot of values that should never have been kept in favor of other values that are more common. In practice, just cranking the statistics target up high enough seems to solve the problem, but doesn't there seem to be something wrong in how the statistics are collected? The algorithm to determine most common vals does not do it accurately. That would require keeping all lexemes from the analysed tsvectors in memory, which would be impractical. If you want to learn more about the algorithm being used, try reading http://www.vldb.org/conf/2002/S10P03.pdf and the corresponding comments in ts_typanalyze.c. It would be interesting to know what the average size of a tsvector in your document set is (ie. how many unique lexemes a tsvector has on average). In general, the tsvector typanalyze function is designed to suck less than the constant factor that has been used previously, but it only works really well on the most common lexemes (thus preventing most gross misestimates). I'm not very surprised it misses the difference between 1612/350174 and 4/350174, and I'm quite happy that it gets that if you set the stats target really high :o) There's always the possibility that there's some stupid bug there, but I think you just set your expectations too high for the tsvector typanalyze function. If you could come up with a better way of doing tsvector stats, that would be awesome - currently it's just doing its best to prevent the most outrageous errors. 
Cheers, Jan
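For reference, the pruning behaviour described in this exchange comes from the Lossy Counting algorithm in the cited paper (Manku & Motwani, VLDB 2002). A condensed sketch follows; it is simplified relative to ts_typanalyze.c, which tracks lexemes per tsvector and derives the bucket width from the stats target:

```python
# Lossy Counting (Manku & Motwani, VLDB 2002), simplified sketch.
# Each tracked item carries (count, delta), where delta bounds how
# many occurrences may have been missed before tracking began. At
# every bucket boundary, items whose count + delta falls at or below
# the current bucket number are pruned -- which is why lexemes seen
# only a few times vanish from most_common_vals.

def lossy_count(stream, epsilon):
    width = int(1 / epsilon)          # bucket width
    counts, bucket = {}, 1
    for n, item in enumerate(stream, start=1):
        freq, delta = counts.get(item, (0, bucket - 1))
        counts[item] = (freq + 1, delta)
        if n % width == 0:            # bucket boundary: prune
            counts = {k: (f, d) for k, (f, d) in counts.items()
                      if f + d > bucket}
            bucket += 1
    return {k: f for k, (f, d) in counts.items()}
```

A frequent lexeme survives with an exact (or near-exact) count, while a lexeme seen once is almost always pruned at the next boundary; raising the stats target shrinks epsilon and widens the buckets, which is why cranking it up keeps more of the mid-frequency lexemes.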
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/25 21:44), Stephen Frost wrote: KaiGai, * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: OK, the attached patch reworks it according to the way. Reviewing this patch, there are a whole slew of problems. #1: REALLY BIG ISSUE- Insufficient comment updates. You've changed function definitions in a pretty serious way as well as moved some code around such that some of the previous comments don't make sense. You have got to update comments when you're writing a patch. Indeed, the places I see changes in comments are when you've removed what appears to still be valid and appropriate comments, or places where you've added comments which are just blatantly wrong with the submitted patch. Hmm. I'll revise/add the comments around the patched code. #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of this patch- don't, we're in feature-freeze right now and should not be adding hooks at this time. The patch is intended to be submitted for the v9.1 development, not v9.0, isn't it? #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to utils/acl and instead added executor/executor.h to ri_triggers.c. I don't particularly like that. I admit that DoCopy() already knew about the executor, and if that were the only case outside of the executor where ExecCheckRTPerms() was getting called it'd probably be alright, but we already have another place that wants to use it, so let's move it to a more appropriate place. Sorry, I'm a bit confused. It seemed to me you suggested to utilize ExecCheckRTPerms() rather than moving its logic anywhere, so I kept it here. (Did I misunderstand?) If so, though, I doubt utils/acl is the best place for the moved ExecCheckRTPerms(), because the checker function calls both the default acl functions and an optional external security function. It means ExecCheckRTPerms() is a caller of acl functions, not an acl function itself, isn't it? 
In other words, I wonder whether we should categorize a function X which calls A and (optionally) B as a part of A. I agreed the checker function is not a part of the executor, but it is also not a part of the acl functions in my opinion. If we are disinclined to create a new directory to deploy the checker function, my preference is src/backend/utils/adt/security.c and src/include/utils/security.h . #4: As mentioned previously, the hook (which should be added in a separate patch anyway) makes more sense to me to be in ExecCheckRTPerms(), not ExecCheckRTEPerms(). This also means that we need to be calling ExecCheckRTPerms() from DoCopy and RI_Initial_Check(), to make sure that the hook gets called. To that end, I wouldn't even expose ExecCheckRTEPerms() outside of acl.c. Also, there should be a big comment about not using or calling ExecCheckRTEPerms() directly outside of ExecCheckRTPerms() since the hook would then be skipped. I don't have any difference in preference between ExecCheckRTPerms() and ExecCheckRTEPerms(), except that DoCopy() and RI_Initial_Check() have to call the checker function with list_make1(rte), instead of rte. #5: In DoCopy, you can remove relPerms and remainingPerms, but I'd probably leave required_access up near the top and then just use it to set rte-required_access directly rather than moving that bit deep down into the function. OK, #6: I haven't checked yet, but if there are other things in an RTE which would make sense in the DoCopy case, beyond just what's needed for the permissions checking, and which wouldn't be 'correct' with a NULL'd value, I would set those. Yes, we're building the RTE to check permissions, but we don't want someone downstream to be surprised when they make a change to something in the permissions checking and discover that a value in the RTE they expected to be there wasn't valid. Even more so, if there are function helpers which can be used to build an RTE, we should be using them. The same goes for RI_Initial_Check(). 
Are you saying something like makeFuncExpr()? I basically agree. However, should it be done in this patch? #7: I'd move the conditional if (is_from) into the foreach which is building the columnsSet and eliminate the need for columnsSet; I don't see that it's really adding much here. OK, #8: When moving ExecCheckRTPerms(), you should rename it to be more like the other function calls in acl.h Perhaps pg_rangetbl_aclcheck()? Also, it should return an actual AclResult instead of just true/false. See the comments in #3. And, if the caller has to handle aclcheck_error(), the user cannot distinguish access violation errors between the default PG permissions and any other external security stuff, isn't it? Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] ExecutorCheckPerms() hook
(2010/05/25 22:59), Stephen Frost wrote: * KaiGai Kohei (kai...@ak.jp.nec.com) wrote: * DoCopy() and RI_Initial_Check() were reworked to call ExecCheckRTEPerms() with a locally built RangeTblEntry. Maybe I missed it somewhere, but we still need to address the case where the user doesn't have those SELECT permissions that we're looking for in RI_Initial_Check(), right? KaiGai, your patch should be addressing that in a similar fashion.. The reason why the user must have SELECT privileges on the PK/FK tables is that validateForeignKeyConstraint() internally calls SPI_execute() to verify FK constraints can be established between two tables (even in the fallback path). And, the reason why RI_Initial_Check() now calls pg_class_aclcheck() is to try to avoid an unexpected access violation error because of SPI_execute(). However, the fallback path also calls SPI_execute(), so I concluded the permission checks in RI_Initial_Check() are nonsense. However, it is an independent issue right now, so I kept it as is. The origin of the matter is that we apply unnecessary permission checks, although it is purely internal use and the user was already checked for permission to execute the whole ALTER TABLE statement. Right? Thanks, -- KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Synchronization levels in SR
On May 25, 2010, at 22:16 , Simon Riggs wrote: All of these issues show why I want to specify the synchronisation mode as a USERSET. That will allow us to specify more easily which parts of our application are important when the cluster is degraded and which data is so critical it must reach multiple servers. Hm, but since flushing an important COMMIT to the slave(s) will also need to flush all previous (potentially unimportant) COMMITs to the slave(s), isn't there a substantial chance of priority-inversion type problems there? Then again, if asynchronous commit proved to be effective then so will this, probably, so maybe my fear is unjustified. best regards, Florian Pflug
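Florian's priority-inversion worry follows directly from WAL being a single ordered stream: a standby cannot hold the record for an important commit without also holding every record written before it. A toy model (the data layout here is purely illustrative, not PostgreSQL structures):

```python
# Toy model of why one synchronous commit drags earlier asynchronous
# commits with it: WAL is a single ordered stream, so acknowledging a
# sync commit at LSN X requires shipping every record up to X first.

def flushed_by_sync_commit(commits, shipped_upto, sync_lsn):
    """Return the commits (including async-intended ones) that must
    reach the standby before the sync commit at sync_lsn can be
    acknowledged. `commits` are dicts with an "lsn" key, in WAL order."""
    return [c for c in commits if shipped_upto < c["lsn"] <= sync_lsn]
```

So a burst of unimportant commits queued just before an important one inflates the important commit's wait, which is the inversion being described; the saving grace is that the same coupling already exists for asynchronous commit's WAL flushes.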
Re: [HACKERS] Fwd: Hiding data in postgresql
Hector, * Hector Beyers (hqbey...@gmail.com) wrote: Does someone have any ideas how I can hide data without the meta data noticing? To explain further, I would like to save some collection of data where the meta-data does not see it. I am trying to do some security through obscurity. It is for research purposes. This explanation doesn't actually explain anything, near as I can tell. Perhaps if you would share what your actual problem is, we could recommend a solution. Thanks, Stephen
[HACKERS] Open Item: pg_controldata - machine readable?
There is an open item pg_controldata - machine readable? in the list: http://wiki.postgresql.org/wiki/PostgreSQL_9.0_Open_Items The proposal by Joe Conway is adding a new contrib module. http://archives.postgresql.org/message-id/4b959d7a.6010...@joeconway.com http://github.com/jconway/pg_controldata Should we add the module to 9.0? If we do so, SGML documentation is required. IMHO, I'd like to put the feature into the core instead of a contrib module, but we cannot change the catalog version at this time. So, how about providing control file information through the pg_settings view? We will retrieve those variables as GUC options. Regards, --- Takahiro Itagaki NTT Open Source Software Center
[HACKERS] Fwd: PDXPUG Day at OSCON 2010
It was recommended to me to forward this to -hackers. Regards, Mark -- Forwarded message -- From: Mark Wong mark...@gmail.com Date: Tue, May 18, 2010 at 6:57 AM Subject: PDXPUG Day at OSCON 2010 To: pgsql-annou...@postgresql.org Thanks to the generosity of O'Reilly, we will be having a full day of free PostgreSQL sessions on Sunday, July 18 at the Oregon Convention Center. Location details and schedule information can be found on the wiki at: http://wiki.postgresql.org/wiki/PDXPUGDay2010 We will ask for a $30 donation towards PostgreSQL at the conference, but no one will be turned away. Sign up here: https://spreadsheets.google.com/viewform?hl=enformkey=dDVBRnJGWVlZRkdycFdXbXVuYTNiU2c6MQ Please submit your talk proposal here: http://spreadsheets.google.com/viewform?hl=enformkey=dHBFMGFIWmxJUzhRM3R6dXVlWWxYQ1E6MQ. Proposals will be decided upon by June 7th and updated on the wiki.
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: #2: REALLY BIG ISSUE- You've added ExecutorCheckPerms_hook as part of this patch- don't, we're in feature-freeze right now and should not be adding hooks at this time. The patch is intended to be submitted for the v9.1 development, not v9.0, isn't it? That really depends on if this is actually fixing a bug in the existing code or not. I'm on the fence about that at the moment, to be honest. I was trying to find if we explicitly say that SELECT rights are needed to reference a column but wasn't able to. If every code path is expecting that, then perhaps we should just document it that way and move on. In that case, all these changes would be for 9.1. If we decide the current behavior is a bug, it might be something which could be fixed in 9.0 and maybe back-patched. In *either* case, given that one is a 'clean-up' patch and the other is 'new functionality', they should be independent *anyway*. Small incremental changes that don't break things when applied are what we're shooting for here. #3: You didn't move ExecCheckRTPerms() and ExecCheckRTEPerms() to utils/acl and instead added executor/executor.h to ri_triggers.c. I don't particularly like that. I admit that DoCopy() already knew about the executor, and if that were the only case outside of the executor where ExecCheckRTPerms() was getting called it'd probably be alright, but we already have another place that wants to use it, so let's move it to a more appropriate place. Sorry, I'm a bit confused. It seemed to me you suggested to utilize ExecCheckRTPerms() rather than moving its logic anywhere, so I kept it here. (Did I misunderstand?) I'm talking about moving the whole function (all 3 lines of it) to somewhere else and then reworking the function to be more appropriate based on its new location (including renaming and changing arguments and return values, as appropriate). 
If so, though, I doubt utils/acl is the best place for the moved ExecCheckRTPerms(), because the checker function calls both the default acl functions and an optional external security function. Can you explain why you think that having a function in utils/acl (eg: include/utils/acl.h and backend/utils/aclchk.c) which calls default acl functions and allows for an external hook would be a bad idea? It means ExecCheckRTPerms() is a caller of acl functions, not an acl function itself, isn't it? It's providing a higher-level service, sure, but there's nothing particularly interesting or special about what it's doing in this case, and, we need it in multiple places. Why duplicate it? I agreed the checker function is not a part of the executor, but it is also not a part of the acl functions in my opinion. If we are disinclined to create a new directory to deploy the checker function, my preference is src/backend/utils/adt/security.c and src/include/utils/security.h . We don't need a new directory or file for one function, as Robert already pointed out. #6: I haven't checked yet, but if there are other things in an RTE which would make sense in the DoCopy case, beyond just what's needed for the permissions checking, and which wouldn't be 'correct' with a NULL'd value, I would set those. Yes, we're building the RTE to check permissions, but we don't want someone downstream to be surprised when they make a change to something in the permissions checking and discover that a value in the RTE they expected to be there wasn't valid. Even more so, if there are function helpers which can be used to build an RTE, we should be using them. The same goes for RI_Initial_Check(). Are you saying something like makeFuncExpr()? I basically agree. However, should it be done in this patch? Actually, I mean looking for, and using, things like markRTEForSelectPriv() and addRangeTableEntry() or addRangeTableEntryForRelation(). 
#8: When moving ExecCheckRTPerms(), you should rename it to be more like the other function calls in acl.h Perhaps pg_rangetbl_aclcheck()? Also, it should return an actual AclResult instead of just true/false. See the comments in #3. And, if the caller has to handle aclcheck_error(), the user cannot distinguish access violation errors between the default PG permissions and any other external security stuff, isn't it? I'm not suggesting that the caller handle aclcheck_error().. ExecCheckRTPerms() could just as easily have a flag which indicates if it will call aclcheck_error() or not, and if not, to return an AclResult to the caller. That flag could then be passed to ExecCheckRTEPerms() as you have it now. Thanks, Stephen
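The shape being proposed here, default ACL checks first, then an optional external hook, with a flag selecting between erroring out and returning a result, might look like the following sketch (Python pseudocode with invented names; not PostgreSQL's actual C API):

```python
# Sketch of the proposed checker shape: run the default ACL checks,
# then an optional external security hook, with `abort_on_violation`
# choosing between raising and returning a result. All names here are
# hypothetical illustrations of the design under discussion.

ACLCHECK_OK, ACLCHECK_NO_PRIV = 0, 1
executor_check_perms_hook = None   # e.g. set by an external security module

def default_acl_check(rte):
    # Stand-in for pg_class_aclcheck() and friends.
    return ACLCHECK_OK if rte.get("granted", True) else ACLCHECK_NO_PRIV

def check_rangetbl_perms(range_table, abort_on_violation=True):
    """Single entry point: callers (executor, COPY, RI checks) all go
    through here, so the external hook can never be skipped."""
    for rte in range_table:
        result = default_acl_check(rte)
        if result == ACLCHECK_OK and executor_check_perms_hook:
            result = executor_check_perms_hook(rte)  # external MAC check
        if result != ACLCHECK_OK:
            if abort_on_violation:
                raise PermissionError(rte["relname"])
            return result
    return ACLCHECK_OK
```

The flag answers KaiGai's objection: a caller that wants to distinguish default-ACL violations from external-security violations passes abort_on_violation=False and inspects the result, while ordinary callers keep the old raise-on-error behaviour.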
Re: [HACKERS] ExecutorCheckPerms() hook
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: The reason why the user must have SELECT privileges on the PK/FK tables is that validateForeignKeyConstraint() internally calls SPI_execute() to verify FK constraints can be established between two tables (even in the fallback path). And, the reason why RI_Initial_Check() now calls pg_class_aclcheck() is to try to avoid an unexpected access violation error because of SPI_execute(). However, the fallback path also calls SPI_execute(), so I concluded the permission checks in RI_Initial_Check() are nonsense. That may be the case. I'm certainly more concerned with a bug in the existing code than any new code that we're working on. The question is- is this actually a user-visible bug? Or do we require that a user creating an FK needs SELECT rights on the primary table? If so, it's still a bug, but at that point it's a bug in our documentation where we don't mention that SELECT rights are also needed. Anyone know what the SQL spec says about this (if anything...)? However, it is an independent issue right now, so I kept it as is. Uh, I don't really see it as independent.. If we have a bug there that we need to fix, and it's because we have two different bits of code trying to do the same checking, we should fix it by eliminating the duplicate checking, imv. The origin of the matter is that we apply unnecessary permission checks, although it is purely internal use and the user was already checked for permission to execute the whole ALTER TABLE statement. Right? That's certainly a nice thought, but given the complexity in ALTER TABLE, in particular with regard to permissions checking, I have no idea if what it's doing is intentional or wrong. Thanks, Stephen
[HACKERS] Open Item: invalid declspec for PG_MODULE_MAGIC
This open item is about replacing the PGDLLIMPORT markers on PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with __declspec(dllexport), because those symbols are always exported by user modules rather than by the core code. http://archives.postgresql.org/message-id/20100329184705.a60e.52131...@oss.ntt.co.jp The fix is simple, so I think we can include it in 9.0. Arguable issues for the patch are: * Is there a better name than PGMODULEEXPORT? I like PGDLLEXPORT because it is similar to PGDLLIMPORT, but it might be too similar. * Should we backport the fix to previous releases? I'd like to backport it; it should not break any existing third-party modules, because without the fix they cannot even be built on Windows. Regards, --- Takahiro Itagaki NTT Open Source Software Center
[HACKERS] libpq should not be using SSL_CTX_set_client_cert_cb
I've been experimenting with SSL setups involving chains of CA certificates, ie, where the server or client cert itself is signed by an intermediate CA rather than a trusted root CA. This appears to work well enough on the server side if you configure the server correctly (see discussion of bug #5468). However, libpq is not able to work with a client certificate unless that cert is directly signed by a CA that the server trusts (ie, one listed directly in the server's root.crt file). This is because there is no good way to feed back any intermediate CA certs to the server. The man page for SSL_CTX_set_client_cert_cb says in so many words that the client_cert_cb API is maldesigned: BUGS The client_cert_cb() cannot return a complete certificate chain, it can only return one client certificate. If the chain only has a length of 2, the root CA certificate may be omitted according to the TLS standard and thus a standard conforming answer can be sent to the server. For a longer chain, the client must send the complete chain (with the option to leave out the root CA certificate). This can only be accomplished by either adding the intermediate CA certificates into the trusted certificate store for the SSL_CTX object (resulting in having to add CA certificates that otherwise maybe would not be trusted), or by adding the chain certificates using the SSL_CTX_add_extra_chain_cert(3) function, which is only available for the SSL_CTX object as a whole and that therefore probably can only apply for one client certificate, making the concept of the callback function (to allow the choice from several certificates) questionable. It strikes me that we could not only fix this case, but make the libpq code simpler and more like the backend case, if we got rid of client_cert_cb and instead preloaded the ~/.postgresql/postgresql.crt file using SSL_CTX_use_certificate_chain_file(). Then, using an indirectly signed client cert would only require including the full cert chain in that file. 
So I'm wondering if there was any specific reason behind using the callback API to start with. Anybody remember? regards, tom lane
Re: [HACKERS] Exposing the Xact commit order to the user
On 5/25/2010 4:16 PM, Tom Lane wrote: Jan Wieck janwi...@yahoo.com writes: No, I meant how will the *function* know, if a superuser and/or some background process can purge records at any time? The data contains timestamps which are supposedly taken in commit order. You can *not* rely on the commit timestamps to be in exact order. (Perhaps approximate ordering is good enough for what you want here, but just be careful not to fall into the trap of assuming that they're exactly ordered.) I am well aware of the fact that commit timestamps within the WAL can go backwards, and that the serial numbers in this proposed implementation of commit order can even differ from what the timestamps AND the WAL are saying. As long as the serial number (record position inside the segment) is determined while the transaction still holds all its locks, this is going to be good enough for what async replication users today are used to. Again, it will not magically make it possible to determine, post mortem, a serializable order of actions that happened in transactions running at the read committed isolation level. I don't even think that is possible at all. And I don't think anyone has proposed a solution for that problem anyway. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin