Re: [HACKERS] Ask help for putting SP-Gist into postgresql
Pei He wrote:
> I am trying to put the SP-GiST package, a general index framework for
> space-partitioning trees, into the PostgreSQL source code. SP-GiST was
> developed for PostgreSQL 8.0, but it no longer works with the new
> version. So, for the submitted patch, what version of PostgreSQL is
> required? And is there anybody who can help with that? Please cc my
> email when you reply.

Start here: http://wiki.postgresql.org/wiki/Submitting_a_Patch

Present your idea in detail on this mailing list as a first step, and ask for help as you need it. PostgreSQL is currently in feature freeze for the upcoming 9.0 release, so you should target the first commitfest for 9.1. Your patches should apply to CVS HEAD.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Streaming Replication and CopyOutResponse message
Hi,

At the end of the handshake for replication, the CopyOutResponse message might not arrive at the standby for a while if there is no WAL record to send, and walreceiver would get stuck until that message arrived. This is not a big problem, but it should be fixed. The cause is that walsender hasn't called pq_flush() there. The attached patch changes walsender so as to call pq_flush() immediately when sending that message. Is it worth applying the patch?

Regards,
-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

walsender_pg_flush_v1.patch
Description: Binary data
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
Disclaimer: the following is only my view on the matter and is not meant as representative of the project's views, which are reached through discussion and consensus. IOW, my 2¢.

Joseph Adams <joeyadams3.14...@gmail.com> writes:
> Perhaps we could have some sort of LET statement that allows the
> client to pass data to the server, then have libpq automatically wrap
> queries with the LET statement (when necessary). Here's what it would
> look like to the PHP scripter:

My first reaction on reading this is to keep the idea but extend the WITH syntax instead, so you'd have:

    WITH BINDINGS bob AS ( current_user = 'bob' )
    SELECT answer FROM secrets WHERE user = current_user AND question = $1;

That said, you can already (as of 8.4) do the following:

    WITH bob(name) AS ( SELECT 'bob' )
    SELECT answer FROM secrets, bob WHERE user = bob.name AND question = $1;

The syntax WITH bob(current_user) is not possible because the standard uses current_user as a keyword (IIUC), but you get the idea.

> Granted, it would be silly to pass the value itself to the server over
> and over, so a serious implementation would probably pass a context ID,
> and these variable assignments would live in the backend instead.

I wonder if creating a temporary (I mean per-backend) new catalog in which to store the bindings (or whatever you name them), and then allowing another syntax like the following, would help here:

    WITH BINDINGS bob ()
    SELECT ...

The real problem, though, is that when using a transaction-level pooling system you want to tie your bindings to a transaction, not to a session. So I'm not sure that storing the bindings in a local backend catalog is a must-have feature.

Regards,
-- 
dim
Re: [HACKERS] Streaming Replication and CopyOutResponse message
Fujii Masao wrote:
> At the end of the handshake for replication, a CopyOutResponse message
> might not arrive at the standby for a while if there is no WAL record
> to send. Also walreceiver would get stuck until that message has
> arrived. This is not a big problem, but should be fixed. The cause is
> that walsender hasn't called pq_flush() there. The attached patch
> changes walsender so as to call pq_flush() immediately when sending
> that message. Is it worth applying the patch?

Yeah, applied.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Incrementally Updated Backups and restartpoints
On Thu, Mar 4, 2010 at 9:00 PM, Fujii Masao <masao.fu...@gmail.com> wrote:
> And, when we start an archive recovery from a backup taken from the
> standby, we seem to reach a safe starting point before the database
> has actually become consistent. It's because backupStartLoc is zero.
> Isn't this an issue?

This issue still seems to happen. So should it be fixed for 9.0, or is a note in the documentation enough for 9.0? I'm leaning towards the latter. I'm thinking of adding something like the following to section 25.6, "Incrementally Updated Backups". Thoughts?

    The pg_control file must be backed up first. This avoids the problem
    that we might fail to restore a consistent database state because
    recovery starts from a restart point later than the start of the
    backup.

    When recovering from an incrementally updated backup, the server can
    begin accepting connections and complete the recovery "successfully"
    before the database has become consistent. To avoid this, you must
    check whether the database has become consistent by comparing the
    progress of the recovery with the backup's ending WAL location, both
    when archive recovery ends and before your users try to connect to
    the server. The backup's ending WAL location must therefore be
    captured in advance by calling the pg_last_xlog_replay_location
    function at the end of the backup; the progress of the recovery is
    also obtained from the pg_last_xlog_replay_location function.

Regards,
-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
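[Editor's sketch] The consistency check described above can be illustrated outside the server. pg_last_xlog_replay_location returns WAL locations as strings like '0/3000158'; a minimal, hypothetical client-side comparison (the function names here are illustrative, not PostgreSQL API) might look like:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a WAL location of the form 'XXX/YYY' (two hex fields)
    into a single comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def backup_is_consistent(replay_lsn: str, backup_end_lsn: str) -> bool:
    """True once recovery has replayed at least up to the WAL location
    captured (via pg_last_xlog_replay_location) at the end of the backup."""
    return parse_lsn(replay_lsn) >= parse_lsn(backup_end_lsn)
```

Until such a check passes, connections should be held off even though the server is already accepting them.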
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
On 3/26/10 5:42 AM +0200, Joseph Adams wrote:
> // New libpq function
> pg_set('current_user', 'bob');
> $result = pg_query_params(
>     'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>     array('Birth place'));
>
> What this really does is something like:
>
> $result = pg_query_params(
>     'LET current_user=$1 DO $2 $3',
>     array(
>         'bob',
>         'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>         'Birth place'));

Looks to me like this is already achievable with custom GUCs and views.

Regards,
Marko Tiikkaja
[HACKERS] Postgres 9.0 Alpha, GIN indexes, and intarray contrib module, and SQL Functions
Greetings,

I'm using the intarray contrib module[1] GIN indexes on arrays (obviously) with Postgres 9.0 alpha 4. I am querying for the existence of an element. When I run the query normally, it performs as I'd expect (very fast), and the explain plan looks like what I'd expect, using the index and so on. When I put the query in an SQL function, it performs very slowly, as if I didn't have an index on it. (I thought it might be using the default @> operator, so I tried the @@ operator, which isn't a normal array operator, but it still had the issue.) I also tried putting the query in plpgsql to see if that changed things (it didn't).

Then I uninstalled the intarray contrib module and created regular GIN indexes on the arrays. When querying by hand it performs fast, and it also seems to use these indexes when in the UDF (so everything works when I use the standard GIN indexes). Is there something I am missing in my function declarations, or is this possibly a bug?

My table looks like:

create table followship_rollups (
    max_id bigint not null, -- for sorting
    user_id int not null,
    append_frozen bool default false not null,
    follower_ids int[] not null CHECK (my_array_length(follower_ids) <= 100),
    friend_ids int[] not null CHECK (my_array_length(friend_ids) <= 100)
);

create index followship_rollups_expanded_follower on followship_rollups
    using gin (follower_ids gin__int_ops);
create index followship_rollups_expanded_friend on followship_rollups
    using gin (friend_ids gin__int_ops);

My function is:

-- Return true or false if the friendship exists
create or replace function has_follower(user_id integer, follower_id integer)
returns boolean
language sql
as $$
    (select true from followship_rollups
     where user_id = $1 and follower_ids @> ARRAY[$2])
    union all
    (select false)
    limit 1;
$$;

Full SQL is here if you need more info:
http://github.com/mikelikespie/followships/blob/c1a7e8c16159018d7d1154a11169315ac6560178/followships_2.sql
(please forgive the sloppiness, I haven't cleaned it up yet)

Thanks,
Mike Lewis

[1] http://www.postgresql.org/docs/current/static/intarray.html

-- 
Michael Lewis
lolrus.org
mikelikes...@gmail.com
Re: [HACKERS] Postgres 9.0 Alpha, GIN indexes, and intarray contrib module, and SQL Functions
Mike Lewis <mikelikes...@gmail.com> writes:
> I'm using the intarray contrib module[1] GIN indexes on arrays
> (obviously) with Postgres 9.0 alpha 4. I am querying for the existence
> of an element. When I run the query normally, it performs as I'd
> expect (very fast). The explain plan looks like what I'd expect, using
> the index and whatnot. When I put this in an SQL function, it performs
> very slowly, as if I didn't have an index on it. (I thought it might
> be using the default @> operator, so I tried the @@ operator, which
> isn't a normal array operator, but it still had the issue.) I also
> tried putting the query in plpgsql to see if that changed things (it
> didn't). Then I uninstalled the intarray contrib module and created
> regular GIN indexes on the arrays. When querying by hand it performs
> fast, and it also seems to use these indexes when in the UDF (so
> everything works when I use the standard GIN indexes).

I wonder whether you are dealing with a search path issue. Was the function being created/used with the same search_path as you were testing by hand?

Some other remarks not directly related to the complaint:

> My table looks like:
>
> create table followship_rollups (
>     max_id bigint not null, -- for sorting
>     user_id int not null,
>     append_frozen bool default false not null,
>     follower_ids int[] not null CHECK (my_array_length(follower_ids) <= 100),
>     friend_ids int[] not null CHECK (my_array_length(friend_ids) <= 100)
> );
>
> create index followship_rollups_expanded_follower on followship_rollups
>     using gin (follower_ids gin__int_ops);
> create index followship_rollups_expanded_friend on followship_rollups
>     using gin (friend_ids gin__int_ops);

Isn't user_id the primary key for this table? If so, why isn't it declared that way? If not, what the heck *is* the intended structure of this table? I would think that the performance-critical part of your query ought to be the user_id = $1 condition, and that the GIN indexes wouldn't be useful at all (for this particular query, anyway).

> create or replace function has_follower(user_id integer, follower_id integer)
> returns boolean
> language sql
> as $$
>     (select true from followship_rollups
>      where user_id = $1 and follower_ids @> ARRAY[$2])
>     union all
>     (select false)
>     limit 1;
> $$;

FWIW, this strikes me as pretty non-idiomatic SQL. I'd have written it as just:

    ... as $$
        select exists(select 1 from followship_rollups
                      where user_id = $1 and follower_ids @> ARRAY[$2])
    $$;

That doesn't seem to explain your performance complaint, though.

regards, tom lane
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
On Thu, Mar 25, 2010 at 11:42 PM, Joseph Adams <joeyadams3.14...@gmail.com> wrote:
> From what I can tell, a big problem with my jails idea (as well as the
> variables Robert described) is that there really isn't a way to store
> context in the backend specifically for the end client (e.g. a PHP
> script), due to connection pooling. Also, I almost feel that storing
> such context would be a disadvantage, as it would harm some of the
> referential transparency that pooling and caching take advantage of,
> now and in the future. However, I'm not going to give up :)
>
> Perhaps we could have some sort of LET statement that allows the
> client to pass data to the server, then have libpq automatically wrap
> queries with the LET statement (when necessary). Here's what it would
> look like to the PHP scripter:
>
>     // New libpq function
>     pg_set('current_user', 'bob');
>     $result = pg_query_params(
>         'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>         array('Birth place'));
>
> What this really does is something like:
>
>     $result = pg_query_params(
>         'LET current_user=$1 DO $2 $3',
>         array(
>             'bob',
>             'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>             'Birth place'));
>
> Here, the hypothetical LET statement executes a query string, binding
> current_user to our desired value. The client library would wrap all
> future queries in this fashion. Granted, it would be silly to pass the
> value itself to the server over and over, so a serious implementation
> would probably pass a context ID, and these variable assignments would
> live in the backend instead. Moreover, LET is a terrible keyword choice
> here, considering most PostgreSQL users won't need to use it explicitly
> thanks to additional libpq support. Alternatively (this might require
> changing the client/server protocol), a context ID could be passed back
> and forth, thus providing a way to tell clients apart.
>
> Implementing this idea requires adding to the backend and to libpq. The
> backend would need at least two new statements. One would set a variable
> of a session context, creating one if necessary and returning its ID.
> Another would execute a string as a parameter and bind both immediate
> arguments and session context to it. libpq would need a function to set
> a variable, and it would need to wrap the queries it sends out with LET
> statements if necessary.
>
> Note that these variables can't be used in pre-defined functions unless
> they are somehow declared in advance. One idea would be to first add
> global variable support, then make session-local contexts able to
> temporarily reassign those variables. Another would be to provide an
> explicit declaration statement.
>
> Would this make a good proposal for GSoC? Implement the backend part of
> my proposal, and create a proof-of-concept wrapper demonstrating it.
> This way, I add the new statements, but don't mess around with existing
> functionality too much.

Hmm. I'm not sure exactly what problem you're trying to solve here. I don't think this is a particularly good design for supporting variables inside the server, since, well, it doesn't actually support variables inside the server. If we just want a crude hack for allowing the appearance of session-local server-side variables, that could be implemented entirely in client code - in fact it could be done as a thin wrapper around libpq that just does textual substitution of the variables actually referenced by a particular query. That wouldn't require any modifications to core PostgreSQL at all, and it would probably perform better too, since you'd not send all the unnecessary variables with every query.

Of course, you're 100% correct that connection pooling won't necessarily play well with this feature, but that doesn't mean we shouldn't implement it. For one thing, not everybody uses connection pooling; for another, I think global variables (which would behave sort of like a sequence - they'd act sort of like a single-column, single-row relation) would also be useful, and those WOULD work in a connection-pooling environment.

But I think implementing any kind of variable support in the backend is way too ambitious a project for a first-time hacker to get done in a couple of months. I would guess that's a two-year project for a first-time hacker or a one-year project for an experienced hacker (or a three-week project for Tom Lane). Here are some ideas from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer to the right size for GSoC:

- Allow administrators to cancel multi-statement idle transactions
- Check for unreferenced table files created by transactions that were in-progress when the server terminated abruptly
- Add functions to check correctness of configuration files before they are loaded live
- Add JSON (JavaScript Object Notation) data type [tricky part will be getting community buy-in on which JSON library to use]
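[Editor's sketch] The "thin wrapper that just does textual substitution" suggested above can be sketched in a few lines. This is a toy illustration, not a real libpq binding; the `:name` placeholder syntax and all identifiers are assumptions of the sketch:

```python
import re

class VariableWrapper:
    """Toy client-side session variables: each query is rewritten before
    it is sent, substituting bound values as quoted SQL literals, so only
    the variables a query actually references travel to the server."""

    def __init__(self):
        self._vars = {}

    def set(self, name, value):
        self._vars[name] = value

    def rewrite(self, query):
        # Replace each :name reference with a safely quoted literal
        # (single quotes doubled, as SQL requires).
        def quote(match):
            value = str(self._vars[match.group(1)])
            return "'" + value.replace("'", "''") + "'"
        return re.sub(r":(\w+)", quote, query)

wrapper = VariableWrapper()
wrapper.set("current_user", "bob")
sql = wrapper.rewrite(
    "SELECT answer FROM secrets WHERE user = :current_user AND question = $1")
# sql: SELECT answer FROM secrets WHERE user = 'bob' AND question = $1
```

As noted in the thread, this needs no server changes at all, though it inherits the usual caveats of textual substitution (it is not a substitute for real parameter binding).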
[HACKERS] TODO list updates
In reading through the TODO list, I noticed a few things that I think are done, may be done, or may be partially done. See below. Thoughts?

...Robert

- Add missing operators for geometric data types - this is at least partly done; not sure if it is entirely done.
- Add OR REPLACE to CREATE LANGUAGE - Done.
- Implement full support for window framing clauses - Not sure if we made any progress on this or not.
- Add PQescapeIdentifierConn() - Done, as PQescapeIdentifier().
- Add UNIQUE capability to non-btree indexes - This is one application of exclusion constraints, so I'd say this is done.
- [GIN] Support empty indexed values (such as zero-element arrays) properly - I think this might be done but I'm not sure.
- Clean up VACUUM FULL's klugy transaction management - I think this is moot with the new cluster-based VF.
Re: [HACKERS] TODO list updates
Robert Haas <robertmh...@gmail.com> writes:
> In reading through the TODO list, I noticed a few things that I think
> are done, may be done, or may be partially done. See below. Thoughts?

> Add missing operators for geometric data types - this is at least
> partly done; not sure if it is entirely done.

I think you are thinking of Teodor's point_ops patch, but what that did was add GiST index support for some existing geometric operators. This TODO item suffers from the all-too-common sin of being uselessly vague; it doesn't identify what operators it thinks are missing.

> Implement full support for window framing clauses - Not sure if we
> made any progress on this or not.

We made some progress for 9.0, but there's still a lot of unimplemented territory.

> Add UNIQUE capability to non-btree indexes - This is one application
> of exclusion constraints, so I'd say this is done.

I think exclusion constraints address this as much as it needs to be addressed for GiST and GIN, neither of which has equality as a core concept. Hash, however, does have that. So I'd vote for narrowing the entry to "support UNIQUE natively in hash indexes" and moving it to the hash-index section.

> [GIN] Support empty indexed values (such as zero-element arrays)
> properly - I think this might be done but I'm not sure.

Not done, not even started. See the referenced discussions, or the limitations acknowledged here:
http://developer.postgresql.org/pgdocs/postgres/gin-limit.html

> Clean up VACUUM FULL's klugy transaction management - I think this is
> moot with the new cluster-based VF.

Right, that one's either done or no longer relevant, depending on how you want to look at it.

There is another TODO item that I was just struck by while reading your response to Joseph Adams:

    Fix system views like pg_stat_all_tables to use set-returning
    functions, rather than views of per-column functions

What is the point of this proposal? The reason for the per-column function implementation was to allow people to slice and dice the underlying data their own way. Changing to monolithic SRFs would make that harder, and I don't see that it's buying anything especially useful. I'd vote for taking this off unless someone can explain why it's an improvement.

regards, tom lane
Re: [HACKERS] last_statrequest is in the future
Tom Lane wrote:
> Should we redesign the stats signaling logic to work around this, or
> just hope we can nag kernel people into fixing it?

Even if there were something to be done in kernel space, how many years from now would it be before it made this problem go away for the majority of near-future 9.0 users? We've been seeing a fairly regular stream of "pgstat wait timeout" reports come in. The one I reported was from recent hardware and a very mainstream Linux setup. I'm not real optimistic that this one can get punted toward the OS and get anything done about it in time to head off problems in the field.

This particularly pathological case with jaguar is great because it's made it possible to nail down how to report the problem. I don't think it's possible to draw a strong conclusion about how to resolve this just from that data, though. What we probably need is for your additional logging code to catch this again on some systems that are not so obviously broken, to get a better idea what a normal (rather than extreme) manifestation looks like: how much skew is showing up, whether those do in fact correspond with the wait timeouts, that sort of thing.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
Re: [HACKERS] last_statrequest is in the future
Greg Smith <g...@2ndquadrant.com> writes:
> Even if there was something to be done in kernel space, how many years
> from now would it be before it made this problem go away for the
> majority of near-future 9.0 users? We've been seeing a fairly regular
> stream of "pgstat wait timeout" reports come in. The one I reported
> was from recent hardware and a very mainstream Linux setup.

The behavior I'm seeing in jaguar's reports is difficult to characterize as anything except a seriously broken machine (see the followup message with further details). It may be that there is some not-obviously-broken-hardware case that we need to deal with, but I have not seen evidence of it yet. Can you reproduce your test case with CVS HEAD and send the log messages?

regards, tom lane
Re: [HACKERS] TODO list updates
On Fri, Mar 26, 2010 at 12:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> In reading through the TODO list, I noticed a few things that I think
>> are done, may be done, or may be partially done. See below. Thoughts?

>> Add missing operators for geometric data types - this is at least
>> partly done; not sure if it is entirely done.
>
> I think you are thinking of Teodor's point_ops patch, but what that
> did was add GiST index support for some existing geometric operators.
> This TODO item suffers from the all-too-common sin of being uselessly
> vague; it doesn't identify what operators it thinks are missing.

Well, I'm thinking of that because the TODO item has a reference to the point_ops patch. I think we should remove this item and replace it with any more specific suggestions someone cares to raise.

>> Implement full support for window framing clauses - Not sure if we
>> made any progress on this or not.
>
> We made some progress for 9.0, but there's still a lot of
> unimplemented territory.

Perhaps it could be made more specific.

>> Add UNIQUE capability to non-btree indexes - This is one application
>> of exclusion constraints, so I'd say this is done.
>
> I think exclusion constraints address this as much as it needs to be
> addressed for GiST and GIN, neither of which has equality as a core
> concept. Hash, however, does have that. So I'd vote for narrowing the
> entry to "support UNIQUE natively in hash indexes" and moving it to
> the hash-index section.

As far as I know, exclusion constraints would work with hash opclasses also. Do you think there's an advantage to having something hash-specific, a la the btree-specific stuff we already have?

>> [GIN] Support empty indexed values (such as zero-element arrays)
>> properly - I think this might be done but I'm not sure.
>
> Not done, not even started. See the referenced discussions, or the
> limitations acknowledged here:
> http://developer.postgresql.org/pgdocs/postgres/gin-limit.html

OK.

>> Clean up VACUUM FULL's klugy transaction management - I think this is
>> moot with the new cluster-based VF.
>
> Right, that one's either done or no longer relevant, depending on how
> you want to look at it.

OK.

> There is another TODO item that I was just struck by while reading
> your response to Joseph Adams:
>
>     Fix system views like pg_stat_all_tables to use set-returning
>     functions, rather than views of per-column functions
>
> What is the point of this proposal? The reason for the per-column
> function implementation was to allow people to slice and dice the
> underlying data their own way. Changing to monolithic SRFs would make
> that harder, and I don't see that it's buying anything especially
> useful. I'd vote for taking this off unless someone can explain why
> it's an improvement.

I assumed it would be faster and less likely to return inconsistent results. If that's not the case then I agree.

...Robert
[HACKERS] More idle thoughts
The Linux kernel had a big push to reduce latency, and one of the tricks they used was to replace the usual interrupt points with a call which noted how long it had been since the last interrupt point. It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by conditionally having it call a function which calls gettimeofday and compares with the timestamp recorded at the previous CFI().

I think the only gotcha would be CHECK_FOR_INTERRUPTS() calls which occur after a syscall that an actual interrupt would interrupt. They might show a long wait even though an actual interrupt would trigger them right away -- the two that come to mind are reading from the client and blocking on a semaphore. So I think we would need to add a parameter to indicate whether that's the case for a given call site. I think it would be easy to know what value to put there, because I believe we always do a CFI() immediately after such syscalls, but I'm not 100% sure that's the case.

Obviously this wouldn't run all the time. I'm not even sure it should run on the buildfarm, because doing an extra syscall at these places might mask timing bugs by synchronizing the bus in a lot of places. But even a few buildfarm animals might uncover places where we don't respond to C-c or hold up the sinval messages, etc. It also doesn't replace our current method of responding to user complaints -- many if not all of them are relatively subtle cases where the user is doing something unusual to create a loop that doesn't normally occur. But we won't know that unless we try.

-- 
greg
Re: [HACKERS] TODO list updates
Robert Haas <robertmh...@gmail.com> writes:
> As far as I know, exclusion constraints would work with hash opclasses
> also.

Yeah, they do.

> Do you think there's an advantage to having something that is
> hash-specific a la the btree-specific stuff we already have?

Sure: it'll be more efficient because of not needing an AFTER trigger. Now of course this assumes that hash indexes ever grow up enough to play in the big leagues, which is a pretty dubious proposition; but having real unique-constraint support is one of the things people would want from them if they ever did get to be credible production choices. So this isn't something I'd put at the front of the TODO list for hash indexes, but it ought to be in there somewhere.

regards, tom lane
Re: [HACKERS] More idle thoughts
Greg Stark <st...@mit.edu> writes:
> The Linux kernel had a big push to reduce latency, and one of the
> tricks they did was they replaced the usual interrupt points with a
> call which noted how long it had been since the last interrupt point.
> It occurs to me we could do the same for CHECK_FOR_INTERRUPTS() by
> conditionally having it call a function which calls gettimeofday and
> compares with the previous timestamp received at the last CFI().

Hmm. The only thing you could find out is which CFI call was reporting the long delay, not what the path of control in between had been. If the long computation had been down inside some function that's no longer active, this wouldn't help much to track it down. Still, it'd be better than no data at all.

> I think the only gotcha would be at CHECK_FOR_INTERRUPTS() calls which
> occur after a syscall that an actual interrupt would interrupt. They
> might show a long wait even though an actual interrupt would trigger
> them right away -- the two that come to mind are reading from the
> client and blocking on a semaphore. So I think we would need to add a
> parameter to indicate if that's the case for a given call-site. I
> think it would be easy to know what value to put there because I think
> we always do a CFI() immediately after such syscalls but I'm not 100%
> sure that's the case.

I'm afraid you're being too optimistic about that. What I'd think about is just adding an extra call to reset the delay-time counter after each such syscall, rather than having two kinds of CFI.

> Obviously this wouldn't run all the time.

Yeah, it would increase the overhead of CFI by orders of magnitude, so you wouldn't even want to think about building a production version that way. But it might be a useful testing option.

regards, tom lane
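[Editor's sketch] To make the mechanics concrete, here is a rough sketch of the idea in Python rather than C. All names, call sites, and the threshold are invented for illustration; the reset hook follows the reset-the-counter approach suggested in the reply, discounting waits at syscalls that an actual interrupt would have broken out of:

```python
import time

class CFIMonitor:
    """Records, at each simulated CHECK_FOR_INTERRUPTS() call, how long
    it has been since the previous check, flagging call sites where the
    gap exceeds a threshold."""

    def __init__(self, threshold_s=0.01):
        self.threshold_s = threshold_s
        self.last = time.monotonic()
        self.reports = []  # (call_site, delay) pairs worth investigating

    def reset(self):
        # Call right after a blocking syscall (reading from the client,
        # waiting on a semaphore): the time spent there shouldn't count,
        # since a real interrupt would have ended the wait immediately.
        self.last = time.monotonic()

    def check_for_interrupts(self, call_site):
        now = time.monotonic()
        delay = now - self.last
        if delay > self.threshold_s:
            self.reports.append((call_site, delay))
        self.last = now

mon = CFIMonitor(threshold_s=0.01)
time.sleep(0.05)                        # a stretch of "computation" with no checks
mon.check_for_interrupts("nodeSort.c:123")
time.sleep(0.05)                        # a blocking wait that an interrupt would end
mon.reset()                             # ...so discount it
mon.check_for_interrupts("pqcomm.c:456")
# mon.reports now holds only the first call site
```

As the thread notes, this only identifies which check reported the delay, not the path of control that led there; it is a diagnostic aid, not a profiler.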
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote:
> When replaying the deletion record, the standby could look at all the
> heap tuples whose index pointers are being removed, to see which one
> was newest.

Long after coding this, I now realise this really is a dumb-ass idea. There is no spoon. The index tuples did once point at valid heap tuples, but:

1. heap tuples are deleted
2. heap tuples become dead
3. index tuples can now be marked killed
4. index tuples removed

Heap tuples can be removed at step 2, but index tuples can't be removed until step 4, so the dead index tuples can't be followed reliably to read the xids. They might be the correct ones, might not, and there is no way to tell.

So that puts this back to exactly the place we were on 31 Jan:

On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote:
> We can always put it back later if nothing better gets implemented.

So, barring huge injections of insight, I'll be putting back the generation of OldestXmin() into the btree delete path.

-- 
Simon Riggs
www.2ndQuadrant.com
Re: [HACKERS] Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Simon Riggs <si...@2ndquadrant.com> writes:
> On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote:
>> When replaying the deletion record, the standby could look at all the
>> heap tuples whose index pointers are being removed, to see which one
>> was newest.
>
> Long after coding this, I now realise this really is a dumb-ass idea.
> There is no spoon. The index tuples did once point at valid heap
> tuples.
> 1. heap tuples are deleted
> 2. heap tuples become dead
> 3. index tuples can now be marked killed
> 4. index tuples removed
> Heap tuples can be removed at step 2, index tuples can't be removed
> until step 4.

Uh, no, heap tuples can't be removed until after all index entries that are pointing at them have been removed. Please tell me you have not broken this.

> So, barring huge injections of insight, I'll be putting back the
> generation of OldestXmin() into the btree delete path.

I stand by my previous comments about where this is going to end up.

regards, tom lane
[HACKERS] Performance improvement for unique checks
Hi,

Since we insert a new index entry for every non-HOT update made to the table, we inevitably perform a uniqueness check against the older version of the newly inserted row, even when the indexed values are not updated. (HOT updates don't go to the index at all.) During that check:

a) the page containing the index entry is exclusively locked, and
b) we visit the heap page to test the old version with HeapTupleSatisfiesDirty.

If, during the index insert, we carried the information about the old tuple (its tuple-id) from the heap update, we could skip the uniqueness check against that tuple, as we know for sure it won't satisfy the visibility criteria. If the table has 'n' unique indexes, this avoids 'n' heap tuple lookups, and it also increases concurrency in the btree, as the write lock duration is reduced.

Any comments?

Thanks,
Gokul.
Re: [HACKERS] dtester-0.1 released
> Markus Wanner wrote:
> Thanks, I can now fetch off the postgres-dtester repository, but I'm
> still getting the error when I try to clone the /dtester repository.

Steve,

Oh, thank you for pointing this out. I've fixed that for now.

> Do I need to run git update-server-info after every pull for the http
> protocol to work? Or just once to initialize the repository?

Note that the git protocol is said to be more efficient. At least that's what I've heard. You might want to use that instead (especially if http continues to pose problems).

Kind Regards

Markus Wanner

> -- 
> Steve Singer
> Afilias Canada
> Data Services Developer
> 416-673-1142
Re: [HACKERS] join removal
On Sun, Jul 19, 2009 at 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Yeah. Ideally this sort of thing would happen in prepjointree.c, but we
> don't have nearly enough information at that stage.

Tom,

You've mentioned this point a couple of times - what is ideal about prepjointree?

Reading through the optimizer functions section of src/backend/optimizer/README, it seems like the earliest point at which we could do this would be just before the call to make_one_rel(). I think that would eliminate some redundant computation. Right now, if we have A LJ B LJ C, we'll try joining A to C and discover that it's removable; later we'll try joining {A B} to C and again discover that C is removable.

But maybe it could be attacked from the other direction: look at C and try to figure out whether there's some baserel or joinrel to which we can join it removably. If we find one, we don't need to bother generating seq scan or index paths for C, RelOptInfos for joinrels that include C, etc.

The problem with moving it back any further seems to be that we have to know which clauses are mergejoinable before we can do the necessary computations; I think flattening of the query tree has to already be done too.

Obviously this is all 9.1 material, but PG East has me thinking about this topic again...

...Robert
Re: [HACKERS] join removal
Robert Haas robertmh...@gmail.com writes:
> On Sun, Jul 19, 2009 at 10:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Yeah. Ideally this sort of thing would happen in prepjointree.c, but we
>> don't have nearly enough information at that stage.
> You've mentioned this point a couple of times - what is ideal about
> prepjointree?

Well, it's the place where we like to rearrange the join tree, and dropping a join altogether certainly counts as that. We can't do it there, at the moment anyway, for lack of supporting data --- but if it were possible to do it at that time I think it'd be the cleanest approach.

> Reading through the optimizer functions section of
> src/backend/optimizer/README, it seems like the earliest point at which
> we could do this would be just before the call to make_one_rel(). I
> think that would eliminate some redundant computation.

Maybe. It would also add a new pass over the join tree that, in 99% of cases, would make no useful contribution whatever. It's possible that this would still be cheaper than a lot of failed calls to join_is_removable, but I'm unconvinced --- we were able to make the failure path through that pretty durn cheap for most simple cases. The approach you're sketching still involves a combinatorial search, so I doubt it's going to be cheap.

I should maybe pause here a moment to say that my approach to considering the cost of new planner optimizations is to focus on how much time the added code will eat on queries where it fails to make any useful contribution. If the optimization wins, then presumably you will make back at execution time whatever it might have cost you to plan (if this is debatable, you probably shouldn't be bothering with the idea at all). So the pain will come from adding planning time on queries where there isn't any runtime payoff; especially for something like join removal, which only applies to a small minority of queries anyway. Therefore I'm suspicious of adding new passes over the query structure if they are only going to be used for low-probability wins.

> The problem with moving it back any further seems to be that we have to
> know which clauses are mergejoinable before we can do the necessary
> computations; I think flattening of the query tree has to already be
> done too.

Yeah. I had been thinking that we could build the RelOptInfo and IndexOptInfo structs earlier, but really all of the clause-classification work done by deconstruct_jointree is important as well for this function's purposes, so it's not that easy to push back to prepjointree :-(. I suspect that any such attempt would end up requiring a massive rethinking of the order of operations in the planner. Maybe we should do that sometime but I'm not eager for it.

regards, tom lane
Re: [HACKERS] Proposal: access control jails (and introduction as aspiring GSoC student)
On Fri, Mar 26, 2010 at 12:07 PM, Robert Haas robertmh...@gmail.com wrote:
> Hmm. I'm not sure exactly what problem you're trying to solve here. I
> don't think this is a particularly good design for supporting variables
> inside the server, since, well, it doesn't actually support variables
> inside the server. If we just want a crude hack for allowing the
> appearance of session-local server-side variables, that could be
> implemented entirely in client code - in fact it could be done as a thin
> wrapper around libpq that just does textual substitution of the
> variables actually referenced by a particular query. That wouldn't
> require any modifications to core PostgreSQL at all, and it would
> probably perform better too since you'd not send all the unnecessary
> variables with every query.

One problem with a textual substitution is that implicit variable use (e.g. selecting from a view) can't be substituted, at least not trivially. As for sending unnecessary variables with every query, my idea was to store those variables in a global table keyed by context ID, then just send that context ID with every query.

> But, I think that implementing any kind of variable support in the
> backend is way too ambitious a project for a first-time hacker to get
> done in a couple of months. I would guess that's a two-year project for
> a first time hacker or a one-year project for an experienced hacker (or
> a three week project for Tom Lane). Here are some ideas from
> http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer to the
> right size for GSOC: [...]
> Add JSON (JavaScript Object Notation) data type [tricky part will be
> getting community buy-in on which JSON library to use]

The JSON idea caught my eye. I guess the best approach here would be not to use an external library, but to implement it manually using flex/bison. Most of the work would probably revolve around converting things to/from PostgreSQL types, writing test cases, and getting it integrated; writing the parser itself should be a piece of cake. At first, I figured adding JSON support would be almost too trivial: just parse it, then you're done. After seeing that src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit more to it :)

I skimmed through some JSON implementations in C, and I didn't find any using bison/flex. From the looks of it, I do like JSON_parser ( http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears to be written for speed.

I think one benefit of adding JSON support is that it would provide a way to store EAV-type data with less overhead than XML (and no dependency on an external library). If this were the only goal, binary encoding would be even better. However, I suppose JSON is more popular and easier to work with in practice.