Re: [HACKERS] Block level concurrency during recovery
Simon Riggs wrote: I'm undecided as to whether this is too much code or too little. I'm somewhat uncertain as to the locking requirements for the physical scan during a vacuum. I've worked out various options if we need to change this. For the B-tree, an exclusive lock is enough to modify the contents of the page. A cleanup lock needs to be taken on every page to ensure that the vacuum doesn't finish and delete a heap tuple that's about to be accessed by an index scan. That could be handled in WAL replay by dropping the exclusive lock and re-acquiring the cleanup lock, but we can't do that for heap vacuums. However, we require that in b-tree vacuum, you take a cleanup lock on *every* leaf page of the index, not only those that you modify. That's a problem, because there's no trace of such pages in the WAL. PS. FSM doesn't need cleanup locks. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making pg_standby compression-friendly
Charles Duffy wrote: I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project in pgfoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support. Can't you decompress the files in whatever script you use to copy them to the archive location? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PGDay.it collation discussion notes
Dave Gudeman wrote: On Mon, Oct 20, 2008 at 2:28 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Tom Lane wrote: Another objection to this design is that it's completely unclear that functions from text to text should necessarily yield the same collation that went into them, but if you treat collation as a hard-wired part of the expression syntax tree you aren't going to be able to do anything else. (What will you do about functions/operators taking more than one text argument?) Whatever the spec says. Collation is intimately associated with the comparison operations, and doesn't make any sense anywhere else. Of course the comparison operator is involved in many areas such as index creation, ORDER BY, GROUP BY, etc. In order to support GROUP BY and hash joins on values with a collation type, you need to have a hash function corresponding to the collation. Yeah, those are all related to comparison operators. Looking at an individual value, collation just doesn't make sense. Collation is property of the comparison operation, not of a value. Collation can't be a property of the comparison operation because you don't know what comparison to use until you know the collation type of the value. Collation is a property of string values, just like scale and precision are properties of numeric values. And like those properties of numeric values, collation can be statically determined. The rules for determining what collation to use in an expression are similar in kind to the rules for determining what the resulting scale and precision of an arithmetic expression are. If you consider collation as just part of the type, a lot of things are easier. Yeah, the typmod of numerics and varchars is a good analogue, in the parser. The current rules for those are probably not exactly the same that the spec requires for collation, but it's definitely similar. 
This is a good way to implement collated comparisons, but it's not a new concept, just an additional argument to the comparison operator. It isn't necessary to create new concepts to handle collation when it fits so well into an existing concept, the type. For example, the difference between two indexes with collation is a difference in the type of the index -- just like the difference between a DECIMAL(10,4) index and a DECIMAL(20,2) index. Hmm. That could work. So collation would be an extra typmod on the string data types, and casting can be used to force a specific collation. I think we're missing some pieces, like passing the typmod to the comparison function; numeric comparison doesn't depend on the scale and precision, while collation would depend on the typmods. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Making pg_standby compression-friendly
On Thu, Oct 23, 2008 at 1:15 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Charles Duffy wrote: I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project in pgfoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support. Can't you decompress the files in whatever script you use to copy them to the archive location? To be sure I understand -- you're proposing a scenario in which the archive_command on the master compresses the files, passes them over to the slave while compressed, and then decompresses them on the slave for storage in their decompressed state? That succeeds in the goal of decreasing network bandwidth, but (1) isn't necessarily easy to implement over NFS, and (2) doesn't succeed in decreasing storage requirements on the slave. (While pg_standby's behavior is to delete segments which are no longer needed to keep a warm standby slave running, I maintain a separate archive for PITR use with hardlinked copies of those same archive segments; storage on the slave is a much bigger issue in this environment than it would be if the space used for segments were being deallocated as soon as pg_standby chose to unlink them). [Heikki, please accept my apologies for the initial off-list response; I wasn't paying enough attention to gmail's default reply behavior].
Re: [HACKERS] psql Feature request \set query
For making psql fully scriptable I would think it would be smarter to embed a full well-supported existing language rather than trying to invent a new one. If we do want to do that I would suggest looking at lua which is intended to be used this way. greg On 23 Oct 2008, at 05:45 AM, Tom Lane [EMAIL PROTECTED] wrote: Joshua Drake [EMAIL PROTECTED] writes: Robert Haas [EMAIL PROTECTED] wrote: We can't do anything unless we do everything is a recipe for failure. True enough, but a car doesn't roll without at least four wheels. Perhaps more to the point: you don't have to implement every part of a set of features at the same time, but it's a good idea to design them all together. Else you risk writing something that doesn't play nice with the other parts; and by the time you find out, it's too late to change without creating a backwards-compatibility problem. regards, tom lane
Re: [HACKERS] psql Feature request \set query
On 23 okt 2008, at 00.48, Alvaro Herrera [EMAIL PROTECTED] wrote: Josh Berkus wrote: Hackers, Just had a feature request from Wheeler, and I don't see why it shouldn't go on our todo list. I think you're asking for more scriptability in psql. Personally I think that would be a great idea, but we need a lot more than what's being proposed here. We'll also need loops, conditionals, etc. We've had patches for those submitted over the years, but one at a time they are easily rejected because they're so If you want full scripting, look at pgscript, one of this summer's GSoC projects. It'll ship in the next pgadmin and is also available as a standalone executable. /Magnus
Re: [HACKERS] psql Feature request \set query
2008/10/23 Greg Stark [EMAIL PROTECTED]: For making psql fully scriptable I would think it would be smarter to embed a full well-supported existing language rather than trying to invent a new one. plpgsql? regards Pavel If we do want to do that I would suggest looking at lua which is intended to be used this way. greg On 23 Oct 2008, at 05:45 AM, Tom Lane [EMAIL PROTECTED] wrote: Joshua Drake [EMAIL PROTECTED] writes: Robert Haas [EMAIL PROTECTED] wrote: We can't do anything unless we do everything is a recipe for failure. True enough, but a car doesn't roll without at least four wheels. Perhaps more to the point: you don't have to implement every part of a set of features at the same time, but it's a good idea to design them all together. Else you risk writing something that doesn't play nice with the other parts; and by the time you find out, it's too late to change without creating a backwards-compatibility problem. regards, tom lane
Re: [HACKERS] psql Feature request \set query
Pavel Stehule wrote: 2008/10/23 Greg Stark [EMAIL PROTECTED]: For making psql fully scriptable I would think it would be smarter to embed a full well-supported existing language rather than trying to invent a new one. plpgsql? I think plpgsql is a pretty good example of the sort of disaster you get when you invent a language from nearly scratch.
Re: [HACKERS] psql Feature request \set query
Magnus Hagander wrote: If you want full scripting, look at pgscript, one of this summer's GSoC projects. It'll ship in the next pgadmin and is also available as a standalone executable. Yet another language? What people asking for psql scriptability really want, in my estimate, is the ability to write SQL plus some control structures anywhere, in the server, in the client, or so that they don't have to know where. Commercial vendors have that: Oracle has PL/SQL as server-side language and client-side language. Microsoft has T-SQL. The others have similar offerings. What PostgreSQL has is one kind of extended SQL language in the terminal, another kind of extended SQL language in the server (but really two separate languages, one for functions, one outside of functions), and now yet another kind of extended SQL language in the GUI, each mutually wildly incompatible. And none of them is based on a standard. Something to think about ...
Re: [HACKERS] psql Feature request \set query
2008/10/23 Peter Eisentraut [EMAIL PROTECTED]: Magnus Hagander wrote: If you want full scripting, look at pgscript, one of this summer's GSoC projects. It'll ship in the next pgadmin and is also available as a standalone executable. Yet another language? It is even crazier - pgscript is based on T-SQL :). What I mainly miss in psql is #ifdef and #endif - conditional execution. Pavel What people asking for psql scriptability really want, in my estimate, is the ability to write SQL plus some control structures anywhere, in the server, in the client, or so that they don't have to know where. Commercial vendors have that: Oracle has PL/SQL as server-side language and client-side language. Microsoft has T-SQL. The others have similar offerings. What PostgreSQL has is one kind of extended SQL language in the terminal, another kind of extended SQL language in the server (but really two separate languages, one for functions, one outside of functions), and now yet another kind of extended SQL language in the GUI, each mutually wildly incompatible. And none of them is based on a standard. Something to think about ...
Re: [HACKERS] Deriving Recovery Snapshots
On Thu, 2008-10-23 at 08:40 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2008-10-22 at 21:47 +0100, Simon Riggs wrote: But once you reach 64 transactions, you'll need to write an extra WAL record for every subtransaction, which currently I've managed to avoid. Yes, I've managed to avoid it, but it will simplify the patch if you think it's not worth bothering with. This won't really affect anybody I've met running straight Postgres, but it may affect EDB. It's not a problem for me, but I was second guessing objections. If I do that then I can just pass the slotId in full on every WAL record, which simplifies a couple of other things also. So, does everybody accept that we will write a WAL record for every subtransaction assigned, once we hit the size limit of the subxid cache? i.e. currently 65th subxid and beyond. Would have to see the patch to understand what the code simplicity vs. extra WAL logging tradeoff really is. Well, if you're not certain now, then my initial feeling was correct. I don't think everybody would agree to that. The code simplification would be real, but I don't think it's that hard now. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
[HACKERS] Unicode escapes in literals
I would like to add an escape mechanism to PostgreSQL for entering arbitrary Unicode characters into string literals. We currently only have the option of entering the character directly via the keyboard or cut-and-paste, which is difficult for a number of reasons, such as when the font doesn't have the character, and entering the UTF8-encoded bytes using the E'...' strings, which is hardly usable. SQL has the following escape syntax for it: U&'special character: \xxxx' [ UESCAPE '\' ] where xxxx is the hexadecimal Unicode codepoint. So this is pretty much just another variant on what the E'...' syntax does. The trick is that since we have user-definable encoding conversion routines, we can't convert the Unicode codepoint to the server encoding in the scanner stage. I imagine there are two ways to address this: 1. Only support this syntax when the server encoding is UTF8. This would probably cover most use cases anyway. We could have limited support for characters in the ASCII range for all server encodings. 2. Convert this syntax to a function call. But that would then create a lot of inconsistencies, such as needing functional indexes for matches against what should really be a literal. I'd be happy to start with UTF8 support only. Other ideas?
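To make option 1 concrete, here is a minimal sketch of what the scanner could do when the server encoding is UTF8: decode each \xxxx escape directly into UTF-8 bytes. The function names and buffer handling are illustrative assumptions, not actual PostgreSQL scanner code; the escape character is fixed to backslash and codepoints are limited to the BMP (4 hex digits) for brevity.

```c
#include <assert.h>
#include <string.h>

/* hex digit to value; -1 on bad input (error handling elided) */
static int
hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* append one Unicode codepoint (< U+10000) as UTF-8 */
static char *
utf8_append(char *p, unsigned cp)
{
    if (cp < 0x80)
        *p++ = (char) cp;
    else if (cp < 0x800)
    {
        *p++ = (char) (0xC0 | (cp >> 6));
        *p++ = (char) (0x80 | (cp & 0x3F));
    }
    else
    {
        *p++ = (char) (0xE0 | (cp >> 12));
        *p++ = (char) (0x80 | ((cp >> 6) & 0x3F));
        *p++ = (char) (0x80 | (cp & 0x3F));
    }
    return p;
}

/* decode the body of a U&'...' literal into buf; returns buf */
static char *
decode_unicode_literal(const char *body, char *buf)
{
    char *p = buf;

    while (*body)
    {
        if (body[0] == '\\' && body[1] == '\\')
        {
            *p++ = '\\';        /* doubled escape char stands for itself */
            body += 2;
        }
        else if (body[0] == '\\')
        {
            unsigned cp = 0;
            int i;

            for (i = 1; i <= 4; i++)
                cp = cp * 16 + hexval(body[i]);
            p = utf8_append(p, cp);
            body += 5;
        }
        else
            *p++ = *body++;
    }
    *p = '\0';
    return buf;
}
```

Since the output is already UTF-8, no encoding conversion routine is involved, which is exactly why this only works when the server encoding is UTF8.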
Re: [HACKERS] Block level concurrency during recovery
On Thu, 2008-10-23 at 09:09 +0300, Heikki Linnakangas wrote: However, we require that in b-tree vacuum, you take a cleanup lock on *every* leaf page of the index, not only those that you modify. That's a problem, because there's no trace of such pages in the WAL. OK, good. Thanks for the second opinion. I'm glad you said that, cos I felt sure anybody reading the patch would say "what the hell does this bit do?". Now I can add it. My solution is fairly simple: As we pass through the table we keep track of which blocks need visiting, then append that information onto the next WAL record. If the last block doesn't contain removed rows, then we send a no-op message saying which blocks to visit. I'd already invented the XLOG_BTREE_VACUUM record, so now we just need to augment it further with two fields: an ordered array of blocks to visit, and a doit flag. Say we have a 10 block table, with rows to be removed on blocks 3, 4 and 8. As we visit all 10 in sequence we would issue WAL records:
XLOG_BTREE_VACUUM block 3 visitFirst {1, 2} doit = true
XLOG_BTREE_VACUUM block 4 visitFirst {} doit = true
XLOG_BTREE_VACUUM block 8 visitFirst {5, 6, 7} doit = true
XLOG_BTREE_VACUUM block 10 visitFirst {9} doit = false
So that allows us to issue the same number of WAL messages yet include all the required information to repeat the process correctly. (The blocks can be visited out of sequence in some cases, hence the ordered array of blocks to visit rather than just a first block value.) It would also be possible to introduce a special tweak there, which is that if the block is not in cache, don't read it in at all. If it's not in cache we know that nobody has a pin on it, so we don't need to read it in just to say "got the lock". That's icing for later. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
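The bookkeeping described above can be sketched as a toy model: walk the blocks in order, and whenever one has rows to remove, emit a record carrying the blocks silently visited since the previous record; any trailing clean blocks get a no-op record on the last block. The type and field names below follow the example in the mail (visitFirst, doit) but are made up for illustration; this is not real WAL or btree code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RECORDS 16
#define MAX_VISIT   16

typedef struct
{
    int   block;                  /* block the record is about */
    int   visit_first[MAX_VISIT]; /* blocks to cleanup-lock first */
    int   nvisit;
    bool  doit;                   /* does this block itself have removals? */
} VacuumRecord;

/* simulate the proposed record stream for blocks 1..nblocks;
 * has_removals[] is indexed by block number; returns record count */
static int
simulate_vacuum_wal(const bool *has_removals, int nblocks,
                    VacuumRecord *out)
{
    int nrec = 0;
    int pending[MAX_VISIT];
    int npending = 0;
    int blk;

    for (blk = 1; blk <= nblocks; blk++)
    {
        if (has_removals[blk])
        {
            VacuumRecord *r = &out[nrec++];
            int i;

            r->block = blk;
            r->doit = true;
            r->nvisit = npending;
            for (i = 0; i < npending; i++)
                r->visit_first[i] = pending[i];
            npending = 0;       /* those blocks are now accounted for */
        }
        else
            pending[npending++] = blk;
    }

    /* trailing clean blocks: no-op record on the last block */
    if (npending > 0)
    {
        VacuumRecord *r = &out[nrec++];
        int i;

        r->block = pending[npending - 1];
        r->doit = false;
        r->nvisit = npending - 1;
        for (i = 0; i < npending - 1; i++)
            r->visit_first[i] = pending[i];
    }
    return nrec;
}
```

Running this on the 10-block example with removals on blocks 3, 4 and 8 reproduces exactly the four records listed above.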
Re: [HACKERS] Deriving Recovery Snapshots
Simon Riggs wrote: The patch isn't ready to apply standalone because we need to include the changes to XidInMVCCSnapshot() also, which would take a little while to extract. Let me know if that is worth producing a standalone patch for. FWIW, this patch becomes a lot simpler if you don't change the function signature, and don't move the SubtransSetParent() call. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

*** src/backend/access/transam/xact.c
--- src/backend/access/transam/xact.c
***************
*** 404,419 **** AssignTransactionId(TransactionState s)
  	AssignTransactionId(s->parent);

  	/*
! 	 * Generate a new Xid and record it in PG_PROC and pg_subtrans.
! 	 *
! 	 * NB: we must make the subtrans entry BEFORE the Xid appears anywhere in
! 	 * shared storage other than PG_PROC; because if there's no room for it in
! 	 * PG_PROC, the subtrans entry is needed to ensure that other backends see
! 	 * the Xid as running.  See GetNewTransactionId.
  	 */
  	s->transactionId = GetNewTransactionId(isSubXact);

! 	if (isSubXact)
  		SubTransSetParent(s->transactionId, s->parent->transactionId);

  	/*
--- 404,418 ----
  	AssignTransactionId(s->parent);

  	/*
! 	 * Generate a new Xid and record it in PG_PROC.  If there's no room
! 	 * in MyProc->subxids, we must make the pg_subtrans
! 	 * entry BEFORE the Xid appears anywhere in shared storage other than
! 	 * PG_PROC, because the subtrans entry is needed to ensure that other
! 	 * backends see the Xid as running.
  	 */
  	s->transactionId = GetNewTransactionId(isSubXact);

! 	if (isSubXact && MyProc->subxids.overflowed)
  		SubTransSetParent(s->transactionId, s->parent->transactionId);

  	/*
[HACKERS] EXPLAIN CREATE TABLE AS
It turns out you cannot EXPLAIN on CREATE TABLE AS, but it seems to work fine if I extend the grammar as below:

ExplainableStmt:
			SelectStmt
			| InsertStmt
			| UpdateStmt
			| DeleteStmt
			| DeclareCursorStmt
+			| CreateAsStmt
			| ExecuteStmt					/* by default all are $$=$1 */
		;

It's the same as SELECT INTO after all, which already works. Should this be added?
Re: [HACKERS] Deriving Recovery Snapshots
On Thu, 2008-10-23 at 13:40 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: The patch isn't ready to apply standalone because we need to include the changes to XidInMVCCSnapshot() also, which would take a little while to extract. Let me know if that is worth producing a standalone patch for. FWIW, this patch becomes a lot simpler if you don't change the function signature, and don't move the SubtransSetParent() call. Yeh, I'm trying to judge between trying to be simple and trying to be neat. Whichever one I pick, I seem to be wrong. :-) But I begin, perhaps too slowly, to understand that this is a natural part of review itself, not really a problem. Thanks for your help. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] EXPLAIN CREATE TABLE AS
On Thu, 2008-10-23 at 14:29 +0300, Peter Eisentraut wrote: It turns out you cannot EXPLAIN on CREATE TABLE AS, but it seems to work fine if I extend the grammar as below: Yes please. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
Hi, On Wed, Oct 22, 2008 at 8:14 PM, Jaime Casanova [EMAIL PROTECTED] wrote: just remembering that -patches is a dead list, so I'm sending this to -hackers where it will have more visibility... -- Forwarded message -- On 10/22/08, Nikhil Sontakke [EMAIL PROTECTED] wrote: Hi, Thanks for taking a look. But if I am not mistaken Gavin and co. are working on a much more exhaustive proposal. In light of that maybe this patch might not be needed in the first place? I will wait for discussion and a subsequent collective consensus here, before deciding the further course of action. I think it is unwise to wait on Gavin for a more complex implementation --- we might end up with nothing for 8.4. As long as your syntax is compatible with whatever Gavin proposed, Gavin can add on to your patch once it is applied. seems like you're a prophet... or am I missing something? :) Maybe I will try to summarize the functionality of this patch, rebase it against latest CVS head and try to get it on the commitfest queue at least for further feedback to keep the ball rolling on auto-partitioning... yeah! I was thinking of doing that but still have no time... and frankly you're the best man for the job ;) one thing I was thinking of is to use triggers instead of rules just as our current docs recommend http://www.postgresql.org/docs/8.3/static/ddl-partitioning.html with the benefit that a trigger can check if the child table exists for the range being inserted and if not it can create it first... haven't looked at the code in detail but it seems that your patch is still missing the create rule part so we are in time to change that... no? Yes, triggers should be used instead of rules. Automatic generation of rules/triggers would be kind of hard and needs some looking into. Also there are issues like checking mutual exclusivity of the partition clauses specified too (I have been maintaining that the onus of ensuring sane partition ranges/clauses should rest with the users, at least initially...).
I will take a stab at this again whenever I get some free cycles. Regards, Nikhils -- http://www.enterprisedb.com
Re: [HACKERS] EXPLAIN CREATE TABLE AS
Peter Eisentraut [EMAIL PROTECTED] writes: It turns out you cannot EXPLAIN on CREATE TABLE AS, but it seems to work fine if I extend the grammar as below: This seems to me to be something that will look like a wart, not a feature, from the user's point of view. You can't explain CREATE TABLE, but you can explain CREATE TABLE AS? It's just weird. It feels to me like exposing an implementation artifact. regards, tom lane
Re: [HACKERS] psql Feature request \set query
On Wed, Oct 22, 2008 at 04:14:11PM -0700, Joshua Drake wrote: True enough, but a car doesn't roll without at least four wheels. I'm not sure I agree: http://en.wikipedia.org/wiki/Image:1885Benz.jpg (Sorry, I couldn't resist.) -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] pg_hba options parsing
Magnus Hagander wrote: This patch changes the options field of pg_hba.conf to take name/value pairs instead of a fixed string. This makes it a lot nicer to deal with auth methods that need more than one parameter, such as LDAP. While at it, it also adds map support to kerberos, gssapi and sspi and not just ident - basically all methods where the username comes from an outside source (lmk if I missed one). Also in passing, changes the methods in auth.c to deal with "unsupported auth method on this platform" errors the same way for all authentication methods. Applied with suggested changes. //Magnus
Re: [HACKERS] EXPLAIN CREATE TABLE AS
On Thursday 23 October 2008, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: It turns out you cannot EXPLAIN on CREATE TABLE AS, but it seems to work fine if I extend the grammar as below: This seems to me to be something that will look like a wart, not a feature, from the user's point of view. You can't explain CREATE TABLE, but you can explain CREATE TABLE AS? It's just weird. It feels to me like exposing an implementation artifact. As a user, I'm disagreeing about the weirdness of it. +1 for Peter's initiative. Regards, -- dim
Re: [HACKERS] EXPLAIN CREATE TABLE AS
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: It turns out you cannot EXPLAIN on CREATE TABLE AS, but it seems to work fine if I extend the grammar as below: This seems to me to be something that will look like a wart, not a feature, from the user's point of view. You can't explain CREATE TABLE, but you can explain CREATE TABLE AS? It's just weird. It feels to me like exposing an implementation artifact. Isn't the whole point of EXPLAIN to expose implementation artifacts in the first place? It shows the plan associated with a plannable statement. Why are some statements plannable and some not? Why can you explain DELETE and not TRUNCATE? Why can you explain INSERT, what is plannable about that? Ah, there is INSERT ... SELECT. If you can plan INSERT, you can plan SELECT INTO, right? Yes, that works. But CREATE TABLE AS is the same, why won't that work? Should we prohibit explaining SELECT INTO? So you can't explain SELECT INTO, but you can explain SELECT? It's all very weird. So it's not like this is not already quite implementation-dependent. We expose the information that we have and let the user process it. Someone who has advanced to the level of using EXPLAIN should be able to tell the difference, and those who are maximally confused won't be harmed by this addition any more than they already are.
Re: [HACKERS] Unicode escapes in literals
Peter Eisentraut [EMAIL PROTECTED] writes: SQL has the following escape syntax for it: U&'special character: \xxxx' [ UESCAPE '\' ] Man that's ugly. Why the ampersand? How do you propose to distinguish this from a perfectly legitimate use of the & operator? 2. Convert this syntax to a function call. But that would then create a lot of inconsistencies, such as needing functional indexes for matches against what should really be a literal. Uh, why do you think that? The function could surely be stable, even immutable if you grant that a database's encoding can't change. regards, tom lane
Re: [HACKERS] EXPLAIN CREATE TABLE AS
Peter Eisentraut [EMAIL PROTECTED] writes: Isn't the whole point of EXPLAIN to expose implementation artifacts in the first place? Touche ... I withdraw the complaint. regards, tom lane
[HACKERS] Any reason to have heap_(de)formtuple?
Currently in heaptuple.c we have duplicated code. heap_deformtuple and heap_formtuple are marked as an obsolete interface. Is there any reason to still have them? I know that they are still used in many places, but is there any stopper to keeping these functions alive? Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
[HACKERS] A small performance bug in BTree Infrastructure
Hi All, BTree insert requires a data structure called BTStack to report the page splits that have happened in the leaf pages to the non-leaf pages upwards. It basically goes back and updates those non-leaf pages. But this will never happen in a search operation. You never need to climb upwards for a SELECT statement. Actually we can clearly see that in the _bt_first function: as soon as we complete calling _bt_search, we call _bt_freestack to free it. So unnecessarily a BTStack structure is getting formed and deleted. Inside the _bt_search function, if we just add an if condition to check whether the access is for BT_READ, then we can avoid the creation of the stack and also remove the _bt_freestack call in the _bt_first function. I just implemented the change and tested it, and I am not seeing any performance difference (as expected). Still I thought of reporting it, so that whoever is working on it can incorporate this change also. Thanks, Gokul.
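The control-flow change being suggested can be modeled in a few lines: _bt_search remembers the parent pages (the BTStack) only so that a later insertion can propagate page splits upward, so a BT_READ caller can skip building it entirely. The tree shape, types and names below are made up for illustration; only the "skip the stack for BT_READ" logic mirrors the suggestion.

```c
#include <assert.h>
#include <stdlib.h>

#define BT_READ  1
#define BT_WRITE 2

typedef struct BTStackData
{
    int                 parent_blkno;
    struct BTStackData *next;
} BTStackData;

/* toy _bt_search: descend `levels` levels from the root to a leaf,
 * recording each non-leaf page only if the caller may split pages */
static BTStackData *
toy_bt_search(int levels, int access)
{
    BTStackData *stack = NULL;
    int lv;

    for (lv = levels; lv > 1; lv--)
    {
        if (access == BT_WRITE)
        {
            BTStackData *e = malloc(sizeof(BTStackData));

            e->parent_blkno = lv;   /* stand-in for a real block number */
            e->next = stack;
            stack = e;
        }
        /* ... move right / step down one level here ... */
    }
    return stack;   /* NULL for BT_READ: nothing to allocate or free */
}
```

With this shape, the read path allocates nothing and the _bt_freestack call in _bt_first becomes unnecessary, which matches the observation that the change is about avoiding wasted work rather than a measurable speedup.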
Re: [HACKERS] Unicode escapes in literals
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: SQL has the following escape syntax for it: U&'special character: \xxxx' [ UESCAPE '\' ] Man that's ugly. Why the ampersand? Yeah, excellent question. It seems completely unnecessary, but it is surely there in the syntax diagram. How do you propose to distinguish this from a perfectly legitimate use of the & operator? Well, technically, there is going to be some conflict, but the practical impact should be minimal because:
- There are no spaces allowed between U& and the '...' part. We typically suggest spaces around binary operators.
- Naming a column u might not be terribly common.
- Binary-and with an undecorated string literal is not very common.
Of course, I have no data for these assertions. An inquiry on -general might give more insight. 2. Convert this syntax to a function call. But that would then create a lot of inconsistencies, such as needing functional indexes for matches against what should really be a literal. Uh, why do you think that? The function could surely be stable, even immutable if you grant that a database's encoding can't change. Yeah, true, that would work. There are some other disadvantages for making a function call. You couldn't use that kind of literal in any other place where the parser calls for a string constant: role names, tablespace locations, passwords, copy delimiters, enum values, function body, file names. There is also a related feature for Unicode escapes in identifiers, and it might be good to keep the door open on that. We could do a dual approach: Convert in the scanner when server encoding is UTF8, and pass on as function call otherwise. Surely ugly though. Or pass it on as a separate token type to the analyze phase, but that is a lot more work. Others: What use cases do you envision, and what requirements would they create for this feature?
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Zdenek Kotala [EMAIL PROTECTED] writes: Currently in heaptuple.c we have duplicated code. heap_deformtuple and heap_formtuple are marked as an obsolete interface. Is there any reason to still have them? I know that they are still used in many places, but is there any stopper to keeping these functions alive? Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... regards, tom lane
Re: [HACKERS] Unicode escapes in literals
Peter Eisentraut [EMAIL PROTECTED] writes: There are some other disadvantages for making a function call. You couldn't use that kind of literal in any other place where the parser calls for a string constant: role names, tablespace locations, passwords, copy delimiters, enum values, function body, file names. Good point. I'm okay with supporting the feature only when database encoding is UTF8. regards, tom lane
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Tom Lane napsal(a): Zdenek Kotala [EMAIL PROTECTED] writes: Currently in heaptuple.c we have duplicated code. heap_deformtuple and heap_formtuple are marked as an obsolete interface. Is there any reason to still have them? I know that they are still used in many places, but is there any stopper to keeping these functions alive? Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... What about converting the null array to boolean and calling heap_form_tuple? Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Zdenek Kotala [EMAIL PROTECTED] writes: Tom Lane napsal(a): Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... What about converting the null array to boolean and calling heap_form_tuple? Yeah, that's a thought. We'd want to be sure we'd converted any call sites that are performance-critical, but surely the vast majority are not. regards, tom lane
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Zdenek Kotala wrote: Tom Lane napsal(a): Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... What about converting the null array to boolean and calling heap_form_tuple? Agreed, I started doing that some time ago ... it doesn't seem all that complicated. We could try to add a #warning if an external module uses the deprecated interface, for a couple of releases, and then perhaps drop it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Alvaro Herrera napsal(a): Zdenek Kotala wrote: Tom Lane napsal(a): Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... What about converting the null array to boolean and calling heap_form_tuple? Agreed, I started doing that some time ago ... it doesn't seem all that complicated. Do you have a half-finished patch? We could try to add a #warning if an external module uses the deprecated interface, for a couple of releases, and then perhaps drop it. +1 Zdenek -- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
[HACKERS] patch: Client certificate requirements
This patch adds a configuration option to pg_hba.conf for clientcert. This makes it possible to have different client certificate requirements on different connections. It also makes sure that if you specify that you want client cert verification and the root store isn't there, we give an error instead of silently allowing the user in (like we do now). This still does not implement actual client certificate validation - that's for a later step. It just cleans up the handling we have now. //Magnus
*** a/src/backend/libpq/auth.c --- b/src/backend/libpq/auth.c *** *** 273,278 ClientAuthentication(Port *port) --- 273,304 errmsg("missing or erroneous pg_hba.conf file"), errhint("See server log for details."))); + /* + * This is the first point where we have access to the hba record for + * the current connection, so perform any verifications based on the + * hba options field that should be done *before* the authentication + * here. + */ + if (port->hba->clientcert) + { + /* + * When we parse pg_hba.conf, we have already made sure that we have + * been able to load a certificate store. Thus, if a certificate is + * present on the client, it has been verified against our root + * certificate store, and the connection would have been aborted + * already if it didn't verify ok.
+ */ + if (!port->peer) + { + ereport(FATAL, + (errcode(ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION), + errmsg("connection requires a valid client certificate"))); + } + } + + /* + * Now proceed to do the actual authentication check + */ switch (port->hba->auth_method) { case uaReject: *** a/src/backend/libpq/be-secure.c --- b/src/backend/libpq/be-secure.c *** *** 128,133 static const char *SSLerrmessage(void); --- 128,134 #define RENEGOTIATION_LIMIT (512 * 1024 * 1024) static SSL_CTX *SSL_context = NULL; + static bool ssl_loaded_verify_locations = false; /* GUC variable controlling SSL cipher list */ char *SSLCipherSuites = NULL; *** *** 229,234 secure_destroy(void) --- 230,248 } /* + * Indicate if we have loaded the root CA store to verify certificates + */ + bool + secure_loaded_verify_locations(void) + { + #ifdef USE_SSL + return ssl_loaded_verify_locations; + #endif + + return false; + } + + /* * Attempt to negotiate secure session. */ int *** *** 780,794 initialize_SSL(void) elog(FATAL, "could not set the cipher list (no valid ciphers available)"); /* ! * Require and check client certificates only if we have a root.crt file. */ ! if (!SSL_CTX_load_verify_locations(SSL_context, ROOT_CERT_FILE, NULL)) { ! /* Not fatal - we do not require client certificates */ ereport(LOG, (errmsg("could not load root certificate file \"%s\": %s", ROOT_CERT_FILE, SSLerrmessage()), ! errdetail("Will not verify client certificates."))); } else { --- 794,821 elog(FATAL, "could not set the cipher list (no valid ciphers available)"); /* ! * Attempt to load CA store, so we can verify client certificates if needed. */ ! if (access(ROOT_CERT_FILE, R_OK)) { ! /* ! * Root certificate file simply not found. Don't log an error here, because ! * it's quite likely the user isn't planning on using client certificates. ! */ ! ssl_loaded_verify_locations = false; ! } ! else if (!SSL_CTX_load_verify_locations(SSL_context, ROOT_CERT_FILE, NULL)) ! { ! /* ! * File was there, but we could not load it.
This means the file is somehow ! * broken, and we should log this. Don't log it as a fatal error, because ! * there is still a chance that the user isn't going to use client certs. ! */ ! ssl_loaded_verify_locations = false; ereport(LOG, (errmsg("could not load root certificate file \"%s\": %s", ROOT_CERT_FILE, SSLerrmessage()), ! errdetail("Will not be able to verify client certificates."))); } else { *** *** 821,833 initialize_SSL(void) ROOT_CRL_FILE, SSLerrmessage()), errdetail("Certificates will not be checked against revocation list."))); } - } ! SSL_CTX_set_verify(SSL_context, ! (SSL_VERIFY_PEER | ! SSL_VERIFY_FAIL_IF_NO_PEER_CERT | ! SSL_VERIFY_CLIENT_ONCE), ! verify_cb); } } --- 848,865 ROOT_CRL_FILE, SSLerrmessage()), errdetail("Certificates will not be checked against revocation list."))); } ! /* ! * Always ask for SSL client cert, but don't fail if it's not presented. We'll fail later in this case, ! * based on what we find in pg_hba.conf. ! */ ! SSL_CTX_set_verify(SSL_context, ! (SSL_VERIFY_PEER | ! SSL_VERIFY_CLIENT_ONCE), ! verify_cb); ! ! ssl_loaded_verify_locations = true; ! } } } *** a/src/backend/libpq/hba.c --- b/src/backend/libpq/hba.c *** *** 873,878
Re: [HACKERS] Making pg_standby compression-friendly
In terms of compressing/decompressing WAL in archive/restore, please take a look at my project pglesslog, http://pgfoundry.org/projects/pglesslog/ This project compresses WAL segments by replacing full page writes with corresponding incremental logs. When restored, it inserts a dummy WAL record to maintain LSN and file size. This can be applied to log-shipping mechanisms, asynchronous or synchronous. 2008/10/23 Charles Duffy [EMAIL PROTECTED]: Howdy, all. I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project in pgfoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support. To allow pg_standby to operate in an environment where archive segments are compressed, two behaviors are necessary: - suppressing the file-size checks. This puts the onus on the user to create these files via an atomic mechanism, but is necessary to allow compressed files to be considered. - allowing a custom restore command to be provided. This permits the user to specify the mechanism to be used to decompress the segment. One bikeshed is determining whether the user should pass in a command suitable for use in a pipeline or a command which accepts input and output as arguments. A sample implementation is attached, intended only to kickstart discussion; I'm not attached to either its implementation or its proposed command-line syntax. Thoughts?
-- Koichi Suzuki
Re: [HACKERS] Making pg_standby compression-friendly
Koichi Suzuki wrote: In terms of compress/decompress WAL in archive/restore, please take a look at my project pglesslog, http://pgfoundry.org/projects/pglesslog/ This project compresses WAL segment by replacing full page writes with corresponding incremental logs. When restored, it inserts dummy WAL record to maintain LSN and file size. This can be applied to log-shipping mechanism, asynchronous or synchronous. I believe Charles' question was: how do you hook that decompression into pg_standby? I suggested that whatever script is run on the standby server to copy xlog files to the archive location, should also call the decompression program, like pglesslog, but apparently there is no such script in his setup. How would you set up a standby server, using pg_lesslog? 2008/10/23 Charles Duffy [EMAIL PROTECTED]: Howdy, all. I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project in pgfoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support. To allow pg_standby to operate in an environment where archive segments are compressed, two behaviors are necessary: - suppressing the file-size checks. This puts the onus on the user to create these files via an atomic mechanism, but is necessary to allow compressed files to be considered. - allowing a custom restore command to be provided. This permits the user to specify the mechanism to be used to decompress the segment. 
One bikeshed is determining whether the user should pass in a command suitable for use in a pipeline or a command which accepts input and output as arguments. A sample implementation is attached, intended only to kickstart discussion; I'm not attached to either its implementation or its proposed command-line syntax. Thoughts? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Any reason to have heap_(de)formtuple?
Zdenek Kotala wrote: Alvaro Herrera napsal(a): Agreed, I started doing that some time ago ... it doesn't seem all that complicated. Do you have a half-finished patch? Couldn't find it here, sorry :-( -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] patch: Client certificate requirements
Magnus Hagander wrote: This patch adds a configuration option to pg_hba.conf for clientcert. This makes it possible to have different client certificate requirements on different connections. It also makes sure that if you specify that you want client cert verification and the root store isn't there, we give an error instead of silently allowing the user in (like we do now). This still does not implement actual client certificate validation - that's for a later step. It just cleans up the handling we have now. Uh, with docs. //Magnus
*** a/doc/src/sgml/runtime.sgml --- b/doc/src/sgml/runtime.sgml *** *** 1646,1658 $ <userinput>kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`</userinput> been entered. </para> ! <para> To require the client to supply a trusted certificate, place certificates of the certificate authorities (<acronym>CA</acronym>) you trust in the file <filename>root.crt</filename> in the data ! directory. A certificate will then be requested from the client during SSL connection startup. (See <xref linkend="libpq-ssl"> for a ! description of how to set up client certificates.) The server will verify that the client's certificate is signed by one of the trusted certificate authorities. Certificate Revocation List (CRL) entries are also checked if the file <filename>root.crl</filename> exists. --- 1646,1662 been entered. </para> ! <sect2 id="ssl-client-certificates"> ! <title>Using client certificates</title> ! <para> To require the client to supply a trusted certificate, place certificates of the certificate authorities (<acronym>CA</acronym>) you trust in the file <filename>root.crt</filename> in the data ! directory, and set the <literal>clientcert</literal> parameter ! to <literal>1</literal> on the appropriate line(s) in pg_hba.conf. ! A certificate will then be requested from the client during SSL connection startup. (See <xref linkend="libpq-ssl"> for a ! description of how to set up certificates on the client.)
The server will verify that the client's certificate is signed by one of the trusted certificate authorities. Certificate Revocation List (CRL) entries are also checked if the file <filename>root.crl</filename> exists. *** *** 1663,1673 $ <userinput>kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`</userinput> </para> <para> ! If the <filename>root.crt</filename> file is not present, client ! certificates will not be requested or checked. In this mode, SSL ! provides encrypted communication but not authentication. </para> <para> The files <filename>server.key</>, <filename>server.crt</>, <filename>root.crt</filename>, and <filename>root.crl</filename> --- 1667,1689 </para> <para> ! The <literal>clientcert</literal> option in <filename>pg_hba.conf</> ! is available for all authentication methods, but only for rows ! specified as <literal>hostssl</>. Unless specified, the default is ! not to verify the client certificate. ! </para> ! ! <para> ! <productname>PostgreSQL</> currently does not support authentication ! using client certificates, since it cannot differentiate between ! different users. As long as the user holds any certificate issued ! by a trusted CA it will be accepted, regardless of what account the ! user is trying to connect with. </para> + </sect2> + <sect2 id="ssl-server-files"> + <title>SSL Server File Usage</title> <para> The files <filename>server.key</>, <filename>server.crt</>, <filename>root.crt</filename>, and <filename>root.crl</filename> *** *** 1704,1710 $ <userinput>kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`</userinput> <row> <entry><filename>root.crt</></entry> <entry>trusted certificate authorities</entry> ! <entry>requests client certificate; checks certificate is signed by a trusted certificate authority</entry> </row> --- 1720,1726 <row> <entry><filename>root.crt</></entry> <entry>trusted certificate authorities</entry> !
<entry>checks that client certificate is signed by a trusted certificate authority</entry> </row> *** *** 1717,1722 $ <userinput>kill -INT `head -1 /usr/local/pgsql/data/postmaster.pid`</userinput> --- 1733,1739 </tbody> </tgroup> </table> + </sect2> <sect2 id="ssl-certificate-creation"> <title>Creating a Self-Signed Certificate</title> *** a/src/backend/libpq/auth.c --- b/src/backend/libpq/auth.c *** *** 273,278 ClientAuthentication(Port *port) --- 273,304 errmsg("missing or erroneous pg_hba.conf file"), errhint("See server log for details."))); + /* + * This is the first point where we have access to the hba record for + * the current connection, so perform any verifications based on the + * hba options field that should be done *before* the authentication + * here. + */ + if (port->hba->clientcert) + {
Re: [HACKERS] double-buffering page writes
ITAGAKI Takahiro wrote: I have some comments about the double-buffering: Since posting this patch I have realized that this implementation is bogus. I'm now playing with WAL-logging hint bits though. As to your questions: - Are there any performance degradations because of the additional memcpy? 8kB of memcpy seems not to be free. Of course, it is not free. However it comes with the benefit that we can release the io_in_progress lock earlier for the block -- we lock, copy, unlock; whereas the old code did lock, write(), unlock. Avoiding a system call in the locked area could be a win. Whether this is a net benefit is something that I have not measured. - Is it ok to allocate dblbuf[BLCKSZ] as a local variable? It might be unaligned. AFAICS we avoid such usages in other places. I thought about that too. I admit I am not sure if this really works portably; however I don't want to add a palloc() to that routine. - It would be best if we could delay double-buffering until locks actually conflict. But we might need to allocate shadow buffers from shared buffers instead of local memory. The point of double-buffering is that the potential writer (a process doing concurrent hint-bit setting) is not going to grab any locks. - Are there any other modules that could share in the benefits of double-buffering? For example, we could avoid waiting for LockBufferForCleanup(). It would be cool if the double-buffering could be used for multiple purposes. Not sure on this. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] SSL cleanups/hostname verification
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: The patch seems pretty far short of sufficient as far as supporting a new conninfo option goes --- for instance it appears to leak the string at disconnect. Check through all the references to some existing option field to see if you missed anything else. Looking over it again, that's pretty much the only one I find. I checked against sslmode. Did you spot something else? Found a bug in the GSSAPI code though, which also does not free :) Will fix. //Magnus
Re: [HACKERS] Block level concurrency during recovery
Simon Riggs wrote: It would also be possible to introduce a special tweak there which is that if the block is not in cache, don't read it in at all. If it's not in cache we know that nobody has a pin on it, so we don't need to read it in just to say got the lock. That's icing for later. Heh, that's clever :-). We could actually use a method like that to solve the whole problem. After the (replay of the) b-tree vacuum is finished, scan through all shared buffers, and get a vacuum lock on every page of that index that's in cache. Groveling through all shared buffers would be slower for small indexes, though. I believe the vacuum lock every leaf page behavior is only needed for system catalogs. You have other issues with those still, namely that table locks are not yet taken appropriately, so I'm not sure if this is worth worrying about until that's done. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Unicode escapes in literals
On Thu, Oct 23, 2008 at 06:04:43PM +0300, Peter Eisentraut wrote: Man that's ugly. Why the ampersand? Yeah, excellent question. It seems completely unnecessary, but it is surely there in the syntax diagram. Probably because many Unicode representations are done with U+ followed by 4-6 hexadecimal units, but + is problematic for other reasons (in some vendor's implementation)? A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] double-buffering page writes
Alvaro Herrera wrote: ITAGAKI Takahiro wrote: I have some comments about the double-buffering: Since posting this patch I have realized that this implementation is bogus. I'm now playing with WAL-logging hint bits though. Yeah, the torn page + hint bit updates problem is the tough question. - Is it ok to allocate dblbuf[BLCKSZ] as a local variable? It might be unaligned. AFAICS we avoid such usages in other places. I thought about that too. I admit I am not sure if this really works portably; however I don't want to add a palloc() to that routine. It should work, AFAIK, but unaligned memcpy()s and write()s can be significantly slower. There can be only one write() happening at any time, so you could just palloc() a single 8k buffer in TopMemoryContext at backend startup, and always use that. - Are there any other modules that could share in the benefits of double-buffering? For example, we could avoid waiting for LockBufferForCleanup(). It would be cool if the double-buffering could be used for multiple purposes. Not sure on this. You'd need to keep both versions of the buffer simultaneously in the buffer cache for that. When we talked about the various designs for HOT, that was one of the ideas I had to enable more aggressive pruning: if you can't immediately get a vacuum lock, allocate a new buffer in the buffer cache for the same block, copy the page to the new buffer, and do the pruning, including moving tuples around, there. Any new ReadBuffer calls would return the new page version, but old readers would keep referencing the old one. The intrusive part of that approach, in addition to the obvious changes required in the buffer manager to keep around multiple copies of the same block, is that all modifications must be done on the new version, so anyone who needs to lock the page for modification would need to switch to the new page version at the LockBuffer call.
As discussed in the other thread with Simon, we also use vacuum locks in b-tree for waiting out index scans, so avoiding the waiting there would be just wrong. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] SSL cleanups/hostname verification
Magnus Hagander [EMAIL PROTECTED] writes: Looking over it again, that's pretty much the only one I find. I checked against sslmode. Did you spot something else? No, I just thought that it looked like too small a patch for adding a new conn option, and complained as soon as I found something missing. If you've got it working the same as sslmode then it should be okay. regards, tom lane
Re: [HACKERS] Unicode escapes in literals
Andrew Sullivan [EMAIL PROTECTED] writes: On Thu, Oct 23, 2008 at 06:04:43PM +0300, Peter Eisentraut wrote: Yeah, excellent question. It seems completely unnecessary, but it is surely there in the syntax diagram. Probably because many Unicode representations are done with U+ followed by 4-6 hexadecimal units, but + is problematic for other reasons (in some vendor's implementation)? They could hardly ignore the conflict with the operator interpretation for +. The committee has now cut themselves off from ever having a standard operator named &, but I suppose they didn't think ahead to that. regards, tom lane
Re: [HACKERS] Block level concurrency during recovery
On Thu, 2008-10-23 at 19:21 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: It would also be possible to introduce a special tweak there which is that if the block is not in cache, don't read it in at all. If it's not in cache we know that nobody has a pin on it, so we don't need to read it in just to say got the lock. That's icing for later. Heh, that's clever :-). We could actually use a method like that to solve the whole problem. After the (replay of the) b-tree vacuum is finished, scan through all shared buffers, and get a vacuum lock on every page of that index that's in cache. Groveling through all shared buffers would be slower for small indexes, though. Well, re-examining all of the assumptions in the code seems to have been worthwhile so far. I think that makes 4 significant tweaks that have come out of the Search For Hot Standby. I believe the vacuum lock every leaf page behavior is only needed for system catalogs. So we will still need it then. Which is good 'cos I just wrote it. You have other issues with those still, namely that table locks are not yet taken appropriately, so I'm not sure if this is worth worrying about until that's done. Please explain. If you know of a correctness issue, please say. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] double-buffering page writes
Heikki Linnakangas [EMAIL PROTECTED] writes: Alvaro Herrera wrote: I thought about that too. I admit I am not sure if this really works portably; however I don't want to add a palloc() to that routine. It should work, AFAIK, but unaligned memcpy()s and write()s can be significantly slower. There can be only one write() happening at any time, so you could just palloc() a single 8k buffer in TopMemoryContext at backend startup, and always use that. Some time ago, we arranged for shared buffers to be aligned on *more* than maxalign boundaries (cf BUFFERALIGN) because on at least some platforms this makes a very significant difference in the speed of copying to/from kernel space. If you are going to double-buffer it is going to be important to have the intermediate buffer just as well aligned. A local char array won't be acceptable, and even for a palloc'd one you'll need to take some extra steps (like wasting ALIGNOF_BUFFER extra bytes so you can align the pointer palloc gives). regards, tom lane
Re: [HACKERS] Block level concurrency during recovery
Simon Riggs wrote: On Thu, 2008-10-23 at 19:21 +0300, Heikki Linnakangas wrote: You have other issues with those still, namely that table locks are not yet taken appropriately, so I'm not sure if this is worth worrying about until that's done. Please explain. If you know of a correctness issue, please say. Things like DROP TABLE need to take an AccessExclusiveLock, and you don't have that yet, no? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] psql Feature request \set query
Everyone, What people asking for psql scriptability really want, in my estimation, is the ability to write SQL plus some control structures anywhere: in the server, in the client, or so that they don't have to know where. Commercial vendors have that: Oracle has PL/SQL as a server-side and client-side language. Microsoft has T-SQL. The others have similar offerings. I'm *not* proposing to implement full scriptability in psql. Just one obvious extension of the existing psql variable feature. I agree that giving psql its own unique scripting language would be stupid. We should use something existing. Ideally, we should make psql pluggable with the Perl and Python debuggers. --Josh
Re: [HACKERS] Block level concurrency during recovery
On Thu, 2008-10-23 at 20:24 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: On Thu, 2008-10-23 at 19:21 +0300, Heikki Linnakangas wrote: You have other issues with those still, namely that table locks are not yet taken appropriately, so I'm not sure if this is worth worrying about until that's done. Please explain. If you know of a correctness issue, please say. Things like DROP TABLE need to take an AccessExclusiveLock, and you don't have that yet, no? Oh right, I thought you meant something not already in the plan. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Re: [COMMITTERS] pgsql: Rework subtransaction commit protocol for hot standby.
On Thu, 2008-10-23 at 04:38 +0100, Simon Riggs wrote: That means that out of the four state transitions that are disallowed by the original coding of that Assert, you are now having to consider two as legal. I don't like that, and I like even less that it's not even trying to determine whether this is a replay-driven change. Possible state changes:

TRANSACTION_STATUS_IN_PROGRESS to
  TRANSACTION_STATUS_IN_PROGRESS   is allowed
  TRANSACTION_STATUS_COMMITTED     is allowed
  TRANSACTION_STATUS_ABORTED       is allowed
  TRANSACTION_STATUS_SUB_COMMITTED is allowed

TRANSACTION_STATUS_SUB_COMMITTED to
  TRANSACTION_STATUS_IN_PROGRESS   is allowed (but should not be)
  TRANSACTION_STATUS_COMMITTED     is allowed
  TRANSACTION_STATUS_ABORTED       is allowed
  TRANSACTION_STATUS_SUB_COMMITTED is allowed

TRANSACTION_STATUS_COMMITTED to
  TRANSACTION_STATUS_IN_PROGRESS   is disallowed
  TRANSACTION_STATUS_COMMITTED     is allowed
  TRANSACTION_STATUS_ABORTED       is disallowed
  TRANSACTION_STATUS_SUB_COMMITTED is ignored, in redo only

TRANSACTION_STATUS_ABORTED to
  TRANSACTION_STATUS_IN_PROGRESS   is disallowed
  TRANSACTION_STATUS_COMMITTED     is disallowed
  TRANSACTION_STATUS_ABORTED       is allowed
  TRANSACTION_STATUS_SUB_COMMITTED is disallowed

So out of 16 possible state change requests, 10 were previously allowed, one of which was allowed but should not have been. This patch allows 1 additional legal state change request, now in redo only. There are still 5 disallowed state changes, plus another one disallowed in normal running. That seems fine. Presumably you would like to see an additional parameter to allow that test to be more strictly determined? Bug fix v2 patch enclosed, mostly API changes. I suggest a third version with these changes: * Write the COMMITTED to SUBCOMMITTED transition as a no-op during redo rather than as an Assert. This prevents a transition from COMMITTED to SUBCOMMITTED to ABORTED. By making it a no-op, the attempt to set a COMMITTED transaction to SUBCOMMITTED never causes a failure, but it doesn't take place either. 
* Disallow SUBCOMMITTED to IN_PROGRESS transition via an Assert. What do you think? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
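As a sketch only (not the actual clog.c code; the function, enum, and verdict names below are hypothetical), the transition rules tabulated above, including the two proposed changes, could be centralized like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the four clog status values discussed in this thread */
typedef enum XidStatus {
    XS_IN_PROGRESS,
    XS_COMMITTED,
    XS_ABORTED,
    XS_SUB_COMMITTED
} XidStatus;

/* "Ignored" (COMMITTED -> SUB_COMMITTED during redo) is reported
 * separately so the caller can treat it as a no-op rather than an error. */
typedef enum { T_ALLOWED, T_DISALLOWED, T_IGNORED } TransitionVerdict;

/* Hypothetical helper: is a clog state transition legal? */
static TransitionVerdict
transition_verdict(XidStatus from, XidStatus to, bool in_redo)
{
    switch (from)
    {
        case XS_IN_PROGRESS:
            return T_ALLOWED;               /* all four targets are legal */
        case XS_SUB_COMMITTED:
            /* proposed change: forbid going back to IN_PROGRESS */
            return (to == XS_IN_PROGRESS) ? T_DISALLOWED : T_ALLOWED;
        case XS_COMMITTED:
            if (to == XS_COMMITTED)
                return T_ALLOWED;
            if (to == XS_SUB_COMMITTED && in_redo)
                return T_IGNORED;           /* no-op during redo, per the proposal */
            return T_DISALLOWED;
        case XS_ABORTED:
            return (to == XS_ABORTED) ? T_ALLOWED : T_DISALLOWED;
    }
    return T_DISALLOWED;
}
```

A single predicate like this would let the Assert take an explicit in-redo parameter instead of loosening the check globally.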
Re: [HACKERS] Regression in IN( field, field, field ) performance
Gregory Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Works fine for me, eg I think he's looking for something like: 5 IN (col1,col2,col3) resulting in a bitmap or of three index scans of three different indexes on col1, col2, and col3. Ah, I see. It would be easy to make transformAExprIn() generate an OR tree instead of = ANY(ARRAY[]), if we could figure out the conditions where an OR tree is superior. I'm not sure it's easy to tell though. Is it sufficient to do this when there are Vars on the right side and none on the left? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Any reason to have heap_(de)formtuple?
On Thu, 23 Oct 2008, Tom Lane wrote: Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... The problem with trying to deprecate it is that the vast majority of the backend is still using the old interfaces, so people looking for inspiration for their external modules will likely end up using the old interface. Like Alvaro I started this conversion a while ago, got bored, and forgot about it. If people do want this conversion done while keeping the old interface around, I can track down that patch, update it and finish it up for the next CommitFest. Kris Jurka -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby utility and administrator functions
On Mon, 2008-10-20 at 10:25 +0100, Simon Riggs wrote: What else do we need? * pg_freeze_recovery() * pg_unfreeze_recovery() Two more functions pg_freeze_recovery_cleanup() pg_unfreeze_recovery_cleanup() These would allow recovery to continue normally, except for row removal operations which would halt the progress of recovery. It would eventually be possible to have a function that halts recovery whenever row removal takes place for a list of tables. Not planning on implementing that initially though. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL cleanups/hostname verification
Magnus Hagander wrote: Attached patch cleans up the certificate verification in libpq, and adds a configuration parameter to control it. The new parameter is sslverify, and can be set to: Because SSL offers both encryption and authentication, I wonder if we should call this sslauthenticate. * cn = default = will validate that the certificate chains to a trusted root, *and* that the cn on the certificate matches the hostname specified in the connection. This is the only option that prevents man-in-the-middle attacks completely, and therefore is the default. Should this be commonname? * cert = what we had before if there was a root certificate file = will validate that the certificate chains to a trusted root, but ignore the cn. Should this be chain? * none = will disable certificate validation completely This means that the connection string is now in charge of the security policy, and not just whether the file exists or not. IMHO this is the only proper way to do it. Now, if you for some reason lose the root Agreed. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Any reason to have heap_(de)formtuple?
The sad thing is that I also did a patch for this and lost it. I think it wouldn't be too hard to convert all the call sites in the backend and provide a wrapper for other users. greg On 23 Oct 2008, at 08:59 PM, Kris Jurka [EMAIL PROTECTED] wrote: On Thu, 23 Oct 2008, Tom Lane wrote: Well, aside from the gruntwork needed to convert all the core code that still uses the old APIs, there's the prospect of breaking extension modules that still use the old APIs. It's kind of annoying to have two copies of that code, but less annoying than removing it would be ... The problem with trying to deprecate it is that the vast majority of the backend is still using the old interfaces, so people looking for inspiration for their external modules will likely end up using the old interface. Like Alvaro I started this conversion a while ago, got bored, and forgot about it. If people do want this conversion done while keeping the old interface around, I can track down that patch, update it and finish it up for the next CommitFest. Kris Jurka -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Julius Stroffek wrote: Hi All, we would like to start some work on improving the performance of PostgreSQL in a multi-CPU environment. Dano Vojtek is a student at the Faculty of Mathematics and Physics of Charles University in Prague (http://www.mff.cuni.cz) and he is going to cover this topic in his master thesis. He is going to do some investigation in the methods and write down the possibilities and then he is going to implement something from that for PostgreSQL. We want to come out with a serious proposal for this work after collecting the feedback/opinions and doing the more serious investigation. Exciting stuff, and clearly a direction we need to explore. Topics that seem to be of interest and most of them were already discussed at the developers meeting in Ottawa are 1.) parallel sorts 2.) parallel query execution 3.) asynchronous I/O I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, 23 Oct 2008, Bruce Momjian wrote: I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL cleanups/hostname verification
Bruce Momjian wrote: Magnus Hagander wrote: Attached patch cleans up the certificate verification in libpq, and adds a configuration parameter to control it. The new parameter is sslverify, and can be set to: Because SSL offers both encryption and authentication, I wonder if we should call this sslauthenticate. I think that would confuse people with actual SSL certificate-based authentication, which we do not (yet) have. * cn = default = will validate that the certificate chains to a trusted root, *and* that the cn on the certificate matches the hostname specified in the connection. This is the only option that prevents man-in-the-middle attacks completely, and therefore is the default. Should this be commonname? cn is a fairly standard way to refer to it, but if people think the longer name is better, I'm ok with changing it. * cert = what we had before if there was a root certificate file = will validate that the certificate chains to a trusted root, but ignore the cn. Should this be chain? Could be, not sure. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SSL README
I notice that there is a src/backend/libpq/README.SSL that contains a couple of copied emails in it (it even contains our end-of-broadcast message string!) And it contains a reply to the mail, with quoted stuff! In general, these seem to be some hints about how to use things. Things that if we need them at all, should go in the main manual. It's certainly not code documentation, so it should not go in a README file in the source tree. My suggestion is to just get rid of any end-user docs from that one, and possibly add stuff to the main manual once we feel it's needed. The part that documents the SSL negotiation protocol inside libpq should stay, of course. Any objections? //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL README
While you're cleaning up SSL, how about the thread with this email on it: 19212172.post%40talk.nabble.com -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith [EMAIL PROTECTED] wrote: I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though, an LRU-based buffer manager design would be more optimal in this case. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL README
Magnus Hagander wrote: I notice that there is a src/backend/libpq/README.SSL that contains a couple of copied emails in it (it even contains our end-of-broadcast message string!) And it contains a reply to the mail, with quoted stuff! In general, these seem to be some hints about how to use things. Things that if we need them at all, should go in the main manual. It's certainly not code documentation, so it should not go in a README file in the source tree. My suggestion is to just get rid of any end-user docs from that one, and possibly add stuff to the main manual once we feel it's needed. The part that documents the SSL negotiation protocol inside libpq should stay, of course. Any objections? Agreed, remove it. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Jonah H. Harris wrote: On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith [EMAIL PROTECTED] wrote: I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though, an LRU-based buffer manager design would be more optimal in this case. True, it is a kludge but if it gives us 95% of the benefit with 10% of the code, it is a win. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL README
Alvaro Herrera wrote: While you're cleaning up SSL, how about the thread with this email on it: 19212172.post%40talk.nabble.com Yeah, I mentioned this to Magnus this morning (my time) and he said Bruce was compiling a patch in time for the next commit fest. I'm not sure where it's all at, but an official update would be helpful. Russell. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL README
Russell Smith wrote: Alvaro Herrera wrote: While you're cleaning up SSL, how about the thread with this email on it: 19212172.post%40talk.nabble.com Yeah, I mentioned this to Magnus this morning (my time) and he said Bruce was compiling a patch in time for the next commit fest. I'm not sure where it's all at, but an official update would be helpful. Yep, got the patch done, actually; will post it now. -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Reducing the memory footprint of large sets of pending triggers
We've occasionally talked about allowing pending-trigger-event lists to spill to disk when there get to be so many events that it's a memory problem. I'm not especially interested in doing that right now, but I noticed recently that we could alleviate the problem a lot by adopting a more compact representation. Currently, each event is a separately palloc'd instance of struct AfterTriggerEventData. On a 32-bit machine that struct takes 32 bytes, plus 8 bytes palloc overhead = 40 bytes. On a 64-bit machine the struct takes 36 bytes, but palloc rounds that up to 64 bytes, plus there's 16 bytes palloc overhead = 80 bytes :-(. I see several things we could do here: * Allocate the event structs in reasonably-large arrays instead of separate palloc chunks. This would require some data copying where we now get away with pointer-swinging --- but on the other hand per-event palloc'ing isn't exactly free either, so I suspect that this would net out to a wash if not an actual speedup. * Don't store the ate_tgoid and ate_relid fields in each individual event struct. Instead keep a separate array with one instance of these values for each distinct trigger that's been fired in the current transaction (in most cases this list should be pretty short, even if there are many events). We can commandeer the high order bits of ate_event to store an index into that array. Currently only 8 bits of ate_event are actually used, so we'd have room for 16 million distinct triggers fired in a transaction. Even if we need a few more ate_event flag bits later, I don't see a problem there. * Don't store two ItemPointers in insert or delete events. This would make the array element stride variable, but since we don't need random access into the arrays AFAICS, that doesn't seem to be a problem. 
In combination these changes would get us down to 16 bytes per insert/delete and 20 per update event, which represents a factor of 2 or 2.5 savings on a 32-bit machine and a factor of 4 or 5 on a 64-bit machine. Seems worth doing to me, especially since it looks like only about a 1-day project touching only a single source file. It might be possible to go further and move the event status bits and firing_id into the separate array, which would save a further four bytes per event in the typical situation that a lot of events of the same trigger are queued by a single command. I think I'd want to tackle that as a follow-on patch though, because it would be a change in the data structure semantics not just rearranging the representation a tad. Comments, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
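The "commandeer the high order bits of ate_event" idea can be sketched as follows. This is a hypothetical illustration, not the actual trigger.c layout: the macro and function names are invented, and it simply shows the low 8 bits holding the existing flag bits while the high 24 bits index a per-transaction array of fired triggers, giving the ~16 million distinct triggers mentioned above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout after the proposal: low 8 bits keep the existing
 * ate_event flag bits; the high 24 bits hold an index into a separate
 * per-transaction array of (tgoid, relid) pairs. */
#define ATE_FLAG_MASK     0x000000FFu
#define ATE_INDEX_SHIFT   8
#define ATE_MAX_TRIGGERS  (1u << 24)    /* ~16 million distinct triggers */

/* Combine flag bits and a trigger-array index into one 32-bit word */
static uint32_t
pack_event(uint32_t flags, uint32_t trigger_index)
{
    assert(trigger_index < ATE_MAX_TRIGGERS);
    return (flags & ATE_FLAG_MASK) | (trigger_index << ATE_INDEX_SHIFT);
}

/* Recover the flag bits from a packed event word */
static uint32_t
event_flags(uint32_t ev)
{
    return ev & ATE_FLAG_MASK;
}

/* Recover the trigger-array index from a packed event word */
static uint32_t
event_trigger_index(uint32_t ev)
{
    return ev >> ATE_INDEX_SHIFT;
}
```

Since both fields round-trip losslessly, each event struct drops the 8 bytes of ate_tgoid and ate_relid at the cost of one shared array entry per distinct trigger.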
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
I couldn't get async I/O to work on Linux. That is, it worked but performed the same as reading one block at a time. On Solaris the situation is reversed. In what way is fadvise a kludge? greg On 24 Oct 2008, at 01:44 AM, Bruce Momjian [EMAIL PROTECTED] wrote: Jonah H. Harris wrote: On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith [EMAIL PROTECTED] wrote: I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though, an LRU-based buffer manager design would be more optimal in this case. True, it is a kludge but if it gives us 95% of the benefit with 10% of the code, it is a win. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Greg Stark wrote: I couldn't get async I/O to work on Linux. That is, it worked but performed the same as reading one block at a time. On Solaris the situation is reversed. In what way is fadvise a kludge? I think he is saying AIO gives us more flexibility, but I am unsure we need it. --- greg On 24 Oct 2008, at 01:44 AM, Bruce Momjian [EMAIL PROTECTED] wrote: Jonah H. Harris wrote: On Thu, Oct 23, 2008 at 4:53 PM, Greg Smith [EMAIL PROTECTED] wrote: I think the current plan is to use posix_fadvise() to allow parallel I/O, rather than async I/O, because posix_fadvise() will require fewer code changes. These are not necessarily mutually exclusive designs. fadvise works fine on Linux, but as far as I know only async I/O works on Solaris. Linux also has an async I/O library, and it's not clear to me yet whether that might work even better than the fadvise approach. fadvise is a kludge. While it will help, it still makes us completely reliant on the OS. For performance reasons, we should be supporting a multi-block read directly into shared buffers. IIRC, we currently have support for rings in the buffer pool, which we could read directly into. Though, an LRU-based buffer manager design would be more optimal in this case. True, it is a kludge but if it gives us 95% of the benefit with 10% of the code, it is a win. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 8:44 PM, Bruce Momjian [EMAIL PROTECTED] wrote: True, it is a kludge but if it gives us 95% of the benefit with 10% of the code, it is a win. I'd say, optimistically, maybe 30-45% the benefit over a proper multi-block read using O_DIRECT. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark [EMAIL PROTECTED] wrote: I couldn't get async I/O to work on Linux. That is it worked but performed the same as reading one block at a time. On solaris the situation is reversed. Hmm, then obviously you did something wrong, because my tests showed it quite well. Pull the source to iozone or fio. In what way is fadvise a kludge? non-portable, requires more user-to-system CPU, ... need I go on? -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Jonah H. Harris [EMAIL PROTECTED] writes: On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark In what way is fadvise a kludge? non-portable, requires more user-to-system CPU, ... need I go on? I'd be interested to know which of these proposals you claim *is* portable. The single biggest reason to reject 'em all is that they aren't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On Fri, Oct 24, 2008 at 12:42 AM, Tom Lane [EMAIL PROTECTED] wrote: non-portable, requires more user-to-system CPU, ... need I go on? I'd be interested to know which of these proposals you claim *is* portable. The single biggest reason to reject 'em all is that they aren't. Yes, that was bad wording on my part. What I meant to say was "unpredictable". Different OSes and filesystems handle fadvise differently (or not at all), which makes any claim to performance gain configuration-dependent. My preferred method, using O_DIRECT and fetching directly into shared buffers, is not without its issues or challenges as well. However, by abstracting the multi-block read interface, we could use more optimal calls depending on the OS. Having done a bit of research and testing in this area (AIO and buffer management), I don't see any easy solution. fadvise will work on some systems and will likely give some gain on them, but won't work for everyone. The alternative is to abstract prefetching and allow platform-specific code, which we rarely do. While we could build an abstract prefetch interface and simply use fadvise for it now (rather than OS-specific code), I don't see an easy win in any case. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
On 24 Oct 2008, at 04:31 AM, Jonah H. Harris [EMAIL PROTECTED] wrote: On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark [EMAIL PROTECTED] wrote: I couldn't get async I/O to work on Linux. That is, it worked but performed the same as reading one block at a time. On Solaris the situation is reversed. Hmm, then obviously you did something wrong, because my tests showed it quite well. Pull the source to iozone or fio. I posted the source, feel free to point out what I did wrong. It did work on Solaris with and without O_DIRECT so I didn't think it was a bug in my code. In what way is fadvise a kludge? non-portable, requires more user-to-system CPU, ... need I go on? Well it's just as portable, they're both specified by POSIX. Actually async I/O is in the real-time extensions so one could argue it's less portable. Also before posix_fadvise there was plain old fadvise so it's portable to older platforms too whereas async I/O isn't. posix_fadvise does require two syscalls and two trips to the buffer manager. But that doesn't really make it a kludge if the resulting code is cleaner than the async I/O code would be. To use async I/O we would have to pin all the buffers we're reading which would be quite a lot of code changes. I did ask for feedback on precisely this point of whether two trips to the buffer manager was a problem. It would have been nice to get the feedback 6 months ago when I posted it instead of now two weeks before feature freeze. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
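For reference, the prefetch-then-read pattern under discussion looks roughly like this. BLCKSZ and prefetch_block() here are illustrative stand-ins, not PostgreSQL's actual smgr interface, and the sketch assumes a platform where posix_fadvise() is available:

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise, pread, mkstemp */

#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192         /* stand-in for PostgreSQL's block size */

/* Hint the kernel that a block will be needed soon, so a striped
 * array can start servicing several requests in parallel before the
 * ordinary read()s are issued.  posix_fadvise reports errors via its
 * return value (0 on success), not via errno. */
static int
prefetch_block(int fd, off_t blocknum)
{
    return posix_fadvise(fd, blocknum * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

The hint and the later read are separate system calls, which is exactly the "two syscalls and two trips" cost noted above; an AIO-based version would instead have to pin the target buffer up front and issue a single asynchronous read into it.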
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Based on what? I did test this and posted the data. The results I posted showed that posix_fadvise on Linux performed nearly as well as async I/O on Solaris on identical hardware. More importantly it scaled with the number of drives. A 15 drive array gets about 15x the performance of a 1 drive array if enough read-ahead is done. Plus an extra boost if the input wasn't already sorted, which presumably reflects the better I/O ordering. -- greg On 24 Oct 2008, at 04:29 AM, Jonah H. Harris [EMAIL PROTECTED] wrote: On Thu, Oct 23, 2008 at 8:44 PM, Bruce Momjian [EMAIL PROTECTED] wrote: True, it is a kludge but if it gives us 95% of the benefit with 10% of the code, it is a win. I'd say, optimistically, maybe 30-45% the benefit over a proper multi-block read using O_DIRECT. -- Jonah H. Harris, Senior DBA myYearbook.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
We did discuss this in Ottawa and I believe your comment was "the hint is in the name", referring to posix_fadvise. In any case both AIO and posix_fadvise are specified by POSIX so I don't see either as a problem on that front. I don't think we can ignore any longer that we effectively can't use RAID arrays with postgres. If you have many concurrent queries or restrict yourself to sequential scans you're ok, but if you're doing data warehousing you're going to be pretty disappointed to see your shiny RAID array performing like a single drive. greg On 24 Oct 2008, at 05:42 AM, Tom Lane [EMAIL PROTECTED] wrote: Jonah H. Harris [EMAIL PROTECTED] writes: On Thu, Oct 23, 2008 at 10:36 PM, Greg Stark In what way is fadvise a kludge? non-portable, requires more user-to-system CPU, ... need I go on? I'd be interested to know which of these proposals you claim *is* portable. The single biggest reason to reject 'em all is that they aren't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers