Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> > > The two obvious problems with the existing MCP architecture are:
> > > 1. Single point of failure
> >
> > For async replication there is always a SPoF; at least the master, until the first slave has acquired the log, is a SPoF. Or do you plan for both the master and the MCP|slave to keep the log and be able to step in for each other if the other fails?
>
> Yeah, with the new architecture there is still going to be a bit of a SPoF in the master-MCP, but it's a lot smaller than in the current setup, in which if you lose the MCP you basically lose everything.
>
> > > 2. Portability
> >
> > Portability to where? Other DBMSs? Other PG versions?
>
> Other operating systems, mainly. The trouble is we never got around to porting the MCP to any OS beyond Linux; I think it should work on Solaris and the BSDs, but surely not on Windows. We want to just get rid of what I consider a (crappy) reimplementation of postmaster; instead we should just let postmaster do the job. Additionally we would get rid of the ugly way we import backend code into the MCP server.
>
> > For me there were also two more problems:
> >
> > 3. A separate replication log, which at least seems to be able to get out of sync with the main DB. Why don't you just use a DB table, WAL-logged and all?
>
> The whole replication log thing is a topic of dissent in the team ;-)

I see. To work reliably, the replication log should work very much like WAL does, so why not just use a table + WAL (a sketch follows at the end of this message)? Or, if you want extra performance from storing it on a separate disk, work on having multiple WALs in the backend ;)

> > 4. Also, again from reading the Replicator FAQ, it seems that there is a window of corruption/data loss when rotating Replicator's transaction log. I think that doing it with copy/truncate either needs locking of the logfile (== bad performance during the copy/truncate) or is just a data-eating failure waiting to happen.
>
> Hmm, what Replicator FAQ? We used to have this copy/truncate problem, and we rearchitected the log to avoid it (we use a rotating setup now).

It was in the subsection "mcp_server mysteriously dies" on
http://www.commandprompt.com/products/mammothreplicator/tips

> > > Master-MCP|Slave -Slave1 -Slave2 -Slave3
> > > The process being: Master sends data to MCP|Slave, MCP|Slave writes it to disk (and optionally restores it).
> >
> > Will this first send be sync or async? Or have you planned for it to be configurable among several robustness-vs-performance levels, similar to the planned integrated WAL shipping?
>
> It is async, and we haven't talked about sync.

> > If async, will it also use MVCC for keeping the log on the master (like Slony and PgQ do), just to be at least as reliable as the PostgreSQL core itself and not require a full resync after a server crash?
>
> You mean WAL? We don't currently.

So how do you cope with a possible loss of sync on a master crash?

> > > Alvaro or Alexey can speak more technically about implementation than I can.
> >
> > Alvaro - I guess you have already discussed most of it, but basically you need to solve all the same problems that WAL-shipping-based Hot Standby is solving and that Slony/PgQ/Londiste have solved.
>
> If you mean that we're duplicating effort that's already going on elsewhere, my opinion is yes, we are.

Duplicating effort is not always a bad thing. I was mostly suggesting watching the discussions, digging around in the materials, and/or asking the people who have been working on these same issues. And of course _thinking_ deeply about the design before writing lots of duplicate code which ends up being an often inferior implementation of something that already exists (see http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx ) ;-)

> > Hopefully you get it more robust than Slony when making changes under high load :)
>
> Hmm, I don't know about lack of robustness in Slony, so I can't say.

Slony is brittle once you start using it under high load and tends to display all kinds of frustrating qualities:

1) It does not have enough controls in place for configuration changes to guarantee either success or a clean rollback, so if something goes wrong (like some conf change not propagating to all nodes in the right order), you end up with no working replication.

2) You usually can't test for 1) on your test setup, as it happens only under really high loads, which most test setups don't provide.

There are/were other warts (like forcing an index scan covering the whole table, or being unable to continue replication after some slonik downtime because PostgreSQL would give "query too complex" errors on a generated 700kB-long query), some of which are fixed in 1.x and some of which are maybe fixed in 2.0. I was a heavy user (at Skype) at some point and have helped in fixing some. But in the end we could not figure out how to make it robust and extracted the good stuff for
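To make point 3 above concrete, here is a minimal sketch of the kind of WAL-logged log table being suggested; all names are illustrative and nothing like this exists in Replicator:

  -- A replication log kept in an ordinary table is WAL-logged for free.
  CREATE TABLE replication_log (
      ev_id    bigserial PRIMARY KEY,  -- position in the log
      ev_txid  xid    NOT NULL,        -- originating transaction
      ev_table text   NOT NULL,        -- fully qualified table name
      ev_op    "char" NOT NULL,        -- 'I', 'U' or 'D'
      ev_data  text                    -- serialized row image
  );
  -- After a master crash, exactly the committed events are in the log,
  -- so no full resync is needed; a consumer reads rows in ev_id order
  -- and deletes behind itself, as the Slony/PgQ event queues do.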
[HACKERS] some problem with casting unknown to smallint
Hello

I am not sure; it's probably a bug.

  postgres=# create function t1(smallint) returns smallint as $$select $1$$ language sql;
  CREATE FUNCTION
  postgres=# create function t2(bigint) returns bigint as $$select $1$$ language sql;
  CREATE FUNCTION
  postgres=# select t1(10);
  ERROR:  function t1(integer) does not exist
  LINE 1: select t1(10);
                 ^
  HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

but

  postgres=# select t2(10);
   t2
  ----
   10
  (1 row)

Regards
Pavel Stehule
Re: [HACKERS] Feature Request - Table Definition query
> It could be useful to have a command that returns the table definition (like pg_dump -st) from within the query interface. This could be particularly useful if one doesn't have access to, or prefers not to manipulate, results coming from a shell. If I have an API from which to query the database, it seems silly to have to spawn a shell and run a command to query the very same database.

Yes. That functionality would be really appreciated by myself and a few pgsql users I know.

> (I'm not exactly familiar with how pg_dump manipulates or extracts that data, so this could be a very, very stupid question. If so, please forgive me.)

Me neither, but I think one of the problems is that during an upgrade you are supposed to use pg_dump from the *new* version on the existing database instance. The server could most probably only generate its own format (not the new one), which might be a problem. Perhaps it could be implemented as stored functions (auto-installed and available in contrib, or perhaps installed by psql on first run) in a namespace like pg_dump_84 (i.e. tagged with the server version). If you want to upgrade, you could then install the new set of functions. psql could then be a thin wrapper which calls the needed functions.

/Svenne
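As a rough illustration of the stored-function idea (entirely hypothetical; the real pg_dump logic also handles constraints, defaults, inheritance, storage options, and much more):

  -- Sketch: rebuild a minimal CREATE TABLE statement from the catalogs,
  -- covering only column names and types. A versioned schema such as
  -- pg_dump_84 could hold it, per the suggestion above.
  CREATE FUNCTION table_def(p_table regclass) RETURNS text AS $$
      SELECT 'CREATE TABLE ' || $1::text || ' ('
          || array_to_string(ARRAY(
                 SELECT quote_ident(attname) || ' '
                        || format_type(atttypid, atttypmod)
                 FROM pg_attribute
                 WHERE attrelid = $1
                   AND attnum > 0
                   AND NOT attisdropped
                 ORDER BY attnum), ', ')
          || ');'
  $$ LANGUAGE sql STABLE;

  -- Usage: SELECT table_def('public.accounts');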
Re: [HACKERS] Proposal of PITR performance improvement for 8.4.
On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:
> 1. You should avoid useless posix_fadvise() calls. In the naive implementation, where you simply call posix_fadvise() for every page referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls per WAL record, and that's a lot of overhead. We face the same design question as with Greg's patch to use posix_fadvise() to prefetch index and bitmap scans: what should the interface to the buffer manager look like? The simplest approach would be a new function call like AdviseBuffer(Relation, BlockNumber), which calls posix_fadvise() for the page if it's not in the buffer cache, but is a no-op otherwise. But that means more overhead, since for every page access we need to find the page twice in the buffer cache: once for the AdviseBuffer() call, and a second time for the actual ReadBuffer().

That's a much smaller overhead than waiting for an I/O. The CPU overhead isn't really a problem if we're I/O bound.

> It would be more efficient to pin the buffer in the AdviseBuffer() call already, but that requires many more changes to the callers.

That would be hard to clean up safely, plus we'd have difficulty with timing: is there enough buffer space to allow all the prefetched blocks to live in cache at once? If not, this approach would cause problems.

> 2. The format of each WAL record is different, so you need a readahead handler for every resource manager, for every record type. It would be a lot simpler if there was a standardized way to store that information in the WAL records.

I would prefer a new rmgr API call that returns a list of blocks. That's better than trying to make everything fit one pattern. If the call doesn't exist, then that rmgr won't get prefetch.

> 3. IIRC I tried to handle just a few of the most important WAL records at first, but it turned out that you really need to handle all WAL records (that are used at all) before you see any benefit. Otherwise, every time you hit a WAL record that you haven't done posix_fadvise() on, the recovery stalls, and you don't need many of those to diminish the gains.
>
> Not sure how these apply to your approach, it's very different. You seem to handle 1. by collecting all the page references for the WAL file, and sorting and removing the duplicates. I wonder how much CPU time is spent on that?

Removing duplicates seems like it will save CPU.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
[HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)
I've updated my patches; these are ready for CommitFest:Nov.

[1/6] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch
[2/6] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch
[3/6] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch
[4/6] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch
[5/6] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch
[6/6] http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch

The comprehensive documentation for SE-PostgreSQL is here:
http://wiki.postgresql.org/wiki/SEPostgreSQL
(It is currently being reworked.)

List of updates:
- Patches are rebased to the latest CVS HEAD.
- bugfix: TRUNCATE checks assumed the SECCLASS_DB_TUPLE object class.
- bugfix: sepgsqlCopyFile assumed the SECCLASS_FILE object class, but it has to be adjusted by st_mode.

Request for comments:
- Is the 4th patch actually needed? It could be replaced by the wiki page.
- Do you think anything remains to be done before the final CommitFest?
- Do you have any review comments?

Most of the patches are unchanged from the previous version. If you comment on anything, I can fix it without waiting for the final CommitFest.

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] Proposal of PITR performance improvement for 8.4.
On Wed, 2008-10-29 at 09:55 +0900, Koichi Suzuki wrote:
> I'd like to hear some more about these. I'm more than happy to write all the code inside PG core to avoid the overhead of creating another process.

Having an external program can help earlier releases also, so I think this is the right approach for now. In the next PG release we should bring this into core, along with streaming.

The interface would be better if it accepted

  pg_readahead filename

or

  pg_readahead filename start-lsn

We don't always need a starting LSN.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> > > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> > > Will there be a helper application for setting up and configuring changes in replication, or will it all be done using added SQL commands?
> >
> > Well, the interface I work on is all SQL commands :-)
>
> Case in point. To replicate a table currently you do this:
>
>   ALTER TABLE foo ENABLE REPLICATION;
>   ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

How do you define SLAVE 0? That is, do you use something similar to SQL/MED (currently discussed in another thread on pgsql-hackers), or just tables, or external conf files, or ... ?

> > Hmm, I am not sure. We are pretty deep into the core and only use triggers for GRANT/REVOKE/CREATE ROLE.
> >
> > > By the way, why did you choose the pretty-deep-into-the-core approach instead of triggers?
> >
> > Speed maybe? I don't know.
>
> Well actually that was my decision years ago and it was made for two reasons:
>
> 1. Speed

Is this some MySQL kind of speed, achieved by compromising on ACID?

> 2. Lots of people think trigger replication is a hack. (Not interested in arguing, just stating an observation.)

And making the illogical conclusion that non-trigger replication is not a hack? Probably the same people who think that expensive and closed source automatically means high quality :)

Actually I see trigger-based replication as a power tool. You can make it (almost) as fast as any source-level hack, even if the latter compromises on ACID, plus you can be much more flexible about what/when/how you replicate.

> One of the core requirements of the original Replicator, which can still be seen today AND is a requirement of 1.9 as well, is: Thou shalt not cause grief to thy master, even if thou replicatest to many children. IOW, one slave and many slaves cause similar load on the master.

And you do it by initially replicating to a single distributor slave?

> That is where the original idea of the MCP came from. The theory was, we could have 50 slaves and the master wouldn't care.

A good goal. But why would anybody _need_ 50 slaves? Perhaps some application with a small data amount (meaning it fits on one server) that is massively (100:1) skewed towards reading, like dynamically generated mostly-static-content web pages?

> > > I mean, you probably end up duplicating (or missing) lots of PostgreSQL's internal goodness instead of just using what is already available?
>
> Well, yes and no. We have made more and more use of the internal PostgreSQL code through 1.7 and 1.8. I expect that trend will continue. A hairy wart would be the use of our own log, but hey, it's BSD; if someone can figure out how to make it work with WAL and not lose the feature set we have,

Why not just make a new kind of WAL-logged heap, which stores your version of the trx log?

> I would not vote against it and would happily accept the patch (assuming technical sign-off by Alvaro and Alexey).

I hope something useful will come out of this too, though I hoped that it already had some advantages over trigger-based replication, like

> Well it does. We can replicate large objects, for example,

I guess we get more benefit to the general public in the long run by making it possible to have triggers on large objects (and system objects or DDL statements). They could be some kind of restricted or system triggers, at least for DDL, but exposing the mechanisms to power users, without them having to start patching backend code, would be a good thing.

> and our user experience is far more friendly than anything else.

Do you mean that you have a friendly UI for beginners? Or is it proved to be robust in continued use, with changes going smoothly in and automatic recovery from network outages and random reboots on live systems in a 24/7 environment?

> > > and the ability to replicate DDL.
>
> That is what 1.9 is all about. Remember that the hope (no laughing now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases. So this isn't a year away.
>
> > I fear that our approach to replication is so ad-hoc that there's not much to be gained from elsewhere. Replicator is pretty much a fork that's not likely to yield anything useful to upstream.

That is probably true. If/when you get DDL replication done (and tested, and robust, ... :) ), it at least solves the following two problems for others as well:

1) where to get at DDL statements at the right time
2) how to put them into the replication stream in the right place / right order

Having these solved is a big thing in itself, even if no actual code reuse is possible. It may be easier to get nr 2) right in a more flexible trigger-based environment, so I'll try to talk Marko into looking at it for PgQ as well ;)

-- 
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability Services, Consulting and Training
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
On Tuesday 28 October 2008, Pavel Stehule wrote:
> 2008/10/28 Dimitri Fontaine [EMAIL PROTECTED]:
> > Hi,
> > In the Python language, functions that lazily return collections are called generators and use the yield keyword instead of return.
> > http://www.python.org/doc/2.5.2/tut/node11.html#SECTION0011100
> > Maybe having such a concept in PostgreSQL would allow the user to choose between the current behavior (materializing) and lazy computing, with a new internal API to be done in the executor, maybe.
>
> Lazy computing is a good idea, but I am afraid it would be very hard to implement. You would have to store the current state somewhere, stop execution, return back from the node, and then be able to restore the PL state and continue processing. I can't see it without thread support.

I'm not sure I understand what the current situation is, then. Reading this commit message from Tom:

  Extend ExecMakeFunctionResult() to support set-returning functions that
  return via a tuplestore instead of value-per-call ...
  For the moment, SQL functions still do things the old way.
  http://git.postgresql.org/?p=postgresql.git;a=commit;h=6d5301be5ece6394433d73288e0fafaed6326485

I had the impression we already have a lazy implementation, this value-per-call returning code path, which still exists for SQL functions.

  CREATE FUNCTION my_generator_example(integer, integer)
   RETURNS SETOF integer
   GENERATOR
   LANGUAGE SQL
  AS $f$
    SELECT generate_series($1, $2);
  $f$;

So my idea would be to have the SQL function behavior choose to return values either via tuplestore or via value-per-call, depending on the user setting "generator" or "lazy". Done this way, the user could also choose for the function to be lazy or to use a tuplestore whatever the language it's written in. The current behaviour would then mean the default depends on the language: lazy for SQL and tuplestore for PL/pgSQL. Well, it will have to be documented, whatever the final choice is.

Is it possible? A good idea?
-- 
dim
Re: [HACKERS] Updating FSM on recovery
Tom Lane wrote:
> Heikki Linnakangas [EMAIL PROTECTED] writes:
> > Tom Lane wrote:
> > > As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you just call XLogReadBufferWithFork with init = true, and then initialize the page if PageIsNew?
> >
> > With init=true, we don't even try to read the page from disk (since 8.3), so all FSM pages accessed during recovery would be zeroed out. I don't think that's what you intended.
>
> Ah, right. Maybe the API change you suggested in the comment is the way to go.

Done, patch attached. But while I was hacking on that, I realized another problem: because changes to FSM pages are not WAL-logged, they can be torn if, at a crash, one part of the page is flushed to disk but another part is not. The FSM code will recover from internally inconsistent pages, caused by torn pages or other errors, but we still have a problem if the FSM file is extended and the new page is torn. It can happen that the first part of the page, containing the page header, doesn't make it to disk, but other parts of the page do. ReadBuffer() checks that the page header is valid, so it will throw an error on a torn page like that. ReadBuffer() doesn't complain about a page that is all zeros, but that's not what we have in this scenario.

The FSM would be perfectly happy to just initialize torn or otherwise damaged pages, so I think we should add yet another mode to ReadBuffer() to allow that. We could also treat read() errors as merely warnings in that mode, effectively the same as with zero_damaged_pages=on.

The ReadBuffer() interface is already pretty complex, with all the different variants. We should probably keep the good old ReadBuffer() the same, for the sake of simplicity in the callers, but try to reduce the number of other variants. The current API is this:

  Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
  Buffer ReadBufferWithFork(Relation reln, ForkNumber forkNum, BlockNumber blockNum);
  Buffer ReadBufferWithStrategy(Relation reln, BlockNumber blockNum,
                                BufferAccessStrategy strategy);
  Buffer ReadOrZeroBuffer(Relation reln, ForkNumber forkNum, BlockNumber blockNum);
  Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, ForkNumber forkNum,
                                   BlockNumber blockNum, bool zeroPage);

Here's my proposal for a new API:

  typedef enum
  {
      RBM_NORMAL,        /* checks header, ereport(ERROR) on errors */
      RBM_INIT,          /* just allocate a buffer, don't read from disk.
                          * Caller must initialize the page */
      RBM_INIT_ON_ERROR  /* read, but instead of ERRORing, return an
                          * all-zeros page */
  } ReadBufferMode;

  Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
  Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
                       BufferAccessStrategy strategy, ReadBufferMode mode);
  Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, ForkNumber forkNum,
                                   BlockNumber blockNum, ReadBufferMode mode);

Thoughts?
-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

*** src/backend/access/heap/heapam.c
--- src/backend/access/heap/heapam.c
***************
*** 54,59 ****
--- 54,60 ----
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/bufmgr.h"
+ #include "storage/freespace.h"
  #include "storage/lmgr.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
*************** heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
*** 4029,4034 ****
--- 4030,4036 ----
  	int nredirected;
  	int ndead;
  	int nunused;
+ 	Size freespace;

  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;
*************** heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
*** 4060,4065 ****
--- 4062,4069 ----
  				nowunused, nunused, clean_move);

+ 	freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+
  	/*
  	 * Note: we don't worry about updating the page's prunability hints.
  	 * At worst this will cause an extra prune cycle to occur soon.
*************** heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
*** 4069,4074 ****
--- 4073,4087 ----
  	PageSetTLI(page, ThisTimeLineID);
  	MarkBufferDirty(buffer);
  	UnlockReleaseBuffer(buffer);
+
+ 	/*
+ 	 * Update the FSM as well.
+ 	 *
+ 	 * XXX: We don't get here if the page was restored from full page image.
+ 	 * We don't bother to update the FSM in that case, it doesn't need to be
+ 	 * totally accurate anyway.
+ 	 */
+ 	XLogRecordPageWithFreeSpace(xlrec->node, xlrec->block, freespace);
  }

  static void
*************** heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
*** 4212,4226 ****
  	HeapTupleHeader htup;
  	xl_heap_header xlhdr;
  	uint32 newlen;

  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;

  	if (record->xl_info & XLOG_HEAP_INIT_PAGE)
  	{
! 		buffer = XLogReadBuffer(xlrec->target.node,
! 								ItemPointerGetBlockNumber(&(xlrec->target.tid)),
! 								true);
  		Assert(BufferIsValid(buffer));
  		page = (Page) BufferGetPage(buffer);
--- 4225,4241 ----
  	HeapTupleHeader htup;
Re: [HACKERS] some problem with casting unknown to smallint
2008/10/29 Tom Lane [EMAIL PROTECTED]:
> Pavel Stehule [EMAIL PROTECTED] writes:
> > I am not sure; it's probably a bug.
> > postgres=# create function t1(smallint) returns smallint as $$select $1$$ language sql;
> > CREATE FUNCTION
> > postgres=# select t1(10);
> > ERROR: function t1(integer) does not exist
>
> That's not unknown-to-smallint, that's integer-to-smallint, which is not an implicit cast. Unknown-to-smallint would be t1('10').

I understand. So every smallint function has to be called with a cast?

Regards
Pavel Stehule
Re: [HACKERS] some problem with casting unknown to smallint
Pavel Stehule [EMAIL PROTECTED] writes:
> I am not sure; it's probably a bug.
> postgres=# create function t1(smallint) returns smallint as $$select $1$$ language sql;
> CREATE FUNCTION
> postgres=# select t1(10);
> ERROR: function t1(integer) does not exist

That's not unknown-to-smallint, that's integer-to-smallint, which is not an implicit cast. Unknown-to-smallint would be t1('10').

			regards, tom lane
Re: [HACKERS] Updating FSM on recovery
Heikki Linnakangas [EMAIL PROTECTED] writes:
> The FSM would be perfectly happy to just initialize torn or otherwise damaged pages, so I think we should add yet another mode to ReadBuffer() to allow that. We could also treat read() errors as merely warnings in that mode, effectively the same as with zero_damaged_pages=on. The ReadBuffer() interface is already pretty complex, with all the different variants. We should probably keep the good old ReadBuffer() the same, for the sake of simplicity in the callers, but try to reduce the number of other variants.

Indeed. Did you see the discussion about the similarly-too-complex heap_insert API a couple of days ago, in connection with bulk-write scenarios? The conclusion there was to try to shift stuff into a bitmask options argument, in hopes that future additions might not require touching every caller. Can we do it similarly here?

			regards, tom lane
Re: [HACKERS] some problem with casting unknown to smallint
Pavel Stehule [EMAIL PROTECTED] writes:
> I understand. So every smallint function has to be called with a cast?

A long time ago we tried to make small integer literals be interpreted as int2 initially, instead of int4, and the attempt failed rather spectacularly. (It broke most of the regression tests, IIRC, in ways that suggested that many client applications would have problems too.) Perhaps PG's type system has matured to the point where it'd work better now, but I'm not really interested in trying it. I don't see very much point in declaring functions to take smallint rather than int anyway...

			regards, tom lane
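For the archives, the workarounds follow directly from Tom's explanation and from the error's HINT (using the t1 function from upthread):

  -- fails: 10 is resolved as integer, and integer -> smallint
  -- is not an implicit cast
  SELECT t1(10);

  -- works: cast the argument explicitly
  SELECT t1(10::smallint);
  SELECT t1(CAST(10 AS smallint));

  -- works: a quoted literal starts as "unknown" and can be
  -- coerced to smallint
  SELECT t1('10');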
Re: [HACKERS] some problem with casting unknown to smallint
2008/10/29 Tom Lane [EMAIL PROTECTED]:
> Pavel Stehule [EMAIL PROTECTED] writes:
> > I understand. So every smallint function has to be called with a cast?
>
> A long time ago we tried to make small integer literals be interpreted as int2 initially, instead of int4, and the attempt failed rather spectacularly. (It broke most of the regression tests, IIRC, in ways that suggested that many client applications would have problems too.) Perhaps PG's type system has matured to the point where it'd work better now, but I'm not really interested in trying it. I don't see very much point in declaring functions to take smallint rather than int anyway...

I found this question on a Czech IT site. It's mostly a beginner's problem.

Regards
Pavel Stehule
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
> So my idea would be to have the SQL function behavior choose to return values either via tuplestore or via value-per-call, depending on the user setting "generator" or "lazy". Done this way, the user could also choose for the function to be lazy or to use a tuplestore whatever the language it's written in.

The problem is not the general PostgreSQL executor, but whatever body of code executes PL/pgSQL functions (and the other PL languages). It does not, as I understand it, support freezing execution of the function midway through and picking up again later. I haven't looked at the code, but based on previous experience, that could turn out to be a pretty major refactoring. I suspect it would be worthwhile and quite welcome to many users, but I doubt very much that it would be easy.

...Robert
Re: [HACKERS] Updating FSM on recovery
> Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
> Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
>                      BufferAccessStrategy strategy, ReadBufferMode mode);
> Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, ForkNumber forkNum,
>                                  BlockNumber blockNum, ReadBufferMode mode);
>
> Thoughts?

I'm not sure why we would abbreviate "Extended" to "Ext" when nothing else in here is abbreviated. Seems needlessly inconsistent.

We may also want to rethink our approach to BufferAccessStrategy a bit. Right now, we don't admit that GetBufferAccessStrategy(BAS_NORMAL) just returns a NULL pointer; we expect the caller to get that strategy and later call FreeBufferAccessStrategy on it, just as if it were a real object. Particularly in light of this API change, I think we should just give up on that. Otherwise, a caller that wants to specify a fork number or ReadBufferMode has to get and free an access strategy that doesn't amount to anything. Perhaps it would be sufficient to do this:

  #define NormalBufferAccessStrategy NULL

That way, it would be easy to grep for any place where we used this to get around a useless pair of get/free calls, if we ever need to go back and make a normal buffer access strategy into a real object.

...Robert
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Wed, Oct 29, 2008 at 12:02:20PM +0200, Hannu Krosing wrote:
> A good goal. But why would anybody _need_ 50 slaves ?

They might have a contractual responsibility for extremely wide geographic distribution. Or they might be building an application that needs extremely wide network-topological distribution to avoid large loads on any one network. For instance, I can imagine building a network of nameservers in which you peered the nameservers, colocated in every ISP you could think of. If you were backing the nameserver with Postgres, this would work.

To be clear, this is _not_ the case with any product I've ever built, but it is a design I have seen deployed. That design was supposed to be on top of Oracle. There were well over 50 slaves. I don't really believe they had that many Oracle-using slaves, though.

A

-- 
Andrew Sullivan [EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
Dimitri Fontaine [EMAIL PROTECTED] writes:
> Done this way, the user could also choose for the function to be lazy or to use a tuplestore whatever the language it's written in.

The odds of this ever happening for any of the PLs are not distinguishable from zero. It might be nice to have, but the amount of work involved would be incredibly out of proportion to the benefits --- even assuming that it's technically possible at all, which I rather doubt for the PLs that depend on language interpreters that aren't under our control. So the fact that it's possible for SQL-language functions is an idiosyncrasy of that language, not something we should cram into the general CREATE FUNCTION syntax in the vain hope that having syntax might cause an implementation to appear someday.

Therefore, if we were going to expose a knob to the user to control this behavior, I'd be inclined to make it a part of the language-specific syntax of SQL function bodies. We could take a hint from the (underdocumented) #option syntax in plpgsql, something like

  CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
    #option lazy
    SELECT ...
  $$ LANGUAGE SQL;

Mind you, I'm not exactly *for* this, because I think it will result in making functions.c a whole lot more complex and hard to maintain than it needs to be, in exchange for a behavior that I don't believe is especially useful in most cases, and can easily be worked around when it is useful. But if people are going to be sticky about the point, something like this might be a workable compromise.

			regards, tom lane
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
On Wednesday 29 October 2008, Tom Lane wrote:
> So the fact that it's possible for SQL-language functions is an idiosyncrasy of that language, not something we should cram into the general CREATE FUNCTION syntax in the vain hope that having syntax might cause an implementation to appear someday.

OK, that confirms that lazy evaluation and call-per-value are distinct things, for once, and that what you were after was not an easy syntax bit. :)

> Therefore, if we were going to expose a knob to the user to control this behavior, I'd be inclined to make it a part of the language-specific syntax of SQL function bodies.

How would we support the option for SQL functions?

> Mind you, I'm not exactly *for* this, because I think it will result in making functions.c a whole lot more complex and hard to maintain than it needs to be, in exchange for a behavior that I don't believe is especially useful in most cases, and can easily be worked around when it is useful.

From what I understand, the lazy evaluation of functions is not seen as easy to work around by the people asking for it.

> But if people are going to be sticky about the point, something like this might be a workable compromise.

What's against PL/pgSQL implementing a YIELD statement? Wouldn't it be simpler to integrate for both hackers and users? This would maybe even allow having a new API in the executor for this, and each PL would be free to add support for it when it best suits them. Maybe that's exactly what you're calling "a whole lot more complex and hard to maintain than it needs to be", though.

Regards,
-- 
dim
[HACKERS] WIP parallel restore patch
Attached is my latest parallel restore patch. I think it's functionally complete for Unix. Many bugs have been fixed since the last patch, and the hardcoded limitation to two table dependencies is removed. It seems fairly robust in my recent testing.

Remaining to be done:
. code cleanup
. better error checking in a few places
. final decision re command line option names/defaults
. documentation
. Windows support

cheers

andrew

parallel_restore_10.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
Dimitri Fontaine [EMAIL PROTECTED] writes:
> On Wednesday 29 October 2008, Tom Lane wrote:
> > So the fact that it's possible for SQL-language functions is an idiosyncrasy of that language, not something we should cram into the general CREATE FUNCTION syntax in the vain hope that having syntax might cause an implementation to appear someday.
>
> OK, that confirms that lazy evaluation and call-per-value are distinct things, for once, and that what you were after was not an easy syntax bit. :)

Well, call-per-value is *necessary* for lazy evaluation, but it's not *sufficient*. You need a function implementation that can suspend and resume execution, and that's difficult in general.

> > Therefore, if we were going to expose a knob to the user to control this behavior, I'd be inclined to make it a part of the language-specific syntax of SQL function bodies.
>
> How would we support the option for SQL functions?

Well, we'd just tweak how the executor gets called inside functions.c. The main problem is that we'd have to have two different sets of behavior there, depending on whether we are trying to evaluate commands a row at a time or all at once, plus interlocks to disallow cases like using LAZY with a RETURNING query. It's certainly possible, but I believe it will make functions.c a lot longer and uglier than it would be without it.

> From what I understand, the lazy evaluation of functions is not seen as easy to work around by the people asking for it.

Nobody has refuted the argument that sticking a LIMIT into the function would accomplish the same result.

> What's against PL/pgSQL implementing a YIELD statement?

Feel free to try it, if you want. When you get done you might have some grasp of why it'll be nearly impossible for PLs that we don't control the entire implementation of.

			regards, tom lane
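To spell out the LIMIT workaround Tom mentions (hypothetical function names, not from this thread's patches): rather than relying on lazy evaluation to stop an expensive set-returning function early, pass the row count in as a parameter, so only that many rows are computed and materialized:

  CREATE FUNCTION first_rows(n integer) RETURNS SETOF integer AS $$
      SELECT g FROM generate_series(1, 1000000) AS g LIMIT $1;
  $$ LANGUAGE sql;

  SELECT * FROM first_rows(10);  -- computes and stores only 10 rows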
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
On Wednesday 29 October 2008, Tom Lane wrote:
> Well, call-per-value is *necessary* for lazy evaluation, but it's not *sufficient*. You need a function implementation that can suspend and resume execution, and that's difficult in general.

OK, I think I begin to understand how things are tied together. Thanks again for your patience in explaining :)

> Well, we'd just tweak how the executor gets called inside functions.c. The main problem is that we'd have to have two different sets of behavior there, depending on whether we are trying to evaluate commands a row at a time or all at once, plus interlocks to disallow cases like using LAZY with a RETURNING query. It's certainly possible, but I believe it will make functions.c a lot longer and uglier than it would be without it.

And I fail to see how the user would control which behavior gets chosen, which I think was part of the "going further with your ideas" subthread.

> Nobody has refuted the argument that sticking a LIMIT into the function would accomplish the same result.

Fair enough.

> > What's against PL/pgSQL implementing a YIELD statement?
>
> Feel free to try it, if you want.

Hehe, not this year. But being able to ask questions and get clarifications from hackers certainly is a step in this direction. Feeling ready and organizing one's time around it is the next :)

> When you get done you might have some grasp of why it'll be nearly impossible for PLs that we don't control the entire implementation of.

Hence the YIELD / new API idea, with the LAZY property being optional for PLs and only implemented in plpgsql (and maybe plpython, as Python supports the generator-function concept) at first. Maybe having optional features for PLs has not yet been done?

But again, I was started on this only by misunderstanding your call here:

> I think the PL side of the problem is the hard part --- if we knew how to solve these issues for plpgsql then SQL functions would surely be easy.

I'm not being sticky about the feature request, just struggling to correctly understand the issues at hand, recognizing that an easy choice of EAGER or LAZY function evaluation would be great as a user, even if unsupported in a number of PLs.

Regards,
-- 
dim
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
Dimitri Fontaine [EMAIL PROTECTED] writes:
> And I fail to see how the user would control which behavior gets chosen,

Oh, I'm sorry, I didn't realize you misunderstood my syntax example. I was suggesting that the SQL function manager recognize some optional non-SQL keywords at the start of a SQL function body, along the lines of

  CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
    #option eager
    SELECT ...
  $$ LANGUAGE SQL;

versus

  CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
    #option lazy
    SELECT ...
  $$ LANGUAGE SQL;

(I'm not wedded to this particular spelling of it, but there is precedent in plpgsql.)

Now of course the bigger problem with either this syntax or yours is that attaching such a property to a function is arguably the Wrong Thing in the first place. Which way is best is likely to depend on the calling query more than it does on the function. However, I see no solution to that problem except function inlining; and if the function gets inlined then all this discussion is moot anyhow.

			regards, tom lane
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote:
> > Alvaro Herrera wrote:
> > Case in point. To replicate a table currently you do this:
> >   ALTER TABLE foo ENABLE REPLICATION;
> >   ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
>
> How do you define SLAVE 0 ?

SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think. (I would have to check the docs.)

> > Well actually that was my decision years ago and it was made for two reasons:
> > 1. Speed
>
> Is this some MySQL kind of speed, achieved by compromising on ACID ?

No, this is the kind that realizes that throwing triggers on 300 relations is a slow way to go about replication.

> Actually I see trigger-based replication as a power tool. You can make it (almost) as fast as any source-level hack, even if the latter compromises on ACID, plus you can be much more flexible about what/when/how you replicate.

Like I said, really not interested in the trigger-versus-not discussion.

> > One of the core requirements of the original Replicator, which can still be seen today AND is a requirement of 1.9 as well, is: Thou shalt not cause grief to thy master, even if thou replicatest to many children. IOW, one slave and many slaves cause similar load on the master.

One slave doesn't affect the load on the master any more than fifty.

> And you do it by initially replicating to a single distributor slave ?

Essentially.

> > That is where the original idea of the MCP came from. The theory was, we could have 50 slaves and the master wouldn't care.
>
> A good goal. But why would anybody _need_ 50 slaves ? Perhaps some application with a small data amount (meaning it fits on one server) that is massively (100:1) skewed towards reading, like dynamically generated mostly-static-content web pages ?

Well, it was just a number. It could be any number, but yes, your example would fit.

> Why not just make a new kind of WAL-logged heap, which stores your version of the trx log ?

That would be a question for Alvaro or Alexey, not I :)

> > and our user experience is far more friendly than anything else.
>
> Do you mean that you have a friendly UI for beginners ?

Not just beginners. People don't like complicated software. Generally speaking, if you go to a DBA of other systems, someone with 10 years' experience, and you show them Slony, they look at you like you're nuts. It doesn't matter that Slony works great. It doesn't matter that it is proven in the field.

> Or is it proved to be robust in continued use, with changes going smoothly in and automatic recovery from network outages and random reboots on live systems in a 24/7 environment ?

Well, we have had plenty of people use it, although I admit over the last year we have been weaning them off in preparation for 1.8. 1.8 is a big step up from previous releases, and yes, it has some issues (all software does), but we are working on them.

Sincerely,

Joshua D. Drake
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Wed, 2008-10-29 at 09:54 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> > > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
>
> It was in the subsection "mcp_server mysteriously dies" on
> http://www.commandprompt.com/products/mammothreplicator/tips

Ehh, need to remove that. That is very old.

Joshua D. Drake
[HACKERS] pre-MED
Folks,

Please find enclosed a WIP patch to add the ability for functions to see the qualifiers of the query in which they're called. It's not working just yet, and I'm not sure how best to get it working, but I'd like to see this as part of 8.4, as SQL/MED is just way too ambitious given the time frame.

Any tips, hints, pointers, etc. would be much appreciated.

Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU and PL/Tcl? To the rest of the PLs? Would it make any sense to have it in SQL-language functions?

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 7a9ddcd..c5b35be 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -728,6 +728,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async, bool do_get)
 	char *conname = NULL;
 	remoteConn *rconn = NULL;
 	bool fail = true;	/* default to backward compatible */
+	ReturnSetInfo *rsi;	/* set up for qual-pushing */
 
 	/* create a function context for cross-call persistence */
 	funcctx = SRF_FIRSTCALL_INIT();
@@ -802,6 +803,17 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async, bool do_get)
 		elog(ERROR, "wrong number of arguments");
 	}
 
+	if (sql && rsi->qual)	/* add qualifiers if available. */
+	{
+		char *quals = rsinfo_get_qual_str(rsi);
+		char *qualifiedQuery = palloc(strlen(sql) + strlen(" WHERE ") +
+									  strlen(quals) + 1);
+
+		sprintf(qualifiedQuery, "%s WHERE %s", sql, quals);
+
+		sql = qualifiedQuery;
+	}
+
 	if (!conn)
 		DBLINK_CONN_NOT_AVAIL;
diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml
index 2f2e53b..f0c6587 100644
--- a/doc/src/sgml/plperl.sgml
+++ b/doc/src/sgml/plperl.sgml
@@ -873,6 +873,41 @@ CREATE TRIGGER test_valid_id_trig
  </para>
 </sect1>
 
+<sect1 id="plperl-qualifiers">
+ <title>PL/Perl Qualifiers</title>
+
+ <para>
+  PL/Perl exposes qualifiers in the current query.  In a function,
+  the hash reference <varname>$_QUAL</varname> contains information
+  about the currently executing query.  <varname>$_QUAL</varname> is
+  a global variable, which gets a separate local value for each query.
+  The fields (currently just one) of <varname>$_QUAL</varname> are:
+
+  <variablelist>
+   <varlistentry>
+    <term><literal>$_TD-&gt;{qual_string}</literal></term>
+    <listitem>
+     <para>
+      A string containing all the qualifiers for the current query.
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </para>
+ <para>
+  Here is an example of a function using <varname>$_QUAL</varname>.
+<programlisting>
+CREATE OR REPLACE FUNCTION show_quals()
+RETURNS TEXT
+LANGUAGE plperl
+AS $$
+return $_QUAL->{qual_string};
+$$;
+</programlisting>
+ </para>
+
+</sect1>
+
 <sect1 id="plperl-missing">
  <title>Limitations and Missing Features</title>
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 4c4742d..d6f7ef8 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -45,6 +45,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "utils/acl.h"
@@ -1693,6 +1694,27 @@ ExecMakeFunctionResultNoSets(FuncExprState *fcache,
 	return result;
 }
 
+/*
+ * rsinfo_get_qual_str
+ *
+ * Get either an empty string or a batch of qualifiers.
+ */
+char *
+rsinfo_get_qual_str(ReturnSetInfo *rsinfo)
+{
+	Node	   *qual;
+	List	   *context;
+
+	if (rsinfo->qual == NIL)
+		return pstrdup("");
+
+	qual = (Node *) make_ands_explicit(rsinfo->qual);
+	context = deparse_context_for_plan(NULL, NULL, rsinfo->rtable, NULL);
+
+	return deparse_expression(qual, context, false, false);
+}
 
 /*
  *		ExecMakeTableFunctionResult
@@ -1703,6 +1725,7 @@ ExecMakeFunctionResultNoSets(FuncExprState *fcache,
 Tuplestorestate *
 ExecMakeTableFunctionResult(ExprState *funcexpr,
 							ExprContext *econtext,
+							List *qual,
 							List *rtable,
 							TupleDesc expectedDesc,
 							bool randomAccess)
 {
@@ -1736,6 +1759,8 @@
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
On Wednesday 29 October 2008, Tom Lane wrote:
> Now of course the bigger problem with either this syntax or yours is that attaching such a property to a function is arguably the Wrong Thing in the first place. Which way is best is likely to depend on the calling query more than it does on the function.

Let the planner figure this out, and add in some starting-cost considerations too, maybe? That sounds even better, yes.

> However, I see no solution to that problem except function inlining; and if the function gets inlined then all this discussion is moot anyhow.

How would we inline PL functions?
-- 
dim
Re: [HACKERS] pre-MED
On Wed, Oct 29, 2008 at 12:40 PM, David Fetter [EMAIL PROTECTED] wrote:
> Please find enclosed a WIP patch to add the ability for functions to see the qualifiers of the query in which they're called. It's not working just yet, and I'm not sure how best to get it working, but I'd like to see this as part of 8.4, as SQL/MED is just way too ambitious given the time frame.

To be more specific, SQL/MED is going to be 8.5. This is an overall improvement for accessing the predicate.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Wed, 2008-10-29 at 09:01 -0700, Joshua D. Drake wrote:
> On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote:
> > > Alvaro Herrera wrote:
> > > Case in point. To replicate a table currently you do this:
> > >   ALTER TABLE foo ENABLE REPLICATION;
> > >   ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> >
> > How do you define SLAVE 0 ?
>
> SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think. (I would have to check the docs.)

And the only other GUC the slave needs for replication is the MCP|slave connect info?

> > Is this some MySQL kind of speed, achieved by compromising on ACID ?
>
> No, this is the kind that realizes that throwing triggers on 300 relations is a slow way to go about replication.

It may be slow-ish to set up initially, but in your (old) tips wiki page you say that replicating more than 1000 relations would be prohibitively slow. I don't think this would be true for trigger-based replication. And FK checks are implemented using triggers, and I have not heard much complaining about them being a hack, or unsuitable upwards of some N hundred tables.

> > Actually I see trigger-based replication as a power tool. You can make it (almost) as fast as any source-level hack, even if the latter compromises on ACID, plus you can be much more flexible about what/when/how you replicate.
>
> Like I said, really not interested in the trigger-versus-not discussion.

Why not? Because you are actually doing it with triggers, just not standard PostgreSQL table-level triggers but your own ones patched into the backend?

When doing it without triggers (or, to be more exact, with triggers hardwired into the backend) you still have to solve most of the same problems that trigger-based replication does, just with less visibility and less help from the existing infrastructure.

One more question about triggers: what happens to triggers, foreign keys and other constraints on the slave?

> > One of the core requirements of the original Replicator, which can still be seen today AND is a requirement of 1.9 as well, is: Thou shalt not cause grief to thy master, even if thou replicatest to many children. IOW, one slave and many slaves cause similar load on the master.
>
> One slave doesn't affect the load on the master any more than fifty.

But have you measured what the impact of replication is when going from 0 to 1 slaves?

> > > and our user experience is far more friendly than anything else.
> >
> > Do you mean that you have a friendly UI for beginners ?
>
> Not just beginners. People don't like complicated software. Generally speaking, if you go to a DBA of other systems, someone with 10 years' experience, and you show them Slony, they look at you like you're nuts.

I wonder why. To me Slony's UI (slonik) seems quite simplistic; the only thing they have above what Replicator seems to have is grouping tables, so that the tables inside one group are always guaranteed to be in a consistent state on the slave. Otherwise I can't see how

  -- create set (id=1, origin=1, comment=Master);
  set add table (set id=1, origin=1, id=1,
                 fully qualified name = 'public.accounts');
  ---
  store node (id=2, comment = 'Slave node');
  store path (server = 1, client = 2,
              conninfo='dbname=$MASTERDBNAME host=$MASTERHOST user=$REPLICATIONUSER');
  --
  subscribe set ( id = 1, provider = 1, receiver = 2,

can be made much simpler. You can't replicate a table on Replicator with just one command either:

  ALTER TABLE public.accounts ENABLE REPLICATION;
  ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;

You first have to set things up by telling the master and slave about themselves ("Thou Art the Master") and about each other.

> It doesn't matter that Slony works great. It doesn't matter that it is proven in the field.

Actually Slony works great for simple cases, and not-so-great for more complex topologies or higher loads. It is relatively easy to push Slony over by accident. And it was Slony's apparent _simplicity_ that fooled me into believing that it is a robust system, similar to PostgreSQL proper. A belief, the consequences of which caused me a lot of grief and frustration for about two years.

> > Or is it proved to be robust in continued use, with changes going smoothly in and automatic recovery from network outages and random reboots on live systems in a 24/7 environment ?
>
> Well, we have had plenty of people use it, although I admit over the last year we have been weaning them off in preparation for 1.8. 1.8 is a big step up from previous releases, and yes, it has some issues (all software does), but we are working on them.

Is Replicator always replicating from one single master to one or more slaves? Or is it possible to set it up so that some tables are replicated from one master and some others from some other master? I don't
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
Dimitri Fontaine [EMAIL PROTECTED] writes:
> On Wednesday 29 October 2008, Tom Lane wrote:
> > However, I see no solution to that problem except function inlining; and if the function gets inlined then all this discussion is moot anyhow.
>
> How would we inline PL functions?

All of this is pie-in-the-sky for PL functions, and I think properly so: the whole reason for supporting PLs is to enable doing things that SQL does poorly or not at all. So expecting SQL to interoperate very closely with them seems impossible, or at least unreasonably limiting. The real issue at hand is what to do with SQL-language functions.

I'm currently going to have a look at just what it would take to support both lazy and eager evaluation in functions.c (independently of what syntax, if any, we settle on to expose the choice to the user). If it's either really awful or really easy, we should know that before arguing further.

			regards, tom lane
Re: [HACKERS] pre-MED
On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote: Folks, Please find enclosed a WIP patch to add the ability for functions to see the qualifiers of the query in which they're called. It's not working just yet, and I'm not sure how best to get it working, but I'd like to see this as part of 8.4, as SQL/MED is just way too ambitious given the time frame. Any tips, hints, pointers, etc. would be much appreciated. Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU I'll look at adding this to pl/pythonu. I have to finish some stuff there before freeze anyway. -- Hannu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
On Wed, 2008-10-29 at 11:58 -0400, Tom Lane wrote: Dimitri Fontaine [EMAIL PROTECTED] writes: And I fail to see how the user would control which behavior will get chosen, Oh, I'm sorry, I didn't realize you misunderstood my syntax example. I was suggesting that the SQL function manager recognize some optional non-SQL keywords at the start of a SQL function body, along the lines of CREATE FUNCTION foo(...) RETURNS SETOF something AS $$ #option eager SELECT ... $$ LANGUAGE SQL; versus CREATE FUNCTION foo(...) RETURNS SETOF something AS $$ #option lazy SELECT ... $$ LANGUAGE SQL; (I'm not wedded to this particular spelling of it, but there is precedent in plpgsql.) Now of course the bigger problem with either this syntax or yours is that attaching such a property to a function is arguably the Wrong Thing in the first place. Which one is the best way is likely to depend on the calling query more than it does on the function. However, I see no solution to that problem except function inlining; and if the function gets inlined then all this discussion is moot anyhow. I have some vague ideas about extending SET-returning functions to NODE-returning functions, which would have some extra methods (for OO languages like Python) or extra functions (for C, somewhat similar to how AGGREGATE functions are defined) to interact with the planner/optimiser, so that the planner can ask the function instance things like "can you do fast start" or "how many rows for these args" during planning, and can also advise the function about strategies once the plan is chosen. That would be something which could be very useful for a SQL/MED implementation as well. - Hannu -- -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
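To make the shape of Hannu's "NODE-returning function" idea a bit more concrete, here is a minimal hypothetical sketch in C of what such a planner-facing callback table could look like. Every name in it is invented for illustration; nothing like this exists in the tree.

    /* Hypothetical sketch of a "NODE-returning function" interface.
     * All of these names are invented; none exist in PostgreSQL. */
    typedef struct NodeFunctionRoutine
    {
        /* planning-time: can this instance start returning rows quickly? */
        bool        (*can_fast_start) (struct NodeFunctionRoutine *self);

        /* planning-time: expected row count for these arguments */
        double      (*estimate_rows) (struct NodeFunctionRoutine *self,
                                      int nargs, Datum *args);

        /* plan-time advice: tell the instance which strategy was chosen */
        void        (*set_strategy) (struct NodeFunctionRoutine *self,
                                     int strategy);

        /* per-instance state, much like user_fctx in a C set-returning
         * function */
        void       *private_state;
    } NodeFunctionRoutine;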
Re: [HACKERS] pre-MED
On Wed, 2008-10-29 at 19:17 +0200, Hannu Krosing wrote: I'll look at adding this to pl/pythonu. I have to finish some stuff there before freeze anyway. Have we tested plpython with version 3? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28
On Wed, 2008-10-29 at 19:15 +0200, Hannu Krosing wrote: SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think. (I would have to check the docs) And the only other GUC the slave needs for replication is the MCP/Slave connect info? https://projects.commandprompt.com/public/replicator/wiki/Documentation Well actually that was my decision years ago and it was made for two reasons: 1. Speed Is this some MySQL kind of speed, achieved by compromising on ACID? No, this is the kind that realizes that throwing triggers on 300 relations is a slow way to go about replication. It may be slow-ish to set up initially, but in your (old) tips wikipage you say that replicating more than 1000 relations would be prohibitively slow. Depends on the workload, I would assume. Actually I see trigger-based replication as a power tool. You can make it (almost) as fast as any source-level hack, even if the latter compromises on ACID, plus you can be much more flexible about what/when/how you replicate. Like I said, really not interested in the trigger-versus-not discussion. Why not? Because you are actually doing it with triggers, just not standard PostgreSQL table-level triggers, but your own ones patched into the backend? Because it isn't productive. I am happy to answer all questions (that I can) about usage, implementation etc... but hashing through decisions that were made 6 years ago isn't helpful, imo. One more question about triggers - what happens to triggers, foreign keys and other constraints on the slave? Alvaro? One slave doesn't affect the load on the master any more than fifty. But have you measured what the impact of replication is when going from 0 to 1 slaves? Not anytime recently, no. You can't replicate a table on Replicator by just one command: Actually you can. We provide stored procedures you can use if you like. ALTER TABLE public.accounts ENABLE REPLICATION; ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2; You first have to set things up by telling the master and slave about themselves (Thou Art the Master) and about each other. Actually you have to tell the MCP, but yes there is configuration to be had. It takes about 15 minutes from start to finish for a current master, MCP, two-slave scenario. Is Replicator always replicating from one single master to one or more slaves? Yes. Or is it possible to set it up so that some tables are replicated from one master and some others from some other master. We do not support cascading like that but we do support partial replication. E.g.: ALTER TABLE foo ENABLE REPLICATION; ALTER TABLE foo ENABLE REPLICATION ON SLAVE 1; ALTER TABLE bar ENABLE REPLICATION; ALTER TABLE bar ENABLE REPLICATION ON SLAVE 2; I don't mean real multi-master, just that the single master is not the same for all tables - a good example would be a sales system where each office has its own sales table for inserting/updating data, but has a read-only fresh copy of all other offices' data via replication. Yes, this you could do. What you can't do is this: MASTER-[SLAVE0|MASTER]-SLAVE1 Sincerely, Joshua D. Drake -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Hi Simon, He is going to do some investigation into the methods and write down the possibilities, and then he is going to implement something from that for PostgreSQL. When will this work be complete? We are days away from completing main work on 8.4, so you won't get much discussion on this for a few months yet. Will it be complete in time for 8.5? Or much earlier even? The first guess is that the work will be done for 8.6. Dano is supposed to finish the work and defend his thesis in something a bit more than a year. Julius, you don't mention what your role is in this. In what sense is Dano's master's thesis a we thing? I am Dano's mentor and we are in close contact with Zdenek as well. We would like the project to become a we thing, as another reason to work on the project. It seems better to research some ideas at the beginning and discuss the stuff during development than to just individually write some piece of code which could be published afterwards. Especially when this area seems to be of interest to more people. Cheers Julo
Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!
Threads are where future performance is going to come from: General purpose: http://www.setup32.com/hardware/cpuchipset/32core-processors-intel-reache.php GPU: http://wwwx.cs.unc.edu/~lastra/Research/GPU_performance.html http://www.cs.unc.edu/~geom/GPUSORT/results.html Database engines that want to exploit the ultimate in performance will utilize multiple threads of execution. True, the same thing can be realized by multiple processes, but a process is more expensive than a thread. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Simon Riggs wrote: But perhaps writing a single WAL record if you scan whole page and set all bits at once. Then it makes sense in some cases. So this is what I ended up doing; attached. There are some gotchas in this patch: 1. it does not consider hint bits other than the ones defined in htup.h. Some index AMs use hint bits to kill tuples (LP_DEAD mostly, I think). This means that CRCs will be broken for such pages when pages are torn. 2. some parts of the code could be considered modularity violations. For example, tqual.c is setting a bit in a Page structure; bufmgr.c is later checking that bit to determine when to log. 3. the bgwriter is seen writing WAL entries at checkpoint. At shutdown, this might cause an error to be reported on how there was not supposed to be activity on the log. I didn't save the exact error report and I can't find it in the source :-( So it mostly works at this time. I very much welcome opinions to improve the weak points. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.

Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.266
diff -c -p -r1.266 heapam.c
*** src/backend/access/heap/heapam.c    27 Oct 2008 21:50:12 -0000    1.266
--- src/backend/access/heap/heapam.c    29 Oct 2008 17:56:25 -0000
*** log_newpage(RelFileNode *rnode, ForkNumb
*** 4007,4012 ****
--- 4007,4064 ----
      return recptr;
  }

+ XLogRecPtr
+ log_hintbits(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
+              Page page)
+ {
+     xl_heap_hintbits xlrec;
+     OffsetNumber i;
+     XLogRecPtr recptr;
+     XLogRecData rdata[2];
+     uint16 bits[MaxHeapTuplesPerPage * 4];
+     int    pos = 0;
+
+     for (i = FirstOffsetNumber; i <= PageGetMaxOffsetNumber(page);
+          i = OffsetNumberNext(i))
+     {
+         HeapTupleHeader htup;
+         ItemId lp = PageGetItemId(page, i);
+
+         if (!ItemIdHasStorage(lp))
+             continue;
+
+         htup = (HeapTupleHeader) PageGetItem(page, lp);
+
+         bits[pos++] = htup->t_infomask & HEAP_XACT_MASK;
+         bits[pos++] = htup->t_infomask2 & HEAP2_XACT_MASK;
+     }
+
+     /* NO ELOG(ERROR) from here till hint bits are logged */
+     START_CRIT_SECTION();
+
+     xlrec.node = *rnode;
+     xlrec.block = blkno;
+
+     rdata[0].data = (char *) &xlrec;
+     rdata[0].len = SizeOfHeapHintbits;
+     rdata[0].buffer = InvalidBuffer;
+     rdata[0].next = &(rdata[1]);
+
+     rdata[1].data = (char *) bits;
+     rdata[1].len = sizeof(uint16) * pos;
+     rdata[1].buffer = InvalidBuffer;
+     rdata[1].next = NULL;
+
+     recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_HINTBITS, rdata);
+
+     PageSetLSN(page, recptr);
+     PageSetTLI(page, ThisTimeLineID);
+
+     END_CRIT_SECTION();
+
+     return recptr;
+ }
+
  /*
   * Handles CLEAN and CLEAN_MOVE record types
   */
*** heap_xlog_freeze(XLogRecPtr lsn, XLogRec
*** 4113,4118 ****
--- 4165,4230 ----
  }

  static void
+ heap_xlog_hintbits(XLogRecPtr lsn, XLogRecord *record)
+ {
+     xl_heap_hintbits *xlrec = (xl_heap_hintbits *) XLogRecGetData(record);
+     Buffer buffer;
+     Page page;
+
+     buffer = XLogReadBuffer(xlrec->node, xlrec->block, false);
+     if (!BufferIsValid(buffer))
+         return;
+     page = (Page) BufferGetPage(buffer);
+
+     if (XLByteLE(lsn, PageGetLSN(page)))
+     {
+         UnlockReleaseBuffer(buffer);
+         return;
+     }
+
+     if (record->xl_len > SizeOfHeapHintbits)
+     {
+         uint16 *infomask;
+         uint16 *infomask_end;
+         OffsetNumber offset = FirstOffsetNumber;
+
+         infomask = (uint16 *) ((char *) xlrec + SizeOfHeapHintbits);
+         infomask_end = (uint16 *) ((char *) xlrec + record->xl_len);
+
+         while (infomask < infomask_end)
+         {
+             for (;;)
+             {
+                 HeapTupleHeader htup;
+                 ItemId lp = PageGetItemId(page, offset);
+
+                 if (!ItemIdHasStorage(lp))
+                 {
+                     offset++;
+                     continue;
+                 }
+
+                 htup = (HeapTupleHeader) PageGetItem(page, lp);
+
+                 htup->t_infomask |= *infomask;
+                 infomask++;
+                 htup->t_infomask2 |= *infomask;
+                 infomask++;
+
+                 offset++;
+
+                 break;
+             }
+         }
+     }
+
+     PageSetLSN(page, lsn);
+     PageSetTLI(page, ThisTimeLineID);
+     MarkBufferDirty(buffer);
+     UnlockReleaseBuffer(buffer);
+ }
+
  static void
  heap_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
  {
      xl_heap_newpage *xlrec = (xl_heap_newpage *) XLogRecGetData(record);
*** heap2_redo(XLogRecPtr lsn, XLogRecord *r
*** 4614,4619 ****
--- 4726,4734 ----
          case XLOG_HEAP2_CLEAN_MOVE:
              heap_xlog_clean(lsn, record, true);
              break;
+         case XLOG_HEAP2_HINTBITS:
+             heap_xlog_hintbits(lsn, record);
+             break;
          default:
              elog(PANIC, "heap2_redo: unknown op code %u", info);
      }
*** heap2_desc(StringInfo buf, uint8 xl_info
*** 4755,4760 ****
--- 4870,4883 ----
               xlrec->node.spcNode, xlrec->node.dbNode,
               xlrec->node.relNode, xlrec->block);
      }
+     else if (info ==
[HACKERS] autovacuum: I need some explanation
Hello, I have a table (accounts) with 600,000,000 rows. A heavy, highly concurrent workload that makes mostly updates on this table generates a lot of dead tuples in its run, which is expected due to MVCC. The problem is that even though autovacuum is enabled, the autovacuum worker does not vacuum this table (I entered custom autovacuum settings for this table in pg_autovacuum to try to force the situation). Autovacuum is working for other, smaller tables but not for accounts.

2008-10-29 11:09:03.453 PDT DEBUG: 0: accounts: vac: 16697969 (threshold 6000050), anl: 16697969 (threshold 12048)
2008-10-29 11:09:05.610 PDT DEBUG: 0: accounts: vac: 16699578 (threshold 6000050), anl: 16699578 (threshold 12048)
2008-10-29 11:10:03.563 PDT DEBUG: 0: accounts: vac: 16735906 (threshold 6000050), anl: 16735906 (threshold 12048)

please check the first log message: the vacuum threshold is 6,000,050 rows and the number of dead tuples is 16,697,969. Even though the number of dead_tuples is greater than the threshold, autovacuum is not being triggered for this table. So, besides this condition (dead_tuples > threshold), what else is taken into account by autovacuum? Thank you, -Noah
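For reference, autovacuum's vacuum trigger is essentially a base threshold plus a scale factor times the table's row count. A short sketch paraphrasing the test in autovacuum.c follows; the 0.01 scale factor is an assumption inferred from the numbers above, not quoted from the actual pg_autovacuum row.

    /* Paraphrase of autovacuum's per-table vacuum test (autovacuum.c).
     * The 0.01 scale factor is assumed; it reproduces the 6,000,050
     * threshold seen in the log. */
    float4  vacthresh = vac_base_thresh + vac_scale_factor * reltuples;
    /* e.g. 50 + 0.01 * 600,000,000 = 6,000,050 */
    bool    dovacuum = (n_dead_tuples > vacthresh);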
[HACKERS] recovery_target_time and last completed transaction log time
I'm posting on hackers because I think that there is a problem in one of three places, and I'm not sure which: (1) The documentation needs clarification. (2) A LOG message needs clarification. (3) There is a bug in recovery. (unlikely) This issue was previously posted here with no response: http://archives.postgresql.org/pgsql-admin/2008-10/msg00123.php Regarding the documentation -- even if I'm interpreting it correctly, it might be good to specifically state that it is the commit times of database transactions which matter for recovery_target_time; that transactions started but not committed by that time will be omitted, regardless of the recovery_target_inclusive setting. Regarding the LOG message, it is confusing to have a recovery.conf file which contains this: recovery_target_time = '2008-10-16 17:00:00.0' and see this in the log: [2008-10-16 23:04:26.006 CDT] 19951 LOG: last completed transaction was at log time 2008-10-16 17:00:23.205347-05 -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
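For context on the first interpretation, the stop test in recovery only ever looks at the timestamps carried by transaction commit (and abort) records, which is why commit times are what matter for recovery_target_time. A rough paraphrase of the relevant logic in xlog.c's recoveryStopsHere(), from memory rather than quoted exactly:

    /* Approximate paraphrase of recoveryStopsHere() (xlog.c): only
     * commit/abort records carry a timestamp, so a transaction that has
     * started but not committed by the target time is omitted no matter
     * how recovery_target_inclusive is set. */
    if (record_is_commit)
    {
        if (recoveryTargetInclusive)
            stopsHere = (recordXtime > recoveryTargetTime);
        else
            stopsHere = (recordXtime >= recoveryTargetTime);
    }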
Re: [HACKERS] pre-MED
On Wed, Oct 29, 2008 at 12:40 PM, David Fetter [EMAIL PROTECTED] wrote: Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU and PL/Tcl? To the rest of the PLs? Would it make any sense to have it in SQL language functions? Here's a vote for allowing this in plain SQL. I use the tablefunc contrib module as a way to build a view of a specific questionnaire's responses (using Elein's nice model here [1]). Currently, if I then write queries against these views that include WHERE clauses, they don't perform very well as the underlying data size grows. I was using the aforementioned large view that casts everything to text, but recently I started using separate calls to the crosstab function for each underlying table, then joining them together based on their response ID. This seems to work much better for more complex queries, but I think it would still be beneficial to have access to these qualifiers so I could push down to each subquery the list of response IDs to pull. I don't have access to sample SQL at the moment, but if it is wanted I can try to get that this week. David Blewett 1. http://www.varlena.com/GeneralBits/110.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum: I need some explanation
Noah Freire wrote: please check the first log message: the vacuum threshold is 6,000,050 rows and the number of dead tuples is 16,697,969. Even though the number of dead_tuples is greater than the threshold, autovacuum is not being triggered for this table. So, besides this condition (dead_tuples > threshold), what else is taken into account by autovacuum? That there's no other process doing the same. Did you check pg_stat_activity to ensure that there's really no autovacuum worker processing this table?

2008-10-29 11:09:03.453 PDT DEBUG: 0: accounts: vac: 16697969 (threshold 6000050), anl: 16697969 (threshold 12048)
2008-10-29 11:09:05.610 PDT DEBUG: 0: accounts: vac: 16699578 (threshold 6000050), anl: 16699578 (threshold 12048)
2008-10-29 11:10:03.563 PDT DEBUG: 0: accounts: vac: 16735906 (threshold 6000050), anl: 16735906 (threshold 12048)

Are these log entries by the same process? Please add %p to log_line_prefix to see what's going on. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera wrote: So this is what I ended up doing; attached. Oh, another thing. The contents of the WAL log message here are very simplistic; just store all the relevant t_infomask and t_infomask2 bits, for all the tuples in the page. A possible optimization to reduce the WAL traffic is to add another infomask bit which indicates whether a hint bit has been set since the last time we visited the page. I'm unsure if this is worth the pain. (Another possibility, even more painful, is to choose at runtime between the two formats, depending on the number of tuples that need hint bits logged.) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
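One way to picture the proposed optimization, as a purely hypothetical sketch: a "changed since last logged" flag is tested in the write path so that unchanged pages generate no WAL. The flag and macros below are invented names (Alvaro suggests a per-tuple infomask bit; a page-level flag is shown only to keep the sketch short).

    /* Hypothetical sketch; PD_HINTBITS_CHANGED and these macros are
     * invented names, not existing code. */
    if (PageHasFlag(page, PD_HINTBITS_CHANGED))
    {
        log_hintbits(&reln->rd_node, MAIN_FORKNUM, blkno, page);
        PageClearFlag(page, PD_HINTBITS_CHANGED);  /* re-arm for next visit */
    }
    /* otherwise no hint bit was set since the last write: no WAL needed */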
Re: [HACKERS] minimal update
Kenneth Marshall wrote: On Wed, Oct 22, 2008 at 06:05:26PM -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Wed, Oct 22, 2008 at 3:24 PM, Tom Lane [EMAIL PROTECTED] wrote: Minimal really fails to convey the point here IMHO. How about something like suppress_no_op_updates_trigger? I think it means something to us, but no op is a very technical phrase that probably doesn't travel very well. Agreed --- I was hoping someone could improve on that part. The only other words I could come up with were empty and useless, neither of which seem quite le mot juste ... regards, tom lane redundant? I think I like this best of all the suggestions - suppress_redundant_updates_trigger() is what I have now. If there's no further discussion, I'll go ahead and commit this in a day or two. cheers andrew
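Since the committed function itself is not shown in this thread, here is a minimal sketch of the core test such a trigger can perform. It is an illustration under stated assumptions, not necessarily the committed code.

    #include "postgres.h"
    #include "commands/trigger.h"

    /* Sketch: suppress an UPDATE whose new tuple is identical to the old
     * one.  A real implementation must ignore volatile header fields
     * (xmin, ctid, ...) rather than comparing raw tuple data naively. */
    Datum
    suppress_redundant_updates_trigger(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;
        HeapTuple    newtuple = trigdata->tg_newtuple;
        HeapTuple    oldtuple = trigdata->tg_trigtuple;
        HeapTuple    rettuple = newtuple;  /* default: let the UPDATE run */

        if (newtuple->t_len == oldtuple->t_len &&
            memcmp((char *) newtuple->t_data, (char *) oldtuple->t_data,
                   newtuple->t_len) == 0)
            rettuple = NULL;               /* identical: suppress the UPDATE */

        return PointerGetDatum(rettuple);
    }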
Re: [HACKERS] autovacuum: I need some explanation
Noah Freire wrote:

 datid | datname | procpid | usesysid | usename  |           current_query            | waiting |          xact_start           |          query_start          |        backend_start         | client_addr | client_port
-------+---------+---------+----------+----------+------------------------------------+---------+-------------------------------+-------------------------------+------------------------------+-------------+-------------
 45974 | pgbench |   14660 |       10 | postgres | autovacuum: VACUUM public.accounts | f       | 2008-10-29 11:09:05.610857-07 | 2008-10-29 11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 |             |
(1 row)

indeed; autovacuum is currently running for accounts. It has been running for 1 hour but the number of dead rows keeps increasing. Apparently autovacuum cannot pace the rate of updates on this table (please check the two snapshots of pg_stat_user_tables below, taken with a 2-second interval between them). Maybe you just need to decrease the vacuum_cost_delay, so that it goes a bit faster. Keep in mind that the number of dead tuples only decreases when vacuum finishes, not while it is working. Would it be better to run vacuum manually than using autovacuum in this case? It would be exactly the same (unless it used different cost_limit/delay settings, but then you can configure them via pg_autovacuum) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minimal update
Andrew Dunstan wrote: Kenneth Marshall wrote: On Wed, Oct 22, 2008 at 06:05:26PM -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Wed, Oct 22, 2008 at 3:24 PM, Tom Lane [EMAIL PROTECTED] wrote: Minimal really fails to convey the point here IMHO. How about something like suppress_no_op_updates_trigger? I think it means something to us, but no op is a very technical phrase that probably doesn't travel very well. Agreed --- I was hoping someone could improve on that part. The only other words I could come up with were empty and useless, neither of which seem quite le mot juste ... regards, tom lane redundant? I think I like this best of all the suggestions - suppress_redundant_updates_trigger() is what I have now. If there's no further discussion, I'll go ahead and commit this in a day or two. Nitpicking, but you have:

+ <para>
+  Currently <productname>PostgreSQL</> provides one built in trigger
+  function, <function>suppress_redundant_updates_trigger</>,

Should we perhaps mention the fulltext triggers (with the appropriate links) here? If it's intended to be an authoritative list of the userspace triggers we ship, I think that may be a good idea. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] autovacuum: I need some explanation
Hi Alvaro, On Wed, Oct 29, 2008 at 3:46 PM, Alvaro Herrera [EMAIL PROTECTED] wrote: Noah Freire wrote: please check the first log message: the vacuum threshold is 6,000,050 rows and the number of dead tuples is 16,697,969. Even though the number of dead_tuples is greater than the threshold, autovacuum is not being triggered for this table. So, besides this condition (dead_tuples > threshold), what else is taken into account by autovacuum? That there's no other process doing the same. Did you check pg_stat_activity to ensure that there's really no autovacuum worker processing this table?

 datid | datname | procpid | usesysid | usename  |           current_query            | waiting |          xact_start           |          query_start          |        backend_start         | client_addr | client_port
-------+---------+---------+----------+----------+------------------------------------+---------+-------------------------------+-------------------------------+------------------------------+-------------+-------------
 45974 | pgbench |   14660 |       10 | postgres | autovacuum: VACUUM public.accounts | f       | 2008-10-29 11:09:05.610857-07 | 2008-10-29 11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 |             |
(1 row)

indeed; autovacuum is currently running for accounts. It has been running for 1 hour but the number of dead rows keeps increasing. Apparently autovacuum cannot pace the rate of updates on this table (please check the two snapshots of pg_stat_user_tables below, taken with a 2-second interval between them). Would it be better to run vacuum manually than using autovacuum in this case?

pgbench=# select relname, n_dead_tup from pg_stat_user_tables where relname='accounts';
 relname  | n_dead_tup
----------+------------
 accounts |   19917490
(1 row)

pgbench=# select relname, n_dead_tup from pg_stat_user_tables where relname='accounts';
 relname  | n_dead_tup
----------+------------
 accounts |   19923767
(1 row)

Thanks, -Noah
Re: [HACKERS] minimal update
Andrew Dunstan wrote:

+ /* make sure it's called as a trigger */
+ if (!CALLED_AS_TRIGGER(fcinfo))
+     elog(ERROR, "suppress_redundant_updates_trigger: must be called as trigger");

Shouldn't these all be ereport()? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PostgreSQL network discovery
Hi, There is a nice little feature within MSSQL where it is possible to enumerate all MSSQL servers on the local network. I wonder how this can be made possible with PG. Pinging every IP:PGPORT within the current subnet is one of the options but I guess that would be just a bad solution. Any thoughts? Regards, Gevik http://www.truesoftware.net/gevik/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL network discovery
Gevik Babakhani wrote: Hi, There is a nice little feature within MSSQL where it is possible to enumerate all MSSQL servers on the local network. I wonder how this can be made possible with PG. Pinging every IP:PGPORT within the current subnet is one of the options but I guess that would be just a bad solution. Any thoughts? Isn't the bonjour support supposed to do this? (Never used it myself though) //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minimal update
Alvaro Herrera wrote: Andrew Dunstan escribió: + /* make sure it's called as a trigger */ + if (!CALLED_AS_TRIGGER(fcinfo)) + elog(ERROR, suppress_redundant_updates_trigger: must be called as trigger); Shouldn't these all be ereport()? Good point. I'll fix them. Maybe we should fix our C sample trigger, from which this was taken. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL network discovery
Gevik Babakhani wrote: Hi, There is a nice little feature within MSSQL where it is possible to enumerate all MSSQL servers on the local network. I wonder how this can be made possible with PG. Pinging every IP:PGPORT within the current subnet is one of the options but I guess that would be just a bad solution. Any thoughts? We have rendezvous support too. We need to update it to use the newer Avahi library, but the one person who proposed using the thread interface got scolded for that and fled :-) The interface we'd need to use is complex and the patch would be a lot bigger. If you want to work on it, you're welcome to do so :-) -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera wrote: There are some gotchas in this patch: 1. it does not consider hint bits other than the ones defined in htup.h. Some index AMs use hint bits to kill tuples (LP_DEAD mostly, I think). This means that CRCs will be broken for such pages when pages are torn. The other hint bits are:
- LP_DEAD as used by the various callers of ItemIdMarkDead.
- PD_PAGE_FULL
- BTPageOpaque->btpo_flags and btpo_cycleid
All of them are changed with only SetBufferCommitInfoNeedsSave being called afterwards. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL network discovery
If I am not mistaken, it only works fine on OSX environments. (I might be very wrong here) -Original Message- From: Magnus Hagander [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 29, 2008 8:47 PM To: Gevik Babakhani Cc: 'PGSQL Hackers' Subject: Re: [HACKERS] PostgreSQL network discovery Gevik Babakhani wrote: Hi, There is a nice little feature within MSSQL where it is possible to enumerate all MSSQL servers on the local network. I wonder how this can be made possible with PG. Pinging every IP:PGPORT within the current subnet is one of the options but I guess that would be just a bad solution. Any thoughts? Isn't the bonjour support supposed to do this? (Never used it myself though) //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL network discovery
I asked this question because I have a situation where a service like this would be very useful. If such a functionality would be accepted by the core team, I am willing to work on it. -Original Message- From: Alvaro Herrera [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 29, 2008 8:50 PM To: Gevik Babakhani Cc: 'PGSQL Hackers' Subject: Re: [HACKERS] PostgreSQL network discovery Gevik Babakhani wrote: Hi, There is a nice little feature within MSSQL where it is possible to enumerate all MSSQL servers on the local network. I wonder how this can be made possible with PG. Pinging every IP:PGPORT within the current subnet is one of the options but I guess that would be just a bad solution. Any thoughts? We have rendezvous support too. We need to update it to use the newer Avahi library, but the one person who proposed using the thread interface got scolded for that and fled :-) The interface we'd need to use is complex and the patch would be a lot bigger. If you want to work on it, you're welcome to do so :-) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minimal update
On Wed, Oct 29, 2008 at 03:48:09PM -0400, Andrew Dunstan wrote: + /* make sure it's called as a trigger */ + if (!CALLED_AS_TRIGGER(fcinfo)) + elog(ERROR, "suppress_redundant_updates_trigger: must be called as trigger"); Shouldn't these all be ereport()? Good point. I'll fix them. Maybe we should fix our C sample trigger, from which this was taken. Yes :) Does the attached have the right error code? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index a3f17c9..69430ea 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -581,7 +581,9 @@ trigf(PG_FUNCTION_ARGS)

     /* make sure it's called as a trigger at all */
     if (!CALLED_AS_TRIGGER(fcinfo))
-        elog(ERROR, "trigf: not called by trigger manager");
+        ereport(ERROR,
+                (error(TRIGGERED_DATA_CHANGE_VIOLATION),
+                 errmsg("trigf: not called by trigger manager")));

     /* tuple to return to executor */
     if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minimal update
David Fetter wrote: Maybe we should fix our C sample trigger, from which this was taken. Yes :) Does the attached have the right error code?

-        elog(ERROR, "trigf: not called by trigger manager");
+        ereport(ERROR,
+                (error(TRIGGERED_DATA_CHANGE_VIOLATION),
+                 errmsg("trigf: not called by trigger manager")));

Not sure that's appropriate, but I can't see anything else that is very appropriate either. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)
On Wed, 2008-10-29 at 17:42 +0900, KaiGai Kohei wrote: I've updated my patches; these are ready for CommitFest:Nov. [1/6] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch [2/6] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch [3/6] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch [4/6] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch [5/6] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch [6/6] http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch The comprehensive documentation for SE-PostgreSQL is here: http://wiki.postgresql.org/wiki/SEPostgreSQL (it is currently being reworked.) List of updates: - Patches are rebased to the latest CVS HEAD. - bugfix: TRUNCATE checks assumed the SECCLASS_DB_TUPLE object class - bugfix: sepgsqlCopyFile assumed the SECCLASS_FILE object class, but it has to be adjusted by st_mode. Request for Comments: - Is the 4th patch actually needed? It can be replaced by the wiki page. - Do you think anything remains to be done before the final CommitFest? - Do you have any reviewing comments? Most of the patches are unchanged from the previous version. If you can comment on anything, I can fix it without waiting for the final commit fest. I'm copying some general comments from my contact here, verbatim. Other comments have been requested and may be forthcoming: By way of background, Common Criteria (ISO Standard 15408) are in effect pre-defined security requirements that have been agreed between multiple friendly governments so that they can share the results from independent lab work in each country and avoid the costs and duplication of effort. The published lab work results in two outputs: - a Target of Evaluation (TOE), i.e. a tight definition of the software version, configuration and environment (hardware, external controls) which was the subject of the evaluation - an Evaluation Report which, in the happy case, assigns an Evaluation Assurance Level (EAL) number to the product (which, needless to say, is only valid if the product is used in its TOE) If you're interested in reading more about formal Government security evaluation schemes, these are some good sites: General http://www.commoncriteriaportal.org/ UK http://www.cesg.gov.uk/ Australia Defence Signals Directorate www.dsd.gov.au/infosec/ Canada Communications Security Establishment www.cse.dnd.ca France Direction Centrale de la Sécurité des Systèmes d'Information www.ssi.gouv.fr/en/ Germany Bundesamt für Sicherheit in der Informationstechnik www.bsi.bund.de Japan Japan Information Technology Security Evaluation and Certification Scheme (JISEC) www.ipa.go.jp/security/jisec/jisec_e/index.html USA National Institute of Standards and Technology www.nist.gov National Information Assurance Partnership (NIAP) www.nsa.gov/ia/industry/niap.cfm -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] minimal update
Andrew Dunstan wrote: David Fetter wrote: Maybe we should fix our C sample trigger, from which this was taken. Yes :) Does the attached have the right error code?

-        elog(ERROR, "trigf: not called by trigger manager");
+        ereport(ERROR,
+                (error(TRIGGERED_DATA_CHANGE_VIOLATION),
+                 errmsg("trigf: not called by trigger manager")));

Not sure that's appropriate, but I can't see anything else that is very appropriate either. The plpgsql code uses errcode(ERRCODE_FEATURE_NOT_SUPPORTED) for this situation, so I guess we should be consistent with that. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
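Spelled the way plpgsql does it, the sample would then look roughly like this (errcode() and ERRCODE_FEATURE_NOT_SUPPORTED are the standard spellings; the snippet itself is illustrative):

    /* the consistent spelling suggested above */
    if (!CALLED_AS_TRIGGER(fcinfo))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("trigf: not called by trigger manager")));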
Re: [HACKERS] PostgreSQL network discovery
Gevik Babakhani [EMAIL PROTECTED] writes: I asked this question because I have a situation where a service like this would be very useful. If such a functionality would be accepted by the core team, I am willing to work on it. The existing code uses APIs that Apple has deprecated, so it's going to need to be fixed sooner or later. The issue isn't whether to fix it, it's how to not introduce any unwanted dependencies (like threading...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Block-level CRC checks
Alvaro Herrera wrote: 3. the bgwriter is seen writing WAL entries at checkpoint. At shutdown, this might cause an error to be reported on how there was not supposed to be activity on the log. I didn't save the exact error report and I can't find it in the source :-(

LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  INSERT @ 0/67F05F0: prev 0/67F05C0; xid 0: Heap2 - hintbits: rel 1663/16384/1259; blk 4
CONTEXT:  writing block 4 of relation 1663/16384/1259
LOG:  xlog flush request 0/67F06C0; write 0/0; flush 0/0
CONTEXT:  writing block 4 of relation 1663/16384/1259
LOG:  INSERT @ 0/67F06C0: prev 0/67F05F0; xid 0: Heap2 - hintbits: rel 1663/16384/2608; blk 40
CONTEXT:  writing block 40 of relation 1663/16384/2608
LOG:  xlog flush request 0/67F0708; write 0/67F06C0; flush 0/67F06C0
CONTEXT:  writing block 40 of relation 1663/16384/2608
LOG:  INSERT @ 0/67F0708: prev 0/67F06C0; xid 0: Heap2 - hintbits: rel 1663/16384/1249; blk 29
CONTEXT:  writing block 29 of relation 1663/16384/1249
LOG:  xlog flush request 0/67F0808; write 0/67F0708; flush 0/67F0708
CONTEXT:  writing block 29 of relation 1663/16384/1249
LOG:  INSERT @ 0/67F0808: prev 0/67F0708; xid 0: XLOG - checkpoint: redo 0/67F05F0; tli 1; xid 0/9093; oid 90132; multi 1; offset 0; shutdown
LOG:  xlog flush request 0/67F0850; write 0/67F0808; flush 0/67F0808
PANIC:  concurrent transaction log activity while database system is shutting down
LOG:  background writer process (PID 17411) was terminated by signal 6: Aborted

I am completely at a loss what to do here. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
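The PANIC comes from a sanity check in CreateCheckPoint() in xlog.c, which for a shutdown checkpoint insists that nothing else inserted WAL after the checkpoint's redo pointer was computed; the bgwriter's hint-bit records above violate exactly that. Roughly (paraphrased from memory, not quoted exactly):

    /* shutdown checkpoints require that no other WAL insertion happened
     * between computing the redo pointer and writing the record */
    if (shutdown && checkPoint.redo != ProcLastRecPtr)
        ereport(PANIC,
                (errmsg("concurrent transaction log activity "
                        "while database system is shutting down")));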
Re: [HACKERS] BufferAccessStrategy for bulk insert
On Tue, 2008-10-28 at 23:45 -0400, Robert Haas wrote: One concern that I have about this approach is that the situation in which people are probably most concerned about COPY performance is restoring a dump. In that case, the COPY will be the only thing running, and using a BufferAccessStrategy is an anti-optimization. I don't think it's a very big effect (any testing anyone can do on real hardware rather than what I have would be appreciated) but I'm sort of unsold on optimizing for what I believe to be the less-common use case. If the consensus is to reverse course on this point I'm happy to rip those changes back out and resubmit; they are a relatively small proportion of the patch. Having COPY use a BAS is mainly to ensure it doesn't swamp the cache, which is a gain in itself. If you say it's a loss you should publish timings to support that. Using a BAS for VACUUM was a performance gain, not a loss. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pre-MED
On Wed, 2008-10-29 at 10:33 -0700, Joshua D. Drake wrote: On Wed, 2008-10-29 at 19:17 +0200, Hannu Krosing wrote: On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote: Folks, Please find enclosed a WIP patch to add the ability for functions to see the qualifiers of the query in which they're called. It's not working just yet, and I'm not sure how best to get it working, but I'd like to see this as part of 8.4, as SQL/MED is just way too ambitious given the time frame. Any tips, hints, pointers, etc. would be much appreciated. Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU I'll look at adding this to pl/pythonu. I have to finish some stuff there before freeze anyway. Have we tested plpython with version 3? If you mean Python 3.0, then no, at least I have not tested it yet. Not even 2.6. -- -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Please make sure your patches are on the wiki page
Patch submitters, Please make sure your patches are on the November CommitFest wiki page, with correct and updated links. http://wiki.postgresql.org/wiki/CommitFest_2008-11 -- Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
All of this is pie-in-the-sky for PL functions, and I think properly so: the whole reason for supporting PLs is to enable doing things that SQL does poorly or not at all. So expecting SQL to interoperate very closely with them seems impossible, or at least unreasonably limiting. The real issue at hand is what to do with SQL-language functions. I'm currently going to have a look at just what it would take to support both lazy and eager evaluation in functions.c (independently of what syntax, if any, we settle on to expose the choice to the user). If it's either really awful or really easy we should know that before arguing further. It occurs to me that for PL/perl and similar one could design an interface that is similar to the one that is used for C functions - that is, the function is invoked multiple times, returns one value per call, and is given a place to stash its state across calls. For example, for PL/perl, you could pass a mutable empty hash reference on the first call and then pass the same hash reference back on each subsequent call. That wouldn't require being able to freeze/thaw the whole state, just being able to maintain the contents of that hash reference across calls. It would probably be a lot more difficult to make something like this work usefully for PL/pgsql, which as a language is rather underpowered (nonetheless I use it heavily; it's awesome for the things it is good at), but I suspect it could be applied to Python, PHP, etc. pretty easily. So that's at least three ways you can evaluate the function: generate the whole thing in one fell swoop, single function call but with lazy execution, or value-per-call mode. I'm guessing someone could dream up other possibilities as well. Now, who's volunteering to implement? :-) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
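For comparison, the existing C-function interface Robert alludes to works exactly that way: the function is called once per row and stashes its cross-call state in user_fctx. A minimal sketch (the countdown function itself is invented for illustration):

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(countdown);

    /* Value-per-call SRF returning n, n-1, ..., 1; state lives in
     * user_fctx and survives across calls, much like the mutable hash
     * reference proposed above for PL/Perl. */
    Datum
    countdown(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;
        int32           *remaining;

        if (SRF_IS_FIRSTCALL())
        {
            MemoryContext oldcontext;

            funcctx = SRF_FIRSTCALL_INIT();
            /* allocate cross-call state in a context that lives for the
             * whole scan, not just this call */
            oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
            remaining = (int32 *) palloc(sizeof(int32));
            *remaining = PG_GETARG_INT32(0);
            funcctx->user_fctx = remaining;
            MemoryContextSwitchTo(oldcontext);
        }

        funcctx = SRF_PERCALL_SETUP();
        remaining = (int32 *) funcctx->user_fctx;

        if (*remaining > 0)
            SRF_RETURN_NEXT(funcctx, Int32GetDatum((*remaining)--));

        SRF_RETURN_DONE(funcctx);
    }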
Re: [HACKERS] Proposal of PITR performance improvement for 8.4.
I'm not sure if the blocks referred to by all the WAL records in a single WAL segment can fit in the kernel cache. This is why the current pg_readahead returns the last LSN and requires a starting LSN. So far, with FPW, it seems that we can prefetch all the pages in a WAL segment. So it will be okay for an archive log compressed with pg_compresslog. I'll test if it's okay in the case full_page_writes=off. Anyway, I'd like to keep my proposal for 8.4 and continue the test and evaluation to report to the mailing list. I'll also change the whole code to run in the core. --- Koichi Suzuki 2008/10/29 Simon Riggs [EMAIL PROTECTED]: On Wed, 2008-10-29 at 09:55 +0900, Koichi Suzuki wrote: I'd like to hear some more about these. I'm more than happy to write all the code inside PG core to avoid the overhead of creating another process. Having an external program can help earlier releases also, so I think this is the right approach for now. In the next PG release we should bring this into core, along with streaming. The interface would be better if it accepted pg_readahead filename or pg_readahead filename start-lsn We don't always need a starting lsn. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal of PITR performance improvement for 8.4.
Hi, 2008/10/29 Simon Riggs [EMAIL PROTECTED]: On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote: 1. You should avoid useless posix_fadvise() calls. In the naive implementation, where you simply call posix_fadvise() for every page referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls per WAL record, and that's a lot of overhead. We face the same design question as with Greg's patch to use posix_fadvise() to prefetch index and bitmap scans: what should the interface to the buffer manager look like? The simplest approach would be a new function call like AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the page if it's not in the buffer cache, but is a no-op otherwise. But that means more overhead, since for every page access, we need to find the page twice in the buffer cache; once for the AdviseBuffer() call, and a 2nd time for the actual ReadBuffer(). That's a much smaller overhead than waiting for an I/O. The CPU overhead isn't really a problem if we're I/O bound. As discussed last year regarding parallel recovery and the random read problem, recovery is really I/O bound, especially when FPW is not available. And it is not practical to ask all the archive logs to include huge FPWs. It would be more efficient to pin the buffer in the AdviseBuffer() call already, but that requires many more changes to the callers. That would be hard to clean up safely, plus we'd have difficulty with timing: is there enough buffer space to allow all the prefetched blocks to live in cache at once? If not, this approach would cause problems. I'm not positive about the AdviseBuffer() idea. If we do this, we need all the pages referred to from a WAL segment in the shared buffer. This may be several GB and will compete with the kernel cache. Current PostgreSQL relies heavily on the kernel cache (and kernel I/O scheduling) and it is not a good idea to have too much shared buffer. The worst case is to devote half of the physical memory to the shared buffer. The performance will be very bad. Rather, I prefer to ask the kernel to prefetch. 2. The format of each WAL record is different, so you need a readahead handler for every resource manager, for every record type. It would be a lot simpler if there was a standardized way to store that information in the WAL records. I would prefer a new rmgr API call that returns a list of blocks. That's better than trying to make everything fit one pattern. If the call doesn't exist then that rmgr won't get prefetch. Yes, I like this idea. Could you let me try this API through a prefetch implementation in the core (if it is agreed)? 3. IIRC I tried to handle just a few of the most important WAL records at first, but it turned out that you really need to handle all WAL records (that are used at all) before you see any benefit. Otherwise, every time you hit a WAL record that you haven't done posix_fadvise() on, the recovery stalls, and you don't need many of those to diminish the gains. Not sure how these apply to your approach, it's very different. You seem to handle 1. by collecting all the page references for the WAL file, and sorting and removing the duplicates. I wonder how much CPU time is spent on that? Removing duplicates seems like it will save CPU. If we invoke posix_fadvise() on blocks already in the kernel cache, the call will do nothing but consume some overhead in the kernel. I think duplicate removal saves more.
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
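For readers following the thread, the kernel-prefetch mechanism under discussion reduces to one posix_fadvise(POSIX_FADV_WILLNEED) call per referenced block. A minimal sketch, in which fd_for_block() is an invented helper standing in for the smgr/file-descriptor lookup, and 1 GB segment splitting is ignored for brevity:

    #include <fcntl.h>

    /* Ask the kernel to start reading one data block referenced by a WAL
     * record; a later ReadBuffer() then finds it already in cache. */
    static void
    prefetch_block(RelFileNode rnode, BlockNumber blkno)
    {
        int     fd = fd_for_block(rnode, blkno);   /* invented helper */
        off_t   offset = (off_t) blkno * BLCKSZ;

        if (fd >= 0)
            (void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_WILLNEED);
    }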
Re: [HACKERS] Proposal of PITR performance improvement for 8.4.
On Thu, 2008-10-30 at 09:46 +0900, Koichi Suzuki wrote: I'm not sure if the blocks referred to by all the WAL records in a single WAL segment can fit in the kernel cache. This is why the current pg_readahead returns the last LSN and requires a starting LSN. So far, with FPW, it seems that we can prefetch all the pages in a WAL segment. So it will be okay for an archive log compressed with pg_compresslog. I'll test if it's okay in the case full_page_writes=off. I'd prefer to be able to specify max_readahead_pages rather than have to control things at a micro level like that. If you have lots of memory you can set that higher. Anyway, I'd like to keep my proposal for 8.4 and continue the test and evaluation to report to the mailing list. I'll also change the whole code to run in the core. OK, I quite liked the idea of a separate program. That allows it to work with 8.3 as well as 8.4. No problem with it being in core at all. As ever, good thinking, good patch. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BufferAccessStrategy for bulk insert
If you say it's a loss you should publish timings to support that. Using a BAS for VACUUM was a performance gain, not a loss. Well, I can dig up and publish the timings from my laptop, but I'm not sure where that will get us. Trust me, the numbers were higher with BAS, otherwise I wouldn't be worrying about this. But I pretty much doubt anyone cares how my laptop runs PostgreSQL anyway, which is why I think someone should test this on good hardware and see what happens there. The only change I made to disable the BAS was a one-line change in GetBulkInsertState to replace BAS_BULKWRITE with BAS_NORMAL, so it should be easy for someone to try it both ways. Not at any point in the development of this patch was I able to match the 15-17% copy speedup and 20% CTAS speedup that you cited in your original email. I did get speedups, but they were considerably smaller. So either my testing methodology is different, or my hardware is different, or there is something wrong with my patch. I don't think we're going to find out which it is until someone other than me looks at this. In any event, VACUUM is a read-write workload, and specifically, it tends to write pages that have been written by other writers, and are therefore potentially already in shared buffers. COPY and CTAS are basically write-only workloads, though with COPY on an existing table the FSM might guide you to free space on a page already in shared buffers, or you might find an index page you need there. Still, if you are doing a large bulk data load, those effects are probably pretty small. So, the profile is somewhat different. I'm not really trying to argue that the BAS is a bad idea, but it is certainly true that I do not have the data to prove that it is a good idea. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
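For readers following along, the one-line toggle Robert describes swaps between the ring-buffer strategy machinery and the default buffer pool. A rough illustration of a bulk write path using the 8.3-era entry points; the bulk_fill() loop itself is invented, and BAS_BULKWRITE is the new strategy type this patch introduces:

    /* With BAS_BULKWRITE, buffers come from a small reusable ring, so the
     * bulk load cannot evict the rest of the cache.  GetAccessStrategy()
     * returns NULL for BAS_NORMAL, meaning "use the main pool". */
    static void
    bulk_fill(Relation rel, BlockNumber nblocks)
    {
        BufferAccessStrategy strategy = GetAccessStrategy(BAS_BULKWRITE);
        BlockNumber blkno;

        for (blkno = 0; blkno < nblocks; blkno++)
        {
            Buffer buf = ReadBufferWithStrategy(rel, blkno, strategy);
            /* ... dirty the page here ... */
            ReleaseBuffer(buf);
        }

        FreeAccessStrategy(strategy);
    }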
Re: [HACKERS] Please make sure your patches are on the wiki page
I wonder if we should consider: (1) moving all of the patches committed prior to 11/1 to a separate section or page (2) sorting the pending patches by complexity or subject matter ...Robert On Wed, Oct 29, 2008 at 5:26 PM, Josh Berkus [EMAIL PROTECTED] wrote: Patch submitters, Please make sure your patches are on the November CommitFest wiki page, with correct and updated links. http://wiki.postgresql.org/wiki/CommitFest_2008-11 -- Josh Berkus PostgreSQL San Francisco
Re: [HACKERS] BufferAccessStrategy for bulk insert
On Wed, 2008-10-29 at 21:58 -0400, Robert Haas wrote: I'm not really trying to argue that the BAS is a bad idea, but it is certainly true that I do not have the data to prove that it is a good idea. You should try profiling the patch. You can count the invocations of the buffer access routines to check it's all working in the right ratios. Whatever timings you have are worth publishing. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] pre-MED
Hannu Krosing [EMAIL PROTECTED] writes: On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote: Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU? I'll look at adding this to pl/pythonu. I would argue that it's already designed wrong if there's need for PL-specific implementation effort. regards, tom lane
Re: [HACKERS] Please make sure your patches are on the wiki page
Robert, (1) moving all of the patches committed prior to 11/1 to a separate section or page Why? (2) sorting the pending patches by complexity or subject matter Sorting them by complexity would be great, if I thought I could do it. I'm not sure I can. -- Josh Berkus PostgreSQL San Francisco
Re: [HACKERS] minimal update
Andrew Dunstan [EMAIL PROTECTED] writes: Not sure that's appropriate, but I can't see anything else that is very appropriate either. The plpgsql code uses errcode(ERRCODE_FEATURE_NOT_SUPPORTED) for this situation, so I guess we should be consistent with that. TRIGGERED_DATA_CHANGE_VIOLATION is most certainly NOT an appropriate code here --- it's talking about invalid database content states. The RI triggers use ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED for these sorts of conditions, and I think that's probably best practice. See ri_CheckTrigger() in particular. regards, tom lane
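For illustration, a hypothetical C trigger function using the errcode Tom recommends, modeled loosely on ri_CheckTrigger() (the function name and message text are invented):

    #include "postgres.h"
    #include "fmgr.h"
    #include "commands/trigger.h"

    PG_FUNCTION_INFO_V1(my_check_trigger);

    /* hypothetical trigger function showing the RI-style errcode choice */
    Datum
    my_check_trigger(PG_FUNCTION_ARGS)
    {
        TriggerData *trigdata = (TriggerData *) fcinfo->context;

        /* same protocol check (and errcode) as ri_CheckTrigger() */
        if (!CALLED_AS_TRIGGER(fcinfo))
            ereport(ERROR,
                    (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
                     errmsg("my_check_trigger: not fired by trigger manager")));

        return PointerGetDatum(trigdata->tg_trigtuple);
    }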
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)
Simon Riggs wrote: On Wed, 2008-10-29 at 17:42 +0900, KaiGai Kohei wrote: I've updated my patches; these are ready for CommitFest:Nov. [1/6] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch [2/6] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch [3/6] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch [4/6] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch [5/6] http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch [6/6] http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch The comprehensive documentation for SE-PostgreSQL is here: http://wiki.postgresql.org/wiki/SEPostgreSQL (it is now being reworked.) List of updates: - Patches are rebased to the latest CVS HEAD. - bugfix: TRUNCATE checks assumed the SECCLASS_DB_TUPLE object class. - bugfix: sepgsqlCopyFile assumed the SECCLASS_FILE object class, but it has to be adjusted by st_mode. Request for Comments: - Is the 4th patch actually needed? It could be replaced by the wiki page. - Do you think anything remains toward the final CommitFest? - Do you have any reviewing comments? Most of the patches are unchanged from the previous version. If you can comment on anything, I can fix them without waiting for the final commit fest. I'm copying some general comments from my contact here, verbatim. Other comments have been requested and may be forthcoming: By way of background, the Common Criteria (ISO Standard 15408) are in effect pre-defined security requirements that have been agreed between multiple friendly governments so that they can share the results from independent lab work in each country and avoid the costs and duplication of effort. The published lab work results in two outputs: - a Target of Evaluation (TOE), i.e. a tight definition of the software version, configuration and environment (hardware, external controls) which was the subject of the evaluation - an Evaluation Report which, in the happy case, assigns an Evaluation Assurance Level (EAL) number to the product (which, needless to say, is only valid if the product is used in its TOE). If you're interested in reading more about formal Government security evaluation schemes, these are some good sites: Thanks for your information. However, I've also followed the Common Criteria for a few years, and some of the facilities came from its requirements. The security_context system column reflects the requirement of labeled import/export, for example. Don't worry. Let's move our discussion to its implementation in the upcoming CommitFest. It's a good time now. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
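As a concrete illustration of the labeled import/export point, SE-PostgreSQL exposes row labels through the security_context system column, roughly along these lines (the table, columns, and label string are invented for illustration):

    -- labeled export: dump rows together with their security labels
    SELECT security_context, * FROM customer;

    -- labeled import: supply an explicit context on insert
    INSERT INTO customer (security_context, name)
         VALUES ('system_u:object_r:sepgsql_table_t:s0', 'alice');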
Re: [HACKERS] Please make sure your patches are on the wiki page
Josh Berkus [EMAIL PROTECTED] writes: (2) sorting the pending patches by complexity or subject matter Sorting them by complexity would be great, if I thought I could do it. I'm not sure I can. We organized them by subject matter (or code area, really) in a couple of the earlier fests. I thought that was helpful then. On the other hand, the September fest didn't seem to break down that way. Until we see the final list it's hard to say how November will shake out. Earlier today I had a different thought about how to sort things early in the fest. I think that there is a strong temptation to finish off the simple patches quickly so as to reduce the size of the list --- I know I've done that and I think others have too. The trouble with simple-first is that problematic patches get left till later, which means that their authors don't have as much time to respond to any criticisms that may ultimately be forthcoming. I think it'd be a good idea to intentionally try to focus on difficult patches early, so that they can be bounced back to their authors with useful criticism while there's still time to do something in response. Not sure about details of this, but seems like a process issue that we ought to consider. regards, tom lane
Re: [HACKERS] minimal update
Andrew Dunstan [EMAIL PROTECTED] writes: I think I like this best of all the suggestions - suppress_redundant_updates_trigger() is what I have now. If there's no further discussion, I'll go ahead and commit this in a day or two. The documentation seems a bit lacking: it gives neither a hint of why you might want to use this nor why it's not the built-in behavior. Suggest expending a sentence or two pointing out that the trigger takes nonzero execution time to do its comparisons, and that this may or may not be repaid by eliminated updates, depending on whether the client applications are actually in the habit of issuing useless update commands. I think you're missing an indexentry item for the function name, also. regards, tom lane
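A usage example would also help the documentation; something along these lines (the table and trigger names are invented):

    -- Skip UPDATEs that would not change the row.  The comparison costs a
    -- little per UPDATE and pays off only if clients actually issue
    -- redundant update commands.  A late-sorting trigger name ("z...")
    -- makes it fire after any other per-row triggers.
    CREATE TRIGGER z_suppress_redundant
        BEFORE UPDATE ON accounts
        FOR EACH ROW
        EXECUTE PROCEDURE suppress_redundant_updates_trigger();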
Re: [HACKERS] pre-MED
David Blewett [EMAIL PROTECTED] writes: Here's a vote for allowing this in plain SQL. I use the tablefunc contrib module as a way to build a view of a specific questionnaire's responses (using Elein's nice model here [1]). Currently, if I then write queries against these views that include WHERE clauses, they don't perform very well as the underlying data size grows. I was using the aforementioned large view that casts everything to text, but recently I started using separate calls to the crosstab function for each underlying table, then joining them together based on their response ID. This seems to work much better for more complex queries, but I think it would still be beneficial to have access to these qualifiers so I could push down to each subquery the list of response IDs to pull. I don't have access to sample SQL at the moment, but if it is wanted I can try to get that this week. Please. Some real use-cases would be very helpful here. I'm particularly wondering whether the proposed deparse call actually yields anything that's useful without extensive additional knowledge about the query ... regards, tom lane
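Pending David's real SQL, the shape of the per-table crosstab-plus-join he describes might look like this (the schema is invented for illustration; crosstab() is the two-query form from contrib/tablefunc):

    SELECT r.response_id, t.q1, t.q2
      FROM crosstab(
             'SELECT response_id, question_id, answer
                FROM text_answers ORDER BY 1, 2',
             'SELECT question_id FROM questions ORDER BY 1'
           ) AS t(response_id int, q1 text, q2 text)
      JOIN responses r USING (response_id)
     WHERE r.response_id IN (101, 102, 103);  -- the qual one would like pushed down

The WHERE clause is applied only after the crosstab function has produced its whole result; being able to push the response-ID list down into the function's source query is exactly the capability under discussion.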
Re: [HACKERS] Please make sure your patches are on the wiki page
On Wednesday 29 October 2008 20:12, Tom Lane wrote: Earlier today I had a different thought about how to sort things early in the fest. I think that there is a strong temptation to finish off the simple patches quickly so as to reduce the size of the list --- I know I've done that and I think others have too. Actually, I'd really like it if you and the other advanced hackers ignored the simple patches. I've got a list of 6 new reviewers, and I'd like to be able to give them those patches to review -- they generally can't help with the hard ones. -- Josh Berkus PostgreSQL San Francisco
Re: [HACKERS] recovery_target_time and last completed transaction log time
Kevin Grittner [EMAIL PROTECTED] writes: Regarding the LOG message, it is confusing to have a recovery.conf file which contains this: recovery_target_time = '2008-10-16 17:00:00.0' and see this in the log: [2008-10-16 23:04:26.006 CDT] 19951 LOG: last completed transaction was at log time 2008-10-16 17:00:23.205347-05 Actually, I think this *is* a bug: recoveryLastXTime is being set in entirely the wrong place, ie, where we are *considering* whether to apply a commit rather than after we have decided to do so. The result is just a misleading log message, so not too terrible, but we should fix it. regards, tom lane
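In code terms the fix amounts to moving the assignment past the stop decision; a simplified standalone sketch (the control flow and types are assumed, not the actual xlog.c):

    #include <stdbool.h>
    #include <stdint.h>

    typedef int64_t TimestampTz;    /* stand-in for the backend typedef */

    static TimestampTz recoveryLastXTime = 0;

    /*
     * Simplified stand-in for the replay loop's commit handling: remember
     * the commit timestamp only once we have decided to apply the record,
     * not while merely considering whether to stop.
     */
    static bool
    apply_commit_record(TimestampTz recordXtime, bool stopsHere)
    {
        if (stopsHere)
            return false;                 /* recovery stops; record not applied */

        recoveryLastXTime = recordXtime;  /* the fix: set after the decision */
        return true;
    }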
Re: [HACKERS] pre-MED
On Wed, Oct 29, 2008 at 10:23:36PM -0400, Tom Lane wrote: Hannu Krosing [EMAIL PROTECTED] writes: On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote: Also, PL/Perl shouldn't be the only language to have this capability. How might we add similar capabilities to PL/PythonU? I'll look at adding this to pl/pythonu. I would argue that it's already designed wrong if there's need for PL-specific implementation effort. I'm not sure how else to do this. The current implementation returns char *, which doesn't translate uniformly into the PLs. More sophisticated implementations--a tree or forest structure including ANDs and ORs, etc.--are even less uniform to translate into PLs, at least as far as I can tell. The way I'm looking at it, this could be added onto SPI at varying degrees of sophistication, but there would still be PL-specific bindings for it, each of which would involve a PL-specific implementation effort. What big things have I missed here? Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores
2008/10/30 Robert Haas [EMAIL PROTECTED]: All of this is pie-in-the-sky for PL functions, and I think properly so: the whole reason for supporting PLs is to enable doing things that SQL does poorly or not at all. So expecting SQL to interoperate very closely with them seems impossible, or at least unreasonably limiting. The real issue at hand is what to do with SQL-language functions. I'm currently going to have a look at just what it would take to support both lazy and eager evaluation in functions.c (independently of what syntax, if any, we settle on to expose the choice to the user). If it's either really awful or really easy we should know that before arguing further. It occurs to me that for PL/perl and similar one could design an interface that is similar to the one that is used for C functions - that is, the function is invoked multiple times, returns one value per call, and is given a place to stash its state across calls. For example, for PL/perl, you could pass a mutable empty hash reference on the first call and then pass the same hash reference back on each subsequent call. That wouldn't require being able to freeze/thaw the whole state, just being able to maintain the contents of that hash reference across calls. It would probably be a lot more difficult to make something like this work usefully for PL/pgsql, which as a language is rather underpowered (nonetheless I use it heavily; it's awesome for the things it is good at), but I suspect it could be applied to Python, PHP, etc. pretty easily. So that's at least three ways you can evaluate the function: generate the whole thing in one fell swoop, single function call but with lazy execution, or value-per-call mode. I'm guessing someone could dream up other possibilities as well. Now, who's volunteering to implement? :-) With session variables we could implement an SRF function in plpgsql like the current C SRF functions. Something like: create or replace function foo() returns record as $$ #option with_srf_context(datatype of srf context) begin return row(...); end; $$ language plpgsql; I think it is implementable, but it's not very effective; there is a lot of initialisation code. But this technique is used for aggregate functions without problems. I believe it would not be fast, but it could be useful for very large datasets, where current SRF functions would fail. regards Pavel Stehule ...Robert
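For reference, the C value-per-call pattern Robert alludes to, which a PL binding would mimic by persisting state across calls, looks like this (a minimal countdown SRF using the standard funcapi.h macros; the function itself is invented for illustration):

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(countdown);

    /* Returns n, n-1, ..., 1: one value per call, state kept in funcctx */
    Datum
    countdown(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            funcctx = SRF_FIRSTCALL_INIT();
            funcctx->max_calls = PG_GETARG_UINT32(0);  /* stash state once */
        }

        funcctx = SRF_PERCALL_SETUP();

        if (funcctx->call_cntr < funcctx->max_calls)
            SRF_RETURN_NEXT(funcctx,
                            Int32GetDatum((int32) (funcctx->max_calls -
                                                   funcctx->call_cntr)));

        SRF_RETURN_DONE(funcctx);
    }

SELECT * FROM countdown(3) would invoke the function four times, the last call signalling done; the mutable hash reference in Robert's PL/perl proposal plays the role that funcctx plays here.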