Re: [HACKERS] could not open relation 1663/16384/16584: No such file or directory in a specific combination of transactions with temp tables
John Smith wrote:
> Architecture: Intel Core 2 Duo
> OS: linux-2.6.20-gentoo-r8
> Filesystem: ext3
> Postgres v8.2.3 compiled with gcc 4.1.1-r3
> RAM - 2GB
> Shared buffers - 24MB
> [All other Postgres configuration parameters are default values]
>
> Problem description: COPY into a temp table fails using a specific
> combination of create/insert on temp tables and prepare/commit in
> subsequent transactions. The "could not open relation" error occurs
> reliably.
>
> Steps to reproduce: Existing schema (scripts to create and populate these
> tables are uploaded to
> http://upload2.net/page/download/rfsLfnuVlYjEcCJ/basetables.tgz.html ):

I can't get that link to work. Can you please email me the files offlist? Or upload somewhere else if they're too big for email.

> Observations:
> 1. The size of the data seems to matter. If the amount of data being
>    inserted is dropped to just one or two records per table, the error
>    doesn't happen.
> 2. The order of columns for the select into temp2 matters. Changing the
>    order can cause the error to go away.
> 3. If the prepare/commit is replaced with a commit, the error goes away.
> 4. Removing temp3 or temp4 from the transaction causes one run of the
>    above statements to succeed, but if the sequence is issued in the same
>    psql session, the second one will fail.
> 5. Given the current dataset, the error always occurs on line 926 of the
>    COPY (even if the values at line 926 are changed).
> 6. The tablespace/database/oid typically always corresponds to that of
>    temp2 on my system.

I think I see what's happening here. We have restricted two-phase commit so that you're not supposed to be able to PREPARE TRANSACTION if the transaction has touched any temporary tables. That's because the second phase of the commit can be performed from another backend, and one backend can't mess with another backend's temporary tables.

However, in this case, where you CREATE and DROP the temporary table in the same transaction, we don't detect that, and let the PREPARE TRANSACTION finish. The detection relies on the lock manager, but we're not holding any locks on the dropped relation.

I think we could in fact allow CREATE+DROP in the same transaction, and remove the table immediately at PREPARE TRANSACTION, but what happens right now is that we store the relfilenode of the temp table in the two-phase state file in pg_twophase, for deletion at COMMIT/ROLLBACK PREPARED. But we don't store the fact that it's a temporary table, and therefore we try to unlink it like a normal table, and fail to purge the temp buffers of that table, which causes problems later.

Attached is a simple patch to fix that by disallowing CREATE+DROP+PREPARE TRANSACTION more reliably. It'd still be nice to debug your full test case to verify that that's what's happening, though.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.39
diff -c -r1.39 twophase.c
*** src/backend/access/transam/twophase.c	1 Jan 2008 19:45:48 -0000	1.39
--- src/backend/access/transam/twophase.c	29 Feb 2008 09:58:19 -0000
***************
*** 793,798 ****
--- 793,799 ----
  	TransactionId *children;
  	RelFileNode *commitrels;
  	RelFileNode *abortrels;
+ 	bool		haveTempCommit, haveTempAbort;
  
  	/* Initialize linked list */
  	records.head = palloc0(sizeof(XLogRecData));
***************
*** 815,824 ****
  	hdr.prepared_at = gxact->prepared_at;
  	hdr.owner = gxact->owner;
  	hdr.nsubxacts = xactGetCommittedChildren(&children);
! 	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
! 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
  	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
  
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
  
  	/* Add the additional info about subxacts and deletable files */
--- 816,830 ----
  	hdr.prepared_at = gxact->prepared_at;
  	hdr.owner = gxact->owner;
  	hdr.nsubxacts = xactGetCommittedChildren(&children);
! 	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL, &haveTempCommit);
! 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL, &haveTempAbort);
  	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
  
+ 	if (haveTempCommit || haveTempAbort)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
+ 
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
  
  	/* Add the additional info about subxacts and deletable files */
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.257
diff -c -r1.257 xact.c
*** src/backend/access/transam/xact.c	15 Jan 2008 18:56:59 -0000	1.257
--- src/backend/access/transam/xact.c	29 Feb 2008
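To make the failure mode concrete, here is a minimal SQL sketch of the pattern described above; the table names, prepared-transaction identifier, and data file are hypothetical, not the reporter's actual test case:

    -- Create and drop a temp table inside a transaction that is then
    -- prepared.  An unpatched 8.2/8.3 server lets the PREPARE through.
    BEGIN;
    CREATE TEMP TABLE scratch (id int, payload text);
    INSERT INTO scratch VALUES (1, 'x');
    DROP TABLE scratch;
    PREPARE TRANSACTION 'demo_2pc';
    COMMIT PREPARED 'demo_2pc';

    -- Later in the same session: a COPY into another temp table that is
    -- large enough to force temp-buffer eviction can then stumble over the
    -- stale buffers left behind by the dropped table, giving
    -- "could not open relation ...: No such file or directory".
    CREATE TEMP TABLE temp2 (id int, payload text);
    COPY temp2 FROM '/tmp/data.csv';

With the patch above applied, the PREPARE TRANSACTION step itself would instead fail with "cannot PREPARE a transaction that has operated on temporary tables".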
[HACKERS] Read-ahead and parallelism in redo recovery
I remember Heikki mentioned improving redo recovery in one of the emails in the past, so I know people are already thinking about this. I have some ideas and just wanted to get comments here.

ISTM that it's important to keep the redo recovery time as small as possible in order to reduce the downtime in case of unplanned maintenance. One way to do this is to take checkpoints very aggressively to keep the amount of redo work small. But the current checkpoint logic writes all the dirty buffers to disk and hence generates lots of IO. That limits our ability to take very frequent checkpoints.

The current redo recovery is a single-threaded, synchronous process. The XLOG is read sequentially, and each log record is examined and replayed if required. This requires reading disk blocks into the shared buffers and applying changes to the buffer. The reading happens synchronously, and that would usually make the redo process very slow.

What I am thinking is that if we can read ahead these blocks into the shared buffers and then apply redo changes to them, it can potentially improve things a lot. If there are multiple read requests, the kernel (or controller?) can probably schedule the reads more efficiently. One way to do this is to read ahead the XLOG and make asynchronous read requests for these blocks. But I am not sure if we support asynchronous reads yet. Another (and maybe easier) way is to fork another process which can just read ahead the XLOG and get the blocks into memory while the other process does the normal redo recovery. One obvious downside of reading ahead would be that we may need to jump backward and forward in the XLOG file, which is otherwise read sequentially. But that can be handled by using XLOG buffers for redo.

Btw, isn't our redo recovery completely physical in nature? I mean, can we replay redo logs related to a block independently of other blocks? The reason I am asking is that if that's the case, ISTM we can introduce parallelism in recovery by splitting and reordering the xlog records and then running multiple processes to do the redo recovery.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] CREATE TABLE, load and freezing
On Thu, 2008-02-28 at 14:38 +0530, Pavan Deolasee wrote:
> I had this idea sometime back. Not sure if this has been discussed before.

Check the archives for my post to hackers in Jan 2007 and the subsequent discussion. It's possible, just a little fiddly.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: [HACKERS] could not open relation 1663/16384/16584: No such file or directory in a specific combination of transactions with temp tables
Heikki Linnakangas wrote:
> Attached is a simple patch to fix that by disallowing CREATE+DROP+PREPARE
> TRANSACTION more reliably.

That patch was missing changes to header files. New patch attached.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

Index: src/backend/access/transam/twophase.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.39
diff -c -r1.39 twophase.c
*** src/backend/access/transam/twophase.c	1 Jan 2008 19:45:48 -0000	1.39
--- src/backend/access/transam/twophase.c	29 Feb 2008 13:05:24 -0000
***************
*** 793,798 ****
--- 793,799 ----
  	TransactionId *children;
  	RelFileNode *commitrels;
  	RelFileNode *abortrels;
+ 	bool		haveTempCommit, haveTempAbort;
  
  	/* Initialize linked list */
  	records.head = palloc0(sizeof(XLogRecData));
***************
*** 815,824 ****
  	hdr.prepared_at = gxact->prepared_at;
  	hdr.owner = gxact->owner;
  	hdr.nsubxacts = xactGetCommittedChildren(&children);
! 	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL);
! 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL);
  	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
  
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
  
  	/* Add the additional info about subxacts and deletable files */
--- 816,830 ----
  	hdr.prepared_at = gxact->prepared_at;
  	hdr.owner = gxact->owner;
  	hdr.nsubxacts = xactGetCommittedChildren(&children);
! 	hdr.ncommitrels = smgrGetPendingDeletes(true, &commitrels, NULL, &haveTempCommit);
! 	hdr.nabortrels = smgrGetPendingDeletes(false, &abortrels, NULL, &haveTempAbort);
  	StrNCpy(hdr.gid, gxact->gid, GIDSIZE);
  
+ 	if (haveTempCommit || haveTempAbort)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
+ 
  	save_state_data(&hdr, sizeof(TwoPhaseFileHeader));
  
  	/* Add the additional info about subxacts and deletable files */
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.257
diff -c -r1.257 xact.c
*** src/backend/access/transam/xact.c	15 Jan 2008 18:56:59 -0000	1.257
--- src/backend/access/transam/xact.c	29 Feb 2008 13:05:24 -0000
***************
*** 802,808 ****
  	TransactionId *children;
  
  	/* Get data needed for commit record */
! 	nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp);
  	nchildren = xactGetCommittedChildren(&children);
  
  	/*
--- 802,808 ----
  	TransactionId *children;
  
  	/* Get data needed for commit record */
! 	nrels = smgrGetPendingDeletes(true, &rels, &haveNonTemp, NULL);
  	nchildren = xactGetCommittedChildren(&children);
  
  	/*
***************
*** 1174,1180 ****
  					xid);
  
  	/* Fetch the data we need for the abort record */
! 	nrels = smgrGetPendingDeletes(false, &rels, NULL);
  	nchildren = xactGetCommittedChildren(&children);
  
  	/* XXX do we really need a critical section here? */
--- 1174,1180 ----
  					xid);
  
  	/* Fetch the data we need for the abort record */
! 	nrels = smgrGetPendingDeletes(false, &rels, NULL, NULL);
  	nchildren = xactGetCommittedChildren(&children);
  
  	/* XXX do we really need a critical section here? */
Index: src/backend/storage/smgr/smgr.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/smgr/smgr.c,v
retrieving revision 1.109
diff -c -r1.109 smgr.c
*** src/backend/storage/smgr/smgr.c	1 Jan 2008 19:45:52 -0000	1.109
--- src/backend/storage/smgr/smgr.c	29 Feb 2008 13:05:24 -0000
***************
*** 678,689 ****
   *
   * If haveNonTemp isn't NULL, the bool it points to gets set to true if
   * there is any non-temp table pending to be deleted; false if not.
   *
   * Note that the list does not include anything scheduled for termination
   * by upper-level transactions.
   */
  int
! smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr, bool *haveNonTemp)
  {
  	int			nestLevel = GetCurrentTransactionNestLevel();
  	int			nrels;
--- 678,692 ----
   *
   * If haveNonTemp isn't NULL, the bool it points to gets set to true if
   * there is any non-temp table pending to be deleted; false if not.
+  * haveTemp is similar, but gets set if there is any temp table deletions
+  * pending.
   *
   * Note that the list does not include anything scheduled for termination
   * by upper-level transactions.
   */
  int
! smgrGetPendingDeletes(bool forCommit, RelFileNode **ptr,
! 					  bool *haveNonTemp, bool *haveTemp)
  {
  	int			nestLevel = GetCurrentTransactionNestLevel();
  	int			nrels;
***************
*** 693,698 ****
--- 696,703 ----
  	nrels = 0;
  	if (haveNonTemp)
  		*haveNonTemp = false;
+ 	if (haveTemp)
+ 		*haveTemp = false;
  	for (pending = pendingDeletes; pending != NULL; pending =
Re: [HACKERS] could not open relation 1663/16384/16584: No such file or directory in a specific combination of transactions with temp tables
Plausible theory, and nice explanation.

Try the following link (I had to wait for 50 sec for the link to appear, but I guess the trade-off of getting knowledge in return is worth it :) )

http://www5.upload2.net/download/77fa86e16a02e52fd5439c76e148d231/47c7fdce/rfsLfnuVlYjEcCJ/basetables.tgz

Not sending the attachment in this mail; that may cause the mail to be not accepted by the list. I will try to send the attachment in the next mail, to retain it in the mailing list for historical purposes.

Thanks and best regards,
--
[EMAIL PROTECTED]
[EMAIL PROTECTED] { gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB      http://www.enterprisedb.com

17° 29' 34.37N, 78° 30' 59.76E - Hyderabad
18° 32' 57.25N, 73° 56' 25.42E - Pune
* 37° 47' 19.72N, 122° 24' 1.69 W - San Francisco

http://gurjeet.frihost.net

Mail sent from my BlackLaptop device

On Fri, Feb 29, 2008 at 3:32 PM, Heikki Linnakangas [EMAIL PROTECTED] wrote:
> I can't get that link to work. Can you please email me the files offlist?
> Or upload somewhere else if they're too big for email.
[HACKERS] Why we panic in pglz_decompress
I'm now looking into the toast code and I found the following in pglz_decompress:

    if (destsize != source->rawsize)
        elog(destsize > source->rawsize ? FATAL : ERROR,
             "compressed data is corrupt");

I'm surprised: why do we panic there? In my opinion it is not a good idea to crash the server in a case where we know how much memory we really have for dest and we can check the range. Another problem is that the message "compressed data is corrupt" does not contain any information about the file, relation etc.

My idea is to improve this piece of code and move the error logging to the callers (heap_tuple_untoast_attr() and heap_tuple_untoast_attr_slice()) where we have a little bit more detail (especially for external storage).

Any comments?

thanks Zdenek
Re: [HACKERS] Read-ahead and parallelism in redo recovery
* Pavan Deolasee:

> The current redo recovery is a single-threaded, synchronous process. The
> XLOG is read sequentially, and each log record is examined and replayed if
> required. This requires reading disk blocks into the shared buffers and
> applying changes to the buffer. The reading happens synchronously, and
> that would usually make the redo process very slow.

Are you sure that it's actually slow for that reason? Sequential I/O on the log is typically quite fast, and if the pages dirtied since the last checkpoint fit into the cache (shared buffers or OS cache), even that part of recovery does not result in lots of random I/O (with 8.3 and full page writes active; this is a relatively recent change).

In the end, I wouldn't be surprised if for most loads, cache warming effects dominated recovery times, at least when the machine is not starved for RAM.

--
Florian Weimer                [EMAIL PROTECTED]
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Florian G. Pflug [EMAIL PROTECTED] writes:
> I know that Simon has some ideas about parallel restore, though I don't
> know how he wants to solve the dependency issues involved. Perhaps by not
> parallelizing within one table or index...

I think we should be *extremely* cautious about introducing any sort of parallelism or other hard-to-test behavior into xlog recovery. Bugs in that area will by definition bite people at the worst possible time. And we already know that we don't have very good testing ability for xlog recovery, because some pretty nasty bugs have gone undetected for long periods.

			regards, tom lane
Re: [HACKERS] Why we panic in pglz_decompress
Alvaro Herrera [EMAIL PROTECTED] writes:
> Zdenek Kotala wrote:
>> I'm now looking into the toast code and I found the following in
>> pglz_decompress:
>>
>>     if (destsize != source->rawsize)
>>         elog(destsize > source->rawsize ? FATAL : ERROR,
>>              "compressed data is corrupt");
>>
>> I'm surprised: why do we panic there?

> Agreed, FATAL is too strong.

Did either of you read the comment just before this code? The reason it's panicking is that it's possibly already tromped on some critical data structure inside the backend.

>> My idea is to improve this piece of code and move the error logging to
>> the callers (heap_tuple_untoast_attr() and heap_tuple_untoast_attr_slice())
>> where we have a little bit more detail (especially for external storage).

> Why move it? Just adding errcontext in the callers should be enough.

AFAIR this error has never once been reported from the field, so I don't see the point of investing a lot of effort in it.

			regards, tom lane
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Pavan Deolasee wrote:
> What I am thinking is that if we can read ahead these blocks into the
> shared buffers and then apply redo changes to them, it can potentially
> improve things a lot. If there are multiple read requests, the kernel (or
> controller?) can probably schedule the reads more efficiently.

The same holds true for index scans, though. Maybe we can find a solution that benefits both cases - something along the lines of a bgreader process.

> Btw, isn't our redo recovery completely physical in nature? I mean, can we
> replay redo logs related to a block independently of other blocks? The
> reason I am asking is that if that's the case, ISTM we can introduce
> parallelism in recovery by splitting and reordering the xlog records and
> then running multiple processes to do the redo recovery.

I'd say it's physical on the tuple level (we just log the new tuple on an update, not how to calculate it from the old one), but logical on the page level (we log the fact that a tuple was inserted on a page, but e.g. the physical location of the tuple on the page can come out differently upon replay).

It's even more logical for indices, because we log page splits as multiple wal records, letting the recovery process deal with synthesizing upper-level updates should we crash in the middle of a page split.

Additionally, we log full-page images as a safeguard against torn page writes. Those would need to be considered as a kind of reorder barrier in any parallel restore scenario, I guess.

I know that Simon has some ideas about parallel restore, though I don't know how he wants to solve the dependency issues involved. Perhaps by not parallelizing within one table or index...
Re: [HACKERS] Why we panic in pglz_decompress
Alvaro Herrera wrote:
> Zdenek Kotala wrote:
>> I'm now looking into the toast code and I found the following in
>> pglz_decompress:
>>
>>     if (destsize != source->rawsize)
>>         elog(destsize > source->rawsize ? FATAL : ERROR,
>>              "compressed data is corrupt");
>>
>> I'm surprised: why do we panic there?

> Agreed, FATAL is too strong.

>> My idea is to improve this piece of code and move the error logging to
>> the callers (heap_tuple_untoast_attr() and heap_tuple_untoast_attr_slice())
>> where we have a little bit more detail (especially for external storage).

> Why move it? Just adding errcontext in the callers should be enough.

Good idea.

thanks Zdenek
Re: [HACKERS] Why we panic in pglz_decompress
Zdenek Kotala wrote:
> I'm now looking into the toast code and I found the following in
> pglz_decompress:
>
>     if (destsize != source->rawsize)
>         elog(destsize > source->rawsize ? FATAL : ERROR,
>              "compressed data is corrupt");
>
> I'm surprised: why do we panic there?

Agreed, FATAL is too strong.

> My idea is to improve this piece of code and move the error logging to
> the callers (heap_tuple_untoast_attr() and heap_tuple_untoast_attr_slice())
> where we have a little bit more detail (especially for external storage).

Why move it? Just adding errcontext in the callers should be enough.

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [BUGS] bug or not bug, xmlvalidate(xml, text) can read and show one line from file
Sergey Burladyan [EMAIL PROTECTED] writes:
> [ xmlvalidate is a security hole ]

Given that this function is not documented nor tested in the regression tests, I propose diking it out entirely.

			regards, tom lane
Re: [HACKERS] Why we panic in pglz_decompress
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> Zdenek Kotala wrote:
>>> I'm now looking into the toast code and I found the following in
>>> pglz_decompress:
>>>
>>>     if (destsize != source->rawsize)
>>>         elog(destsize > source->rawsize ? FATAL : ERROR,
>>>              "compressed data is corrupt");
>>>
>>> I'm surprised: why do we panic there?

>> Agreed, FATAL is too strong.

> Did either of you read the comment just before this code? The reason it's
> panicking is that it's possibly already tromped on some critical data
> structure inside the backend.

Yes I did, but if you know how much memory you have for the uncompressed data, you can check the boundaries. That is better than overwriting data in memory. Yes, it slows the routine down a little bit, but you will be able to work with the table.

>>> My idea is to improve this piece of code and move the error logging to
>>> the callers (heap_tuple_untoast_attr() and heap_tuple_untoast_attr_slice())
>>> where we have a little bit more detail (especially for external storage).

>> Why move it? Just adding errcontext in the callers should be enough.

> AFAIR this error has never once been reported from the field, so I don't
> see the point of investing a lot of effort in it.

Please, increment a counter :-). I'm now analyzing one core file and it finally fails in the elog function (called from pglz_decompress), because memory was overwritten - no error message in the log file. :(

	Zdenek
[HACKERS] request feature: alter view add column (or something like)
Sometimes I have to add a column to a table, and then add that column to some view. Currently it is impossible to alter a view to add a new column, which is really annoying. The only solution is to drop the view, and all dependent views, then recreate everything.

Someone on #postgresqlfr told me there had been discussion about that. I think this would be a big improvement for pg 8.4.

My 2 cents. Regards.
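For illustration, a minimal sketch of the limitation being described (hypothetical table and view names; this is the 8.2/8.3 behavior, where CREATE OR REPLACE VIEW cannot change a view's column list):

    CREATE TABLE t (a int);
    CREATE VIEW v AS SELECT a FROM t;

    ALTER TABLE t ADD COLUMN b int;

    -- Fails: the replacement query may not change the view's column list.
    CREATE OR REPLACE VIEW v AS SELECT a, b FROM t;

    -- Current workaround: drop the view and every dependent view, then
    -- recreate the whole stack.
    DROP VIEW v CASCADE;
    CREATE VIEW v AS SELECT a, b FROM t;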
[HACKERS] creating new aggregate function
Need help and direction creating new aggregate functions. We need to add more average functions for both scientific and financial purposes:

RMS - for electrical measurement purposes
Mode - for both electrical and financial purposes
Weighted average - for financial purposes
Generalized mean - for electrical measurement purposes
Geometric mean - for electrical measurement purposes
Harmonic mean - for electrical measurement purposes

What would be the best way to create these new functions?
Re: [HACKERS] creating new aggregate function
On Fri, Feb 29, 2008 at 12:11:59PM -0500, Justin wrote:
> Need help and direction creating new aggregate functions. We need to add
> more average functions for both scientific and financial purposes
> [ ... ]
> What would be the best way to create these new functions?

I'd be tempted to look at pl/r [1]. R [2] is a language for statistical analysis, and pl/r integrates it into Postgres.

  Sam

[1] http://joeconway.com/plr/
[2] http://www.r-project.org/
Re: [HACKERS] Read-ahead and parallelism in redo recovery
On Fri, 2008-02-29 at 15:49 +0100, Florian G. Pflug wrote:
> I know that Simon has some ideas about parallel restore, though I don't
> know how he wants to solve the dependency issues involved. Perhaps by not
> parallelizing within one table or index...

Well, I think that problem is secondary to making progress with your work on hot standby. I don't want to tune the existing setup and then make it harder to introduce new features.

I'm aiming to review your patches in this commit fest, with a view to getting the work fully committed 4-6 months from now, assuming you're happy to make any changes we identify. That still leaves us time to tune things before the next release.

The hope is to increase the level of functionality here. We may not be able to move forwards in just one more stride. Warm Standby has taken the last 4 releases to mature to where we are now, and the work ahead is at least as difficult as what has gone before.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: [HACKERS] Buildfarm member gypsy_moth seems not to like alignment patch
Jorgen Austvik - Sun Norway [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> This is unfortunate and surprising, since that patch was intended to
>> prevent compilers from making unsafe alignment assumptions, but it sure
>> looks like this compiler has instead added a new one. Could you poke into
>> it --- at least get a stack trace from the core dump?

> Running initdb with debug:

Hah, I guess the problem is that the compiler decided it was okay to put the local variable chunk_data at an unaligned address. Patched, but we'll have to see whether any other places have similar issues.

Thanks for doing the gdb work.

			regards, tom lane
[HACKERS] Re: [BUGS] bug or not bug, xmlvalidate(xml, text) can read and show one line from file
Peter Eisentraut wrote:
> On Friday, 29 February 2008, Tom Lane wrote:
>> Sergey Burladyan [EMAIL PROTECTED] writes:
>>> [ xmlvalidate is a security hole ]
>>
>> Given that this function is not documented nor tested in the regression
>> tests, I propose diking it out entirely.

> Yes, it was accidentally left over from previous work. We should have
> removed it before the release, but that would have required an initdb.

So let's change it for a function that elog(ERROR)s on entry.

--
Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Read-ahead and parallelism in redo recovery
On Feb 29, 2008, at 8:10 AM, Florian Weimer wrote:
> In the end, I wouldn't be surprised if for most loads, cache warming
> effects dominated recovery times, at least when the machine is not starved
> for RAM.

Uh... that's exactly what all the synchronous reads are doing... warming the cache. And synchronous reads are only fast if the system understands what's going on and reads a good chunk of data in at once. I don't know that that happens.

Perhaps a good short-term measure would be to have recovery allocate a 16M buffer and read in entire xlog files at once.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] [BUGS] bug or not bug, xmlvalidate(xml, text) can read and show one line from file
Alvaro Herrera [EMAIL PROTECTED] writes:
> Peter Eisentraut wrote:
>> Yes, it was accidentally left over from previous work. We should have
>> removed it before the release, but that would have required an initdb.

> So let's change it for a function that elog(ERROR)s on entry.

Yeah, I was just going to make it throw a "not implemented" error.

			regards, tom lane
Re: [HACKERS] creating new aggregate function
This probably belongs on General, but

On Fri, Feb 29, 2008 at 9:11 AM, Justin [EMAIL PROTECTED] wrote:
> Need help and direction creating new aggregate functions. We need to add
> more average functions for both scientific and financial purposes
>
> RMS - for electrical measurement purposes
> Mode - for both electrical and financial purposes
> Weighted average - for financial purposes
> Generalized mean - for electrical measurement purposes
> Geometric mean - for electrical measurement purposes
> Harmonic mean - for electrical measurement purposes
>
> What would be the best way to create these new functions?

Have you already read the documentation on creating aggregates? Have you tried something and it didn't work, or are you interested in design ideas? I would just knock together something in plpgsql. If you have trouble, send the specific questions to the list.

"Best"? C is fastest, etc -- what kind of tradeoffs do you need to satisfy?

One thing worth thinking about is using arrays to carry state from one function call to the next in an aggregate; this is how the function used in the average aggregate keeps track of both the running total and the number of rows. The stddev aggregate uses a three-item array similarly.
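To illustrate that array-as-state approach, here is a minimal sketch (not from the original thread; all function and aggregate names are made up) of a geometric-mean aggregate whose transition state is a two-element float8 array holding the running sum of logarithms and the row count:

    -- state[1] = running sum of ln(value), state[2] = number of rows seen
    CREATE FUNCTION geomean_trans(state float8[], val float8)
    RETURNS float8[] AS $$
        SELECT ARRAY[state[1] + ln(val), state[2] + 1];
    $$ LANGUAGE sql IMMUTABLE STRICT;

    CREATE FUNCTION geomean_final(state float8[])
    RETURNS float8 AS $$
        -- exp(mean of the logs) is the geometric mean; NULL if no rows seen
        SELECT CASE WHEN state[2] > 0 THEN exp(state[1] / state[2]) END;
    $$ LANGUAGE sql IMMUTABLE;

    CREATE AGGREGATE geomean(float8) (
        SFUNC     = geomean_trans,
        STYPE     = float8[],
        FINALFUNC = geomean_final,
        INITCOND  = '{0,0}'
    );

    -- Usage (values must be positive): SELECT geomean(reading) FROM samples;

Because the transition function is STRICT, NULL inputs are simply skipped; harmonic or generalized means would follow the same pattern with a different accumulator and final expression.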
Re: [HACKERS] [BUGS] bug or not bug, xmlvalidate(xml, text) can read and show one line from file
On Friday, 29 February 2008, Tom Lane wrote:
> Sergey Burladyan [EMAIL PROTECTED] writes:
>> [ xmlvalidate is a security hole ]
>
> Given that this function is not documented nor tested in the regression
> tests, I propose diking it out entirely.

Yes, it was accidentally left over from previous work. We should have removed it before the release, but that would have required an initdb.
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Decibel! wrote:
> On Feb 29, 2008, at 8:10 AM, Florian Weimer wrote:
>> In the end, I wouldn't be surprised if for most loads, cache warming
>> effects dominated recovery times, at least when the machine is not
>> starved for RAM.
>
> Uh... that's exactly what all the synchronous reads are doing... warming
> the cache. And synchronous reads are only fast if the system understands
> what's going on and reads a good chunk of data in at once. I don't know
> that that happens.
>
> Perhaps a good short-term measure would be to have recovery allocate a 16M
> buffer and read in entire xlog files at once.

The problem isn't reading the WAL. The OS prefetches that just fine. The problem is the random reads, when we read in the blocks mentioned in the WAL records to replay the changes to them. The OS has no way of guessing and prefetching those blocks, and we read them synchronously, one block at a time, no matter how big your RAID array is.

I used to think it's a big problem, but I believe the full-page-write optimization in 8.3 made it much less so. Especially with the smoothed checkpoints: as checkpoints have less impact on response times, you can shorten the checkpoint interval, which helps to keep the recovery time reasonable.

It'd still be nice to do the prefetching; I'm sure there are still workloads where it would be a big benefit. But as Tom pointed out, we shouldn't invent something new just for recovery. I think we should look at doing prefetching for index accesses etc. first, and once we have the infrastructure in place and tested, we can consider using it for recovery as well.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Decibel! [EMAIL PROTECTED] writes:
> Perhaps a good short-term measure would be to have recovery allocate a 16M
> buffer and read in entire xlog files at once.

If that isn't entirely useless, you need a better kernel. The system should *certainly* be bright enough to do read-ahead for our reads of the source xlog file. The fetches that are likely to be problematic are the ones for pages in the data area, which will be a lot less regular for typical workloads.

			regards, tom lane
Re: [HACKERS] Read-ahead and parallelism in redo recovery
* Tom Lane [EMAIL PROTECTED] [080229 15:49]:
> If that isn't entirely useless, you need a better kernel. The system
> should *certainly* be bright enough to do read-ahead for our reads of the
> source xlog file. The fetches that are likely to be problematic are the
> ones for pages in the data area, which will be a lot less regular for
> typical workloads.

How difficult is it to parse the WAL logs with enough knowledge to know which heap page (file/offset) a wal record refers to (I haven't looked into any wal code)? There are compression/decompression archive_command/restore_command programs with rudimentary knowledge of the WAL record formats.

Would a restore_command be able to parse the wal records as it copies them over, noting which file pages need to be read, and then, just before it exits, fork() and read each page in order? This child doesn't need to do anything with the blocks it reads - it just needs to read them to pre-warm the kernel buffer cache... If the restoration is doing any writing, this dumb reader would hopefully be able to keep a block ahead...

And since it's separated enough from the backend, any experiments in async_io/fadvise could easily be done.

--
Aidan Van Dyk                 Create like a god,
[EMAIL PROTECTED]             command like a king,
http://www.highrise.ca/       work like a slave.
Re: [HACKERS] could not open relation 1663/16384/16584: No such file or directory in a specific combination of transactions with temp tables
Gurjeet Singh wrote:
> Try the following link (I had to wait for 50 sec for the link to appear,
> but I guess the trade-off of getting knowledge in return is worth it :) )
>
> http://www5.upload2.net/download/77fa86e16a02e52fd5439c76e148d231/47c7fdce/rfsLfnuVlYjEcCJ/basetables.tgz

Thanks! As I suspected, what's happening is that buffers for the dropped temp table are not cleaned up from the temporary buffer cache as they should be. The next time the buffers are needed, on the next COPY, we try to write out the buffers to make room for new pages, but that fails because the file the buffers are related to doesn't exist anymore.

The patch I sent earlier fixes that, by tightening the check so that you can't operate on temporary tables in 2PC transactions. If you had a real-world use case for that, sorry :-(. Perhaps we could enhance that for 8.4 if there's demand, so that you could CREATE+DROP or use ON COMMIT TRUNCATE temp tables in a transaction, though I haven't personally planned to work on it.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] pg_dump additional options for performance
On Feb 26, 2008, at 4:36 PM, Tom Lane wrote:
> I think a sane way to think about what Simon would like to accomplish is
> not turn psql into a parallel job scheduler

My $0.02: I often find myself wishing I could perform parallel operations in psql. There was a proposal for that that came up during 8.3 development; whatever happened to it?

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Florian G. Pflug wrote:
> The same holds true for index scans, though. Maybe we can find a solution
> that benefits both cases - something along the lines of a bgreader process.

I posted a patch to do readahead for bitmap index scans using posix_fadvise. Experiments showed it works great on raid arrays on Linux. Solaris will need to use libaio though, which I haven't tried yet.

Incidentally, I'm off on a ski vacation this week so I won't be around much on email for the first half of the commit fest. If anyone's putting together a list of patches queued up for review, I would like this patch considered.

The main feedback I'm blocking on is whether others are interested in restructuring the buffer manager to allow buffers to be allocated and pinned with only the posix_fadvise i/o initiated. That would avoid the redundant trip into the buffer manager for the usual case, at the expense of a few buffers being taken out of the cache. The patch I posted is the minimally invasive approach of not altering the buffer management at all and just passing through a readahead request.

I looked at doing it for normal index scans and couldn't think of any convenient way. I can see getting a single buffer of read-ahead from the index block's next pointer, but that's about it. Luckily it seems to me that bitmap index scans are much more likely to be chosen in the cases where there's a big gain anyway.

--
Gregory Stark
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] Read-ahead and parallelism in redo recovery
Greg Stark wrote:
> Florian G. Pflug wrote:
>> The same holds true for index scans, though. Maybe we can find a solution
>> that benefits both cases - something along the lines of a bgreader
>> process.
>
> I posted a patch to do readahead for bitmap index scans using
> posix_fadvise. Experiments showed it works great on raid arrays on Linux.
> Solaris will need to use libaio though, which I haven't tried yet.

Cool! I'd like to try it out - is that patch available in the pg-patches archives?

> Doing it for normal index scans is much much harder. You can readahead a
> single page by using the next pointer if it looks like you'll need it. But
> I don't see a convenient way to get more than that.

I was thinking that after reading a page from the index, the backend could post a list of heap pages referenced from that index page to the shmem. A background process would repeatedly scan that list, and load those pages into the buffer cache.

regards, Florian Pflug