[HACKERS] PiTR and other architectures....
Having just tried to restore a 64 bit BSD database to a 32 bit linux machine (using PiTR), I have now realized the (with hindsight, obvious) error in my ways. Now, I need to plan a way forward. From reading of other people's similar problems, I now realize that I need a system with identical on-disk formats. Question is: Is there a simple way to determine compatibility? (e.g. a small well-defined list of requirements) In the specific instance I am working with, I'd like to copy from a 64 bit AMD BSD system to a 64 bit Linux system. Philip Warner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
On Tue, 2008-12-02 at 21:37 +0900, Fujii Masao wrote: Thanks for taking many hours to review the code!! On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs [EMAIL PROTECTED] wrote: Can you confirm that all the docs on the Wiki page are up to date? There are a few minor discrepancies that make me think it isn't. Documentation is ongoing. Sorry for my slow progress. BTW, I'm going to add and change the sgml files listed on wiki. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan I'm patient, I know it takes time. Happy to spend hours on the review, but I want to do that knowing I agree with the higher level features and architecture first. This was just a first review, I expect to spend more time on it yet. The reaction to replication_timeout may need to be configurable. I might not want to keep on processing if the information didn't reach the standby. OK. I will add new GUC variable (PGC_SIGHUP) to specify the reaction for the timeout. I would prefer in many cases that the transactions that were waiting for walsender would abort, but the walsender kept processing. Is it dangerous to abort the transaction with replication continued when the timeout occurs? I think that the WAL consistency between two servers might be broken. Because the WAL writing and sending are done concurrently, and the backend might already write the WAL to disk on the primary when waiting for walsender. The issue I see is that we might want to keep wal_sender_delay small so that transaction times are not increased. But we also want wal_sender_delay high so that replication never breaks. It seems better to have the action on wal_sender_delay configurable if we have an unsteady network (like the internet). Marcus made some comments on line dropping that seem relevant here; we should listen to his experience. Hmmm, dangerous? Well assuming we're linking commits with replication sends then it sounds it. We might end up committing to disk and then deciding to abort instead. 
But remember we don't remove the xid from procarray or mark the result in clog until the flush is over, so it is possible. But I think we should discuss this in more detail when the main patch is committed. Do we report stats on how long the replication has been taking? If the average rep time is close to rep timeout then we will be fragile, so we need some way to notice this and produce warnings. Or at least provide info to an external monitoring system. Sounds good. How about log_min_duration_replication? If the rep time is greater than it, we produce a warning (or log) like log_min_duration_xx. Maybe, let's put in something that logs if 50% (?) of timeout. Make that configurable with a #define and see if we need that to be configurable with a GUC later. Do we need to worry about periodic renegotiation of keys in be-secure.c? What is keys you mean? See the notes in that file for explanation. I wondered whether it might be a perf problem for us? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
[HACKERS] pg_stat_all_tables vs NULLs
I've noticed that pg_stat_all_tables returns NULL for idx_scan and idx_tup_fetch if there are no indexes present on a table. Is this actually intended, or is that something that should be fixed? //Magnus
[HACKERS] maintenance memory vs autovac
Would it make sense to be able to configure maintenance_work_mem specifically for the autovacuum processes? Given that there can be a number of them, it might be good to be able to have one default for all *other* processes, and a separate one for the ones kicked off by autovac? //Magnus
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
KaiGai Kohei wrote: -- (3) tries to open it with writable mode under transaction block. postgres=# BEGIN; BEGIN postgres=# SELECT lo_open(24576, x'4'::int); WARNING: Snapshot reference leak: Snapshot 0x96b8488 still referenced Interesting. This is due to this patch http://git.postgresql.org/?p=postgresql.git;a=commit;h=52d70aa5d0cdcdf9ab925fb46024b4b91760ac43 I'm investigating now. Thanks for the report. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
Alvaro Herrera wrote: KaiGai Kohei wrote: -- (3) tries to open it with writable mode under transaction block. postgres=# BEGIN; BEGIN postgres=# SELECT lo_open(24576, x'4'::int); WARNING: Snapshot reference leak: Snapshot 0x96b8488 still referenced I'm investigating now. Thanks for the report. I think the solution is to have each large object have its own ResourceOwner, and store the snapshot in it. Otherwise the snapshot is left in the calling query's resowner, which is not good. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] [PATCHES] GIN improvements
Teodor Sigaev wrote: - Falling back to regular insert will take long time for update of whole table - and that was one of reasons of that patch. Users forget to drop GIN index before a global update and query runs forever. If *that* is a use case we're interested in, the incoming tuples could be accumulated in backend-private memory, and inserted into the index at commit. That would be a lot simpler, with no need to worry about concurrent inserts or vacuums. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
Alvaro Herrera wrote: I think the solution is to have each large object have its own ResourceOwner, and store the snapshot in it. Otherwise the snapshot is left in the calling query's resowner, which is not good. Turns out to be overkill. This patch solves the problem, by using the transaction's resowner. I'm now going to check if we need something similar elsewhere. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Index: src/backend/storage/large_object/inv_api.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/large_object/inv_api.c,v
retrieving revision 1.135
diff -c -p -r1.135 inv_api.c
*** src/backend/storage/large_object/inv_api.c	2 Nov 2008 01:45:28 -0000	1.135
--- src/backend/storage/large_object/inv_api.c	2 Dec 2008 14:09:34 -0000
*** LargeObjectDesc *
*** 232,237 ****
--- 232,238 ----
  inv_open(Oid lobjId, int flags, MemoryContext mcxt)
  {
  	LargeObjectDesc *retval;
+ 	ResourceOwner save_resowner;
  
  	retval = (LargeObjectDesc *) MemoryContextAlloc(mcxt, sizeof(LargeObjectDesc));
*** inv_open(Oid lobjId, int flags, MemoryCo
*** 240,245 ****
--- 241,249 ----
  	retval->subid = GetCurrentSubTransactionId();
  	retval->offset = 0;
  
+ 	save_resowner = CurrentResourceOwner;
+ 	CurrentResourceOwner = TopTransactionResourceOwner;
+ 
  	if (flags & INV_WRITE)
  	{
  		retval->snapshot = SnapshotNow;
*** inv_open(Oid lobjId, int flags, MemoryCo
*** 259,264 ****
--- 263,270 ----
  				(errcode(ERRCODE_UNDEFINED_OBJECT),
  				 errmsg("large object %u does not exist", lobjId)));
  
+ 	CurrentResourceOwner = save_resowner;
+ 
  	return retval;
  }
Re: [HACKERS] PiTR and other architectures....
Philip Warner wrote: Having just tried to restore a 64 bit BSD database to a 32 bit linux machine (using PiTR), I have now realized the (with hindsight, obvious) error in my ways. Now, I need to plan a way forward. From reading of other people's similar problems, I now realize that I need a system with identical on-disk formats. Question is: Is there a simple way to determine compatibility? (e.g. a small well-defined list of requirements) initdb on one platform, copy the data directory over to the other system, and try to start postmaster. It will complain if the on-disk format is not compatible. You can also run pg_controldata on both systems, and compare the results. If the "Maximum data alignment" and all the values below it in the pg_controldata output match, the formats are compatible. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
[HACKERS] GIN index build speed
While playing around with the GIN fast insert patch, I was puzzled why building a GIN index seemed to take so long. The test case I used was simply: CREATE TABLE foo (bar tsvector); INSERT INTO foo SELECT to_tsvector('foo' || a) FROM generate_series(1, 20) a; CREATE INDEX foogin ON foo USING gin (bar); The CREATE INDEX step takes about 40 seconds on my laptop, which seems excessive. The issue is that the GIN index build code accumulates the lexemes into a binary tree, but there's nothing to keep the tree balanced. My test case, with almost monotonically increasing keys, happens to be a worst-case scenario, and the tree degenerates into almost a linked list that every insertion has to grovel through. The obvious fix is to use a balanced tree algorithm. I wrote a quick patch to turn the tree into a splay tree. That fixed the degenerative behavior, and the runtime of CREATE INDEX for the above test case fell from 40s to 1.5s. Magnus kindly gave me a dump of the full-text-search tables from search.postgresql.org, for some real-world testing. Quick testing with that suggests that the patch unfortunately makes the index build 5-10% slower with that data set. We're in commitfest, not supposed to be submitting new features, so I'm not going to pursue this further right now. Patch attached, however, which seems to work fine. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

*** src/backend/access/gin/ginbulk.c
--- src/backend/access/gin/ginbulk.c
***************
*** 25,31 ****
  void
  ginInitBA(BuildAccumulator *accum)
  {
! 	accum->maxdepth = 1;
  	accum->stackpos = 0;
  	accum->entries = NULL;
  	accum->stack = NULL;
--- 25,31 ----
  void
  ginInitBA(BuildAccumulator *accum)
  {
! 	accum->stacksize = 0;
  	accum->stackpos = 0;
  	accum->entries = NULL;
  	accum->stack = NULL;
***************
*** 98,159 ****
  getDatumCopy(BuildAccumulator *accum, OffsetNumber attnum, Datum value)
  }
  
  /*
   * Find/store one entry from indexed value.
   */
  static void
  ginInsertEntry(BuildAccumulator *accum, ItemPointer heapptr, OffsetNumber attnum, Datum entry)
  {
!
! 	EntryAccumulator *ea = accum->entries,
! 			   *pea = NULL;
  	int			res = 0;
- 	uint32		depth = 1;
  
! 	while (ea)
  	{
! 		res = compareAttEntries(accum->ginstate, attnum, entry, ea->attnum, ea->value);
  		if (res == 0)
- 			break;				/* found */
- 		else
  		{
! 			pea = ea;
! 			if (res < 0)
! 				ea = ea->left;
! 			else
! 				ea = ea->right;
  		}
- 		depth++;
  	}
! 	if (depth > accum->maxdepth)
! 		accum->maxdepth = depth;
! 	if (ea == NULL)
! 	{
! 		ea = EAAllocate(accum);
  		ea->left = ea->right = NULL;
! 		ea->attnum = attnum;
! 		ea->value = getDatumCopy(accum, attnum, entry);
! 		ea->length = DEF_NPTR;
! 		ea->number = 1;
! 		ea->shouldSort = FALSE;
! 		ea->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * DEF_NPTR);
! 		accum->allocatedMemory += GetMemoryChunkSpace(ea->list);
! 		ea->list[0] = *heapptr;
! 
! 		if (pea == NULL)
! 			accum->entries = ea;
  		else
  		{
! 			Assert(res != 0);
! 			if (res < 0)
! 				pea->left = ea;
! 			else
! 				pea->right = ea;
  		}
  	}
! 	else
! 		ginInsertData(accum, ea, heapptr);
  }
  
  /*
--- 98,234 ----
  }
  
  /*
+  * Adapted to C from the Public Domain splay tree implementation at:
+  * http://www.link.cs.cmu.edu/link/ftp-site/splaying/SplayTree.java
+  * Original implementation by Danny Sleator [EMAIL PROTECTED]
+  *
+  * Internal method to perform a top-down splay.
+  *
+  * splay(key) does the splay operation on the given key.
+  * If key is in the tree, then the BinaryNode containing
+  * that key becomes the root. If key is not in the tree,
+  * then after the splay, key.root is either the greatest key
+  * < key in the tree, or the least key > key in the tree.
+  *
+  * This means, among other things, that if you splay with
+  * a key that's larger than any in the tree, the rightmost
+  * node of the tree becomes the root. This property is used
+  * in the delete() method.
+  */
+ static void
+ splay(BuildAccumulator *accum, OffsetNumber key_attnum, Datum key_value)
+ {
+ 	EntryAccumulator *l, *r, *t, *y;
+ 	EntryAccumulator header;
+ 	int			res;
+ 
+ 	l = r = &header;
+ 	t = accum->entries;
+ 	header.left = header.right = NULL;
+ 	for (;;)
+ 	{
+ 		res = compareAttEntries(accum->ginstate, key_attnum, key_value,
+ 								t->attnum, t->value);
+ 		if (res < 0)
+ 		{
+ 			if (t->left == NULL) break;
+ 			if (compareAttEntries(accum->ginstate, key_attnum, key_value,
+ 								  t->left->attnum, t->left->value) < 0)
+ 			{
+ 				y = t->left;	/* rotate right */
+ 				t->left = y->right;
+ 				y->right = t;
+ 				t = y;
+ 				if (t->left == NULL) break;
+ 			}
+ 			r->left = t;	/* link right */
+ 			r = t;
+ 			t = t->left;
+ 		}
+ 		else if (res > 0)
+ 		{
+ 			if (t->right == NULL) break;
+ 			if (compareAttEntries(accum->ginstate, key_attnum, key_value,
+ 								  t->right->attnum, t->right->value) > 0)
+ 			{
+
Re: [HACKERS] maintenance memory vs autovac
Seems it would make more sense to just divide maintenance_work_mem by the number of workers for autovacuum. This sounds familiar. Didn't we already decide to do this once? One concern I have about this is people asking how come when I run vacuum manually it takes x minutes but when autovacuum runs it it takes 5x minutes? greg On 2 Dec 2008, at 01:38 PM, Magnus Hagander [EMAIL PROTECTED] wrote: Would it make sense to be able to configure maintenance_work_mem specifically for the autovacuum processes? Given that there can be a number of them, it might be good to be able to have one default for all *other* processes, and a separate one for the ones kicked off by autovac? //Magnus
Re: [HACKERS] [PATCHES] GIN improvements
grovel through. The situation will only be rectified at the next vacuum, but if there are no deletes or updates on the table, just inserts, autovacuum won't happen until the next anti-wraparound vacuum. There is no agreement here, see http://archives.postgresql.org/message-id/[EMAIL PROTECTED] Yet another problem is that if so much work is offloaded to autovacuum, it can tie up autovacuum workers for a very long time. And the work can happen at an unfortunate time, when the system is busy, and affect other queries. There are no vacuum_delay_point()s in gininsertcleanup, so there's no way to throttle that work. Will insert. I think we need a hard limit on the number of list pages, before we can consider accepting this patch. After the limit is full, the next inserter can flush the list, inserting the tuples in the list into the tree, or just fall back to regular, slow, inserts. A hard limit is not a very good decision - if it makes a flush when the limit is reached, then sometimes an insert or update will take an unacceptable amount of time. A small limit is not very helpful, a large one takes a lot of time. Although if we calculate the limit using the work_mem setting then, maybe, it will be useful. Bulk insert will collect all pending pages in memory at once. - Falling back to regular insert will take a long time for an update of the whole table - and that was one of the reasons for that patch. Users forget to drop a GIN index before a global update and the query runs forever. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
[HACKERS] contrib/pg_stat_statements 1202
Here is an updated version of contrib/pg_stat_statements. New modifications from the last version are: 1. New counters in struct Instrumentation. 2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in the 'actual' section. 3. Buffer statistics counters are not reset to zero anymore. 1. New counters in struct Instrumentation. [in include/executor/instrument.h, backend/executor/instrument.c] The following fields are added. They are used by pg_stat_statements and EXPLAIN ANALYZE VERBOSE. getrusage() is called for each node. Write-counters are not included because I think they are not so useful.

buffer_gets;    /* # of buffer gets */
buffer_hits;    /* # of buffer hits */
buffile_reads;  /* # of buffile reads */
utime;          /* user time in sec */
stime;          /* sys time in sec */

2. EXPLAIN ANALYZE VERBOSE shows buffer statistics in the 'actual' section. [in backend/commands/explain.c] I borrowed the idea from Vladimir, Buffer pool statistics in Explain Analyze. http://archives.postgresql.org/message-id/[EMAIL PROTECTED] Here is a sample output. We'd better add a linebreak before the 'actual' section because the line is too wide to display.

=# EXPLAIN ANALYZE VERBOSE SELECT * FROM accounts;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on accounts  (cost=0.00..2688.29 rows=101829 width=97) (actual time=0.072..119.633 rows=10 loops=1 gets=1670 reads=1638 local_reads=0 CPU=0.06/0.03 sec)
   Output: aid, bid, abalance, filler
 Total runtime: 209.556 ms
(3 rows)

3. Buffer statistics counters are not reset to zero anymore. [in storage/buffer/bufmgr.c] ResetBufferUsage() saves the current counters in global variables as a baseline, and buffer statistics are measured as the difference from them, because the counters are now used by struct Instrumentation. Comments welcome.
Regards, --- ITAGAKI Takahiro NTT Open Source Software Center pg_stat_statements.1202.tar.gz Description: Binary data
Re: [HACKERS] GIN index build speed
The issue is that the GIN index build code accumulates the lexemes into a binary tree, but there's nothing to keep the tree balanced. My test case with almost monotonically increasing keys, happens to be a worst-case scenario, and the tree degenerates into almost linked list that every insertion has to grovel through. Agree, but in most cases it works well, because lexemes in documents aren't ordered. The obvious fix is to use a balanced tree algorithm. I wrote a quick patch to turn the tree into a splay tree. That fixed the degenerative behavior, and the runtime of CREATE INDEX for the above test case fell from 40s to 1.5s. BTW, your patch helps GIN's btree emulation. With typical usage scenarios of the btree emulation, a scalar column will be more or less ordered. Magnus kindly gave me a dump of the full-text-search tables from search.postgresql.org, for some real-world testing. Quick testing with that suggests that the patch unfortunately makes the index build 5-10% slower with that data set. Do you see ways to improve that? We're in commitfest, not supposed to be submitting new features, so I'm not going to pursue this further right now. Patch attached, however, which seems to work fine. Personally, I don't object to improving that. -- Teodor Sigaev E-mail: [EMAIL PROTECTED] WWW: http://www.sigaev.ru/
Re: [HACKERS] maintenance memory vs autovac
Greg Stark wrote: Seems it would make more sense to just divide maintenance_work_mem by the number of workers for autovacuum. While that would be a solution for some cases, it is far from certain that's what you'd actually want. This sounds familiar. Didn't we already decide to do this once? Could be my google-fu is off today... One concern I have about this is people asking how come when I run vacuum manually it takes x minutes but when autovacuum runs it it takes 5x minutes? As long as the default is the same, people would get at least an initial clue that it might have something to do with them changing a configuration parameter... //Magnus On 2 Dec 2008, at 01:38 PM, Magnus Hagander [EMAIL PROTECTED] wrote: Would it make sense to be able to configure maintenance_work_mem specifically for the autovacuum processes? Given that there can be a number of them, it might be good to be able to have one default for all *other* processes, and a separate one for the ones kicked off by autovac? //Magnus
Re: [HACKERS] New to_timestamp implementation is pretty strict
Tom Lane wrote: After thinking about it I'm inclined to feel that SS and friends should insist on exactly 2 digits. If you want to allow 1-or-2-digits then use FMSS, just like the error message tells you. (However, I have a vague feeling that Oracle doesn't insist on this, and in the end we ought to follow Oracle's behavior. Can anyone check?) Oracle doesn't insist on it. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/12/2 Hitoshi Harada [EMAIL PROTECTED]:

sample=# EXPLAIN ANALYZE SELECT LEAD(timestamp) OVER (ORDER BY id) FROM bigtable LIMIT 1;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=12) (actual time=0.038..0.039 rows=1 loops=1)
   ->  Window  (cost=0.00..386612.13 rows=1000 width=12) (actual time=0.036..0.036 rows=1 loops=1)
         ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286612.13 rows=1000 width=12) (actual time=0.018..0.021 rows=2 loops=1)
 Total runtime: 0.071 ms
(4 rows)

shows quite good result. Great work. After more playing with the new patch, I found worse results.

sample=# explain analyze select id, row_number() OVER (order by id) from bigtable order by id;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Window  (cost=0.00..361612.13 rows=1000 width=4) (actual time=0.064..105414.522 rows=1000 loops=1)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286612.13 rows=1000 width=4) (actual time=0.056..16836.341 rows=1000 loops=1)
 Total runtime: 114650.074 ms
(3 rows)

sample=# explain analyze select id, LAG(timestamp,1) over (order by id) from bigtable order by id;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Window  (cost=0.00..411612.13 rows=1000 width=12) (actual time=0.065..122583.331 rows=1000 loops=1)
   ->  Index Scan using bigtable_pkey on bigtable  (cost=0.00..286612.13 rows=1000 width=12) (actual time=0.056..18066.829 rows=1000 loops=1)
 Total runtime: 132770.399 ms
(3 rows)

The earlier patch results are here: http://archives.postgresql.org/pgsql-hackers/2008-11/msg01121.php row_number(): 44s/114s lag(): 79s/132s I don't understand the new patch totally, and I know the row_number() optimization is in progress, but even lag() is quite worse. Maybe the tuplestore read pointer's heavy uses cause these. Regards, -- Hitoshi Harada
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
Here's a better patch. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.

Index: src/backend/storage/large_object/inv_api.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/storage/large_object/inv_api.c,v
retrieving revision 1.135
diff -c -p -r1.135 inv_api.c
*** src/backend/storage/large_object/inv_api.c	2 Nov 2008 01:45:28 -0000	1.135
--- src/backend/storage/large_object/inv_api.c	2 Dec 2008 14:44:51 -0000
*** inv_open(Oid lobjId, int flags, MemoryCo
*** 247,254 ****
--- 247,258 ----
  	}
  	else if (flags & INV_READ)
  	{
+ 		ResourceOwner save_resowner = CurrentResourceOwner;
+ 
+ 		CurrentResourceOwner = TopTransactionResourceOwner;
  		retval->snapshot = RegisterSnapshot(GetActiveSnapshot());
  		retval->flags = IFS_RDLOCK;
+ 		CurrentResourceOwner = save_resowner;
  	}
  	else
  		elog(ERROR, "invalid flags: %d", flags);
*** void
*** 270,277 ****
--- 274,289 ----
  inv_close(LargeObjectDesc *obj_desc)
  {
  	Assert(PointerIsValid(obj_desc));
+ 	if (obj_desc->snapshot != SnapshotNow)
+ 	{
+ 		ResourceOwner save_resowner = CurrentResourceOwner;
+ 
+ 		CurrentResourceOwner = TopTransactionResourceOwner;
  		UnregisterSnapshot(obj_desc->snapshot);
+ 		CurrentResourceOwner = save_resowner;
+ 	}
+ 
  	pfree(obj_desc);
  }
Index: src/test/regress/input/largeobject.source
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/test/regress/input/largeobject.source,v
retrieving revision 1.4
diff -c -p -r1.4 largeobject.source
*** src/test/regress/input/largeobject.source	3 Mar 2007 22:57:03 -0000	1.4
--- src/test/regress/input/largeobject.source	2 Dec 2008 14:27:26 -0000
*** SELECT lo_close(fd) FROM lotest_stash_va
*** 83,88 ****
--- 83,93 ----
  END;
  
+ -- Test resource management
+ BEGIN;
+ SELECT lo_open(loid, x'4'::int) from lotest_stash_values;
+ ABORT;
+ 
  -- Test truncation.
  BEGIN;
  UPDATE lotest_stash_values SET fd=lo_open(loid, CAST(x'2' | x'4' AS integer));
Index: src/test/regress/output/largeobject.source
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/test/regress/output/largeobject.source,v
retrieving revision 1.4
diff -c -p -r1.4 largeobject.source
*** src/test/regress/output/largeobject.source	3 Mar 2007 22:57:04 -0000	1.4
--- src/test/regress/output/largeobject.source	2 Dec 2008 14:42:34 -0000
*** SELECT lo_close(fd) FROM lotest_stash_va
*** 116,121 ****
--- 116,130 ----
  (1 row)
  
  END;
+ -- Test resource management
+ BEGIN;
+ SELECT lo_open(loid, x'4'::int) from lotest_stash_values;
+  lo_open 
+ ---------
+        0
+ (1 row)
+ 
+ ABORT;
  -- Test truncation.
  BEGIN;
  UPDATE lotest_stash_values SET fd=lo_open(loid, CAST(x'2' | x'4' AS integer));
Index: src/test/regress/output/largeobject_1.source
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/test/regress/output/largeobject_1.source,v
retrieving revision 1.1
diff -c -p -r1.1 largeobject_1.source
*** src/test/regress/output/largeobject_1.source	10 Mar 2007 03:42:19 -0000	1.1
--- src/test/regress/output/largeobject_1.source	2 Dec 2008 14:42:51 -0000
*** SELECT lo_close(fd) FROM lotest_stash_va
*** 116,121 ****
--- 116,130 ----
  (1 row)
  
  END;
+ -- Test resource management
+ BEGIN;
+ SELECT lo_open(loid, x'4'::int) from lotest_stash_values;
+  lo_open 
+ ---------
+        0
+ (1 row)
+ 
+ ABORT;
  -- Test truncation.
  BEGIN;
  UPDATE lotest_stash_values SET fd=lo_open(loid, CAST(x'2' | x'4' AS integer));
[HACKERS] WIP: The Skyline Operator
The skyline operator is a non-standard extension (even SQL:2008 does not include it). Queries of that type can be expressed in standard SQL (although a bit clumsily, and the execution is slow). Others have also worked on extending PostgreSQL with the skyline operator, see: http://archives.postgresql.org/pgsql-hackers/2007-03/thrd9.php#00188 Nevertheless we find it interesting and did our own implementation of BNL and SFS. It must be considered as WIP (work in progress). We highly welcome your feedback. Through a web frontend it is possible to test-drive the implementation. Sample queries are given, see: http://skyline.dbai.tuwien.ac.at/ The source code (patch) is available at: http://skyline.dbai.tuwien.ac.at/skyline-of-200811131.patch.tar.gz This patch applies against 8.3.5. In case of interest a patch against -head can be provided. References: [Borzsonyi2001] Börzsönyi, S.; Kossmann, D.; Stocker, K.: The Skyline Operator, In ICDE, 2001, 421--432 Best, Hannes
[HACKERS] cvs head initdb hangs on unixware
Hi all, cvs head configured without --enable-debug hangs in initdb while running make check. warthog doesn't exhibit it because it's configured with debug. When it hangs, postmaster takes 100% cpu doing nothing. initdb waits for it while creating the template db. According to truss, the last useful thing postmaster does is writing 8K zeroes to disk. If someone needs access to a unixware machine, let me know. regards, -- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 15, Chemin des Monges +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: [EMAIL PROTECTED] -- Make your life a dream, make your dream a reality. (St Exupery)
Re: [HACKERS] Simple postgresql.conf wizard
Greg Smith [EMAIL PROTECTED] writes: ... where the Power Test seems to oscillate between degrees of good and bad behavior seemingly at random. Are any of the queries complicated enough to trigger GEQO planning? regards, tom lane
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi, Simon. Thanks for taking many hours to review the code!! On Mon, Dec 1, 2008 at 8:42 PM, Simon Riggs [EMAIL PROTECTED] wrote: Can you confirm that all the docs on the Wiki page are up to date? There are a few minor discrepancies that make me think it isn't. Documentation is ongoing. Sorry for my slow progress. BTW, I'm going to add and change the sgml files listed on wiki. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Documentation_Plan Examples: For example, to make a single multi-statement transaction replication asynchronously when the default is the opposite, issue SET LOCAL synchronous_commit TO OFF within the transaction. Do we mean synchronous_replication in this sentence? I think you've copied the text and not changed all of the necessary parts - please re-read the whole section (probably the whole Wiki, actually). Oops! It's just typo. Sorry for the confusion. I will revise this section. wal_writer_delay - do we mean wal_sender_delay? Yes. I will fix it. Is there some ability to measure the amount of data to be sent and avoid the delay altogether, when the server is sufficiently busy? Why is the former ability required? The latter is possible, I think. We can guarantee that the WAL is sent (in more detail, called send(2)) once at least per wal_sender_delay. Of course, it's dependent on the scheduler of a kernel. The reaction to replication_timeout may need to be configurable. I might not want to keep on processing if the information didn't reach the standby. OK. I will add new GUC variable (PGC_SIGHUP) to specify the reaction for the timeout. I would prefer in many cases that the transactions that were waiting for walsender would abort, but the walsender kept processing. Is it dangerous to abort the transaction with replication continued when the timeout occurs? I think that the WAL consistency between two servers might be broken. 
Because the WAL writing and sending are done concurrently, and the backend might have already written the WAL to disk on the primary when waiting for walsender. How can we restart the walsender if it shuts down? Only restart the standby (with walreceiver). The standby connects to the postmaster on the primary, then the postmaster forks a new walsender. Do we want a maximum wait for a transaction and a maximum wait for the server? ISTM that these features are too much. Do we report stats on how long the replication has been taking? If the average rep time is close to the rep timeout then we will be fragile, so we need some way to notice this and produce warnings. Or at least provide info to an external monitoring system. Sounds good. How about log_min_duration_replication? If the rep time is greater than it, we produce a warning (or log entry) like log_min_duration_xx. How do we specify the user we use to connect to the primary? Yes, I need to add a new option to recovery.conf to specify the user name. Thanks for reminding me! Definitely need more explanatory comments/README-style docs. Completely agreed ;-) I will write a README together with other documents. For example, 03_libpq seems simple and self-contained. I'm not sure why we have a state called PGASYNC_REPLICATION; I was hoping that would be dynamic, but I'm not sure where to look for that. It would be useful to have a very long comment within the code to explain how the replication messages work, and note on each function who the intended client and server is. OK. I will re-consider whether PGASYNC_REPLICATION is removable, and write the comment about it. 02_pqcomm: What does HAVE_POLL mean? It identifies whether poll(2) is available or not on the platform. We use poll(2) if it's defined, otherwise select(2). There is similar code at pqSocketPoll() in fe-misc.c. Do we need to worry about periodic renegotiation of keys in be-secure.c? Which keys do you mean? Not sure I understand why so many new functions in there. 
It's because walsender waits for the reply from the standby and the request from the backend concurrently. So, we need poll(2) or select(2) to make walsender wait for them, and some functions for non-blocking receiving. 04_recovery_conf is a change I agree with, though I think it may not work with EXEC_BACKEND for Windows. OK. I will examine and fix it. 05... I need some commentary to explain this better. 06 and 07 are large and will take substantial review time. So we must get the overall architecture done first and then check the code that implements that. 08 - I think I get this, but some docs will help to confirm. Yes. I need more documentation. 09 pg_standby changes: so more changes are coming there? OK. Can we refer to those two options as failover and switchover? You mean failover trigger and switchover ones? ISTM that those names and features might not suit. Naming always bothers me, and the current name commit/abort trigger might tend to cause confusion. Is there any other suitable name? There's no need
Re: [HACKERS] PiTR and other architectures....
Heikki Linnakangas wrote: You can also run pg_controlinfo on both systems, and compare the results. If the Maximum data alignment, and all the values below it in the pg_controlinfo output match, the formats are compatible. s/pg_controlinfo/pg_controldata/ cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
Could you generate a core and send a stacktrace? kill -SIGABRT pid should do that. Zdenek [EMAIL PROTECTED] wrote: Hi all, cvs head configured without --enable-debug hangs in initdb while making check. warthog doesn't exhibit it because it's configured with debug. when it hangs, postmaster takes 100% cpu doing nothing. initdb waits for it while creating template db. According to truss, the last useful thing postmaster does is writing 8K zeroes to disk. If someone needs access to a unixware machine, let me know. regards, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
On Tue, 2 Dec 2008, Zdenek Kotala wrote: Date: Tue, 02 Dec 2008 17:22:25 +0100 From: Zdenek Kotala [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers list pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cvs head initdb hangs on unixware Could you generate a core and send a stacktrace? kill -SIGABRT pid should do that. Zdenek Hmm. No point doing it, it's not debug enabled, I'm afraid a stack trace won't show us anything useful. [EMAIL PROTECTED] wrote: Hi all, cvs head configured without --enable-debug hangs in initdb while making check. warthog doesn't exhibit it because it's configured with debug. when it hangs, postmaster takes 100% cpu doing nothing. initdb waits for it while creating template db. According to truss, the last useful thing postmaster does is writing 8K zeroes to disk. If someone needs access to a unixware machine, let me know. regards, -- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 15, Chemin des Monges +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: [EMAIL PROTECTED] -- Make your life a dream, make your dream a reality. (St Exupery) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
On Tue, 2 Dec 2008, Zdenek Kotala wrote: Date: Tue, 02 Dec 2008 17:22:25 +0100 From: Zdenek Kotala [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: pgsql-hackers list pgsql-hackers@postgresql.org Subject: Re: [HACKERS] cvs head initdb hangs on unixware Could you generate a core and send a stacktrace? kill -SIGABRT pid should do that. Zdenek Zdenek, On second thought, I tried and got that: Stack trace corresponding to p1, Program postmaster *[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97] [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9] [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385] [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00] [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59] [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042] [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297] [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210] [8] boot_yyparse( presumed: 0x, 0x3, 0x8047ab8) [0x80c822b] [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233] [10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b] [11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce] [12] _start() [0x807ff96] seems interesting! We've had problems already with the unixware optimizer, hope this one is fixable! regards [EMAIL PROTECTED] wrote: Hi all, cvs head configured without --enable-debug hangs in initdb while making check. warthog doesn't exhibit it because it's configured with debug. when it hangs, postmaster takes 100% cpu doing nothing. initdb waits for it while creating template db. According to truss, the last useful thing postmaster does is writing 8K zeroes to disk. If someone needs access to a unixware machine, let me know. 
regards, -- Olivier PRENANT Tel: +33-5-61-50-97-00 (Work) 15, Chemin des Monges +33-5-61-50-97-01 (Fax) 31190 AUTERIVE +33-6-07-63-80-64 (GSM) FRANCE Email: [EMAIL PROTECTED] -- Make your life a dream, make your dream a reality. (St Exupery) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windowing Function Patch Review - Standard Conformance
2008/11/26 Heikki Linnakangas [EMAIL PROTECTED]: Hitoshi Harada wrote: I read more, and your spooling approach seems flexible for both now and the future. Looking at only the current release, the frame with ORDER BY is done by detecting peers in WinFrameGetArg() and adding the row number of peers to winobj->currentpos. Actually if we have the capability to spool all rows we need on demand, the frame would be only a boundary problem. Yeah, we could do that. I'm afraid it would be pretty slow, though, if there's a lot of peers. That could probably be alleviated with some sort of caching, though. I added code for this issue. See http://git.postgresql.org/?p=~davidfetter/window_functions/.git;a=blobdiff;f=src/backend/executor/nodeWindow.c;h=f2144bf73a94829cd7a306c28064fa5454f8d369;hp=50a6d6ca4a26cd4854c445364395ed183b61f831;hb=895f1e615352dfc733643a701d1da3de7f91344b;hpb=843e34f341f0e824fd2cc0f909079ad943e3815b This process is very similar to your aggregatedupto in window aggregate, so they might be shared as a general way to detect the frame boundary, couldn't they? I am randomly trying some issues instead of agg common code (which I now doubt is worth sharing), so tell me if you're restarting your hack again. I'll send the whole patch. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PiTR and other architectures....
On Tue, 2008-12-02 at 16:21 +0200, Heikki Linnakangas wrote: initdb on one platform, copy the data directory over to the other system, and try to start postmaster. It will complain if the on-disk format is not compatible. You can also run pg_controlinfo on both systems, and compare the results. If the Maximum data alignment, and all the values below it in the pg_controlinfo output match, the formats are compatible. I don't think these things will work for all differences that could be problematic. For instance, a GNU system has a different collation for the en_US locale than an OS X system. This means that the indexes built using en_US on one system can't be moved to the other. How would either of these tests be able to determine that the systems are incompatible? I don't think this is a problem between GNU and FreeBSD, however, so this may work in Philip's case. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cvs head initdb hangs on unixware
[EMAIL PROTECTED] wrote: Stack trace corresponding to p1, Program postmaster *[0] fsm_rebuild_page( presumed: 0xbd9731a0, 0, 0xbd9731a0) [0x81e6a97] [1] fsm_search_avail( presumed: 0x2, 0x6, 0x1) [0x81e68d9] [2] fsm_set_and_search(0x84b2250, 0, 0, 0x2e, 0x5, 0x6, 0x2e, 0x8047416, 0xb4) [0x81e6385] [3] RecordAndGetPageWithFreeSpace(0x84b2250, 0x2e, 0xa0, 0xb4) [0x81e5a00] [4] RelationGetBufferForTuple( presumed: 0x84b2250, 0xb4, 0) [0x8099b59] [5] heap_insert(0x84b2250, 0x853a338, 0, 0, 0) [0x8097042] [6] simple_heap_insert( presumed: 0x84b2250, 0x853a338, 0x853a310) [0x8097297] [7] InsertOneTuple( presumed: 0xb80, 0x84057b0, 0x8452fb8) [0x80cb210] [8] boot_yyparse( presumed: 0x, 0x3, 0x8047ab8) [0x80c822b] [9] BootstrapModeMain( presumed: 0x66, 0x8454600, 0x4) [0x80ca233] [10] AuxiliaryProcessMain(0x4, 0x8047ab4) [0x80cab3b] [11] main(0x4, 0x8047ab4, 0x8047ac8) [0x8177dce] [12] _start() [0x807ff96] seems interesting! We've had problems already with the unixware optimizer, hope this one is fixable! Looking at fsm_rebuild_page, I wonder if the compiler is treating int as an unsigned integer? That would cause an infinite loop. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PiTR and other architectures....
On Tue, 2008-12-02 at 23:02 +1100, Philip Warner wrote: In the specific instance I am working with, I'd like to copy from 64 bit AMD BSD system to a 64 bit Linux system. I wouldn't recommend it. Midnight is the wrong time to find out that there was a difference that mattered after all. Is the risk worth it? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
On Tue, 2008-12-02 at 11:08 -0800, Jeff Davis wrote: On Tue, 2008-12-02 at 13:09 +, Simon Riggs wrote: Is it dangerous to abort the transaction with replication continued when the timeout occurs? I think that the WAL consistency between two servers might be broken. Because the WAL writing and sending are done concurrently, and the backend might already write the WAL to disk on the primary when waiting for walsender. The issue I see is that we might want to keep wal_sender_delay small so that transaction times are not increased. But we also want wal_sender_delay high so that replication never breaks. It seems better to have the action on wal_sender_delay configurable if we have an unsteady network (like the internet). Marcus made some comments on line dropping that seem relevant here; we should listen to his experience. Hmmm, dangerous? Well assuming we're linking commits with replication sends then it sounds it. We might end up committing to disk and then deciding to abort instead. But remember we don't remove the xid from procarray or mark the result in clog until the flush is over, so it is possible. But I think we should discuss this in more detail when the main patch is committed. What is the "it" in "it is possible"? It seems like there's still a problem window in there. Marking a transaction aborted after we have written a commit record, but before we have removed it from proc array and marked it in clog. We'd need a special kind of WAL record to do that. Even if that could be made safe, in the event of a real network failure, you'd just wait the full timeout every transaction, because it still thinks it's replicating. True, but I did suggest having two timeouts. There is considerable reason to reduce the timeout as well as reason to increase it - at the same time. Anyway, let's wait for some user experience following commit. 
-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
On Tue, 2008-12-02 at 13:09 +, Simon Riggs wrote: Is it dangerous to abort the transaction with replication continued when the timeout occurs? I think that the WAL consistency between two servers might be broken. Because the WAL writing and sending are done concurrently, and the backend might already write the WAL to disk on the primary when waiting for walsender. The issue I see is that we might want to keep wal_sender_delay small so that transaction times are not increased. But we also want wal_sender_delay high so that replication never breaks. It seems better to have the action on wal_sender_delay configurable if we have an unsteady network (like the internet). Marcus made some comments on line dropping that seem relevant here; we should listen to his experience. Hmmm, dangerous? Well assuming we're linking commits with replication sends then it sounds it. We might end up committing to disk and then deciding to abort instead. But remember we don't remove the xid from procarray or mark the result in clog until the flush is over, so it is possible. But I think we should discuss this in more detail when the main patch is committed. What is the "it" in "it is possible"? It seems like there's still a problem window in there. Even if that could be made safe, in the event of a real network failure, you'd just wait the full timeout every transaction, because it still thinks it's replicating. If the timeout is exceeded, it seems more reasonable to abandon the slave until you could re-sync it and continue processing as normal. As you pointed out, that's not necessarily an expensive operation because you can use something like rsync. The process of re-syncing might be made easier (or perhaps less costly), of course. If we want to still allow processing to happen after a timeout, it seems reasonable to have a configurable option to allow/disallow non-read-only transactions when out of sync. 
Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stop_backup wait bug fix
Fujii Masao wrote: On Wed, Oct 8, 2008 at 10:23 PM, Simon Riggs [EMAIL PROTECTED] wrote: Minor bug fix for pg_stop_backup() to prevent it waiting longer than necessary in certain circumstances. Why don't you use XLByteToPrevSeg like pg_xlogfile_name? I think that we should uniform the logic as much as possible. Agreed, should use XLByteToPrevSeg. But I wonder if we can just replace the current XLByteToSeg call with XLByteToPrevSeg? That would offset the return value of the function by one byte as well, as well as the value printed to the backup history file. In fact, I think the original patch got that wrong; it would return the location of the *beginning* of the last xlog file. I also noticed that the 2nd BackupHistoryFileName call in that function is useless; histfilepath variable is already filled in earlier. How does the attached patch look to you? Do you have an easy way to test this? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com Index: src/backend/access/transam/xlog.c === RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v retrieving revision 1.322 diff -c -r1.322 xlog.c *** src/backend/access/transam/xlog.c 9 Nov 2008 17:51:15 - 1.322 --- src/backend/access/transam/xlog.c 2 Dec 2008 20:10:03 - *** *** 6674,6679 --- 6674,6680 char histfilepath[MAXPGPATH]; char startxlogfilename[MAXFNAMELEN]; char stopxlogfilename[MAXFNAMELEN]; + char lastxlogfilename[MAXFNAMELEN]; uint32 _logId; uint32 _logSeg; FILE *lfp; *** *** 6711,6716 --- 6712,6720 XLByteToSeg(stoppoint, _logId, _logSeg); XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg); + XLByteToPrevSeg(stoppoint, _logId, _logSeg); + XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg); + /* Use the log timezone here, not the session timezone */ stamp_time = (pg_time_t) time(NULL); pg_strftime(strfbuf, sizeof(strfbuf), *** *** 6801,6813 * we assume the admin wanted his backup to work completely. If you * don't wish to wait, you can set statement_timeout. 
*/ - BackupHistoryFileName(histfilepath, ThisTimeLineID, _logId, _logSeg, - startpoint.xrecoff % XLogSegSize); - seconds_before_warning = 60; waits = 0; ! while (XLogArchiveIsBusy(stopxlogfilename) || XLogArchiveIsBusy(histfilepath)) { CHECK_FOR_INTERRUPTS(); --- 6805,6814 * we assume the admin wanted his backup to work completely. If you * don't wish to wait, you can set statement_timeout. */ seconds_before_warning = 60; waits = 0; ! while (XLogArchiveIsBusy(lastxlogfilename) || XLogArchiveIsBusy(histfilepath)) { CHECK_FOR_INTERRUPTS(); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Gregory Stark wrote: Tom Lane [EMAIL PROTECTED] writes: Dann Corbit [EMAIL PROTECTED] writes: I also do not believe that there is any value that will be the right answer. But a table of data might be useful both for people who want to toy with altering the values and also for those who want to set the defaults. I guess that at one time such a table was generated to produce the initial estimates for default values. Sir, you credit us too much :-(. The actual story is that the current default of 10 was put in when we first implemented stats histograms, replacing code that kept track of only a *single* most common value (and not very well, at that). So it was already a factor of 10 more stats than we had experience with keeping, and accordingly conservatism suggested not boosting the default much past that. I think that's actually too little credit. The sample size is chosen quite carefully based on solid mathematics to provide a specific confidence interval estimate for queries covering ranges the size of a whole bucket. The actual number of buckets is more of an arbitrary choice. It depends entirely on how your data is distributed and how large a range your queries are covering. A uniformly distributed data set should only need a single bucket to generate good estimates. Less evenly distributed data sets need more. I wonder actually if there are algorithms for estimating the number of buckets needed for a histogram to achieve some measurable goal. That would close the loop. It would be much more reassuring to base the size of the sample on solid statistics than on hunches. I have a few thoughts on this. First, people are correct that there is no perfect default_statistics_target value. This is similar to the problem with the pre-8.4 max_fsm_pages/max_fsm_relations, for which there also was never a perfect value. 
But, if the FSM couldn't store all the free space, a server log message was issued that recommended increasing these values; the same is still done for checkpoint_segments. Is there a way we could emit a server log message to recommend increasing the statistics targets for specific columns? Also, is there a way to increase the efficiency of the statistics targets lookups? I assume the values are already sorted in the pg_statistic arrays; do we already do a binary lookup on those? Does that help? -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_stats_target WAS: Simple postgresql.conf wizard
Mark, Generally speaking, it seems like on un-tuned systems increasing the default_statistics_target for this workload doesn't have a clear benefit. Do you have any idea how skewed the distribution of data for DBT3 is? If values are being generated in relatively equal proportion, I'd expect increasing DST to have little effect. The databases where higher DST is useful is ones with skewed data distribution. Unfortunately, all the data examples I could point to are proprietary customer databases :-( -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Breaking down of patch into sections works very well for review. Should allow us to get different reviewers on different parts of the code - review wranglers please take note: Dave, Josh. Fujii-san, could you break the patch up into several parts? We have quite a few junior reviewers who are idle right now. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Jeff, Even if that could be made safe, in the event of a real network failure, you'd just wait the full timeout every transaction, because it still thinks it's replicating. Hmmm. I'd suggest that if we get timeouts for more than 10x the timeout value in a row, replication stops. Unfortunately, we should probably make that *another* configuration setting. -- --Josh Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] default_stats_target WAS: Simple postgresql.conf wizard
* Josh Berkus [EMAIL PROTECTED] [081202 15:54]: Do you have any idea how skewed the distribution of data for DBT3 is? If values are being generated in relatively equal proportion, I'd expect increasing DST to have little effect. The databases where higher DST is useful is ones with skewed data distribution. Unfortunately, all the data examples I could point to are proprietary customer databases :-( But nobody's asking anybody to point out skewed data... I think it's *unanimous* that on skewed data, a higher stats target is needed for the skewed columns. The question is how much of a penalty the (majority of?) users with normal data columns will have to pay in stats/planning overhead to accommodate a blanket increase in DST for the (few?) skewed columns. I think Mark started to try and show that overhead/difference with real numbers. My (probably unsubstantiated) bias is showing, but nobody else has (yet) showed otherwise ;-) a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave. signature.asc Description: Digital signature
Re: [HACKERS] Fwd: [PATCHES] Auto Partitioning Patch - WIP version 1
On Thu, Nov 27, 2008 at 11:09 AM, Emmanuel Cecchet [EMAIL PROTECTED] wrote: I have been following that discussion very closely but it seems that we are debating solutions without a good specification of the problem/requirements. I would suggest that we collect all the partitioning requirements on a dedicated Wiki page. There might not be a one size fits it all solution for all requirements. We can also look at what other databases are proposing to address these issues. If we can prioritize features, that should also allow us to stage the partitioning implementation. This might be a good idea. Want to take a crack at it? I have a prototype insert trigger in C that directly moves inserts on a master table to the appropriate child table (directly moving the tuple). Let me know if anyone is interested. Can't hurt to post it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PiTR and other architectures....
Jeff Davis wrote: On Tue, 2008-12-02 at 16:21 +0200, Heikki Linnakangas wrote: initdb on one platform, copy the data directory over to the other system, and try to start postmaster. It will complain if the on-disk format is not compatible. You can also run pg_controlinfo on both systems, and compare the results. If the Maximum data alignment, and all the values below it in the pg_controlinfo output match, the formats are compatible. I don't think these things will work for all differences that could be problematic. For instance, a GNU system has a different collation for the en_US locale than an OS X system. This means that the indexes built using en_US on one system can't be moved to the other. How would either of these tests be able to determine that the systems are incompatible? wow...that's a little scary. Sounds like there is no trustworthy test I can run. Other than the case of collation differences, are there any other kinds of problems that would not be detected by even a postmaster restart? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PiTR and other architectures....
On Wed, 2008-12-03 at 10:15 +1100, Philip Warner wrote: wow...that's a little scary. Sounds like there is no trustworthy test I can run. Other than the case of collation differences, are there any other kinds of problems that would not be detected by even a postmaster restart? I can't answer that question authoritatively. If the locales obey the same rules, and pg_controldata has the same output for the relevant values (as Heikki mentioned), I *think* it will work. But, as Simon pointed out, is it really worth the risk? PITR is closer to a physical process, and it's probably wise to just assume it's not portable. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PiTR and other architectures....
But, as Simon pointed out, is it really worth the risk? PITR is closer to a physical process, and it's probably wise to just assume it's not portable. Yeah...I am getting that impression ;-). From this I will assume we need: - same OS (and OS minor version?) - same CPU architecture I was hoping it was a simple set of requirements, but that's life. -- Philip Warner Albatross Consulting Pty. Ltd. (A.B.N. 75 008 659 498) Tel: (+61) 03 5330 3171 Fax: (+61) 03 5330 3172 http://www.rhyme.com.au/ GPG key available upon request. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_all_tables vs NULLs
Magnus Hagander [EMAIL PROTECTED] writes: I've noticed that pg_stat_all_tables returns NULL for idx_scan and idx_tup_fetch if there are no indexes present on a table. Is this actually intended, or is that something that should be fixed? Hmm. I suspect it's an implementation artifact rather than something that was consciously chosen, but on reflection it doesn't seem like a bad thing. If we just COALESCE'd it to zero (which I assume is what you have in mind) then there would be no distinction in the view between you have no indexes and there are indexes but they aren't being used. I'd vote to leave it alone, I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] maintenance memory vs autovac
Magnus Hagander [EMAIL PROTECTED] writes: Greg Stark wrote: One concern I have about this is people asking how come when I run vacuum manually it takes x minutes but when autovacuum runs it, it takes 5x minutes? As long as the default is the same, people would get at least an initial clue that it might have something to do with them changing a configuration parameter... It seems like mostly a confusion-generator to me. Is there any actual evidence that autovac should use a different maintenance_work_mem than other processes? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
Alvaro Herrera [EMAIL PROTECTED] writes: I think the solution is to have each large object have its own ResourceOwner, and store the snapshot in it. Otherwise the snapshot is left in the calling query's resowner, which is not good. That's not gonna scale to transactions that touch lots of large objects (think pg_dump). Also it seems like it would be arbitrarily altering the past behavior of LOs. Why don't they just use the calling query's snapshot (which I think is what the historical behavior is, though I'm too lazy to check right now)? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] GIN improvements
Heikki Linnakangas [EMAIL PROTECTED] writes: Teodor Sigaev wrote: - Falling back to regular insert will take long time for update of whole table - and that was one of reasons of that patch. Users forget to drop GIN index before a global update and query runs forever. If *that* is a use case we're interested in, the incoming tuples could be accumulated in backend-private memory, and inserted into the index at commit. That would be a lot simpler, with no need to worry about concurrent inserts or vacuums. Doesn't work --- the index would yield wrong answers for later queries in the same transaction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Column-level Privileges
On Tue, Nov 25, 2008 at 4:03 PM, Stephen Frost [EMAIL PROTECTED] wrote: * Alvaro Herrera ([EMAIL PROTECTED]) wrote: I had a look at aclchk.c and didn't like your change to objectNamesToOids; seems rather baroque. I changed it per the attached patch. I've incorporated this change. Moreover I didn't very much like the way aclcheck_error_col is dealing with two or one % escapes. I think you should have a separate routine for the column case, and prepend a dummy string to no_priv_msg. I can do this, not really a big deal. Stephen, Are you sending an updated patch with these minor changes? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Erroring out on parser conflicts
Peter Eisentraut wrote: On Tuesday 25 November 2008 15:09:37 Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: While hacking on parser/gram.y just now I noticed in passing that the automatically generated ecpg parser had 402 shift/reduce conflicts. (Don't panic, the parser in CVS is fine.) If you don't pay very close attention, it is easy to miss this. Considering also that we frequently have to educate contributors that parser conflicts are not acceptable, should we try to error out if we see conflicts? Would %expect 0 produce the same result in a less klugy way? Great, that works. I'll see about adding this to our parser files. FYI, this is going to make it hard for developers to test CVS changes until they get their grammar cleaned up; perhaps add a comment on how to disable the check? -- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG] lo_open() makes a warning/falls to an assertion
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: I think the solution is to have each large object have its own ResourceOwner, and store the snapshot in it. Otherwise the snapshot is left in the calling query's resowner, which is not good. That's not gonna scale to transactions that touch lots of large objects (think pg_dump). Also it seems like it would be arbitrarily altering the past behavior of LOs. Why don't they just use the calling query's snapshot (which I think is what the historical behavior is, though I'm too lazy to check right now)? Well, that's precisely the problem -- they cannot use the query snapshot (which is what the current buggy code is doing), because the query goes away just after calling lo_open, but the snapshot needs to survive until after lo_close is called (or until end of xact). So the last patch I sent deals with the problem by recording the snapshot in the transaction's resowner instead, by momentarily setting it as current. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Erroring out on parser conflicts
Bruce Momjian [EMAIL PROTECTED] writes: FYI, this is going to make it hard for developers to test CVS changes until they get their grammar cleaned up; perhaps add a comment on how to disable the check? Well, the point is that their grammar changes are broken if that check fails, so I'm not sure what the value of testing a known-incorrect grammar might be. It wouldn't necessarily act the same after being fixed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
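The `%expect 0` check Peter mentions upthread can be sketched in a grammar file; with it, bison treats any shift/reduce conflict as a hard error rather than a warning that is easy to miss. This is a minimal hypothetical fragment, not PostgreSQL's actual gram.y:

```yacc
%{
/* Minimal illustrative grammar, not PostgreSQL's gram.y.
 * "%expect 0" makes bison fail outright if ANY shift/reduce
 * conflict appears, so a conflict-introducing change breaks the
 * build instead of producing an easily-overlooked warning. */
void yyerror(const char *s);
int yylex(void);
%}
%expect 0
%token IDENT
%%
stmt_list:
      stmt
    | stmt_list ';' stmt
    ;
stmt:
      IDENT
    ;
%%
```

A developer who genuinely wants to test an in-progress, conflicted grammar (Bruce's concern) would temporarily raise or remove the `%expect` count in their working copy.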
Re: [HACKERS] pg_stop_backup wait bug fix
On Wed, Dec 3, 2008 at 5:13 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Fujii Masao wrote: On Wed, Oct 8, 2008 at 10:23 PM, Simon Riggs [EMAIL PROTECTED] wrote: Minor bug fix for pg_stop_backup() to prevent it waiting longer than necessary in certain circumstances. Why don't you use XLByteToPrevSeg like pg_xlogfile_name? I think that we should unify the logic as much as possible. Agreed, we should use XLByteToPrevSeg. But I wonder if we can just replace the current XLByteToSeg call with XLByteToPrevSeg? That would offset the return value of the function by one byte as well, as well as the value printed to the backup history file. In fact, I think the original patch got that wrong; it would return the location of the *beginning* of the last xlog file. You're right. As you say, the value (stopxlogfilename) printed to the backup history file is wrong. But, since the value is fortunately not used, no troubles have come up. So, I think that we can just replace them. I also noticed that the 2nd BackupHistoryFileName call in that function is useless; the histfilepath variable is already filled in earlier. Somewhat confusingly, BackupHistoryFileName is called only once. Isn't the 1st call (which you probably meant) BackupHistoryFilePath? To prevent confusion, should we add a new local variable (histfilename) for the backup history file name? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi, On Wed, Dec 3, 2008 at 6:03 AM, Josh Berkus [EMAIL PROTECTED] wrote: Breaking down of patch into sections works very well for review. Should allow us to get different reviewers on different parts of the code - review wranglers please take note: Dave, Josh. Fujii-san, could you break the patch up into several parts? We have quite a few junior reviewers who are idle right now. Yes, I divided the patch into 9 pieces. Do I need to divide it further? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi, As to checkpoint timeout, yes, this measurement is hard for the FPW=on case. I'll do a similar measurement for checkpoint timeout = 5min and post the result. I expect that the recovery time will be almost the same in the case of FPW=on, lesslog=yes and prefetch=yes.

2008/12/2 Simon Riggs [EMAIL PROTECTED]: On Thu, 2008-11-27 at 21:04 +0900, Koichi Suzuki wrote: We ran the benchmark for one hour with checkpoint timeout 30min and completion_target 0.5. Then we collected all the archive logs and ran PITR.

 WAL conditions                            | Recovery   | Physical  | Recovery rate
                                           | time (sec) | read (MB) | (TX/min)
-------------------------------------------+------------+-----------+--------------
 w/o prefetch, archived with cp, FPW=off   |      6,611 |     5,435 |          402
 With prefetch, archived with cp, FPW=off  |      1,161 |     5,543 |        2,290

There's clearly a huge gain using prefetch, when we have full_page_writes = off. But that does make me think: Why do we need prefetch at all if we use full page writes? There's nothing to prefetch if we can keep it in cache. Agreed. This is why I proposed making prefetch optional through a GUC. I notice we set checkpoint_timeout to 30 mins, which is long enough to exceed the cache on the standby. I wonder if we reduced the timeout would we use the cache better on the standby and not need readahead at all? Do you have any results to examine cache overflow/shorter timeouts?

 WAL conditions                                    | Recovery   | Physical  | Recovery rate
                                                   | time (sec) | read (MB) | (TX/min)
---------------------------------------------------+------------+-----------+--------------
 w/o prefetch, archived with cp, FPW=on (8.3)      |      1,683 |       801 |        1,458
 w/o prefetch, archived with lesslog, FPW=on       |      6,644 |     5,090 |          369
 With prefetch, archived with cp, FPW=on           |      1,415 |     2,157 |        1,733
 With prefetch, archived with lesslog, FPW=on      |      1,196 |     5,369 |        2,051
 (this proposal)                                   |            |           |

So I'm wondering if we only need prefetch because we're using lesslog? If we integrated lesslog better into the new replication would we be able to forget about doing the prefetch altogether? In the case of lesslog, almost all the FPWs are replaced with the corresponding incremental log and recovery takes longer. 
Prefetch dramatically improves this, as you will see in the above result. To improve recovery time with FPW=off, or with FPW=on and lesslog=yes, we need prefetch. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] where is the last hot standby patch?
I'd like to have the new one too to test. 2008/11/27 Robert Haas [EMAIL PROTECTED]: On Wed, Nov 26, 2008 at 4:38 PM, Jaime Casanova [EMAIL PROTECTED] wrote: i get lost with this one... i thought there were two patches that get merged into one, but i don't find nor the merged version nor the actualized version of any of one... Simon said he would split the infrastructure changes back out as a separate patch in an hour or so but that was the last word I saw on this. http://archives.postgresql.org/pgsql-hackers/2008-11/msg01333.php Simon, did you by any chance send something that got eaten by the message size limit? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
Hi, I found that no one is registered as hot standby reviewer. I'd like to review the patch, mainly by testing through some benchmark test. Regards; 2008/11/2 Simon Riggs [EMAIL PROTECTED]: Hot Standby patch, including all major planned features. Allows users to connect to server in archive recovery and run queries. Applies cleanly, passes make check. There's no flaky code, kludges or other last minute rush-for-deadline stuff. It all works, though really for a patch this size and scope I expect many bugs. As a result I am describing this as WIP, though it is much more than a prototype. All the code has been planned out in advance, so there's been very little on-the-fly changes required during final development. I'm fully committed to making all required changes and fixes in reasonable times. I will continue detailed testing over the next few weeks to re-check everything prior to commit. Initially, I would ask people to spend time thinking about this conceptually to check that I have all the correct subsystems and cater for all the little side tweaks that exist in various parts of the server When you test this, please do it on a server built with --enable-cassert and keep a detailed log using trace_recovery_messages = DEBUG4 (or 2-3). Code has been specifically designed to be performance neutral or better for normal workloads, so the WAL volume, number of times we take WAL locks etc should be identical (on 64-bit systems). The patch includes further tuning of subtransaction commits, so I hope the patch may even be a few % win on more complex workloads with many PL/pgSQL functions using EXCEPTION. Enabling the bgwriter during recovery seems likely to be a huge gain on performance for larger shared_buffers settings, which should offset a considerable increase in CPU usage during recovery. 
Performance test results would be appreciated:
* for normal running - just to test that it really is neutral
* for recovery - if query access is not used, is it faster than before?
Points of significant note for detailed reviewers:
* Prepared transactions not implemented yet. No problems foreseen, but want to wait to see if other refactorings are required.
* Touching every block of a btree index during a replay of VACUUM is not yet implemented. Would like some second opinions on whether it is even required. I have code prototyped for it, but it feels too wacky even after Heikki and I agreed it was required. Do we still think that?
* Locking correctness around flat file refresh still not enabled.
* Need some discussion around how to handle max_connections changes cleanly.
-- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Fujii-san, Yes, I divided the patch into 9 pieces. Do I need to divide it further? That's plenty. Where do reviewers find the 9 pieces? -- Josh Berkus PostgreSQL San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi, On Wed, Dec 3, 2008 at 3:21 PM, Josh Berkus [EMAIL PROTECTED] wrote: Fujii-san, Yes, I divided the patch into 9 pieces. Do I need to divide it further? That's plenty. Where do reviews find the 9 pieces? The latest patch set (v4) is on wiki. http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#Patch_set Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hello, On Tue, Dec 2, 2008 at 10:09 PM, Simon Riggs [EMAIL PROTECTED] wrote: The reaction to replication_timeout may need to be configurable. I might not want to keep on processing if the information didn't reach the standby. OK. I will add a new GUC variable (PGC_SIGHUP) to specify the reaction for the timeout. I would prefer in many cases that the transactions that were waiting for walsender would abort, but the walsender kept processing. Is it dangerous to abort the transaction with replication continued when the timeout occurs? I think that the WAL consistency between two servers might be broken. Because the WAL writing and sending are done concurrently, and the backend might already have written the WAL to disk on the primary when waiting for walsender. The issue I see is that we might want to keep wal_sender_delay small so that transaction times are not increased. But we also want wal_sender_delay high so that replication never breaks. Are you assuming only the asynch case? In the synch case, since walsender is awoken by a signal from the backend, we don't need to keep the delay so small. And wal_sender_delay has no relation to the breaking of replication. It seems better to have the action on wal_sender_delay configurable if we have an unsteady network (like the internet). Marcus made some comments on line dropping that seem relevant here; we should listen to his experience. OK, I will look for his comments. Please let me know which thread has the comments if you know. Hmmm, dangerous? Well assuming we're linking commits with replication sends then it sounds it. We might end up committing to disk and then deciding to abort instead. But remember we don't remove the xid from procarray or mark the result in clog until the flush is over, so it is possible. But I think we should discuss this in more detail when the main patch is committed. 
If the transaction is aborted while the backend is waiting for replication, the transaction commit command returns a failure indication to the client. But the transaction commit record might be written on both the primary and the standby. As you say, it may not be dangerous as long as the primary is alive. But, when we recover the failed primary, the clog of the transaction is marked with success because of the commit record. Is it safe? And, in that case, the transaction is treated as a success on the standby, and is visible to read-only queries. On the other hand, it's invisible on the primary. Isn't that dangerous? Do we need to worry about periodic renegotiation of keys in be-secure.c? What are the keys you mean? See the notes in that file for explanation. Thanks! I will check it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] V2 of PITR performance improvement for 8.4
Hi, On Thu, Nov 27, 2008 at 9:04 PM, Koichi Suzuki [EMAIL PROTECTED] wrote: Please find enclosed a revised version of pg_readahead and a patch to invoke pg_readahead. Some similar functions are in xlog.c and pg_readahead.c (for example, RecordIsValid). I think that we should unify them as a common function, which helps to develop the tool (for example, xlogdump) treating WAL in the future. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCHES] GIN improvements
Tom Lane wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: If *that* is a use case we're interested in, the incoming tuples could be accumulated in backend-private memory, and inserted into the index at commit. That would be a lot simpler, with no need to worry about concurrent inserts or vacuums. Doesn't work --- the index would yield wrong answers for later queries in the same transaction. Queries would still need to check the backend-private list. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote: Hi, I found that no one is registered as a hot standby reviewer. I'd like to review the patch, mainly by testing through some benchmark tests. You can edit the Wiki page yourself, though you need to register first. But it's very straightforward. I added myself as a reviewer of Hot Standby a few days back since I did some code review and testing. I intend to spend some more time after the new patch is posted. Thanks, Pavan Pavan Deolasee. EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] maintenance memory vs autovac
On Wed, Dec 3, 2008 at 2:00 AM, Tom Lane [EMAIL PROTECTED] wrote: It seems like mostly a confusion-generator to me. Is there any actual evidence that autovac should use a different maintenance_work_mem than other processes? IMHO, the point is that we used to consider maintenance_work_mem as a one-process-at-a-time thing. Even if that's not really true, we usually didn't do maintenance tasks concurrently. The autovacuum workers change that and make it the default behaviour (as we can have 3*maintenance_work_mem by default). From my point of view, the best solution would be to share the maintenance_work_mem amongst all the workers, but I suppose it's not technically possible. -- Guillaume -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sync Rep: First Thoughts on Code
Hi, On Wed, Dec 3, 2008 at 4:08 AM, Jeff Davis [EMAIL PROTECTED] wrote: Even if that could be made safe, in the event of a real network failure, you'd just wait the full timeout every transaction, because it still thinks it's replicating. If walsender detects a real network failure, the transaction doesn't need to wait for the timeout. Configuring keepalive options would help walsender to detect it. Of course, though keepalive on linux might not work as expected. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Standby (commit fest version - v5)
Thank you for your advice. I'll edit the Wiki page. 2008/12/3 Pavan Deolasee [EMAIL PROTECTED]: On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote: Hi, I found that no one is registered as a hot standby reviewer. I'd like to review the patch, mainly by testing through some benchmark tests. You can edit the Wiki page yourself, though you need to register first. But it's very straightforward. I added myself as a reviewer of Hot Standby a few days back since I did some code review and testing. I intend to spend some more time after the new patch is posted. Thanks, Pavan Pavan Deolasee. EnterpriseDB http://www.enterprisedb.com -- -- Koichi Suzuki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] In-place upgrade: catalog side
Since this whole in-place upgrade thing is going nowhere until there's also a good solution for ye olde The database cluster was initialized with CATALOG_VERSION_NO ... error, I spent some time today touring through what everybody else has done there and have some initial review commentary and questions. Zdenek provided the following code: http://src.opensolaris.org/source/xref/sfw/usr/src/cmd/postgres/postgresql-upgrade/ The main thing you'll find there is a ksh script that handles most of the upgrade, presuming there's no page format changes. It looks like it was originally aimed at 8.1-8.2 upgrades (easy as long as you don't use INET/CIDR in your database) and still has some hard-coded stuff from that in there to clean up. I spent some time reading the code to figure out exactly what it's doing, but come to find there's an easier way to ramp up. It looks like this script is a fairly straight shell port (presumably to make things more flexible) of EDB's pg_migrator: http://pgfoundry.org/projects/pg-migrator/ If you want to understand the basics of what happens in either program, the pg_migrator download has a nice outline in doc/intro.txt The basic concept seems workable, albeit a bit more brute-force than I was expecting: don't bother trying to figure out what actually changed in the catalog, just start with a new cluster, restore schema, then slam renumbered data pages into it. 
At a high level it works like this:
- Run pg_dumpall against the old database to grab its schema
- Extract some catalog information from the old database
- Export the pg_control information
- Create a new cluster
- Copy the old pg_clog over
- With the help of pg_resetxlog, move over the timeline and other ids
- Restore the schema dump
- Update TOAST info
- Join the new database relid information against the old set so you can easily map old and new relids for each relation
- Move the underlying database files from their old location (the original relid) to the new one
- Adjust tablespace information

I'd like to start doing something constructive to help move this part forward, so here's an initial set of questions I've got for mapping that work out:
- A ksh script for something this critical is a non-starter on Windows in particular, so a good plan would be to use Zdenek's script as a malleable prototype, confirm it works, then update pg_migrator with anything it's missing, right?
- Are there already any specific tasks done by Zdenek's script that are known to be unique to its implementation? Eventually I expect I'll figure that out for sure myself just by comparing the code; I was curious what the already-known divergences were.
- There are 10 TODO items listed for the pg_migrator project, most or all of which look like they should be squashed before this is really complete. Any chance somebody (Korry?) has an improved version of this floating around beyond what's in the pgfoundry CVS already?
- Am I really the only person who is frantic that this program isn't being worked on actively?
-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simple postgresql.conf wizard
Greg, On Mon, Dec 1, 2008 at 3:17 AM, Greg Smith [EMAIL PROTECTED] wrote: ./pgtune -i ~/data/postgresql.conf First, thanks for your work: it will really help a lot of people to have a decent default configuration. A couple of comments from reading the code (I didn't run it yet):
- it would be really nice to make it work with Python 2.4, as RHEL 5 is a Python 2.4 thing and it is a very widespread platform out there,
- considering the audience of this tool, I think you should explain in the usage text which type of workload each database type implies (DW, OLTP, Web, Mixed, Desktop),
- I'm not sure of the wording of "The file this needs to operate correctly can be generated with:",
- it would be nice to be able to define the architecture (32/64 bits) from the command line (especially considering I won't be able to run it on our target boxes, which are all RHEL 5 :)).
I'll see if I have more feedback while testing it for real. -- Guillaume -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-place upgrade: catalog side
Greg Smith napsal(a): The main thing you'll find there is a ksh script that handles most of the upgrade, presuming there's no page format changes. It looks like it was originally aimed at 8.1-8.2 upgrades (easy as long as you don't use INET/CIDR in your database) and still has some hard-coded stuff from that in there to clean up. Yes, it is correct. It is only for 8.1-8.2. It works fine for 8.3-8.4 too, but I'm working on cleanup and fixing bugs. I hope I will send an updated version to the community today. snip I'd like to start doing something constructive to help move this part forward, so here's an initial set of questions I've got for mapping that work out: -A ksh script for something this critical is a non-starter on Windows in particular, so a good plan would be to use Zdenek's script as a malleable prototype, confirm it works, then update pg_migrator with anything it's missing, right? It is more of a workaround or temporary solution. This approach is easy but it has a lot of limitations. The problem with toast tables is one, but the biggest problem is with dropped columns. And maybe there will be more issues. The problem with a dump is that you lose internal data. I personally prefer to have a special mode (like bootstrap) which converts data from the old catalog to the new format. I think pg_upgrade.sh is a good starter, before we implement direct catalog upgrade. -Are there already any specific tasks done by Zdenek's script that are already known to be unique to only its implementation? Eventually I expect I'll figure that out for sure myself just by comparing the code, was curious what the already known divergences were. If you compare with pg_migrator, there is better handling of locale and I think vacuum freeze is used correctly. Also the shuffling of tablespaces is a little different (it should avoid moving data outside of the mountpoint). But in principle the idea is the same. 
-There are 10 TODO items listed for the pg_migrator project, most or all of which look like should be squashed before this is really complete. Any chance somebody (Korry?) has an improved version of this floating around beyond what's in the pgfoundry CVS already? As I mentioned before pg_migrator and pg_upgrade.sh is not good way. It is workaround. It does not make sense to continue in this way. Zdenek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-place upgrade: catalog side
Zdenek Kotala wrote: Greg Smith napsal(a): -There are 10 TODO items listed for the pg_migrator project, most or all of which look like should be squashed before this is really complete. Any chance somebody (Korry?) has an improved version of this floating around beyond what's in the pgfoundry CVS already? As I mentioned before pg_migrator and pg_upgrade.sh is not good way. It is workaround. It does not make sense to continue in this way. Why not? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers