Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
Robert, Thanks for the obviously thought-out review. On Sat, Feb 05, 2011 at 01:29:35AM -0500, Robert Haas wrote: On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. This preserves compatibility with our current handling of composite type dependencies. The rest you've seen before. OK, so I took a look at this in more detail today. The current logic for table rewrites looks like this: 1. If we're changing the data type of a column, or adding a column with a default, or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. 3. If we're changing tablespaces, copy the table block-by-block. Correct. It's perhaps obvious, but rewriting the table will always reindex it. It seems to me that the revised logic needs to look like this: 1. If we're changing the data type of a column and the existing contents are not binary coercible to the new contents, or if we're adding a column with a default or adding/dropping OIDs, rewrite the table. Stop. 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table and check constraints. With this patch, step 2 changes to "Otherwise, if we're adding a constraint or NOT NULL, or changing a column to a binary-compatible domain with a domain CHECK constraint, scan the table and check constraints." 3. If we're changing the data type of a column in the table, reindex the table. Rebuild indexes that depend on a changing column. Other indexes can stay. 4. If we're changing tablespaces, copy the table block-by-block. I might be missing something, but I don't see that the patch includes step #3, which I think is necessary. For example, citext is binary coercible to text, but you can't reuse the index because the comparison function is different. 
Of course, if you're changing the type of a column to its already-current type, you can skip #3, but if that's the only case we can optimize, it's not much of an accomplishment. I guess this gets back to the ALTER TYPE 7 patch, which I haven't looked at in detail, but I have a feeling it may be controversial. It's there, but it's happening rather implicitly. ATExecAlterColumnType builds lists of indexes and constraints that depend on changing columns. Specifically, it stashes their OIDs and the SQL to recreate them. ATPostAlterTypeCleanup drops those objects by OID, then parses the SQL statements, now based on the updated table definition. ATExecAddIndex and ATExecAddConstraint use those parsed statements to recreate the objects. The key is the skip_build logic in ATExecAddIndex: if ATRewriteTables will rewrite the table (and therefore *all* indexes), we skip the build at that earlier stage to avoid building the same index twice. The only thing I had to do was update the skip_build condition so it continues to mirror the corresponding test in ATRewriteTables. Originally I had this patch doing a full reindex, with an eye to having the next patch reduce the scope to dependent indexes. However, all the infrastructure was already there, and it actually made this patch smaller to skip directly to what it does today. ALTER TYPE 7 additionally skips builds of indexes that depend on a changing column but can be proven compatible. So it's in the business of, for example, figuring out that text and varchar are compatible but text and citext are not. Another problem here is that if you have to do both #2 and #3, you might have been better off (or just as well off) doing #1, unless you can somehow jigger things so that the same scan does both the constraint checks and the index rebuild. That doesn't look simple. We have no such optimization during #1, either, so #2+#3 is never worse. 
In particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it (# of indexes depending on changed columns) + 1 times. There are some nice optimization opportunities here, to be sure. As a specific first step, teach index_build to create multiple indexes with a single scan, then have reindex_relation use that. Probably not simple. Combining that with the ATRewriteTable scan would be less simple still. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
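To make Noah's scan-count comparison concrete, here is a back-of-the-envelope sketch (illustrative Python, not code from the patch):

```python
# Approach #1 (full rewrite): one heap scan for the rewrite, plus one
# rebuild scan per index on the table.  Approach #2+#3: one heap scan
# for the constraint checks, plus one rebuild scan only per index that
# depends on a changed column.
def scans_full_rewrite(total_indexes):
    return total_indexes + 1

def scans_check_plus_reindex(dependent_indexes):
    return dependent_indexes + 1

# e.g. a table with 5 indexes, of which 1 depends on the altered column:
print(scans_full_rewrite(5))        # 6
print(scans_check_plus_reindex(1))  # 2
```

Since the indexes depending on changed columns are a subset of all indexes, #2+#3 never scans more than #1, which is the "never worse" claim above.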
Re: [HACKERS] [pgsql-general 2011-1-21:] Are there any projects interested in object functionality? (+ rule bases)
May I sum up? o At present there are no known efforts to experiment with reference types, methods, or rule inference on top of PostgreSQL -- the advice that can be given mostly points to the documented functionality o Inside the PostgreSQL community, there is not much knowledge in circulation regarding the performance effects of using deeply nested data structures (with the possible exception of XML handling) or of doing rule inference on top of PostgreSQL -- but at least there are also no substantial contraindications o Extensions of PostgreSQL to support this kind of usage have to be expected to be rejected for integration into the core code base -- i.e., if they are done, students have to be told «you can't expect this to become a part of PostgreSQL» Is this understood correctly, especially the last point, or did Robert/Tom just specifically address syntactical conflicts (between schema and object semantics) with the dot notation? If not, it might be discouraging for lectures, as there might be interest in presenting something which at least might be imagined to one day become a standard tool. Otherwise, the striking lack of academic initiatives in the area of OO and rule inference on top of PostgreSQL appears to me as a demand to a. check academic sources as to whether principal efficiency issues of backend design discourage it so obviously that people do not even try it out b. if this is not the case, propose that this professor try to fill the gap... ;-) In this case, regarding method semantics extensions, avoiding conflicts with existing language constructs will certainly be preferable, as these will be small projects. Cheers, Nick
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 5, 2011 at 3:05 AM, Noah Misch n...@leadboat.com wrote: Correct. It's perhaps obvious, but rewriting the table will always reindex it. Right. 3. If we're changing the data type of a column in the table, reindex the table. Rebuild indexes that depend on a changing column. Other indexes can stay. Good point. 4. If we're changing tablespaces, copy the table block-by-block. I might be missing something, but I don't see that the patch includes step #3, which I think is necessary. For example, citext is binary coercible to text, but you can't reuse the index because the comparison function is different. Of course, if you're changing the type of a column to its already-current type, you can skip #3, but if that's the only case we can optimize, it's not much of an accomplishment. I guess this gets back to the ALTER TYPE 7 patch, which I haven't looked at in detail, but I have a feeling it may be controversial. It's there, but it's happening rather implicitly. I see now. So you're actually not really making any change to that machinery. It's sufficient to just skip the rewrite of the heap when it isn't needed, and without any particular code change the indexes will sort themselves out. We have no such optimization during #1, either, so #2+#3 is never worse. In particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it (# of indexes depending on changed columns) + 1 times. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch n...@leadboat.com wrote: Done as attached. Looking at this still more, it appears that independent of any change this patch may wish to make, there's a live bug here related to the foreign table patch I committed back in December. Creating a foreign table creates an eponymous rowtype, which can be used as a column in a regular table. You can then change the data type of a column in the foreign table, read from the regular table, and crash the server. The simple fix for this is to just change the code in ATPrepAlterColumnType to handle the foreign table case also:

if (tab->relkind == RELKIND_COMPOSITE_TYPE)
{
    /*
     * For composite types, do this check now.  Tables will check
     * it later when the table is being rewritten.
     */
    find_composite_type_dependencies(rel->rd_rel->reltype, NULL,
                                     RelationGetRelationName(rel));
}

But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it looks like the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] limiting hint bit I/O
2011/1/19 Robert Haas robertmh...@gmail.com: On Wed, Jan 19, 2011 at 11:52 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... So what we want to do is write a percentage of them, in a way that guarantees that they'll all eventually get written if people continue to access the same data. The word guarantee seems quite inappropriate here, since as far as I can see this approach provides no such guarantee --- even after many cycles you'd never be really certain all the bits were set. What I asked for upthread was that we continue to have some deterministic, practical way to force all hint bits in a table to be set. This is not *remotely* responding to that request. It's still not deterministic, and even if it were, vacuuming a large table 20 times isn't a very practical solution. I get the impression you haven't spent as much time reading my email as I spent writing it. Perhaps I'm wrong, but in any case the code doesn't do what you're suggesting. In the most recently posted version of this patch, which is v2, if VACUUM hits a page that is Please update the commitfest with the current patch; only the old, immature v1 of the patch is in it. I was about to review it... https://commitfest.postgresql.org/action/patch_view?id=500 hint-bit-dirty, it always writes it. Full stop. The 20 times bit applies to a SELECT * FROM table, which is a rather different case. As I write this, I realize that there is a small fly in the ointment here, which is that neither VACUUM nor SELECT force out all the pages they modify to disk. So there is some small amount of remaining nondeterminism, even if you VACUUM, because VACUUM will leave the last few pages it dirties in shared_buffers, and whether those hint bits hit the disk will depend on a decision made at the time they're evicted, not at the time they were dirtied. Possibly I could fix that by making SetBufferCommitInfoNeedsSave() set BM_DIRTY during vacuum and BM_HINT_BITS at other times. 
That would nail the lid shut pretty tight. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
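The rationing idea under debate -- write hint-bit-only-dirty pages out at a limited rate per scan, so that repeated scans still converge on all bits reaching disk -- can be sketched as follows. This is a hypothetical toy policy for illustration only, not what Robert's patch actually implements:

```python
# Hypothetical policy: on each pass over the table, write out the first
# N hint-bit-only-dirty pages encountered and skip the rest.  Repeated
# passes converge: every page's hint bits eventually reach disk.
WRITES_PER_PASS = 20

def one_pass(unwritten_pages):
    """Return the pages whose hint bits are still unwritten after one scan."""
    return unwritten_pages[WRITES_PER_PASS:]

pages = list(range(100))  # 100 pages with unwritten hint bits
passes = 0
while pages:
    pages = one_pass(pages)
    passes += 1
print(passes)  # 5 -- all hint bits on disk after five scans
```

This is the shape of Tom's objection: convergence over many passes is not the same as a single deterministic command (like VACUUM) that forces all the bits out at once.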
Re: [HACKERS] OCLASS_FOREIGN_TABLE support is incomplete
Robert Haas robertmh...@gmail.com writes: Err... wait. Actually, I think the right thing to do might be to remove OCLASS_FOREIGN_TABLE altogether. I don't think it's actually used for anything. For a foreign table, we use OCLASS_CLASS, just as we do for indexes and sequences. I think that's just leftover crap that I failed to rip out. OK. I'll fix it as part of the extensions patch, since it's touching all those same places anyway. regards, tom lane
Re: [HACKERS] We need to log aborted autovacuums
2011/2/4 Robert Haas robertmh...@gmail.com: On Sun, Jan 30, 2011 at 10:26 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jan 30, 2011 at 10:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of dom ene 30 23:37:51 -0300 2011: Unless I'm missing something, making autovacuum.c call ConditionalLockRelationOid() is not going to work, because the vacuum transaction isn't started until we get all the way down to vacuum_rel(). Maybe we need ConditionalLockRelationOidForSession or something like that? That'd be another way to go, if there are objections to what I've implemented here. Seeing as how there seem to be neither objections nor endorsements, I'm inclined to commit what I proposed What do you implement exactly? * The original request from Josh to get a LOG entry when autovacuum cannot run because of locks? * VACOPT_NOWAIT -- what is it? more or less as-is. There remains the issue of what to do about the log spam. Josh Berkus suggested logging it when log_autovacuum_min_duration != -1, which seems like a bit of an abuse of that setting, but it's certainly not worth adding another setting for, and the alternative of logging it at, say, DEBUG2 seems unappealing because you'll then have to turn on logging for a lot of unrelated crap to get this information. So on balance I think that proposal is perhaps the least of evils. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] SSI patch version 14
Heikki Linnakangas wrote: On 02.02.2011 19:36, Kevin Grittner wrote: Then there's one still lurking outside the new predicate* files, in heapam.c. It's about 475 lines into the heap_update function (25 lines from the bottom). In reviewing where we needed to acquire predicate locks, this struck me as a place where we might need to duplicate the predicate lock from one tuple to another, but I wasn't entirely sure. I tried for a few hours one day to construct a scenario which would show a serialization anomaly if I didn't do this, and failed to create one. If someone who understands both the patch and heapam.c wants to look at this and offer an opinion, I'd be most grateful. I'll take another look and see if I can get my head around it better, but failing that, I'm disinclined to either add locking I'm not sure we need or to remove the comment that says we *might* need it there. Have you convinced yourself that there's nothing to do? If so, we should replace the todo comment with a comment explaining why. It turns out that nagging doubt from my right-brain was on target. 
Here's the simplest example I was able to construct of a false negative due to the lack of some code there:

-- setup
create table t (id int not null, txt text) with (fillfactor=50);
insert into t (id) select x from (select * from generate_series(1, 100)) a(x);
alter table t add primary key (id);

-- session 1
-- T1
begin transaction isolation level serializable;
select * from t where id = 100;

-- session 2
-- T2
begin transaction isolation level serializable;
update t set txt = 'x' where id = 100;
commit;
-- T3
begin transaction isolation level serializable;
update t set txt = 'y' where id = 100;
select * from t where id = 50;

-- session 3
-- T4
begin transaction isolation level serializable;
update t set txt = 'z' where id = 50;
select * from t where id = 1;
commit;

-- session 2
commit;

-- session 1
update t set txt = 'a' where id = 1;
commit;

Based on visibility of work, there's a cycle in the apparent order of execution: T1 -> T2 -> T3 -> T4 -> T1 So now that I'm sure we actually do need code there, I'll add it. And I'll add the new test to the isolation suite. -Kevin
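The apparent-order-of-execution cycle Kevin describes can be checked mechanically; here is a toy sketch (illustrative Python, not part of the patch or the isolation suite) over the four edges listed above:

```python
# Each edge "A -> B" means A appears to have executed before B, based on
# the visibility of their work.  A serializable history must admit some
# linear order, i.e. this graph must be acyclic.
edges = {"T1": "T2", "T2": "T3", "T3": "T4", "T4": "T1"}

def has_cycle(successor):
    """Detect a cycle in a graph where each node has at most one successor."""
    for start in successor:
        seen, node = set(), start
        while node in successor:
            if node in seen:
                return True
            seen.add(node)
            node = successor[node]
    return False

print(has_cycle(edges))  # True -- no serial order exists, so T1's final
                         # commit must be rejected under SSI
```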
Re: [HACKERS] OCLASS_FOREIGN_TABLE support is incomplete
On Sat, Feb 5, 2011 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Err... wait. Actually, I think the right thing to do might be to remove OCLASS_FOREIGN_TABLE altogether. I don't think it's actually used for anything. For a foreign table, we use OCLASS_CLASS, just as we do for indexes and sequences. I think that's just leftover crap that I failed to rip out. OK. I'll fix it as part of the extensions patch, since it's touching all those same places anyway. Excellent, thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] is_absolute_path incorrect on Windows
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I have reviewed is_absolute_path() and have implemented path_is_relative_and_below_cwd() to cleanly handle cases like 'E:abc' on Win32; patch attached. This patch appears to remove some security-critical restrictions. Why did you delete the path_contains_parent_reference calls? They are now in path_is_relative_and_below_cwd(), ... and thus not invoked in the absolute-path case. This is a security hole. I don't see a general reason to prevent .. in absolute paths, only relative ones. load '/path/to/database/../../../path/to/anywhere' Ah, good point. I was thinking about someone using .. in the part of the path that is compared to /data or /log, but using it after that would clearly be a security problem. I have attached an updated patch that restructures the code to be clearer and adds the needed checks. I decided that my convert_and_check_filename() usage was too intertwined so I have developed a simplified version that is easier to understand; patch attached. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/adminpack/adminpack.c b/contrib/adminpack/adminpack.c
index 381554d..c149dd6 100644
*** a/contrib/adminpack/adminpack.c
--- b/contrib/adminpack/adminpack.c
*************** convert_and_check_filename(text *arg, bo
*** 73,104 ****
  	canonicalize_path(filename);	/* filename can change length here */
  
- 	/* Disallow ".." in the path */
- 	if (path_contains_parent_reference(filename))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 				 (errmsg("reference to parent directory (\"..\") not allowed"))));
- 
  	if (is_absolute_path(filename))
  	{
! 		/* Allow absolute references within DataDir */
! 		if (path_is_prefix_of_path(DataDir, filename))
! 			return filename;
! 		/* The log directory might be outside our datadir, but allow it */
! 		if (logAllowed &&
! 			is_absolute_path(Log_directory) &&
! 			path_is_prefix_of_path(Log_directory, filename))
! 			return filename;
! 
! 		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
- 		return NULL;			/* keep compiler quiet */
- 	}
- 	else
- 	{
- 		return filename;
  	}
  }
--- 73,102 ----
  	canonicalize_path(filename);	/* filename can change length here */
  
  	if (is_absolute_path(filename))
  	{
! 		/* Disallow '/a/b/data/..' */
! 		if (path_contains_parent_reference(filename))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 					 (errmsg("reference to parent directory (\"..\") not allowed"))));
! 
! 		/*
! 		 * Allow absolute paths if within DataDir or Log_directory, even
! 		 * though Log_directory might be outside DataDir.
! 		 */
! 		if (!path_is_prefix_of_path(DataDir, filename) &&
! 			(!logAllowed || !is_absolute_path(Log_directory) ||
! 			 !path_is_prefix_of_path(Log_directory, filename)))
! 			ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
  	}
+ 	else if (!path_is_relative_and_below_cwd(filename))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ 				 (errmsg("path must be in or below the current directory"))));
+ 
+ 	return filename;
  }
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 93bc401..bbcddfb 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
*************** convert_and_check_filename(text *arg)
*** 51,81 ****
  	filename = text_to_cstring(arg);
  	canonicalize_path(filename);	/* filename can change length here */
  
- 	/* Disallow ".." in the path */
- 	if (path_contains_parent_reference(filename))
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
- 				 (errmsg("reference to parent directory (\"..\") not allowed"))));
- 
  	if (is_absolute_path(filename))
  	{
! 		/* Allow absolute references within DataDir */
! 		if (path_is_prefix_of_path(DataDir, filename))
! 			return filename;
! 		/* The log directory might be outside our datadir, but allow it */
! 		if (is_absolute_path(Log_directory) &&
! 			path_is_prefix_of_path(Log_directory, filename))
! 			return filename;
! 
! 		ereport(ERROR,
  				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
  				 (errmsg("absolute path not allowed"))));
- 		return NULL;			/* keep compiler quiet */
- 	}
- 	else
- 	{
- 		return filename;
  	}
  }
--- 51,80 ----
  	filename = text_to_cstring(arg);
  	canonicalize_path(filename);	/* filename can change length here */
  
  	if (is_absolute_path(filename))
  	{
! 		/* Disallow '/a/b/data/..' */
! 		if (path_contains_parent_reference(filename))
! 			ereport(ERROR,
! 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
! 					 (errmsg("reference to parent directory
Re: [HACKERS] limiting hint bit I/O
On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: Please update the commitfest with the accurate patch, there is only the old immature v1 of the patch in it. I was about reviewing it... https://commitfest.postgresql.org/action/patch_view?id=500 Woops, sorry about that. Here's an updated version, which I will also add to the CommitFest application. The need for this patch has been somewhat ameliorated by the fsync queue compaction patch. I tested with: create table s as select g, random()::text||random()::text||random()::text||random()::text from generate_series(1,100) g; checkpoint; The table was large enough not to fit in shared_buffers. Then, repeatedly: select sum(1) from s; At the time I first posted this patch, running against git master, the first run took about 1600 ms vs. ~207-216 ms for subsequent runs. But that was actually running up against the fsync queue problem. Retesting today, the first run took 360 ms, and subsequent runs took 197-206 ms. I doubt that the difference in the steady-state is significant, since the tests were done on different days and not controlled all that carefully, but clearly the response time spike for the first scan is far lower than previously. Setting the log level to DEBUG1 revealed that the first scan did two fsync queue compactions. The patch still does help to smooth things out, though. Here are the times for one series of selects, with the patch applied, after setting up as described above: 257.108 259.245 249.181 245.896 250.161 241.559 240.538 241.091 232.727 232.779 232.543 226.265 225.029 222.015 217.106 216.426 217.724 210.604 209.630 203.507 197.521 204.448 196.809 Without the patch, as seen above, the first run is about ~80% slower. With the patch applied, the first run is about 25% slower than the steady state, and subsequent scans decline steadily from there. Runs 21 and following flush no further data and run at full speed. 
These numbers aren't representative of all real-world scenarios, though. On a system with many concurrent clients, CLOG contention might be an issue; on the flip side, if this table were larger than RAM (not just larger than shared_buffers) the decrease in write traffic as we scan through the table might actually be a more significant benefit than it is here, where it's mostly a question of kernel time; the I/O system isn't actually taxed. So I think this probably needs more testing before we decide whether or not it's a good idea. I adopted a few suggestions made previously in this version of the patch. Tom Lane recommended not messing with BM_JUST_DIRTY and leaving that for another day. I did that. Also, per my previous musings, I've adjusted this version so that vacuum behaves differently when dirtying pages rather than when flushing them. In versions 1 and 2, vacuum would always write pages that were dirty-only-for-hint-bits when allocating a new buffer; in this version the buffer allocation logic is the same for vacuum, but it marks pages dirty even when only hint bits have changed. The result is that VACUUM followed by CHECKPOINT is enough to make sure all hint bits are set on disk, just as is the case today. 
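As a quick cross-check of the percentages quoted above, here is a back-of-the-envelope sketch using the per-run times Robert posted (illustrative only, not part of the patch):

```python
# Per-run times (ms) for the patched build, as listed above.
patched = [257.108, 259.245, 249.181, 245.896, 250.161, 241.559,
           240.538, 241.091, 232.727, 232.779, 232.543, 226.265,
           225.029, 222.015, 217.106, 216.426, 217.724, 210.604,
           209.630, 203.507, 197.521, 204.448, 196.809]

steady = sum(patched[-3:]) / 3          # late runs that flush no further data
first_run_ratio = patched[0] / steady   # patched: first run vs steady state
unpatched_ratio = 360 / steady          # unpatched first run was ~360 ms

print(round(first_run_ratio, 2))        # ~1.29, i.e. roughly 25-30% slower
print(round(unpatched_ratio, 2))        # ~1.80, i.e. roughly 80% slower
```

These ratios line up with the "about 25% slower" and "about ~80% slower" figures in the text.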
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 5663711..e8f8781 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -209,11 +209,12 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 		CommitTransactionCommand();
 	}
 
-	/* Turn vacuum cost accounting on or off */
+	/* Adjust vacuum cost accounting state and update VacuumActive flag */
 	PG_TRY();
 	{
 		ListCell   *cur;
 
+		VacuumActive = true;
 		VacuumCostActive = (VacuumCostDelay > 0);
 		VacuumCostBalance = 0;
 
@@ -254,8 +255,9 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 	}
 	PG_CATCH();
 	{
-		/* Make sure cost accounting is turned off after error */
+		/* Make sure vacuum state variables are fixed up on error */
 		VacuumCostActive = false;
+		VacuumActive = false;
 		PG_RE_THROW();
 	}
 	PG_END_TRY();
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f89e52..1146dee 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -81,6 +81,10 @@ static bool IsForInput;
 /* local state for LockBufferForCleanup */
 static volatile BufferDesc *PinCountWaitBuf = NULL;
 
+/* local state for BufferAlloc */
+static int	hint_bit_write_allowance;
+static int	buffer_allocation_count;
+
 static Buffer ReadBuffer_common(SMgrRelation reln, char relpersistence,
 				  ForkNumber forkNum, BlockNumber blockNum,
@@ -578,6 +582,7 @@ BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	for (;;)
 	{
 		bool		lock_held;
+		bool		write_buffer = false;
 
 		/*
 		 * Select a victim buffer.  The buffer is returned with its header
@@ -600,13
Re: [HACKERS] We need to log aborted autovacuums
On Sat, Feb 5, 2011 at 12:54 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: What do you implement exactly? * The original request from Josh to get a LOG entry when autovacuum cannot run because of locks? * VACOPT_NOWAIT -- what is it? What the patch implements is: If autovacuum can't get the table lock immediately, it skips the table. This leaves the table still appearing to need autovacuuming, so autovacuum will keep retrying (every 1 minute, or whatever you have autovacuum_naptime set to) until it gets the lock. This seems clearly better than the historical behavior of blocking on the lock, which can potentially keep other tables in the database from getting vacuumed. In the case where a table is skipped for this reason, we log a message at log level LOG. The version of the patch I posted does that unconditionally, but my intention was to change it before commit so that it only logs the message if log_autovacuum_min_duration is something other than -1. Regular vacuum behaves as before - it waits for each table lock individually. We could expose a NOWAIT option at the SQL level as well, so that someone could do VACUUM (NOWAIT), if that's something people want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
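A toy model of the skip-and-retry behavior Robert describes (hypothetical Python with invented names -- the real logic lives in autovacuum.c and uses a conditional lock acquisition):

```python
# Toy model: autovacuum takes each table lock conditionally, skips the
# table on failure, and the table stays pending for the next naptime
# cycle; the skip is logged only if log_autovacuum_min_duration != -1.
log_autovacuum_min_duration = 0        # anything other than -1 enables logging
held_by_others = {"pgbench_accounts"}  # lock held by some other session
log_lines = []

def conditional_lock(table):
    """Stand-in for a conditional lock attempt: never blocks."""
    return table not in held_by_others

def autovacuum_round(pending):
    """One autovacuum cycle: vacuum what we can lock, skip the rest."""
    still_pending = []
    for table in pending:
        if conditional_lock(table):
            continue                   # vacuumed; no longer needs work
        if log_autovacuum_min_duration != -1:
            log_lines.append('skipping vacuum of "%s" --- lock not available' % table)
        still_pending.append(table)    # retried after the next naptime
    return still_pending

pending = autovacuum_round(["pgbench_history", "pgbench_accounts"])
held_by_others.clear()                 # the blocking session finishes
pending = autovacuum_round(pending)
print(pending)                         # []
print(log_lines)                       # one skip message
```

The point of the design is in the retry loop: skipping never loses work, because the table remains eligible on the next cycle.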
Re: [HACKERS] limiting hint bit I/O
2011/2/5 Robert Haas robertmh...@gmail.com: On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: Please update the commitfest with the accurate patch, there is only the old immature v1 of the patch in it. I was about reviewing it... https://commitfest.postgresql.org/action/patch_view?id=500 Woops, sorry about that. Here's an updated version, which I will also add to the CommitFest application. The need for this patch has been somewhat ameliorated by the fsync queue compaction patch. I tested with: create table s as select g, random()::text||random()::text||random()::text||random()::text from generate_series(1,100) g; checkpoint; The table was large enough not to fit in shared_buffers. Then, repeatedly: select sum(1) from s; At the time I first posted this patch, running against git master, the first run took about 1600 ms vs. ~207-216 ms for subsequent runs. But that was actually running up against the fsync queue problem. Retesting today, the first run took 360 ms, and subsequent runs took 197-206 ms. I doubt that the difference in the steady-state is significant, since the tests were done on different days and not controlled all that carefully, but clearly the response time spike for the first scan is far lower than previously. Setting the log level to DEBUG1 revealed that the first scan did two fsync queue compactions. The patch still does help to smooth things out, though. Here are the times for one series of selects, with the patch applied, after setting up as described above: 257.108 259.245 249.181 245.896 250.161 241.559 240.538 241.091 232.727 232.779 232.543 226.265 225.029 222.015 217.106 216.426 217.724 210.604 209.630 203.507 197.521 204.448 196.809 Without the patch, as seen above, the first run is about ~80% slower. With the patch applied, the first run is about 25% slower than the steady state, and subsequent scans decline steadily from there. Runs 21 and following flush no further data and run at full speed. 
These numbers aren't representative of all real-world scenarios, though. On a system with many concurrent clients, CLOG contention might be an issue; on the flip side, if this table were larger than RAM (not just larger than shared_buffers) the decrease in write traffic as we scan through the table might actually be a more significant benefit than it is here, where it's mostly a question of kernel time; the I/O system isn't actually taxed. So I think this probably needs more testing before we decide whether or not it's a good idea. I *may* have an opportunity to test that in a real-world application where hint bits were an issue. I adopted a few suggestions made previously in this version of the patch. Tom Lane recommended not messing with BM_JUST_DIRTY and leaving that for another day. Yes, good. I did that. Also, per my previous musings, I've adjusted this version so that vacuum behaves differently when dirtying pages rather than when flushing them. In versions 1 and 2, vacuum would always write pages that were dirty-only-for-hint-bits when allocating a new buffer; in this version the buffer allocation logic is the same for vacuum, but it marks pages dirty even when only hint bits have changed. The result is that VACUUM followed by CHECKPOINT is enough to make sure all hint bits are set on disk, just as is the case today. For now it looks better to reduce this impact, yes. Keeping the logic from v1 or v2 would imply VACUUM FREEZE to 'fix' the hint bits, right? -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] limiting hint bit I/O
On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: So I think this probably needs more testing before we decide whether or not it's a good idea. I *may* have an opportunity to test that in a real world application where this hint bit was an issue. That would be great. But note that you'll also need to compare it against an unpatched 9.1devel; otherwise we won't be able to tell whether it's this patch helping, or some other 9.1 patch (particularly, the fsync compaction patch). I did that. Also, per my previous musings, I've adjusted this version so that vacuum behaves differently when dirtying pages rather than when flushing them. In versions 1 and 2, vacuum would always write pages that were dirty-only-for-hint-bits when allocating a new buffer; in this version the buffer allocation logic is the same for vacuum, but it marks pages dirty even when only hint bits have changed. The result is that VACUUM followed by CHECKPOINT is enough to make sure all hint bits are set on disk, just as is the case today. for now it looks better to reduce this impact, yes.. Keeping the logic from v1 or v2 implies a VACUUM FREEZE to 'fix' the hint bits, right? In v1, you'd need to actually dirty the pages, so yeah, VACUUM (FREEZE) would be pretty much the only way. In v2, regular VACUUM would mostly work, except it might miss a smattering of hint bits at the very end of its scan. In this version (v3), that's been fixed as well, and now just plain VACUUM should be entirely sufficient. (The last few pages examined might not get evicted to disk right away, just as in the current code, but they're guaranteed to be written eventually unless a system crash intervenes, again just as in the current code.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] We need to log aborted autovacuums
2011/2/5 Robert Haas robertmh...@gmail.com: On Sat, Feb 5, 2011 at 12:54 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: what do you implement exactly ? * The original request from Josh to get LOG when autovac can not run because of locks * VACOPT_NOWAIT, what is it ? What the patch implements is: If autovacuum can't get the table lock immediately, it skips the table. This leaves the table still appearing to need autovacuuming, so autovacuum will keep retrying (every 1 minute, or whatever you have autovacuum_naptime set to) until it gets the lock. This seems clearly better than the historical behavior of blocking on the lock, which can potentially keep other tables in the database from getting vacuumed. great :) In the case where a table is skipped for this reason, we log a message at log level LOG. The version of the patch I posted does that unconditionally, but my intention was to change it before commit so that it only logs the message if log_autovacuum_min_duration is something other than -1. yes, looks better. I also note that someone suggested not adding a GUC for that. I am unsure about that; adding a parameter here does not bother me at all. Reducing the number of parameters for memory/performance/... is fine (well, very good when we can), but to improve the log output I think it is OK to add more without much trouble. Regular vacuum behaves as before - it waits for each table lock individually. We could expose a NOWAIT option at the SQL level as well, so that someone could do VACOPT (NOWAIT), if that's something people want. Might be useful, yes. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
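The skip-if-locked behavior described above is a conditional-lock pattern: attempt the table lock without blocking, and on failure skip the table and let the next naptime cycle retry. A minimal, self-contained sketch of that control flow (the boolean lock and the function names here are stand-ins I made up; the real patch goes through PostgreSQL's lock manager, and the skip would be logged subject to log_autovacuum_min_duration):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the lock manager: true means another backend
 * already holds this table's lock. */
static bool table_lock_held = false;

static bool try_lock_table(void)
{
    if (table_lock_held)
        return false;           /* would have blocked, so don't */
    table_lock_held = true;
    return true;
}

static void unlock_table(void)
{
    table_lock_held = false;
}

/* Autovacuum worker logic for one table: if the lock is not
 * immediately available, skip it.  The table still appears to need
 * vacuuming, so the next autovacuum_naptime cycle simply retries. */
static bool autovacuum_one_table(void)
{
    if (!try_lock_table())
        return false;           /* skipped; log a message at LOG level */
    /* ... vacuum work would happen here ... */
    unlock_table();
    return true;
}
```

The key property is that a skipped table is not marked done; nothing about its statistics changes, so the ordinary retry machinery provides the "keep trying every naptime" behavior for free.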
Re: [HACKERS] We need to log aborted autovacuums
On Sat, Feb 5, 2011 at 3:20 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: In the case where a table is skipped for this reason, we log a message at log level LOG. The version of the patch I posted does that unconditionally, but my intention was to change it before commit so that it only logs the message if log_autovacuum_min_duration is something other than -1. yes, looks better. I also note that someone suggested not adding a GUC for that. I think a GUC just to control this one message is overkill. Chances are that if you're logging some or all of your autovacuum runs you'll want to have this too, or at least it won't be a major problem to get it anyway. If for some reason you want ONLY this message, you can effectively get that behavior too, by setting log_autovacuum_min_duration to an enormous value. So I can't really imagine why you'd need a separate knob just for this one thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] We need to log aborted autovacuums
2011/2/5 Robert Haas robertmh...@gmail.com: On Sat, Feb 5, 2011 at 3:20 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: In the case where a table is skipped for this reason, we log a message at log level LOG. The version of the patch I posted does that unconditionally, but my intention was to change it before commit so that it only logs the message if log_autovacuum_min_duration is something other than -1. yes, looks better. I also note that someone suggested not adding a GUC for that. I think a GUC just to control this one message is overkill. Chances are that if you're logging some or all of your autovacuum runs you'll want to have this too, or at least it won't be a major problem to get it anyway. If for some reason you want ONLY this message, you can effectively get that behavior too, by setting log_autovacuum_min_duration to an enormous value. So I can't really imagine why you'd need a separate knob just for this one thing. To not output it when I have log_autovacuum_min_duration = 0 but know that some tables have lock contention and may lose maintenance for an hour or so. Anyway, without a GUC is fine too, as it won't fill /var/log by itself! I am just not opposed to having new GUCs in those areas (debug logging). -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] SSI patch version 14
Kevin Grittner wrote: So now that I'm sure we actually do need code there, I'll add it. In working on this I noticed the apparent need to move two calls to PredicateLockTuple a little bit to keep them inside the buffer lock. Without at least a share lock on the buffer, it seems that there is a window where a read could miss the MVCC changes from a write and the write could fail to see the predicate lock. Please see whether this seems reasonable: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=7841a22648c3f4ae46f674d7cf4a7c2673cf9ed2 And I'll add the new test to the isolation suite. We don't need all permutations for this test, which is a good thing since it has such a long setup time. Is there an easy way to just run the one schedule of statements on three connections? -Kevin
Re: [HACKERS] limiting hint bit I/O
2011/2/5 Robert Haas robertmh...@gmail.com: On Sat, Feb 5, 2011 at 3:07 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: So I think this probably needs more testing before we decide whether or not it's a good idea. I *may* have an opportunity to test that in a real world application where this hint bit was an issue. That would be great. But note that you'll also need to compare it against an unpatched 9.1devel; otherwise we won't be able to tell whether it's this patch helping, or some other 9.1 patch (particularly, the fsync compaction patch). mmhh, sure. I did that. Also, per my previous musings, I've adjusted this version so that vacuum behaves differently when dirtying pages rather than when flushing them. In versions 1 and 2, vacuum would always write pages that were dirty-only-for-hint-bits when allocating a new buffer; in this version the buffer allocation logic is the same for vacuum, but it marks pages dirty even when only hint bits have changed. The result is that VACUUM followed by CHECKPOINT is enough to make sure all hint bits are set on disk, just as is the case today. for now it looks better to reduce this impact, yes.. Keeping the logic from v1 or v2 implies a VACUUM FREEZE to 'fix' the hint bits, right? In v1, you'd need to actually dirty the pages, so yeah, VACUUM (FREEZE) would be pretty much the only way. In v2, regular VACUUM would mostly work, except it might miss a smattering of hint bits at the very end of its scan. In this version (v3), that's been fixed as well, and now just plain VACUUM should be entirely sufficient. (The last few pages examined might not get evicted to disk right away, just as in the current code, but they're guaranteed to be written eventually unless a system crash intervenes, again just as in the current code.) just reading the patch... I understand the idea of the 5% flush. *maybe* it makes sense to use the effective_io_concurrency GUC here to improve the ratio, but it might be perceived as a bad usage .. 
currently effective_io_concurrency is for planning purposes. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] limiting hint bit I/O
Robert Haas wrote: On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: Please update the commitfest with the accurate patch, there is only the old immature v1 of the patch in it. I was about to review it... https://commitfest.postgresql.org/action/patch_view?id=500 Woops, sorry about that. Here's an updated version, which I will also add to the CommitFest application. The need for this patch has been somewhat ameliorated by the fsync queue compaction patch. I tested with: Uh, in this C comment: + * or not we want to take the time to write it. We allow up to 5% of + * otherwise-not-dirty pages to be written due to hint bit changes, 5% of what? 5% of all buffers? 5% of all hint-bit-dirty ones? Can you clarify this in the patch? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] limiting hint bit I/O
On Sat, Feb 5, 2011 at 4:19 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: just reading the patch... I understand the idea of the 5% flush. *maybe* it makes sense to use the effective_io_concurrency GUC here to improve the ratio, but it might be perceived as a bad usage .. currently effective_io_concurrency is for planning purposes. effective_io_concurrency is supposed to be set based on how many spindles your RAID array has. There's no reason to think that the correct flush percentage is in any way related to that value. The reasons why we might not want backends to write out too many dirty-only-for-hint-bits buffers during a large sequential scan are that (a) the actual write() system calls take time to copy the buffers into kernel space, slowing the scan, and (b) flushing too many buffers this way could lead to I/O spikes. Increasing the flush percentage slows down the first few scans, but takes fewer scans to reach optimal performance (all hint bits set on disk). Decreasing the flush percentage speeds up the first few scans, but is overall less efficient. We could make this a tunable, but I'm not clear that there is much point. If writing 100% of the pages that have only hint-bit updates slows the scan by 80% and writing 5% of the pages slows the scan by 25%, then dropping below 5% doesn't seem likely to buy much further improvement. You could argue for raising the flush percentage above 5%, but if you go too much higher then it's not clear that you're gaining anything over just flushing them all. I don't think we necessarily have enough experience to know whether this is a good idea at all, so worrying about whether different people need different percentages seems a bit premature. Another point here is that no matter how many times you sequential-scan the table, you never get performance as good as what you would get if you vacuumed it, even if the table contains no dead tuples. 
I believe this is because VACUUM will not only set the HEAP_XMIN_COMMITTED hint bits; it'll also set PD_ALL_VISIBLE on the page. I wonder if we shouldn't be autovacuuming even tables that are insert-only for precisely this reason, as well as to prevent the case where someone inserts small batches of records for a long time and then finally deletes some stuff. There are no visibility map bits set so, boom, you get this huge, expensive vacuum. This will, of course, be even more of an issue when we get index-only scans. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian br...@momjian.us wrote: Uh, in this C comment: + * or not we want to take the time to write it. We allow up to 5% of + * otherwise-not-dirty pages to be written due to hint bit changes, 5% of what? 5% of all buffers? 5% of all hint-bit-dirty ones? Can you clarify this in the patch? 5% of buffers that are hint-bit-dirty but not otherwise dirty. ISTM that's exactly what the comment you just quoted says on its face, but I'm open to some other wording you want to propose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Varchar and binary protocol
Hi, I ran performance tests comparing the original JDBC driver with my version, in both binary and text mode, and saw strange results when reading varchar values. Here is some output from a simple benchmark:
Plain strings speed - Execution: 8316582, local: 2116608, all: 10433190
Binary strings speed - Execution: 9354613, local: 2755949, all: 12110562
Text NG strings speed - Execution: 8346902, local: 2704242, all: 11051144
Plain is the standard JDBC driver, Binary is my version with binary transfer, and Text is my version with normal (text) transfer. The first column, Execution, is the time spent on query execution; this includes sending and receiving the protocol message, storing it, etc., with no conversion to the output format. Values are in nanoseconds. The new version adds some functionality, but the routines that read the parts counted under Execution are almost the same for binary and text. Yet as you can see, the binary version is 10-20% slower than both the original and my text version, and this proportion does not change if I increase the number of records read. I ran many checks, even against a driver that skips the protocol message content entirely, and the end result was the same: 10-20% slower. Regards, Radek.
Re: [HACKERS] limiting hint bit I/O
2011/2/5 Bruce Momjian br...@momjian.us: Robert Haas wrote: On Sat, Feb 5, 2011 at 10:37 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: Please update the commitfest with the accurate patch, there is only the old immature v1 of the patch in it. I was about to review it... https://commitfest.postgresql.org/action/patch_view?id=500 Woops, sorry about that. Here's an updated version, which I will also add to the CommitFest application. The need for this patch has been somewhat ameliorated by the fsync queue compaction patch. I tested with: Uh, in this C comment: + * or not we want to take the time to write it. We allow up to 5% of + * otherwise-not-dirty pages to be written due to hint bit changes, 5% of what? 5% of all buffers? 5% of all hint-bit-dirty ones? Can you clarify this in the patch? The patch currently allows 100 buffers to be written for every 2000 BufferAlloc calls. mmmhhh Robert, I am unsure about the hint_bit_write_allowance counter. It looks a bit fragile, because nothing prevents the hint_bit_write_allowance counter from increasing a lot, so that it is not 100 but X*100 hint-bit writes that will be allowed next. Isn't it? Also, won't buffer_allocation_count hit the INT limit? -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
[HACKERS] Foreign servers and user mappings versus the extensions patch
Currently, the extensions patch considers that foreign data wrappers, foreign servers, and user mapping objects can all be parts of extensions. This is slightly problematic for pg_dump, where somebody decided to take a shortcut and not implement user mappings using the full DumpableObject infrastructure. That could be fixed, but I'm wondering whether it's worth the trouble. I can see the point of writing an FDW as an extension but it's a lot harder to see why either foreign server or user mapping objects would ever be part of an extension. So it might just be best to remove those two object types from the set that can be managed by an extension. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SQL/MED - file_fdw
On Mon, Jan 17, 2011 at 06:33:08PM -0600, Kevin Grittner wrote: Itagaki Takahiro wrote: Shigeru HANADA wrote: Attached patch would avoid this leak by adding per-copy context to CopyState. This would be overkill, and ResetCopyFrom() might be reasonable though. Good catch. I merged your fix into the attached patch. Review for CF: ... I tried to read through the code to look for problems, but there are so many structures and techniques involved that I haven't had to deal with yet, that it would take me days to get up to speed enough to desk check this adequately. Since this API is a prerequisite for other patches, and already being used by them, I figured I should do what I could quickly and then worry about how to cover that. I've studied the patch from this angle. My two substantive questions concern CopyFromErrorCallback() error strings and the use of the per-tuple memory context; see below. The rest either entail trivial fixes, or they do not indicate anything that clearly ought to change. The patch mostly moves code around; it's a bit difficult to grasp what's really changing by looking at the diff (not a criticism of the approach -- I see no way to avoid that). The final code structure is at least as easy to follow as the structure it replaces. Here is a summary of the changes: file_fdw wishes to borrow the COPY FROM code for parsing structured text and building tuples fitting a given relation. file_fdw must bypass the parts that insert those tuples. Until now, copy.c has exposed a single API, DoCopy(). To meet file_fdw's needs, the patch adds four new APIs for use as a set: BeginCopyFrom() - once-per-COPY initialization and validation NextCopyFrom() - call N times to get N values/null arrays EndCopyFrom() - once-per-COPY cleanup CopyFromErrorCallback() - for caller use in ErrorContextCallback.callback Implementing these entails breaking apart the existing code structure. 
For one, the patch separates from DoCopy the once-per-COPY-statement code, both initialization and cleanup. Secondly, it separates the CopyFrom code for assembling a tuple based on COPY input from the code that actually stores said tuples in the target relation. To avoid duplicating code, the patch then calls these new functions from DoCopy and CopyFrom. To further avoid duplicating code and to retain symmetry, it refactors COPY TO setup and teardown along similar lines. We have four new static functions: BeginCopyTo() - parallel to BeginCopyFrom(), for DoCopy() alone BeginCopy() - shared bits between BeginCopyTo() and BeginCopyFrom() EndCopyTo() - parallel to EndCopyFrom(), for DoCopy() alone EndCopy() - shared bits between EndCopyTo() and EndCopyFrom() Most parse analysis-type bits of DoCopy() move to BeginCopy(). Several checks remain in DoCopy(): superuser for reading a server-local file, permission on the relation, and a read-only transaction check. The first stays where it does because a superuser may define a file_fdw foreign table and then grant access to others. The other two remain in DoCopy() because the new API uses the relation as a template without reading or writing it. The catalog work (looking up defaults, I/O functions, etc) in CopyFrom() moves to BeginCopyFrom(), and applicable local variables become CopyState members. copy_in_error_callback changes name to CopyFromErrorCallback() to better fit with the rest of the new API. Its implementation does not change at all. Since it's now possible for one SQL statement to call into the COPY machinery an arbitrary number of times, the patch introduces a per-COPY memory context. The patch implements added refactorings that make sense but are peripheral to the API change at hand. CopyState.fe_copy is gone, now calculated locally in DoCopyTo(). CopyState.processed is gone, with the row count now bubbling up through return values. 
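The four entry points described above follow a conventional begin/iterate/end lifecycle. Below is a deliberately stubbed sketch of the calling pattern a consumer such as file_fdw would follow; every signature here is simplified and hypothetical (the real functions take a Relation, raw-file details, and Datum/nulls arrays, and the error callback is installed via ErrorContextCallback), so this only illustrates the shape of the API, not its actual interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for copy.c's new entry points.
 * Here a CopyState is just a cursor over canned integer "rows". */
typedef struct CopyState
{
    const int *rows;
    int        nrows;
    int        next;
} CopyState;

/* Once-per-COPY initialization (cf. BeginCopyFrom). */
static CopyState begin_copy_from(const int *rows, int nrows)
{
    CopyState cs = { rows, nrows, 0 };
    return cs;
}

/* Called N times to fetch N values (cf. NextCopyFrom);
 * returns false at end of input. */
static bool next_copy_from(CopyState *cs, int *value_out)
{
    if (cs->next >= cs->nrows)
        return false;
    *value_out = cs->rows[cs->next++];
    return true;
}

/* Once-per-COPY cleanup (cf. EndCopyFrom). */
static void end_copy_from(CopyState *cs)
{
    cs->rows = NULL;
}
```

The point of the refactoring is precisely that a caller can drive this loop itself and decide what to do with each tuple (file_fdw hands it to the executor), instead of copy.c unconditionally inserting into a heap.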
We now initialize the CopyState buffers for COPY FROM only; they are unused in COPY TO. The purest of patches would defer all these, but I don't object to them tagging along. For COPY TO, the relkind checks move from DoCopyTo() to BeginCopyTo(). I'm skeptical about this one. It's not required for correctness, and the relkind checks for COPY FROM remain in CopyFrom(). file_fdw uses CopyFromErrorCallback() to give errors the proper context. The function uses template strings like COPY %s, line %d, where %s is the name of the relation being copied. Presumably file_fdw and other features using this API would wish to customize that error message prefix, and the relation name might not be apropos at all. How about another argument to BeginCopyFrom, specifying an error prefix to be stashed in the CopyState? We could easily regret requiring a Relation in BeginCopyFrom(); another user may wish to use a fabricated TupleDesc. Code paths in the new API use the Relation for a TupleDesc, relhashoids, column defaults,
Re: [HACKERS] Foreign servers and user mappings versus the extensions patch
On Sat, Feb 5, 2011 at 5:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Currently, the extensions patch considers that foreign data wrappers, foreign servers, and user mapping objects can all be parts of extensions. This is slightly problematic for pg_dump, where somebody decided to take a shortcut and not implement user mappings using the full DumpableObject infrastructure. That could be fixed, but I'm wondering whether it's worth the trouble. I can see the point of writing an FDW as an extension but it's a lot harder to see why either foreign server or user mapping objects would ever be part of an extension. So it might just be best to remove those two object types from the set that can be managed by an extension. Comments? I agree it's probably not that useful to make a foreign server or foreign user mapping part of an extension, but I'd rather not have us fail to support it just because we can't think of a use case right now. So my vote would be to fix it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] limiting hint bit I/O
On Sat, Feb 5, 2011 at 5:04 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote: Robert, I am unsure with the hint_bit_write_allowance counter. It looks a bit fragile because nothing prevent hint_bit_write_allowance counter to increase a lot, so that is not 100 but X*100 next hint bit will be written. Isn't it ? hint_bit_write_allowance can never be more than 100. The only things we ever do are set it to exactly 100, and decrease it by 1 if it's positive. Also, won't buffer_allocation_count hit INT limit ? Sure, if the backend sticks around long enough, but it's no big deal if it overflows. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
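Taken together with the earlier description (100 hint-bit writes allowed per 2000 buffer allocations), the counter logic Robert describes can be sketched as a self-contained simulation. The two variable names come from the discussion of the patch, but the harness around them is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* As described: the allowance is set to exactly 100 every 2000 buffer
 * allocations and only ever decremented when positive, so it can never
 * exceed 100, and at most ~5% of hint-bit-only-dirty evictions write. */
static int hint_bit_write_allowance = 0;
static unsigned buffer_allocation_count = 0;  /* overflow is harmless */

/* Call once per buffer allocation. */
static void note_buffer_allocation(void)
{
    if (++buffer_allocation_count % 2000 == 0)
        hint_bit_write_allowance = 100;
}

/* Decide whether to write a buffer that is dirty only for hint bits. */
static bool hint_bit_write_allowed(void)
{
    if (hint_bit_write_allowance > 0)
    {
        hint_bit_write_allowance--;
        return true;
    }
    return false;
}
```

This also shows why the counter cannot accumulate across cycles: the reset assigns 100 rather than adding 100, answering the X*100 concern, and an unsigned allocation counter simply wraps rather than hitting a signed-overflow limit.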
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 05, 2011 at 10:03:59AM -0500, Robert Haas wrote: Looking at this still more, it appears that independent of any change this patch may wish to make, there's a live bug here related to the foreign table patch I committed back in December. Creating a foreign table creates an eponymous rowtype, which can be used as a column in a regular table. You can then change the data type of a column in the foreign table, read from the regular table, and crash the server. The simple fix for this is to just change the code in ATPrepAlterColumnType to handle the foreign table case also:

if (tab->relkind == RELKIND_COMPOSITE_TYPE)
{
    /*
     * For composite types, do this check now.  Tables will check
     * it later when the table is being rewritten.
     */
    find_composite_type_dependencies(rel->rd_rel->reltype, NULL,
                                     RelationGetRelationName(rel));
}

But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it looks like the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Seems like a clear improvement. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. Offhand, this seems reasonable, too. I assumed there was some good reason it couldn't be done there for non-tables, but nothing comes to mind.
Re: [HACKERS] [COMMITTERS] pgsql: Include more status information in walsender results
On Thu, Feb 3, 2011 at 7:56 AM, Magnus Hagander mag...@hagander.net wrote: Include more status information in walsender results Add the current xlog insert location to the response of IDENTIFY_SYSTEM, and adds result sets containing start and stop location of backups to BASE_BACKUP responses. Does this have anything to do with the following error message I'm now getting when trying to use streaming replication? FATAL: invalid response from primary server DETAIL: Expected 1 tuple with 2 fields, got 1 tuples with 3 fields. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI patch version 14
Kevin Grittner wrote: So now that I'm sure we actually do need code there, I'll add it. Hmmm... First cut at this here: http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=3ccedeb451ee74ee78ef70735782f3550b621eeb It passes all the usual regression tests, the new isolation tests, and the example posted earlier in the thread of a test case which was allowing an anomaly. (That is to say, the anomaly is now prevented.) I didn't get timings, but it *seems* noticeably slower; hopefully that's either subjective or fixable. Any feedback on whether this seems a sane approach to the issue is welcome. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Include more status information in walsender results
On Sun, Feb 6, 2011 at 01:54, Robert Haas robertmh...@gmail.com wrote: On Thu, Feb 3, 2011 at 7:56 AM, Magnus Hagander mag...@hagander.net wrote: Include more status information in walsender results Add the current xlog insert location to the response of IDENTIFY_SYSTEM, and adds result sets containing start and stop location of backups to BASE_BACKUP responses. Does this have anything to do with the following error message I'm now getting when trying to use streaming replication? FATAL: invalid response from primary server DETAIL: Expected 1 tuple with 2 fields, got 1 tuples with 3 fields. Yes. I believe I forgot to git add that one file that was off hiding in another directory. Thanks, fix applied. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Range Types
New patch. All known TODO items are closed, although I should do a cleanup pass over the code and docs. Fixed in this patch: * Many documentation improvements * Added INT8RANGE * Renamed PERIOD[TZ] - TS[TZ]RANGE * Renamed INTRANGE - INT4RANGE * Improved parser's handling of whitespace and quotes * Support for PL/pgSQL functions with ANYRANGE arguments/returns * Make subtype_float function no longer a requirement for GiST, but it should still be supplied for the penalty function to be useful. There are some open issues remaining, however: * Is typmod worth doing? I could complete it pretty quickly, but it involves introducing a new Node type, which seems a little messy for the benefit. * Should pg_range reference the btree opclass or the compare function directly? * Should subtype_cmp default to the default btree opclass's compare function? * Right now only superusers can define a range type. Should we open it up to normal users? * Should the SQL (inlinable) function length, which relies on polymorphic -, be immutable, strict, or volatile? * Later we might consider whether we should include btree_gist in core, to make range types more useful with exclusion constraints out-of-the-box. This should be left for later, I'm just including this for the archives so it doesn't get lost. Regards, Jeff Davis rangetypes-20110205.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 2: skip already-provable no-work rewrites
On Sat, Feb 5, 2011 at 7:44 PM, Noah Misch n...@leadboat.com wrote: But this is a little unsatisfying, for two reasons. First, the error message will be subtly wrong: we can make it complain about a table or a type, but not a foreign table. At a quick look, it likes the right fix might be to replace the second and third arguments to find_composite_type_dependencies() with a Relation. Seems like a clear improvement. That didn't quite work, because there's a caller in typecmds.c that doesn't have the relation handy. So I made it take a relkind and a name, which works fine. Second, I wonder if we shouldn't refactor things so that all the checks fire in ATRewriteTables() rather than doing them in different places. Seems like that might be cleaner. Offhand, this seems reasonable, too. I assumed there was some good reason it couldn't be done there for non-tables, but nothing comes to mind. Actually, thinking about this more, I'm thinking if we're going to change anything, it seems we ought to go the other way. If we ever actually did support recursing into wherever the composite type dependencies take us, we'd want to detect that before phase 3 and add work-queue items for each table that we needed to futz with. The attached patch takes this approach. It's very slightly more code, but it reduces the amount of spooky action at a distance. The existing coding is basically relying on the assumption that operations which require finding composite type dependencies also require a table rewrite. That was probably true at one point in time, but it's not really quite right. 
It already requires compensating code for foreign tables and composite types (which can require this treatment even though there's nothing to rewrite), and your proposed changes to avoid table rewrites in cases where a type is changed to a compatible type would break it in the opposite direction (the check would still be needed even if the parent table doesn't need a rewrite, because the rowtype could be indexed in some fashion that depends on the type of the column being changed).

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a17399..83f8be0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -3378,23 +3378,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
 	}
 
 	/*
-	 * If we change column data types or add/remove OIDs, the operation has to
-	 * be propagated to tables that use this table's rowtype as a column type.
-	 * newrel will also be non-NULL in the case where we're adding a column
-	 * with a default.  We choose to forbid that case as well, since composite
-	 * types might eventually support defaults.
-	 *
-	 * (Eventually we'll probably need to check for composite type
-	 * dependencies even when we're just scanning the table without a rewrite,
-	 * but at the moment a composite type does not enforce any constraints,
-	 * so it's not necessary/appropriate to enforce them just during ALTER.)
-	 */
-	if (newrel)
-		find_composite_type_dependencies(oldrel->rd_rel->reltype,
-										 oldrel->rd_rel->relkind,
-										 RelationGetRelationName(oldrel));
-
-	/*
 	 * Generate the constraint and default execution states
 	 */
@@ -4055,7 +4038,7 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel,
 	Oid			typeOid;
 	int32		typmod;
 	Form_pg_type tform;
-	Expr	   *defval;
+	Expr	   *defval = NULL;
 
 	attrdesc = heap_open(AttributeRelationId, RowExclusiveLock);
@@ -4304,6 +4287,20 @@ ATExecAddColumn(AlteredTableInfo *tab, Relation rel,
 		tab->new_changeoids = true;
 
 	/*
+	 * If we're adding an OID column, the operation has to be propagated to
+	 * tables that use this table's rowtype as a column type.  But we don't
+	 * currently have the infrastructure for that, so just throw an error.
+	 * We also forbid the case where we're adding a column with a default, since
+	 * composite types might eventually support defaults, and ALTER TABLE ..
+	 * ADD COLUMN .. DEFAULT would be expected to initialize the newly added
+	 * column to the default in each instantiation of the rowtype.
+	 */
+	if (isOid || defval)
+		find_composite_type_dependencies(rel->rd_rel->reltype,
+										 rel->rd_rel->relkind,
+										 RelationGetRelationName(rel));
+
+	/*
 	 * Add needed dependency entries for the new column.
 	 */
 	add_column_datatype_dependency(myrelid, newattnum, attribute.atttypid);
@@ -4975,6 +4972,16 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
 		/* Tell Phase 3 to physically remove the OID column */
 		tab->new_changeoids = true;
+
+		/*
+		 * The OID removal operation needs to be propagated to tables that use
+		 * this table's rowtype as a column type.  But we don't currently have
+		 * the infrastructure for that, so just throw an error if it would be
+		 * required.
+		 */
+
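[Editorial aside for the archives: the case find_composite_type_dependencies() guards against looks roughly like the following. Table names are hypothetical and the error text is paraphrased, not quoted from the source.]

```sql
CREATE TABLE t (a int, b text);

-- u embeds t's rowtype as a column type, creating the dependency
-- that the function walks:
CREATE TABLE u (x t);

-- Rewriting t's column storage would have to propagate into u.x, and
-- there is no infrastructure for that, so the ALTER is rejected:
ALTER TABLE t ALTER COLUMN b TYPE numeric USING 0;
-- ERROR (approximately): cannot alter table "t" because column "u.x"
-- uses its row type
```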
Re: [HACKERS] keeping a timestamp of the last stats reset (for a db, table and function)
Tomas Vondra wrote: Because when I create a database, the field is NULL - that's true. But once I connect to the database, the stats are updated and the field is set (thanks to the logic in pgstat.c).

OK--so it does what I was hoping for, I just didn't test it the right way. Let's call that a documentation issue and move on. Attached is an updated patch that fixes the docs and some other random bits. Looks ready for committer to me now. Make sure to adjust PGSTAT_FILE_FORMAT_ID, do a catversion bump, and set final OIDs for the new functions.

Below is what changed since the last posted version, mainly as feedback for Tomas:

- Explained more clearly that pg_stat_reset and pg_stat_reset_single_counters will both touch the database reset time, and that it's initialized upon first connection to the database.
- Added the reset time to the list of fields in pg_stat_database and pg_stat_bgwriter.
- Fixed some tab/whitespace issues. It looks like you had tab stops set at 8 characters during some points when you were editing non-code files. Also, there were a couple of spots where you used a tab while the surrounding text used spaces. You can normally see both types of errors if you read a patch; they show up as misaligned things in the context diff.
- Removed some extra blank lines that didn't fit the style of the surrounding code.

Basically, all the formatting bits I'm nitpicking about I found just by reading the patch itself; they all stuck right out. I'd recommend a pass of that before submitting things if you want to try and avoid those in the future.
--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ca83421..100f938 100644
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** postgres: <replaceable>user</replaceable> <replacea
*** 267,273 ****
      by backends (that is, not by the background writer), how many times
      those backends had to execute their own fsync calls (normally the
      background writer handles those even when the backend does its own
!     write), and total buffers allocated.
     </entry>
    </row>
--- 267,273 ----
      by backends (that is, not by the background writer), how many times
      those backends had to execute their own fsync calls (normally the
      background writer handles those even when the backend does its own
!     write), total buffers allocated, and time of last statistics reset.
     </entry>
    </row>
*** postgres: <replaceable>user</replaceable> <replacea
*** 278,286 ****
      number of transactions committed and rolled back in that database,
      total disk blocks read, total buffer hits (i.e., block read requests
      avoided by finding the block already in buffer cache),
!     number of rows returned, fetched, inserted, updated and deleted, and
      total number of queries cancelled due to conflict with recovery (on
!     standby servers).
     </entry>
    </row>
--- 278,286 ----
      number of transactions committed and rolled back in that database,
      total disk blocks read, total buffer hits (i.e., block read requests
      avoided by finding the block already in buffer cache),
!     number of rows returned, fetched, inserted, updated and deleted, the
      total number of queries cancelled due to conflict with recovery (on
!     standby servers), and time of last statistics reset.
     </entry>
    </row>
*** postgres: <replaceable>user</replaceable> <replacea
*** 663,668 ****
--- 663,681 ----
    </row>
    <row>
+    <entry><literal><function>pg_stat_get_db_stat_reset_time</function>(<type>oid</type>)</literal></entry>
+    <entry><type>timestamptz</type></entry>
+    <entry>
+     Time of the last statistics reset for the database. Initialized to the
+     system time during the first connection to each database. The reset time
+     is updated when you call <function>pg_stat_reset</function> on the
+     database, as well as upon execution of
+     <function>pg_stat_reset_single_table_counters</function> against any
+     table or index in it.
+    </entry>
+   </row>
+
+   <row>
     <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
     <entry><type>bigint</type></entry>
     <entry>
*** postgres: <replaceable>user</replaceable> <replacea
*** 1125,1130 ****
--- 1138,1153 ----
      <varname>bgwriter_lru_maxpages</varname> parameter
     </entry>
    </row>
+
+   <row>
+    <entry><literal><function>pg_stat_get_bgwriter_stat_reset_time()</function></literal></entry>
+    <entry><type>timestamptz</type></entry>
+    <entry>
+     Time of the last statistics reset for the background writer, updated
+     when executing <function>pg_stat_reset_shared('bgwriter')</function>
+
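[Editorial aside: the new functions documented in the hunks above would be exercised roughly like this. Function names are taken from the patch; output depends on the running cluster, so none is shown.]

```sql
-- Per-database reset time, via the new function:
SELECT d.datname, pg_stat_get_db_stat_reset_time(d.oid)
  FROM pg_database d;

-- Resetting the current database's statistics bumps its timestamp:
SELECT pg_stat_reset();

-- Cluster-wide background writer reset time:
SELECT pg_stat_get_bgwriter_stat_reset_time();
```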
Re: [HACKERS] arrays as pl/perl input arguments [PATCH]
On Thu, Feb 3, 2011 at 18:29, Alex Hunsaker bada...@gmail.com wrote: On Mon, Jan 31, 2011 at 01:34, Alexey Klyukin al...@commandprompt.com wrote:

I've looked at the patch and added a test for arrays exceeding or equal to the maximum number of dimensions, to check whether the recursive function won't bring surprises there. I've also added check_stack_depth calls to both split_array and plperl_hash_from_tuple. Note that the regression fails currently due to the incorrect error reporting in PostgreSQL, per http://archives.postgresql.org/pgsql-hackers/2011-01/msg02888.php. The v5 is attached.

One thing I find odd is we allow for nested arrays, but not nested composites. For example: ...

On the other end, the same is true when returning. If you try to return a nested composite or array, the child had better be a string, as it won't let you pass a hash.

So here is a v6 that does the above. It does so by providing a generic plperl_sv_to_datum() function and converting the various places to use it. One cool thing is you can now use the composite types or arrays passed in (or returned from another spi!) as arguments for spi_exec_prepared(). Before, you would have to convert the hashes to strings. It also provides a real plperl_convert_to_pg_array (now plperl_array_to_datum) that can handle composites and other random datatypes. This also means we don't have to convert arrays to a string representation first (which removes part of the argument for #5 @ http://archives.postgresql.org/pgsql-hackers/2011-02/msg00197.php).

I have attached a stand-alone version of the above so it's easier to look over. The diffstat is fairly small (ignoring added regression tests): 1 files changed, 259 insertions(+), 165 deletions(-)

Comments?
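[Editorial aside: a small illustration of the round trip the patch enables. The function name is made up for the example; the key point from the thread is that the argument arrives as a nested Perl arrayref rather than as the string '{{1,2},{3,4}}'. Exact behavior depends on the final patch.]

```sql
CREATE FUNCTION sum_matrix(int[][]) RETURNS int AS $$
    my $matrix = shift;     -- nested arrayref with this patch, not a string
    my $sum = 0;
    for my $row (@$matrix) {
        $sum += $_ for @$row;
    }
    return $sum;
$$ LANGUAGE plperl;

SELECT sum_matrix('{{1,2},{3,4}}'::int[]);
```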
plperl_uniform_inout.patch.gz
Description: GNU Zip compressed data

pg_to_perl_arrays_v6.patch.gz
Description: GNU Zip compressed data