Re: [HACKERS] 9.2.3 crashes during archive recovery
(2013/03/06 16:50), Heikki Linnakangas wrote:

Hi, Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); the attached patch records minRecoveryPoint. [crash recovery - record minRecoveryPoint in control file - archive recovery] I think that this is the original intention of Heikki's patch.

Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real issue here is that I missed the two "return NULL;"s in ReadRecord(), so the code that I put in the next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper fix for this.

Thanks for creating the patch! I tested your patch in 9.2_STABLE, but it does not work with the promote command. When XLogPageRead() returns false, it means the end of the standby loop, the crash recovery loop, and the archive recovery loop. Your patch is not good for promoting a standby to master: it does not break out of the standby loop. So I made a new patch which is based on Heikki's and Horiguchi's patches. I attach a test script which is a modified version of Horiguchi's script. This script does not depend on the shell environment; only PGPATH needs to be fixed. Please execute this test script.

I also found a bug in the latest 9.2_STABLE. It does not get the latest timeline and recovery history file in archive recovery when the master and standby timelines are different.

Works for me.. Can you create a test script for that? Remember to set recovery_target_timeline='latest'.

I set recovery_target_timeline='latest'. hmm... Here is my recovery.conf.

[mitsu-ko@localhost postgresql]$ cat Standby/recovery.conf
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=65432'
restore_command='cp ../arc/%f %p'

And my system's log message is here.

waiting for server to start
[Standby] LOG:  database system was shut down in recovery at 2013-03-07 02:56:05 EST
[Standby] LOG:  restored log file "0002.history" from archive
cp: cannot stat `../arc/0003.history': No such file or directory
[Standby] FATAL:  requested timeline 2 is not a child of database system timeline 1
[Standby] LOG:  startup process (PID 20941) exited with exit code 1
[Standby] LOG:  aborting startup due to startup process failure

It can be reproduced in my test script, too. The final "master start" command might seem unusual in my test script, but it is common in PostgreSQL-with-Pacemaker systems.

Best regards,
--
Mitsumasa KONDO
NTT OSS Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..2486683 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}

 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4176,15 @@ retry:
 		}

 		/* Wait for the next page to become available */
 		if (!XLogPageRead(&pagelsn, emode, false, false))
+		{
+			/*
+			 * If archive recovery was requested when crash recovery failed, go to
+			 * the label next_record_is_invalid to switch to archive recovery.
+			 */
+			if (!InArchiveRecovery && ArchiveRecoveryRequested)
+				goto next_record_is_invalid;
 			return NULL;
+		}

 		/* Check that the continuation record looks valid */
 		if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..591e8c0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4446,7 +4462,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);

-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =

#! /bin/sh
echo initial settings
PGPATH=`pwd`/bin
PATH=$PGPATH:$PATH
PGDATA0=Master
PGDATA1=Standby
PGARC=arc
mkdir $PGARC
PGPORT0=65432
PGPORT1=65433
unset PGPORT
unset PGDATA
echo Postgresql is `which postgres`
killall -9 postgres
rm -rf $PGDATA0 $PGDATA1 $PGARC/*
initdb $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=$PGPORT0
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p ../$PGARC/%f'
max_wal_senders = 3
hot_standby = on
log_min_messages = debug1
log_line_prefix = '[Master] '
EOF
cat
Re: [HACKERS] sql_drop Event Trigger
On Tue, Mar 5, 2013 at 5:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Okay, I added a couple of lines to skip reporting dropped temp schemas, and to skip any objects belonging to any temp schema (not just my own, note). Not posting a new version because the change is pretty trivial. Now, one last thing that comes up is what about objects that don't have straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc); the only thing you get is a catalog OID and an object OID ... but they are pretty useless, because by the time you get to the ddl_command_end trigger the objects are gone from the catalog. Maybe we should report *something* about those. Say, perhaps the object description ... but if we want that, it should be untranslated (i.e. not just what getObjectDescription gives you, because that may be translated; we would need to patch it so that it only translates if the caller requests it).

Broadly, I suggest making the output format match as exactly as possible what commands like COMMENT and SECURITY LABEL accept as input. We've already confronted all of these notational issues there. Columns are identified as COLUMN table.name; functions as FUNCTION function_name(argtypes); etc. Of course it's fine to split the object type off into a separate column, but it should have the same name here that it does there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
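[Editor's illustration — a sketch of the interface under discussion, assuming pg_event_trigger_dropped_objects() and the object_type/object_identity columns land roughly as proposed in this thread:]

```sql
-- Sketch only: column names (object_type, object_identity) are the
-- ones being debated above, not a settled API at the time of writing.
CREATE FUNCTION report_dropped_objects() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        -- e.g. "table public.foo" or "function public.foo(integer)",
        -- matching what COMMENT / SECURITY LABEL accept as input
        RAISE NOTICE 'dropped: % %', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER report_drops
    ON sql_drop
    EXECUTE PROCEDURE report_dropped_objects();
```

With this in place, a DROP TABLE would fire the trigger once per dropped object, including cascaded dependents.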
Re: [HACKERS] Materialized views WIP patch
On 6 March 2013 14:16, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:

FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive.

As we already said: no MVs = zero overhead = no problem.

Well, in the first place that statement is false on its face: we'll still spend cycles looking for relevant MVs, or at least maintaining a complexly-indexed cache that helps us find out that there are none in a reasonable amount of time. In the second place, even if it were approximately true, it wouldn't help the people who were using MVs.

We can store info in the relcache, and reduce such a lookup to a simple if-test in the planner. Populating the cache would be easy enough, approx. the same overhead as deriving the list of constraints for the relcache. If you were using MVs, there are further heuristics to apply. MVs come in various shapes, so we can assess whether they use aggregates, joins, filters etc. and use that for a general match against a query. I don't see the need for complex assessments in every case.

It costs in the cases where time savings are possible and not otherwise.

And that is just complete nonsense: matching costs whether you find a match or not. Could we have a little less Pollyanna-ish optimism and a bit more realism about the likely cost of such a feature?

It's not a trivial feature; this is a lot of work. But it can be done efficiently, without significant effect on other workloads. If that really were to be true, then enable_lookaside = off can be the default, just as we have for another costly planning feature, constraint_exclusion.

Matview lookaside is the next-best-action for further work on the planner, AFAICS. Correctly optimised query parallelism is harder, IMHO. What I'm hearing at the moment is "please don't make any changes in my area" or "don't climb the North face". Considering the rather high bar to being able to do this effectively, I do understand your interest in not having your/our time wasted by casual attempts to approach the problem, but I don't want to slam the door on a serious attempt (2-year project, 1+ man-year effort).

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Writable foreign tables: how to identify rows
2013/3/6 Tom Lane t...@sss.pgh.pa.us:

Also, the right way to deal with this desire is to teach the planner in general, not just FDWs, about pushing expensive calculations down the plan tree --- basically, resurrecting Joe Hellerstein's thesis work, which we ripped out more than ten years ago. I don't think there's that much that would need to change about the planner's data structures, but deciding where is cheapest to evaluate an expression is a pretty hard problem. Trying to handle that locally within FDWs is doomed to failure IMO.

It also seems to me a more attractive direction to support a mechanism to off-load calculation of expensive expressions in general, not only for foreign tables. Probably, we would have to include the cost to compute an expression node when choosing a plan node. For example, if an extension injected a pseudo scan node that materializes the underlying regular table and calculates the target entries using SIMD operations, and its cost to compute the target entry were 25% of the regular one's, the planner could choose this plan node when it is advantageous.

So my intention is to get rid of GetForeignRelWidth() and make use of the existing ctid system column for returning remote TIDs in postgres_fdw. (On review I notice that we're creating ctid columns for foreign tables already, so we don't even need the proposed hook to let FDWs control that; though we will certainly want one in future if we are to support non-TID magic row identifiers.)

OK. It gets back to the initial implementation that uses the TID system column. I'll try to adjust my patches by next week. Probably, we should allow FDW drivers to specify the data type of the TID system column in future development, rather than adding new system columns, because a non-TID magic row identifier would be used mutually exclusively with the TID.

If you find that unacceptable, I'm quite willing to mark this patch Returned With Feedback and get on with dealing with some of the other forty-odd patches in the CF queue.

A bird is a reasonable outcome with a stone, even though it is not two birds.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp
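[Editor's illustration — the ctid system column that the plan falls back on can be seen on any ordinary table; a remote TID for a foreign table would play the same row-identifier role. Table and values here are made up:]

```sql
-- ctid identifies a physical row version as (block, offset);
-- UPDATE/DELETE can target a row by it, which is exactly what a
-- writable FDW needs a remote equivalent of.
CREATE TABLE local_t (x int);
INSERT INTO local_t VALUES (1), (2);

SELECT ctid, x FROM local_t;          -- e.g. (0,1) and (0,2)
DELETE FROM local_t WHERE ctid = '(0,1)';
```

Note that a ctid is only stable until the row is updated or the table is vacuumed, which is why it works as a within-statement row identifier rather than a durable key.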
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera alvhe...@2ndquadrant.com writes:

Here's another idea --- have three columns: type, schema (as in the current patch and as shown above), and a third one for object identity. For tables and other objects that have simple names, the identity would be their names. For columns, it'd be tablename.columnname. For functions, it'd be the complete signature. And so on.

Sounds very good as an extra column, yes.

Robert Haas robertmh...@gmail.com writes:

Broadly, I suggest making the output format match as exactly as possible what commands like COMMENT and SECURITY LABEL accept as input. We've already confronted all of these notational issues there. Columns are identified as COLUMN table.name; functions as FUNCTION function_name(argtypes); etc. Of course it's fine to split the object type off into a separate column, but it should have the same name here that it does there.

I would like the format to be easily copy/paste'able to things such as regclass/regtype/regprocedure casts, and apparently the COMMENT input format seems to be the same as that one, so +1 from me.

--
Dimitri Fontaine                        06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
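[Editor's illustration — identities formatted that way do cast straight back to object OIDs on a stock PostgreSQL, which is the copy/paste-ability being asked for; the table name here is made up:]

```sql
-- A schema-qualified name round-trips through regclass, and a
-- complete function signature through regprocedure:
CREATE TABLE foo (a int);

SELECT 'public.foo'::regclass;        -- resolves the table
SELECT 'lower(text)'::regprocedure;   -- resolves the built-in function
```

So if the identity column emits 'public.foo' for a table and 'lower(text)' for a function, trigger code can cast it directly instead of re-quoting names by hand.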
Re: [HACKERS] 9.2.3 crashes during archive recovery
On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote:

Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real issue here is that I missed the two "return NULL;"s in ReadRecord(), so the code that I put in the next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper fix for this.

Thanks for creating the patch! I tested your patch in 9.2_STABLE, but it does not work with the promote command. When XLogPageRead() returns false, it means the end of the standby loop, the crash recovery loop, and the archive recovery loop. Your patch is not good for promoting a standby to master: it does not break out of the standby loop. So I made a new patch which is based on Heikki's and Horiguchi's patches.

Ah, I see. I committed a slightly modified version of this.

I also found a bug in the latest 9.2_stable. It does not get the latest timeline and recovery history file in archive recovery when the master and standby timelines are different.

Works for me.. Can you create a test script for that? Remember to set recovery_target_timeline='latest'.

...

It can be reproduced in my test script, too.

I see the problem now, with that script. So what happens is that the startup process first scans the timeline history files to choose the recovery target timeline. For that scan, I temporarily set InArchiveRecovery=true, in readRecoveryCommandFile. However, after readRecoveryCommandFile returns, we then try to read the timeline history file corresponding to the chosen recovery target timeline, but InArchiveRecovery is no longer set, so we don't fetch the file from the archive, and return a dummy history, with just the target timeline in it. That doesn't contain the older timeline, so you get an error at recovery.

Fixed per your patch to check for ArchiveRecoveryRequested instead of InArchiveRecovery when reading timeline history files. This also makes it unnecessary to temporarily set InArchiveRecovery=true, so removed that. Committed both fixes. Please confirm that this fixed the problem in your test environment. Many thanks for the testing and the patches!

- Heikki
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On 2013-03-05 23:26:59 +0200, Heikki Linnakangas wrote: On 04.03.2013 06:39, Amit Kapila wrote:

The stats look fairly sane. I'm a little concerned about the apparent trend of falling TPS in the patched vs original tests for the 1-client test as record size increases, but it's only 0.0%-0.2%-0.4%, and the 0.4% case made other config changes too. Nonetheless, it might be wise to check with really big records and see if the trend continues.

For bigger size (~2000) records, it goes into toast, for which we don't do this optimization. This optimization is mainly for medium-size records.

I've been investigating the pglz option further, and doing performance comparisons of the pglz approach and this patch. I'll begin with some numbers:

unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245525360 | 9.94613695144653
 two short fields, one changed           |    1245536528 | 10.146910905838
 two short fields, both changed          |    1245523160 | 11.2332470417023
 one short and one long field, no change |    1054926504 | 5.90477800369263
 ten tiny fields, all changed            |    1411774608 | 13.4536008834839
 hundred tiny fields, all changed        |     635739680 | 7.57448387145996
 hundred tiny fields, half changed       |     636930560 | 7.56888699531555
 hundred tiny fields, half nulled        |     573751120 | 6.68991994857788

Amit's wal_update_changes_v10.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1249722112 | 13.0558869838715
 two short fields, one changed           |    1246145408 | 12.9947438240051
 two short fields, both changed          |    1245951056 | 13.0262880325317
 one short and one long field, no change |     678480664 | 5.70031690597534
 ten tiny fields, all changed            |    1328873920 | 20.0167419910431
 hundred tiny fields, all changed        |     638149416 | 14.4236788749695
 hundred tiny fields, half changed       |     635560504 | 14.8770561218262
 hundred tiny fields, half nulled        |     558468352 | 16.2437210083008

pglz-with-micro-optimizations-1.patch:

                testname                 | wal_generated |     duration
-----------------------------------------+---------------+------------------
 two short fields, no change             |    1245519008 | 11.6702048778534
 two short fields, one changed           |    1245756904 | 11.3233819007874
 two short fields, both changed          |    1249711088 | 11.6836447715759
 one short and one long field, no change |     664741392 | 6.44810795783997
 ten tiny fields, all changed            |    1328085568 | 13.9679481983185
 hundred tiny fields, all changed        |     635974088 | 9.15514206886292
 hundred tiny fields, half changed       |     636309040 | 9.13769292831421
 hundred tiny fields, half nulled        |     496396448 | 8.77351498603821

In each test, a table is created with a large number of identical rows, and fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is timed. Duration is the time spent in the UPDATE (lower is better), and wal_generated is the amount of WAL generated by the updates (lower is better).

The summary is that Amit's patch is a small win in terms of CPU usage, in the best case where the table has few columns, with one large column that is not updated. In all other cases it just adds overhead. In terms of WAL size, you get a big gain in the same best-case scenario.

Attached is a different version of this patch, which uses the pglz algorithm to spot the similarities between the old and new tuple, instead of having explicit knowledge of where the column boundaries are. This has the advantage that it will spot similarities, and be able to compress, in more cases. For example, you can see a reduction in WAL size in the "hundred tiny fields, half nulled" test case above.

The attached patch also just adds overhead in most cases, but the overhead is much smaller in the worst case. I think that's the right tradeoff here - we want to avoid scenarios where performance falls off the cliff. That said, if you usually just get a slowdown, we certainly can't make this the default, and if we can't turn it on by default, this probably just isn't worth it.

The attached patch contains the variable-hash-size changes I posted in the "Optimizing pglz compressor" thread.
But in the delta encoding function, it goes further than that, and contains some further micro-optimizations: the hash is calculated in a rolling fashion, and it uses a specialized version of the pglz_hist_add macro that knows that the input can't exceed 4096 bytes. Those changes
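[Editor's illustration — the benchmark procedure described above ("a table with a large number of identical rows, fillfactor=50, then a timed full-table UPDATE") can be approximated in a psql session like this; the table shape and row count are assumptions, not the harness's actual values:]

```sql
-- Hypothetical reconstruction of one case ("two short fields, one changed"):
CREATE TABLE t (a int, b int) WITH (fillfactor = 50);
INSERT INTO t SELECT 1, 2 FROM generate_series(1, 1000000);
CHECKPOINT;

SELECT pg_current_xlog_location();   -- note the WAL position before
\timing on
UPDATE t SET b = b + 1;              -- "duration" is this statement's time
\timing off
SELECT pg_current_xlog_location();   -- "wal_generated" = difference of the two positions
```

On 9.2-era servers pg_xlog_location_diff() can compute the byte difference between the two reported positions directly.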
Re: [HACKERS] Materialized views WIP patch
David E. Wheeler da...@justatheory.com wrote: Kevin Grittner kgri...@ymail.com wrote:

I also think that something should be done about the documentation for indexes. Right now that always refers to a table. It would clearly be awkward to change that to "table or materialized view" everywhere. I wonder if most of those should be changed to "relation" with a few mentions that the relation could be a table or a materialized view, or whether some less intrusive change would be better. Opinions welcome.

Isn't a materialized view really just a table that gets updated periodically?

Not exactly. It is a relation which has some characteristics of a view (including an entry in pg_rewrite exactly like that for a view) and some characteristics of a table (including a heap and optional indexes). Whether it looks more like a table or more like a view depends on how you tilt your head. You could just as easily say that it is really just a view which periodically caches its results on disk. They really are their own thing. As future releases add more subtle freshness concepts, incremental updates, and query rewrite, that unique identity will become even more conspicuous, I think.

And isn't a non-materialized view also thought of as a "relation"?

Yes. Tables, views, and materialized views are all relations.

If the answer to both those questions is "yes," I think the term should remain "table," with a few mentions that the term includes materialized views (and excludes foreign tables).

And if the answers are "not exactly" and "yes"?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
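[Editor's illustration — the "some of each" nature is visible in the catalogs on a 9.3-era build; object names here are made up:]

```sql
CREATE TABLE t (a int);
CREATE VIEW v AS SELECT * FROM t;
CREATE MATERIALIZED VIEW mv AS SELECT * FROM t;

-- A plain view is relkind 'v' and has no storage (relfilenode = 0);
-- a matview is its own relkind 'm' but, like a table, has a heap
-- and can be indexed:
SELECT relname, relkind, relfilenode <> 0 AS has_storage
FROM pg_class
WHERE relname IN ('t', 'v', 'mv');

CREATE INDEX ON mv (a);   -- allowed, as for a table; not for a plain view
```

Meanwhile both v and mv carry a _RETURN rule in pg_rewrite, which is the view-like half of the identity.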
Re: [HACKERS] odd behavior in materialized view
On Thu, Mar 7, 2013 at 8:21 AM, Kevin Grittner kgri...@ymail.com wrote: Fujii Masao masao.fu...@gmail.com wrote: On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote: Fujii Masao masao.fu...@gmail.com wrote:

When I accessed the materialized view in the standby server, I got the following ERROR message. Looks odd to me. Is this a bug?

ERROR:  materialized view "hogeview" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.

The procedure to reproduce this error message is:

In the master server:
CREATE TABLE hoge (i int);
INSERT INTO hoge VALUES (generate_series(1,100));
CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
DELETE FROM hoge;
REFRESH MATERIALIZED VIEW hogeview;
SELECT count(*) FROM hogeview;

In the standby server:
SELECT count(*) FROM hogeview;

SELECT count(*) goes well in the master, and expectedly returns 0. OTOH, in the standby, it emits the error message.

Will investigate.

Thanks! And I found another problem. When I ran the following SQLs in the master, a PANIC error occurred in the standby.

CREATE TABLE hoge (i int);
INSERT INTO hoge VALUES (generate_series(1,100));
CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
VACUUM ANALYZE;

The PANIC error messages that I got in the standby are:

WARNING:  page 0 of relation base/12297/16387 is uninitialized
CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0
PANIC:  WAL contains references to invalid pages
CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0

base/12297/16387 is the file of the materialized view 'hogeview'.

I was able to replicate both bugs, and they both appear to be fixed by the attached, which I have just pushed.

Thanks! I confirmed that the problem that I reported has disappeared in HEAD. Unfortunately I found another odd behavior. When I accessed the MV after VACUUM ANALYZE, I got the following error.

ERROR:  materialized view "hogeview" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
STATEMENT:  select * from hogeview where i < 10;

The test case to reproduce that is:

create table hoge (i int);
insert into hoge values (generate_series(1,10));
create materialized view hogeview as select * from hoge where i % 2 = 0;
create index hogeviewidx on hogeview (i);
delete from hoge;
refresh materialized view hogeview;
select * from hogeview where i < 10;
vacuum analyze;
select * from hogeview where i < 10;

The last SELECT command caused the above error.

Regards,
--
Fujii Masao
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com wrote:

The strange thing about hoge_pkey_cct_cct is that it seems to imply that an invalid index was reindexed concurrently? But I don't see how it could happen either. Fujii, can you reproduce it?

Yes, I can, even with the latest version of the patch. The test case to reproduce it is:

(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));

(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)

(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge

Regards,
--
Fujii Masao
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-07 09:58:58 +0900, Michael Paquier wrote:

+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, "drop the concurrent index" cannot actually drop the index. In this case, you need to issue "alter table ... drop constraint ..." to recover the situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now we switch only the relfilenodes and not the whole index, that should be perfectly fine.

Sounds good. But what about other constraint cases, like unique constraints? Can those other cases also be resolved by not setting ->isprimary? We should stick with the concurrent index being a twin of the index it rebuilds, for consistency.

I don't think it's legal. We cannot simply have two indexes with 'indisprimary'. Especially not if both are valid. Also, there will be no pg_constraint row that refers to it, which violates very valid expectations that both users and pg may have.

So what to do with that? Mark the concurrent index as valid, then validate it and finally mark it as invalid inside the same transaction at phase 4? That's moving 2 lines of code... Sorry, phase 4 is the swap phase. Validation happens at phase 3.

Why do you want to temporarily mark it as valid? I don't see any requirement that it is set to that during validate_index() (which imo is badly named, but...). I'd just set it to valid in the same transaction that does the swap.

Greetings,

Andres Freund
--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] odd behavior in materialized view
Fujii Masao masao.fu...@gmail.com wrote:

Thanks! I confirmed that the problem that I reported has disappeared in HEAD. Unfortunately I found another odd behavior. When I accessed the MV after VACUUM ANALYZE, I got the following error.

ERROR:  materialized view "hogeview" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
STATEMENT:  select * from hogeview where i < 10;

The test case to reproduce that is:

create table hoge (i int);
insert into hoge values (generate_series(1,10));
create materialized view hogeview as select * from hoge where i % 2 = 0;
create index hogeviewidx on hogeview (i);
delete from hoge;
refresh materialized view hogeview;
select * from hogeview where i < 10;
vacuum analyze;
select * from hogeview where i < 10;

The last SELECT command caused the above error.

Thanks. Will fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote: If the answer to both those questions is “yes,” I think the term should remain “table,” with a few mentions that the term includes materialized views (and excludes foreign tables). And if the answers are “not exactly” and “yes”? I still tend to think that the term should remain “table,” with brief mentions at the top of pages when the term should be assumed to represent tables and matviews, and otherwise required disambiguations. Trying to make the least possible work for you here. :-) David
Re: [HACKERS] odd behavior in materialized view
On Fri, Mar 8, 2013 at 1:52 AM, Kevin Grittner kgri...@ymail.com wrote: Fujii Masao masao.fu...@gmail.com wrote:

Thanks! I confirmed that the problem that I reported has disappeared in HEAD. Unfortunately I found another odd behavior. When I accessed the MV after VACUUM ANALYZE, I got the following error.

ERROR:  materialized view "hogeview" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
STATEMENT:  select * from hogeview where i < 10;

The test case to reproduce that is:

create table hoge (i int);
insert into hoge values (generate_series(1,10));
create materialized view hogeview as select * from hoge where i % 2 = 0;
create index hogeviewidx on hogeview (i);
delete from hoge;
refresh materialized view hogeview;
select * from hogeview where i < 10;
vacuum analyze;
select * from hogeview where i < 10;

The last SELECT command caused the above error.

Thanks. Will fix.

Thanks! I found one typo in the document of MV. Please see the attached patch (typo.patch).

Regards,
--
Fujii Masao
[HACKERS] REFRESH MATERIALIZED VIEW locklevel
Hi,

if I understand things correctly, REFRESH MATERIALIZED VIEW locks the materialized view with an AccessExclusiveLock even if the view already contains data. I am pretty sure that will - understandably - confuse users, so I vote for at least including a note about that in the docs. This fact imo reduces the usability of the matviews feature as it stands atm considerably. I think we should be very careful not to advocate its existence much and document very clearly that it's work in progress. Working incrementally is a sensible thing to do, don't get me wrong...

Making the refresh work concurrently doesn't seem to be too hard if it's already initialized:

1) lock relation exclusively in session mode (or only ShareUpdateExclusive?)
2) build new data into a separate relfilenode
3) switch relfilenode
4) wait till all potential users of the old relfilenode are gone (VirtualXactLock games, just as in CREATE INDEX CONCURRENTLY)
5) drop old relfilenode

The only problem I see right now is that we might forget to delete the old relation if we crash during 4). Even if we WAL log it, due to checkpoints causing that action not to be replayed. But that seems to be nothing new; I think the same problem exists during normal table rewrites as well, just the other way round (i.e. we forget about the new relfilenode).

Greetings,

Andres Freund
--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] odd behavior in materialized view
Fujii Masao masao.fu...@gmail.com wrote: I found one typo in the document of MV. Please see the attached patch. Pushed. Thanks! -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
Andres Freund and...@2ndquadrant.com wrote: if I understand things correctly REFRESH MATERIALIZED VIEW locks the materialized view with an AccessExclusiveLock even if the view already contains data. Yeah. At the time I had to make a decision on that, REINDEX CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH MATERIALIZED VIEW has to rebuild indexes (among other things). If we have all the issues sorted out with REINDEX CONCURRENTLY then the same techniques will probably apply to RMV without too much difficulty. It's a bit late to think about that for 9.3, though. I am pretty sure that will - understandably - confuse users, so I vote for at least including a note about that in the docs. Will see about working that in. -Kevin -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
On 2013-03-07 09:55:39 -0800, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: if I understand things correctly REFRESH MATERIALIZED VIEW locks the materialized view with an AccessExclusiveLock even if the view already contains data. Yeah. At the time I had to make a decision on that, REINDEX CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH MATERIALIZED VIEW has to rebuild indexes (among other things). If we have all the issues sorted out with REINDEX CONCURRENTLY then the same techniques will probably apply to RMV without too much difficulty. It's a bit late to think about that for 9.3, though. I don't think that REFRESH MATERIALIZED VIEW has to deal with the same problems that REINDEX CONCURRENTLY has - after all, there won't be any DML coming in while it's running. That should get rid of the REINDEX CONCURRENTLY problems. There doesn't seem to be any need to use the far more expensive REINDEX CONCURRENTLY-style computation of indexes which has to scan the heap multiple times et al. They should just be built on top of the new matview relation, which is essentially read only. I am pretty sure that will - understandably - confuse users, so I vote for at least including a note about that in the docs. Will see about working that in. Cool. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
Kevin Grittner kgri...@ymail.com wrote: Andres Freund and...@2ndquadrant.com wrote: if I understand things correctly REFRESH MATERIALIZED VIEW locks the materialized view with an AccessExclusiveLock even if the view already contains data. Yeah. At the time I had to make a decision on that, REINDEX CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH MATERIALIZED VIEW has to rebuild indexes (among other things). If we have all the issues sorted out with REINDEX CONCURRENTLY then the same techniques will probably apply to RMV without too much difficulty. It's a bit late to think about that for 9.3, though. I am pretty sure that will - understandably - confuse users, so I vote for at least including a note about that in the docs. Will see about working that in. On the ride home I realized that unless - not that unlikely - you thought about something I didn't, REFRESH will behave similarly to TRUNCATE for repeatable read+ transactions that only access it after REFRESH finished. That is, to them it will appear empty. If that's the case, it needs to be documented prominently as well. It would be possible to get different behaviour by immediately freezing all tuples, but that would also result in violations of visibility by showing tuples that are not yet visible. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone.
[HACKERS] Duplicate JSON Object Keys
This behavior surprised me a bit:

david=# select '{"foo": 1, "foo": 2}'::json;
         json
----------------------
 {"foo": 1, "foo": 2}

I had expected something more like this:

david=# select '{"foo": 1, "foo": 2}'::json;
    json
------------
 {"foo": 2}

This hasn’t been much of an issue before, but with Andrew’s JSON enhancements going in, it will start to cause problems:

david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
ERROR:  field name is not unique in json object

Andrew tells me that the spec requires this. I think that’s fine, but I would rather that it never got to there. In the spirit of being liberal about what we accept but strict about what we store, it seems to me that JSON object key uniqueness should be enforced either by throwing an error on duplicate keys, or by flattening so that the latest key wins (as happens in JavaScript). I realize that tracking keys will slow parsing down, and potentially make it more memory-intensive, but such is the price for correctness. Thoughts? Thanks, David
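For comparison (not part of the original mail): Python's json module implements the last-key-wins behavior described above, and its object_pairs_hook parameter supports the stricter throw-an-error alternative - a sketch:

```python
import json

doc = '{"foo": 1, "foo": 2}'

# Default behavior: the last occurrence of a duplicate key wins,
# matching JavaScript (and the expectation in the mail above).
print(json.loads(doc))  # {'foo': 2}

def reject_duplicates(pairs):
    """Raise an error on duplicate keys instead of silently dropping one."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError("duplicate key: %r" % key)
        obj[key] = value
    return obj

try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)  # duplicate key: 'foo'
```

Both behaviors satisfy RFC 4627, since the names-SHOULD-be-unique wording leaves handling of duplicates to the implementation.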
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
anara...@anarazel.de and...@anarazel.de wrote: On the ride home I realized that unless - not that unlikely - you thought about something I didn't, REFRESH will behave similarly to TRUNCATE for repeatable read+ transactions that only access it after REFRESH finished. That is, to them it will appear empty. In an early version of the patch someone found that in testing and pointed it out. It would be possible to get different behaviour by immediately freezing all tuples Which is what I did. but that would also result in violations of visibility by showing tuples that are not yet visible. Which is the case, and should be documented. (I had not remembered to do so yet; I'll tuck away your email as a reminder.) Since the MV is already not guaranteed to be in sync with other data, that didn't seem like a fatal flaw. It is, however, the one case where the MV could appear to be *ahead* of the supporting data rather than *behind* it. In a way this is similar to how READ COMMITTED transactions can see data from more than one snapshot on write conflicts, so I see it as a bigger issue for more strict isolation levels -- but those are unlikely to be all that useful with MVs in this release anyway. This is something that I think deserves some work in a subsequent release. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Duplicate JSON Object Keys
On 03/07/2013 02:48 PM, David E. Wheeler wrote: This behavior surprised me a bit:

david=# select '{"foo": 1, "foo": 2}'::json;
         json
----------------------
 {"foo": 1, "foo": 2}

I had expected something more like this:

david=# select '{"foo": 1, "foo": 2}'::json;
    json
------------
 {"foo": 2}

This hasn’t been much of an issue before, but with Andrew’s JSON enhancements going in, it will start to cause problems:

david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
ERROR:  field name is not unique in json object

Andrew tells me that the spec requires this. I think that’s fine, but I would rather that it never got to there. Specifically, RFC 4627 says (note the last sentence):

2.2. Objects

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

cheers andrew
Re: [HACKERS] Enabling Checksums
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover. One of the design points of this patch is that those operations that use MarkBufferDirtyHint(), including tuple hint bits, the FSM, index dead markers, etc., do not directly go to the standby. That's because the standby can't write WAL, so it can't protect itself against a torn page breaking the checksum. However, these do make it through by riding along with a full-page image in the WAL. The fact that checksums are enabled means that these full-page images will be written once per modified page per checkpoint, and then replayed on the standby. The FSM should get the updates the same way, even though no other WAL is written for the FSM. If full_page_writes is disabled, then the updates will never arrive. But in that case, I think we can just go ahead and dirty the page during recovery, because there isn't a real problem. I was hesitant to make this change in my patch because:
1. I wanted to see if someone saw a flaw in this reasoning; and
2. I noticed that full_page_writes can be changed with a SIGHUP, which could add complexity (I don't see any reason we allow this... shouldn't we just force a restart for that change?).
I added a README file, moved some of the explanatory material there, and tried to clarify this situation. Let me know if you see a problem that I'm missing. I verified that at least some FSM changes do make it through with checksums on, but I didn't dig much deeper than that. Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. 
If we want to enable such a per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed. Removed header bits. XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case. Fixed. I added a new exported function, BufferGetLSNAtomic(). There was another similar omission in gistget.c. By the way, I cannot find any trace of XLogCheckBufferNeedsBackup(), was that a typo? Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is. Fixed. I also rebased and added a GUC to control whether a checksum failure causes an error or not. I need to do another self-review after these changes and some more extensive testing, so I might have missed a couple of things. Regards, Jeff Davis checksums-20130307.patch.gz Description: GNU Zip compressed data replace-tli-with-checksums-20130307.patch.gz Description: GNU Zip compressed data
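To illustrate the torn-page hazard discussed above with a toy model (Python; this is a deliberately simplified sketch, not PostgreSQL's actual checksum algorithm): if a crash persists only half of a page write, the checksum stored at write time no longer verifies, even though each half of the page is internally intact:

```python
import zlib

PAGE_SIZE = 8192

def page_checksum(page: bytes) -> int:
    # Stand-in for PostgreSQL's page checksum; any whole-page hash
    # demonstrates the failure mode.
    return zlib.crc32(page)

old_page = bytes([0xAA]) * PAGE_SIZE   # on-disk image before the hint-bit write
new_page = bytes([0xBB]) * PAGE_SIZE   # in-memory image being written out
stored_sum = page_checksum(new_page)   # checksum computed at write time

# Crash mid-write: only the first 4 kB of the new image reached disk,
# the rest still holds the old image -- a "torn" page.
torn_page = new_page[:4096] + old_page[4096:]

print(page_checksum(torn_page) == stored_sum)  # False: verification fails
```

This is why a standby, which cannot write WAL to protect itself, cannot apply hint-bit-only changes locally; as the mail explains, they have to ride along with full-page images instead.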
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
On 2013-03-07 11:50:11 -0800, Kevin Grittner wrote: anara...@anarazel.de and...@anarazel.de wrote: On the ride home I realized that unless - not that unlikely - you thought about something I didn't, REFRESH will behave similarly to TRUNCATE for repeatable read+ transactions that only access it after REFRESH finished. That is, to them it will appear empty. In an early version of the patch someone found that in testing and pointed it out. Cool ;) It would be possible to get different behaviour by immediately freezing all tuples Which is what I did. Ok. but that would also result in violations of visibility by showing tuples that are not yet visible. Which is the case, and should be documented. (I had not remembered to do so yet; I'll tuck away your email as a reminder.) Since the MV is already not guaranteed to be in sync with other data, that didn't seem like a fatal flaw. It is, however, the one case where the MV could appear to be *ahead* of the supporting data rather than *behind* it. In a way this is similar to how READ COMMITTED transactions can see data from more than one snapshot on write conflicts, so I see it as a bigger issue for more strict isolation levels -- but those are unlikely to be all that useful with MVs in this release anyway. This is something that I think deserves some work in a subsequent release. I am not that convinced that this is unproblematic. I don't see any problem with READ COMMITTED, but at higher levels, even if you expect the view to be out of date, it may very well be surprising to see it referring to rows in another table that do not yet exist although rows in that table never get deleted. Especially for the initial population I don't see any way to get rid of the problem - I don't think there exists a valid way to compute valid xmin/xmax values for the resulting tuples of all queries. So unless we get catalog accesses that completely obey repeatable read semantics there's not much we can do about that. 
And while I think getting rid of SnapshotNow is realistic, I don't think fully versioned catalog access is (i.e. it working in a way that you could access a table in the old and new version after an ALTER TABLE ...). I wonder if we should add something like indexcheckxmin to matviews, which specifies after which value it's valid. Only that it errors out if you haven't reached it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
Andres, if I understand things correctly REFRESH MATERIALIZED VIEW locks the materialized view with an AccessExclusiveLock even if the view already contains data. I am pretty sure that will - understandably - confuse users, so I vote for at least including a note about that in the docs. +1 for mentioning it in the docs. We could stand to document more generally what lock levels various commands take. This fact imo reduces the usability of the matview feature as it stands atm considerably. I think we should be very careful not to advocate its existence much and document very clearly that it's a work in progress. Working incrementally is a sensible thing to do, don't get me wrong... -1 from me. Postgres is currently full of fairly innocent-looking commands which take an unexpected ACCESS EXCLUSIVE lock. For example, DROP CONSTRAINT takes an AccessExclusive lock, but it hasn't stopped people from using constraints, and it isn't particularly high up on our todo list to fix. This limitation is in no way crippling for this feature, or even a major detraction. I still intend to promote the heck out of this feature. Now, I agree that having a REFRESH ... CONCURRENTLY would be a wonderful feature for 9.4. But the fact that we don't have it yet is not a big deal, and I would put several other matview-related features ahead of concurrent refresh in priority. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Kevin Grittner kgri...@ymail.com: Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. (Without having looked at the patch, or even the documentation :-/.) I think that something that might prove useful is the following: Keeping in mind the possibility of storing something in the matview’s heap that doesn’t correspond to what a SELECT * FROM matview would yield (i.e., the “logical content”). The transformation could be performed by an INSTEAD rule (similar to how a view is expanded to its definition, a reference to a matview would expand to its heap content transformed to the “logical content”). (Note that I don’t have any reason to believe that the current implementation would make this more difficult than it should be.) ridiculously long rationale for the previous (All the following requires making matviews (inter- and intra-) transactionally up-to-date w.r.t. their base tables at the moment of querying them. I don’t deal with approximate results, however useful that might be.) I think that the possibility of optimizing COUNT(*) (see mail by Greg Stark in this thread with “the queue of updates with transaction information that are periodically flattened into the aggregate”) can be generalized to generic aggregation that way. The idea would be that a transaction that adds (or deletes or updates) a row in a base table causes a “delta” row version in the matview. 
Selecting from the matview then merges these deltas into one value (for each row that is logically present in the matview). Every once in a while (or rather quite often, if the base tables change often), a VACUUM-like clean-up operation must be run to merge all rows that are “old enough” (i.e., whose transactions are not in flight anymore). Example of a trivial aggregation matview weight_per_kind defined as:

SELECT kind, SUM(weight) FROM fruit GROUP BY kind;

The matview would then physically contain rows such as:

xmin, xmax, kind, weight
1000, 0, 'banana', 123
1000, 0, 'apple', 1
1001, 0, 'banana', 2
1002, 0, 'banana', -3

Which means:
* tx 1000 probably performed a clean-up operation and merged a bunch of banana rows together to yield 123; it also inserted an apple of weight 1.
* tx 1001 inserted a banana of weight 2. Any clean-up operation coming by could not merge the 2 into the first row as long as tx 1001 is in flight. Otherwise, it would yield 125; physically this would mean adding a 125 row, marking the 123 and 2 rows as deleted, and then waiting for VACUUM to remove them.
* tx 1002 deleted a banana of weight 3.

The result of SELECT * FROM weight_per_kind; would actually execute:

SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY kind;

This would result, for tx 1001 (assuming tx 1000 committed and our snapshot can see it), in:

kind, weight
'banana', 125
'apple', 1

(The -3 is not taken into account, because it is not visible to tx 1001.) The operator to use at the location of SUM_merge is something that merges multiple aggregation results (plus results that represent some kind of “negation”) together. For SUM, it would be SUM itself and the negation would be numerical negation. There might also be some kind of “difference” concept used for UPDATE: when updating a weight from 4 to 3, the difference would be -1. 
Those additional properties could optionally be added to the definition of each aggregation function; it must be done for each function that you want to use in such a way. Other aggregation functions such as AVG would require storing the SUM plus the number of rows in the matview (otherwise two AVGs could not be merged); again a difference between the heap and the logical content. Functions such as MIN and MAX are more difficult to fit into this framework: I can only see how it would work if row deletion were not allowed (which might still be a valid use case); luckily, I think MIN and MAX are not the typical things for which you would want to use matviews, because quick computation can typically be done directly using the base tables. This whole thing would result in incremental updates of aggregation matviews that don’t require physical serialization of the transactions that update the base tables and those that query the matview, which other models (that depend on correspondence between the heap and logical content of matviews) would probably require. And that’s where I stop rambling because nobody gets this far anyway,
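The delta-merging idea above can be sketched in a few lines of Python (purely illustrative; the data mirrors the banana/apple example from the mail, while the function and variable names are invented). Each physical row carries the xmin that created it, and a read SUM-merges all deltas visible to the reader's snapshot:

```python
from collections import defaultdict

# Physical heap of the matview "weight_per_kind": (xmin, kind, weight_delta).
# xmax is omitted for brevity.
heap = [
    (1000, 'banana', 123),   # merged by the tx-1000 clean-up operation
    (1000, 'apple',    1),
    (1001, 'banana',   2),   # insert of a weight-2 banana
    (1002, 'banana',  -3),   # delete of a weight-3 banana
]

def select_weight_per_kind(visible_xids):
    """SUM-merge all delta rows whose creating transaction is visible."""
    totals = defaultdict(int)
    for xmin, kind, delta in heap:
        if xmin in visible_xids:
            totals[kind] += delta
    return dict(totals)

# Snapshot of tx 1001: it sees tx 1000's commit and its own work, not tx 1002.
print(select_weight_per_kind({1000, 1001}))
# {'banana': 125, 'apple': 1}
```

A later snapshot that also sees tx 1002 would merge in the -3 and report banana = 122, which is what the VACUUM-like clean-up would eventually materialize as a single row.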
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
Hi, On 2013-03-07 15:21:35 -0800, Josh Berkus wrote: This fact imo reduces the usability of the matview feature as it stands atm considerably. I think we should be very careful not to advocate its existence much and document very clearly that it's a work in progress. Working incrementally is a sensible thing to do, don't get me wrong... -1 from me. Postgres is currently full of fairly innocent-looking commands which take an unexpected ACCESS EXCLUSIVE lock. For example, DROP CONSTRAINT takes an AccessExclusive lock, but it hasn't stopped people from using constraints, and it isn't particularly high up on our todo list to fix. That's a pretty unconvincing comparison. There isn't any expectation that ALTER TABLE works without taking exclusive locks in common implementations, and DROP CONSTRAINT only takes a very short time while refreshing a materialized view possibly takes rather long. This limitation is in no way crippling for this feature, or even a major detraction. I still intend to promote the heck out of this feature. That's scaring me. Because the current state of the feature isn't something that people expect under the term materialized views, and I am pretty damn sure people will then remember postgres as trying to provide a tick-box item without it being really usable in the real world. And that's not something I want postgres to be known for. Note that I *really* think working incrementally on such things is the way to go, and I think it's good that this got committed in 9.3. But if this now gets used prominently in promotion in its current state, I think the conclusion will be that working incrementally in postgres isn't the way to go, and that will make it *much* harder to do so in future releases. Which will slow postgres down. Greetings, Andres Freund
[HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
[Moving to -hackers] On Sat, Feb 23, 2013 at 2:51 PM, Pavel Stehule pavel.steh...@gmail.com wrote: so * --conditional-drops replaced by --if-exists Thanks for the fixes, I played around with the patch a bit. I was sort of expecting this example to work (after setting up the regression database with `make installcheck`):

pg_dump --clean --if-exists -Fp -d regression --file=regression.sql
createdb test
psql -v ON_ERROR_STOP=1 --single-transaction -d test -f regression.sql

But it fails, first at:

...
DROP TRIGGER IF EXISTS tsvectorupdate ON public.test_tsvector;
ERROR:  relation "public.test_tsvector" does not exist

This seems like a shortcoming of DROP TRIGGER ... IF EXISTS, and it looks like DROP RULE ... IF EXISTS has the same problem. I recall DROP ... IF EXISTS being fixed recently not to error out if the schema specified for the object does not exist, and ISTM the same arguments could be made in favor of fixing DROP TRIGGER/TABLE ... IF EXISTS not to error out if the table doesn't exist. Working further through the dump of the regression database, these also present problems for --clean --if-exists dumps:

DROP CAST IF EXISTS (text AS public.casttesttype);
ERROR:  type "public.casttesttype" does not exist
DROP OPERATOR IF EXISTS public.% (point, widget);
ERROR:  type "widget" does not exist
DROP FUNCTION IF EXISTS public.pt_in_widget(point, widget);
ERROR:  type "widget" does not exist

I'm not sure whether DROP CAST/OPERATOR/FUNCTION IF EXISTS should be more tolerant of nonexistent types, or if the mess could perhaps be avoided by dump reordering. Note, this usability problem affects unpatched head as well:

pg_dump -Fc -d regression --file=regression.dump
pg_restore --clean -1 -d regression regression.dump
...
pg_restore: [archiver (db)] could not execute query: ERROR:  type "widget" does not exist
    Command was: DROP FUNCTION public.widget_out(widget);

(The use here is a little different than the first example above, but I would still expect this case to work.) 
The above problems with IF EXISTS aren't really a problem of the patch per se, but IMO it would be nice to straighten all the issues out together for 9.4. * -- additional check, available only with -c option Cool. I think it would also be useful to check that --clean may only be used with --format=p to avoid any confusion there. (This issue could be addressed in a separate patch if you'd rather not lard this patch.) Some comments on the changes:

1. There is at least one IF EXISTS check missing from pg_dump.c; see for example this statement from a dump of the regression database with --if-exists:

ALTER TABLE public.nv_child_2010 DROP CONSTRAINT nv_child_2010_d_check;

2. Shouldn't pg_restore get --if-exists as well?

3. + printf(_("  --if-exists              don't report error if cleaned object doesn't exist\n"));

This help output bleeds just over our de facto 80-character limit. Also, contractions seem to be avoided elsewhere. It's a little hard to squeeze a decent explanation into one line, but perhaps "use IF EXISTS when dropping objects" would be better. The sgml changes could use some wordsmithing and grammar fixes. I could clean these up for you in a later version if you'd like.

4. There seem to be spurious whitespace changes to the function prototype and declaration for _printTocEntry.

That's all I've had time for so far... Josh
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
Postgres is currently full of fairly innocent-looking commands which take an unexpected ACCESS EXCLUSIVE lock. For example, DROP CONSTRAINT takes an AccessExclusive lock, but it hasn't stopped people from using constraints, and it isn't particularly high up on our todo list to fix. That's a pretty unconvincing comparison. There isn't any expectation that ALTER TABLE works without taking exclusive locks in common Not exclusive (which is expected), but AccessExclusive (which catches many of our users by surprise). How about the fact that dropping an FK constraint takes an AccessExclusiveLock on the *referenced* table? implementations and DROP CONSTRAINT only takes a very short time while refreshing a materialized view possibly takes rather long. Right now there are no expectations at all about our new matview feature. I think putting the locking information in the docs is the right way to go. That's scaring me. Because the current state of the feature isn't something that people expect under the term materialized views, and I am pretty damn sure people will then remember postgres as trying to provide a tick-box item without it being really usable in the real world. And that's not something I want postgres to be known for. We promoted the heck out of binary replication when it was barely usable. We've gotten huge interest in our JSON support, even when it's a work-in-progress. I don't see why I should change an approach to advocacy which is clearly working. What our project considers an incomplete feature other OSS DBMSes call a version 2.0. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel
On 2013-03-07 15:54:32 -0800, Josh Berkus wrote: Postgres is currently full of fairly innocent-looking commands which take an unexpected ACCESS EXCLUSIVE lock. For example, DROP CONSTRAINT takes an AccessExclusive lock, but it hasn't stopped people from using constraints, and it isn't particularly high up on our todo list to fix. That's a pretty unconvincing comparison. There isn't any expectation that ALTER TABLE works without taking exclusive locks in common Not exclusive (which is expected), but AccessExclusive (which catches many of our users by surprise). How about the fact that dropping an FK constraint takes an AccessExclusiveLock on the *referenced* table? All of that is DDL. implementations and DROP CONSTRAINT only takes a very short time while refreshing a materialized view possibly takes rather long. Right now there are no expectations at all about our new matview feature. I think putting the locking information in the docs is the right way to go. That should definitely be done. The point is that a) refreshing is the only way to update materialized views - there's no incremental support - and b) refreshing will take a long time (otherwise you wouldn't have created a materialized view) and you can't use the view during that. Which means that anyone wanting to use matviews in a busy environment will need to build the new matview separately and then move it into place via renames, with all the issues that brings, like needing to recreate dependent views and such. Sorry, but that's not very useful except (and there very much so) as a stepping stone for further work. That's scaring me. Because the current state of the feature isn't something that people expect under the term materialized views, and I am pretty damn sure people will then remember postgres as trying to provide a tick-box item without it being really usable in the real world. And that's not something I want postgres to be known for. We promoted the heck out of binary replication when it was barely usable. 
We've gotten huge interest in our JSON support, even when it's a work-in-progress. I don't see why I should change an approach to advocacy which is clearly working. What our project considers an incomplete feature other OSS DBMSes call a version 2.0. I heard some people grumble about binary replication in 9.0, but there were loads of real-world scenarios in which it could be used. I heard quite some people being annoyed about the level of json support even though it provided some usefulness with row_to_json (or whatever it's called). And it's a feature that can be extended by extensions. And lots of the deficiencies of binary replication could be solved by outside tooling. That's not possible with matviews as is. Which, again, is *totally fine* in itself. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Materialized views and unique indexes
Hi all, While testing materialized views, I found the following behavior with unique indexes:

postgres=# create table aa as select generate_series(1,3) as a;
SELECT 3
postgres=# create materialized view aam as select * from aa;
SELECT 3
postgres=# create unique index aam_ind on aam(a);
CREATE INDEX
postgres=# insert into aa values (4);
INSERT 0 1
postgres=# insert into aa values (1);
INSERT 0 1
postgres=# refresh materialized view aam;
ERROR:  could not create unique index "aam_ind"
DETAIL:  Key (a)=(1) is duplicated.
postgres=# select * from aam;
 a
---
 1
 2
 3
(3 rows)

As expected, the refresh failed, but the error message is not really user-friendly. Shouldn't we output something like this instead?

ERROR: could not refresh materialized view because of a failure when rebuilding an index
DETAIL: key is duplicated

Thanks, -- Michael
Re: [HACKERS] Materialized views and unique indexes
As expected, the refresh failed, but the error message is not really user-friendly. Shouldn't we output instead something like that? ERROR: could not refresh materialized view because of failure when rebuilding index DETAIL: key is duplicated. Is there a good reason to allow unique indexes (or constraints in general) on matviews? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views and unique indexes
On Fri, Mar 8, 2013 at 11:33 AM, Josh Berkus j...@agliodbs.com wrote: As expected, the refresh failed, but the error message is not really user-friendly. Shouldn't we output instead something like that? ERROR: could not refresh materialized view because of failure when rebuilding index DETAIL: key is duplicated. Is there a good reason to allow unique indexes (or constraints in general) on matviews? Don't think so. It would make sense to block the creation of all the constraints on matviews. Just based on the docs, matviews cannot have constraints: http://www.postgresql.org/docs/devel/static/sql-altermaterializedview.html Now that you mention it, you can create constraints on them (code at c805659).

postgres=# create table aa (a int);
CREATE TABLE
postgres=# create materialized view aam as select * from aa;
SELECT 0
postgres=# alter materialized view aam add constraint popo unique(a);
ALTER MATERIALIZED VIEW
postgres=# \d aam
Materialized view "public.aam"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
Indexes:
    "popo" UNIQUE CONSTRAINT, btree (a)

Also, as it is not mandatory for a unique index to be a constraint, I think that we should block the creation of unique indexes too to avoid any problems. Any suggestions? -- Michael
Re: [HACKERS] Materialized views and unique indexes
On 03/08/2013 10:55 AM, Michael Paquier wrote: Also, as it is not mandatory for a unique index to be a constraint, I think that we should block the creation of unique indexes too to avoid any problems. Any suggestions? How much does the planner benefit from the implied constraint of a unique index? I almost wonder if it should be allowed at the cost of making the refresh of a matview that fails to comply an error. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enabling Checksums
On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. I see Heroku has corruption experience, and I know Jim Nasby has struggled with corruption in the past. I also see the checksum patch is taking a beating. I wanted to step back and ask what percentage of known corruption cases this checksum patch will detect. What percentage of these corruptions would filesystem checksums have detected? Also, don't all modern storage drives have built-in checksums, and report problems to the system administrator? Does smartctl help report storage corruption?
Let me take a guess at answering this --- we have several layers in a database server:

1. storage
2. storage controller
3. file system
4. RAM
5. CPU

My guess is that storage checksums only cover layer 1, while our patch covers layers 1-3, and probably not 4-5 because we only compute the checksum on write. If that is correct, the open question is what percentage of corruption happens in layers 1-3? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
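To make the "checksum on write" point concrete, here is a minimal, hypothetical sketch in Python. It is *not* PostgreSQL's actual page checksum algorithm or page layout (the patch uses its own algorithm; the page size, header offset, and CRC32 here are made up for illustration). The point it illustrates is exactly Bruce's: a checksum computed right before the page is handed to the kernel catches corruption introduced below that point (file system, controller, storage), but cannot catch a byte that was already wrong in RAM when the checksum was taken.

```python
import zlib

PAGE_SIZE = 8192
CKSUM_OFF = 8  # hypothetical offset of a 4-byte checksum field in the page header


def set_checksum(page: bytearray) -> None:
    """Compute a CRC over the page with the checksum field zeroed, then
    store it in the header (mimics checksum-on-write in the backend)."""
    page[CKSUM_OFF:CKSUM_OFF + 4] = b"\x00\x00\x00\x00"
    c = zlib.crc32(bytes(page)) & 0xFFFFFFFF
    page[CKSUM_OFF:CKSUM_OFF + 4] = c.to_bytes(4, "little")


def verify_checksum(page: bytes) -> bool:
    """Recompute the CRC on read and compare with the stored value."""
    stored = int.from_bytes(page[CKSUM_OFF:CKSUM_OFF + 4], "little")
    scratch = bytearray(page)
    scratch[CKSUM_OFF:CKSUM_OFF + 4] = b"\x00\x00\x00\x00"
    return (zlib.crc32(bytes(scratch)) & 0xFFFFFFFF) == stored


page = bytearray(PAGE_SIZE)
page[100] = 42           # some tuple data
set_checksum(page)       # checksum taken at write time (covers layers 1-3)
assert verify_checksum(page)

page[100] = 43           # corruption below the write path: storage flipped a byte
assert not verify_checksum(page)
```

Corruption in layers 4-5 (RAM, CPU) that happens *before* `set_checksum` runs would be faithfully checksummed and therefore undetectable by this scheme, which is why the patch cannot cover those layers.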
Re: [HACKERS] Materialized views and unique indexes
Craig Ringer cr...@2ndquadrant.com writes: On 03/08/2013 10:55 AM, Michael Paquier wrote: Also, as it is not mandatory for a unique index to be a constraint, I think that we should block the creation of unique indexes too to avoid any problems. Any suggestions? How much does the planner benefit from the implied constraint of a unique index? I almost wonder if it should be allowed at the cost of making the refresh of a matview that fails to comply an error. A unique constraint can allow join elimination, so I'm thinking that disallowing them is a bad idea (not to mention that it'd be a considerable wart in the code to block them for matviews only). regards, tom lane
Re: [HACKERS] Materialized views and unique indexes
On Fri, Mar 8, 2013 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Craig Ringer cr...@2ndquadrant.com writes: On 03/08/2013 10:55 AM, Michael Paquier wrote: Also, as it is not mandatory for a unique index to be a constraint, I think that we should block the creation of unique indexes too to avoid any problems. Any suggestions? How much does the planner benefit from the implied constraint of a unique index? I almost wonder if it should be allowed at the cost of making the refresh of a matview that fails to comply an error. A unique constraint can allow join elimination, so I'm thinking that disallowing them is a bad idea (not to mention that it'd be a considerable wart in the code to block them for matviews only). Fair argument. The error message at the refresh step should be more explicit, though. I still have the feeling that users might be lost if a constraint defined on a matview fails during refresh with the current error message. -- Michael
Re: [HACKERS] Enabling Checksums
2013/3/8 Bruce Momjian br...@momjian.us: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. I see Heroku has corruption experience, and I know Jim Nasby has struggled with corruption in the past. I also see the checksum patch is taking a beating. I wanted to step back and ask what percentage of known corruptions cases will this checksum patch detect? What percentage of these corruptions would filesystem checksums have detected? Also, don't all modern storage drives have built-in checksums, and report problems to the system administrator? Does smartctl help report storage corruption? 
Let me take a guess at answering this --- we have several layers in a database server:

1. storage
2. storage controller
3. file system
4. RAM
5. CPU

My guess is that storage checksums only cover layer 1, while our patch covers layers 1-3, and probably not 4-5 because we only compute the checksum on write. If that is correct, the open question is what percentage of corruption happens in layers 1-3? I work with a major Czech bank, and they request checksums like any other tool that increases the chance of identifying failures. So missing checksums hurt PostgreSQL's usability for critical systems; speed is not too important there. Regards Pavel -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Enabling Checksums
On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian br...@momjian.us wrote: On Mon, Mar 4, 2013 at 05:04:27PM -0800, Daniel Farina wrote: Putting aside the not-so-rosy predictions seen elsewhere in this thread about the availability of a high performance, reliable checksumming file system available on common platforms, I'd like to express what benefit this feature will have to me: Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate in this year to date. The ability to quickly, accurately, and maybe at some later date proactively finding good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples...basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last-good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. I see Heroku has corruption experience, and I know Jim Nasby has struggled with corruption in the past. More than a little: it has entered the realm of the routine, and happens frequently enough that it has become worthwhile to start looking for patterns. Our methods so far rely heavily on our archives to deal with it: it's time consuming but the 'simple' case of replaying WAL from some earlier base backup resulting in a non-corrupt database is easily the most common. 
Interestingly, while dealing with corruption, WAL replay has never failed halfway through because of CRC failures[0]. We know this fairly convincingly because we constantly sample txid and WAL positions while checking the database, as we typically do about every thirty seconds. I think the unreasonable effectiveness of this strategy of an old backup plus WAL replay might suggest that database checksums would prove useful. In my mind, the odds that this formula could work so well if the bug were RAM- or CPU-based are slimmed considerably. [0] I have seen -- very rarely -- substantial periods of severe WAL corruption (files are not even remotely the correct size) propagated to the archives in the case of disaster recovery where the machine met its end because of the WAL disk being marked as dead. -- fdr
Re: [HACKERS] 9.2.3 crashes during archive recovery
(2013/03/07 19:41), Heikki Linnakangas wrote: On 07.03.2013 10:05, KONDO Mitsumasa wrote: (2013/03/06 16:50), Heikki Linnakangas wrote: Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real issue here is that I missed the two return NULL;s in ReadRecord(), so the code that I put in the next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper fix for this. Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does not handle the promote command... When XLogPageRead() returns false, it means the end of the standby loop, the crash recovery loop, and the archive recovery loop. Your patch is not good for promoting a standby to master: it does not break out of the standby loop. So I made a new patch which is based on Heikki's and Horiguchi's patches. Ah, I see. I committed a slightly modified version of this. I find your modified version very readable. Thanks for your modification and for committing the patch! I also found a bug in the latest 9.2_stable. It does not get the latest timeline and the recovery history file during archive recovery when the master and standby timelines are different. Works for me.. Can you create a test script for that? Remember to set recovery_target_timeline='latest'. ... It can be reproduced by my test script, too. I see the problem now, with that script. So what happens is that the startup process first scans the timeline history files to choose the recovery target timeline. For that scan, I temporarily set InArchiveRecovery=true, in readRecoveryCommandFile. However, after readRecoveryCommandFile returns, we then try to read the timeline history file corresponding to the chosen recovery target timeline, but InArchiveRecovery is no longer set, so we don't fetch the file from the archive, and return a dummy history, with just the target timeline in it.
That doesn't contain the older timeline, so you get an error at recovery. Fixed per your patch to check for ArchiveRecoveryRequested instead of InArchiveRecovery when reading timeline history files. This also makes it unnecessary to temporarily set InArchiveRecovery=true, so I removed that. Committed both fixes. Please confirm that this fixed the problem in your test environment. Many thanks for the testing and the patches! I understand this problem. Thank you for the additional fixes and the detailed explanation! I tested the latest REL9_2_STABLE on my system and confirmed that it runs without problems. If I find another problem, I will report it and send you a patch and test script! Best regards, -- Mitsumasa KONDO NTT OSS Center
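For readers unfamiliar with timeline history files, here is a rough, hypothetical Python sketch of the ancestry check behind the error "requested timeline 2 is not a child of database system timeline 1". The file layout shown (one ancestor per line: parent TLI, switchpoint, reason) follows the general shape of PostgreSQL's NNNNNNNN.history files, but the parser and function names are made up for illustration; this is not the server's actual code. The bug Heikki describes amounts to checking a dummy history with no ancestors in it, so the current timeline is never found.

```python
def parse_history(text: str) -> list:
    """Return the ancestor timeline IDs listed in a (hypothetical)
    timeline history file: one line per ancestor, first field = TLI."""
    ancestors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ancestors.append(int(line.split()[0]))
    return ancestors


def is_child_of(current_tli: int, target_tli: int, history_text: str) -> bool:
    """The target timeline is usable iff the current timeline is the
    target itself or appears among the target's listed ancestors."""
    return current_tli == target_tli or current_tli in parse_history(history_text)


# Proper history for timeline 2: it branched off timeline 1.
history_2 = "1\t0/3000000\tno recovery target specified\n"
assert is_child_of(1, 2, history_2)

# The bug: a dummy history with no ancestors, as produced when the real
# 00000002.history was never fetched from the archive -> spurious error.
assert not is_child_of(1, 2, "")
```

Checking ArchiveRecoveryRequested instead of InArchiveRecovery ensures the real history file is fetched, so the ancestor list actually contains timeline 1 and the check passes.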
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Hello, Patches can be reviewed by more than one person. It's particularly useful if they have different things to say. So don't hesitate to review patches that have already been reviewed by other people. Thanks for the advice. I was unsure who counts as a reviewer, and when to decide whether the patch has reached the required level or not; I'll take it more easily from now on. In fact, you can even review committed patches; it's not unlikely that you will be able to find bugs in those, too. Umm.. to be sure.. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] 9.2.3 crashes during archive recovery
Everything seems to have been settled above my head while I was sleeping.. Sorry for the clumsy test script, and thank you for refining it, Mitsumasa. And thank you for fixing the bug and for the detailed explanation, Heikki. I confirmed that the problem is fixed for me as well at origin/REL9_2_STABLE. I understand this problem. Thank you for the additional fixes and the detailed explanation! I tested the latest REL9_2_STABLE on my system and confirmed that it runs without problems. If I find another problem, I will report it and send you a patch and test script! regards, -- Kyotaro Horiguchi Nippon Telegraph and Telephone Corporation, NTT Open Source Software Center Phone: 03-5860-5115 / Fax: 03-5463-5490