Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa

(2013/03/06 16:50), Heikki Linnakangas wrote:

Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive
recovery]
I think that this is the original intention of Heikki's patch.


Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, 
and false on error, and the patch makes it return 'true' on error, if archive recovery 
was requested but we're still in crash recovery. The real issue here is that I missed the 
two "return NULL;"s in ReadRecord(), so the code that I put in the 
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. 
Attached patch is the proper fix for this.


Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does not 
work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop, the crash 
recovery loop, and the archive recovery loop.
Your patch is not good for promoting a standby to master: it never leaves the 
standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.
I also attach a test script, a modified version of Horiguchi's script. This script 
does not depend on the shell environment; only PGPATH needs to be adjusted.
Please run this test script.



I also found a bug in the latest 9.2_STABLE. It does not fetch the latest
timeline and recovery history file during archive recovery when the master
and standby timelines differ.


Works for me... Can you create a test script for that? Remember to set 
recovery_target_timeline='latest'.

I set recovery_target_timeline=latest. hmm...

Here is my recovery.conf.

[mitsu-ko@localhost postgresql]$ cat Standby/recovery.conf
standby_mode = 'yes'
recovery_target_timeline='latest'
primary_conninfo='host=localhost port=65432'
restore_command='cp ../arc/%f %p'

And my system's log message is here.

waiting for server to start...
[Standby] LOG:  database system was shut down in recovery at 2013-03-07 02:56:05 EST
[Standby] LOG:  restored log file "00000002.history" from archive
cp: cannot stat `../arc/00000003.history': No such file or directory
[Standby] FATAL:  requested timeline 2 is not a child of database system 
timeline 1
[Standby] LOG:  startup process (PID 20941) exited with exit code 1
[Standby] LOG:  aborting startup due to startup process failure

It can be reproduced with my test script, too.
The final master start command in my test script may look unusual, but it is
typical of a PostgreSQL-with-Pacemaker system.


Best regards,
--
Mitsumasa KONDO
NTT OSS Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..2486683 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4176,15 @@ retry:
 			}
 			/* Wait for the next page to become available */
 			if (!XLogPageRead(pagelsn, emode, false, false))
+			{
+				/*
+				 * If archive recovery was requested when crash recovery failed, go to
+				 * the label next_record_is_invalid to switch to archive recovery.
+				 */
+				if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					goto next_record_is_invalid;
 				return NULL;
+			}
 
 			/* Check that the continuation record looks valid */
 			if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..591e8c0 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4446,7 +4462,7 @@ readTimeLineHistory(TimeLineID targetTLI)
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);
 
-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
#! /bin/sh
echo "initial settings"
PGPATH=`pwd`/bin
PATH=$PGPATH:$PATH
PGDATA0=Master
PGDATA1=Standby
PGARC=arc
mkdir $PGARC
PGPORT0=65432
PGPORT1=65433
unset PGPORT
unset PGDATA
echo "Postgresql is \`which postgres\`"
killall -9 postgres
rm -rf $PGDATA0 $PGDATA1 $PGARC/*
initdb $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=$PGPORT0
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p ../$PGARC/%f'
max_wal_senders = 3
hot_standby = on
log_min_messages = debug1
log_line_prefix = '[Master] '
EOF
cat  

Re: [HACKERS] sql_drop Event Trigger

2013-03-07 Thread Robert Haas
On Tue, Mar 5, 2013 at 5:37 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Okay, I added a couple of lines to skip reporting dropped temp schemas,
 and to skip any objects belonging to any temp schema (not just my own,
 note).  Not posting a new version because the change is pretty trivial.

 Now, one last thing that comes up is what about objects that don't have
 straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc), the
 only thing you get is a catalog OID and an object OID ... but they are
 pretty useless because by the time you get to the ddl_command_end
 trigger, the objects are gone from the catalog.  Maybe we should report
 *something* about those.  Say, perhaps the object description ... but if
 we want that, it should be untranslated (i.e. not just what
 getObjectDescription gives you, because that may be translated, so we
 would need to patch it so that it only translates if the caller requests
 it)

Broadly, I suggest making the output format match as exactly as
possible what commands like COMMENT and SECURITY LABEL accept as
input.  We've already confronted all of these notational issues there.
 Columns are identified as COLUMN table.name; functions as FUNCTION
function_name(argtypes); etc.  Of course it's fine to split the object
type off into a separate column, but it should have the same name here
that it does there.
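
For illustration, the input forms in question look like this (object
names are made up; the sepgsql provider is assumed for the label
example):

-- Real syntax, hypothetical objects: the identification forms that
-- COMMENT and SECURITY LABEL already accept as input.
COMMENT ON COLUMN my_table.my_col IS 'a column, written as COLUMN table.name';
COMMENT ON FUNCTION my_func(integer, text) IS 'a function, written with its argument types';
SECURITY LABEL FOR selinux ON TABLE my_table IS 'system_u:object_r:sepgsql_table_t:s0';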

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Simon Riggs
On 6 March 2013 14:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.

 As we already said: no MVs = zero overhead = no problem.

 Well, in the first place that statement is false on its face: we'll
 still spend cycles looking for relevant MVs, or at least maintaining a
 complexly-indexed cache that helps us find out that there are none in
 a reasonable amount of time.  In the second place, even if it were
 approximately true it wouldn't help the people who were using MVs.

We can store info in the relcache, and reduce such a lookup to a
simple if test in the planner. Populating the cache would be easy
enough, approx same overhead as deriving list of constraints for the
relcache.

If you were using MVs, there are further heuristics to apply. MVs come
in various shapes, so we can assess whether they use aggregates,
joins, filters etc and use that for a general match against a query. I
don't see the need for complex assessments in every case.


 It costs in
 the cases where time savings are possible and not otherwise.

 And that is just complete nonsense: matching costs whether you find a
 match or not.  Could we have a little less Pollyanna-ish optimism and
 a bit more realism about the likely cost of such a feature?

It's not a trivial feature; this is a lot of work. But it can be done
efficiently, without significant effect on other workloads. If that
really were to be true, then enable_lookaside = off can be the
default, just as we have for another costly planning feature,
constraint_exclusion.
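
For illustration, the analogy is with how an existing costly planner
feature is already surfaced as a GUC (constraint_exclusion is real;
enable_lookaside is only a name proposed in this discussion):

-- constraint_exclusion ships with a conservative default:
SET constraint_exclusion = partition;  -- default: only inheritance/UNION ALL cases
SET constraint_exclusion = on;         -- pay the planning cost for every query
-- A matview lookaside switch could be exposed the same way, default off:
-- SET enable_lookaside = off;         -- hypothetical GUC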

Matview lookaside is the next-best-action for further work on the
planner, AFAICS. Correctly optimised query parallelism is harder,
IMHO.

What I'm hearing at the moment is "please don't make any changes in my
area" or "don't climb the North face". Considering the rather high bar
to being able to do this effectively, I do understand your interest in
not having your/our time wasted by casual attempts to approach the
problem, but I don't want to slam the door on a serious attempt (2
year project, 1+ man year effort).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-07 Thread Kohei KaiGai
2013/3/6 Tom Lane t...@sss.pgh.pa.us:
 Also, the right way to deal with this desire is to teach the planner in
 general, not just FDWs, about pushing expensive calculations down the
 plan tree --- basically, resurrecting Joe Hellerstein's thesis work,
 which we ripped out more than ten years ago.  I don't think there's that
 much that would need to change about the planner's data structures, but
 deciding where is cheapest to evaluate an expression is a pretty hard
 problem.  Trying to handle that locally within FDWs is doomed to failure
 IMO.

It also seems to me a more attractive direction to support a mechanism
to off-load calculation of expensive expressions in general, not only
for foreign tables.

Probably, we have to include the cost of computing expression nodes
when choosing a plan node. For example, if an extension injected
a pseudo scan node that materializes the underlying regular table and
computes the target entries using SIMD operations, its cost to compute
a target entry might be 25% of the regular one, so the planner should
choose this plan node when it is advantageous.

 So my intention is to get rid of GetForeignRelWidth() and make use of
 the existing ctid system column for returning remote TIDs in
 postgres_fdw.  (On review I notice that we're creating ctid columns
 for foreign tables already, so we don't even need the proposed hook
 to let FDWs control that; though we will certainly want one in future
 if we are to support non-TID magic row identifiers.)

OK. It gets back to the initial implementation that uses TID system column.
I'll try to adjust my patches by the next week.

Probably, in future development we should allow FDW drivers to specify
the data type of the TID system column, rather than adding new system
columns, because a non-TID magic row identifier would be used
exclusively in place of the TID.

 If you find that unacceptable, I'm quite willing to mark this patch
 Returned With Feedback and get on with dealing with some of the other
 forty-odd patches in the CF queue.

A bird is a reasonable outcome with a stone, even though it is not two birds.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] sql_drop Event Trigger

2013-03-07 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Here's another idea --- have three columns, type, schema (as in the
 current patch and as shown above), and a third one for object identity.

 For tables and other objects that have simple names, the identity would
 be their names.  For columns, it'd be tablename.columnname.  For
 functions, it'd be the complete signature.  And so on.

Sounds very good as an extra column yes.

Robert Haas robertmh...@gmail.com writes:
 Broadly, I suggest making the output format match as exactly as
 possible what commands like COMMENT and SECURITY LABEL accept as
 input.  We've already confronted all of these notational issues there.
  Columns are identified as COLUMN table.name; functions as FUNCTION
 function_name(argtypes); etc.  Of course it's fine to split the object
 type off into a separate column, but it should have the same name here
 that it does there.

I would like the format to be easily copy/paste'able to things such as
regclass/regtype/regprocedure casts, and apparently the COMMENT input
format seems to be the same as that one, so +1 from me.

-- 
Dimitri Fontaine   06 63 07 10 78
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Heikki Linnakangas

On 07.03.2013 10:05, KONDO Mitsumasa wrote:

(2013/03/06 16:50), Heikki Linnakangas wrote:

Yeah. That fix isn't right, though; XLogPageRead() is supposed to
return true on success, and false on error, and the patch makes it
return 'true' on error, if archive recovery was requested but we're
still in crash recovery. The real issue here is that I missed the two
"return NULL;"s in ReadRecord(), so the code that I put in the
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't
find the file at all. Attached patch is the proper fix for this.


Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it
does not work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop,
the crash recovery loop, and the archive recovery loop.
Your patch is not good for promoting a standby to master: it never
leaves the standby loop.

So I made a new patch based on Heikki's and Horiguchi's patches.


Ah, I see. I committed a slightly modified version of this.


I also found a bug in the latest 9.2_STABLE. It does not fetch the
latest timeline and recovery history file during archive recovery when
the master and standby timelines differ.


Works for me... Can you create a test script for that? Remember to set
recovery_target_timeline='latest'.

...
It can be reproduced in my test script, too.


I see the problem now, with that script. So what happens is that the 
startup process first scans the timeline history files to choose the 
recovery target timeline. For that scan, I temporarily set 
InArchiveRecovery=true in readRecoveryCommandFile. However, after 
readRecoveryCommandFile returns, we then try to read the timeline 
history file corresponding to the chosen recovery target timeline, but 
InArchiveRecovery is no longer set, so we don't fetch the file from 
archive, and return a dummy history, with just the target timeline in 
it. That doesn't contain the older timeline, so you get an error at 
recovery.


Fixed per your patch to check for ArchiveRecoveryRequested instead of 
InArchiveRecovery, when reading timeline history files. This also makes 
it unnecessary to temporarily set InArchiveRecovery=true, so removed that.


Committed both fixes. Please confirm that this fixed the problem in your 
test environment. Many thanks for the testing and the patches!


- Heikki




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-03-07 Thread Andres Freund
On 2013-03-05 23:26:59 +0200, Heikki Linnakangas wrote:
 On 04.03.2013 06:39, Amit Kapila wrote:
 The stats look fairly sane. I'm a little concerned about the apparent
 trend of falling TPS in the patched vs original tests for the 1-client
 test as record size increases, but it's only 0.0%-0.2%-0.4%, and the
 0.4% case made other config changes too. Nonetheless, it might be wise
 to check with really big records and see if the trend continues.
 
 For bigger (~2000 byte) records, the data goes into TOAST, for which we don't do
 this optimization.
 This optimization is mainly for medium-size records.
 
 I've been investigating the pglz option further, and doing performance
 comparisons of the pglz approach and this patch. I'll begin with some
 numbers:
 
 unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):
 
 testname | wal_generated | duration
 
 -----------------------------------------+---------------+------------------
  two short fields, no change |1245525360 | 9.94613695144653
  two short fields, one changed   |1245536528 |  10.146910905838
  two short fields, both changed  |1245523160 | 11.2332470417023
  one short and one long field, no change |1054926504 | 5.90477800369263
  ten tiny fields, all changed|1411774608 | 13.4536008834839
  hundred tiny fields, all changed| 635739680 | 7.57448387145996
  hundred tiny fields, half changed   | 636930560 | 7.56888699531555
  hundred tiny fields, half nulled| 573751120 | 6.68991994857788
 
 Amit's wal_update_changes_v10.patch:
 
 testname | wal_generated | duration
 
 -----------------------------------------+---------------+------------------
  two short fields, no change |1249722112 | 13.0558869838715
  two short fields, one changed   |1246145408 | 12.9947438240051
  two short fields, both changed  |1245951056 | 13.0262880325317
  one short and one long field, no change | 678480664 | 5.70031690597534
  ten tiny fields, all changed|1328873920 | 20.0167419910431
  hundred tiny fields, all changed| 638149416 | 14.4236788749695
  hundred tiny fields, half changed   | 635560504 | 14.8770561218262
  hundred tiny fields, half nulled| 558468352 | 16.2437210083008
 
 pglz-with-micro-optimizations-1.patch:
 
  testname | wal_generated | duration
 -----------------------------------------+---------------+------------------
  two short fields, no change |1245519008 | 11.6702048778534
  two short fields, one changed   |1245756904 | 11.3233819007874
  two short fields, both changed  |1249711088 | 11.6836447715759
  one short and one long field, no change | 664741392 | 6.44810795783997
  ten tiny fields, all changed|1328085568 | 13.9679481983185
  hundred tiny fields, all changed| 635974088 | 9.15514206886292
  hundred tiny fields, half changed   | 636309040 | 9.13769292831421
  hundred tiny fields, half nulled| 496396448 | 8.77351498603821
 
 In each test, a table is created with a large number of identical rows, and
 fillfactor=50. Then a full-table UPDATE is performed, and the UPDATE is
 timed. Duration is the time spent in the UPDATE (lower is better), and
 wal_generated is the amount of WAL generated by the updates (lower is
 better).
 
 The summary is that Amit's patch is a small win in terms of CPU usage, in
 the best case where the table has few columns, with one large column that is
 not updated. In all other cases it just adds overhead. In terms of WAL size,
 you get a big gain in the same best case scenario.
 
 Attached is a different version of this patch, which uses the pglz algorithm
 to spot the similarities between the old and new tuple, instead of having
 explicit knowledge of where the column boundaries are. This has the
 advantage that it will spot similarities, and be able to compress, in more
 cases. For example, you can see a reduction in WAL size in the hundred tiny
 fields, half nulled test case above.
 
 The attached patch also just adds overhead in most cases, but the overhead
 is much smaller in the worst case. I think that's the right tradeoff here -
 we want to avoid scenarios where performance falls off the cliff. That said,
 if you usually just get a slowdown, we certainly can't make this the
 default, and if we can't turn it on by default, this probably just isn't
 worth it.
 
 The attached patch contains the variable-hash-size changes I posted in the
 Optimizing pglz compressor. But in the delta encoding function, it goes
 further than that, and contains some further micro-optimizations: the hash
 is calculated in a rolling fashion, and it uses a specialized version of the
 pglz_hist_add macro that knows that the input can't exceed 4096 bytes. Those
 changes 

Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Kevin Grittner
David E. Wheeler da...@justatheory.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 I also think that something should be done about the
 documentation for indexes.  Right now that always refers to a
 table.  It would clearly be awkward to change that to table
 or materialized view everywhere.  I wonder if most of those
 should be changed to relation with a few mentions that the
 relation could be a table or a materialized view, or whether
 some less intrusive change would be better.  Opinions welcome.

 Isn’t a materialized view really just a table that gets updated
 periodically?

Not exactly.  It is a relation which has some characteristics of a
view (including an entry in pg_rewrite exactly like that for a
view) and some characteristics of a table (including a heap and
optional indexes).  Whether it looks more like a table or more like
a view depends on how you tilt your head.  You could just as easily
say that it is really just a view which periodically caches its
results on disk.  They really are their own thing.  As future
releases add more subtle freshness concepts, incremental updates,
and query rewrite that unique identity will become even more
conspicuous, I think.
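
For what it's worth, the dual nature shows up directly in the catalogs;
a quick sketch ('my_matview' is a placeholder name):

-- A matview has a rewrite rule in pg_rewrite (like a view) and its own
-- heap with a relfilenode (like a table); relkind 'm' marks it as neither.
SELECT c.relkind,
       c.relfilenode <> 0 AS has_heap,
       EXISTS (SELECT 1 FROM pg_rewrite r
               WHERE r.ev_class = c.oid) AS has_rewrite_rule
FROM pg_class c
WHERE c.relname = 'my_matview';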

 And isn’t a non-matierialized view also thought of as a
 “relation”?

Yes.  Tables, views, and materialized views are all relations.

 If the answer to both those questions is “yes,” I think the term
 should remain “table,” with a few mentions that the term includes
 materialized views (and excludes foreign tables).

And if the answers are "not exactly" and "yes"?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] odd behavior in materialized view

2013-03-07 Thread Fujii Masao
On Thu, Mar 7, 2013 at 8:21 AM, Kevin Grittner kgri...@ymail.com wrote:
 Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote:
 Fujii Masao masao.fu...@gmail.com wrote:

 When I accessed the materialized view in the standby server,

 I got the following ERROR message. Looks odd to me. Is this a bug?

 ERROR:  materialized view "hogeview" has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.

 The procedure to reproduce this error message is:

 In the master server:
 CREATE TABLE hoge (i int);
 INSERT INTO hoge VALUES (generate_series(1,100));
 CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
 DELETE FROM hoge;
 REFRESH MATERIALIZED VIEW hogeview;
 SELECT count(*) FROM hogeview;

 In the standby server
 SELECT count(*) FROM hogeview;

 SELECT count(*) succeeds in the master and, as expected, returns 0.
 OTOH, in the standby, it emits the error message.

 Will investigate.

 Thanks!

 And I found another problem. When I ran the following SQLs in the master,
 a PANIC error occurred in the standby.

 CREATE TABLE hoge (i int);
 INSERT INTO hoge VALUES (generate_series(1,100));
 CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
 VACUUM ANALYZE;

 The PANIC error messages that I got in the standby are

 WARNING:  page 0 of relation base/12297/16387 is uninitialized
 CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0
 PANIC:  WAL contains references to invalid pages
 CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0

 base/12297/16387 is the file of the materialized view 'hogeview'.

 I was able to replicate both bugs, and they both appear to be fixed
 by the attached, which I have just pushed.

Thanks! I confirmed that the problem that I reported has disappeared in HEAD.

Unfortunately I found another odd behavior. When I accessed the MV
after VACUUM ANALYZE, I got the following error.

ERROR:  materialized view "hogeview" has not been populated
HINT:  Use the REFRESH MATERIALIZED VIEW command.
STATEMENT:  select * from hogeview where i  10;

The test case to reproduce that is:

create table hoge (i int);
insert into hoge values (generate_series(1,10));
create materialized view hogeview as select * from hoge where i % 2 = 0;
create index hogeviewidx on hogeview (i);
delete from hoge;
refresh materialized view hogeview;
select * from hogeview where i  10;
vacuum analyze;
select * from hogeview where i  10;

The last SELECT command caused the above error.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-07 Thread Fujii Masao
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com wrote:
 The strange thing about hoge_pkey_cct_cct is that it seems to imply
 that an invalid index was reindexed concurrently?

 But I don't see how it could happen either. Fujii, can you reproduce it?

Yes, I can even with the latest version of the patch. The test case to
reproduce it is:

(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));

(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)

(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge

Regards,

-- 
Fujii Masao




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-07 Thread Andres Freund
On 2013-03-07 09:58:58 +0900, Michael Paquier wrote:
+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

 If an invalid index depends on a constraint like a primary key, "drop
 the concurrent index" cannot actually drop the index. In this case, you
 need to issue "alter table ... drop constraint ..." to recover the
 situation. I think this information should be documented.

 I think we just shouldn't set ->isprimary on the temporary indexes. Now
 we switch only the relfilenodes and not the whole index, that should be
 perfectly fine.

Sounds good. But what about other constraint cases, like unique
constraints? Can those other cases also be resolved by not setting
->isprimary?
   
   We should stick with the concurrent index being a twin of the index it
   rebuilds for consistency.
 
  I don't think it's legal. We cannot simply have two indexes with
  'indisprimary'. Especially not if both are valid.
  Also, there will be no pg_constraint row that refers to it, which
  violates very valid expectations that both users and pg may have.
 
  So what to do with that?
  Mark the concurrent index as valid, then validate it and finally mark it
  as invalid inside the same transaction at phase 4?
  That's moving 2 lines of code...
 
 Sorry phase 4 is the swap phase. Validation happens at phase 3.

Why do you want to temporarily mark it as valid? I don't see any
requirement that it is set to that during validate_index() (which imo is
badly named, but...).
I'd just set it to valid in the same transaction that does the swap.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] odd behavior in materialized view

2013-03-07 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:

 Thanks! I confirmed that the problem that I reported has
 disappeared in HEAD.

 Unfortunately I found another odd behavior. When I accessed the
 MV after VACUUM ANALYZE, I got the following error.

 ERROR:  materialized view "hogeview" has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.
 STATEMENT:  select * from hogeview where i  10;

 The test case to reproduce that is:

 create table hoge (i int);
 insert into hoge values (generate_series(1,10));
 create materialized view hogeview as select * from hoge where i % 2 = 0;
 create index hogeviewidx on hogeview (i);
 delete from hoge;
 refresh materialized view hogeview;
 select * from hogeview where i  10;
 vacuum analyze;
 select * from hogeview where i  10;

 The last SELECT command caused the above error.

Thanks.  Will fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread David E. Wheeler
On Mar 7, 2013, at 7:55 AM, Kevin Grittner kgri...@ymail.com wrote:

 If the answer to both those questions is “yes,” I think the term
 should remain “table,” with a few mentions that the term includes
 materialized views (and excludes foreign tables).
 
 And if the answers are "not exactly" and "yes"?

I still tend to think that the term should remain “table,” with brief mentions 
at the top of pages when the term should be assumed to represent tables and 
matviews, and otherwise required disambiguations.

Trying to make the least possible work for you here. :-)

David





Re: [HACKERS] odd behavior in materialized view

2013-03-07 Thread Fujii Masao
On Fri, Mar 8, 2013 at 1:52 AM, Kevin Grittner kgri...@ymail.com wrote:
 Fujii Masao masao.fu...@gmail.com wrote:

 Thanks! I confirmed that the problem that I reported has
 disappeared in HEAD.

 Unfortunately I found another odd behavior. When I accessed the
 MV after VACUUM ANALYZE, I got the following error.

 ERROR:  materialized view "hogeview" has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.
 STATEMENT:  select * from hogeview where i  10;

 The test case to reproduce that is:

 create table hoge (i int);
 insert into hoge values (generate_series(1,10));
 create materialized view hogeview as select * from hoge where i % 2 = 0;
 create index hogeviewidx on hogeview (i);
 delete from hoge;
 refresh materialized view hogeview;
 select * from hogeview where i  10;
 vacuum analyze;
 select * from hogeview where i  10;

 The last SELECT command caused the above error.

 Thanks.  Will fix.

Thanks!

I found one typo in the MV documentation. Please see the attached patch.

Regards,

-- 
Fujii Masao


typo.patch
Description: Binary data



[HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Andres Freund
Hi,

if I understand things correctly REFRESH MATERIALIZED VIEW locks the
materialized view with an AccessExclusiveLock even if the view already
contains data.
I am pretty sure that will - understandably - confuse users, so I vote
for at least including a note about that in the docs.

This fact imo reduces the usability of the matview feature as it
stands atm considerably. I think we should be very careful not to
advocate its existence much and document very clearly that it's work in
progress.
Working incrementally is a sensible thing to do, don't get me wrong...

Making the refresh work concurrently doesn't seem to be too hard if it's
already initialized (a rough SQL-level analogy follows the steps):

1) lock relation exclusively in session mode (or only ShareUpdateExclusive?)
2) build new data into a separate relfilenode
3) switch relfilenode
4) wait till all potential users of the old relfilenode are gone
   (VirtualXactLock games, just as in CREATE INDEX CONCURRENTLY)
5) drop old relfilenode
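
For illustration only, a rough SQL-level analogy of that flow (table
renames instead of a relfilenode swap, so unlike the proposal it still
takes ACCESS EXCLUSIVE locks; all names are hypothetical):

BEGIN;
-- step 2: build the new data separately while readers keep using mv
CREATE TABLE mv_new AS SELECT * FROM base_table;
-- step 3: "swap" -- here by name; the proposal swaps relfilenodes instead
ALTER TABLE mv RENAME TO mv_old;
ALTER TABLE mv_new RENAME TO mv;
-- step 5: drop the old data (steps 1 and 4 have no SQL-level equivalent)
DROP TABLE mv_old;
COMMIT;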

The only problem I see right now is that we might forget to delete the
old relation if we crash during 4), even if we WAL-log it, due to
checkpoints causing that action not to be replayed.
But that seems to be nothing new, I think the same problem exists during
normal table rewrites as well, just the other way round (i.e. we forget
about the new relfilenode).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] odd behavior in materialized view

2013-03-07 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:

 I found one typo in the document of MV. Please see the attached
 patch.

Pushed.  Thanks!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:

 if I understand things correctly REFRESH MATERIALIZED VIEW locks
 the materialized view with an AccessExclusiveLock even if the view
 already contains data.

Yeah.  At the time I had to make a decision on that, REINDEX
CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH
MATERIALIZED VIEW has to rebuild indexes (among other things).  If
we have all the issues sorted out with REINDEX CONCURRENTLY then
the same techniques will probably apply to RMV without too much
difficulty.  It's a bit late to think about that for 9.3, though.

 I am pretty sure that will - understandably - confuse users, so I
 vote for at least including a note about that in the docs.

Will see about working that in.

-Kevin

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Andres Freund
On 2013-03-07 09:55:39 -0800, Kevin Grittner wrote:
 Andres Freund and...@2ndquadrant.com wrote:
 
  if I understand things correctly REFRESH MATERIALIZED VIEW locks
  the materialized view with an AccessExclusiveLock even if the view
  already contains data.
 
 Yeah.  At the time I had to make a decision on that, REINDEX
 CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH
 MATERIALIZED VIEW has to rebuild indexes (among other things).  If
 we have all the issues sorted out with REINDEX CONCURRENTLY then
 the same techniques will probably apply to RMV without too much
 difficulty.  It's a bit late to think about that for 9.3, though.

I don't think that REFRESH MATERIALIZED VIEW has to deal with the same
problems that REINDEX CONCURRENTLY has - after all, there won't be any
DML coming in while it's running. That should get rid of the REINDEX
CONCURRENTLY problems.
There doesn't seem to be any need to use the far more expensive REINDEX
CONCURRENTLY style computation of indexes, which has to scan the heap
multiple times et al. They should just be built on top of the new
matview relation, which is essentially read-only.
 
  I am pretty sure that will - understandably - confuse users, so I
  vote for at least including a note about that in the docs.
 
 Will see about working that in.

Cool.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread anara...@anarazel.de


Kevin Grittner kgri...@ymail.com wrote:

Andres Freund and...@2ndquadrant.com wrote:

 if I understand things correctly REFRESH MATERIALIZED VIEW locks
 the materialized view with an AccessExclusiveLock even if the view
 already contains data.

Yeah.  At the time I had to make a decision on that, REINDEX
CONCURRENTLY did not seem reliable with a weaker lock, and REFRESH
MATERIALIZED VIEW has to rebuild indexes (among other things).  If
we have all the issues sorted out with REINDEX CONCURRENTLY then
the same techniques will probably apply to RMV without too much
difficulty.  It's a bit late to think about that for 9.3, though.

 I am pretty sure that will - understandably - confuse users, so I
 vote for at least including a note about that in the docs.

Will see about working that in.

On the ride home I realized that - unless, not that unlikely, you thought about 
something I didn't - REFRESH will behave similarly to TRUNCATE for repeatable 
read+ transactions that only access it after the REFRESH finished. That is, it 
will appear empty to them. If that's the case, it needs to be documented 
prominently as well.
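
A sketch of the interleaving being described ('mv' is a placeholder;
under REPEATABLE READ the snapshot is fixed at the first query):

-- Session A                      -- Session B
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT 1;                         -- A's snapshot is now fixed
                                  -- B: REFRESH MATERIALIZED VIEW mv;
SELECT count(*) FROM mv;          -- A's first access after the refresh: the
                                  -- new heap was filled by a transaction A
                                  -- cannot see, so without freezing, mv
                                  -- would look empty
COMMIT;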

It would be possible to get different behaviour by immediately freezing all 
tuples, but that would also result in violations of visibility by showing  
tuples that are not yet visible.

Andres

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




[HACKERS] Duplicate JSON Object Keys

2013-03-07 Thread David E. Wheeler
This behavior surprised me a bit:

david=# select '{"foo": 1, "foo": 2}'::json;
         json         
----------------------
 {"foo": 1, "foo": 2}

I had expected something more like this:

david=# select '{"foo": 1, "foo": 2}'::json;
    json    
------------
 {"foo": 2}

This hasn’t been much of an issue before, but with Andrew’s JSON enhancements 
going in, it will start to cause problems:

david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
ERROR:  field name is not unique in json object

Andrew tells me that the spec requires this. I think that’s fine, but I would 
rather that it never got to there.

In the spirit of being liberal about what we accept but strict about what we 
store, it seems to me that JSON object key uniqueness should be enforced either 
by throwing an error on duplicate keys, or by flattening so that the latest key 
wins (as happens in JavaScript). I realize that tracking keys will slow parsing 
down, and potentially make it more memory-intensive, but such is the price for 
correctness.

Thoughts?

Thanks,

David





Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Kevin Grittner
anara...@anarazel.de and...@anarazel.de wrote:

 On the ride home I realized that - unless, not that unlikely, you
 thought about something I didn't - REFRESH will behave similarly to
 TRUNCATE for repeatable read+ transactions that only access it
 after the REFRESH finished. That is, it will appear empty to them.

In an early version of the patch someone found that in testing and
pointed it out.

 It would be possible to get different behaviour by immediately
 freezing all tuples

Which is what I did.

 but that would also result in violations of visibility by showing
 tuples that are not yet visible.

Which is the case, and should be documented.  (I had not remembered
to do so yet; I'll tuck away your email as a reminder.)  Since the
MV is already not guaranteed to be in sync with other data, that
didn't seem like a fatal flaw.  It is, however, the one case where
the MV could appear to be *ahead* of the supporting data rather
than *behind* it.  In a way this is similar to how READ COMMITTED
transactions can see data from more than one snapshot on write
conflicts, so I see it as a bigger issue for more strict isolation
levels -- but those are unlikely to be all that useful with MVs in
this release anyway.  This is something that I think deserves some
work in a subsequent release.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Duplicate JSON Object Keys

2013-03-07 Thread Andrew Dunstan


On 03/07/2013 02:48 PM, David E. Wheeler wrote:

This behavior surprised me a bit:

 david=# select '{"foo": 1, "foo": 2}'::json;
          json
 ----------------------
  {"foo": 1, "foo": 2}

I had expected something more like this:

 david=# select '{"foo": 1, "foo": 2}'::json;
     json
 ------------
  {"foo": 2}

This hasn’t been much of an issue before, but with Andrew’s JSON enhancements 
going in, it will start to cause problems:

 david=# select json_get('{"foo": 1, "foo": 2}', 'foo');
 ERROR:  field name is not unique in json object

Andrew tells me that the spec requires this. I think that’s fine, but I would 
rather that it never got to there.


Specifically, rfc4627 says (note last sentence):

   2.2.  Objects

   An object structure is represented as a pair of curly brackets
   surrounding zero or more name/value pairs (or members).  A name is a
   string.  A single colon comes after each name, separating the name
   from the value.  A single comma separates a value from a following
   name.  The names within an object SHOULD be unique.



cheers

andrew




Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Jeff Davis
On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote:
 If you enable checksums, the free space map never gets updated in a 
 standby. It will slowly drift to be completely out of sync with reality, 
 which could lead to significant slowdown and bloat after failover.

One of the design points of this patch is that those operations that use
MarkBufferDirtyHint(), including tuple hint bits, the FSM, index dead
markers, etc., do not directly go to the standby. That's because the
standby can't write WAL, so it can't protect itself against a torn page
breaking the checksum.

However, these do make it through by riding along with a full-page image
in the WAL. The fact that checksums are enabled means that these full
page images will be written once per modified page per checkpoint, and
then replayed on the standby. FSM should get the updates the same way,
even though no other WAL is written for the FSM.

If full_page_writes are disabled, then the updates will never arrive.
But in that case, I think we can just go ahead and dirty the page during
recovery, because there isn't a real problem. I was hesitant to make
this change in my patch because:
1. I wanted to see if someone saw a flaw in this reasoning; and
2. I noticed that full_page_writes can be changed with a SIGHUP, which
could add complexity (I don't see any reason we allow this... shouldn't
we just force a restart for that change?).

I added a README file, moved some of the explanatory material there, and
tried to clarify this situation.

Let me know if you see a problem that I'm missing. I verified that at
least some FSM changes do make it through with checksums on, but I
didn't dig much deeper than that.

 Since the checksums are an all-or-nothing cluster-wide setting, the 
 three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and 
 PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the 
 code simpler, and leaves the bits free for future use. If we want to 
 enable such per-page setting in the future, we can add it later. For a 
 per-relation scheme, they're not needed.

Removed header bits.

 XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN 
 without a lock. That's not atomic, so it could incorrectly determine 
 that a page doesn't need to be backed up. We used to always hold an 
 exclusive lock on the buffer when it's called, which prevents 
 modifications to the LSN, but that's no longer the case.

Fixed. I added a new exported function, BufferGetLSNAtomic().

There was another similar omission in gistget.c.

By the way, I cannot find any trace of XLogCheckBufferNeedsBackup();
was that a typo?

 Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I 
 think it will generate WAL records for unlogged tables as it is.

Fixed.

I also rebased and added a GUC to control whether the checksum failure
causes an error or not.

I need to do another self-review after these changes and some more
extensive testing, so I might have missed a couple things.

Regards,
Jeff Davis


checksums-20130307.patch.gz
Description: GNU Zip compressed data


replace-tli-with-checksums-20130307.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Andres Freund
On 2013-03-07 11:50:11 -0800, Kevin Grittner wrote:
 anara...@anarazel.de and...@anarazel.de wrote:
 
  On the ride home I realized that - unless, not that unlikely, you
  thought about something I didn't - REFRESH will behave similarly to
  TRUNCATE for repeatable read+ transactions that only access it
  after the REFRESH finished. That is, it will appear empty to them.
 
 In an early version of the patch someone found that in testing and
 pointed it out.

Cool ;)
 
  It would be possible to get different behaviour by immediately
  freezing all tuples
 
 Which is what I did.

Ok.

  but that would also result in violations of visibility by showing
  tuples that are not yet visible.
 
 Which is the case, and should be documented.  (I had not remembered
 to do so yet; I'll tuck away your email as a reminder.)  Since the
 MV is already not guaranteed to be in sync with other data, that
 didn't seem like a fatal flaw.  It is, however, the one case where
 the MV could appear to be *ahead* of the supporting data rather
 than *behind* it.  In a way this is similar to how READ COMMITTED
 transactions can see data from more than one snapshot on write
 conflicts, so I see it as a bigger issue for more strict isolation
 levels -- but those are unlikely to be all that useful with MVs in
 this release anyway.  This is something that I think deserves some
 work in a subsequent release.

I am not that convinced that this is unproblematic. I don't see any
problem with READ COMMITTED, but in higher levels? Even if you expect the
view to be out-of-date, it may very well be surprising to see it
referring to rows in another table that do not yet exist, although rows
in that table never get deleted.

Especially in the initial population I don't see any way to get rid of
the problem - I don't think there exists a valid way to compute valid
xmin/xmax values for the resulting tuples of all queries. So unless we
get catalog accesses that completely obey repeatable read semantics
there's not much we can do about that. And while I think getting rid of
SnapshotNow is realistic, I don't think fully versioned catalog access
is (i.e. it working in a way that you could access a table in the old
and new version after an ALTER TABLE ...).

I wonder if we should add something like indexcheckxmin to matviews,
which specifies after which xmin value it's valid - only that it errors
out if you haven't reached it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Josh Berkus
Andres,

 if I understand things correctly REFRESH MATERIALIZED VIEW locks the
 materialized view with an AccessExclusiveLock even if the view already
 contains data.
 I am pretty sure that will - understandably - confuse users, so I vote
 for at least including a note about that in the docs.

+1 for mentioning it in the docs.  We could stand to document more
generally what lock levels various commands take.
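
In the meantime the lock taken can be observed directly; a quick sketch
('mv' is a placeholder -- run the command of interest in another session
inside an open transaction, then query pg_locks):

SELECT l.mode, l.granted, l.pid
FROM pg_locks l
WHERE l.relation = 'mv'::regclass;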

 This fact imo reduces the usability of the matview feature as it
 stands atm considerably. I think we should be very careful not to
 advocate its existence much and document very clearly that it's work in
 progress.
 Working incrementally is a sensible thing to do, don't get me wrong...

-1 from me.

Postgres is currently full of fairly innocent-looking commands which
take an unexpected ACCESS EXCLUSIVE lock.  For example, DROP CONSTRAINT
takes an ACCESS EXCLUSIVE lock, but it hasn't stopped people from using
constraints, and isn't particularly high up on our todo list to fix it.

This limitation is in no way crippling for this feature, or even a major
detraction.  I still intend to promote the heck out of this feature.

Now, I agree that having a REFRESH ... CONCURRENTLY would be a wonderful
feature for 9.4.  But the fact that we don't have it yet is not a big
deal, and I would put several other matview-related features ahead of
concurrent in priority.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Materialized views WIP patch

2013-03-07 Thread Nicolas Barbier
2013/3/5 Kevin Grittner kgri...@ymail.com:

 Perhaps it would be worth looking for anything in the patch that
 you think might be painting us into a corner where it would be hard
 to do all the other cool things.  While it's late enough in the
 process that changing anything like that which you find would be
 painful, it might be a lot more painful later if we release without
 doing something about it.  My hope, of course, is that you won't
 find any such thing.  With this patch I've tried to provide a
 minimal framework onto which these other things can be bolted.
 I've tried hard not to do anything which would make it hard to
 extend, but new eyes may see something I missed.

(Without having looked at the patch, or even the documentation :-/.)

I think that something that might prove useful is the following:
Keeping in mind the possibility of storing something in the matview’s
heap that doesn’t correspond to what a SELECT * FROM matview would
yield (i.e., the “logical content”). The transformation could be
performed by an INSTEAD rule (similar to how a view is expanded to its
definition, a reference to a matview would expand to its heap content
transformed to the “logical content”).

(Note that I don’t have any reason to believe that the current
implementation would make this more difficult than it should be.)

<ridiculously long rationale for the previous>

(All the following requires making matviews (inter- and intra-)
transactionally up-to-date w.r.t. their base tables at the moment of
querying them. I don’t deal with approximate results, however useful
that might be.)

I think that the possibility of optimizing COUNT(*) (see mail by Greg
Stark in this thread with “the queue of updates with transaction
information that are periodically flattened into the aggregate”) can
be generalized to generic aggregation that way. The idea would be that
a transaction that adds (or deletes or updates) a row in a base table
causes a “delta” row version in the matview. Selecting from the
matview then merges these deltas into one value (for each row that is
logically present in the matview). Every once in a while (or rather
quite often, if the base tables change often), a VACUUM-like clean-up
operation must be run to merge all rows that are “old enough” (i.e.,
whose transactions are not in flight anymore).

Example of trivial aggregation matview weight_per_kind defined as:

SELECT kind, SUM(weight) FROM fruit GROUP BY kind;

The matview would then physically contain rows such as:

xmin, xmax, kind, weight
1000, 0, 'banana', 123
1000, 0, 'apple', 1
1001, 0, 'banana', 2
1002, 0, 'banana', -3

Which means:

* tx 1000 probably performed a clean-up operation and merged a bunch
of banana rows together to yield 123; it also inserted an apple of
weight 1.
* tx 1001 inserted a banana of weight 2. Any clean-up operation coming
by could not merge the 2 into the first row, as long as tx 1000 is in
flight. Otherwise, it would yield 125 (physically this would mean
adding a 125 row, marking the 123 and 2 rows as deleted, and then
waiting for VACUUM to remove them).
* tx 1002 deleted a banana with weight 3.

The result of a SELECT * FROM weight_per_kind; would actually execute
SELECT kind, SUM_merge(weight) FROM heap_of_weight_per_kind GROUP BY
kind;

This would result, for tx 1001 (assuming tx 1000 committed and our
snapshot can see it), in:

kind, weight
'banana', 125
'apple', 1

(The -3 is not taken into account, because it is not visible to tx 1001.)

The operator to use at the location of SUM_merge is something that
merges multiple aggregation results (plus results that represent some
kind of “negation”) together. For SUM, it would be SUM itself and the
negation would be numerical negation. There might also be some kind of
“difference” concept used for UPDATE: When updating a weight from 4 to
3, the difference would be -1. Those additional properties could
optionally be added to the definition of each aggregation function; It
must be done for each function that you want to use in such a way.

Other aggregation functions such as AVG would require storing the SUM
+ number of rows in the matview (otherwise two AVGs could not be
merged); again a difference between the heap and the logical content.
Functions such as MIN and MAX are more difficult to fit in this
framework: I can only see how it would work if row deletion were not
allowed (which might still be a valid use-case); luckily, I think MIN
and MAX are not the typical things for which you would want to use
matviews, because quick computation can typically be done directly
using the base tables.

This whole thing would result in incremental updates of
aggregation-matviews that don’t require physical serialization of the
transactions that update the base tables and that query the matview,
which other models (that depend on correspondence between the heap and
logical content of matviews) would probably require.

And that’s where I stop rambling because nobody gets this far anyway,

Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Andres Freund
Hi,

On 2013-03-07 15:21:35 -0800, Josh Berkus wrote:
  This fact imo reduces the usability of the matview feature as it
  stands atm considerably. I think we should be very careful not to
  advocate its existence much and document very clearly that it's work in
  progress.
  Working incrementally is a sensible thing to do, don't get me wrong...
 
 -1 from me.
 
 Postgres is currently full of fairly innocent-looking commands which
 take an unexpected ACCESS EXCLUSIVE lock.  For example, DROP CONSTRAINT
 takes an ACCESS EXCLUSIVE lock, but it hasn't stopped people from using
 constraints, and isn't particularly high up on our todo list to fix
 it.

That's a pretty unconvincing comparison. There isn't any expectation
from common implementations that ALTER TABLE works without taking
exclusive locks, and DROP CONSTRAINT takes only a very short time, while
refreshing a materialized view can possibly take rather long.

 This limitation is in no way crippling for this feature, or even a major
 detraction.  I still intend to promote the heck out of this feature.

That's scaring me, because the current state of the feature isn't
something that people expect under the term "materialized views", and I
am pretty damn sure people will then remember postgres as trying to
provide a tick-box item without it being really usable in the real
world.
And that's not something I want postgres to be known for.

Note that I *really* think working incrementally on such things is the
way to go, and I think it's good that this got committed in 9.3. But if
this now gets used prominently in promotion in its current state, I think
the conclusion will be that working incrementally in postgres isn't the
way to go, and that will make it *much* harder to do so in future
releases. Which will slow postgres down.

Greetings,

Andres Freund




[HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-03-07 Thread Josh Kupershmidt
[Moving to -hackers]

On Sat, Feb 23, 2013 at 2:51 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

 so

 * --conditional-drops replaced by --if-exists

Thanks for the fixes; I played around with the patch a bit. I was sort
of expecting this example to work (after setting up the regression
database with `make installcheck`):

  pg_dump --clean --if-exists -Fp -d regression --file=regression.sql
  createdb test
  psql -v ON_ERROR_STOP=1 --single-transaction -d test -f regression.sql

But it fails, first at:
  ...
  DROP TRIGGER IF EXISTS tsvectorupdate ON public.test_tsvector;
  ERROR:  relation public.test_tsvector does not exist

This seems like a shortcoming of DROP TRIGGER ... IF EXISTS, and it
looks like DROP RULE ... IF EXISTS has the same problem. I recall DROP
... IF EXISTS being fixed recently so that it does not error out if the
schema specified for the object does not exist, and ISTM the same
arguments could be made in favor of fixing DROP TRIGGER/TABLE ... IF
EXISTS not to error out if the table doesn't exist.
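
For comparison, the already-fixed schema case degrades to a notice
instead of an error (hypothetical names; message wording approximate):

  DROP TABLE IF EXISTS no_such_schema.no_such_table;
  -- NOTICE:  schema "no_such_schema" does not exist, skipping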

Working further through the dump of the regression database, these
also present problems for --clean --if-exists dumps:

  DROP CAST IF EXISTS (text AS public.casttesttype);
  ERROR:  type public.casttesttype does not exist

  DROP OPERATOR IF EXISTS public.% (point, widget);
  ERROR:  type widget does not exist

  DROP FUNCTION IF EXISTS public.pt_in_widget(point, widget);
  ERROR:  type widget does not exist

I'm not sure whether DROP CAST/OPERATOR/FUNCTION IF EXISTS should be
more tolerant of nonexistent types, or if the mess could perhaps be
avoided by dump reordering.

Note, this usability problem affects unpatched head as well:

  pg_dump -Fc -d regression --file=regression.dump
  pg_restore --clean -1 -d regression regression.dump
  ...
  pg_restore: [archiver (db)] could not execute query: ERROR:  type
widget does not exist
  Command was: DROP FUNCTION public.widget_out(widget);

(The use here is a little different than the first example above, but
I would still expect this case to work.) The above problems with IF
EXISTS aren't really a problem of the patch per se, but IMO it would
be nice to straighten all the issues out together for 9.4.

 * -- additional check, available only with -c option

Cool. I think it would also be useful to check that --clean may only
be used with --format=p to avoid any confusion there. (This issue
could be addressed in a separate patch if you'd rather not lard this
patch.)

Some comments on the changes:

1. There is at least one IF EXISTS check missing from pg_dump.c; see
for example this statement from a dump of the regression database with
--if-exists:

ALTER TABLE public.nv_child_2010 DROP CONSTRAINT nv_child_2010_d_check;

2. Shouldn't pg_restore get --if-exists as well?

3.
+   printf(_("  --if-exists  don't report error if cleaned object doesn't exist\n"));

This help output bleeds just over our de facto 80-character limit.
Also contractions seem to be avoided elsewhere. It's a little hard to
squeeze a decent explanation into one line, but perhaps:

  Use IF EXISTS when dropping objects

would be better. The sgml changes could use some wordsmithing and
grammar fixes. I could clean these up for you in a later version if
you'd like.

4. There seem to be spurious whitespace changes to the function
prototype and declaration for _printTocEntry.

That's all I've had time for so far...

Josh




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Josh Berkus

 Postgres is currently full of fairly innocent-looking commands which
 take an unexpected ACCESS EXCLUSIVE lock.  For example, DROP CONSTRAINT
 takes an ACCESS EXCLUSIVE lock, but it hasn't stopped people from using
 constraints, and isn't particularly high up on our todo list to fix
 it.
 
 That's a pretty unconvincing comparison. There isn't any expectation that
 ALTER TABLE works without taking exclusive locks from common

Not exclusive (which is expected), but AccessExclusive (which catches
many of our users by surprise).

How about the fact that dropping an FK constraint takes an
AccessExclusiveLock on the *referenced* table?

 implementations, and DROP CONSTRAINT only takes a very short time while
 refreshing a materialized view may take rather long.

Right now there are no expectations at all about our new matview feature.
 I think putting the locking information in the docs is the right way to go.

 That's scaring me. Because the current state of the feature isn't
 something that people expect under the term materialized views, and I
 am pretty damn sure people will then remember postgres as trying to
 provide a tick-box item without it being really usable in the real
 world.
 And that's not something I want postgres to be known for.

We promoted the heck out of binary replication when it was barely
usable.  We've gotten huge interest in our JSON support, even when it's
a work-in-progress.  I don't see why I should change an approach to
advocacy which is clearly working.  What our project considers an
incomplete feature other OSS DBMSes call a version 2.0.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] REFRESH MATERIALIZED VIEW locklevel

2013-03-07 Thread Andres Freund
On 2013-03-07 15:54:32 -0800, Josh Berkus wrote:
 
  Postgres is currently full of fairly innocent-looking commands which
  take an unexpected ACCESS EXCLUSIVE lock.  For example, DROP CONSTRAINT
  takes an ACCESS EXCLUSIVE lock, but it hasn't stopped people from using
  constraints, and isn't particularly high up on our todo list to fix
  it.
  
  That's a pretty unconvincing comparison. There isn't any expectation that
  ALTER TABLE works without taking exclusive locks from common
 
 Not exclusive (which is expected), but AccessExclusive (which catches
 many of our users by surprise).
 
 How about the fact that dropping an FK constraint takes an
 AccessExclusiveLock on the *referenced* table?

All of that is DDL.

  implementations, and DROP CONSTRAINT only takes a very short time while
  refreshing a materialized view may take rather long.
 
 Right now there are no expectations at all about our new matview feature.
  I think putting the locking information in the docs is the right way to go.

That should definitely be done.

The point is that
a) refreshing is the only way to update materialized views. There's no
   incremental support.
b) refreshing will take a long time (otherwise you wouldn't have
   created a materialized view) and you can't use the view while it runs.

Which means that anyone wanting to use matviews in a busy environment
will need to build the new matview separately and then move it into
place via renames, with all the issues that brings, like needing to
recreate dependent views and such.
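
For concreteness, a sketch of that swap-via-rename workaround, with
hypothetical names (dependent objects still have to be recreated by
hand):

  -- Build the replacement while the old matview stays queryable:
  CREATE MATERIALIZED VIEW stats_new AS SELECT count(*) AS n FROM base_table;
  -- Swap it into place; the renames themselves need only brief locks:
  BEGIN;
  ALTER MATERIALIZED VIEW stats RENAME TO stats_old;
  ALTER MATERIALIZED VIEW stats_new RENAME TO stats;
  COMMIT;
  DROP MATERIALIZED VIEW stats_old;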

Sorry, but that's not very useful except (and there very much so) as a
stepping stone for further work.

  That's scaring me. Because the current state of the feature isn't
  something that people expect under the term materialized views, and I
  am pretty damn sure people will then remember postgres as trying to
  provide a tick-box item without it being really usable in the real
  world.
  And that's not something I want postgres to be known for.
 
 We promoted the heck out of binary replication when it was barely
 usable.  We've gotten huge interest in our JSON support, even when it's
 a work-in-progress.  I don't see why I should change an approach to
 advocacy which is clearly working.  What our project considers an
 incomplete feature other OSS DBMSes call a version 2.0.

I heard some people grumble about binary replication in 9.0, but there
were loads of real-world scenarios in which it could be used. I heard
quite a few people being annoyed about the level of json support, even
though it provided some usefulness with row_to_json (or whatever it's
called). And it's a feature that can be extended by extensions. And
lots of the deficiencies of binary replication could be solved by
outside tooling.

That's not possible with matviews as is. Which, again, is *totally fine*
in itself.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] Materialized views and unique indexes

2013-03-07 Thread Michael Paquier
Hi all,

While testing materialized views, I found the following behavior with
unique indexes:
postgres=# create table aa as select generate_series(1,3) as a;
SELECT 3
postgres=# create materialized view aam as select * from aa;
SELECT 3
postgres=# create unique index aam_ind on aam(a);
CREATE INDEX
postgres=# insert into aa values (4);
INSERT 0 1
postgres=# insert into aa values (1);
INSERT 0 1
postgres=# refresh materialized view aam;
ERROR:  could not create unique index aam_ind
DETAIL:  Key (a)=(1) is duplicated.
postgres=# select * from aam;
 a
---
 1
 2
 3
(3 rows)

As expected, the refresh failed, but the error message is not really
user-friendly.
Shouldn't we output something like this instead?
ERROR: could not refresh materialized view because of failure when
rebuilding index
DETAIL: key is duplicated.

Thanks,
-- 
Michael


Re: [HACKERS] Materialized views and unique indexes

2013-03-07 Thread Josh Berkus

 As expected, the refresh failed, but the error message is not really
 user-friendly.
 Shouldn't we output something like this instead?
 ERROR: could not refresh materialized view because of failure when
 rebuilding index
 DETAIL: key is duplicated.

Is there a good reason to allow unique indexes (or constraints in
general) on matviews?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Materialized views and unique indexes

2013-03-07 Thread Michael Paquier
On Fri, Mar 8, 2013 at 11:33 AM, Josh Berkus j...@agliodbs.com wrote:


  As expected, the refresh failed, but the error message is not really
  user-friendly.
  Shouldn't we output something like this instead?
  ERROR: could not refresh materialized view because of failure when
  rebuilding index
  DETAIL: key is duplicated.

 Is there a good reason to allow unique indexes (or constraints in
 general) on matviews?

Don't think so. It would make sense to block the creation of all
constraints on matviews.

Just based on the docs, matviews cannot have constraints:
http://www.postgresql.org/docs/devel/static/sql-altermaterializedview.html

Now that you mention it, you can create constraints on them (code at
c805659).
postgres=# create table aa (a int);
CREATE TABLE
postgres=# create materialized view aam as select * from aa;
SELECT 0
postgres=# alter materialized view aam add constraint popo unique(a);
ALTER MATERIALIZED VIEW
postgres=# \d aam
Materialized view "public.aam"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
Indexes:
    "popo" UNIQUE CONSTRAINT, btree (a)

Also, as it is not mandatory for a unique index to be a constraint, I think
that we should block the creation of unique indexes too to avoid any
problems. Any suggestions?
-- 
Michael


Re: [HACKERS] Materialized views and unique indexes

2013-03-07 Thread Craig Ringer
On 03/08/2013 10:55 AM, Michael Paquier wrote:
 Also, as it is not mandatory for a unique index to be a constraint, I
 think that we should block the creation of unique indexes too to avoid
 any problems. Any suggestions?
How much does the planner benefit from the implied constraint of a
unique index? I almost wonder if it should be allowed, at the cost of
turning the refresh of a matview that fails to comply into an error.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services





Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Bruce Momjian
On Mon, Mar  4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
 Putting aside the not-so-rosy predictions seen elsewhere in this
 thread about the availability of a high performance, reliable
 checksumming file system available on common platforms, I'd like to
 express what benefit this feature will have to me:
 
 Corruption has easily occupied more than one person-month of time last
 year for us.  This year to date I've burned two weeks, although
 admittedly this was probably the result of statistical clustering.
 Other colleagues of mine have probably put in a week or two in
 aggregate in this year to date.  The ability to quickly, accurately,
 and maybe at some later date proactively find good backups to run
 WAL recovery from is one of the biggest strides we can make in the
 operation of Postgres.  The especially ugly cases are where the page
 header is not corrupt, so full page images can carry along malformed
 tuples...basically, when the corruption works its way into the WAL,
 we're in much worse shape.  Checksums would hopefully prevent this
 case, converting them into corrupt pages that will not be modified.
 
 It would be better yet if I could write tools to find the last-good
 version of pages, and so I think tight integration with Postgres will
 see a lot of benefits that would be quite difficult and non-portable
 when relying on file system checksumming.

I see Heroku has corruption experience, and I know Jim Nasby has
struggled with corruption in the past.

I also see the checksum patch is taking a beating.  I wanted to step
back and ask what percentage of known corruption cases will this
checksum patch detect?  What percentage of these corruptions would
filesystem checksums have detected?

Also, don't all modern storage drives have built-in checksums, and
report problems to the system administrator?  Does smartctl help report
storage corruption?

Let me take a guess at answering this --- we have several layers in a
database server:

1 storage
2 storage controller
3 file system
4 RAM
5 CPU

My guess is that storage checksums only cover layer 1, while our patch
covers layers 1-3, and probably not 4-5 because we only compute the
checksum on write.

If that is correct, the open question is what percentage of corruption
happens in layers 1-3?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Materialized views and unique indexes

2013-03-07 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 03/08/2013 10:55 AM, Michael Paquier wrote:
 Also, as it is not mandatory for a unique index to be a constraint, I
 think that we should block the creation of unique indexes too to avoid
 any problems. Any suggestions?

 How much does the planner benefit from the implied constraint of a
 unique index? I almost wonder if it should be allowed, at the cost of
 turning the refresh of a matview that fails to comply into an error.

A unique constraint can allow join elimination, so I'm thinking that
disallowing them is a bad idea (not to mention that it'd be a
considerable wart in the code to block them for matviews only).
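
A minimal sketch of the join elimination Tom mentions, with hypothetical
tables; because b.id is unique and no column of b is referenced, the
planner can drop the left join entirely:

  CREATE TABLE b (id int PRIMARY KEY, val text);
  CREATE TABLE a (id int, b_id int);
  EXPLAIN SELECT a.id FROM a LEFT JOIN b ON a.b_id = b.id;
  -- The resulting plan scans only a; the join to b is eliminated.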

regards, tom lane




Re: [HACKERS] Materialized views and unique indexes

2013-03-07 Thread Michael Paquier
On Fri, Mar 8, 2013 at 12:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Craig Ringer cr...@2ndquadrant.com writes:
  On 03/08/2013 10:55 AM, Michael Paquier wrote:
  Also, as it is not mandatory for a unique index to be a constraint, I
  think that we should block the creation of unique indexes too to avoid
  any problems. Any suggestions?

  How much does the planner benefit from the implied constraint of a
  unique index? I almost wonder if it should be allowed, at the cost of
  turning the refresh of a matview that fails to comply into an error.

 A unique constraint can allow join elimination, so I'm thinking that
 disallowing them is a bad idea (not to mention that it'd be a
 considerable wart in the code to block them for matviews only).

Fair argument.

The error message at the refresh step should be more explicit, though. I
still have the feeling that users might be lost if a constraint introduced
on a matview fails during refresh with the current error message.
-- 
Michael


Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Pavel Stehule
2013/3/8 Bruce Momjian br...@momjian.us:
 On Mon, Mar  4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
 Putting aside the not-so-rosy predictions seen elsewhere in this
 thread about the availability of a high performance, reliable
 checksumming file system available on common platforms, I'd like to
 express what benefit this feature will have to me:

 Corruption has easily occupied more than one person-month of time last
 year for us.  This year to date I've burned two weeks, although
 admittedly this was probably the result of statistical clustering.
 Other colleagues of mine have probably put in a week or two in
 aggregate in this year to date.  The ability to quickly, accurately,
 and maybe at some later date proactively find good backups to run
 WAL recovery from is one of the biggest strides we can make in the
 operation of Postgres.  The especially ugly cases are where the page
 header is not corrupt, so full page images can carry along malformed
 tuples...basically, when the corruption works its way into the WAL,
 we're in much worse shape.  Checksums would hopefully prevent this
 case, converting them into corrupt pages that will not be modified.

 It would be better yet if I could write tools to find the last-good
 version of pages, and so I think tight integration with Postgres will
 see a lot of benefits that would be quite difficult and non-portable
 when relying on file system checksumming.

 I see Heroku has corruption experience, and I know Jim Nasby has
 struggled with corruption in the past.

 I also see the checksum patch is taking a beating.  I wanted to step
 back and ask what percentage of known corruption cases will this
 checksum patch detect?  What percentage of these corruptions would
 filesystem checksums have detected?

 Also, don't all modern storage drives have built-in checksums, and
 report problems to the system administrator?  Does smartctl help report
 storage corruption?

 Let me take a guess at answering this --- we have several layers in a
 database server:

 1 storage
 2 storage controller
 3 file system
 4 RAM
 5 CPU

 My guess is that storage checksums only cover layer 1, while our patch
 covers layers 1-3, and probably not 4-5 because we only compute the
 checksum on write.

 If that is correct, the open question is what percentage of corruption
 happens in layers 1-3?

I work with a major Czech bank, and they request checksums, like any
other tool that increases the chance of identifying failures. So missing
checksums hurt PostgreSQL's usability for critical systems; speed is not
too important there.

Regards

Pavel


 --
   Bruce Momjian  br...@momjian.ushttp://momjian.us
   EnterpriseDB http://enterprisedb.com

   + It's impossible for everything to be true. +






Re: [HACKERS] Enabling Checksums

2013-03-07 Thread Daniel Farina
On Thu, Mar 7, 2013 at 7:31 PM, Bruce Momjian br...@momjian.us wrote:
 On Mon, Mar  4, 2013 at 05:04:27PM -0800, Daniel Farina wrote:
 Putting aside the not-so-rosy predictions seen elsewhere in this
 thread about the availability of a high performance, reliable
 checksumming file system available on common platforms, I'd like to
 express what benefit this feature will have to me:

 Corruption has easily occupied more than one person-month of time last
 year for us.  This year to date I've burned two weeks, although
 admittedly this was probably the result of statistical clustering.
 Other colleagues of mine have probably put in a week or two in
 aggregate in this year to date.  The ability to quickly, accurately,
 and maybe at some later date proactively find good backups to run
 WAL recovery from is one of the biggest strides we can make in the
 operation of Postgres.  The especially ugly cases are where the page
 header is not corrupt, so full page images can carry along malformed
 tuples...basically, when the corruption works its way into the WAL,
 we're in much worse shape.  Checksums would hopefully prevent this
 case, converting them into corrupt pages that will not be modified.

 It would be better yet if I could write tools to find the last-good
 version of pages, and so I think tight integration with Postgres will
 see a lot of benefits that would be quite difficult and non-portable
 when relying on file system checksumming.

 I see Heroku has corruption experience, and I know Jim Nasby has
 struggled with corruption in the past.

More than a little: it has entered the realm of the routine, and
happens frequently enough that it has become worthwhile to start
looking for patterns.

Our methods so far rely heavily on our archives to deal with it: it's
time-consuming, but the 'simple' case, replaying WAL from some earlier
base backup and ending up with a non-corrupt database, is easily the
most common.  Interestingly, WAL replay has never failed halfway
through because of CRC failures while we were dealing with corruption[0].
We know this fairly convincingly because we constantly sample txid and
wal positions while checking the database, as we typically do about
every thirty seconds.

I think the unreasonable effectiveness of this strategy of old backup
plus WAL replay suggests that database checksums would prove useful.
In my mind, the odds that this formula could work so well if the bug
were RAM- or CPU-based are slim.

[0] I have seen -- very rarely -- substantial periods of severe WAL
corruption (files are not even remotely the correct size) propagated
to the archives in the case of disaster recovery where the machine met
its end because of the WAL disk being marked as dead.

--
fdr




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread KONDO Mitsumasa

(2013/03/07 19:41), Heikki Linnakangas wrote:

On 07.03.2013 10:05, KONDO Mitsumasa wrote:

(2013/03/06 16:50), Heikki Linnakangas wrote:

Yeah. That fix isn't right, though; XLogPageRead() is supposed to
return true on success, and false on error, and the patch makes it
return 'true' on error, if archive recovery was requested but we're
still in crash recovery. The real issue here is that I missed the two
return NULL;s in ReadRecord(), so the code that I put in the
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't
find the file at all. Attached patch is the proper fix for this.


Thanks for createing patch! I test your patch in 9.2_STABLE, but it does
not use promote command...
When XLogPageRead() was returned false ,it means the end of stanby loop,
crash recovery loop, and archive recovery loop.
Your patch is not good for promoting Standby to Master. It does not come
off standby loop.

So I make new patch which is based Heikki's and Horiguchi's patch.


Ah, I see. I committed a slightly modified version of this.

I feel that your modification is quite readable. Thanks for the
modification and for committing the patch!
 

I also found a bug in latest 9.2_stable. It does not get latest timeline
and
recovery history file in archive recovery when master and standby
timeline is different.


Works for me.. Can you create a test script for that? Remember to set
recovery_target_timeline='latest'.

...
It can be reproduced in my test script, too.


I see the problem now, with that script. So what happens is that the startup process 
first scans the timeline history files to choose the recovery target timeline. For that 
scan, I temporarily set InArchiveRecovery=true in readRecoveryCommandFile. However,
after readRecoveryCommandFile returns, we then try to read the timeline history file 
corresponding to the chosen recovery target timeline, but InArchiveRecovery is no longer
set, so we don't fetch the file from archive, and return a dummy history, 
with just the target timeline in it. That doesn't contain the older timeline, so you get 
an error at recovery.
Fixed per your patch to check for ArchiveRecoveryRequested instead of 
InArchiveRecovery, when reading timeline history files. This also makes it 
unnecessary to temporarily set InArchiveRecovery=true, so removed that.
Committed both fixes. Please confirm that this fixed the problem in your test
environment. Many thanks for the testing and the patches!

I understand this problem. Thank you for adding the modification and the
detailed explanation! I tested the latest REL9_2_STABLE on my system and
confirmed that it runs well without problems. If I find another problem, I
will report it and send you a patch and a test script!


Best regards,
--
Mitsumasa KONDO
NTT OSS Center




Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used

2013-03-07 Thread Kyotaro HORIGUCHI
Hello,

 Patches can be reviewed by more than one people.  It's particularly
 useful if they have different things to say.  So don't hesitate to
 review patches that have already been reviewed by other people.

Thanks for the advice. I was anxious about who among the
reviewers should decide, and when, whether the patch has
reached the required level; I'll take it easier.

 In fact, you can even review committed patches; it's not unlikely that
 you will be able to find bugs in those, too.

Umm.. to be sure..

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-07 Thread Kyotaro HORIGUCHI
Everything seems to have been settled above my head while I was sleeping..

Sorry for the clumsy test script, and thank you for refining it, Mitsumasa.

And thank you for fixing the bug and the detailed explanation, Heikki.

I confirmed that the problem is fixed also for me at origin/REL9_2_STABLE.

 I understand this problem. Thank you for adding the modification and the
 detailed explanation! I tested the latest REL9_2_STABLE on my system and
 confirmed that it runs well without problems. If I find another problem, I
 will report it and send you a patch and a test script!

regards,

-- 
Kyotaro Horiguchi

Nippon Telegraph and Telephone Corporation, NTT Open Source Software Center
Phone: 03-5860-5115 / Fax: 03-5463-5490

