Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina wrote:

On Mon, Mar 4, 2013 at 2:27 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

On Sun, Mar 3, 2013 at 9:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:

The real difficulty is that there may be more than one storable value that corresponds to 1.23456 to six decimal digits. To be certain that we can reproduce the stored value uniquely, we have to err in the other direction, and print *more* decimal digits than the underlying precision justifies, rather than a bit less. Some of those digits are going to look like garbage to the naked eye.

I think part of the difficulty here is that psql (if I understand this correctly) conflates the wire-format text representations with what should be displayed to the user. E.g., a different driver might parse the wire representation into a native representation, and then format that native representation when it is to be displayed.

That's what the JDBC driver does, so it doesn't care about how the wire format actually looks. pg_dump cares about reproducing values exactly, and not about whether things are nice-looking, so it cranks up extra_float_digits. The JDBC driver might be justified in doing likewise, to ensure that the identical binary float value is stored on both client and server --- but that isn't even a valid goal unless you assume that the server's float implementation is the same as Java's, which is a bit of a leap of faith, even if IEEE 754 is nigh universal these days.

I would hope that any driver cares about reproducing values exactly (or at least as exactly as the semantics of the client and server representations of the data type allow). Once you start talking operations, sure, things get a lot more complicated and you're better off not relying on any particular semantics. But IEEE 754 unambiguously defines certain bit patterns to correspond to certain values, no?
If both client and server talk IEEE 754 floating point, it should be possible to round-trip values with no fuss and end up with the same bits you started with (and as far as I can tell, it is, as long as extra_float_digits is set to the max), even if the implementations of actual operations on these numbers behave very differently on client and server.

I think given that many ORMs can cause UPDATEs on tuple fields that have not changed as part of saving an object, stable round trips seem like a desirable feature.

But all these things are already available: any driver that cares can set extra_float_digits=3, and if it prefers the binary format, the wire protocol supports sending floating point values as such.

I also find the rationale for extra_float_digits quite mysterious for the same reason: why would most programs care about precision less than pg_dump does? If a client wants floating point numbers to look nice, I think the rendering should be on them (e.g. psql and pgadmin), and the default should be to expose whatever precision is available to clients that want an accurate representation of what is in the database. This kind of change may have practical problems that make it impractical to alter at this time (considering the workaround is to set extra_float_digits), but I can't quite grasp the rationale of "well, the only program that cares about the most precision available is pg_dump." It seems like most programs would care just as much.

I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests.
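The round-trip claim above is easy to check directly: printing a double with DBL_DIG (15) significant digits can lose the low-order bits, while DBL_DIG + 2 (17) digits — which is what extra_float_digits=3 buys — always recovers the exact stored value. A small illustrative sketch in Python (whose floats are IEEE 754 doubles), not anything from the PostgreSQL code:

```python
import struct

def roundtrips(value, digits):
    """Format `value` to `digits` significant decimal digits, parse it
    back, and report whether the original double's bits are recovered."""
    text = "%.*g" % (digits, value)
    return struct.pack("<d", float(text)) == struct.pack("<d", value)

v = 1.1 + 2.2             # stored double is 3.3000000000000003
print(roundtrips(v, 15))  # False: 15 digits print "3.3", a different double
print(roundtrips(v, 17))  # True: 17 digits always round-trip a double
```

This is exactly the tension in the thread: 15 digits is the most that is guaranteed meaningful, but 17 digits is what is needed to reproduce the stored bits uniquely.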
Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:

Nicolas Barbier nicolas.barb...@gmail.com writes:

2013/3/3 Kevin Grittner kgri...@ymail.com:

Nicolas Barbier nicolas.barb...@gmail.com wrote:

I think that automatically using materialized views even when the query doesn’t mention them directly is akin to automatically using indexes without having to mention them in the query.

Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice.

There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here.

It's not a different feature; it's what most people expect a feature called MV to deliver. That's not a matter of opinion, it's simply how every other database works currently - Oracle, Teradata, and SQL Server at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over the next few releases. I hope no one is going to claim in the release notes that Materialized Views have been implemented in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere.

Just to reiterate, I fully support committing this partial feature into Postgres in this release, because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-to-date MVs can be used like indexes in the planner.
The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that, and some obvious optimisations. Clearly the cost varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably the same as or less than that of partial indexes, which we currently think is acceptable. In any case, if you don't want that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using an MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
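The staleness-bounded plan choice Simon describes can be sketched abstractly. This is a hypothetical illustration only — nothing like `max_staleness` exists in the patch under discussion; all names here are invented:

```python
import time
from dataclasses import dataclass

@dataclass
class MatView:
    name: str
    est_cost: float       # estimated cost of answering the query via this MV
    last_refresh: float   # epoch seconds of the last refresh

def choose_plan(base_cost, candidate_mvs, max_staleness, now=None):
    """Pick the cheapest plan whose answer is no staler than the user's
    bound.  The base tables are always fresh, so they are always a
    candidate; an MV is eligible only if it was refreshed recently enough."""
    now = time.time() if now is None else now
    best_name, best_cost = "base tables", base_cost
    for mv in candidate_mvs:
        if now - mv.last_refresh <= max_staleness and mv.est_cost < best_cost:
            best_name, best_cost = mv.name, mv.est_cost
    return best_name

# Simon's example: a two-hour query vs. an MV refreshed 15 minutes ago,
# under a user-specified 30-minute staleness bound.
mv = MatView("daily_sales_mv", est_cost=60.0, last_refresh=1000.0 - 15 * 60)
print(choose_plan(7200.0, [mv], max_staleness=30 * 60, now=1000.0))
```

With a tighter bound (say 10 minutes) the same call falls back to the base tables, which is the planner "licence" being argued about.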
Re: [HACKERS] Enabling Checksums
On 5 March 2013 01:04, Daniel Farina dan...@heroku.com wrote:

Corruption has easily occupied more than one person-month of time last year for us. This year to date I've burned two weeks, although admittedly this was probably the result of statistical clustering. Other colleagues of mine have probably put in a week or two in aggregate this year to date. The ability to quickly, accurately, and maybe at some later date proactively find good backups to run WAL recovery from is one of the biggest strides we can make in the operation of Postgres. The especially ugly cases are where the page header is not corrupt, so full page images can carry along malformed tuples... basically, when the corruption works its way into the WAL, we're in much worse shape. Checksums would hopefully prevent this case, converting them into corrupt pages that will not be modified. It would be better yet if I could write tools to find the last good version of pages, and so I think tight integration with Postgres will see a lot of benefits that would be quite difficult and non-portable when relying on file system checksumming. You are among the best positioned to make assessments of the cost of the feature, but I thought you might appreciate a perspective on the benefits, too. I think they're large, and for me they are the highest pole in the tent for what makes Postgres stressful to operate as-is today. It's a testament to the quality of the programming in Postgres that Postgres programming error is not the largest problem.

That's good perspective. I think we all need to be clear that committing this patch also commits the community (via the committer) to significant work and responsibility around this, and my minimum assessment of it is 1 month per year for 3-5 years, much of that on the committer. In effect this will move time and annoyance experienced by users of Postgres back onto developers of Postgres.
That is where it should be, but the effect will be large and easily noticeable, IMHO. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hello, I could reproduce the behavior and think I understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is released and a new page requested. The variables were as below at that point:

StandbyRequested == true
StandbyMode == false
ArchiveRecoveryRequested == true
InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying to get xlogs via streaming or from the archive. So ReadRecord returns NULL, then unexpectedly exits the 'main redo apply loop' and increases the timeline ID as if it were promoted. This seems fixed by letting it try all requested sources. The attached patch does this, and the test script runs as expected.

We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll do further investigation of the behavior.

Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center

#! /bin/sh
version=abf5c5
version=924b
killall -9 postgres
source pgsetpath $version 0
rm -rf $PGDATA/* $PGARC/*
PGDATA0=$PGDATA
PGPORT0=$PGPORT
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC/%f'
max_wal_senders = 3
hot_standby = on
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local replication horiguti trust
EOF
echo "## Startup master"
pg_ctl -D $PGDATA0 -w start
source pgsetpath $version 1 -p 5433
PGDATA1=$PGDATA
PGPORT1=$PGPORT
rm -rf $PGDATA/* $PGARC/*
echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
#restore_command='a=$PGARC; if [ ! -d \$a ]; then echo Archive directory \$a is not found.; exit 1; elif [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Startup standby"
pg_ctl -D $PGDATA1 start
echo "## Sleep for 5 seconds"
sleep 5
echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f
echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i
cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Starting master as a standby"
if [ "$1" == w ]; then touch /tmp/xlogwait; fi
PGPORT=5432 pg_ctl -D $PGDATA0 start
#psql postgres -c "select pg_is_in_recovery();"

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..00b5bc5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10604,7 +10604,19 @@ retry:
 											  sources);
 				switched_segment = true;
 				if (readFile < 0)
+				{
+					if (!InArchiveRecovery && ArchiveRecoveryRequested)
+					{
+						InArchiveRecovery = true;
+						goto retry;
+					}
+					else if (!StandbyMode && StandbyModeRequested)
+					{
+						StandbyMode = true;
+						goto retry;
+					}
 					return false;
+				}
 			}
 		}
 	}
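The control flow of the fix — on a failed read, enable the next requested-but-inactive source and retry, instead of failing while sources remain — can be sketched abstractly. This is an illustrative model with invented names, not the C code from XLogPageRead:

```python
def read_with_escalation(try_read, state):
    """Retry a failed xlog page read, enabling each requested source in
    turn (archive, then streaming) before giving up, mirroring the
    patch's goto-retry logic."""
    while not try_read(state):
        if state["archive_requested"] and not state["in_archive_recovery"]:
            state["in_archive_recovery"] = True   # retry from the archive
        elif state["standby_requested"] and not state["standby_mode"]:
            state["standby_mode"] = True          # retry via streaming
        else:
            return False                          # every source exhausted
    return True

# The failing scenario from the report: both sources requested, neither
# active, and the page only becomes readable once streaming is enabled.
state = {"archive_requested": True, "in_archive_recovery": False,
         "standby_requested": True, "standby_mode": False}
print(read_with_escalation(lambda s: s["standby_mode"], state))  # True
```

Without the escalation loop, the first failed read returns False immediately, which is exactly the premature exit from the redo loop described above.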
Re: [HACKERS] Enabling Checksums
On 04.03.2013 09:11, Simon Riggs wrote:

On 3 March 2013 18:24, Greg Smith g...@2ndquadrant.com wrote:

The 16-bit checksum feature seems functional, with two sources of overhead. There's some CPU time burned to compute checksums when pages enter the system. And there's extra overhead for WAL logging hint bits. I'll quantify both of those better in another message.

It's crunch time. Do you and Jeff believe this patch should be committed to Postgres core? Are there objectors?

In addition to my hostility towards this patch in general, there are some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. If we want to enable such a per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed.

+ * The checksum algorithm is a modified Fletcher 64-bit (which is
+ * order-sensitive). The modification is because, at the end, we have two
+ * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
+ * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
+ * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
+ * single bytes at a time is slower.

How does the error detection rate of this compare with e.g. CRC-16? Is there any ill effect from truncating the Fletcher sums like this?

+	/*
+	 * Store the sums as bytes in the checksum. We add one to shift the range
+	 * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
+	 * seems wise).
+	 */
+	p8Checksum[0] = (sum1 % 255) + 1;
+	p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL CRC, and I don't recall seeing that in other checksum implementations either. 16 bits is not very wide for a checksum, and this eats about 1% of the space of valid values. I can see that it might be a handy debugging aid to avoid 0. But there's probably no need to avoid 0 in both bytes; it seems enough to avoid a completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is.

- Heikki
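The truncation being questioned is easy to model byte-at-a-time. This is a simplified sketch of the scheme described in the quoted comment, not PostgreSQL's actual implementation (which processes the page in wider chunks for speed):

```python
def page_checksum16(data: bytes) -> bytes:
    """Order-sensitive two-sum (Fletcher-style) checksum: two running
    64-bit sums, each reduced to a byte with mod 255, then shifted by one
    so neither checksum byte can be zero -- the property being debated."""
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) & 0xFFFFFFFFFFFFFFFF   # wrap like a 64-bit sum
        sum2 = (sum2 + sum1) & 0xFFFFFFFFFFFFFFFF
    return bytes([(sum1 % 255) + 1, (sum2 % 255) + 1])

print(page_checksum16(b"ab") != page_checksum16(b"ba"))  # True: order-sensitive
```

Note the cost of the +1 shift that Heikki points out: each byte takes only 255 of its 256 possible values, so roughly 1% of the 16-bit space is never produced.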
Re: [HACKERS] Support for REINDEX CONCURRENTLY
Hi,

Have you benchmarked the toastrelidx removal stuff in any way? If not, that's fine, but if yes I'd be interested.

On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:

--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value, struct varlena * oldexternal, int options)
 {
 	Relation	toastrel;
-	Relation	toastidx;
+	Relation   *toastidxs;
 	HeapTuple	toasttup;
 	TupleDesc	toasttupDesc;
 	Datum		t_values[3];
@@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
 	char	   *data_p;
 	int32		data_todo;
 	Pointer		dval = DatumGetPointer(value);
+	ListCell   *lc;
+	int			count = 0;

I find count a confusing name for a loop iteration variable... i or idxno, or ...

+	int			num_indexes;

 	/*
 	 * Open the toast relation and its index. We can use the index to check
 	 * uniqueness of the OID we assign to the toasted item, even though it has
-	 * additional columns besides OID.
+	 * additional columns besides OID. A toast table can have multiple identical
+	 * indexes associated to it.
 	 */
 	toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
 	toasttupDesc = toastrel->rd_att;
-	toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+	if (toastrel->rd_indexvalid == 0)
+		RelationGetIndexList(toastrel);

Hm, I think we should move this into a macro, this is cropping up at more and more places.

-	index_insert(toastidx, t_values, t_isnull,
-				 &(toasttup->t_self),
-				 toastrel,
-				 toastidx->rd_index->indisunique ?
-				 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

The indisunique check looks like a copy-pasto to me, albeit not yours...

 	/*
 	 * Create the TOAST pointer value that we'll return
@@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	struct varatt_external toast_pointer;

+	/*
+	 * We actually use only the first index but taking a lock on all is
+	 * necessary.
+	 */

Hm, is it guaranteed that the first index is valid?

+	foreach(lc, toastrel->rd_indexlist)
+		toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);

 	/*
-	 * If we're swapping two toast tables by content, do the same for their
-	 * indexes.
+	 * If we're swapping two toast tables by content, do the same for all of
+	 * their indexes. The swap can actually be safely done only if all the indexes
+	 * have valid Oids.
 	 */

What's an index without a valid oid?

 	if (swap_toast_by_content &&
-		relform1->reltoastidxid && relform2->reltoastidxid)
-		swap_relation_files(relform1->reltoastidxid,
-							relform2->reltoastidxid,
-							target_is_pg_class,
-							swap_toast_by_content,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							mapped_tables);
+		relform1->reltoastrelid && relform2->reltoastrelid)
+	{
+		Relation	toastRel1, toastRel2;
+
+		/* Open relations */
+		toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+		toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);

Shouldn't those be Access Exclusive Locks?

+		/* Obtain index list if necessary */
+		if (toastRel1->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel1);
+		if (toastRel2->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel2);
+
+		/* Check if the swap is possible for all the toast indexes */

So there's no error being thrown if this turns out not to be possible?

+		if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) +
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Hello,

I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error:

select format('|%*s|', -2147483648, 'foo');
Result: |foo|

because -(-2147483648) = 0 in signed 32-bit integers.

Ouch. Thanks for pointing it out. Fixed - another overflow check added.

Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard leaves that undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

Instead, I think we can reject it by simply comparing against INT_MIN. I modified the patch like so and put in some styling modifications. Finally, enlargeStringInfo fails soon afterward for large numbers. We might want to keep the width under a certain limit to avoid memory exhaustion. Anyway, I'll send this patch to the committers as it is in this message.

best wishes,
-- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9b7e967..b2d2ed6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1519,21 +1519,13 @@
        <primary>format</primary>
       </indexterm>
       <literal><function>format</function>(<parameter>formatstr</parameter> <type>text</type>
-      [, <parameter>str</parameter> <type>any</type> [, ...] ])</literal>
+      [, <parameter>formatarg</parameter> <type>any</type> [, ...] ])</literal>
      </entry>
      <entry><type>text</type></entry>
      <entry>
       Format arguments according to a format string.
-      This function is similar to the C function
-      <function>sprintf</>, but only the following conversion specifications
-      are recognized: <literal>%s</literal> interpolates the corresponding
-      argument as a string; <literal>%I</literal> escapes its argument as
-      an SQL identifier; <literal>%L</literal> escapes its argument as an
-      SQL literal; <literal>%%</literal> outputs a literal <literal>%</>.
-      A conversion can reference an explicit parameter position by preceding
-      the conversion specifier with <literal><replaceable>n</>$</>, where
-      <replaceable>n</replaceable> is the argument position.
-      See also <xref linkend="plpgsql-quote-literal-example">.
+      This function is similar to the C function <function>sprintf</>.
+      See <xref linkend="functions-string-format">.
      </entry>
      <entry><literal>format('Hello %s, %1$s', 'World')</literal></entry>
      <entry><literal>Hello World, World</literal></entry>
@@ -2847,6 +2839,186 @@
  </tgroup>
 </table>

+  <sect2 id="functions-string-format">
+   <title><function>format</function></title>
+
+   <indexterm>
+    <primary>format</primary>
+   </indexterm>
+
+   <para>
+    The function <function>format</> produces formatted output according to
+    a format string in a similar way to the C function <function>sprintf</>.
+   </para>
+
+   <para>
+<synopsis>
+format(<parameter>formatstr</> <type>text</> [, <parameter>formatarg</> <type>any</> [, ...] ])
+</synopsis>
+    <replaceable>formatstr</> is a format string that specifies how the
+    result should be formatted. Text in the format string is copied directly
+    to the result, except where <firstterm>format specifiers</> are used.
+    Format specifiers act as placeholders in the string, allowing subsequent
+    function arguments to be formatted and inserted into the result.
+   </para>
+
+   <para>
+    Format specifiers are introduced by a <literal>%</> character and take
+    the form
+<synopsis>
+%[<replaceable>parameter</>][<replaceable>flags</>][<replaceable>width</>]<replaceable>type</>
+</synopsis>
+    <variablelist>
+     <varlistentry>
+      <term><replaceable>parameter</replaceable> (optional)</term>
+      <listitem>
+       <para>
+        An expression of the form <literal><replaceable>n</>$</> where
+        <replaceable>n</> is the index of the argument to use for the format
+        specifier's value. An index of 1 means the first argument after
+        <replaceable>formatstr</>. If the <replaceable>parameter</> field is
+        omitted, the default is to use the next argument.
+       </para>
+<screen>
+SELECT format('Testing %s, %s, %s', 'one', 'two', 'three');
+<lineannotation>Result: </><computeroutput>Testing one, two, three</>
+
+SELECT format('Testing %3$s, %2$s, %1$s', 'one', 'two', 'three');
+<lineannotation>Result: </><computeroutput>Testing three, two, one</>
+</screen>
+       <para>
+        Note that unlike the C function <function>sprintf</> defined in the
+        Single UNIX Specification, the <function>format</> function in
+        <productname>PostgreSQL</> allows format specifiers with and without
+        explicit <replaceable>parameter</> fields to be mixed in the same
+        format string. A format specifier without a
+        <replaceable>parameter</> field always uses the next
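The undefined-behaviour concern discussed above is easy to see with a model of 32-bit two's-complement negation. This is an illustrative sketch: `neg32` merely simulates what C integer negation typically does in practice on common platforms (the C standard leaves overflowing signed negation undefined), and `width_is_valid` shows the INT_MIN comparison being proposed:

```python
INT_MIN = -2**31

def neg32(x):
    """Negate x with 32-bit two's-complement wraparound, as C integer
    negation typically behaves on common platforms."""
    r = (-x) & 0xFFFFFFFF
    return r - 0x100000000 if r >= 0x80000000 else r

def width_is_valid(width):
    """The safer check suggested above: reject INT_MIN directly instead
    of negating the width and testing the sign of the result."""
    return width != INT_MIN

print(neg32(INT_MIN))           # -2147483648: INT_MIN "negates" to itself
print(width_is_valid(INT_MIN))  # False
```

Because `-INT_MIN` wraps back to `INT_MIN`, the negated value has the same sign as the original, which is exactly what the SAMESIGN test in the quoted snippet relies on, and why comparing against INT_MIN up front is the portable alternative.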
Re: [HACKERS] 9.2.3 crashes during archive recovery
Sorry, I sent the wrong script. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center

#! /bin/sh
pgpath=$HOME/bin/pgsql_924b
echo $PATH | grep $pgpath | wc -l
if [ `echo $PATH | grep $pgpath | wc -l` == 0 ]; then
    PATH=$pgpath/bin:$PATH
fi
PGDATA0=$HOME/data/pgdata_0
PGDATA1=$HOME/data/pgdata_1
PGARC0=$HOME/data/pgarc_0
PGARC1=$HOME/data/pgarc_1
PGPORT0=5432
PGPORT1=5433
unset PGPORT
unset PGDATA
echo "Postgresql is \"`which postgres`\""
killall -9 postgres
rm -rf $PGDATA0/* $PGARC0/*
initdb -D $PGDATA0
cat >> $PGDATA0/postgresql.conf <<EOF
port=5432
wal_level = hot_standby
checkpoint_segments = 300
checkpoint_timeout = 1h
archive_mode = on
archive_command = 'cp %p $PGARC0/%f'
max_wal_senders = 3
hot_standby = on
#log_min_messages = debug5
EOF
cat >> $PGDATA0/pg_hba.conf <<EOF
local replication horiguti trust
EOF
echo "## Startup master"
pg_ctl -D $PGDATA0 -w -o "-p $PGPORT0" start
rm -rf $PGDATA1/* $PGARC1/*
echo "## basebackup"
pg_basebackup -h /tmp -p $PGPORT0 -F p -X s -D $PGDATA1
chmod 700 $PGDATA1
cat >> $PGDATA1/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5432'
restore_command='a=$PGARC1; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Startup standby"
pg_ctl -D $PGDATA1 -o "-p $PGPORT1" start
echo "## Sleep for 5 seconds"
sleep 5
echo "## Shutdown standby"
pg_ctl -D $PGDATA1 -w stop -m f
echo "## Shutdown master in immediate mode"
pg_ctl -D $PGDATA0 -w stop -m i
cat >> $PGDATA0/recovery.conf <<EOF
standby_mode = yes
primary_conninfo='host=/tmp port=5433'
restore_command='a=$PGARC0; if [ -f \$a/%f ]; then cp \$a/%f %p; else exit 1; fi'
EOF
echo "## Starting master as a standby"
pg_ctl -D $PGDATA0 start
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); the attached patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is the original intention of Heikki's patch.

I also found a bug in the latest 9.2_stable: it does not get the latest timeline and recovery history file in archive recovery when the master and standby timelines are different.

Best regards,

(2013/03/05 18:22), Kyotaro HORIGUCHI wrote:

Hello, I could reproduce the behavior and think I understand the cause. The head of origin/REL9_2_STABLE shows the behavior I mentioned in the last message when using the shell script attached; 9.3dev runs as expected.

In XLogPageRead, when RecPtr goes beyond the last page, the current xlog file is released and a new page requested. The variables were as below at that point:

StandbyRequested == true
StandbyMode == false
ArchiveRecoveryRequested == true
InArchiveRecovery == false

In this case, XLogPageRead immediately returns NULL before trying to get xlogs via streaming or from the archive. So ReadRecord returns NULL, then unexpectedly exits the 'main redo apply loop' and increases the timeline ID as if it were promoted. This seems fixed by letting it try all requested sources. The attached patch does this, and the test script runs as expected.

We found that PostgreSQL with this patch unexpectedly becomes primary when starting up as standby. We'll do further investigation of the behavior.

Anyway, I've committed this to master and 9.2 now. This seems to fix the issue. We'll examine this further.
regards,
-- Mitsumasa KONDO NTT OSS Center

--- a/src/backend/access/transam/xlog.c	2013-03-04 15:13:49.0 -0500
+++ b/src/backend/access/transam/xlog.c	2013-03-05 06:43:49.435093827 -0500
@@ -4446,7 +4446,7 @@
 	if (targetTLI == 1)
 		return list_make1_int((int) targetTLI);

-	if (InArchiveRecovery)
+	if (ArchiveRecoveryRequested)
 	{
 		TLHistoryFileName(histfname, targetTLI);
 		fromArchive =
@@ -10603,8 +10603,11 @@
 		readFile = XLogFileReadAnyTLI(readId, readSeg, emode,
 									  sources);
 		switched_segment = true;
-		if (readFile < 0)
+		if (readFile < 0){
+			if (StandbyModeRequested)
+				return true;
 			return false;
+		}
 	}
 }
Re: [HACKERS] Materialized views WIP patch
Simon Riggs si...@2ndquadrant.com wrote:

On 3 March 2013 23:39, Tom Lane t...@sss.pgh.pa.us wrote:

Nicolas Barbier nicolas.barb...@gmail.com writes:

2013/3/3 Kevin Grittner kgri...@ymail.com:

Nicolas Barbier nicolas.barb...@gmail.com wrote:

I think that automatically using materialized views even when the query doesn’t mention them directly is akin to automatically using indexes without having to mention them in the query.

Oh, I understand that concept perfectly well, I just wonder how often it is useful in practice.

There's a much more fundamental reason why this will never happen, which is that the query planner is not licensed to decide that you only want an approximate and not an exact answer to your query. If MVs were guaranteed always up-to-date, maybe we could think about automatic use of them --- but that's a far different feature from what Kevin has here.

It's not a different feature; it's what most people expect a feature called MV to deliver. That's not a matter of opinion, it's simply how every other database works currently - Oracle, Teradata, and SQL Server at least. The fact that we don't allow MVs to automatically optimize queries is acceptable, as long as that is clearly marked in some way to avoid confusion, and I don't mean buried on p5 of the docs. What we have here is a partial implementation that can be improved upon over the next few releases. I hope no one is going to claim in the release notes that Materialized Views have been implemented in this release, because unqualified that would be seriously misleading and might even stifle further funding to improve things to the level already implemented elsewhere.

Just to reiterate, I fully support committing this partial feature into Postgres in this release, because it will be a long haul to complete the full feature and what we have here is a reasonable stepping stone to get there. Transactionally up-to-date MVs can be used like indexes in the planner.
The idea that this is impossible because of the permutations involved is somewhat ridiculous; there is much published work on optimising that, and some obvious optimisations. Clearly the cost varies according to the number of MVs and the number of tables they touch, not the overall complexity of the query. The overhead is probably the same as or less than that of partial indexes, which we currently think is acceptable. In any case, if you don't want that overhead, don't use MVs.

Non-transactionally up-to-date MVs could also be used like indexes in the planner, if we gave the planner the licence it (clearly) lacks. If using an MV makes a two-hour query return in 1 minute, then using an MV that is 15 minutes out of date is likely to be a win. The licence is some kind of user parameter/option that specifies how stale an answer a query can return. For many queries that involve averages and sums, a stale or perhaps an approximate answer would hardly differ anyway. So I think there is room somewhere there for a staleness time specification by the user, allowing approximation.

I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of the base tables have indexes which could reduce execution cost that aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to use the MV directly.
It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. The good news is that it sounds like we agree on the ideal long-term feature set. I'm just a lot more excited, based on the use-cases I've seen, about the addition of incremental updates than substituting MVs into query plans which reference the underlying tables. Perhaps that indicates a chance to reach the final feature set sooner, through everyone scratching their own itches. :-) And we both seem to feel that some system for managing acceptable levels of MV freshness is a necessary feature in order to go very much further. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list
[HACKERS] Materialized view assertion failure in HEAD
I'm getting an assertion failure in HEAD with materialized views, see below for backtrace. To reproduce, just run make installcheck, dump the regression database and then restore it; the server crashes during restore.

(gdb) bt
#0  0x7f283a366425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x7f283a369b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00888429 in ExceptionalCondition (conditionName=0xa0c840 "!(!matviewRel->rd_rel->relhasoids)", errorType=0xa0c78a "FailedAssertion", fileName=0xa0c780 "matview.c", lineNumber=135) at assert.c:54
#3  0x005dbfc4 in ExecRefreshMatView (stmt=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, completionTag=0x7fff47af0a60 "") at matview.c:135
#4  0x007758a5 in standard_ProcessUtility (parsetree=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, dest=0x1b70da0, completionTag=0x7fff47af0a60 "", context=PROCESS_UTILITY_TOPLEVEL) at utility.c:1173
#5  0x00773e3f in ProcessUtility (parsetree=0x1b70a28, queryString=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n", params=0x0, dest=0x1b70da0, completionTag=0x7fff47af0a60 "", context=PROCESS_UTILITY_TOPLEVEL) at utility.c:341
#6  0x00772d7e in PortalRunUtility (portal=0x1aeb6d8, utilityStmt=0x1b70a28, isTopLevel=1 '\001', dest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:1185
#7  0x00772f56 in PortalRunMulti (portal=0x1aeb6d8, isTopLevel=1 '\001', dest=0x1b70da0, altdest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:1317
#8  0x00772481 in PortalRun (portal=0x1aeb6d8, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1b70da0, altdest=0x1b70da0, completionTag=0x7fff47af0a60 "") at pquery.c:814
#9  0x0076c155 in exec_simple_query (query_string=0x1b6ff98 "REFRESH MATERIALIZED VIEW tvmm;\n\n\n") at postgres.c:1048
#10 0x00770517 in PostgresMain (argc=2, argv=0x1ab4bb0, username=0x1ab4a88 "joe") at postgres.c:3969
#11 0x0070ccec in BackendRun (port=0x1ae5ac0) at postmaster.c:3989
#12 0x0070c401 in BackendStartup (port=0x1ae5ac0) at postmaster.c:3673
#13 0x00708ce6 in ServerLoop () at postmaster.c:1575
#14 0x00708420 in PostmasterMain (argc=3, argv=0x1ab2420) at postmaster.c:1244
#15 0x006704f7 in main (argc=3, argv=0x1ab2420) at main.c:197
Re: [HACKERS] odd behavior in materialized view
On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote: Fujii Masao masao.fu...@gmail.com wrote: When I accessed the materialized view in the standby server, I got the following ERROR message. Looks odd to me. Is this a bug? ERROR: materialized view hogeview has not been populated HINT: Use the REFRESH MATERIALIZED VIEW command. The procedure to reproduce this error message is: In the master server: CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; DELETE FROM hoge; REFRESH MATERIALIZED VIEW hogeview; SELECT count(*) FROM hogeview; In the standby server SELECT count(*) FROM hogeview; SELECT count(*) goes well in the master, and expectedly returns 0. OTOH, in the standby, it emits the error message. Will investigate. Thanks! And I found another problem. When I ran the following SQLs in the master, PANIC error occurred in the standby. CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; VACUUM ANALYZE; The PANIC error messages that I got in the standby are WARNING: page 0 of relation base/12297/16387 is uninitialized CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 PANIC: WAL contains references to invalid pages CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 base/12297/16387 is the file of the materialized view 'hogeview'. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sql_drop Event Trigger
On Mon, Mar 4, 2013 at 11:13 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Another question. If I do ALTER TABLE foo DROP COLUMN bar, do we need to fire an event trigger for the dropped column? Right now we don't, ISTM we should. And if we want that, then the above set of three properties doesn't cut it. +1. Similar questions arise for object-access-hooks, among other places. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sql_drop Event Trigger
On Mon, Mar 4, 2013 at 4:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera escribió: I think this is mostly ready to go in. I'll look at your docs, and unless there are more objections will commit later or early tomorrow. Actually it still needs a bit more work: the error messages in pg_event_trigger_dropped_object need to be reworked. It's a bit annoying that the function throws an error if the function is called in a CREATE command, rather than returning an empty set; or is it just me? That seems like a reasonable thing to change. Here's v6 with docs and regression tests too. Note the new function in objectaddress.c; without that, I was getting regression failures because catalogs such as pg_amop and pg_default_acl are not present in its supporting table. I agree that this is reasonably close to being committable. I do have a bit of concern about the save-and-restore logic for the dropped-object list -- it necessitates that the processing of every DDL command that can potentially drop objects be bracketed with a PG_TRY/PG_CATCH block. While that's relatively easy to guarantee today (but doesn't ALTER TABLE need similar handling?), it seems to me that it might get broken in the future. How hard would it be to pass this information up and down the call stack instead of doing it this way? Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimizing pglz compressor
I spent some more time on this, and came up with the attached patch. It includes the changes I posted earlier, to use indexes instead of pointers in the hash table. In addition, it makes the hash table size variable, depending on the length of the input. This further reduces the startup cost on small inputs. I changed the hash method slightly, because the old method would not use any bits from the 3rd byte with a small hash table size; fortunately the change didn't seem to have a negative impact with larger hash table sizes either. I wrote a little C extension to test this. It contains a function which runs pglz_compress() on a bytea input N times. I ran that with different kinds of inputs, and got the following results:

unpatched:

     testname      |   auto
-------------------+-----------
 5k text           |  1425.750
 512b text         |  1287.413
 2k random         | -1053.160
 100k random       | -1056.379
 512b random       | -1018.474
 64b of text       | -2547.106
 64b random        | -3731.700
 100k of same byte |  1093.885
 100k text         |   849.430
(9 rows)

pglz-replace-pointers-with-indexes.patch (the patch I posted earlier):

     testname      |   auto
-------------------+-----------
 5k text           |  1251.576
 512b text         |   946.348
 2k random         |  -815.095
 100k random       |  -808.356
 512b random       |  -614.418
 64b of text       |  -744.382
 64b random        | -1060.758
 100k of same byte |  1192.397
 100k text         |   796.530
(9 rows)

pglz-variable-size-hash-table.patch:

     testname      |   auto
-------------------+-----------
 5k text           |  1276.905
 512b text         |   823.796
 2k random         |  -844.484
 100k random       |  -848.080
 512b random       |  -615.039
 64b of text       |  -393.448
 64b random        |  -568.685
 100k of same byte |  1186.759
 100k text         |   799.978
(9 rows)

These values are from a single run of the test, but I did repeat them several times to make sure there isn't too much variability in them to render the results meaningless. The negative values mean that pglz_compress gave up on the compression in the test, ie. it shows how long it took for pglz_compress to realize that it can't compress the input. Compare the abs() values. 
With the variable-size hash table, I'm not sure how significant the earlier patch to use indexes instead of pointers is. But I don't think it makes things any worse, so it's included in this. On 01.03.2013 17:42, Heikki Linnakangas wrote: On 01.03.2013 17:37, Alvaro Herrera wrote: My take on this is that if this patch is necessary to get Amit's patch to a commitable state, it's fair game. I don't think it's necessary for that, but let's see.. With these tweaks, I was able to make pglz-based delta encoding perform roughly as well as Amit's patch. So I think that's the approach we should take, as it's a bit simpler and more versatile. I'll follow up on that in the other thread. - Heikki diff --git a/src/backend/utils/adt/pg_lzcompress.c b/src/backend/utils/adt/pg_lzcompress.c index 66c64c1..b8a69ec 100644 --- a/src/backend/utils/adt/pg_lzcompress.c +++ b/src/backend/utils/adt/pg_lzcompress.c @@ -112,7 +112,7 @@ * of identical bytes like trailing spaces) and for bigger ones * our 4K maximum look-back distance is too small. * - * The compressor creates a table for 8192 lists of positions. + * The compressor creates a table for lists of positions. * For each input position (except the last 3), a hash key is * built from the 4 next input bytes and the position remembered * in the appropriate list. Thus, the table points to linked @@ -120,7 +120,10 @@ * matching strings. This is done on the fly while the input * is compressed into the output area. Table entries are only * kept for the last 4096 input positions, since we cannot use - * back-pointers larger than that anyway. + * back-pointers larger than that anyway. The size of the hash + * table depends on the size of the input - a larger table has + * a larger startup cost, as it needs to be initialized to zero, + * but reduces the number of hash collisions on long inputs. 
* * For each byte in the input, it's hash key (built from this * byte and the next 3) is used to find the appropriate list @@ -180,8 +183,7 @@ * Local definitions * -- */ -#define PGLZ_HISTORY_LISTS 8192 /* must be power of 2 */ -#define PGLZ_HISTORY_MASK (PGLZ_HISTORY_LISTS - 1) +#define PGLZ_MAX_HISTORY_LISTS 8192 /* must be power of 2 */ #define PGLZ_HISTORY_SIZE 4096 #define PGLZ_MAX_MATCH 273 @@ -198,9 +200,9 @@ */ typedef struct PGLZ_HistEntry { - struct PGLZ_HistEntry *next; /* links for my hash key's list */ - struct PGLZ_HistEntry *prev; - int hindex; /* my current hash key */ + int16 next; /* links for my hash key's list */ + int16 prev; + uint32 hindex; /* my current hash key */ const char *pos; /* my input position */ }
Re: [HACKERS] Fix pgstattuple/pgstatindex to use regclass-type as the argument
On Sun, Mar 3, 2013 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe this is acceptable collateral damage. I don't know. But we definitely stand a chance of breaking applications if we change pgstatindex like this. It might be better to invent a differently-named function to replace pgstatindex. If this were a built-in function, we might have to make a painful decision between breaking backward compatibility and leaving this broken forever, but as it isn't, we don't. I think your suggestion of adding a new function is exactly right. We can remove the old one in a future release, and support both in the meantime. It strikes me that if extension versioning is for anything, this is it. We encountered, not long ago, a case where someone couldn't pg_upgrade from 9.0 to 9.2. The reason is that they had defined a view which happened to reference pg_stat_activity.procpid, which we renamed. Oops. Granted, few users do that, and granted, we can't always refrain from changing system catalog structure. But it seems to me that it's good to avoid the pain where we can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) = 0 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard defines it as undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

this pattern was used elsewhere in pg Instead, I think we can deny it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say what is preferred. Finally, enlargeStringInfo fails just after for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should never be higher than 1GB. What do you think about these limits? Regards Pavel Anyway, I'll send this patch to committers as it is in this message. best wishes, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] [GENERAL] Floating point error
Daniel Farina dan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.01000000000000000020816681711721685132943093776702880859375 Of course, a value like that can't be precisely written in decimal with only a few digits. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Suggested new CF status: Pending Discussion
On Mon, Mar 4, 2013 at 11:50 PM, Jeff Janes jeff.ja...@gmail.com wrote: Is that true of all commitfests, or only the last one in a cycle? If the former, I think the existence of the waiting on author category belies this point. The original point of Waiting on Author was that a patch might need minor adjustments before it could get committed. And, generally, for the first few CommitFests, that's what happens. But in the final CommitFest, some people's notion of what a tweak in expands to include multiple complete rewrites. If a patch is basically OK, or only minor points are being discussed, it's appropriate to punt it back to the author and say, hey, fix these things and then we can commit this. But if the whole method of operation of the patch needs rethinking, then, in my opinion anyway, it should get pushed out to the last CommitFest. As for the last CommitFest, it's less that we should apply a different standard and more that people fight back against the same standard a lot harder when it means waiting a whole release cycle. Very little of the recently-committed stuff was ready to commit on January 15th, or even close to it, and the percentage of what's left that falls into that category is probably dropping steadily. At this point, if there's not a consensus on it, the correct status is Returned with Feedback. Specifically, the feedback that we're not going to commit it this CommitFest because we don't have consensus on it yet. That is a fair point, and I think Tom has said something similar. But it leaves open the question of who it is that is supposed to be implementing it. Is it the commit-fest manager who decides there is not sufficient consensus, or the author, or a self-assigned reviewer? It can be any of those. I know that I certainly would not rush into an ongoing a conversation, in which several of the participants have their commit-bits, and say I'm calling myself the reviewer and am calling it dead, please stop discussing this. 
Or even just, stop discussing it as an item for 9.3. Perhaps not, but you could certainly express an opinion, if you had one. If you express well-thought-out opinions a sufficient number of times, the idea is that they (along with various other things) lead to you getting a commit bit, too. I think the role of the commit-fest manager is that of a traffic-cop, not a magistrate. But if we are going to have Commitfest II: The summary judgement, that needs to be run by a magistrate, as a separate process from the ordinary part of a commitfest. I found that when I could spend 20 or 30 hours a week just managing the CommitFest, I could do a fairly good job separating the stuff that had a real chance of being ready with a modest amount of work from the stuff that didn't. But that basically involved me understanding, in a fair degree of detail, every patch in the CommitFest. Once I got my commit bit, that became a lot less practical, because understanding every patch in the CommitFest broadly and some of them well enough to commit them added up to more work than I could do. Also, that style of CommitFest management involved me spending a lot of time arguing with people who didn't want their patch punted no matter how well-considered the arguments, and the novelty of that eventually wore off. To be fair, many people were very reasonable about the whole thing - but the holdouts could suck up a huge amount of time and emotional energy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] odd behavior in materialized view
Fujii Masao masao.fu...@gmail.com wrote: And I found another problem. When I ran the following SQLs in the master, PANIC error occurred in the standby. CREATE TABLE hoge (i int); INSERT INTO hoge VALUES (generate_series(1,100)); CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; VACUUM ANALYZE; The PANIC error messages that I got in the standby are WARNING: page 0 of relation base/12297/16387 is uninitialized CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 PANIC: WAL contains references to invalid pages CONTEXT: xlog redo visible: rel 1663/12297/16387; blk 0 base/12297/16387 is the file of the materialized view 'hogeview'. Yeah, that looks like it will be fixed by the fix for the first problem. The write of a first page without any rows to indicate that it is a scannable empty relation must be WAL-logged. I should have something later today. Thanks for spotting this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized view assertion failure in HEAD
Joachim Wieland j...@mcknight.de wrote: I'm getting an assertion failure in HEAD with materialized views, see below for backtrace. To reproduce, just run make installcheck, dump the regression database and then restore it, the server crashes during restore. #2 0x00888429 in ExceptionalCondition (conditionName=0xa0c840 "!(!matviewRel->rd_rel->relhasoids)", errorType=0xa0c78a "FailedAssertion", fileName=0xa0c780 "matview.c", lineNumber=135) at assert.c:54 Will investigate. You don't have default_with_oids = on, do you? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
On 5 March 2013 13:46, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an out of range error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) = 0 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and looks to work as expected at least on linux/x86_64. But the C standard defines it as undefined (as far as I hear :-).

| if (width != 0)
| {
|     int32 _width = -width;
|
|     if (SAMESIGN(width, _width))
|         ereport(ERROR,

this pattern was used elsewhere in pg Instead, I think we can deny it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say what is preferred. Finally, enlargeStringInfo fails just after for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should never be higher than 1GB. What do you think about these limits? I think it's fine as it is. It's no different from repeat() for example. We allow repeat('a', 10) so allowing format('%10s', '') seems reasonable, although probably not very useful. Upping either beyond 1GB generates an out of memory error, which also seems reasonable -- I can't imagine why you would want such a long string. Regards, Dean
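The undefined-behavior hazard under discussion is concrete: in two's complement there is no positive counterpart for INT_MIN, so -width overflows when width is -2147483648, and signed overflow is undefined in C. Comparing against INT_MIN before negating, as suggested, sidesteps the negation entirely. A minimal sketch (the function name and the error-as-return-code shape are illustrative; the actual format() code reports the error with ereport):

```c
#include <limits.h>
#include <stdbool.h>

/* Return false for INT_MIN, whose negation would overflow a 32-bit
 * int; otherwise set *abs_width to the magnitude and return true.
 * Checking before negating avoids signed-overflow undefined behavior. */
static bool
width_magnitude(int width, int *abs_width)
{
    if (width == INT_MIN)
        return false;           /* caller reports "out of range" */
    *abs_width = (width < 0) ? -width : width;
    return true;
}
```

This assumes int is 32 bits (int32), as it is on the platforms PostgreSQL targets.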
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier michael.paqu...@gmail.com wrote: Thanks for the review. All your comments are addressed and updated patches are attached. I got the compile warnings: tuptoaster.c:1539: warning: format '%s' expects type 'char *', but argument 3 has type 'Oid' tuptoaster.c:1539: warning: too many arguments for format The patch doesn't handle the index on the materialized view correctly. =# CREATE TABLE hoge (i int); CREATE TABLE =# CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge; SELECT 0 =# CREATE INDEX hogeview_idx ON hogeview(i); CREATE INDEX =# REINDEX TABLE hogeview; REINDEX =# REINDEX TABLE CONCURRENTLY hogeview; NOTICE: table hogeview has no indexes REINDEX Regards, -- Fujii Masao
Re: [HACKERS] [GENERAL] Floating point error
On 05.03.2013 15:59, Kevin Grittner wrote: Daniel Farina dan...@heroku.com wrote: This kind of change may have many practical problems that may make it un-pragmatic to alter at this time (considering the workaround is to set the extra float digits), but I can't quite grasp the rationale for well, the only program that cares about the most precision available is pg_dump. It seems like most programs would care just as much. Something to keep in mind is that when you store 0.01 into a double precision column, the precise value stored, when written in decimal, is: 0.01000000000000000020816681711721685132943093776702880859375 Of course, a value like that can't be precisely written in decimal with only a few digits. It would be nice to have a base-2 text format to represent floats. It wouldn't be as human-friendly as base-10, but it could be used when you don't want to lose precision. pg_dump in particular. - Heikki
Re: [HACKERS] Materialized views WIP patch
On 5 March 2013 12:15, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of the base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. If we are proposing that we shouldn't try to optimise because it's not usefully solvable, then I would disagree. If we are saying that more plans are possible with MVs, then I'd say, yes there *could* be - that's one of the purposes. That represents more options for optimisation and we should be happy, not sad about that. Yes, we would need some thought about how those potential optimisations can be applied without additional planning cost, but I see that as a long term task, not a problem. The question is will the potential for additional planning time actually materialise into a planning problem? (See below). I have no doubt that someone could take a big data warehouse and add one or two MVs and show a dramatic improvement in the run time of a query. Almost as big as if the query were rewritten to use the MV directly. It would make for a very nice presentation, and as long as they are used sparingly this could be a useful tool for a data warehouse environment which is playing with alternative ways to optimize slow queries which pass a lot of data. In other environments, I feel that it gets a lot harder to show a big win. Are there realistically going to be more options to consider? 
In practice, no, because in most cases we won't be considering both MVs and indexes. Splatting MVs on randomly won't show any more improvement than splatting indexes on randomly helps. Specific optimisations help in specific cases only. And of course, adding extra data structures that have no value certainly does increase planning time. Presumably we'd need some way of seeing how frequently MVs were picked, so we could drop unused MVs just like we can indexes. * Indexes are great at speeding up various kinds of search queries. If you don't have any queries like that, they help very little. * MVs help in specific and restricted use cases, but can otherwise be thought of as a new kind of index structure. MVs help with joins and aggregations, so if you don't do much of that, you'll see no benefit. That knowledge also allows us to develop heuristics for sane optimisation. If the MV has a GROUP BY in it, and a query doesn't, then it probably won't help much to improve query times. If it involves a join you aren't using, then that won't help either. My first guess would be that we don't even bother looking for MV plans unless it has an aggregated result, or a large scale join. We do something similar when we look for plans that use indexes when we have appropriate quals - no quals, no indexes. As a result, I don't see MVs increasing planning times for requests that would make little or no use of them. There will be more planning time on queries that could make use of them and that is good because we really care about that. Sure, a badly written planner might cost more time than it saves. All of this work requires investment from someone with the time and/or experience to make a good go at it. I'm not pushing Tom towards it, nor anyone else, but I do want to see the door kept open for someone to do this when possible (i.e. not GSoC). 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Suggested new CF status: Pending Discussion
On 4 March 2013 18:59, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Mar 3, 2013 at 9:27 PM, Josh Berkus j...@agliodbs.com wrote: Except that the implication of waiting on author is that, if there's no updates in a couple weeks, we bounce it. And the author doesn't necessarily control a bikeshedding discussion about syntax, for example. That's true. I think, though, that the basic problem is that we've lost track of the ostensible purpose of a CommitFest, which is to commit the patches that *are already ready* for commit. Mumble. That's *part* of the purpose of a CF, but not all. It's also meant to be a time when people concentrate on reviewing patches, and surely discussions about syntax or whatever have to be part of that. I recall in fact that at the last developer meeting, there was discussion about trying to get people to do more formal reviewing of design ideas that hadn't even made it to the submittable-patch stage. So I feel it's counterproductive to try to narrow the concept of a CF to only ready to commit patches. Agreed. Ready to commit is a state, and everybody has a different view of the state of each patch. So we need a point where we all sync up and decide by some mechanism what the group's opinion of the state is and handle accordingly. Or put another way, I very much welcome the chance for feedback from others on my work, and appreciate the opportunity for review of others work. If we can commit at any time, then every discussion needs to be followed in detail otherwise we get you should have said that earlier repeatedly. By all stopping, having a cup of tea and deciding what we're happy with and then continue, it gives the opportunity for efficient thread scheduling of our work. Between CFs we can work mostly independently. People starting new discussions while we have a big patch queue is annoying, because it just makes the whole queue slow down and even less gets done in the long run. 
We need to look at good overall throughput, not continually interrupt each other, causing single threading and overall loss of efficiency. Same idea as if we had a meeting: it would be cool if everybody listened to the meeting, rather than taking calls and then walking back in to repeat discussions that already happened. And overall, one long rolling discussion on hackers is the same thing as saying all developers spend all day every day in a meeting - we should recognise how unproductive that is and work out ways to avoid it. Process changes every year because the skills, capacities and requirements change each year. So change doesn't imply the last approach was broken. The general question is how can we work efficiently together and still produce high quality software? But having said that, maybe the last CF of a cycle has to be treated more nearly as you suggest. Certainly if we hold off ending the CF in hopes of committing stuff that wasn't nearly ready to commit at its beginning, then we're back to bad old habits that seldom lead to anything but a late release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
2013/3/5 Dean Rasheed dean.a.rash...@gmail.com: On 5 March 2013 13:46, Pavel Stehule pavel.steh...@gmail.com wrote: 2013/3/5 Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp: Hello, I think that the only other behavioural glitch I spotted was that it fails to catch one overflow case, which should generate an "out of range" error: select format('|%*s|', -2147483648, 'foo'); Result: |foo| because -(-2147483648) is still -2147483648 in signed 32-bit integers. Ouch. Thanks for pointing that out. Fixed - another overflow check added. Your change shown below seems to assume that the two's complement of the most negative number in integer types is identical to itself, and it looks to work as expected at least on linux/x86_64. But the C standard leaves it undefined (as far as I hear :-). | if (width != 0) | { | int32 _width = -width; | | if (SAMESIGN(width, _width)) | ereport(ERROR, this pattern was used elsewhere in pg Instead, I think we can reject it by simply comparing with INT_MIN. I modified the patch like so and put some modifications on styling. ok - I don't have enough experience with this topic and I cannot say which is preferred. Finally, enlargeStringInfo fails soon afterward for large numbers. We should perhaps keep it under a certain length to avoid memory exhaustion. I thought about it, but I don't know a correct value - probably any width specification higher than 1MB will be bogus and can be blocked?? Our VARLENA max size is 1GB so width should not be higher than 1GB ever. What do you think about these limits? I think it's fine as it is. It's no different from repeat() for example. We allow repeat('a', 10) so allowing format('%10s', '') seems reasonable, although probably not very useful. Upping either beyond 1GB generates an out of memory error, which also seems reasonable -- I can't imagine why you would want such a long string. 
Regards, Dean ok Pavel
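The overflow case discussed above can be reproduced outside the server. A minimal sketch (Python used here for illustration, emulating 32-bit two's-complement negation) shows why INT_MIN is the one value a SAMESIGN-style check must catch, and why comparing against INT_MIN directly works just as well:

```python
INT32_MIN = -2**31

def neg_int32(x):
    """Emulate 32-bit two's-complement negation with wraparound."""
    r = (-x) & 0xFFFFFFFF
    return r - 2**32 if r >= 2**31 else r

# For every int32 except INT32_MIN, negation flips the sign.
assert neg_int32(5) == -5
assert neg_int32(2147483647) == -2147483647

# -(-2147483648) wraps back to -2147483648: same sign as the input,
# which is exactly what a SAMESIGN(width, _width) test detects.
# Comparing width == INT32_MIN up front catches the same case without
# relying on behavior the C standard leaves undefined.
assert neg_int32(INT32_MIN) == INT32_MIN
```

In C the wraparound itself is undefined behavior, which is why the review suggests the explicit INT_MIN comparison rather than negating and inspecting the result.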
Re: [HACKERS] Re: proposal: a width specification for s specifier (format function), fix behave when positional and ordered placeholders are used
Kyotaro HORIGUCHI escribió: Umm. sorry, If you have no problem with this, I'll send this to committer. I just found that this patch already has a reviewer. I've only seen the Status field in the patch list.. Patches can be reviewed by more than one person. It's particularly useful if they have different things to say. So don't hesitate to review patches that have already been reviewed by other people. In fact, you can even review committed patches; it's not unlikely that you will be able to find bugs in those, too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] sql_drop Event Trigger
Robert Haas escribió: On Mon, Mar 4, 2013 at 4:59 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Alvaro Herrera escribió: I think this is mostly ready to go in. I'll look at your docs, and unless there are more objections will commit later or early tomorrow. Actually it still needs a bit more work: the error messages in pg_event_trigger_dropped_object need to be reworked. It's a bit annoying that the function throws an error if the function is called in a CREATE command, rather than returning an empty set; or is it just me? That seems like a reasonable thing to change. I'm thinking just removing the check altogether, and if the list of objects dropped is empty, then the empty set is returned. That is, if you call the function directly in user-invoked SQL, you get empty; if you call the function in a CREATE event trigger function, you get empty. Since this makes the boolean drop_in_progress useless, it'd be removed as well. I do have a bit of concern about the save-and-restore logic for the dropped-object list -- it necessitates that the processing of every DDL command that can potentially drop objects be bracketed with a PG_TRY/PG_CATCH block. While that's relatively easy to guarantee today (but doesn't ALTER TABLE need similar handling?), it seems to me that it might get broken in the future. How hard would it be to pass this information up and down the call stack instead of doing it this way? This would be rather messy, I think; there are too many layers, and too many different functions. Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? Hmm, maybe this can be made to work, I'll see about it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 12:03 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote: I don't think that it is about looking nice. C doesn't promise you more than FLT_DIG or DBL_DIG digits of precision, so PostgreSQL cannot either. If you allow more, that would mean that if you store the same number on different platforms and query it, it might come out differently. Among other things, that would be a problem for the regression tests. Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? Why the discrepancy between default behavior and what pg_dump gets? It can't know whether the dump is to be restored into the same system or a different one (and AFAICT, there's not even an option to tweak extra_float_digits there).
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On 2013-03-05 22:35:16 +0900, Michael Paquier wrote: Thanks for the review. All your comments are addressed and updated patches are attached. Please see below for the details, and if you find anything else just let me know. On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund and...@2ndquadrant.com wrote: Have you benchmarked the toastrelidx removal stuff in any way? If not, that's fine, but if yes I'd be interested. No I haven't. Is it really that easily measurable? I think not, but me too I'd be interested in looking at such results. I don't think it's really measurable, at least not for modifications. But istm that the onus to prove that, to some degree, is upon the patch. + if (toastrel->rd_indexvalid == 0) + RelationGetIndexList(toastrel); Hm, I think we should move this into a macro, this is cropping up at more and more places. This is not necessary. RelationGetIndexList does a similar check at its top, so I simply removed all those checks. Well, in some of those cases a function call might be noticeable (probably only in the toast fetch path). That's why I suggested putting the above in a macro... + for (count = 0; count < num_indexes; count++) + index_insert(toastidxs[count], t_values, t_isnull, + &(toasttup->t_self), + toastrel, + toastidxs[count]->rd_index->indisunique ? + UNIQUE_CHECK_YES : UNIQUE_CHECK_NO); The indisunique check looks like a copy-pasto to me, albeit not yours... Yes it is the same for all the indexes normally, but it looks more solid to me to do that as it is. So unchanged. Hm, if the toast indexes aren't unique anymore loads of stuff would be broken. Anyway, not your fault. + /* Obtain index list if necessary */ + if (toastRel1->rd_indexvalid == 0) + RelationGetIndexList(toastRel1); + if (toastRel2->rd_indexvalid == 0) + RelationGetIndexList(toastRel2); + + /* Check if the swap is possible for all the toast indexes */ So there's no error being thrown if this turns out not to be possible? There are no errors also in the former process... 
This should fail silently, no? Not sure what you mean by "former process"? So far I don't see any reason why it would be a good idea to fail silently. We end up with corrupt data if the swap is silently not performed. + if (count == 0) + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index", + OIDOldHeap); + else + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d", + OIDOldHeap, count); + RenameRelationInternal(lfirst_oid(lc), + NewToastName); + count++; + } Hm. It seems wrong that this layer needs to know about _cct. Any other idea? For the time being I removed cct and added only a suffix based on the index number... Hm. It seems like throwing an error would be sufficient, that path is only entered for shared catalogs, right? Having multiple toast indexes would be a bug. + /* + * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION. + */ + isconstraint = indexRelation->rd_index->indisprimary || + indexRelation->rd_index->indisexclusion; unique constraints aren't mattering here? No they are not. Unique indexes are not counted as constraints in the case of index_create. Previous versions of the patch did that but there are issues with unique indexes using expressions. Hm. index_create's comment says: * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint There are unique indexes that are constraints and some that are not. Looking at ->indisunique is not sufficient to determine whether it's one or not. We probably should remove the fsm of the index altogether after this? The freespace map? Not sure it is necessary here. Isn't it going to be removed with the relation anyway? I had a thinko here, forgot what I said. I thought the freespace map would be the one from the old index, but that's clearly bogus. 
Comes from writing reviews after having to leave home at 5 in the morning to catch a plane ;) +void +index_concurrent_drop(Oid indexOid) +{ + Oid constraintOid = get_index_constraint(indexOid); + ObjectAddress object; + Form_pg_index indexForm; + Relation pg_index; + HeapTuple indexTuple; + bool
Re: [HACKERS] [GENERAL] Floating point error
HL == Heikki Linnakangas hlinnakan...@vmware.com writes: HL It would be nice to have a base-2 text format to represent floats. HL It wouldn't be as human-friendly as base-10, but it could be used HL when you don't want to lose precision. pg_dump in particular. Hexadecimal notation for floats exists. The printf format flag is %a for minuscule and %A for majuscule. The result of 1./3. is 0xa.aabp-5. This site has some info and a conversion demo: http://gregstoll.dyndns.org/~gregstoll/floattohex/ -JimC -- James Cloos cl...@jhcloos.com OpenPGP: 1024D/ED7DAEA6
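The hexadecimal float notation mentioned above is exposed in Python as float.hex()/float.fromhex() (the counterpart of C's printf "%a" and strtod). A small sketch showing that it round-trips a double exactly, with no decimal-digit guesswork:

```python
x = 1.0 / 3.0

# Hexadecimal float notation: an exact base-2 text representation
# of the IEEE 754 double (same idea as C's printf("%a", x)).
h = x.hex()
assert h == '0x1.5555555555555p-2'

# Parsing it back reproduces the identical bits - a lossless text
# format of the kind Heikki suggests for pg_dump.
assert float.fromhex(h) == x
```

This is why a base-2 text format sidesteps the extra_float_digits question entirely: the conversion in each direction is exact by construction.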
Re: [HACKERS] sql_drop Event Trigger
Robert Haas escribió: Or at least move the save/restore logic into something inside the deletion machinery itself so that new callers don't have to worry about it? I don't think that works well; precisely the point of the initialize/finalize pair of functions is to bracket the drop so that the objects reported by the deletion machinery are stored in the right list. I tried this macro: /* * Wrap a code fragment that executes a command that may drop database objects, * so that the event trigger environment is appropriately setup. * * Note this macro will call EventTriggerDDLCommandEnd if the object type is * supported; caller must make sure to call EventTriggerDDLCommandStart by * itself. */ #define ExecuteDropCommand(isCompleteQuery, codeFragment, has_objtype, objtype) \ do { \ slist_head _save_objlist; \ bool _supported; \ \ _supported = has_objtype ? EventTriggerSupportsObjectType(objtype) : true; \ \ if (isCompleteQuery) \ { \ EventTriggerInitializeDrop(&_save_objlist); \ } \ PG_TRY(); \ { \ codeFragment; \ if (isCompleteQuery && _supported) \ { \ EventTriggerDDLCommandEnd(parsetree); \ } \ } \ PG_CATCH(); \ { \ if (isCompleteQuery && _supported) \ { \ EventTriggerFinalizeDrop(&_save_objlist); \ } \ PG_RE_THROW(); \ } \ PG_END_TRY(); \ EventTriggerFinalizeDrop(&_save_objlist); \ } while (0) This looks nice in DropOwned: case T_DropOwnedStmt: if (isCompleteQuery) EventTriggerDDLCommandStart(parsetree); ExecuteDropCommand(isCompleteQuery, DropOwnedObjects((DropOwnedStmt *) parsetree), false, 0); break; And it works for DropStmt too: ExecuteDropCommand(isCompleteQuery, switch (stmt->removeType) { case OBJECT_INDEX: case OBJECT_TABLE: case OBJECT_SEQUENCE: case OBJECT_VIEW: case OBJECT_MATVIEW: case OBJECT_FOREIGN_TABLE: RemoveRelations((DropStmt *) parsetree); break; default: RemoveObjects((DropStmt *) parsetree); break; }, true, stmt->removeType); but a rather serious problem is that pgindent refuses to work completely on this (which is understandable, IMV). 
My editor doesn't like the braces inside something that looks like a function call, either. We use this pattern (a codeFragment being called by a macro) as ProcessMessageList in inval.c, but the code fragments there are much simpler. And in AlterTable, using the macro would be much uglier because the code fragment is more convoluted. I'm not really sure if it's worth having the above macro if the only caller is DropOwned. Hmm, maybe I should be considering a pair of macros instead -- UTILITY_START_DROP and UTILITY_END_DROP. I'll give this a try. Other ideas are welcome. FWIW, I noticed that the AlterTable case lacks a call to DDLCommandEnd; reporting that to Dimitri led to him noticing that DefineStmt also lacks one. This is a simple bug in the already-committed implementation which should probably be fixed separately from this patch. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Enabling Checksums
Thank you for the review. On Tue, 2013-03-05 at 11:35 +0200, Heikki Linnakangas wrote: If you enable checksums, the free space map never gets updated in a standby. It will slowly drift to be completely out of sync with reality, which could lead to significant slowdown and bloat after failover. Will investigate. Since the checksums are an all-or-nothing cluster-wide setting, the three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the code simpler, and leaves the bits free for future use. If we want to enable such per-page setting in the future, we can add it later. For a per-relation scheme, they're not needed. They don't really need to be there, I just put them there because it seemed wise if we ever want to allow online enabling/disabling of checksums. But I will remove them. How does the error detection rate of this compare with e.g CRC-16? Is there any ill effect from truncating the Fletcher sums like this? I don't recall if I published these results or not, but I loaded a table, and used pageinspect to get the checksums of the pages. I then did some various GROUP BY queries to see if I could find any clustering or stepping of the checksum values, and I could not. The distribution seemed very uniform across the 255^2 space. I tried to think of other problems, like missing errors in the high or low bits of a word or a page (similar to the issue with mod 256 described below), but I couldn't find any. I'm not enough of an expert to say more than that about the error detection rate. Fletcher is probably significantly faster than CRC-16, because I'm just doing int32 addition in a tight loop. Simon originally chose Fletcher, so perhaps he has more to say. That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall seeing that in other checksum implementations either. 16-bits is not very wide for a checksum, and this eats about 1% of the space of valid values. 
I can see that it might be a handy debugging aid to avoid 0. But there's probably no need to avoid 0 in both bytes, it seems enough to avoid a completely zero return value. http://en.wikipedia.org/wiki/Fletcher%27s_checksum If you look at the section on Fletcher-16, it discusses the choice of the modulus. If we used 256, then an error anywhere except the lowest byte of a 4-byte word read from the page would be missed. Considering that I was using only 255 values anyway, I thought I might as well shift the values away from zero. We could get slightly better by using all combinations. I also considered chopping the 64-bit ints into 16-bit chunks and XORing them together. But when I saw the fact that we avoided zero with the other approach, I kind of liked it, and kept it. XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN without a lock. That's not atomic, so it could incorrectly determine that a page doesn't need to be backed up. We used to always hold an exclusive lock on the buffer when it's called, which prevents modifications to the LSN, but that's no longer the case. Will investigate, but it sounds like a buffer header lock will fix it. Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I think it will generate WAL records for unlogged tables as it is. Yes, thank you. Also, in FlushBuffer(), this patch moves the clearing of the BM_JUST_DIRTIED bit to before the WAL flush. That seems to expand the window during which a change to a page will prevent it from being marked clean. Do you see any performance problem with that? The alternative is to take the buffer header lock twice: once to get the LSN, then WAL flush, then another header lock to clear BM_JUST_DIRTIED. Not sure if that's better or worse. This goes back to Simon's patch, so he may have a comment here, as well. I'll post a new patch with these comments addressed, probably tomorrow so that I have some time to self-review and do some basic testing. 
Regards, Jeff Davis
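The Fletcher-16 discussion above (modulus 255 vs 256, and shifting the sums away from zero) can be made concrete. This is a textbook Fletcher-16 sketch in Python for illustration, not the patch's actual C code; the zero-avoiding variant mirrors the idea of keeping each sum in 1..255:

```python
def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16 with modulus 255.

    Using mod 255 rather than 256 means a corrupted byte anywhere in a
    word affects both running sums; with mod 256 carries out of the low
    byte would be silently discarded, as discussed above.
    """
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

# Known test vector from the Fletcher checksum literature.
assert fletcher16(b"abcde") == 0xC8F0

# Each mod-255 sum lies in 0..254; shifting both into 1..255 (as the
# patch does) guarantees a nonzero checksum at the cost of roughly 1%
# of the 16-bit value space.
def fletcher16_nonzero(data: bytes) -> int:
    c = fletcher16(data)
    return (((c >> 8) + 1) << 8) | ((c & 0xFF) + 1)

assert fletcher16_nonzero(b"") != 0
```

The speed argument in the thread follows from the inner loop being two additions and two cheap modulo reductions per byte, versus the table lookups or bit-by-bit polynomial division of a CRC-16.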
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: Thank you: I think this is what I was missing, and what wasn't clear from the proposed doc patch. But then how can pg_dump assume that it's always safe to set extra_float_digits = 3? It's been proven (don't have a link handy, but the paper is at least a dozen years old) that 3 extra digits are sufficient to accurately reconstruct any IEEE single or double float value, given properly written conversion functions in libc. So that's where that number comes from. Now, if either end is not using IEEE floats, you may or may not get equivalent results --- but it's pretty hard to make any guarantees at all in such a case. Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. regards, tom lane
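The "3 extra digits suffice" claim above can be sketched numerically (Python used for illustration; DBL_DIG for IEEE doubles is 15, and 15 + 2 = 17 significant digits is the classical round-trip bound, so extra_float_digits = 3 is more than enough):

```python
x = 1.0 / 3.0

# DBL_DIG (15) significant digits is only what C guarantees can be
# stored without change; *printing* just that many can lose the value.
assert float("%.15g" % x) != x

# 17 significant digits always reconstruct the exact IEEE double,
# which is why a few extra digits make dump/restore lossless.
assert float("%.17g" % x) == x
```

The extra digits look like garbage to the naked eye (0.100000000000000006 for 0.1), which is exactly the default-behavior trade-off Tom describes.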
Re: [HACKERS] [GENERAL] Floating point error
On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Why the discrepancy between default behavior and what pg_dump gets? Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue?
Re: [HACKERS] [GENERAL] Floating point error
Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane
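The binary format Tom mentions avoids the decimal-digits question altogether: a float8 in binary result format is the 8 big-endian IEEE 754 bytes, so the transfer is bit-exact when both ends use IEEE floats. A minimal sketch of that round trip (Python's struct module stands in for the wire packing):

```python
import struct

x = 0.1  # not exactly representable in binary floating point

# float8 in binary format: 8 big-endian IEEE 754 bytes on the wire.
wire = struct.pack('!d', x)
assert len(wire) == 8

# Unpacking reproduces the identical double - no text conversion,
# hence no extra_float_digits concerns.
(y,) = struct.unpack('!d', wire)
assert y == x
```

The portability gotchas are on the client side: the bytes are only meaningful if the client's float representation is also IEEE 754, which is the "leap of faith" discussed earlier in the thread.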
Re: [HACKERS] transforms
On 13-03-03 08:13 PM, Josh Berkus wrote: This (creating the extensions) works fine for me on a Ubuntu 10.x system Now if only we can work out the combinatorics issue ... The plpython2u extensions worked fine but I haven't been able to get this to work with plpython3u (python 3.1). create extension hstore_plpython3u; ERROR: could not load library /usr/local/pgsql93git/lib/hstore_plpython3.so: /usr/local/pgsql93git/lib/hstore_plpython3.so: undefined symbol: _Py_NoneStruct Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either. Steve
Re: [HACKERS] [GENERAL] Floating point error
This conversation has moved beyond my ability to be useful but I want to remind everyone of my original issues in case it helps you improve the docs: 1) Data shown in psql did not match data retrieved by JDBC. I had to debug pretty deep into the JDBC code to confirm that a value I was staring at in psql was different in JDBC. Pretty weird, but I figured it had something to do with floating point malarky. 2) The problem in #1 could not be reproduced when running on our test database. Again very weird, because as far as psql was showing me the values in the two databases were identical. I used COPY to transfer some data from the production database to the test database. I now know that what you see in psql is not necessarily what you see in JDBC. I also know that you need to set extra_float_digits = 3 before using COPY to transfer data from one database to another or risk differences in floating point values. Sounds like both pg_dump and the JDBC driver must be doing this or its equivalent on their own. If the numeric types page of the documentation had mentioned the extra_float_digits then I might have been able to solve my own problem. I'd like you to add some mention of it even if it is just handwaving but will let you guys hash it out from here. Either way, PostgreSQL rocks! Tom On Mar 5, 2013, at 12:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maciek Sakrejda m.sakre...@gmail.com writes: On Tue, Mar 5, 2013 at 10:23 AM, Tom Lane t...@sss.pgh.pa.us wrote: Basically, the default behavior is tuned to the expectations of people who think that what they put in is what they should get back, ie we don't want the system doing this by default: regression=# set extra_float_digits = 3; SET regression=# select 0.1::float4; float4 ------------- 0.100000001 (1 row) regression=# select 0.1::float8; float8 ---------------------- 0.100000000000000006 (1 row) We would get a whole lot more bug reports, not fewer, if that were the default behavior. Isn't this a client rendering issue, rather than an on-the-wire encoding issue? 
Nope, at least not unless you ask for binary output format (which introduces a whole different set of portability gotchas, so it's not the default either). regards, tom lane -- Tom Duffey tduf...@trillitech.com 414-751-0600 x102
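Tom Duffey's two symptoms above can be reproduced in miniature (Python for illustration): two doubles that differ in their low bits render identically at default precision, so psql shows "the same value" while a client that parses the full-precision value sees the difference:

```python
a = 1.0 / 3.0
b = float("%.15g" % a)   # a copy that went through default text output

# The two stored doubles really differ...
assert a != b

# ...yet both render identically at default (15-digit) precision:
# the "psql shows identical values, JDBC sees different ones" symptom.
assert "%.15g" % a == "%.15g" % b

# Only the extra digits reveal the difference.
assert "%.17g" % a != "%.17g" % b
```

The same mechanism explains the COPY behavior: without extra_float_digits = 3, the text COPY output is the 15-digit rendering, so the restored value can be a different double that prints the same.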
Re: [HACKERS] transforms
Peter mentioned in the submission that the unit tests don't pass with python3; it doesn't work for me outside the tests either. Also, note that the feature is the concept of Transforms, not the individual transforms which Peter provides. The idea is to enable a new kind of extension. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views WIP patch
Simon, Kevin, all: Actually, there was already an attempt at automated MV query planning as a prior university project. We could mine that for ideas. Hmmm. I thought it was on pgfoundry, but it's not. Does anyone have access to ACM databases etc. so they could search for this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 28, 2013 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Whichever way we go, the resulting patch is likely to be too large and invasive for me to feel terribly comfortable about back-patching it into 9.2. AFAICT this issue only arises for indexquals extracted out of larger OR conditions, so maybe it's not worth taking such a risk for. EnterpriseDB has received a number of complaints from our customers resulting from planner behavior changes which were back-patched; so I am not sanguine about back-patching unless the situation is pretty darn dire and the fix is pretty darn near certain to be an improvement in every case. Well, the point is not so much about whether it's an improvement as that 9.2's current behavior is a regression from 9.1 and earlier. People may not like changes in minor releases, but they don't like regressions either. A downside of this approach is that to preserve the same-number-of-rows assumption, we'd end up having to enforce the extracted clauses as filter clauses in parameterized paths, even if they'd not proved to be of any use as index quals. I'm not sure I fully grasp why this is a downside. Explain further? Because we'd be checking redundant clauses. You'd get something like

Nested Loop
  Filter: (foo OR (bar AND baz))
  -> ... some outer scan here ...
  -> Index Scan
       Filter: (foo OR bar)

If "foo OR bar" is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. 
Since there's little point in using a parameterized path in the first place unless it enables you to drastically reduce the number of rows being processed, I would anticipate that maybe the consequences aren't too bad, but I'm not sure. Yeah, we could hope that the inner scan is already producing few enough rows that it doesn't matter much. But I think that we'd end up checking the added qual even in a non-parameterized scan; there's no mechanism for pushing quals into the general qual lists and then retracting them later. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) regards, tom lane
Re: [HACKERS] Reproducible Bus error in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
Dmitry Koterov dmi...@koterov.ru wrote: LOG: server process (PID 18705) was terminated by signal 7: Bus error So far I have only heard of this sort of error when PostgreSQL is running in a virtual machine and the VM software is buggy. If you are not running in a VM, my next two suspects would be hardware/BIOS configuration issues, or an antivirus product. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. Imagine that there are ten MVs that might apply to a complex query, but some of them are mutually exclusive, so there are a large number of permutations of MVs which could be used to replace parts of the original query. And maybe some of base tables have indexes which could reduce execution cost which aren't present in some or all of the MVs. And each MV has a number of indexes. The combinatorial explosion of possible plans would make it hard to constrain plan time without resorting to some crude rules about what to choose. That's not an unsolvable problem, but I see it as a pretty big problem. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. They'll either be better than the existing paths, or they won't. If they're better, the paths that otherwise would have gotten chosen will get kicked out. If they're worse, the materialized-view paths will get kicked out. Either way, we don't have any more paths than we would have otherwise - so no combinatorial explosion. There is one case where we might end up with more paths than we had before. Suppose there's a materialized view on the query SELECT a.x, a.y, a.z, b.t FROM a INNER JOIN b ON a.x = b.x ORDER BY a.z and the users enters just that query. Suppose further that the materialized view has an index on column z, but table a does not. 
In that case, the best paths for the joinrel (a b) not involving the materialized view will be an unordered path, but we could scan the materialized view using the index and so will have a path that is ordered by a.z to add to the joinrel. This path will stick around even if it's more expensive than the unordered path because we know it avoids a final sort. So in that case we do have more paths, but they are potentially useful paths, so I don't see that as a problem. It seems to me that the tricky part of this is not that it might add a lot more paths (because I don't think it will, and if it does I think they're useful paths), but that figuring out whether or not a materialized view matches any particular joinrel might be expensive. I think we're going to need a pretty accurate heuristic for quickly discarding materialized views that can't possibly be relevant to the query as a whole, or to particular joinrels. There are a couple of possible ways to approach that. The most manual of these is probably to have a command like ALTER TABLE x {ENABLE | DISABLE } REWRITE MATERIALIZED y, where the user has to explicitly ask for materialized views to be considered, or else they aren't. That sort of fine-grained control might have other benefits as well. I think a more automated solution is also possible, if we want it. For a materialized view to match a query, all of the baserels in the materialized view must also be present in the query. (Actually, there are situations where this isn't true; e.g. the materialized view has an extra table, but it's joined in a way that could be pruned away by the join removal code, but I think we could ignore such somewhat-pathological cases at least initially.) It seems to me that if we could figure out a very-cheap way to throw away all of the materialized views that don't pass that basic test, we'd be reasonably close to a workable solution. 
A database with tons of materialized views defined on the same set of target relations might still have some planning time problems, but so might a database with tons of indexes. In that regard, what I was thinking about is to use something sort of like a Bloom filter. Essentially, produce a fingerprint for each materialized view. For the sake of argument, let's say that the fingerprint is a 64-bit integer, although it could be a bit string of any length. To construct the fingerprint, hash the OID of each relation involved in the view onto a bit position between 0 and 63. Set the bits for all relations involved in the materialized view. Then, construct a fingerprint for the query in the same way. Any materialized view where (matview_fingerprint & query_fingerprint) != matview_fingerprint needn't be considered; moreover, for a given joinrel, any materialized view where matview_fingerprint != joinrel_fingerprint (computed using the same scheme) needn't be considered. Of course, a matching fingerprint doesn't mean that the materialized view matches, or even necessarily that it involves the correct relations.
Re: [HACKERS] Materialized views WIP patch
Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera escribió: Hmm, maybe I should be considering a pair of macros instead -- UTILITY_START_DROP and UTILITY_END_DROP. I'll give this a try. Other ideas are welcome. This seems to work. See attached; I like the result because there's no clutter and it supports all three cases without a problem. I also added a new output column to pg_event_trigger_dropped_objects, subobject_name, which is NULL except when a column is being dropped, in which case it contains the column name. Also, the object type column now says table column instead of table when dropping a column. Another question arose in testing: this reports dropping of temp objects, too, but of course not always: particularly not when temp objects are dropped at the end of a session (or the beginning of a session that reuses a previously used temp schema). I find this rather inconsistent and I wonder if we should instead suppress reporting of temp objects. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/event-trigger.sgml b/doc/src/sgml/event-trigger.sgml index 71241c8..1e67d86 100644 --- a/doc/src/sgml/event-trigger.sgml +++ b/doc/src/sgml/event-trigger.sgml @@ -36,8 +36,8 @@ The literalddl_command_start/ event occurs just before the execution of a literalCREATE/, literalALTER/, or literalDROP/ command. As an exception, however, this event does not occur for - DDL commands targeting shared objects - databases, roles, and tablespaces - - or for command targeting event triggers themselves. The event trigger + DDL commands targeting shared objects mdash; databases, roles, and tablespaces + mdash; or for command targeting event triggers themselves. The event trigger mechanism does not support these object types. 
literalddl_command_start/ also occurs just before the execution of a literalSELECT INTO/literal command, since this is equivalent to @@ -46,6 +46,16 @@ /para para +To list all objects that have been deleted as part of executing a +command, use the set returning +function literalpg_event_trigger_dropped_objects()/ from +your literalddl_command_end/ event trigger code (see +xref linkend=functions-event-triggers). Note that +the trigger is executed after the objects have been deleted from the +system catalogs, so it's not possible to look them up anymore. + /para + + para Event triggers (like other functions) cannot be executed in an aborted transaction. Thus, if a DDL command fails with an error, any associated literalddl_command_end/ triggers will not be executed. Conversely, @@ -433,6 +443,11 @@ entry align=centerliteralX/literal/entry /row row +entry align=leftliteralDROP OWNED/literal/entry +entry align=centerliteralX/literal/entry +entry align=centerliteralX/literal/entry + /row + row entry align=leftliteralDROP RULE/literal/entry entry align=centerliteralX/literal/entry entry align=centerliteralX/literal/entry diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 9b7e967..5721d1b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15702,4 +15702,54 @@ FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger(); xref linkend=SQL-CREATETRIGGER. /para /sect1 + + sect1 id=functions-event-triggers + titleEvent Trigger Functions/title + + indexterm + primarypg_event_trigger_dropped_objects/primary + /indexterm + + para +Currently productnamePostgreSQL/ provides one built-in event trigger +helper function, functionpg_event_trigger_dropped_objects/, which +lists all object dropped by the command in whose literalddl_command_end/ +event it is called. 
If the function is run in a context other than a +literalddl_command_end/ event trigger function, or if it's run in the +literalddl_command_end/ event of a command that does not drop objects, +it will return the empty set. +/para + + para +The functionpg_event_trigger_dropped_objects/ function can be used +in an event trigger like this: +programlisting +CREATE FUNCTION test_event_trigger_for_drops() +RETURNS event_trigger LANGUAGE plpgsql AS $$ +DECLARE +obj record; +BEGIN +FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects() +LOOP +RAISE NOTICE '% dropped object: % %.% %', + tg_tag, + obj.object_type, + obj.schema_name, + obj.object_name, + obj.subobject_name; +END LOOP; +END +$$; +CREATE EVENT TRIGGER test_event_trigger_for_drops + ON ddl_command_end + EXECUTE PROCEDURE test_event_trigger_for_drops(); +/programlisting +/para + + para + For more information about event triggers, + see xref linkend=event-triggers. +
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com wrote: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. These are actually independent of one another, as long as we nail down how we're determining freshness -- which is probably needed for either. Someone who's immersed in tuning long-running DW queries might be interested in this before incremental update. (They might load the data once per month, so refreshing the MVs as a step in that process might be cheaper than incrementally maintaining them.) Someone could base freshness on pg_relation_is_scannable() and start working on automatic query rewrite right now, if they wanted to. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. Automatic-query-rewrite, if and when we get it, will not be easy and will require a bunch of work from someone with a good understanding of the planner, but it strikes me as the sort of thing that might work out to one large project and then it's done. I still think we're going to hit the wall on planning time under certain circumstances and need to tweak that over the course of several releases, but now is not the time to get into the details of why I think that. We've spent way too much time on it already for the point we're at in the 9.3 cycle. I've kept my concerns hand-wavy on purpose, and am trying hard to resist the temptation to spend a lot of time demonstrating the problems. Whereas, incremental update sounds to me like a series of projects over a series of releases targeting various special cases, where we can always point to some improvements vs. release N-1 but we're never actually done Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. 
But there is a lot of low-hanging fruit for automation. Even a reasonably simplistic and partial implementation of incremental update will benefit a lot of users. Agreed. But in terms of relative difficulty, it's not at all obvious to me that that's the easier part of the project. I totally agree that getting something working to use MVs in place of underlying tables is not all that different or more difficult than using partial indexes. I just predict that we'll get a lot of complaints about cases where it results in worse performance and we'll need to deal with those issues. I don't see that as being brain surgery; just a messy matter of trying to get this pretty theory to work well in the real world -- probably using a bunch of not-so-elegant heuristics. And in the end, the best you can hope for is performance not noticeably worse than you would get if you modified your query to explicitly use the MV(s) -- you're just saving yourself the rewrite. Well, OK, there is the point that, (like indexes) if you run the query which hits the base tables with different parameters, and a new plan is generated each time, it might pick different MVs or exclude them as is most efficient for the given parameters. That's the Holy Grail of all this. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Robert Haas robertmh...@gmail.com: All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Actually, I'm betting that's going to be significantly harder than automatic-query-rewrite, when all is said and done. I agree. E.g., things such as keeping a matview consistent relative to changes applied to the base tables during the same transaction, might be mightily difficult to implement in a performant way. OTOH, matviews that can only be used for optimization if their base tables were not changed “too recently” (e.g., by transactions that are still in flight, including the current transaction), are probably kind of useful in themselves as long as those base tables are not updated all the time. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 5, 2013 at 7:15 AM, Kevin Grittner kgri...@ymail.com wrote: I don't think I disagree with any of what Simon says other than his feelings about the planning cost. I'm not sure I agree. Suppose you have a query like SELECT * FROM a INNER JOIN b ON a.x = b.x INNER JOIN c ON a.y = c.y WHERE some stuff. The query planner will construct paths for scans on a, b, and c. Then it will construct joinrels for (a b), (a c), (b c), and eventually (a b c) and calculate a set of promising paths for each of them. If there is a materialized view available for one of those joinrels, all we really need to do is add the possible paths for scanning that materialized view to the joinrel. That only works to the extent that a materialized view can be described by a path. My impression is that most of the use-cases for MVs will involve aggregates or similar data reduction operators, and we don't currently implement anything about aggregates at the Path level. Arguably it would be useful to do so; in particular, we could get rid of the currently hard-wired mechanism for choosing between sorted and hashed aggregation, and perhaps there'd be a less grotty way to deal with index-optimized MIN/MAX aggregates. But there's a great deal to do to make that happen, and up to now I haven't seen any indication that it would do much except add overhead. FWIW, my opinion is that doing anything like this in the planner is going to be enormously expensive. Index matching is already pretty expensive, and that has the saving grace that we only do it once per base relation. Your sketch above implies trying to match to MVs once per considered join relation, which will be combinatorially worse. Even with a lot of sweat over reducing the cost of the matching, it will hurt. All that having been said, it's hard for me to imagine that anyone really cares about any of this until we have an incremental update feature, which right now we don't. Agreed. 
Even if we're willing to have an approximate results are OK GUC (which frankly strikes me as a horrid idea), people would certainly not be willing to turn it on without some guarantee as to how stale the results could be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
2013/3/5 Kevin Grittner kgri...@ymail.com: Exactly. I predict that we will eventually have some special sort of trigger for maintaining MVs based on base table changes to handle the ones that are just too expensive (in developer time or run time) to fully automate. But there is a lot of low-hanging fruit for automation. I think it would be totally OK to restrict the possible definitions for matviews that can be maintained fully incrementally to something like: SELECT attributes and aggregations FROM trivial joins WHERE trivial condition GROUP BY attributes; Those definitions are the most useful for optimizing the things that matviews are good at (joins and aggregation). Nicolas PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Materialized views WIP patch
Nicolas Barbier nicolas.barb...@gmail.com wrote: PS. Sorry for having fired off this discussion that obviously doesn’t really relate to the current patch. I know it's hard to resist. While I think there will be a number of people for whom the current patch will be a convenience and will therefore use it, it is hard to look at what's there and *not* go if only it also... Perhaps it would be worth looking for anything in the patch that you think might be painting us into a corner where it would be hard to do all the other cool things. While it's late enough in the process that changing anything like that which you find would be painful, it might be a lot more painful later if we release without doing something about it. My hope, of course, is that you won't find any such thing. With this patch I've tried to provide a minimal framework onto which these other things can be bolted. I've tried hard not to do anything which would make it hard to extend, but new eyes may see something I missed. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] transforms
On 3/3/13 6:14 PM, Josh Berkus wrote: Currently Peter is punting (as is proper in a new patch) by having a separate extension for each combination (hstore/plperl, hstore/plpython, ltree/plpython, etc.). This is obviously not a maintainable approach in the long run. There is also a {Perl, Python, Ruby, etc.} binding for {libpq, libmysql, libpng, libyaml, libssl, libgmp, etc.}, each as a separately downloadable and buildable package. I don't think anyone has ever seriously considered a system by which if, say, you have Python and libyaml installed, pyyaml magically appears. Might be nice, but maybe not. The solution, in practice, is that you download pyyaml, and it pulls in any required dependencies. This would work the same way. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
Peter, There is also a {Perl, Python, Ruby, etc.} binding for {libpq, libmysql, libpng, libyaml, libssl, libgmp, etc.}, each as a separately downloadable and buildable package. I don't think anyone has ever seriously considered a system by which if, say, you have Python and libyaml installed, pyyaml magically appears. Might be nice, but maybe not. The solution, in practice, is that you download pyyaml, and it pulls in any required dependencies. This would work the same way. That would be good too, although we don't currently have that capability; if I try to install hstore_plpython without plpython installed, it just errors out. Aside from that, what can we reasonably do for 9.3 to get this feature in? Maybe we add a transforms/ subdirectory of contrib, so that it can be as crowded as we want? Or put the transforms on PGXN for now? I want to see this feature go in so that the community starts writing transforms this year instead of next year. BTW, dependencies seem to be working OK for DROP:

postgres=# drop extension plpythonu;
ERROR:  cannot drop extension plpythonu because other objects depend on it
DETAIL:  extension hstore_plpythonu depends on extension plpythonu
  function look_up_hstore(hstore,text) depends on transform for hstore language plpythonu
  extension ltree_plpythonu depends on extension plpythonu
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
STATEMENT:  drop extension plpythonu;
ERROR:  cannot drop extension plpythonu because other objects depend on it
DETAIL:  extension hstore_plpythonu depends on extension plpythonu
  function look_up_hstore(hstore,text) depends on transform for hstore language plpythonu
  extension ltree_plpythonu depends on extension plpythonu
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Materialized views WIP patch
On 03/05/2013 01:09 PM, Kevin Grittner wrote: Josh Berkus j...@agliodbs.com wrote: There is no shortage of literature on the topic, although any papers from the ACM could certainly be of interest due to the general high quality of papers published there. Adding anything like this to 9.3 is clearly out of the question, though, so I really don't want to spend time researching this now, or encouraging others to do so until after we have a 9.3 release candidate. Good point. Just FYI, for once we start work on 9.4: some university team got planner stuff for matviews working with Postgres, using version 8.0 or something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] sql_drop Event Trigger
Okay, I added a couple of lines to skip reporting dropped temp schemas, and to skip any objects belonging to any temp schema (not just my own, note). Not posting a new version because the change is pretty trivial. Now, one last thing that comes up is what about objects that don't have straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc); the only thing you get is a catalog OID and an object OID ... but they are pretty useless because by the time you get to the ddl_command_end trigger, the objects are gone from the catalog. Maybe we should report *something* about those. Say, perhaps the object description ... but if we want that, it should be untranslated (i.e. not just what getObjectDescription gives you, because that may be translated, so we would need to patch it so that it only translates if the caller requests it). Another example is reporting of functions: right now you get the function name, but if there are overloaded functions there's no way to know which one was dropped. Example:

alvherre=# create function f(int, int) returns int language sql as $$ select $1 + $2; $$;
CREATE FUNCTION
alvherre=# create function f(int, int, int) returns int language sql as $$ select $1 + $2 + $3; $$;
CREATE FUNCTION
alvherre=# drop function f(int, int);
DROP FUNCTION
alvherre=# select * from dropped_objects ;
   type   | schema | object | subobj | curr_user | sess_user
----------+--------+--------+--------+-----------+-----------
 function | public | f      |        | alvherre  | alvherre

Maybe we could use the subobject_name field (what you see as subobj above) to store the function signature (perhaps excluding the function name), for example. So you'd get object=f subobject=(int,int). Or maybe we should stash the whole function signature as name and leave subobject NULL. The reason I'm worrying about this is that it might be important for some use cases. For instance, replication cases probably don't care about that at all. But if we want to be able to use event triggers for auditing user activity, we need this info.
Thoughts? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] transforms
Peter, I'm still getting intermittent linking failures: postgres=# drop extension plperl cascade; NOTICE: drop cascades to extension hstore_plperl DROP EXTENSION postgres=# create extension plperl; CREATE EXTENSION postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs postgres=# There appears to be something wonky which breaks when I've been running 9.2, shut it down, and fire up 9.3. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
postgres=# create extension plperl; CREATE EXTENSION postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: hstoreUniquePairs postgres=# There appears to be something wonky which breaks when I've been running 9.2, shut it down, and fire up 9.3. More on this: the problem appears to be that the symbols for hstore are loaded only if I've just just created the extension in that database: postgres=# create database plperlh postgres-# ; CREATE DATABASE postgres=# \c plperlh; You are now connected to database plperlh as user josh. plperlh=# create extension plperl; CREATE EXTENSION plperlh=# create extension hstore; CREATE EXTENSION plperlh=# create extension hstore_plperl; CREATE EXTENSION plperlh=# plperlh=# \c postgres You are now connected to database postgres as user josh. postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
On 03/05/2013 02:52 PM, Josh Berkus wrote: plperlh=# \c postgres You are now connected to database postgres as user josh. postgres=# create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key STATEMENT: create extension hstore_plperl; ERROR: could not load library /home/josh/pg93/lib/postgresql/hstore_plperl.so: /home/josh/pg93/lib/postgresql/hstore_plperl.so: undefined symbol: PL_thr_key What happens if you set your LD_LIBRARY_PATH to reflect each installation before you start the database? JD -- Command Prompt, Inc. - http://www.commandprompt.com/ PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC @cmdpromptinc - 509-416-6579 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] transforms
What happens if you set your LD_LIBRARY_PATH to reflect each installation before you start the database? No change, at least not setting it to $PGHOME/lib. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Support for REINDEX CONCURRENTLY
On Tue, Mar 5, 2013 at 11:22 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier michael.paqu...@gmail.com wrote: Thanks for the review. All your comments are addressed and updated patches are attached. I got the compile warnings: tuptoaster.c:1539: warning: format '%s' expects type 'char *', but argument 3 has type 'Oid' tuptoaster.c:1539: warning: too many arguments for format Fixed. Thanks for catching that. The patch doesn't handle the index on the materialized view correctly. Hehe... I didn't know that materialized views could have indexes... I fixed it, will send updated patch once I am done with Andres' comments. -- Michael
Re: [HACKERS] Materialized view assertion failure in HEAD
On Tue, Mar 5, 2013 at 9:11 AM, Kevin Grittner kgri...@ymail.com wrote: Will investigate. You don't have default_with_oids = on, do you? No, this was a default install with a default postgresql.conf
[HACKERS] Writable foreign tables: how to identify rows
One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. Then it could put the remote ctid into the usual t_self field in returned tuples. 
Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Writable foreign tables: how to identify rows
Tom, One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). Well, a good test case for this could be the various key-value stores, where the foreign row identifier (FRI) is an immutable string key of arbitrary size. If we can support that, there's a lot of datastores we can support, and even a rule of your target must have a single-column key would be tolerable for non-postgres FDWs for quite a while. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. Per the above, it would be good to allow the system column to also be a column which is exposed to the user. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. 
Well, if it gets pgsql_fdw in, I'm for it. I'd always assumed that writeable non-postgres targets were going to be hard. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
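Tom's fallback for FDWs without a magic row identifier, updating by primary key, can be sketched in a few lines. This is a toy illustration (the helper name and %s placeholder style are assumptions, not postgres_fdw code): the FDW remembers the key values fetched with each row and pins every key column in the remote WHERE clause.

```python
def build_remote_update(table, set_cols, pkey_cols):
    """Build a parameterized remote UPDATE keyed on the primary key.

    The FDW must have fetched the pkey columns along with the row, so
    it can target exactly the row it previously retrieved, even when
    the remote store has no ctid-like row identifier."""
    set_clause = ", ".join(f"{c} = %s" for c in set_cols)
    where_clause = " AND ".join(f"{c} = %s" for c in pkey_cols)
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"

# A composite key works the same way; every key column is pinned.
sql = build_remote_update("remote_table", ["val"], ["id1", "id2"])
print(sql)  # UPDATE remote_table SET val = %s WHERE id1 = %s AND id2 = %s
```

The same scheme degrades gracefully to the key-value-store case Josh mentions: there the "primary key" is simply the single immutable string key.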
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 6:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. 
Then it could put the remote ctid into the usual t_self field in returned tuples. +1 for adding a new system attribute. We did something similar in Postgres-XC, though the problem there was much simpler because we always knew that the remote FDW is a Postgres instance running the same version. So we added a new system column node_id which does not get any disk storage, but gets set during execution time depending on which node the tuple belongs to. The ctid system column is of course set to the remote ctid. In the context of postgres_fdw, I am not sure if we need an additional system column like node_id. Would there be a case where the tuples to be modified come from different foreign tables, and at runtime we need to decide where to execute the UPDATE/DELETE operation? If we start supporting inheritance involving foreign tables as child tables, this will become a reality. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
[HACKERS] Index Uniqueness
Hey, I want to work towards the following feature in the TODO list: "Prevent index uniqueness checks when UPDATE does not modify the column: Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE. However, HOT already short-circuits this in common cases, so more work might not be helpful." So I wanted to know if someone is already working on this. Thanks, Abhinav Batra
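The TODO item above reduces to a cheap comparison of old and new values for the indexed columns. A minimal sketch of that decision, with hypothetical names (the real check would live in the executor's index maintenance path, and HOT already avoids it in common cases):

```python
def needs_uniqueness_check(index_cols, old_row, new_row):
    """Re-check a unique index only if some indexed column changed.

    Mirrors the TODO item: an UPDATE that leaves every column of a
    unique index untouched cannot introduce a new duplicate, so the
    uniqueness probe can be skipped for that index."""
    return any(old_row[c] != new_row[c] for c in index_cols)

old = {"id": 1, "email": "a@example.com", "name": "A"}
new = dict(old, name="B")  # only a non-indexed column changes
print(needs_uniqueness_check(["email"], old, new))  # False: check skippable
```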
Re: [HACKERS] Reproducible Bus error in 9.2.3 during database dump restoration (Ubuntu Server 12.04 LTS)
On Tue, Mar 5, 2013 at 3:04 PM, Kevin Grittner kgri...@ymail.com wrote: Dmitry Koterov dmi...@koterov.ru wrote: LOG: server process (PID 18705) was terminated by signal 7: Bus error So far I have only heard of this sort of error when PostgreSQL is running in a virtual machine and the VM software is buggy. If you are not running in a VM, my next two suspects would be hardware/BIOS configuration issues, or an antivirus product. For posterity, what's the hardware platform? Software bus errors are more likely on non-x86 hardware. merlin
Re: [HACKERS] Materialized views WIP patch
Kevin Grittner kgri...@ymail.com wrote: REFRESH MATERIALIZED VIEW name [, ...] WITH [ NO ] DATA Given the short time, I left out the [, ...]. If people think that is important to get into this release, a follow-on patch might be possible. Barring objections, I will use the above and push tomorrow. I'm still working on docs, and the changes related to the syntax change are still only lightly tested, but as far as I know, all is complete except for the docs. I'm still working on those and expect to have them completed late today. I'm posting this patch to allow a chance for final review of the code changes before I push. Was the remaining work on docs done? I would like to test MVs and am waiting for the docs to be completed. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hmm.. Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord(); the attached patch records minRecoveryPoint. [crash recovery -> record minRecoveryPoint in control file -> archive recovery] I think that this is the original intention of Heikki's patch. It could be. Before that, my patch did not wake up hot standby because I didn't take care of minRecoveryPoint in it :-p On the other hand, your patch fixes that point, but ReadRecord runs on the false page data. However, since the wrong record on the false page can be identified as broken, it should be an undesirable behavior. If we want to show the final solution by ourselves, we need to make another patch to settle them all. Let me take further thought.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
On Tue, Mar 5, 2013 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the point is not so much about whether it's an improvement as that 9.2's current behavior is a regression from 9.1 and earlier. People may not like changes in minor releases, but they don't like regressions either. That's true, but I'm still worried that we're just moving the unhappiness around from one group of people to another group of people, and I don't have a lot of confidence about which group is larger. A downside of this approach is that to preserve the same-number-of-rows assumption, we'd end up having to enforce the extracted clauses as filter clauses in parameterized paths, even if they'd not proved to be of any use as index quals. I'm not sure I fully grasp why this is a downside. Explain further? Because we'd be checking redundant clauses. You'd get something like

Nested Loop
  Filter: (foo OR (bar AND baz))
  -> ... some outer scan here ...
  -> Index Scan
       Filter: (foo OR bar)

If foo OR bar is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. Yeah, that's pretty unappealing. It probably doesn't matter much if foo is just a column reference, but what if it's an expensive function? For that matter, what if it's a volatile function that we can't execute twice without changing the results? Since there's little point in using a parameterized path in the first place unless it enables you to drastically reduce the number of rows being processed, I would anticipate that maybe the consequences aren't too bad, but I'm not sure. Yeah, we could hope that the inner scan is already producing few enough rows that it doesn't matter much.
But I think that we'd end up checking the added qual even in a non-parameterized scan; there's no mechanism for pushing quals into the general qual lists and then retracting them later. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) Not sure I get the parenthesized bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
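The redundant-evaluation hazard being discussed can be modeled directly: the inner scan filters on the extracted clause (foo OR bar), and the join then re-checks the original (foo OR (bar AND baz)), so an expensive foo predicate can run twice per surviving row. A toy model (not planner code; the counting wrapper is illustrative only):

```python
def evaluate(row, foo, bar, baz):
    """Toy model of the double evaluation: the inner scan filters on the
    extracted (foo OR bar), then the join re-checks the full original
    clause (foo OR (bar AND baz)). Returns (row kept?, foo call count)."""
    calls = {"foo": 0}

    def foo_pred(r):
        calls["foo"] += 1
        return foo(r)

    # Inner-scan filter: the extracted, index-matchable clause.
    if not (foo_pred(row) or bar(row)):
        return False, calls["foo"]
    # Join-level re-check of the original clause; foo may run again.
    keep = foo_pred(row) or (bar(row) and baz(row))
    return keep, calls["foo"]

keep, foo_calls = evaluate({}, lambda r: True, lambda r: False, lambda r: False)
print(keep, foo_calls)  # True 2 -- the expensive foo predicate ran twice
```

When foo is a cheap column reference the second evaluation is noise; when it is an expensive function, the redundant check is exactly the cost Tom is worried about.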
Re: [HACKERS] Request for vote to move forward with recovery.conf overhaul
On Sun, Mar 3, 2013 at 9:25 PM, Greg Smith g...@2ndquadrant.com wrote: -Allow a pg_ctl standby and pg_ctl recover command that work similarly to promote. This should slim down the work needed for the first replication setup people do. -Make it obvious when people try to use recovery.conf that it's not supported anymore. -Provide a migration path for tool authors strictly in the form of some documentation and error message hints. That was it as far as concessions to backward compatibility. The wrap-up I did started at http://www.postgresql.org/message-id/4ee91248.8010...@2ndquadrant.com and only had a few responses/controversy from there. Robert wrote a good summary: 1. Get rid of recovery.conf - error out if it is seen 2. For each parameter that was previously a recovery.conf parameter, make it a GUC 3. For the parameter that was does recovery.conf exist?, replace it with does standby.enabled exist?. All that works for me. I thought this stopped from there because no one went back to clean up Fujii's submission, which Simon and Michael have now put more time into. There is not much distance between it and the last update Michael sent. Here's the detailed notes from my original proposal, with updates to incorporate the main feedback I got then; note that much of this is documentation rather than code: -Creating a standby.enabled file puts the system into recovery mode. That feature needs to save some state, and making those decisions based on existence of a file is already a thing we do. Rather than emulating the rename to recovery.done that happens now, the server can just delete it, to keep from incorrectly returning to a state it's exited. A UI along the lines of the promote one, allowing pg_ctl standby, should fall out of here. This file can be relocated to the config directory, similarly to how the include directory looks for things. 
There was a concern that this would require write permissions that don't exist on systems that relocate configs, like Debian/Ubuntu. That doesn't look to be a real issue though. Here's a random Debian server showing the postgres user can write to all of those:

$ ls -ld /etc/postgresql
drwxr-xr-x 4 root root 4096 Jun 29 2012 /etc/postgresql
$ ls -ld /etc/postgresql/9.1
drwxr-xr-x 3 postgres postgres 4096 Jul 1 2012 /etc/postgresql/9.1
$ ls -ld /etc/postgresql/9.1/main
drwxr-xr-x 2 postgres postgres 4096 Mar 3 11:00 /etc/postgresql/9.1/main

-A similar recovery.enabled file turns on PITR recovery. -It should be possible to copy a postgresql.conf file from master to standby and just use it. For example: --standby_mode = on: Ignored unless you've started the server with standby.enabled, won't bother the master if you include it. --primary_conninfo: This will look funny on the master showing it connecting to itself, but it will get ignored there too. -If startup finds a recovery.conf file where it used to live, abort--someone is expecting the old behavior. Hint to RTFM or include a short migration guide right on the spot. That can have a nice section about how you might use the various postgresql.conf include* features if they want to continue managing those files separately. Example: rename what you used to make recovery.conf as replication.conf and use include_if_exists if you want to be able to rename it to recovery.done like before. Or drop it into a config/ directory (similarly to the proposal for SET PERSISTENT) where the rename to recovery.done will make it then skipped. (Only files ending with .conf are processed by include_dir) -Tools such as pgpool that want to write a simple configuration file, only touching the things that used to go into recovery.conf, can tell people to do the same trick. End their postgresql.conf with a call to \include_if_exists replication.conf as part of setup.
While I don't like pushing problems toward tool vendors, as one I think validating whether this has been done doesn't require the sort of fully GUC-compatible parser people (rightly) want to avoid. A simple scan of the postgresql.conf looking for the recommended text at the front of a line could confirm whether that bit is there. And adding a single include_if_exists line to the end of the postgresql.conf is not a terrible edit job to consider pushing toward tools. None of this is any more complicated than the little search and replace job that initdb does right now. -If you want to do something special yourself to clean up when recovery finishes, perhaps to better emulate the old "those settings go away" implementation, there's already recovery_end_command available for that. Let's say you wanted to force the old name and did include_if_exists config/recovery.conf. Now you could do: recovery_end_command = 'rm -f /tmp/pgsql.trigger.5432 && mv conf.d/recovery.conf conf.d/recovery.done' No argument with any of that, either. If that's what we're implementing, I'm on board. --
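The trigger-file behavior in the proposal, enter standby or PITR mode if the file exists, then delete it so the server cannot silently re-enter a state it exited, can be stated compactly. A hedged sketch with hypothetical function and mode names (the real decision would be made early in server startup):

```python
import os

def determine_startup_mode(config_dir):
    """Decide the startup mode from trigger files, per the proposal:
    standby.enabled -> streaming standby, recovery.enabled -> PITR,
    neither -> normal startup. The file is deleted once consumed
    (instead of being renamed to *.done) so a later restart does not
    incorrectly return to a state the server already exited."""
    for trigger, mode in (("standby.enabled", "standby"),
                          ("recovery.enabled", "pitr")):
        path = os.path.join(config_dir, trigger)
        if os.path.exists(path):
            os.unlink(path)
            return mode
    return "normal"
```

A `pg_ctl standby` command would then just be "create standby.enabled and start the server", symmetric with how promote uses a trigger file today.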
Re: [HACKERS] pg_ctl idempotent option
On Mon, 2013-01-14 at 06:37 -0500, Peter Eisentraut wrote: Here is a patch to add an option -I/--idempotent to pg_ctl, the result of which is that pg_ctl doesn't error on start or stop if the server is already running or already stopped. So apparently, pg_upgrade needs the existing behavior, so making the idempotent option the only behavior won't work. Therefore, I think this patch is still useful as originally presented. I've made one change that pg_ctl won't print any messages if the -I option is used and the server is already started/stopped. diff --git a/contrib/start-scripts/linux b/contrib/start-scripts/linux index b950cf5..03c6e50 100644 --- a/contrib/start-scripts/linux +++ b/contrib/start-scripts/linux @@ -84,17 +84,17 @@ case $1 in echo -n Starting PostgreSQL: test x$OOM_SCORE_ADJ != x echo $OOM_SCORE_ADJ /proc/self/oom_score_adj test x$OOM_ADJ != x echo $OOM_ADJ /proc/self/oom_adj - su - $PGUSER -c $DAEMON -D '$PGDATA' $PGLOG 21 + su - $PGUSER -c $DAEMON -I -D '$PGDATA' $PGLOG 21 echo ok ;; stop) echo -n Stopping PostgreSQL: - su - $PGUSER -c $PGCTL stop -D '$PGDATA' -s -m fast + su - $PGUSER -c $PGCTL stop -I -D '$PGDATA' -s -m fast echo ok ;; restart) echo -n Restarting PostgreSQL: - su - $PGUSER -c $PGCTL stop -D '$PGDATA' -s -m fast -w + su - $PGUSER -c $PGCTL stop -I -D '$PGDATA' -s -m fast -w test x$OOM_SCORE_ADJ != x echo $OOM_SCORE_ADJ /proc/self/oom_score_adj test x$OOM_ADJ != x echo $OOM_ADJ /proc/self/oom_adj su - $PGUSER -c $DAEMON -D '$PGDATA' $PGLOG 21 diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml index 3107514..549730d 100644 --- a/doc/src/sgml/ref/pg_ctl-ref.sgml +++ b/doc/src/sgml/ref/pg_ctl-ref.sgml @@ -39,6 +39,7 @@ arg choice=optoption-o/option replaceableoptions/replaceable/arg arg choice=optoption-p/option replaceablepath/replaceable/arg arg choice=optoption-c/option/arg + arg choice=optoption-I/option/arg /cmdsynopsis cmdsynopsis @@ -55,6 +56,7 @@ arg choice=plainoptioni[mmediate]/option/arg 
/group /arg + arg choice=optoption-I/option/arg /cmdsynopsis cmdsynopsis @@ -271,6 +273,25 @@ titleOptions/title /varlistentry varlistentry + termoption-I/option/term + termoption--idempotent/option/term + listitem + para +When used with the literalstart/literal or literalstop/literal +actions, return exit code 0 if the server is already running or +stopped, respectively. Otherwise, an error is raised and a nonzero +exit code is returned in these cases. + /para + + para +This option is useful for System-V-style init scripts, which require +the literalstart/literal and literalstop/literal actions to be +idempotent. + /para + /listitem + /varlistentry + + varlistentry termoption-l replaceable class=parameterfilename/replaceable/option/term termoption--log replaceable class=parameterfilename/replaceable/option/term listitem diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c index 7d5e168..f6b92d1 100644 --- a/src/bin/pg_ctl/pg_ctl.c +++ b/src/bin/pg_ctl/pg_ctl.c @@ -86,6 +86,7 @@ static char *pgdata_opt = NULL; static char *post_opts = NULL; static const char *progname; +static bool idempotent = false; static char *log_file = NULL; static char *exec_path = NULL; static char *register_servicename = PostgreSQL; /* FIXME: + version ID? 
*/ @@ -774,9 +775,15 @@ { old_pid = get_pgpid(); if (old_pid != 0) - write_stderr(_(%s: another server might be running; - trying to start server anyway\n), - progname); + { + if (idempotent) +exit(0); + else + { +write_stderr(_(%s: another server might be running\n), progname); +exit(1); + } + } } read_post_opts(); @@ -860,6 +867,8 @@ if (pid == 0)/* no pid file */ { + if (idempotent) + exit(0); write_stderr(_(%s: PID file \%s\ does not exist\n), progname, pid_file); write_stderr(_(Is server running?\n)); exit(1); @@ -1764,9 +1773,9 @@ printf(_(%s is a utility to initialize, start, stop, or control a PostgreSQL server.\n\n), progname); printf(_(Usage:\n)); printf(_( %s init[db] [-D DATADIR] [-s] [-o \OPTIONS\]\n), progname); - printf(_( %s start [-w] [-t SECS] [-D DATADIR] [-s] [-l FILENAME] [-o \OPTIONS\]\n), progname); - printf(_( %s stop[-W] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n), progname); - printf(_( %s restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n + printf(_( %s start [-w] [-t SECS] [-D DATADIR] [-s] [-I] [-l FILENAME] [-o \OPTIONS\]\n), progname); + printf(_( %s stop[-W] [-t SECS] [-D DATADIR] [-s] [-I] [-m SHUTDOWN-MODE]\n), progname); + printf(_( %s restart [-w] [-t SECS] [-D DATADIR] [-s] [-m SHUTDOWN-MODE]\n [-o \OPTIONS\]\n), progname); printf(_( %s reload [-D DATADIR] [-s]\n),
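Stripped to its decision logic, the -I/--idempotent option just turns two error cases into exit status 0: "already running" on start and "not running" on stop. A small model of those exit codes (illustrative, not the pg_ctl source):

```python
def start_exit_code(already_running, idempotent):
    """Exit status for `pg_ctl start`: 0 on success, 1 if another server
    appears to be running -- unless --idempotent turns that case into
    success, as System-V-style init scripts expect."""
    if already_running:
        return 0 if idempotent else 1
    return 0  # proceed with startup

def stop_exit_code(pid_file_found, idempotent):
    """Exit status for `pg_ctl stop`: 1 if there is no PID file (server
    not running) -- unless --idempotent makes already-stopped OK."""
    if not pid_file_found:
        return 0 if idempotent else 1
    return 0  # proceed with shutdown
```

This is why the init-script hunks in the patch can drop their own "is it already running?" guards: the idempotent flag absorbs those cases.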
Re: [HACKERS] sql_drop Event Trigger
Alvaro Herrera escribió: Now, one last thing that comes up is what about objects that don't have straight names (such as pg_amop, pg_amproc, pg_default_acl etc etc); the only thing you get is a catalog OID and an object OID ... but they are pretty useless because by the time you get to the ddl_command_end trigger, the objects are gone from the catalog. Maybe we should report *something* about those. Here's another idea --- have three columns: type, schema (as in the current patch and as shown above), and a third one for object identity. For tables and other objects that have simple names, the identity would be their names. For columns, it'd be tablename.columnname. For functions, it'd be the complete signature. And so on. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
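The three-column idea, type, schema, and identity, can be illustrated with a toy formatter. The identity formats follow the examples in the message (plain name for tables, tablename.columnname for columns, full signature for functions); the helper name and the sample function below are assumptions for illustration only:

```python
def object_identity(obj_type, **attrs):
    """Render the proposed 'identity' column for a dropped object,
    per object type: tables by name, columns as table.column,
    functions by their complete signature."""
    if obj_type == "table":
        return attrs["name"]
    if obj_type == "column":
        return f'{attrs["table"]}.{attrs["name"]}'
    if obj_type == "function":
        args = ", ".join(attrs["argtypes"])
        return f'{attrs["name"]}({args})'
    raise ValueError(f"no identity rule for {obj_type}")

print(object_identity("function", name="lower", argtypes=["text"]))  # lower(text)
```

The payoff is that even after ddl_command_end, when the catalog rows are gone, the trigger still receives a human-usable description rather than a dangling pair of OIDs.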
Re: [HACKERS] Writable foreign tables: how to identify rows
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: +1 for adding a new system attribute. We did something similar in Postgres-XC, though the problem there was much simpler because we always knew that the remote FDW is a Postgres instance running the same version. So we added a new system column node_id which does not get any disk storage, but gets set during execution time depending on which node the tuple belongs to. The ctid system column of course is set to the remote ctid. Just thinking aloud... This could also be merged into the XC code and reduce the XC code footprint on PG core. -- Michael
Re: [HACKERS] Parameterized paths vs index clauses extracted from OR clauses
Robert Haas robertmh...@gmail.com writes: On Tue, Mar 5, 2013 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: If foo OR bar is useful as an indexqual condition in the inner scan, that's one thing. But if it isn't, the cycles expended to check it in the inner scan are possibly wasted, because we'll still have to check the full original OR clause later. It's possible that the filter condition removes enough rows from the inner scan's result to justify the redundant checks, but it's at least as possible that it doesn't. Yeah, that's pretty unappealing. It probably doesn't matter much if foo is just a column reference, but what if it's an expensive function? For that matter, what if it's a volatile function that we can't execute twice without changing the results? Well, *that* worry at least is a nonissue: if the clause contains volatile functions then it's not a candidate to be an indexqual anyway. The code that pulls out these simplified OR clauses is only looking for clauses that can be shown to match existing indexes, so it won't pick anything like that. But expensive functions could be a hazard. (Hm, maybe what we need is a marker for "enforce this clause only if you feel like it"?) Not sure I get the parenthesized bit. I was thinking that we'd extract the simplified clause and then mark it as something to be used only if it can be used as an indexqual. However, on second thought that still leaves us with the problem that some parameterized paths yield more rows than others. Maybe that assumption simply has to go ... regards, tom lane
Re: [HACKERS] Writable foreign tables: how to identify rows
2013/3/6 Tom Lane t...@sss.pgh.pa.us: One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a rowid that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a ctid column to be created with the usual definition. 
Then it could put the remote ctid into the usual t_self field in returned tuples. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. Thoughts? Sorry, -1 for me. The proposed design tried to kill two birds with one stone. The reason why the new GetForeignRelWidth() can reserve multiple slots for pseudo-columns is that it intends to push down complex calculations in the target list to the remote computing resource. For example, please assume the third target entry below is a very complex calculation, so it consumes many CPU cycles. SELECT x, y, (x-y)^2 FROM remote_table; The FDW driver will be able to replace it with just a reference to a pseudo-column that holds the calculation result of (x-y)^2 on the remote side. Since it does not perform the calculation on the local side, it can assist CPU off-loading. If we have a particular system attribute to carry the remote rowid at foreign-table declaration time, we will lose the opportunity for this kind of optimization. When we handle a query including sub-queries, it generates its TupleDesc at run time without problems. I don't think we have any special reason why we cannot admit foreign tables to adopt an alternative TupleDesc constructed at run time. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
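KaiGai's pushdown argument can be restated concretely: if the FDW may reserve result slots at run time, an expensive target expression such as (x-y)^2 can be replaced by a reference to a pseudo-column computed remotely. A toy sketch of that split (hypothetical helper, not the proposed GetForeignRelWidth() API):

```python
def push_down_targets(targets, is_expensive):
    """Split a target list into entries fetched as-is and expressions
    delegated to the remote side as pseudo-columns: the local plan
    then just references the alias instead of re-computing the value."""
    remote_exprs, fetch_list = [], []
    for t in targets:
        if is_expensive(t):
            alias = f"pseudo_{len(remote_exprs)}"
            remote_exprs.append(f"{t} AS {alias}")
            fetch_list.append(alias)
        else:
            fetch_list.append(t)
    return fetch_list, remote_exprs

# SELECT x, y, (x-y)^2 FROM remote_table: the squaring stays remote.
fetch, remote = push_down_targets(["x", "y", "(x-y)^2"],
                                  lambda t: "^" in t)
print(fetch)  # ['x', 'y', 'pseudo_0']
```

Tom's counter-proposal of fixed system columns declared at table creation time cannot express this, because the set of pseudo-columns here depends on the query, not the table.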
Re: [HACKERS] Request for vote to move forward with recovery.conf overhaul
Thanks for taking the time to type a complete summary of the situation. That really helps. On Mon, Mar 4, 2013 at 11:25 AM, Greg Smith g...@2ndquadrant.com wrote: On 1/23/13 6:36 AM, Michael Paquier wrote: The only problem I see is if the same parameter is defined in recovery.conf and postgresql.conf, is the priority given to recovery.conf? This sort of thing is what dragged down the past work on this. I don't really agree with the idea this thread is based on, that it was backward compatibility that kept this from being finished last year. I put a good bit of time into a proposal that a few people seemed happy with; all its ideas seem to have washed away. That balanced a couple of goals: -Allow a pg_ctl standby and pg_ctl recover command that work similarly to promote. This should slim down the work needed for the first replication setup people do. -Make it obvious when people try to use recovery.conf that it's not supported anymore. -Provide a migration path for tool authors strictly in the form of some documentation and error message hints. That was it as far as concessions to backward compatibility. The wrap-up I did started at http://www.postgresql.org/message-id/4ee91248.8010...@2ndquadrant.com and only had a few responses/controversy from there. Robert wrote a good summary: 1. Get rid of recovery.conf - error out if it is seen 2. For each parameter that was previously a recovery.conf parameter, make it a GUC 3. For the parameter that was "does recovery.conf exist?", replace it with "does standby.enabled exist?". Agreed on that. I thought this stopped from there because no one went back to clean up Fujii's submission, which Simon and Michael have now put more time into. There is not much distance between it and the last update Michael sent. Just to be picky.
I recommend using the version dated the 23rd of January as a work base if we definitely get rid of recovery.conf; the patch file is called 20130123_recovery_unite.patch. The second patch I sent on the 27th tried to support both recovery.conf and postgresql.conf, but this finishes with a lot of duplicated code, as two sets of functions are necessary to deparse options for both postgresql.conf and recovery.conf... Here's the detailed notes from my original proposal, with updates to incorporate the main feedback I got then; note that much of this is documentation rather than code: -Creating a standby.enabled file puts the system into recovery mode. That feature needs to save some state, and making those decisions based on existence of a file is already a thing we do. Rather than emulating the rename to recovery.done that happens now, the server can just delete it, to keep from incorrectly returning to a state it's exited. A UI along the lines of the promote one, allowing pg_ctl standby, should fall out of here. This file can be relocated to the config directory, similarly to how the include directory looks for things. There was a concern that this would require write permissions that don't exist on systems that relocate configs, like Debian/Ubuntu. That doesn't look to be a real issue though. Here's a random Debian server showing the postgres user can write to all of those:

$ ls -ld /etc/postgresql
drwxr-xr-x 4 root root 4096 Jun 29 2012 /etc/postgresql
$ ls -ld /etc/postgresql/9.1
drwxr-xr-x 3 postgres postgres 4096 Jul 1 2012 /etc/postgresql/9.1
$ ls -ld /etc/postgresql/9.1/main
drwxr-xr-x 2 postgres postgres 4096 Mar 3 11:00 /etc/postgresql/9.1/main

-A similar recovery.enabled file turns on PITR recovery. -It should be possible to copy a postgresql.conf file from master to standby and just use it. For example: --standby_mode = on: Ignored unless you've started the server with standby.enabled, won't bother the master if you include it.
--primary_conninfo: This will look funny on the master showing it connecting to itself, but it will get ignored there too. -If startup finds a recovery.conf file where it used to live at, abort--someone is expecting the old behavior. Hint to RTFM or include a short migration guide right on the spot. That can have a nice section about how you might use the various postgresql.conf include* features if they want to continue managing those files separately. Example: rename what you used to make recovery.conf as replication.conf and use include_if_exists if you want to be able to rename it to recovery.done like before. Or drop it into a config/ directory (similarly to the proposal for SET PERSISTENT) where the rename to recovery.done will make it then skipped. (Only files ending with .conf are processed by include_dir) -Tools such as pgpool that want to write a simple configuration file, only touching the things that used to go into recovery.conf, can tell people to do the same trick. End their postgresql.conf with a call to
Re: [HACKERS] 9.2.3 crashes during archive recovery
Hi, I suppose the attached patch is close to the solution. I think that this is the original intention of Heikki's patch. I noticed that archive recovery will be turned on in next_record_is_invalid thanks to your patch. On the other hand, your patch fixes that point but ReadRecord runs on the false page data. However, since the wrong record on the false page can be identified as broken, it should be an undesirable behavior. All we should do to update minRecoveryPoint as needed when XLogPageRead fails in ReadRecord is to simply jump to next_record_is_invalid if archive recovery is requested but we are still doing crash recovery. Your modification in readTimeLineHistory and my modifications in XLogPageRead seem unnecessary for the objective in this thread, so they are removed. -- Kyotaro Horiguchi NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..28d6f2e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,15 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
+	{
+		/*
+		 * If archive recovery was requested when crash recovery failed, go to
+		 * the label next_record_is_invalid to switch to archive recovery.
+		 */
+		if (!InArchiveRecovery && ArchiveRecoveryRequested)
+			goto next_record_is_invalid;
 		return NULL;
+	}
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
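The control flow the patch adds can be summarized: when XLogPageRead fails while we are still in crash recovery but archive recovery has been requested, fall through to the archive-recovery path instead of giving up. A schematic of just that decision (the real logic lives in ReadRecord in xlog.c; the names below mirror the patch's variables):

```python
def read_record_action(page_read_ok, in_archive_recovery, archive_requested):
    """Model ReadRecord's new behavior when XLogPageRead fails: if we
    are still in crash recovery but archive recovery was requested,
    switch over (the goto next_record_is_invalid in the patch) rather
    than returning NULL and stopping recovery."""
    if page_read_ok:
        return "return_record"
    if not in_archive_recovery and archive_requested:
        return "switch_to_archive_recovery"
    return "return_null"
```

The point of the fix is the middle branch: before the patch, a failed page read during crash recovery ended recovery even when the requested archive recovery could have continued past the bad page.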