Re: [HACKERS] RLS Design
Sorry for my late response; I'm now catching up on the discussion.

* Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jul 1, 2014 at 3:20 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:

If RLS quals are instead regarded as constraints on access, and multiple policies apply, then it seems that the quals should now be combined with AND rather than OR, right?

I do feel that RLS quals are constraints on access, but I don't see how it follows that multiple quals should be AND'd together because of that. I view the RLS policies on each table as independent and standing alone regarding what can be seen. If you have access to a table today through policy A, and then later policy B is added, using AND would mean that the set of rows returned is smaller than if only policy A existed. That doesn't seem correct to me.

It seems to me that the direction these constraints (RLS policies) work in is the reverse. When we have no RLS policy, 100% of rows are visible, aren't they? Adding a constraint usually reduces the number of visible rows, or at least keeps it the same; a constraint should never work in the direction of increasing the number of visible rows. If multiple RLS policies are connected with the OR operator, the first policy works in the direction of reducing the number of visible rows, but the second policy works in the reverse direction.

If we had OR'd RLS policies, how would they be merged with user-given qualifiers? For example, if the RLS policy of t1 is (t1.credential get_user_credential) and the user's query is:

SELECT * FROM t1 WHERE t1.x = t1.x;

Do you think the RLS policy should be merged in OR'd form?

Yeah, maybe. I intuitively feel that OR would be more useful, so it would be nice to find a design where that makes sense. But it depends a lot, in my view, on what syntax we end up with.
For example, suppose we add just one command:

ALTER TABLE table_name FILTER [ role_name | PUBLIC ] USING qual;

If the given role inherits from multiple roles that have different filters, I think the user will naturally expect all of the filters to be applied. Agreed. But you could do it other ways. For example:

ALTER TABLE table_name [ NO ] ROW LEVEL SECURITY;
ALTER TABLE table_name GRANT ROW ACCESS TO role_name USING qual;

If a table is set to NO ROW LEVEL SECURITY then it behaves just as it does now: anyone who accesses it sees all the rows, restricted to those columns for which they have permission. If the table is set to ROW LEVEL SECURITY then the default is to show no rows. The second command then allows access to a subset of the rows for a given role name. In this case, it is probably logical for access to be combined via OR.

I can see value in having a table-level option to indicate whether RLS is applied for that table, but I had been thinking we'd just manage that automatically. That is to say, once you define an RLS policy for a table, we go look and see what policy should be applied in each case. With the user able to control that, what happens if they turn on row security for the table and there are no policies? Would all access show the table as empty? What if policies exist and they decide to 'turn off' RLS for the table: suddenly everyone can see all the rows?

My answers to the above (which are making me like the idea more, actually...) would be: yes, if they turn on RLS for the table and there aren't any policies, then the table appears empty to anyone with normal SELECT rights (the table owner and superusers would still see everything). If policies exist and the user asks to turn off RLS, I'd throw an ERROR, as there is a security risk there. We could support a CASCADE option which would go and drop the policies from the table first. Hmm...
This approach starts from empty permissions and then adds permission to reference a particular range of the configured table. That's one attitude. However, I think it has a dark side we cannot ignore. Usually, the purpose of a security mechanism is to ensure what is readable/writable according to the rules. Once multiple RLS policies are merged in OR'd form, the results are unpredictable. Suppose there are two independent applications that use RLS on table X. Even if application-1 wants only public rows to be visible, it may expose credential or secret rows through interaction with an orthogonal policy configured by application-2 (which may configure its policy according to the source IP address). It seems to me that application-2 has partially invalidated the RLS policy configured by application-1. I think an important characteristic is that things meant to be invisible stay invisible even when multiple rules are configured.

Otherwise, I'm generally liking Dean's thoughts in http://www.postgresql.org/message-id/CAEZATCVftksFH=X+9mVmBNMZo5KsUP+Rk0kb4oro92jofjo...@mail.gmail.com along with the table-level enable RLS option. Are we getting to a point where there is
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
At 2014-07-04 14:38:27 +0900, masao.fu...@gmail.com wrote: But 0002-CompressBackupBlock_snappy_lz4_pglz-2.patch doesn't seem to be able to apply to HEAD cleanly. Yes, and it needs quite some reformatting beyond fixing whitespace damage too (long lines, comment formatting, consistent spacing etc.). -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue while calling new PostgreSQL command from a Java Application
You may have to add code to copy inp_str to _copyVacuumStmt(). See how a character array is copied in other _copy* functions.

On Fri, Jul 4, 2014 at 10:43 AM, Ashoke s.ash...@gmail.com wrote:

Hi,

I have defined a new command my_command in PostgreSQL. This command takes the path of ANALYZE, and inside analyze.c I have a function to do some operations if it's my_command. This command takes the input arguments: table name, column name and an input string.

my_command nation (n_nationkey) 'input string';

When I run this command from the command line (psql), it works as expected. But when I call the same command from a Java application, the variable that stores the input string is NULL. I printed the value of the input string in the gram.y file where I have defined my_command:

fprintf(stderr, "I am inside gram.y %s\n", n->inp_str);

and the input string is printed correctly. But when I print stmt->inp_str in the function standard_ProcessUtility() of utility.c for the case T_VacuumStmt, I get the value as NULL. This is as far as I could trace back from analyze.c. I am not sure how executing the same command from an application can make a difference.

gram.y content gist:

MyStmt:
    my_keyword qualified_name name_list my_inp_str
        {
            VacuumStmt *n = makeNode(VacuumStmt);
            n->options = VACOPT_ANALYZE;
            n->freeze_min_age = -1;
            n->freeze_table_age = -1;
            n->relation = $2;
            n->va_cols = $3;
            n->inp_str = $4;
            fprintf(stderr, "I am inside gram.y %s\n", n->inp_str);
            $$ = (Node *)n;
        };

char *inp_str is added to the struct VacuumStmt in parsenodes.h. Only the newly added char *inp_str (different from ANALYZE) comes out NULL; I was able to retrieve the column name from va_cols. Any help is appreciated. Thanks!

-- Regards, Ashoke

-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Issue while calling new PostgreSQL command from a Java Application
At 2014-07-04 10:43:12 +0530, s.ash...@gmail.com wrote: I am not sure how executing the same command from an application can make a difference. It shouldn't make any difference, of course. But since you're seeing the problem with new code, the overwhelming probability is that there's an error in the new code. That being the case, speculating about what might be going wrong without looking at the code in question is a waste of time. -- Abhijit
Re: [HACKERS] Can simplify 'limit 1' with slow function?
Fascinating. On Fri, Jul 04, 2014 at 10:47:06AM +0800, gotoschool6g wrote:

slow query (8531 ms):

SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id LIMIT 1;

explain output:

Limit  (cost=4653.48..4653.48 rows=1 width=3612)
  ->  Sort  (cost=4653.48..4683.06 rows=11832 width=3612)
        Sort Key: id
        ->  Seq Scan on road  (cost=0.00..4594.32 rows=11832 width=3612)

fast query (16 ms):

select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape FROM road order by id LIMIT 1) a

explain output:

Subquery Scan on a  (cost=1695.48..1695.74 rows=1 width=3608)
  ->  Limit  (cost=1695.48..1695.48 rows=1 width=3612)
        ->  Sort  (cost=1695.48..1725.06 rows=11832 width=3612)
              Sort Key: road.id
              ->  Seq Scan on road  (cost=0.00..1636.32 rows=11832 width=3612)

So Postgres knows perfectly well that it's expensive; it just doesn't appear to understand that it has the option of moving the calculation above the limit. In this case, though, it seems an index on road(id) would make it instant in any case.

Have a nice day,
-- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
Re: [HACKERS] pg_xlogdump --stats
At 2014-06-30 05:19:10 +0000, dilip.ku...@huawei.com wrote: Please fix these issues and send the updated patch.. I will continue reviewing the patch.. Did you get anywhere with the updated patch? -- Abhijit
Re: [HACKERS] Cluster name in ps output
On Wed, Jul 2, 2014 at 3:45 AM, Vik Fearing vik.fear...@dalibo.com wrote: On 06/29/2014 02:25 PM, Andres Freund wrote: On 2014-06-29 11:11:14 +0100, Thomas Munro wrote: On 29 June 2014 10:55, Andres Freund and...@2ndquadrant.com wrote:

So, I'd looked at it with an eye towards committing it and found some more things. I've now:
* added the restriction that the cluster name can only be ASCII. It's shown from several clusters with differing encodings, and it's shown in process titles, so that seems wise.
* moved everything to the LOGGING_WHAT category
* added minimal documentation to monitoring.sgml
* expanded the config.sgml entry to mention the restriction on the name.
* changed the GUC's short description

Thank you. I also think, but haven't done so, that we should add a colon after the cluster name, so it's not:

postgres: server1 stats collector process

but:

postgres: server1: stats collector process

+1. Committed with the above changes. Thanks for the contribution!

Is there a reason for not using this in synchronous_standby_names, perhaps falling back to application_name if not set?

You mean that if synchronous_standby_names is empty, it should automatically be set to cluster_name? Or that if application_name is not set in primary_conninfo, the standby should automatically use its cluster_name as the application_name in primary_conninfo? I'm afraid those may cause trouble, such as a standby unexpectedly being treated as synchronous even though the user wanted to set it up as asynchronous by emptying synchronous_standby_names on the master.

Regards, -- Fujii Masao
Re: [HACKERS] Issue while calling new PostgreSQL command from a Java Application
Thank you, Ashutosh. That was the issue. But could you please explain why it worked from the command line?

On Fri, Jul 4, 2014 at 11:49 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: You may have to add code to copy inp_str to _copyVacuumStmt(). See how a character array is copied in other _copy* functions.
-- Regards, Ashoke
Re: [HACKERS] Cluster name in ps output
On 07/04/2014 08:50 AM, Fujii Masao wrote: On Wed, Jul 2, 2014 at 3:45 AM, Vik Fearing vik.fear...@dalibo.com wrote:

Is there a reason for not using this in synchronous_standby_names, perhaps falling back to application_name if not set?

You mean that if synchronous_standby_names is empty, it should automatically be set to cluster_name? Or that if application_name is not set in primary_conninfo, the standby should automatically use its cluster_name as the application_name in primary_conninfo? I'm afraid those may cause trouble, such as a standby unexpectedly being treated as synchronous even though the user wanted to set it up as asynchronous by emptying synchronous_standby_names on the master.

No, I mean that synchronous_standby_names should look at cluster_name first, and if it's not set then unfortunately look at application_name for backward compatibility. Using application_name for this always seemed like a hack to me, and cluster_name is a much better fit. We should have created cluster_name back when we created synchronous_standby_names.

-- Vik
Re: [HACKERS] Issue while calling new PostgreSQL command from a Java Application
On Fri, Jul 4, 2014 at 12:30 PM, Ashoke s.ash...@gmail.com wrote: Thank you, Ashutosh. That was the issue. But could you please explain why it worked from the command line?

I do not know. Any time we add a member to a node and find its value coming out NULL or 0 instead of the one set, the corresponding _copy* function is the first suspect. You may be able to find why it worked on the command line and not through the connector by breaking on copyObject() in either case.

On Fri, Jul 4, 2014 at 11:49 AM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote: You may have to add code to copy inp_str to _copyVacuumStmt(). See how a character array is copied in other _copy* functions.
-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] pg_xlogdump --stats
On 04 July 2014 12:07, Abhijit Menon-Sen wrote:

At 2014-06-30 05:19:10 +0000, dilip.ku...@huawei.com wrote: Please fix these issues and send the updated patch.. I will continue reviewing the patch..

Did you get anywhere with the updated patch?

The patch looks fine to me, except for a few small comments.

1. Update the new option in the usage() function as well; it still has the old form { -z, --stats[=record] }:

{"stats", no_argument, NULL, 'z'},
{"record-stats", no_argument, NULL, 'Z'},

2. While applying stats-newopt.dif (after applying stat2.diff), there is an error merging the sgml file:

patching file `doc/src/sgml/pg_xlogdump.sgml'
Hunk #1 FAILED at 181.
1 out of 1 hunk FAILED -- saving rejects to doc/src/sgml/pg_xlogdump.sgml.rej

Once you fix the above errors, I think the patch is OK from my side.

Thanks & Regards, Dilip Kumar
Re: [HACKERS] pg_xlogdump --stats
At 2014-07-04 08:38:17 +0000, dilip.ku...@huawei.com wrote: Once you fix above errors, I think patch is ok from my side.

I've attached two updated patches here, including the fix to the usage message. I'll mark this ready for committer now. (The rm_identify patch is posted separately from the xlogdump one only to make the whole thing easier to follow.)

Thank you. -- Abhijit

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index c555786..239321f 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -15,12 +15,28 @@
 #include <dirent.h>
 #include <unistd.h>
 
+#include "access/clog.h"
+#include "access/gin_private.h"
+#include "access/gist_private.h"
+#include "access/heapam_xlog.h"
+#include "access/heapam_xlog.h"
+#include "access/multixact.h"
+#include "access/nbtree.h"
+#include "access/spgist_private.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "access/xlogreader.h"
 #include "access/transam.h"
+#include "catalog/pg_control.h"
+#include "catalog/storage_xlog.h"
+#include "commands/dbcommands.h"
+#include "commands/sequence.h"
+#include "commands/tablespace.h"
 #include "common/fe_memutils.h"
 #include "getopt_long.h"
 #include "rmgrdesc.h"
+#include "storage/standby.h"
+#include "utils/relmapper.h"
 
 static const char *progname;
@@ -41,6 +57,8 @@ typedef struct XLogDumpConfig
 	int			stop_after_records;
 	int			already_displayed_records;
 	bool		follow;
+	bool		stats;
+	bool		stats_per_record;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -48,6 +66,20 @@ typedef struct XLogDumpConfig
 	bool		filter_by_xid_enabled;
 } XLogDumpConfig;
 
+typedef struct Stats
+{
+	uint64		count;
+	uint64		rec_len;
+	uint64		fpi_len;
+} Stats;
+
+typedef struct XLogDumpStats
+{
+	uint64		count;
+	Stats		rmgr_stats[RM_NEXT_ID];
+	Stats		record_stats[RM_NEXT_ID][16];
+} XLogDumpStats;
+
 static void fatal_error(const char *fmt,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 1, 2)));
@@ -322,6 +354,50 @@ XLogDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
 }
 
 /*
+ * Store per-rmgr and per-record statistics for a given record.
+ */
+static void
+XLogDumpCountRecord(XLogDumpConfig *config, XLogDumpStats *stats, XLogRecPtr ReadRecPtr, XLogRecord *record)
+{
+	RmgrId		rmid;
+	uint8		recid;
+
+	if (config->filter_by_rmgr != -1 &&
+		config->filter_by_rmgr != record->xl_rmid)
+		return;
+
+	if (config->filter_by_xid_enabled &&
+		config->filter_by_xid != record->xl_xid)
+		return;
+
+	stats->count++;
+
+	/* Update per-rmgr statistics */
+
+	rmid = record->xl_rmid;
+
+	stats->rmgr_stats[rmid].count++;
+	stats->rmgr_stats[rmid].rec_len +=
+		record->xl_len + SizeOfXLogRecord;
+	stats->rmgr_stats[rmid].fpi_len +=
+		record->xl_tot_len - (record->xl_len + SizeOfXLogRecord);
+
+	/*
+	 * Update per-record statistics, where the record is identified by a
+	 * combination of the RmgrId and the upper four bits of the xl_info
+	 * field (to give sixteen possible entries per RmgrId).
+	 */
+
+	recid = (record->xl_info & ~XLR_INFO_MASK) >> 4;
+
+	stats->record_stats[rmid][recid].count++;
+	stats->record_stats[rmid][recid].rec_len +=
+		record->xl_len + SizeOfXLogRecord;
+	stats->record_stats[rmid][recid].fpi_len +=
+		record->xl_tot_len - (record->xl_len + SizeOfXLogRecord);
+}
+
+/*
  * Print a record to stdout
  */
 static void
@@ -380,6 +456,498 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 	}
 }
 
+/*
+ * Given an xl_rmid and the high bits of xl_info, returns a string
+ * describing the record type.
+ */
+static const char *
+identify_record(RmgrId rmid, uint8 info)
+{
+	const RmgrDescData *desc = &RmgrDescTable[rmid];
+	const char *rec;
+
+	rec = psprintf("0x%x", info);
+
+	switch (rmid)
+	{
+		case RM_XLOG_ID:
+			switch (info)
+			{
+				case XLOG_CHECKPOINT_SHUTDOWN:
+					rec = "CHECKPOINT_SHUTDOWN";
+					break;
+				case XLOG_CHECKPOINT_ONLINE:
+					rec = "CHECKPOINT_ONLINE";
+					break;
+				case XLOG_NOOP:
+					rec = "NOOP";
+					break;
+				case XLOG_NEXTOID:
+					rec = "NEXTOID";
+					break;
+				case XLOG_SWITCH:
+					rec = "SWITCH";
+					break;
+				case XLOG_BACKUP_END:
+					rec = "BACKUP_END";
+					break;
+				case XLOG_PARAMETER_CHANGE:
+					rec = "PARAMETER_CHANGE";
+					break;
+				case XLOG_RESTORE_POINT:
+					rec = "RESTORE_POINT";
+					break;
+				case XLOG_FPW_CHANGE:
+					rec = "FPW_CHANGE";
+					break;
+				case XLOG_END_OF_RECOVERY:
+					rec = "END_OF_RECOVERY";
+					break;
+				case XLOG_FPI:
+					rec = "FPI";
+					break;
+			}
+			break;
+
+		case RM_XACT_ID:
+			switch (info)
+			{
+				case XLOG_XACT_COMMIT:
+					rec = "COMMIT";
+					break;
+				case XLOG_XACT_PREPARE:
+					rec = "PREPARE";
+					break;
+				case XLOG_XACT_ABORT:
+					rec = "ABORT";
+					break;
+				case XLOG_XACT_COMMIT_PREPARED:
+					rec = "COMMIT_PREPARED";
+					break;
+				case XLOG_XACT_ABORT_PREPARED:
+					rec = "ABORT_PREPARED";
+					break;
+				case XLOG_XACT_ASSIGNMENT:
+					rec = "ASSIGNMENT";
+					break;
+				case XLOG_XACT_COMMIT_COMPACT:
+
Re: [HACKERS] pg_xlogdump --stats
On 2014-07-04 14:16:42 +0530, Abhijit Menon-Sen wrote: At 2014-07-04 08:38:17 +0000, dilip.ku...@huawei.com wrote: Once you fix above errors, I think patch is ok from my side.

I've attached two updated patches here, including the fix to the usage message. I'll mark this ready for committer now. (The rm_identify patch is posted separately from the xlogdump one only to make the whole thing easier to follow.)

I'm pretty sure that most committers would want to apply the rm_identify part in a separate commit first. Could you make it two patches, one introducing rm_identify, and another with xlogdump using it?

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_xlogdump --stats
At 2014-07-04 10:54:08 +0200, and...@2ndquadrant.com wrote: I'm pretty sure that most committers would want to apply the rm_identify part in a separate commit first. Could you make it two patches, one introducing rm_identify, and another with xlogdump using it?

Sure, attached. -- Abhijit

diff --git a/src/backend/access/rmgrdesc/clogdesc.c b/src/backend/access/rmgrdesc/clogdesc.c
index e82baa8..08f225d 100644
--- a/src/backend/access/rmgrdesc/clogdesc.c
+++ b/src/backend/access/rmgrdesc/clogdesc.c
@@ -40,3 +40,21 @@ clog_desc(StringInfo buf, XLogRecord *record)
 	else
 		appendStringInfoString(buf, "UNKNOWN");
 }
+
+const char *
+clog_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info)
+	{
+		case CLOG_ZEROPAGE:
+			id = "ZEROPAGE";
+			break;
+		case CLOG_TRUNCATE:
+			id = "TRUNCATE";
+			break;
+	}
+
+	return id;
+}
diff --git a/src/backend/access/rmgrdesc/dbasedesc.c b/src/backend/access/rmgrdesc/dbasedesc.c
index 0230716..38f3a39 100644
--- a/src/backend/access/rmgrdesc/dbasedesc.c
+++ b/src/backend/access/rmgrdesc/dbasedesc.c
@@ -42,3 +42,21 @@ dbase_desc(StringInfo buf, XLogRecord *record)
 	else
 		appendStringInfoString(buf, "UNKNOWN");
 }
+
+const char *
+dbase_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info)
+	{
+		case XLOG_DBASE_CREATE:
+			id = "CREATE";
+			break;
+		case XLOG_DBASE_DROP:
+			id = "DROP";
+			break;
+	}
+
+	return id;
+}
diff --git a/src/backend/access/rmgrdesc/gindesc.c b/src/backend/access/rmgrdesc/gindesc.c
index 12d68d7..115ad5b 100644
--- a/src/backend/access/rmgrdesc/gindesc.c
+++ b/src/backend/access/rmgrdesc/gindesc.c
@@ -187,3 +187,45 @@ gin_desc(StringInfo buf, XLogRecord *record)
 			break;
 	}
 }
+
+const char *
+gin_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info)
+	{
+		case XLOG_GIN_CREATE_INDEX:
+			id = "CREATE_INDEX";
+			break;
+		case XLOG_GIN_CREATE_PTREE:
+			id = "CREATE_PTREE";
+			break;
+		case XLOG_GIN_INSERT:
+			id = "INSERT";
+			break;
+		case XLOG_GIN_SPLIT:
+			id = "SPLIT";
+			break;
+		case XLOG_GIN_VACUUM_PAGE:
+			id = "VACUUM_PAGE";
+			break;
+		case XLOG_GIN_VACUUM_DATA_LEAF_PAGE:
+			id = "VACUUM_DATA_LEAF_PAGE";
+			break;
+		case XLOG_GIN_DELETE_PAGE:
+			id = "DELETE_PAGE";
+			break;
+		case XLOG_GIN_UPDATE_META_PAGE:
+			id = "UPDATE_META_PAGE";
+			break;
+		case XLOG_GIN_INSERT_LISTPAGE:
+			id = "INSERT_LISTPAGE";
+			break;
+		case XLOG_GIN_DELETE_LISTPAGE:
+			id = "DELETE_LISTPAGE";
+			break;
+	}
+
+	return id;
+}
diff --git a/src/backend/access/rmgrdesc/gistdesc.c b/src/backend/access/rmgrdesc/gistdesc.c
index cdac496..e4f288b 100644
--- a/src/backend/access/rmgrdesc/gistdesc.c
+++ b/src/backend/access/rmgrdesc/gistdesc.c
@@ -67,3 +67,24 @@ gist_desc(StringInfo buf, XLogRecord *record)
 			break;
 	}
 }
+
+const char *
+gist_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info)
+	{
+		case XLOG_GIST_PAGE_UPDATE:
+			id = "PAGE_UPDATE";
+			break;
+		case XLOG_GIST_PAGE_SPLIT:
+			id = "PAGE_SPLIT";
+			break;
+		case XLOG_GIST_CREATE_INDEX:
+			id = "CREATE_INDEX";
+			break;
+	}
+
+	return id;
+}
diff --git a/src/backend/access/rmgrdesc/hashdesc.c b/src/backend/access/rmgrdesc/hashdesc.c
index 86a0376..c58461c 100644
--- a/src/backend/access/rmgrdesc/hashdesc.c
+++ b/src/backend/access/rmgrdesc/hashdesc.c
@@ -20,3 +20,9 @@ void
 hash_desc(StringInfo buf, XLogRecord *record)
 {
 }
+
+const char *
+hash_identify(uint8 info)
+{
+	return NULL;
+}
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 7df18fa..e6149be 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -202,3 +202,78 @@ heap2_desc(StringInfo buf, XLogRecord *record)
 	else
 		appendStringInfoString(buf, "UNKNOWN");
 }
+
+const char *
+heap_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info & XLOG_HEAP_OPMASK)
+	{
+		case XLOG_HEAP_INSERT:
+			id = "INSERT";
+			break;
+		case XLOG_HEAP_DELETE:
+			id = "DELETE";
+			break;
+		case XLOG_HEAP_UPDATE:
+			id = "UPDATE";
+			break;
+		case XLOG_HEAP_HOT_UPDATE:
+			id = "HOT_UPDATE";
+			break;
+		case XLOG_HEAP_NEWPAGE:
+			id = "NEWPAGE";
+			break;
+		case XLOG_HEAP_LOCK:
+			id = "LOCK";
+			break;
+		case XLOG_HEAP_INPLACE:
+			id = "INPLACE";
+			break;
+	}
+
+	if (info & XLOG_HEAP_INIT_PAGE)
+		id = psprintf("%s+INIT", id);
+
+	return id;
+}
+
+const char *
+heap2_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info & XLOG_HEAP_OPMASK)
+	{
+		case XLOG_HEAP2_CLEAN:
+			id = "CLEAN";
+			break;
+		case XLOG_HEAP2_FREEZE_PAGE:
+			id = "FREEZE_PAGE";
+			break;
+		case XLOG_HEAP2_CLEANUP_INFO:
+			id = "CLEANUP_INFO";
+			break;
+		case XLOG_HEAP2_VISIBLE:
+			id = "VISIBLE";
+			break;
+		case XLOG_HEAP2_MULTI_INSERT:
+			id = "MULTI_INSERT";
+			break;
+		case XLOG_HEAP2_LOCK_UPDATED:
+			id = "LOCK_UPDATED";
+			break;
+		case XLOG_HEAP2_NEW_CID:
+			id = "NEW_CID";
+			break;
+		case XLOG_HEAP2_REWRITE:
+			id = "REWRITE";
+			break;
+	}
+
+	if (info & XLOG_HEAP_INIT_PAGE)
+		id =
Re: [HACKERS] pg_xlogdump --stats
Hi,

On 2014-07-04 14:42:03 +0530, Abhijit Menon-Sen wrote: Sure, attached.

+const char *
+heap_identify(uint8 info)
+{
+	const char *id = NULL;
+
+	switch (info & XLOG_HEAP_OPMASK)
+	{
+		case XLOG_HEAP_INSERT:
+			id = "INSERT";
+			break;
+		case XLOG_HEAP_DELETE:
+			id = "DELETE";
+			break;
+		case XLOG_HEAP_UPDATE:
+			id = "UPDATE";
+			break;
+		case XLOG_HEAP_HOT_UPDATE:
+			id = "HOT_UPDATE";
+			break;
+		case XLOG_HEAP_NEWPAGE:
+			id = "NEWPAGE";
+			break;
+		case XLOG_HEAP_LOCK:
+			id = "LOCK";
+			break;
+		case XLOG_HEAP_INPLACE:
+			id = "INPLACE";
+			break;
+	}
+
+	if (info & XLOG_HEAP_INIT_PAGE)
+		id = psprintf("%s+INIT", id);
+
+	return id;
+}

So we're leaking memory here? For --stats that could well be relevant...

+/*
+ * Display a single row of record counts and sizes for an rmgr or record.
+ */
+static void
+XLogDumpStatsRow(const char *name,
+				 uint64 n, double n_pct,
+				 uint64 rec_len, double rec_len_pct,
+				 uint64 fpi_len, double fpi_len_pct,
+				 uint64 total_len, double total_len_pct)
+{
+	printf("%-27s %20zu (%6.02f) %20zu (%6.02f) %20zu (%6.02f) %20zu (%6.02f)\n",
+		   name, n, n_pct, rec_len, rec_len_pct, fpi_len, fpi_len_pct,
+		   total_len, total_len_pct);
+}

This (and following locations) is going to break on 32bit platforms. %z indicates size_t, not 64bit. I think we're going to have to redefine the PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT callsite in configure.in to define INT64_MODIFIER='ll/l/I64D' and then define

#define INT64_FORMAT "%" CppAsString(INT64_MODIFIER) "d"
#define UINT64_FORMAT "%" CppAsString(INT64_MODIFIER) "u"

in c.h based on that, or something like it. This was written blindly, so it might need further work. Then you can use INT64_MODIFIER in your format strings. Won't be pretty, but at least it'd work...

+static void
+XLogDumpDisplayStats(XLogDumpConfig *config, XLogDumpStats *stats)
+{
+	/*
+	 * 27 is strlen("Transaction/COMMIT_PREPARED"),
+	 * 20 is strlen(2^64), 8 is strlen("(100.00%)")
+	 */

It's far from guaranteed that 27 will always suffice.
I guess it's ok because it doesn't cause bad breakage, but just some misalignment...

+	for (ri = 0; ri < RM_NEXT_ID; ri++)
+	{
+		uint64		count, rec_len, fpi_len;
+		const RmgrDescData *desc = &RmgrDescTable[ri];
+
+		if (!config->stats_per_record)
+		{
+			count = stats->rmgr_stats[ri].count;
+			rec_len = stats->rmgr_stats[ri].rec_len;
+			fpi_len = stats->rmgr_stats[ri].fpi_len;
+
+			XLogDumpStatsRow(desc->rm_name,
+							 count, 100*(double)count/total_count,
+							 rec_len, 100*(double)rec_len/total_rec_len,
+							 fpi_len, 100*(double)fpi_len/total_fpi_len,
+							 rec_len+fpi_len, 100*(double)(rec_len+fpi_len)/total_len);
+		}

Many missing spaces here.

+		else
+		{
+			for (rj = 0; rj < 16; rj++)
+			{
+				const char *id;
+
+				count = stats->record_stats[ri][rj].count;
+				rec_len = stats->record_stats[ri][rj].rec_len;
+				fpi_len = stats->record_stats[ri][rj].fpi_len;
+
+				/* Skip undefined combinations and ones that didn't occur */
+				if (count == 0)
+					continue;
+
+				id = desc->rm_identify(rj << 4);
+				if (id == NULL)
+					id = psprintf("0x%x", rj << 4);
+
+				XLogDumpStatsRow(psprintf("%s/%s", desc->rm_name, id),
+								 count, 100*(double)count/total_count,
+								 rec_len, 100*(double)rec_len/total_rec_len,
+								 fpi_len, 100*(double)fpi_len/total_fpi_len,
+								 rec_len+fpi_len, 100*(double)(rec_len+fpi_len)/total_len);
+			}
+		}
+	}

Some additional leaking here.
Re: [HACKERS] WAL replay bugs
Hello, thank you for considering my comments, including the somewhat impractical ones. I'll have a look at the latest patch soon.

At Fri, 4 Jul 2014 15:29:51 +0900, Michael Paquier michael.paqu...@gmail.com wrote in cab7npqt_fs_3jlnhywc6nzej4sbl6dgslfvcg0jbukgjep9...@mail.gmail.com

OK, I have been working more on this patch, improving on-the-fly the following things on top of what Horiguchi-san has reported:
- Moved sequence page opaque data to sequence.h, renaming it at the same time.
- Improvement of page type identification, particularly for sequences, using a correct opaque data structure. For gin the process is not that cool, but I guess that there is nothing much to do as it has no proper page identifier :(

Yeah, there seems to be no choice other than that.

On Thu, Jul 3, 2014 at 7:34 PM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote:

= bufcapt.c:
- buffer_capture_remember() or so. ...
Yes, it is worth mentioning, and a comment in bufcapt.h seems enough.
- buffer_capture_forget() ...
Yes, this seems informative.
- buffer_capture_is_changed() ...
Hm, yes. This name looks fine as it remains static within bufcapt.c.

== bufmgr.c:
- ConditionalLockBuffer() ...
Fixed.
- LockBuffer() ...
lwlock.h: #define LWLockHoldedExclusive(l) ((l)->exclusive > 0)
lwlock.h: #define LWLockHoldedShared(l) ((l)->shared > 0)
I don't think that there is much to gain with such macros, as for now LWLock->exclusive is only used in the code this patch introduces.

Yeah, I think so, too :p That was simply in case you had thought otherwise. If there isn't any particular concern, the 'XXX:' should be removed.

Well yes.

That's great.

= bufpage.c:
= bufcapt.h:
- header comment
The file name is misspelled as 'bufcaptr.h'.
Nicely spotted. Thank you ;)
- The includes in this header except for buf.h seem not to be necessary.
Yep.

= init_env.sh:
- pg_get_test_port()
It determines the server port using PG_VERSION_NUM, but is it necessary?
In addition, the theoretical maximum initial PGPORT seems to be 65535, but this script searches for a free port up to a number 16 larger than the start, which could be beyond the edge.

Hm, no. As of now, there is still some margin:
PG_VERSION_NUM = 90500
PG_VERSION_NUM % 16384 + 49152 = 57732

Yes, it's practically no problem. I was talking about the theoretical maximum value, seen without any presumption about the value of PG_VERSION_NUM. In reality there's no problem until PostgreSQL 9.82.88 comes out, and 10.0.0 doesn't cause a problem. So I won't be too disappointed if it is not fixed.

- pg_get_test_port()
It stops with an error after 16 failures, but the message complains only about the final attempt. If you want to mention the port numbers, it should perhaps be 'ports $PGPORTSTART-$PGPORT ..' or 'All 16 ports attempted failed' or something..

Hm. You could complain about pg_upgrade as well now for the same thing. But I guess that it doesn't hurt to complain back to the caller about the range of ports already in use, so changed this way.

Yes, this comment also comes from a kind of fastidiousness. I'm satisfied with it not being fixed if you think so.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
On 3 July 2014 16:59, Simon Riggs si...@2ndquadrant.com wrote: I think we should say this though

	LockBufHdr(buf);
	valid = ((buf->flags & BM_VALID) != 0);
	if (valid)
		PinBuffer_Locked(buf);
	else
		UnlockBufHdr(buf);

since otherwise we would access the buffer flags without the spinlock, and we would leak a pin if the buffer was not valid

Ah right. That is essential.
Re: [HACKERS] WAL replay bugs
Hello, At Thu, 3 Jul 2014 14:48:50 +0900, Michael Paquier michael.paqu...@gmail.com wrote in cab7npqq2y3qkapasac6oxxqtwbvkkxcrftua0w+dx-j3i-l...@mail.gmail.com

TODO ...

Each type of page can be confirmed in the following way *if* its type is previously *hinted*, except for gin.

btree:    32-bit magic at pd->opaque
gin:      no magic
gist:     16-bit magic at ((GISTPageOpaque *) pd->opaque)->gist_page_id
spgist:   16-bit magic at ((SpGistPageOpaque *) pd->opaque)->spgist_page_id
hash:     16-bit magic at ((HashPageOpaque *) pd->opaque)->hasho_page_id
sequence: 16-bit magic at pd->opaque, the last 2 bytes of the page.

# Is it comprehensive? and correct?

Sequence pages use the last 4 bytes. Have a look at sequence_magic in sequence.c. For btree pages we can use the last 2 bytes and a check on MAX_BT_CYCLE_ID. For gin, I'll investigate if it is possible to add an identifier like GIN_PAGE_ID; it would make the page analysis more consistent with the others. I am not sure what the 8 bytes allocated for the special area are used for now, though.

The majority is a 16-bit magic at the TAIL of the opaque struct. If we can unify them into, say, a 16-bit magic at *(uint16 *) (pd->opaque), the sizeof() labyrinth becomes stable and simple, and other tools which need to identify the type of pages will be happy. Do you think we can do that?

Yes I think so. I'll raise a different thread though, as this is a different problem from what this patch is targeting. I would even imagine a macro in bufpage.c able to handle that well.

Ok, that being the case, this topic should be stashed and I'll look into it there regardless. Thank you.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Escaping from blocked send() reprised.
Hello, At Tue, 1 Jul 2014 21:21:38 +0200, Martijn van Oosterhout klep...@svana.org wrote in 20140701192138.gb20...@svana.org

On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:

1. I think it's the case that there are platforms around where a signal won't cause send() to return EINTR and I'd be entirely unsurprised if SSL_write() doesn't necessarily return EINTR in that case. I'm not sure what, if anything, we can do about that.

man 2 send on FreeBSD has no description of EINTR.. And even on Linux, send() won't return EINTR in most cases; at least I haven't seen it. So send() = -1 with EINTR seems to me to be merely an equivalent of send() = 0. I have no idea what difference the implementer had in mind.

Whether send() returns EINTR or not depends on whether the signal has been marked restartable or not. This is configurable per signal, see sigaction(). If the signal is marked to restart, the kernel returns ERESTARTHAND (IIRC) and the libc will redo the call internally.

Wow, thank you for the detailed information. I'll study that and take it into the future discussion. Default BSD does not return EINTR normally, but supports sigaction(). I guess that is for ease of keeping compatibility, which seems reasonable.

have a nice day, -- Kyotaro Horiguchi NTT Open Source Software Center
[HACKERS] No toast table for pg_shseclabel but for pg_seclabel
Hi, postgres=# SELECT oid::regclass, reltoastrelid FROM pg_class WHERE relname IN ('pg_seclabel', 'pg_shseclabel'); oid | reltoastrelid ---+--- pg_seclabel | 3598 pg_shseclabel | 0 (2 rows) Isn't that a somewhat odd choice? Why do we assume that there cannot be lengthy seclabels on shared objects? Granted, most shared objects aren't candidates for large amounts of data, but both users and databases don't seem to fall into that category. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Escaping from blocked send() reprised.
Hello, thank you for keeping this discussion moving.

I think there's no such reasonable time. The behavior should perhaps be determined from another point of view. One alternative would be to let pg_terminate_backend() take a parameter requesting a forced shutdown (how to propagate it?), or to force shutdown on a duplicate invocation of pg_terminate_backend().

Well, I think that when people call pg_terminate_backend() just once, they expect it to kill the target backend. I think people will tolerate a short delay, like a few seconds; after all, there's no guarantee, even today, that the backend will hit a CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds. Sure. But they are not going to want to have to take a second action to kill the backend - killing it once should be sufficient.

Hmm, that sounds persuasive. Well, do you think they would tolerate a -force option? (Even though its technical practicality is not clear.)

regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] gaussian distribution pgbench
Yea. I certainly disagree with the patch in its current state because it copies the same 15 lines several times with a two word difference. Independent of whether we want those options, I don't think that's going to fly.

I liked a simple static string for the different variants, which means duplication. Factoring out the (large) common part will mean malloc & sprintf. Well, why not.

OTOH, we've almost reached a consensus on supporting gaussian and exponential options in \setrandom. So I think that you should separate those two features into two patches, and we should apply the \setrandom one first. Then we can discuss whether the other patch should be applied or not. Sounds like a good plan.

Sigh. I'll do that as it seems to be a blocker... The caveat that I have is that without these options there is: (1) no report of the actual distributions in the final summary, which depend on the threshold value, and (2) no included means to test the feature, so the first patch is less meaningful if the feature cannot be used simply and requires a custom script.

-- Fabien.
Re: [HACKERS] pg_xlogdump --stats
At 2014-07-04 11:34:21 +0200, and...@2ndquadrant.com wrote: So we're leaking memory here? For --stats that could well be relevant... Will fix (I think using a static buffer would be OK here). I think we're going to have to redefine the PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT callsite in configure.in […] OK, will do. It's far from guaranteed that 27 will always suffice. I guess it's ok because it doesn't cause bad breakage, but just some misalignment... Yes, that was my thought too. I could measure the lengths of things and align columns dynamically, but I really doubt it's worth the effort in this case. Many missing spaces here. […] Some additional leaking here. Will fix both. Looks like that should at least partially have been in the other patch? Yes, sorry. Will fix. (I'll set this back to waiting on author. Thanks for having a look.) -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gaussian distribution pgbench
On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:

Yea. I certainly disagree with the patch in it's current state because it copies the same 15 lines several times with a two word difference. Independent of whether we want those options, I don't think that's going to fly.

I liked a simple static string for the different variants, which means replication. Factorizing out the (large) common part will mean malloc & sprintf. Well, why not.

It sucks from a maintenance POV. And I don't see the overhead of malloc being relevant here...

OTOH, we've almost reached the consensus that supporting gaussian and exponential options in \setrandom. So I think that you should separate those two features into two patches, and we should apply the \setrandom one first. Then we can discuss whether the other patch should be applied or not. Sounds like a good plan.

Sigh. I'll do that as it seems to be a blocker...

I think we also need documentation about the actual mathematical behaviour of the randomness generators.

+      <para>
+       With the gaussian option, the larger the <replaceable>threshold</>,
+       the more frequently values close to the middle of the interval are drawn,
+       and the less frequently values close to the <replaceable>min</> and
+       <replaceable>max</> bounds.
+       In other worlds, the larger the <replaceable>threshold</>,
+       the narrower the access range around the middle.
+       the smaller the threshold, the smoother the access pattern
+       distribution. The minimum threshold is 2.0 for performance.
+      </para>

The only way to actually understand the distribution here is to create a table, insert random values, and then look at the result. That's not a good thing.

The caveat that I have is that without these options there is: (1) no return about the actual distributions in the final summary, which depend on the threshold value, and (2) no included mean to test the feature, so the first patch is less meaningful if the feature cannot be used simply and require a custom script.
I personally agree that we likely want that as an additional feature. Even if just because it makes the results easier to compare. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_xlogdump --stats
On 2014-07-04 15:34:14 +0530, Abhijit Menon-Sen wrote: At 2014-07-04 11:34:21 +0200, and...@2ndquadrant.com wrote: So we're leaking memory here? For --stats that could well be relevant... Will fix (I think using a static buffer would be OK here).

In that place, yes. But there are several other places where that probably isn't going to work. Probably makes sense to check memory usage after running it over a larger amount of WAL.

It's far from guaranteed that 27 will always suffice. I guess it's ok because it doesn't cause bad breakage, but just some misalignment... Yes, that was my thought too. I could measure the lengths of things and align columns dynamically, but I really doubt it's worth the effort in this case.

Nah. Too much work for no gain ;)

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] No toast table for pg_shseclabel but for pg_seclabel
On 2014-07-04 11:50:17 +0200, Andres Freund wrote: Hi, postgres=# SELECT oid::regclass, reltoastrelid FROM pg_class WHERE relname IN ('pg_seclabel', 'pg_shseclabel'); oid | reltoastrelid ---+--- pg_seclabel | 3598 pg_shseclabel | 0 (2 rows) Isn't that a somewhat odd choice? Why do we assume that there cannot be lengthy seclabels on shared objects? Granted, most shared objects aren't candidates for large amounts of data, but both users and databases don't seem to fall into that category. Hm. It seems they were explicitly removed around http://archives.postgresql.org/message-id/1309888389-sup-3853%40alvh.no-ip.org I don't understand the reasoning there. There's a toast table for non-shared objects. Why would we expect less data for the shared ones, even though they're pretty basic objects and more likely to be used to store policies and such? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_xlogdump --stats
I'm pretty sure that most committers would want to apply the rm_identity part in a separate commit first. Could you make it two patches, one introducing rm_identity, and another with xlogdump using it?

Carp the code:

const char *
clog_identify(uint8 info)
{
	switch (info)
	{
		case CLOG_ZEROPAGE:
			return "ZEROPAGE";
		case CLOG_TRUNCATE:
			return "TRUNCATE";
			break;
	}
	return NULL;
}

or

const char *
clog_identify(uint8 info)
{
	if (info == CLOG_ZEROPAGE) return "ZEROPAGE";
	if (info == CLOG_TRUNCATE) return "TRUNCATE";
	return NULL;
}

is a bit faster than:

const char *
clog_identify(uint8 info)
{
	const char *id = NULL;

	switch (info)
	{
		case CLOG_ZEROPAGE:
			id = "ZEROPAGE";
			break;
		case CLOG_TRUNCATE:
			id = "TRUNCATE";
			break;
	}

	return id;
}
Re: [HACKERS] pg_xlogdump --stats
On 2014-07-04 18:31:34 +0800, gotoschool6g wrote:

I'm pretty sure that most committers would want to apply the rm_identity part in a separate commit first. Could you make it two patches, one introducing rm_identity, and another with xlogdump using it?

Carp the code:

const char *
clog_identify(uint8 info)
{
	switch (info)
	{
		case CLOG_ZEROPAGE:
			return "ZEROPAGE";
		case CLOG_TRUNCATE:
			return "TRUNCATE";
			break;
	}
	return NULL;
}

or

const char *
clog_identify(uint8 info)
{
	if (info == CLOG_ZEROPAGE) return "ZEROPAGE";
	if (info == CLOG_TRUNCATE) return "TRUNCATE";
	return NULL;
}

is a bit faster than:

Any halfway decent compiler will not use a local variable here. Don't think that matters much. Also, the code isn't a performance bottleneck...

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_receivexlog add synchronous mode
Thanks for reviewing the patch!

I think that this refactoring patch is useful for improving source code readability and making future patches simpler, whether we adopt your patch or not. So, barring any objections, I'm thinking to commit this refactoring patch.

Committed!

Here is a patch that adds the --fsync-interval option. The interface of --fsync-interval follows that of pg_recvlogical.c. It does not decide whether to flush on a per-message basis; the decision is made when message receipt pauses. If you specify a --fsync-interval of zero, it flushes at the same timing as the walreceiver. If you do not specify --fsync-interval, it flushes only when switching WAL files, as before.

Regards, -- Furuya Osamu

pg_receivexlog-add-fsync-interval-v1.patch Description: pg_receivexlog-add-fsync-interval-v1.patch
Re: [HACKERS] log_error_verbosity and unexpected errors
On 02/07/14 22:10, Tom Lane wrote: Greg Stark st...@mit.edu writes: I think log_error_verbosity is a strange variable. It's useless for expected user-facing errors but essential for unexpected errors that indicate bugs in the code -- and you can only have it on for everything or off for everything. I'm finding I usually want it set to 'verbose' for anything that PANICs or is generated by an elog() but it's just noise for anything generated by an ereport() and is ERROR or below. [...] [ thinks for a bit... ] A slightly cleaner approach is to nominate a specified set of SQLSTATEs, certainly including XX000 and perhaps some others, as being ones that force verbose reporting. That would have the same practical effect as far as elogs go, but wouldn't break the nominal functional equivalence. And that brings up the previous work on SQLSTATE-dependent choices about whether to log at all. I remember a patch was submitted for that but don't remember offhand why it didn't get committed. ISTM we should think about reviving that and making the choice be not just log or not, but no log, terse log, normal log, verbose log. I had a patch for making log_min_error_statement configurable per SQLSTATE in https://commitfest.postgresql.org/action/patch_view?id=1360 but you pointed out various issues in it and I didn't have time to update it for 9.4. I'm going to rewrite it based on the comments and submit it again for a 9.5 commitfest. The same mechanism could be used to set verbosity per SQLSTATE. / Oskari -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_xlogdump --stats
At 2014-07-04 11:34:21 +0200, and...@2ndquadrant.com wrote: I think we're going to have to redefine the PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT callsite in configure.in to define INT64_MODIFIER='ll/l/I64D'

I've attached a patch to do this, and also add INT64_MODIFIER to pg_config.h.in: 0001-modifier.diff

I reran autoconf, and just for convenience I've attached the resulting changes to configure: 0002-configure.diff

Then there are the rm_identify changes: 0003-rmid.diff

Finally, the xlogdump patch using INT64_MODIFIER: 0004-xlogdump.diff

I can confirm that this series applies in-order to master, and that the result builds cleanly (including after each patch) on my machine, and that the resulting pg_xlogdump works as expected.

NOTE: I do not know what to do about pg_config.h.win32. If someone tells me what to do, I can submit another patch.

Some additional leaking here.

Two of the extra calls to psprintf in pg_xlogdump happen at most RM_MAX_ID*16 (i.e. O(record-types) not O(records)) times, and the other two happen just before exit. It would be easy to use a static buffer and snprintf, but I don't think it's worth doing in this case.

-- Abhijit, hoping with crossed fingers to not forget attachments now.

diff --git a/config/c-library.m4 b/config/c-library.m4
index 8f45010..4821a61 100644
--- a/config/c-library.m4
+++ b/config/c-library.m4
@@ -221,22 +221,22 @@ HAVE_POSIX_SIGNALS=$pgac_cv_func_posix_signals
 AC_SUBST(HAVE_POSIX_SIGNALS)])# PGAC_FUNC_POSIX_SIGNALS

-# PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT
+# PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER
 # ---
-# Determine which format snprintf uses for long long int.  We handle
-# %lld, %qd, %I64d.  The result is in shell variable
-# LONG_LONG_INT_FORMAT.
+# Determine which length modifier snprintf uses for long long int.  We
+# handle ll, q, and I64.  The result is in shell variable
+# LONG_LONG_INT_MODIFIER.
 #
 # MinGW uses '%I64d', though gcc throws an warning with -Wall,
 # while '%lld' doesn't generate a warning, but doesn't work.
 #
-AC_DEFUN([PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT],
-[AC_MSG_CHECKING([snprintf format for long long int])
-AC_CACHE_VAL(pgac_cv_snprintf_long_long_int_format,
-[for pgac_format in '%lld' '%qd' '%I64d'; do
+AC_DEFUN([PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER],
+[AC_MSG_CHECKING([snprintf length modifier for long long int])
+AC_CACHE_VAL(pgac_cv_snprintf_long_long_int_modifier,
+[for pgac_modifier in 'll' 'q' 'I64'; do
 AC_TRY_RUN([#include <stdio.h>
 typedef long long int ac_int64;
-#define INT64_FORMAT $pgac_format
+#define INT64_FORMAT "%${pgac_modifier}d"

 ac_int64 a = 2001;
 ac_int64 b = 4005;
@@ -258,19 +258,19 @@ int does_int64_snprintf_work()
 main() {
   exit(! does_int64_snprintf_work());
 }],
-[pgac_cv_snprintf_long_long_int_format=$pgac_format; break],
+[pgac_cv_snprintf_long_long_int_modifier=$pgac_modifier; break],
 [],
-[pgac_cv_snprintf_long_long_int_format=cross; break])
+[pgac_cv_snprintf_long_long_int_modifier=cross; break])
 done])dnl AC_CACHE_VAL

-LONG_LONG_INT_FORMAT=''
+LONG_LONG_INT_MODIFIER=''

-case $pgac_cv_snprintf_long_long_int_format in
+case $pgac_cv_snprintf_long_long_int_modifier in
   cross) AC_MSG_RESULT([cannot test (not on host machine)]);;
-  ?*)    AC_MSG_RESULT([$pgac_cv_snprintf_long_long_int_format])
-         LONG_LONG_INT_FORMAT=$pgac_cv_snprintf_long_long_int_format;;
+  ?*)    AC_MSG_RESULT([$pgac_cv_snprintf_long_long_int_modifier])
+         LONG_LONG_INT_MODIFIER=$pgac_cv_snprintf_long_long_int_modifier;;
   *)     AC_MSG_RESULT(none);;
-esac])# PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT
+esac])# PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER


 # PGAC_FUNC_SNPRINTF_ARG_CONTROL
diff --git a/configure.in b/configure.in
index c938a5d..6afc818 100644
--- a/configure.in
+++ b/configure.in
@@ -1636,35 +1636,38 @@ fi
 # If we found long int is 64 bits, assume snprintf handles it.  If
 # we found we need to use long long int, better check.
 We cope with
 # snprintfs that use %lld, %qd, or %I64d as the format.  If none of these
-# work, fall back to our own snprintf emulation (which we know uses %lld).
+# works, fall back to our own snprintf emulation (which we know uses %lld).

 if test "$HAVE_LONG_LONG_INT_64" = yes ; then
   if test $pgac_need_repl_snprintf = no; then
-    PGAC_FUNC_SNPRINTF_LONG_LONG_INT_FORMAT
-    if test "$LONG_LONG_INT_FORMAT" = ""; then
+    PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER
+    if test "$LONG_LONG_INT_MODIFIER" = ""; then
       # Force usage of our own snprintf, since system snprintf is broken
       pgac_need_repl_snprintf=yes
-      LONG_LONG_INT_FORMAT='%lld'
+      LONG_LONG_INT_MODIFIER='ll'
     fi
   else
     # Here if we previously decided we needed to use our own snprintf
-    LONG_LONG_INT_FORMAT='%lld'
+    LONG_LONG_INT_MODIFIER='ll'
   fi
-  LONG_LONG_UINT_FORMAT=`echo "$LONG_LONG_INT_FORMAT" | sed 's/d$/u/'`
-  INT64_FORMAT="\"$LONG_LONG_INT_FORMAT\""
-  UINT64_FORMAT="\"$LONG_LONG_UINT_FORMAT\""
Re: [HACKERS] No toast table for pg_shseclabel but for pg_seclabel
There is no other reason than what Alvaro mentioned upthread. We intended to store the security labels of SELinux (less than 100 bytes at most), so I didn't think it would actually lead to any problem. On the other hand, pg_seclabel was merged in another development cycle. We didn't have a deep discussion about the necessity of a toast table for pg_seclabel; I added its toast table mechanically. It by no means implies that we need a toast table for local objects but don't need one for shared database objects.

Thanks,

2014-07-04 19:11 GMT+09:00 Andres Freund and...@2ndquadrant.com: On 2014-07-04 11:50:17 +0200, Andres Freund wrote: Hi, postgres=# SELECT oid::regclass, reltoastrelid FROM pg_class WHERE relname IN ('pg_seclabel', 'pg_shseclabel'); oid | reltoastrelid ---+--- pg_seclabel | 3598 pg_shseclabel | 0 (2 rows) Isn't that a somewhat odd choice? Why do we assume that there cannot be lengthy seclabels on shared objects? Granted, most shared objects aren't candidates for large amounts of data, but both users and databases don't seem to fall into that category. Hm. It seems they were explicitly removed around http://archives.postgresql.org/message-id/1309888389-sup-3853%40alvh.no-ip.org I don't understand the reasoning there. There's a toast table for non-shared objects. Why would we expect less data for the shared ones, even though they're pretty basic objects and more likely to be used to store policies and such? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [PATCH] introduce XLogLockBlockRangeForCleanup()
Updated patch attached, thanks.

Amit: what's your conclusion from the review?

-- Abhijit

diff --git a/src/backend/access/nbtree/nbtxlog.c b/src/backend/access/nbtree/nbtxlog.c
index 5f9fc49..dc90c02 100644
--- a/src/backend/access/nbtree/nbtxlog.c
+++ b/src/backend/access/nbtree/nbtxlog.c
@@ -501,33 +501,9 @@ btree_xlog_vacuum(XLogRecPtr lsn, XLogRecord *record)
 	 * here during crash recovery.
 	 */
 	if (HotStandbyActiveInReplay())
-	{
-		BlockNumber blkno;
-
-		for (blkno = xlrec->lastBlockVacuumed + 1; blkno < xlrec->block; blkno++)
-		{
-			/*
-			 * We use RBM_NORMAL_NO_LOG mode because it's not an error
-			 * condition to see all-zero pages.  The original btvacuumpage
-			 * scan would have skipped over all-zero pages, noting them in FSM
-			 * but not bothering to initialize them just yet; so we mustn't
-			 * throw an error here.  (We could skip acquiring the cleanup lock
-			 * if PageIsNew, but it's probably not worth the cycles to test.)
-			 *
-			 * XXX we don't actually need to read the block, we just need to
-			 * confirm it is unpinned.  If we had a special call into the
-			 * buffer manager we could optimise this so that if the block is
-			 * not in shared_buffers we confirm it as unpinned.
-			 */
-			buffer = XLogReadBufferExtended(xlrec->node, MAIN_FORKNUM, blkno,
-											RBM_NORMAL_NO_LOG);
-			if (BufferIsValid(buffer))
-			{
-				LockBufferForCleanup(buffer);
-				UnlockReleaseBuffer(buffer);
-			}
-		}
-	}
+		XLogLockBlockRangeForCleanup(xlrec->node, MAIN_FORKNUM,
+									 xlrec->lastBlockVacuumed + 1,
+									 xlrec->block);

 	/*
 	 * If we have a full-page image, restore it (using a cleanup lock) and
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index b7829ff..d84f83a 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -287,10 +287,6 @@ XLogReadBuffer(RelFileNode rnode, BlockNumber blkno, bool init)
  *
  * In RBM_ZERO and RBM_ZERO_ON_ERROR modes, if the page doesn't exist, the
  * relation is extended with all-zeroes pages up to the given block number.
- *
- * In RBM_NORMAL_NO_LOG mode, we return InvalidBuffer if the page doesn't
- * exist, and we don't check for all-zeroes.  Thus, no log entry is made
- * to imply that the page should be dropped or truncated later.
  */
 Buffer
 XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
@@ -331,8 +327,6 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 		log_invalid_page(rnode, forknum, blkno, false);
 		return InvalidBuffer;
 	}
-	if (mode == RBM_NORMAL_NO_LOG)
-		return InvalidBuffer;
 	/* OK to extend the file */
 	/* we do this in recovery only - no rel-extension lock needed */
 	Assert(InRecovery);
@@ -375,6 +369,62 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum,
 	return buffer;
 }

+/*
+ * XLogBlockRangeForCleanup is used in Hot Standby mode to emulate the
+ * locking effects imposed by VACUUM for nbtrees.
+ */
+void
+XLogLockBlockRangeForCleanup(RelFileNode rnode, ForkNumber forkNum,
+							 BlockNumber startBlkno,
+							 BlockNumber uptoBlkno)
+{
+	BlockNumber blkno;
+	BlockNumber lastblock;
+	BlockNumber endingBlkno;
+	Buffer		buffer;
+	SMgrRelation smgr;
+
+	Assert(startBlkno != P_NEW);
+	Assert(uptoBlkno != P_NEW);
+
+	/* Open the relation at smgr level */
+	smgr = smgropen(rnode, InvalidBackendId);
+
+	/*
+	 * Create the target file if it doesn't already exist.  This lets us cope
+	 * if the replay sequence contains writes to a relation that is later
+	 * deleted.  (The original coding of this routine would instead suppress
+	 * the writes, but that seems like it risks losing valuable data if the
+	 * filesystem loses an inode during a crash.  Better to write the data
+	 * until we are actually told to delete the file.)
+	 */
+	smgrcreate(smgr, forkNum, true);
+
+	lastblock = smgrnblocks(smgr, forkNum);
+
+	endingBlkno = uptoBlkno;
+	if (lastblock < endingBlkno)
+		endingBlkno = lastblock;
+
+	for (blkno = startBlkno; blkno < endingBlkno; blkno++)
+	{
+		/*
+		 * All we need to do here is prove that we can lock each block
+		 * with a cleanup lock.  It's not an error to see all-zero pages
+		 * here because the original btvacuumpage would not have thrown
+		 * an error either.
+		 *
+		 * We don't actually need to read the block, we just need to
+		 * confirm it is unpinned.
+		 */
+		buffer = GetBufferWithoutRelcache(rnode, forkNum, blkno);
+		if (BufferIsValid(buffer))
+		{
+			LockBufferForCleanup(buffer);
+			UnlockReleaseBuffer(buffer);
+		}
+	}
+}

 /*
  * Struct actually returned by XLogFakeRelcacheEntry, though the declared
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 07ea665..ce5ff70 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -222,8 +222,6 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
 * current physical EOF; that is likely to cause problems in md.c when
[HACKERS] [RFC: bug fix?] Connection attempt blocks forever when the synchronous standby is not running
Hello, My customer reported a strange connection hang problem. He and I couldn't reproduce it. I haven't been able to understand the cause, but I can think of one hypothesis. Could you give me your opinions on whether my hypothesis is correct, and a direction on how to fix the problem? I'm willing to submit a patch if necessary. [Problem] The customer is using synchronous streaming replication with PostgreSQL 9.2.8. The cluster consists of two nodes. He performed an archive recovery test like this: 1. Take a base backup. At that time, some notable settings in postgresql.conf are: synchronous_standby_names = 'node2' autovacuum = on # synchronous_commit is commented out, so it's on by default 2. Some update operations (I don't know which). 3. Shut down the primary and promote the standby. 4. Shut down the new primary. 5. Perform archive recovery. That is, restore the base backup, create recovery.conf, and do pg_ctl start. 6. Immediately after the archive recovery is complete, connect to the database server and perform some queries to check user data. Steps 5 and 6 are done by a recovery script. However, the connection attempt in step 6 got stuck for 12 hours, and the test was canceled. The stack trace was: #0 0x003f4badf258 in poll () from /lib64/libc.so.6 #1 0x00619b94 in WaitLatchOrSocket () #2 0x00640c4c in SyncRepWaitForLSN () #3 0x00491c18 in RecordTransactionCommit () #4 0x00491d98 in CommitTransaction () #5 0x00493135 in CommitTransactionCommand () #6 0x0074938a in InitPostgres () #7 0x0066ddd7 in PostgresMain () #8 0x00627d81 in PostmasterMain () #9 0x005c4803 in main () The connection attempt is waiting for a reply from the standby. This is strange, because we didn't anticipate that the connection establishment (and subsequent SELECT queries) would update something and write some WAL.
The doc says: http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION When requesting synchronous replication, each commit of a write transaction will wait until confirmation is received that the commit has been written to the transaction log on disk of both the primary and standby server. ... Read only transactions and transaction rollbacks need not wait for replies from standby servers. Subtransaction commits do not wait for responses from standby servers, only top-level commits. [Hypothesis] Why does the connection processing emit WAL? Probably, it did page-at-a-time vacuum during access to pg_database and pg_authid for client authentication. src/backend/access/heap/README.HOT describes: Effectively, space reclamation happens during tuple retrieval when the page is nearly full (10% free) and a buffer cleanup lock can be acquired. This means that UPDATE, DELETE, and SELECT can trigger space reclamation, but often not during INSERT ... VALUES because it does not retrieve a row. But the customer could not reproduce the problem when he performed the same archive recovery from the same base backup again. Why? I guess the autovacuum daemon vacuumed the system catalogs before he attempted to connect to the database. Is this correct? [How to fix] Of course, adding -o '-c synchronous_commit=local' or -o '-c synchronous_standby_names=' to pg_ctl start in the recovery script would prevent the problem. But isn't there anything to fix in PostgreSQL? I think the doc needs improvement so that users won't misunderstand that only write transactions would block at commit. Do you think something else should be done? I guess pg_basebackup, pg_isready, and PQping() called in pg_ctl -w start/restart would block likewise, and I'm afraid users don't anticipate it. pg_upgrade appears to set synchronous_commit to local when starting the database server. 
Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue while calling new PostgreSQL command from a Java Application
Ashoke s.ash...@gmail.com writes: Thank you Ashutosh. That was the issue. But, could you please explain why it worked from the command line? Simple vs extended query protocol, probably --- the former avoids copying the constructed parsetree, but I think the latter doesn't. Or maybe the JDBC driver tried to prepare the query; a prepared statement is most certainly going to copy the parsetree. In general, if you add a field to any node type, you'd better go through backend/nodes/ and teach all the relevant functions about it. What I tend to do is grep for one of the existing fields in the struct and see which functions that reference it need additions. regards, tom lane
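Tom's point about backend/nodes/ can be illustrated outside PostgreSQL. Below is a simplified, self-contained sketch (not PostgreSQL's actual Node machinery or its COPY_SCALAR_FIELD macros; the struct and field names are invented for illustration) of why every field of a node struct must be handled explicitly in its copy function:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy parse-node struct standing in for a real PostgreSQL Node;
 * "newField" plays the role of a field just added to the node type. */
typedef struct FuncName
{
    int   funcid;
    int   newField;             /* the newly added field */
} FuncName;

/* Mimics the _copyXxx functions in backend/nodes/copyfuncs.c: each
 * field is copied explicitly, so a field the copy function doesn't
 * know about is silently dropped from the copied tree. */
static FuncName *
copy_funcname(const FuncName *from)
{
    FuncName *newnode = calloc(1, sizeof(FuncName));

    newnode->funcid = from->funcid;     /* like COPY_SCALAR_FIELD(funcid) */
    newnode->newField = from->newField; /* omit this line and every copied
                                         * tree loses the new field, which
                                         * is exactly the JDBC symptom */
    return newnode;
}
```

When a query path that copies the parsetree (extended protocol, prepared statements) runs, only fields named in the copy function survive, which is why the bug appeared over JDBC but not from psql.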
Re: [HACKERS] Cluster name in ps output
Vik Fearing vik.fear...@dalibo.com writes: You mean that if synchronous_standby_names is empty it automatically should be set to cluster_name? No, I mean that synchronous_standby_names should look at cluster_name first, and if it's not set then unfortunately look at application_name for backward compatibility. I think you're failing to explain yourself very well. What you really mean is that we should use cluster_name not application_name to determine the name that a standby server reports to the master. There's something to that, perhaps, but I think that the mechanism we use for doing the reporting involves the application_name parameter passed through libpq. It might be a bit painful to change that, and I'm not entirely sure it'd be less confusing. regards, tom lane
Re: [HACKERS] No toast table for pg_shseclabel but for pg_seclabel
Kohei KaiGai kai...@kaigai.gr.jp writes: There is no other reason than what Alvaro mentioned upthread. We intended to store the security label of SELinux (less than 100 bytes at most), so I didn't think it would lead to any problem. On the other hand, pg_seclabel was merged in another development cycle. We didn't have a deep discussion about the necessity of a toast table for pg_seclabel. I added its toast table mechanically. So maybe we should get rid of the toast table for pg_seclabel. One less catalog table for a feature that hardly anyone is using seems like a fine idea to me ... regards, tom lane
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
At 2014-07-04 19:27:10 +0530, rahilasye...@gmail.com wrote: Please find attached patches with no whitespace error and improved formatting. Thanks. There are still numerous formatting changes required, e.g. spaces around = and correct formatting of comments. And git diff --check still has a few whitespace problems. I won't point these out one by one, but maybe you should run pgindent. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3f92482..39635de 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -60,6 +60,9 @@ #include "storage/spin.h" #include "utils/builtins.h" #include "utils/guc.h" +#include "utils/pg_lzcompress.h" +#include "utils/pg_snappy.h" +#include "utils/pg_lz4.h" #include "utils/ps_status.h" #include "utils/relmapper.h" #include "utils/snapmgr.h" This hunk still fails to apply to master (due to the subsequent inclusion of memutils.h), but I just added it in by hand. +int compress_backup_block = false; Should be initialised to BACKUP_BLOCK_COMPRESSION_OFF as noted earlier. + /* Allocates memory for compressed backup blocks according to the compression + * algorithm used.Once per session at the time of insertion of first XLOG + * record. + * This memory stays till the end of session. OOM is handled by making the + * code proceed without FPW compression*/ I suggest something like this: /* * Allocates pages to store compressed backup blocks, with the page * size depending on the compression algorithm selected. These pages * persist throughout the life of the backend. If the allocation * fails, we disable backup block compression entirely. */ But though the code looks better locally than before, the larger problem is that this is still unsafe. As Pavan pointed out, XLogInsert is called from inside critical sections, so we can't allocate memory here. Could you look into his suggestions of other places to do the allocation, please?
+ static char *compressed_pages[XLR_MAX_BKP_BLOCKS]; + static bool compressed_pages_allocated = false; These declarations can't just be in the middle of the function, they'll have to move up to near the top of the closest enclosing scope (wherever you end up doing the allocation). + if (compress_backup_block != BACKUP_BLOCK_COMPRESSION_OFF && + compressed_pages_allocated!= true) No need for != true with a boolean. + if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_SNAPPY) + buffer_size += snappy_max_compressed_length(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_LZ4) + buffer_size += LZ4_compressBound(BLCKSZ); + else if (compress_backup_block == BACKUP_BLOCK_COMPRESSION_PGLZ) + buffer_size += PGLZ_MAX_OUTPUT(BLCKSZ); There's nothing wrong with this, but given that XLR_MAX_BKP_BLOCKS is 4, I would just allocate pages of size BLCKSZ. But maybe that's just me. + bkpb->block_compression=BACKUP_BLOCK_COMPRESSION_OFF; Wouldn't it be better to set bkpb->block_compression = compress_backup_block; once earlier instead of setting it that way once and setting it to BACKUP_BLOCK_COMPRESSION_OFF in two other places? + if(VARSIZE(buf) < orig_len-2) + /* successful compression */ + { + *len = VARSIZE(buf); + return (char *) buf; + } + else + return NULL; +} That comment after the if just has to go. It's redundant given the detailed explanation above anyway. Also, I'd strongly prefer checking for failure rather than success here, i.e. if (VARSIZE(buf) >= orig_len - 2) return NULL; *len = VARSIZE(buf); /* Doesn't this need + VARHDRSIZE? */ return (char *) buf; I don't quite remember what I suggested last time, but if it was what's in the patch now, I apologise. + /* Decompress if backup block is compressed*/ + else if (VARATT_IS_COMPRESSED((struct varlena *) blk) && + bkpb.block_compression!=BACKUP_BLOCK_COMPRESSION_OFF) If you're using VARATT_IS_COMPRESSED() to detect compression, don't you need SET_VARSIZE_COMPRESSED() in CompressBackupBlock?
pglz_compress() does it for you, but the other two algorithms don't. But now that you've added bkpb.block_compression, you should be able to avoid VARATT_IS_COMPRESSED() altogether, unless I'm missing something. What do you think? +/* + */ +static const struct config_enum_entry backup_block_compression_options[] = { + {"off", BACKUP_BLOCK_COMPRESSION_OFF, false}, + {"false", BACKUP_BLOCK_COMPRESSION_OFF, true}, + {"no", BACKUP_BLOCK_COMPRESSION_OFF, true}, + {"0", BACKUP_BLOCK_COMPRESSION_OFF, true}, + {"pglz", BACKUP_BLOCK_COMPRESSION_PGLZ, true}, + {"snappy", BACKUP_BLOCK_COMPRESSION_SNAPPY, true}, + {"lz4",
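The "check for failure first" shape suggested in the review above can be sketched in isolation. The stub compressor below is hypothetical (it is not pglz, snappy, or LZ4, and the real CompressBackupBlock signature differs); it only illustrates bailing out early when compression does not save at least two bytes, instead of nesting the success path:

```c
#include <stddef.h>

/* Stub standing in for a real compressor; it collapses runs of
 * identical bytes into (count, byte) pairs and returns the output
 * length. Purely illustrative, not pglz/snappy/LZ4. */
static size_t
stub_compress(const unsigned char *src, size_t srclen, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < srclen; )
    {
        size_t run = 1;

        while (i + run < srclen && src[i + run] == src[i] && run < 255)
            run++;
        dst[out++] = (unsigned char) run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/* The reviewer's preferred control flow: detect the failure case
 * (not enough savings) and return NULL early; fall through on success. */
static unsigned char *
compress_block(const unsigned char *page, size_t orig_len,
               unsigned char *buf, size_t *len)
{
    size_t compressed = stub_compress(page, orig_len, buf);

    if (compressed >= orig_len - 2)
        return NULL;            /* compression not worthwhile */

    *len = compressed;          /* successful compression */
    return buf;
}
```

Failure-first keeps the common exit unindented and makes the "give up and store the page uncompressed" path impossible to miss.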
Re: [HACKERS] RLS Design
Kaigai, On Thursday, July 3, 2014, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Sorry for my late response; I'm catching up on the discussion now. * Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jul 1, 2014 at 3:20 PM, Dean Rasheed dean.a.rash...@gmail.com wrote: If RLS quals are instead regarded as constraints on access, and multiple policies apply, then it seems that the quals should now be combined with AND rather than OR, right? I do feel that RLS quals are constraints on access, but I don't see how it follows that multiple quals should be AND'd together because of that. I view the RLS policies on each table as being independent and standing alone regarding what can be seen. If you have access to a table today through policy A, and then later policy B is added, using AND would mean that the set of rows returned is less than if only policy A existed. That doesn't seem correct to me. It seems to me the constraints (RLS-policy) work in the reverse direction. When we have no RLS-policy, 100% of rows are visible, aren't they? No, as outlined later, the table would appear empty if no policies exist and RLS is enabled for the table. Adding a constraint usually reduces the number of visible rows, or at best keeps it the same. A constraint should never work in the direction of increasing the number of visible rows. Can you clarify where this is coming from..? It sounds like you're referring to an existing implementation and, if so, it'd be good to get more information on how that works exactly. If multiple RLS-policies are connected with the OR operator, the first policy works to reduce the number of visible rows, but the second policy works in the reverse direction. This isn't accurate, as mentioned. Each policy stands alone to define what is visible through it and if no policy exists then no rows are visible. If we have OR'd RLS-policies, how are they merged with user-given qualifiers?
The RLS quals are all applied together with OR's and the result is AND'd with any user quals provided. This is only when multiple policies are being applied for a given query and seems pretty straightforward to me. For example, if RLS-policy of t1 is (t1.credential get_user_credential) and user's query is: SELECT * FROM t1 WHERE t1.x = t1.x; Do you think RLS-policy shall be merged with OR'd form? Only the RLS policies are OR'd together, not user provided quals. The above would result in: Where t1.x = t1.x and (t1.credential get_user_credential) If another policy also applies for this query, such as t1.cred2 get_user_credential then we would have: Where t1.x = t1.x and (t1.credential get_user_credential OR t1.cred2 get_user_credential) This is similar to how roles work: your overall access includes all access granted to any roles you are a member of. You don't need SELECT rights granted to every role you are a member of to select from the table. Additionally, if an admin wants to AND the quals together then they can simply create a policy which does that rather than have 2 policies. Yeah, maybe. I intuitively feel that OR would be more useful, so it would be nice to find a design where that makes sense. But it depends a lot, in my view, on what syntax we end up with. For example, suppose we add just one command: ALTER TABLE table_name FILTER [ role_name | PUBLIC ] USING qual; If the given role inherits from multiple roles that have different filters, I think the user will naturally expect all of the filters to be applied. Agreed. But you could do it other ways. For example: ALTER TABLE table_name [ NO ] ROW LEVEL SECURITY; ALTER TABLE table_name GRANT ROW ACCESS TO role_name USING qual; If a table is set to NO ROW LEVEL SECURITY then it behaves just like it does now: anyone who accesses it sees all the rows, restricted to those columns for which they have permission. If the table is set to ROW LEVEL SECURITY then the default is to show no rows.
The second command then allows access to a subset of the rows for a given role name. In this case, it is probably logical for access to be combined via OR. I can see value in having a table-level option to indicate if RLS is applied for that table or not, but I had been thinking we'd just automatically manage that. That is to say that once you define an RLS policy for a table, we go look and see what policy should be applied in each case. With the user able to control that, what happens if they enable row security on the table and there are no policies? All access would show the table as empty? What if policies exist and they decide to 'turn off' RLS for the table, suddenly everyone can see all the rows? My answers to the above (which are making me like the idea more, actually...) would be: Yes, if they turn on RLS for the
Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()
At 2014-07-02 15:51:08 -0700, p...@heroku.com wrote: Attached revision factors in everyone's concerns here, I think. Is anyone planning to review Peter's revised patch? These new measures make the coding somewhat more complex than that of the initial version, although overall the parser code added by this patch is almost entirely confined to code paths concerned only with producing diagnostic messages to help users. Yes, the new patch looks quite a bit more involved than earlier, but if that's what it takes to provide a useful HINT, I guess it's not too bad. -- Abhijit
Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes
At 2014-07-04 21:02:33 +0530, a...@2ndquadrant.com wrote: +/* + */ +static const struct config_enum_entry backup_block_compression_options[] = { Oh, I forgot to mention that the configuration setting changes are also pending. I think we had a working consensus to use full_page_compression as the name of the GUC. As I understand it, that'll accept an algorithm name as an argument while we're still experimenting, but eventually once we select an algorithm, it'll become just a boolean (and then we don't need to put algorithm information into BkpBlock any more either). -- Abhijit
Re: [HACKERS] pg_xlogdump --stats
Andres Freund wrote: On 2014-07-04 18:31:34 +0800, gotoschool6g wrote: Carp the code: const char * clog_identify(uint8 info) { switch (info) { case CLOG_ZEROPAGE: return "ZEROPAGE"; case CLOG_TRUNCATE: return "TRUNCATE"; break; } return NULL; } I agree that performance is secondary here. The part that I don't quite like in all these blocks is the fact that they initialize the return value to NULL beforehand, and there's no 'default' label. Now, I see that pg_xlogdump is checking for NULL return, but I'm not sure this is the best possible API. Shouldn't we perhaps return a different string that indicates there is no known description? Also, are we certain that we're never going to need anything other than the info to identify the record? In practice we seem to follow that rule currently, but it's not totally out of the question that someday the rm_redo function uses more than just info to identify the record. I wonder if it'd be better to pass the whole record instead and have the rm_identify function pull out the info if it's all it needs, but leave the possibility open that it could read, say, some header bytes in the record data. Also I didn't check how you handle the init bit in RM_HEAP and RM_HEAP2. Are we just saying that insert (init) is a different record type from insert? Maybe that's okay, but I'm not 100% sure. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
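The 'default' label and the "different string for no known description" idea raised above could look like the following standalone sketch (the CLOG_* values here are stand-ins, not the real macros): the function never returns NULL, so callers such as pg_xlogdump would need no NULL check.

```c
#include <stdint.h>

/* Record-info constants standing in for the real CLOG_* macros. */
#define CLOG_ZEROPAGE 0x00
#define CLOG_TRUNCATE 0x10

/* One way to address the review comment: no pre-initialized NULL
 * return value, a default label so every path yields a usable string,
 * and unknown records identified as "UNKNOWN" rather than NULL. */
static const char *
clog_identify(uint8_t info)
{
    switch (info)
    {
        case CLOG_ZEROPAGE:
            return "ZEROPAGE";
        case CLOG_TRUNCATE:
            return "TRUNCATE";
        default:
            return "UNKNOWN";
    }
}
```

The trade-off discussed downthread still applies: a plain "UNKNOWN" string loses the raw xl_info value, so a caller that wants the rm_name/0xNN fallback has to format it itself.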
Re: [HACKERS] pg_xlogdump --stats
At 2014-07-04 13:43:46 -0400, alvhe...@2ndquadrant.com wrote: Now, I see that pg_xlogdump is checking for NULL return, but I'm not sure this is the best possible API. Note that the patched pg_xlogdump will display rm_name/0xNN for any records that rm_identify returns a NULL for. Earlier, when there was the possibility that new record types could be added without updating pg_xlogdump's identification code, this made good sense. Now one could argue that pg_xlogdump should fully depend on every record in WAL being properly identified. If that's the case, rm_identify could return UNKNOWN, and pg_xlogdump could use the return value without checking. I considered that, but the only thing that stopped me was the thought that if a weird record DOES show up in WAL somehow, it'd be pretty handy to (a) know the value of xl_info, and (b) see if there's more than one kind (per-rmid). But I don't feel strongly enough about it that I'd object to displaying just UNKNOWN. I wonder if it'd be better to pass the whole record instead and have the rm_identify function pull out the info if it's all it needs, but leave the possibility open that it could read, say, some header bytes in the record data. I don't *have* an XLogRecord at the point where I'm calling rm_identify. I could call rm_identify earlier, and either store the name in the stats structure, or hash the name and use the hash value to store that record type's statistics. We don't even have any other potential callers for rm_identify. Adding it and making it significantly more difficult to use for the only code that actually needs it… no, I pretty much hate that idea. Personally, I think it's fine to keep rm_identify the way it is. It's hardly very difficult code, and if the assumption that records can be identified by xl_info ever becomes invalid, this isn't the only code that will need to be changed either. (Otherwise, I'd actually prefer to keep all the identification code in pg_xlogdump, i.e. 
the pre-rm_identify version of my patch. At least that would make it possible to maintain a version that could be built against 9.3, the way Marti did.) Also I didn't check how you handle the init bit in RM_HEAP and RM_HEAP2. Are we just saying that insert (init) is a different record type from insert? Yes, that's what the patch does currently. X vs. X+INIT. -- Abhijit
Re: [HACKERS] buildfarm and rolling release distros
Re: Craig Ringer 2014-07-02 53b39638.9010...@2ndquadrant.com +1. The buildfarm has one such member already, anchovy, and I recall it having given at least one helpful forewarning. It shows as Arch Linux testing [updated daily], which is sufficient annotation for me. Its failure rate has been low; member-caused failures due to ENOSPC and other miscellany are a good deal more common. Yep - I see early notice of new gcc special behaviour, etc as quite valuable. If we're dubious about a result, we wait a few days and see if it goes away on its own. I was looking at the zoo some time ago and was surprised there's only a single animal running Debian unstable/testing - and that's on ia64 which Debian killed some months ago because no one seemed to care enough about maintaining the toolchain. Is this because unstable/testing are considered rolling releases? (On a second look, dugong seems to be running etch/4.0, which is... old.) My plan was to set up two animals, amd64 and i386, using the compiler flags we are using for the Debian packages, though I was still waiting for a VM in a soon-to-be-there cluster at credativ. That would have caught the ASLR problems we discussed some weeks ago quite early, I guess. Does it make sense to look into setting up that target? Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] RLS Design
Kaigai, On Thursday, July 3, 2014, Kouhei Kaigai kai...@ak.jp.nec.com wrote: Sorry for my late response; I'm catching up on the discussion now. * Robert Haas (robertmh...@gmail.com) wrote: On Tue, Jul 1, 2014 at 3:20 PM, Dean Rasheed dean.a.rash...@gmail.com wrote: If RLS quals are instead regarded as constraints on access, and multiple policies apply, then it seems that the quals should now be combined with AND rather than OR, right? I do feel that RLS quals are constraints on access, but I don't see how it follows that multiple quals should be AND'd together because of that. I view the RLS policies on each table as being independent and standing alone regarding what can be seen. If you have access to a table today through policy A, and then later policy B is added, using AND would mean that the set of rows returned is less than if only policy A existed. That doesn't seem correct to me. It seems to me the constraints (RLS-policy) work in the reverse direction. When we have no RLS-policy, 100% of rows are visible, aren't they? No, as outlined later, the table would appear empty if no policies exist and RLS is enabled for the table. Adding a constraint usually reduces the number of visible rows, or at best keeps it the same. A constraint should never work in the direction of increasing the number of visible rows. Can you clarify where this is coming from..? It sounds like you're referring to an existing implementation and, if so, it'd be good to get more information on how that works exactly. Oracle VPD - Multiple Policies for Each Table, View, or Synonym http://docs.oracle.com/cd/B19306_01/network.102/b14266/apdvpoli.htm#i1008351 It says - Note that all policies applied to a table are enforced with AND syntax. Not only Oracle VPD; it also fits the attitude of defense in depth. Please assume a system that installs a network firewall, unix permissions, and selinux.
If somebody wants to reference an information asset within a file, he has to connect to the server from a network address allowed by the firewall configuration AND both DAC and MAC have to allow his access. Usually, we have to pass all of the access controls to reference the target information, not just one of the installed access control mechanisms. For example, if RLS-policy of t1 is (t1.credential get_user_credential) and user's query is: SELECT * FROM t1 WHERE t1.x = t1.x; Do you think RLS-policy shall be merged with OR'd form? Only the RLS policies are OR'd together, not user provided quals. The above would result in: Where t1.x = t1.x and (t1.credential get_user_credential) If another policy also applies for this query, such as t1.cred2 get_user_credential then we would have: Where t1.x = t1.x and (t1.credential get_user_credential OR t1.cred2 get_user_credential) This is similar to how roles work: your overall access includes all access granted to any roles you are a member of. You don't need SELECT rights granted to every role you are a member of to select from the table. Additionally, if an admin wants to AND the quals together then they can simply create a policy which does that rather than have 2 policies. It seems to me a pain for database administration if we have to take care that RLS-policies do not conflict with each other. I expect 90% of RLS-policies will be configured for the PUBLIC user, to apply the same access rules to everybody. In this case, the DBA has to ensure the target table has no policy, or that the existing policy does not conflict with the new policy to be set. I don't think it is a good idea to force these checks on the DBA. Please assume there are two individual applications that use RLS on table-X. Even if application-1 wants only public rows to be visible, it may expose credential or secret rows through the interaction of an orthogonal policy configured by application-2 (which may configure its policy according to the source ip-address).
It seems to me application-2 has partially invalidated the RLS-policy configured by application-1. You are suggesting instead that if application 2 sets up policies on the table and then application 1 adds another policy that it should reduce what application 2's users can see? That doesn't make any sense to me. I'd actually expect these applications to at least use different roles anyway, which means they could each have a single role-specific policy which only returns what that application is allowed to see. I don't think this assumption is reasonable. Please consider two applications: app-X, a database security product that applies access control based on the remote ip-address of the client for any table access by any database role; app-Y that is a
[HACKERS] DISTINCT with btree skip scan
Hello As an exercise I hacked up the simplest code I could think of that would demonstrate a faster DISTINCT based on skipping ahead to the next distinct value in an index-only scan. Please see the attached (extremely buggy) patch, and the example session below. (It's against my natural instinct to send such half-baked early hacking phase code to the list, but thought it would make sense to demo the concept and then seek advice, warnings, cease and desist notices etc before pressing on down that route!) I would be most grateful for any feedback you might have. Best regards, Thomas Munro postgres=# create table foo (a int, b int, primary key (a, b)); CREATE TABLE Time: 9.002 ms postgres=# insert into foo select generate_series(1, 500) % 10, generate_series(1, 500); INSERT 0 500 Time: 35183.444 ms postgres=# analyze foo; ANALYZE Time: 493.168 ms postgres=# set enable_hashagg = true; SET Time: 0.274 ms postgres=# explain select distinct a from foo; ┌───┐ │QUERY PLAN │ ├───┤ │ HashAggregate (cost=84624.00..84624.10 rows=10 width=4) │ │ Group Key: a│ │ - Seq Scan on foo (cost=0.00..72124.00 rows=500 width=4) │ │ Planning time: 0.065 ms │ └───┘ (4 rows) Time: 0.500 ms postgres=# select distinct a from foo; ┌───┐ │ a │ ├───┤ │ 6 │ │ 8 │ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │ 5 │ │ 9 │ │ 7 │ │ 0 │ └───┘ (10 rows) Time: 2017.019 ms postgres=# set enable_hashagg = false; SET Time: 0.302 ms postgres=# explain select distinct a from foo; ┌─┐ │ QUERY PLAN │ ├─┤ │ Index Only Scan for distinct prefix 1 using foo_pkey on foo (cost=0.43..263354.20 rows=10 width=4) │ │ Planning time: 0.063 ms │ └─┘ (2 rows) Time: 0.443 ms postgres=# select distinct a from foo; ┌───┐ │ a │ ├───┤ │ 0 │ │ 1 │ │ 2 │ │ 3 │ │ 4 │ │ 5 │ │ 6 │ │ 7 │ │ 8 │ │ 9 │ └───┘ (10 rows) Time: 0.565 ms diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c index 53cf96f..5f10d7f 100644 --- a/src/backend/access/index/indexam.c +++ b/src/backend/access/index/indexam.c @@ -29,6 +29,7 @@ * 
index_can_return - does index support index-only scans? * index_getprocid - get a support procedure OID * index_getprocinfo - get a support procedure's lookup info + * index_skip - advance to the next key value in a scan * * NOTES * This file contains the index_ routines which used @@ -742,6 +743,37 @@ index_can_return(Relation indexRelation) PointerGetDatum(indexRelation))); } +bool +index_can_skip(Relation indexRelation) +{ + FmgrInfo *procedure; + + RELATION_CHECKS; + + /* amcanskip is optional; assume FALSE if not provided by AM */ + if (!RegProcedureIsValid(indexRelation->rd_am->amcanskip)) + return false; + + GET_REL_PROCEDURE(amcanskip); + + return DatumGetBool(FunctionCall1(procedure, + PointerGetDatum(indexRelation))); +} + +bool +index_skip(IndexScanDesc scan, ScanDirection direction, int prefix) +{ + FmgrInfo *procedure; + SCAN_CHECKS; + GET_SCAN_PROCEDURE(amskip); + Datum result = + DatumGetPointer(FunctionCall3(procedure, + PointerGetDatum(scan), + Int32GetDatum(direction), + Int32GetDatum(prefix))); + return DatumGetBool(result); +} + /* * index_getprocid * diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 36dc6c2..2f987e4 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -434,6 +434,8 @@ btbeginscan(PG_FUNCTION_ARGS) so->currPos.nextTupleOffset = 0; so->markPos.nextTupleOffset = 0; + so->skipScanKey = NULL; + scan->xs_itupdesc = RelationGetDescr(rel); scan->opaque = so; @@ -509,6 +511,19 @@ btrescan(PG_FUNCTION_ARGS) } /* + * btskip() -- skip to the beginning of the next key prefix + */ +Datum +btskip(PG_FUNCTION_ARGS) +{ + IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0); + ScanDirection dir = (ScanDirection) PG_GETARG_INT32(1); + int prefix = PG_GETARG_INT32(2); + bool result = _bt_skip(scan, dir, prefix); + PG_RETURN_BOOL(result); +} + +/* * btendscan() -- close down a scan */ Datum @@ -1118,3 +1133,12 @@ btcanreturn(PG_FUNCTION_ARGS) { PG_RETURN_BOOL(true); } +
+/* + * btcanskip() -- Check whether btree indexes support skipping. + */ +Datum
Re: [HACKERS] DISTINCT with btree skip scan
On 07/05/2014 02:17 AM, Thomas Munro wrote:
> As an exercise I hacked up the simplest code I could think of that
> would demonstrate a faster DISTINCT based on skipping ahead to the
> next distinct value in an index-only scan. Please see the attached
> (extremely buggy) patch, and the example session below. (It's against
> my natural instinct to send such half-baked early hacking phase code
> to the list, but thought it would make sense to demo the concept and
> then seek advice, warnings, cease and desist notices etc before
> pressing on down that route!) I would be most grateful for any
> feedback you might have.

This is called a Loose Index Scan in our wiki [1], which I believe is based on the terminology used for the same feature in MySQL, although I haven't and shan't check.

This is a feature I would really like to have. Your benchmarks look promising, but unfortunately it blows up for me so I can't really test it. I have not yet read the patch nor debugged to see why, but please do keep up work on this. People more expert than I can tell you whether you're implementing it the right way.

[1] http://wiki.postgresql.org/wiki/Loose_indexscan

-- 
Vik

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Aggregate function API versus grouping sets
On 07/02/2014 10:15 PM, Andrew Gierth wrote:
> (Had one request so far for a mode() variant that returns the unique
> modal value if one exists, otherwise null; so the current set of
> ordered-set aggs by no means exhausts the possible applications.)

I resemble that remark. :)

But seriously, am I the only one who doesn't want some random value when there is no dominant one? And the fact that I can't create my own without C code is a real bummer.

-- 
Vik
Re: [HACKERS] Pg_upgrade and toast tables bug discovered
On Fri, Jul 4, 2014 at 12:01:37AM -0400, Bruce Momjian wrote:
> > The most robust, but not trivial, approach seems to be to prevent toast
> > table creation if there wasn't a set_next_toast_pg_class_oid(). Then,
> > after all relations are created, iterate over all pg_class entries that
> > possibly need toast tables and recheck if they now might need one.
>
> Wow, that is going to be kind of odd in that there really isn't a good
> way to create toast tables except perhaps add a dummy TEXT column and
> remove it. There also isn't an easy way to not create a toast table,
> but also find out that one was needed. I suppose we would have to
> insert some dummy value in the toast pg_class column and come back
> later to clean it up.
>
> I am wondering what the probability is of having a table that didn't
> need a toast table in the old cluster, needed one in the new cluster,
> and there being an oid collision. I think the big question is whether
> we want to go down that route.

Here is an updated patch. It was a little tricky because if the mismatched non-toast table is after the last old relation, you need to test differently and emit a different error message, as there is no old relation to complain about.

As for the reusing of oids, we don't set the oid counter until after the restore, so any new unmatched toast table would be given a very low oid. Since we restore in oid order, for an oid to be assigned that was used in the old cluster, it would have to be a very early relation, so I think that reduces the odds considerably. I am not inclined to do anything more to avoid this until I see an actual error report --- trying to address it might be invasive and introduce new errors.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + Everyone has their own god.
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 6205c74..72f805c 100644
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
*************** gen_db_file_maps(DbInfo *old_db, DbInfo *new_db,
*** 38,58 ****
  			   int *nmaps, const char *old_pgdata, const char *new_pgdata)
  {
  	FileNameMap *maps;
! 	int			relnum;
  	int			num_maps = 0;
  
  	maps = (FileNameMap *) pg_malloc(sizeof(FileNameMap) *
  									 old_db->rel_arr.nrels);
  
! 	for (relnum = 0; relnum < Min(old_db->rel_arr.nrels, new_db->rel_arr.nrels);
! 		 relnum++)
  	{
! 		RelInfo    *old_rel = &old_db->rel_arr.rels[relnum];
! 		RelInfo    *new_rel = &new_db->rel_arr.rels[relnum];
  
  		if (old_rel->reloid != new_rel->reloid)
! 			pg_fatal("Mismatch of relation OID in database \"%s\": old OID %d, new OID %d\n",
! 					 old_db->db_name, old_rel->reloid, new_rel->reloid);
  
  		/*
  		 * TOAST table names initially match the heap pg_class oid. In
--- 38,96 ----
  			   int *nmaps, const char *old_pgdata, const char *new_pgdata)
  {
  	FileNameMap *maps;
! 	int			old_relnum, new_relnum;
  	int			num_maps = 0;
  
  	maps = (FileNameMap *) pg_malloc(sizeof(FileNameMap) *
  									 old_db->rel_arr.nrels);
  
! 	/*
! 	 * The old database shouldn't have more relations than the new one.
! 	 * We force the new cluster to have a TOAST table if the old table
! 	 * had one.
! 	 */
! 	if (old_db->rel_arr.nrels > new_db->rel_arr.nrels)
! 		pg_fatal("old and new databases \"%s\" have a mismatched number of relations\n",
! 				 old_db->db_name);
! 
! 	/* Drive the loop using new_relnum, which might be higher. */
! 	for (old_relnum = new_relnum = 0; new_relnum < new_db->rel_arr.nrels;
! 		 new_relnum++)
  	{
! 		RelInfo    *old_rel;
! 		RelInfo    *new_rel = &new_db->rel_arr.rels[new_relnum];
! 
! 		/*
! 		 * It is possible that the new cluster has a TOAST table for a table
! 		 * that didn't need one in the old cluster, e.g. 9.0 to 9.1 changed
! 		 * the NUMERIC length computation. Therefore, if we have a TOAST
! 		 * table in the new cluster that doesn't match, skip over it and
! 		 * continue processing. It is possible this TOAST table used an OID
! 		 * that was reserved in the old cluster, but we have no way of
! 		 * testing that, and we would have already gotten an error at the
! 		 * new cluster schema creation stage. Fortunately, since we only
! 		 * restore the OID counter after schema restore, and restore in OID
! 		 * order, a conflict would only happen if the new TOAST table had a
! 		 * very low OID.
! 		 */
! 		if (old_relnum == old_db->rel_arr.nrels)
! 		{
! 			if (strcmp(new_rel->nspname, "pg_toast") == 0)
! 				continue;
! 			else
! 				pg_fatal("Extra non-TOAST relation found in database \"%s\": new OID %d\n",
! 						 old_db->db_name, new_rel->reloid);
! 		}
! 
! 		old_rel = &old_db->rel_arr.rels[old_relnum];
  
  		if (old_rel->reloid != new_rel->reloid)
! 		{
! 			if (strcmp(new_rel->nspname, "pg_toast") == 0)
! 				continue;
! 			else
! 				pg_fatal("Mismatch of relation OID in database \"%s\": old OID %d, new OID %d\n",
! 						 old_db->db_name, old_rel->reloid, new_rel->reloid);
! 		}
  
  		/*
  		 * TOAST table names
Re: [HACKERS] postgresql.auto.conf and reload
On Fri, Jun 27, 2014 at 9:11 AM, Amit Kapila amit.kapil...@gmail.com wrote:
> On Thu, Jun 26, 2014 at 1:49 PM, Christoph Berg c...@df7cb.de wrote:
> > Re: Amit Kapila 2014-06-26 CAA4eK1+mUTjc=GXJK3bYtSwV2BmBni=phevbqlqkhduv9cw...@mail.gmail.com
> > > How about adding a note in Alter System so that users are aware of
> > > such behaviour and can ensure that they don't have duplicate entries?
> >
> > If the behavior isn't going to change, that issue needs to be
> > documented, sure.
>
> I will send a patch to address this unless someone comes up with a
> better way to address it, and if no one objects to adding a note in
> the Alter System documentation.

Please find the patch attached to address the above concern. I have updated the docs so that users can be aware of such behaviour.

I will add this patch to the next CF to avoid it getting lost; however, I believe it should be done for 9.4.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

alter_system_postmaster_params_v1.patch
Description: Binary data