Re: [HACKERS] GIN improvements part 1: additional information
Guys, before digging deep into the art of the comp/decomp world, I'd like to know whether you are familiar with the results of http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf and with newer research. Do we agree on what we really want? Basically, there are three main factors: size, compression speed, and decompression speed - we get to pick two :) Should we design some sort of plugin interface that supports independent on-disk storage, so that users can apply different techniques depending on their data? What I want to say is that we can certainly play with this very challenging task, but we have limited time before 9.4 and should keep moving in a practical direction.

Oleg

On Wed, Dec 18, 2013 at 6:50 PM, Heikki Linnakangas wrote:
> On 12/18/2013 01:45 PM, Alexander Korotkov wrote:
>> On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas wrote:
>>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>>> 2) Storage would be easily extendable to hold additional information
>>>> as well. Better compression shouldn't block more serious improvements.
>>>
>>> I'm not sure I agree with that. For all the cases where you don't care
>>> about additional information - which covers all existing users, for
>>> example - reducing disk size is pretty important. How are you planning
>>> to store the additional information, and how does using another
>>> encoding get in the way of that?
>>
>> I planned to store the additional-information datums between the
>> varbyte-encoded TIDs. I expected that would be hard to do with PFOR.
>> However, I don't see significant problems in your implementation of
>> Simple9 encoding. I'm going to dig deeper into your version of the patch.
>
> Ok, thanks.
>
> I had another idea about the page format this morning. Instead of having
> the item-indexes at the end of the page, it would be more flexible to
> store a bunch of self-contained posting list "segments" on the page.
> So I propose that we get rid of the item-indexes, and instead store a
> bunch of independent posting lists on the page:
>
> typedef struct
> {
>     ItemPointerData first;  /* first item in this segment (unpacked) */
>     uint16          nwords; /* number of words that follow */
>     uint64          words[1];   /* var length */
> } PostingListSegment;
>
> Each segment can be encoded and decoded independently. When searching
> for a particular item (like on insertion), you skip over segments where
> 'first' > the item you're searching for.
>
> This format offers a lot more flexibility compared to the separate item
> indexes. First, we don't need to have another fixed-size area on the
> page, which simplifies the page format. Second, we can more easily
> re-encode only one segment on the page, on insertion or vacuum. The
> latter is particularly important with the Simple-9 encoding, which
> operates one word at a time rather than one item at a time; removing or
> inserting an item in the middle can require a complete re-encoding of
> everything that follows. Third, when a page is being inserted into and
> contains only uncompressed items, you don't waste any space on unused
> item indexes.
>
> While we're at it, I think we should use the above struct in the inline
> posting lists stored directly in entry tuples. That wastes a few bytes
> compared to the current approach in the patch (more alignment, and
> 'nwords' is redundant with the number of items stored in the tuple
> header), but it simplifies the functions handling these lists.
>
> - Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [9.3 bug] disk space in pg_xlog increases during archive recovery
On Mon, Dec 16, 2013 at 9:22 PM, MauMau wrote:
> Hi, Fujii san,
>
>> On Wed, Aug 7, 2013 at 7:03 AM, Fujii Masao wrote:
>>> On second thought, probably we cannot remove the restored WAL files early
>>> because they might be required for fast promotion, which is a new feature
>>> in 9.3. In fast promotion, an end-of-recovery checkpoint is not executed.
>>> After the end of recovery, a normal online checkpoint starts. If the
>>> server crashes before such an online checkpoint completes, the server
>>> needs to replay again all the WAL files which it replayed before the
>>> promotion. Since this is crash recovery, all those WAL files need to
>>> exist in the pg_xlog directory. So if we remove the restored WAL files
>>> from pg_xlog early, such a crash recovery might fail.
>>>
>>> So even if cascade replication is disabled, if standby_mode = on, i.e.,
>>> fast promotion can be performed, we cannot remove the restored WAL files
>>> early.
>
> Following Fujii-san's advice, I've made the attached patch.

Thanks for the patch!

!   if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)

Even when standby_mode is not enabled, we can use cascade replication, and it needs the accumulated WAL files. So I think that AllowCascadeReplication() should be added to this condition.

!   snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
!   XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
!
!   if (restoredFromArchive)

Don't we need to check !StandbyModeRequested and !AllowCascadeReplication() here?

!   /*
!    * If the latest segment is not archival, but there's still a
!    * RECOVERYXLOG laying about, get rid of it.
!    */
!   unlink(recoveryPath);   /* ignore any error */

A similar line exists later in exitArchiveRecovery(), so ISTM that you can refactor that.

Regards,

-- 
Fujii Masao
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
> I found that the psql tab-completion for ALTER SYSTEM SET has not been
> implemented yet. Attached patch does that. Barring any objections, I
> will commit this patch.

Good catch!

Best regards,
-- 
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 19 December 2013 05:31 Bruce Momjian wrote:
> On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote:
>> Moving the make_absolute_path() function to port is done in a similar
>> way to Bruce Momjian's approach. psprintf() is used to build the error
>> string produced in the function. But psprintf() is not used for storing
>> the absolute path, because that causes problems when freeing the
>> allocated memory in SelectConfigFiles(): the same memory is allocated
>> from guc_malloc() in a different code branch.
>>
>> After adding make_absolute_path() with the psprintf() changes to the
>> path.c file, I got a linking problem when compiling ecpg. I am not able
>> to find the cause, so I added another file, abspath.c, in port, which
>> contains these two functions.
>
> What errors are you seeing?

If I move the make_absolute_path() function from abspath.c to path.c, I get the following linking errors while compiling "ecpg":

../../../../src/port/libpgport.a(path.o): In function `make_absolute_path':
/home/hari/postgres/src/port/path.c:795: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:809: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:818: undefined reference to `psprintf'
/home/hari/postgres/src/port/path.c:830: undefined reference to `psprintf'
collect2: ld returned 1 exit status
make[4]: *** [ecpg] Error 1
make[3]: *** [all-preproc-recurse] Error 2
make[2]: *** [all-ecpg-recurse] Error 2
make[1]: *** [all-interfaces-recurse] Error 2
make: *** [all-src-recurse] Error 2

Regards,
Hari babu.
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
On Thu, Dec 19, 2013 at 12:08 PM, Amit Kapila wrote:
> On Wed, Dec 18, 2013 at 8:25 PM, Tatsuo Ishii wrote:
>>> Is there any reason for the function returns int as it always returns
>>> 0 or 1. Maybe returns bool is better?
>>
>> I have committed your patches. Thanks.
>
> Thank you very much.

I found that the psql tab-completion for ALTER SYSTEM SET has not been implemented yet. Attached patch does that. Barring any objections, I will commit this patch.

Regards,

-- 
Fujii Masao

*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*************** static const SchemaQuery Query_for_list_of_matviews = {
*** 541,546 ****
--- 541,552 ----
  "SELECT pg_catalog.quote_ident(nspname) FROM pg_catalog.pg_namespace "\
  " WHERE substring(pg_catalog.quote_ident(nspname),1,%d)='%s'"
  
+ #define Query_for_list_of_alter_system_set_vars \
+ "SELECT name FROM "\
+ " (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings "\
+ "  WHERE context != 'internal') ss "\
+ " WHERE substring(name,1,%d)='%s'"
+ 
  #define Query_for_list_of_set_vars \
  "SELECT name FROM "\
  " (SELECT pg_catalog.lower(name) AS name FROM pg_catalog.pg_settings "\
*************** psql_completion(char *text, int start, int end)
*** 930,936 ****
  	{"AGGREGATE", "COLLATION", "CONVERSION", "DATABASE", "DEFAULT PRIVILEGES", "DOMAIN",
  		"EXTENSION", "FOREIGN DATA WRAPPER", "FOREIGN TABLE", "FUNCTION",
  		"GROUP", "INDEX", "LANGUAGE", "LARGE OBJECT", "MATERIALIZED VIEW", "OPERATOR",
! 		"ROLE", "RULE", "SCHEMA", "SERVER", "SEQUENCE", "TABLE", "TABLESPACE",
  		"TEXT SEARCH", "TRIGGER", "TYPE",
  		"USER", "USER MAPPING FOR", "VIEW", NULL};
--- 936,942 ----
  	{"AGGREGATE", "COLLATION", "CONVERSION", "DATABASE", "DEFAULT PRIVILEGES", "DOMAIN",
  		"EXTENSION", "FOREIGN DATA WRAPPER", "FOREIGN TABLE", "FUNCTION",
  		"GROUP", "INDEX", "LANGUAGE", "LARGE OBJECT", "MATERIALIZED VIEW", "OPERATOR",
! 		"ROLE", "RULE", "SCHEMA", "SERVER", "SEQUENCE", "SYSTEM SET", "TABLE", "TABLESPACE",
  		"TEXT SEARCH", "TRIGGER", "TYPE",
  		"USER", "USER MAPPING FOR", "VIEW", NULL};
*************** psql_completion(char *text, int start, int end)
*** 1263,1268 ****
--- 1269,1279 ----
  		COMPLETE_WITH_LIST(list_ALTER_SERVER);
  	}
+ 	/* ALTER SYSTEM SET */
+ 	else if (pg_strcasecmp(prev3_wd, "ALTER") == 0 &&
+ 			 pg_strcasecmp(prev2_wd, "SYSTEM") == 0 &&
+ 			 pg_strcasecmp(prev_wd, "SET") == 0)
+ 		COMPLETE_WITH_QUERY(Query_for_list_of_alter_system_set_vars);
  	/* ALTER VIEW */
  	else if (pg_strcasecmp(prev3_wd, "ALTER") == 0 &&
  			 pg_strcasecmp(prev2_wd, "VIEW") == 0)
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Thu, Dec 12, 2013 at 4:18 PM, Peter Geoghegan wrote: > Both of these revisions have identical ad-hoc test cases included as > new files - see testcase.sh and upsert.sql. My patch doesn't have any > unique constraint violations, and has pretty consistent performance, > while yours has many unique constraint violations. I'd like to hear > your thoughts on the testcase, and the design implications. I withdraw the test-case. Both approaches behave similarly if you look for long enough, and that's okay. I also think that changes to HeapTupleSatisfiesUpdate() are made unnecessary by recent bug fixes to that function. The test case previously described [1] that broke that is no longer recreatable, at least so far. Do you think that we need to throw a serialization failure within ExecLockHeapTupleForUpdateSpec() iff heap_lock_tuple() returns HeapTupleInvisible and IsolationUsesXactSnapshot()? Also, I'm having a hard time figuring out a good choke point to catch MVCC snapshots availing of our special visibility rule where they should not due to IsolationUsesXactSnapshot(). It seems sufficient to continue to assume that Postgres won't attempt to lock any tid invisible under conventional MVCC rules in the first place, except within ExecLockHeapTupleForUpdateSpec(), but what do we actually do within ExecLockHeapTupleForUpdateSpec()? I'm thinking of a new tqual.c routine concerning the tuple being in the future that we re-check when IsolationUsesXactSnapshot(). That's not very modular, though. Maybe we'd go through heapam.c. I think it doesn't matter that what now constitute MVCC snapshots (with the new, special "reach into the future" rule) have that new rule, for the purposes of higher isolation levels, because we'll have a serialization failure within ExecLockHeapTupleForUpdateSpec() before this is allowed to become a problem. 
In order for the new rule to be relevant, we'd have to be the Xact to lock in the first place, and as an xact in non-read-committed mode, we'd be sure to call the new tqual.c "in the future" routine or whatever. Only upserters can lock a row in the future, so it is the job of upserters to care about this special case. Incidentally, I tried to rebase recently, and saw some shift/reduce conflicts due to 1b4f7f93b4693858cb983af3cd557f6097dab67b, "Allow empty target list in SELECT". The fix for that is not immediately obvious. So I think we should proceed with the non-conclusive-check-first approach (if only on pragmatic grounds), but even now I'm not really sure. I think there might be unprincipled deadlocking should ExecInsertIndexTuples() fail to be completely consistent about its ordering of insertion - the use of dirty snapshots (including as part of conventional !UNIQUE_CHECK_PARTIAL unique index enforcement) plays a part in this risk. Roughly speaking, heap_delete() doesn't render the tuple immediately invisible to some-other-xact's dirty snapshot [2], and I think that could have unpleasant interactions, even if it is also beneficial in some ways. Our old, dead tuples from previous attempts stick around, and function as "value locks" to everyone else, since for example _bt_check_unique() cares about visibility having merely been affected, which is grounds for blocking. More counter-intuitive still, we go ahead with "value locking" (i.e. btree UNIQUE_CHECK_PARTIAL tuple insertion originating from the main speculative ExecInsertIndexTuples() call) even though we already know that we will delete the corresponding heap row (which, as noted, still satisfies HeapTupleSatisfiesDirty() and so is value-lock-like). Empirically, retrying because ExecInsertIndexTuples() returns some recheckIndexes occurs infrequently, so maybe that makes all of this okay. 
Or maybe it happens infrequently *because* we don't give up on insertion when it looks like the current iteration is futile. Maybe just inserting into every unique index, and then blocking on an xid within ExecCheckIndexConstraints(), works out fairly well and performs reasonably in all common cases. It's pretty damn subtle, though, and I worry about the worst-case performance, and basic correctness issues for these reasons. The fact that deferred unique indexes also use UNIQUE_CHECK_PARTIAL is cold comfort -- that only ever has to throw an error on conflict, and only once. We haven't "earned the right" to lock *all* values in all unique indexes, but kind of do so anyway in the event of an "insertion conflicted after pre-check". Another concern that bears reiterating is: I think making the lock-for-update case work for exclusion constraints is a lot of additional complexity for a very small return. Do you think it's worth optimizing ExecInsertIndexTuples() to avoid futile non-unique/exclusion constrained index tuple insertion?

[1] http://www.postgresql.org/message-id/CAM3SWZS2--GOvUmYA2ks_aNyfesb0_H6T95_k8+wyx7Pi=c...@mail.gmail.com
[2] https://github.com/postgres/postgres/blob/94b899b829657332bda856ac3f06153d09077bd1/src/backend/utils/
Re: [HACKERS] Logging WAL when updating hintbit
On Wed, Dec 18, 2013 at 11:30 AM, Michael Paquier wrote: > On Wed, Dec 18, 2013 at 11:22 AM, Amit Kapila wrote: >> On Fri, Dec 13, 2013 at 7:57 PM, Heikki Linnakangas >> wrote: >>> Thanks, committed with some minor changes: >> >> Should this patch in CF app be moved to Committed Patches or is there >> something left for this patch? > Nothing has been forgotten for this patch. It can be marked as committed. Thanks for confirmation, I have marked it as Committed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] row security roadmap proposal
On 12/18/2013 11:14 PM, Robert Haas wrote: > To be clear, I wasn't advocating for a declarative approach; I think > predicates are fine. There are usability issues to worry about either > way, and my concern is that we address those. A declarative approach > would certainly be valuable in that, for those people for whom it is > sufficiently flexible, it's probably quite a lot easier than writing > predicates. But I fear that some people will want a lot more > generality than a label-based system can accommodate. Having spent some time reading the HL7 spec, I now tend to agree that labels alone are not going to be sufficient. There are security decisions made based on: - Security labels - User/group/role memberships, both for accessor and target entity owner (down to the row level) - Black/white list ACLs, with inheritance and deny rules. - One-off or narrow authorizations. "I permit my GP to examine my mental health record details, but only over the last year, and the authorization stands only for today." - Authorization assertions. "I declare that the patient told me it is OK to access these, let me see them." - "Break glass" access. "This is an emergency. Give me access and I'll deal with the consequences later." So while security labels are an important part of the picture I'm forced to agree that they are not sufficient, and that a generalized row security mechanism actually is necessary. We don't have the time or resources to build all these sorts of things individually, and if we did it'd still be too inflexible for somebody. In the end, sometimes I guess there's no replacement for "WHERE call_some_procedure()" In particular, labels are totally orthogonal to entity identity in the data model, and being able to make row access decisions based on information already in the data model is important. FK relationships to owning entities and relationships between entities must be usable for security access policy decisions. 
So: I'm withdrawing the proposal to go straight to label security; I concede that it's insufficiently expressive to meet all possible needs, and we don't have the time to build anything declarative and user-friendly that would be. I do want to make sure that whatever we include this time around does not create problems for a future label security approach, but that's kind of required by the desire to add SEPostgreSQL RLS down the track anyway.

Given that: What are your specific usability concerns about the current approach proposed in KaiGai's patch? My main worry was that it requires the user to build everything manually, and is potentially error prone as a result. To address that we can build convenience features (label security, ACL types and operators, etc) on top of the same infrastructure later - it doesn't have to go in right away.

So putting that aside, the concerns I have are:

- The current RLS design is restricted to one predicate per table, with no contextual control over when that predicate applies. That means we can't implement anything like "policy groups" or overlapping sets of predicates; anything like that has to be built by the user. We don't need multiple predicates to start with, but I want to make sure there's room for them later, particularly so that different apps / extensions that use row-security don't have to fight with each other.

- You can't declare a predicate and then apply it to a bunch of tables with consistent security columns. Updating/changing predicates will be a pain. We might be able to get around that by recommending that people use an inlineable SQL function to declare their predicates, but "inlineable" can be hard to pin down sometimes. If not, we need something akin to GRANT ... ALL TABLES IN SCHEMA ... for row security, and ALTER DEFAULT PRIVILEGES ... too.

- It's too hard to see which tables have row security and what impact it has. Needs psql improvements.
- There's no way to control whether or not a client can see the row-security policy definition. The other one I want to deal with is secure session variables, but that's mostly a performance optimisation we can add later. What's your list? -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] clang's -Wmissing-variable-declarations shows some shoddy programming
On Sat, Dec 14, 2013 at 04:52:28PM +0100, Andres Freund wrote: > Hi, > > Compiling postgres with said option in CFLAGS really gives an astounding > number of warnings. Except some bison/flex generated ones, none of them > looks acceptable to me. > Most are just file local variables with a missing static and easy to > fix. Several other are actually shared variables, where people simply > haven't bothered to add the variable to a header. Some of them with > comments declaring that fact, others adding longer comments, even others > adding longer comments about that fact. > > I've attached the output of such a compilation run for those without > clang. Now that pg_upgrade has stabilized, I think it is time to centralize all the pg_upgrade_support control variables in a single C include file that can be used by the backend and by pg_upgrade_support. This will eliminate the compiler warnings too. The attached patch accomplishes this. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+ 

diff --git a/contrib/pg_upgrade_support/pg_upgrade_support.c b/contrib/pg_upgrade_support/pg_upgrade_support.c
new file mode 100644
index 99e64c4..6e30deb
*** a/contrib/pg_upgrade_support/pg_upgrade_support.c
--- b/contrib/pg_upgrade_support/pg_upgrade_support.c
***************
*** 11,16 ****
--- 11,17 ----
  #include "postgres.h"
  
+ #include "catalog/binary_upgrade.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
  #include "commands/extension.h"
***************
*** 24,40 ****
  PG_MODULE_MAGIC;
  #endif
  
- extern PGDLLIMPORT Oid binary_upgrade_next_pg_type_oid;
- extern PGDLLIMPORT Oid binary_upgrade_next_array_pg_type_oid;
- extern PGDLLIMPORT Oid binary_upgrade_next_toast_pg_type_oid;
- 
- extern PGDLLIMPORT Oid binary_upgrade_next_heap_pg_class_oid;
- extern PGDLLIMPORT Oid binary_upgrade_next_index_pg_class_oid;
- extern PGDLLIMPORT Oid binary_upgrade_next_toast_pg_class_oid;
- 
- extern PGDLLIMPORT Oid binary_upgrade_next_pg_enum_oid;
- extern PGDLLIMPORT Oid binary_upgrade_next_pg_authid_oid;
- 
  Datum set_next_pg_type_oid(PG_FUNCTION_ARGS);
  Datum set_next_array_pg_type_oid(PG_FUNCTION_ARGS);
  Datum set_next_toast_pg_type_oid(PG_FUNCTION_ARGS);
--- 25,30 ----
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
new file mode 100644
index 6f2e142..032a20e
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 34,39 ****
--- 34,40 ----
  #include "access/sysattr.h"
  #include "access/transam.h"
  #include "access/xact.h"
+ #include "catalog/binary_upgrade.h"
  #include "catalog/catalog.h"
  #include "catalog/dependency.h"
  #include "catalog/heap.h"
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
new file mode 100644
index aa31429..7ad9720
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 30,35 ****
--- 30,36 ----
  #include "access/visibilitymap.h"
  #include "access/xact.h"
  #include "bootstrap/bootstrap.h"
+ #include "catalog/binary_upgrade.h"
  #include "catalog/catalog.h"
  #include "catalog/dependency.h"
  #include "catalog/heap.h"
diff --git a/src/backend/catalog/pg_enum.c b/src/backend/catalog/pg_enum.c
new file mode 100644
index 35899b4..23d2a41
*** a/src/backend/catalog/pg_enum.c
--- b/src/backend/catalog/pg_enum.c
***************
*** 17,22 ****
--- 17,23 ----
  #include "access/heapam.h"
  #include "access/htup_details.h"
  #include "access/xact.h"
+ #include "catalog/binary_upgrade.h"
  #include "catalog/catalog.h"
  #include "catalog/indexing.h"
  #include "catalog/pg_enum.h"
diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
new file mode 100644
index 23ac3dd..634915b
*** a/src/backend/catalog/pg_type.c
--- b/src/backend/catalog/pg_type.c
***************
*** 17,22 ****
--- 17,23 ----
  #include "access/heapam.h"
  #include "access/htup_details.h"
  #include "access/xact.h"
+ #include "catalog/binary_upgrade.h"
  #include "catalog/dependency.h"
  #include "catalog/indexing.h"
  #include "catalog/objectaccess.h"
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
new file mode 100644
index 385d64d..f58e434
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
***************
*** 16,21 ****
--- 16,22 ----
  #include "access/tuptoaster.h"
  #include "access/xact.h"
+ #include "catalog/binary_upgrade.h"
  #include "catalog/dependency.h"
  #include "catalog/heap.h"
  #include "catalog/index.h"
***************
*** 31,38 ****
  #include "utils/syscache.h"
  
  /* Potentially set by contrib/pg_upgrade_support functions */
- extern Oid binary_upgrade_next_toast_pg_class_oid;
- 
  Oid binary_upgrade_next_toast_pg_type_oid = InvalidOid;
  
  static bool create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
--- 32,37 ----
diff --git a/src/backend/commands/typecmds.c
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
On Wed, Dec 18, 2013 at 8:25 PM, Tatsuo Ishii wrote:
>>> Is there any reason for the function returns int as it always returns
>>> 0 or 1. Maybe returns bool is better?
>
> I have committed your patches. Thanks.

Thank you very much.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] preserving forensic information when we freeze
On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund wrote: >> if (frz->frzflags & XLH_FREEZE_XVAC) >> + { >> HeapTupleHeaderSetXvac(tuple, FrozenTransactionId); >> + /* If we somehow haven't hinted the tuple previously, do it >> now. */ >> + HeapTupleHeaderSetXminCommitted(tuple); >> + } > > What's the reasoning behind adding HeapTupleHeaderSetXminCommitted() > here? I'm just copying the existing logic. See the final stanza of heap_prepare_freeze_tuple. > [ snip ] Will fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autoconf 2.69 update
On Thu, 2013-11-14 at 22:00 -0500, Peter Eisentraut wrote:
> I'm proposing that we upgrade our Autoconf to 2.69

This has been done.
Re: [HACKERS] pg_rewarm status
On Wed, Dec 18, 2013 at 6:07 PM, Cédric Villemain wrote: > In the case of effective_io_concurrency, however, this may not work as well as > expected, IIRC it is used to prefetch heap blocks, hopefully the requested > blocks are contiguous but if there are too much holes it is enough to fill the > ring very quickly (with the current max value of effective_io_concurrency). Yeah, we'd need to figure out how big the ring would need to be for reasonable values of effective_io_concurrency. >> When the prefetch process starts up, it services requests from the >> queue by reading the requested blocks (or block ranges). When the >> queue is empty, it sleeps. If it receives no requests for some period >> of time, it unregisters itself and exits. This is sort of a souped-up >> version of the hibernation facility we already have for some auxiliary >> processes, in that we don't just make the process sleep for a longer >> period of time but actually get rid of it altogether. > > I'm just a bit skeptical about the starting time: backend will ReadBuffer very > soon after requesting the Prefetch... Yeah, absolutely. The first backend that needs a prefetch probably isn't going to get it in time. I think that's OK though. Once the background process is started, response times will be quicker... although possibly still not quick enough. We'd need to benchmark this to determine how quickly the background process can actually service requests. Does anybody have a good self-contained test case that showcases the benefits of prefetching? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
On 12/18/2013 01:02 PM, Alexander Korotkov wrote:
>> My idea for a solution was to modify tuplesort to allow storing the
>> already sorted keys in either memtuples or the sort result file, but
>> setting a field so it does not sort the already sorted tuples again.
>> This would allow the rescan to work as it used to, but I am unsure how
>> clean or ugly this code would be. Was this something you considered?
>
> I'm not sure. I believe the best answer depends on the particular
> parameters: how much memory we have for the sort, how expensive the
> underlying node is and how it performs rescan, and how big the groups
> in the partial sort are.

Yes, if one does not need a rescan your solution will use less memory and about the same amount of CPU (if the tuplesort does not spill to disk). While if we keep all the already sorted tuples in the tuplesort, rescans will be cheap, but more memory will be used, with an increased chance of spilling to disk.

-- 
Andreas Karlsson
Re: [HACKERS] pg_rewarm status
Robert Haas escribió:
> I'm not inclined to wait for the next CommitFest to commit this,
> because it's a very simple patch and has already had a lot more field
> testing than most patches get before they're committed. And it's just
> a contrib module, so the damage it can do if there is in fact a bug is
> pretty limited. All that having been said, any review is appreciated.

Looks nice. Some really minor things I noticed while skimming are that you have some weird indentation using spaces in some ereport() calls; there's an extra call to RelationOpenSmgr() in read mode; and the copyright year is 2012. Please use &mdash; in sgml instead of plain dashes, and please avoid the "!strcmp()" idiom.

I didn't actually try it out ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_rewarm status
Le mercredi 18 décembre 2013 18:40:09, Robert Haas a écrit : > Now that we have dynamic background workers, I've been thinking that > it might be possible to write a background worker to do asynchronous > prefetch on systems where we don't have OS support. We could store a > small ring buffer in shared memory, say big enough for 1k entries. > Each entry would consist of a relfilenode, a starting block number, > and a block count. We'd also store a flag indicating whether the > prefetch worker has been registered with the postmaster, and a pointer > to the PGPROC of any running worker. When a process wants to do a > prefetch, it locks the buffer, adds its prefetch request to the queue > (overwriting the oldest existing request if the queue is already > full), and checks the flag. If the flag is not set, it also registers > the background worker. Then, it releases the lock and sets the latch > of any running worker (whose PGPROC it remembered before releasing the > lock). Good idea. If the list is full it is probably that the system is busy, I suppose that in such case some alternative behavior can be interesting. Perhaps flush a part of the ring. Oldest entries are the less interesting, we're talking about prefetching after all. In the case of effective_io_concurrency, however, this may not work as well as expected, IIRC it is used to prefetch heap blocks, hopefully the requested blocks are contiguous but if there are too much holes it is enough to fill the ring very quickly (with the current max value of effective_io_concurrency). > When the prefetch process starts up, it services requests from the > queue by reading the requested blocks (or block ranges). When the > queue is empty, it sleeps. If it receives no requests for some period > of time, it unregisters itself and exits. 
> This is sort of a souped-up > version of the hibernation facility we already have for some auxiliary > processes, in that we don't just make the process sleep for a longer > period of time but actually get rid of it altogether. I'm just a bit skeptical about the start-up time: the backend will ReadBuffer very soon after requesting the prefetch... > All of this might be overkill; we could also do it with a permanent > auxiliary process. But it's sort of a shame to run an extra process > for a facility that might never get used, or might be used only > rarely. And I'm wary of cluttering things up with a thicket of > auxiliary processes each of which caters only to a very specific, very > narrow situation. Anyway, just thinking out loud here. For Windows, see the PrefetchVirtualMemory() function in the Windows API. I really wonder whether such a bgworker could improve prefetching on !windows too, if a ring insert is faster than a posix_fadvise() call. If so, then effective_io_concurrency could be revisited. Maybe Greg Stark already did some benchmarks of that... -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Wed, Dec 11, 2013 at 10:22:32AM +, Haribabu kommi wrote: > The make_absolute_path() function moved to port is changed in a similar way to > Bruce Momjian's approach. psprintf is used to store the error string that > occurred in the function. But psprintf is not used for storing the absolute path, > because it causes problems when freeing the allocated memory in > SelectConfigFiles: the same memory is allocated in a different code branch from > guc_malloc. > > After adding the make_absolute_path() function with the psprintf stuff in the path.c file, > it gives a linking problem when compiling ecpg. I am not able to find > the problem, so I added another file, abspath.c, in port which contains these two functions. What errors are you seeing? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
Josh Berkus wrote: > On 12/18/2013 11:26 AM, Jim Nasby wrote: >> Another possibility is to allow for two different types of >> assertions, one based on SSI and one based on locking. > > The locking version would have to pretty much lock on a table > basis (or even a whole-database basis) every time an assertion > executed, no? As far as I can see, if SSI is *not* used, there needs to be a mutually exclusive lock taken from somewhere inside the COMMIT code until the transaction is complete -- effectively serializing assertion processing for transactions which could affect a given assertion. Locking on tables would, as previously suggested, be very prone to deadlocks on the heavyweight locks. Locking on the assertions in a predictable order seems more promising, especially if there could be some way to only do that if the transaction really might have done something which could affect the truth of the assertion. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 12/19/13, 12:01 AM, David Johnston wrote: Marko Tiikkaja-4 wrote On 2013-12-18 22:32, Andrew Dunstan wrote: You're not really free to assume it - you'll need an exception handler for the other-than-1 case, or your code might blow up. This seems to be codifying a bad pattern, which should be using array_lower() and array_upper() instead. That's the entire point -- I *want* my code to blow up. If someone passes a multi-dimensional array to a function that assumes its input is one-dimensional and its indexes start from 1, I want it to be obvious that the caller did something wrong. Now I either copy-paste lines and lines of codes to always test for the weird cases or my code breaks in subtle ways. This is no different from an Assert() somewhere -- if the caller breaks the documented interface, it's his problem, not mine. And I don't want to waste my time coding around the fact that this simple thing is so hard to do in PG. 1) Why cannot we just make the second argument of the current function optional and default to 1? That still does the wrong thing for the empty array, multidimensional arrays and arrays that don't start from index 1. 2) How about providing a function that returns the "1-dim/lower=1" input array or raise/exception if the input array does not conform? CREATE FUNCTION array_normal(arr anyarray) RETURNS anyarray $$ begin if (empty(arr)) return arr; if (ndim(arr) > 1) raise exception; if (array_lower() <> 1) raise exception return arr; end; $$ With this, I would still have to do COALESCE(array_length(array_normal($1), 1), 0). That's pretty stupid for the most common use case of arrays, don't you think? I can also see wanting 1-dimensional enforced without having to require the lower-bound to be 1 so maybe a separate function for that. I really don't see the point. How often have you ever created a function that doesn't have a lower bound of 1 on purpose? What good did it serve you? 
Usage: SELECT array_length(array_normal(input_array)) I could see this being especially useful for a domain and/or column constraint definition and also allowing for a textbook case of separation of concerns. What would array_length() in this case be? With what you suggested above, you would still get NULL for an empty array. I am torn, but mostly opposed, to making an array_length(anyarray) function with these limitations enforced - especially if other similar functions are not created at the same time. I fully agree that array_length(anyarray) should be a valid call without requiring the user to specify ", 1" by rote. I'm specifically asking for something that is different from array_length(anyarray, int), because I personally think it's too full of caveats. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
On Wed, Dec 18, 2013 at 03:41:24PM -0500, Robert Haas wrote: > On the other hand, there's not much value in adding monitoring > features that are going to materially harm performance, and a lot of > the monitoring features that get proposed die on the vine for exactly > that reason. I think the root of the problem is that our stats > infrastructure is a streaming pile of crap. A number of people have "streaming"? I can't imagine what that looks like. ;-) I think the larger point is that network is only one of many things we need to address, so this needs a holistic approach that looks at all needs and creates infrastructure to address it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in base backup code path
Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] preserving forensic information when we freeze
Hi, On 2013-12-17 16:00:14 -0500, Robert Haas wrote: > @@ -5874,19 +5858,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, > TransactionId cutoff_xid, > void > heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz) > { > + tuple->t_infomask = frz->t_infomask; > + tuple->t_infomask2 = frz->t_infomask2; > + > if (frz->frzflags & XLH_FREEZE_XMIN) > - HeapTupleHeaderSetXmin(tuple, FrozenTransactionId); > + HeapTupleHeaderSetXminFrozen(tuple); > > HeapTupleHeaderSetXmax(tuple, frz->xmax); > > if (frz->frzflags & XLH_FREEZE_XVAC) > + { > HeapTupleHeaderSetXvac(tuple, FrozenTransactionId); > + /* If we somehow haven't hinted the tuple previously, do it > now. */ > + HeapTupleHeaderSetXminCommitted(tuple); > + } What's the reasoning behind adding HeapTupleHeaderSetXminCommitted() here? > @@ -823,14 +823,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats, >* NB: Like with per-tuple hint bits, > we can't set the >* PD_ALL_VISIBLE flag if the inserter > committed >* asynchronously. See SetHintBits for > more info. Check > - * that the HEAP_XMIN_COMMITTED hint > bit is set because of > - * that. > + * that the tuple is hinted > xmin-committed hint bit because > + * of that. >*/ Looks like there's one "hint bit" too many here. > @@ -65,6 +65,9 @@ manage to be a conflict it would merely mean that one > bit-update would > be lost and need to be done again later. These four bits are only hints > (they cache the results of transaction status lookups in pg_clog), so no > great harm is done if they get reset to zero by conflicting updates. > +Note, however, that a tuple is frozen by setting both HEAP_XMIN_INVALID > +and HEAP_XMIN_COMMITTED; this is a critical update and accordingly requires > +an exclusive buffer lock. I think it'd be appropriate to mention that this needs to be preserved via WAL logging; as written, it sounds like it would be sufficient to set a hint bit without persistence guarantees. (not sure if I already wrote this, but whatever) Looking good.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
Marko Tiikkaja-4 wrote: > On 2013-12-18 22:32, Andrew Dunstan wrote: >> You're not really free to assume it - you'll need an exception handler >> for the other-than-1 case, or your code might blow up. >> >> This seems to be codifying a bad pattern, which should be using >> array_lower() and array_upper() instead. > > That's the entire point -- I *want* my code to blow up. If someone > passes a multi-dimensional array to a function that assumes its input is > one-dimensional and its indexes start from 1, I want it to be obvious > that the caller did something wrong. Now I either copy-paste lines and > lines of codes to always test for the weird cases or my code breaks in > subtle ways. > > This is no different from an Assert() somewhere -- if the caller breaks > the documented interface, it's his problem, not mine. And I don't want > to waste my time coding around the fact that this simple thing is so > hard to do in PG. 1) Why can't we just make the second argument of the current function optional and default to 1? 2) How about providing a function that returns the "1-dim/lower=1" input array or raises an exception if the input array does not conform? Something like: CREATE FUNCTION array_normal(arr anyarray) RETURNS anyarray AS $$ begin if array_ndims(arr) is null then return arr; end if; if array_ndims(arr) > 1 then raise exception 'array is not one-dimensional'; end if; if array_lower(arr, 1) <> 1 then raise exception 'array lower bound is not 1'; end if; return arr; end; $$ LANGUAGE plpgsql; I can also see wanting 1-dimensional enforced without having to require the lower bound to be 1, so maybe a separate function for that. Usage: SELECT array_length(array_normal(input_array)) I could see this being especially useful for a domain and/or column constraint definition and also allowing for a textbook case of separation of concerns. I am torn about, but mostly opposed to, making an array_length(anyarray) function with these limitations enforced - especially if other similar functions are not created at the same time.
I fully agree that array_length(anyarray) should be a valid call without requiring the user to specify ", 1" by rote. Tangential Question: Is there any way to define a non-1-based array without using array-literal syntax but by using ARRAY[1,2,3] syntax? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/array-length-anyarray-tp5783950p5783972.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shared memory message queues
On 2013-12-18 15:23:23 -0500, Robert Haas wrote: > It sounds like most people who have looked at this stuff are broadly > happy with it, so I'd like to push on toward commit soon, but it'd be > helpful, Andres, if you could review the comment additions to > shm-mq-v2.patch and see whether those address your concerns. FWIW, I haven't looked carefully enough at the patch to consider myself having reviewed it. I am not saying that it's not ready for committer, just that I don't know whether it is. I will have a look at the comment improvements, and if you don't beat me to it, give the patch a read-over. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 regression with dbt2
Applying your patch plus adding -fno-omit-frame-pointer, I got 54526.90 notpm. The profile (part) below: # Samples: 610K of event 'cycles' # Event count (approx.): 6686532056450 # # Overhead Command Shared Object Symbol # .. . .. # 4.08%postgres postgres [.] hash_search_with_hash_value | --- hash_search_with_hash_value | --2.87%-- BufTableLookup | --2.86%-- ReadBuffer_common ReadBufferExtended | |--0.86%-- index_fetch_heap | | | --0.62%-- index_getnext | IndexNext | ExecScan | ExecProcNode | |--0.71%-- BitmapHeapNext | ExecScan | ExecProcNode | --0.57%-- _bt_relandgetbuf | --0.57%-- _bt_search 3.75%postgres postgres [.] heap_hot_search_buffer | --- heap_hot_search_buffer | |--1.92%-- BitmapHeapNext | ExecScan | ExecProcNode | | | --1.12%-- standard_ExecutorRun | _SPI_execute_plan | SPI_execute_plan | | | --0.70%-- payment |ExecMakeFunctionResult |ExecProject |ExecResult |ExecProcNode |standard_ExecutorRun |PortalRunSelect |PortalRun |PostgresMain |ServerLoop |PostmasterMain |main |__libc_start_main | --1.74%-- index_fetch_heap | --1.46%-- index_getnext | --1.45%-- IndexNext ExecScan ExecProcNode | --0.96%-- standard_ExecutorRun _SPI_execute_plan SPI_execute_plan | --0.50%-- new_order ExecMakeFunctionResult ExecProject ExecResult ExecProcNode standard_ExecutorRun PortalRunSelect PortalRunFetch PerformPortalFetch standard_ProcessUtility PortalRunUtility FillPortalStore PortalRun PostgresMain ServerLoop PostmasterMain main __libc_start_main 3.15%postgres postgres [.] LWLockAcquire | --- LWLockAcquire | --1.65%-- ReadBuffer_common ReadBufferExtended 2.74%postgres postgres [.] PinBuffer | --- PinBuffer | --2.72%-- ReadBuffer_common ReadBufferExtended | |--0.71%-- index_fetch_heap | | | --0.51%-- index_getnext | IndexNext | ExecScan | ExecProcNode | |--0.67%-- BitmapHeapNext | ExecScan | ExecProcNode | --0.60%
Re: [HACKERS] Extension Templates S03E11
Yeah I think this whole discussion is happening at the wrong level. The problem you're having, despite appearances, is not that people disagree about the best way to accomplish your goals. The problem is that not everyone is convinced your goals are a good idea. Either they just don't understand the goals or they do understand them but don't agree that they're a good idea. Personally I'm in the former category and would welcome a detailed explanation of the goals of the feature and what use cases those goals enable. I think Tom is in the latter category and needs a very good argument for why those goals are important enough to outweigh the downsides. I don't think loading shared libraries from RAM or a temp download directory is a *complete* show stopper the way Tom says but it would require a pretty compelling use case to make it worth the difficulties it would cause. -- greg
Re: [HACKERS] array_length(anyarray)
On 2013-12-18 22:32, Andrew Dunstan wrote: You're not really free to assume it - you'll need an exception handler for the other-than-1 case, or your code might blow up. This seems to be codifying a bad pattern, which should be using array_lower() and array_upper() instead. That's the entire point -- I *want* my code to blow up. If someone passes a multi-dimensional array to a function that assumes its input is one-dimensional and its indexes start from 1, I want it to be obvious that the caller did something wrong. Now I either copy-paste lines and lines of code to always test for the weird cases or my code breaks in subtle ways. This is no different from an Assert() somewhere -- if the caller breaks the documented interface, it's his problem, not mine. And I don't want to waste my time coding around the fact that this simple thing is so hard to do in PG. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 12/18/2013 04:19 PM, Marko Tiikkaja wrote: On 2013-12-18 22:13, Andrew Dunstan wrote: On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. Why the heck would it complain if the lower bound isn't 1? Because then you're free to assume that the first element is [1] and the last one is [array_length()]. Which is what 99% of the code using array_length(anyarray, int) does anyway. You're not really free to assume it - you'll need an exception handler for the other-than-1 case, or your code might blow up. This seems to be codifying a bad pattern, which should be using array_lower() and array_upper() instead. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 2013-12-18 22:19, I wrote: On 2013-12-18 22:13, Andrew Dunstan wrote: On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. Why the heck would it complain if the lower bound isn't 1? Because then you're free to assume that the first element is [1] and the last one is [array_length()]. Which is what 99% of the code using array_length(anyarray, int) does anyway. Just to clarify, I mean that array_lower(.., 1) must be 1. Whatever that's called. "The lower bound of the only dimension of the array"? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 2013-12-18 22:13, Andrew Dunstan wrote: On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. Why the heck would it complain if the lower bound isn't 1? Because then you're free to assume that the first element is [1] and the last one is [array_length()]. Which is what 99% of the code using array_length(anyarray, int) does anyway. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 11:26 AM, Jim Nasby wrote: > The flip-side is that now you can get serialization failures, and I > think there's a ton of software that has no clue how to deal with that. > So now you don't get to use assertions at all unless you re-engineer > your application (but see below). Well, the software will need to deal with an Assertion failure, which I doubt it's prepared to do right now either. >> This is consistent with how we treat the interaction of constraints and >> triggers; under some circumstances, we allow triggers to violate CHECK >> and FK constraints. > > We do? Under what circumstances? AFTER triggers are allowed to ignore constraints sometimes. For example, if you have a tree table with an FK to other rows in the same table, and you have an AFTER trigger on it, the AFTER trigger is allowed to violate the self-FK. That's the one I ran across, but I vaguely remember other cases, and there's some documentation on this in the order of application of triggers in the main docs. > Another possibility is to allow for two different types of assertions, > one based on SSI and one based on locking. The locking version would have to pretty much lock on a table basis (or even a whole-database basis) every time an assertion executed, no? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 04:09 PM, Heikki Linnakangas wrote: On 12/18/2013 11:04 PM, Andrew Dunstan wrote: On 12/18/2013 02:45 PM, Andres Freund wrote: On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote: Andres Freund wrote: It would only force serialization for transactions that modify tables covered by the assert, that doesn't seem to bad. Anything covered by an assert shoulnd't be modified frequently, otherwise you'll run into major performance problems. Well, as presented there is no way (for the system) to tell which tables are covered by an assertion, is there? That's my point. Well, the patch's syntax seems to only allow to directly specify a SQL query to check - we could iterate over the querytree to gather all related tables and reject any function we do not understand. Umm, that's really a major limitation in utility. The query can be "SELECT is_my_assertion_true()", and the function can do anything. OK, but isn't that what Andres is suggesting we reject? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On 12/18/2013 03:27 PM, Marko Tiikkaja wrote: Hi, Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. Why the heck would it complain if the lower bound isn't 1? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 11:04 PM, Andrew Dunstan wrote: On 12/18/2013 02:45 PM, Andres Freund wrote: On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote: Andres Freund wrote: It would only force serialization for transactions that modify tables covered by the assert, that doesn't seem to bad. Anything covered by an assert shoulnd't be modified frequently, otherwise you'll run into major performance problems. Well, as presented there is no way (for the system) to tell which tables are covered by an assertion, is there? That's my point. Well, the patch's syntax seems to only allow to directly specify a SQL query to check - we could iterate over the querytree to gather all related tables and reject any function we do not understand. Umm, that's really a major limitation in utility. The query can be "SELECT is_my_assertion_true()", and the function can do anything. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
* Robert Haas (robertmh...@gmail.com) wrote: > On Wed, Dec 18, 2013 at 8:47 AM, Stephen Frost wrote: > > Agreed. My other thought on this is that there's a lot to be said for > > having everything you need available through one tool- kinda like how > > Emacs users rarely go outside of it.. :) And then there's also the > > consideration that DBAs may not have access to the host system at all, > > or not to the level needed to do similar analysis there. > > I completely agree with this, and yet I still think we should reject > the patch, because I think the overhead is going to be intolerable. That's a fair point and I'm fine with rejecting it on the grounds that the overhead is too much. Hopefully that encourages the author to go back and review Tom's comments and consider how the overhead could be reduced or eliminated. We absolutely need better monitoring and I have had many of the same strace-involving conversations. perf is nearly out of the question as it's often not even installed and can be terribly risky (I once had to get a prod box hard-reset after running perf on it for mere moments because it never came back enough to let us do a clean restart). > I think the root of the problem is that our stats > infrastructure is a streaming pile of crap. +1 Thanks, Stephen
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 02:45 PM, Andres Freund wrote: On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote: Andres Freund wrote: It would only force serialization for transactions that modify tables covered by the assert, that doesn't seem to bad. Anything covered by an assert shoulnd't be modified frequently, otherwise you'll run into major performance problems. Well, as presented there is no way (for the system) to tell which tables are covered by an assertion, is there? That's my point. Well, the patch's syntax seems to only allow to directly specify a SQL query to check - we could iterate over the querytree to gather all related tables and reject any function we do not understand. Umm, that's really a major limitation in utility. We need to come up with a better answer than this, which would essentially hobble the facility. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stats for network traffic WIP
On Wed, Dec 18, 2013 at 8:47 AM, Stephen Frost wrote: > Agreed. My other thought on this is that there's a lot to be said for > having everything you need available through one tool- kinda like how > Emacs users rarely go outside of it.. :) And then there's also the > consideration that DBAs may not have access to the host system at all, > or not to the level needed to do similar analysis there. I completely agree with this, and yet I still think we should reject the patch, because I think the overhead is going to be intolerable. Now, the fact is, the monitoring facilities we have in PostgreSQL today are not nearly good enough. Other products do better. I cringe every time I tell someone to attach strace to a long-running autovac process to find out what block number it's currently on, so we can estimate when it will finish; or every time we need data about lwlock contention and the only way to get it is to use perf, or recompile with LWLOCK_STATS defined. These are not fun conversations to have with customers who are in production. On the other hand, there's not much value in adding monitoring features that are going to materially harm performance, and a lot of the monitoring features that get proposed die on the vine for exactly that reason. I think the root of the problem is that our stats infrastructure is a streaming pile of crap. A number of people have worked diligently to improve it and that work has not been fruitless, but the current situation is still not very good. In many ways, this situation reminds me of the situation with EXPLAIN a few years ago. People kept proposing useful extensions to EXPLAIN which we did not adopt because they required creating (and perhaps reserving) far too many keywords. Now that we have the extensible options syntax, EXPLAIN has options for COSTS, BUFFERS, TIMING, and FORMAT, all of which have proven to be worth their weight in code, at least IMHO. 
I am really not sure what a better infrastructure for stats collection should look like, but I know that until we get one, a lot of monitoring patches that would be really nice to have are going to get shot down because of concerns about performance, and specifically stats file bloat. Fixing that problem figures to be unglamorous, but I'll buy whoever does it a beer (or another beverage of your choice). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
Jim Nasby wrote: > On 12/18/13, 1:42 PM, Kevin Grittner wrote: >> Jim Nasby wrote: >> >>> This is another case where it would be very useful to restrict >>> what relations a transaction (or in this case, a substransaction) >>> can access. If we had the ability to make that restriction then >>> we could force assertions that aren't plain SQL to explicitly >>> specify what tables the assert is going to hit, and if the assert >>> tries to do something different then we throw an error. >>> >>> The ability to restrict object access within a transaction would >>> also benefit VACUUM and possibly the Changeset stuff. >> >> I'm pretty sure that SSI could also optimize based on that, >> although there are probably about 10 other optimizations that would >> be bigger gains before getting to that. > > Any ideas how hard this would be? If we had a list to check against, I think it would be possible to do this during parse analysis and AcquireRewriteLocks(). (One or the other happens before query rewrite.) The hard part seems to me to be defining a sane way to get the list. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] array_length(anyarray)
Hi, Attached is a patch to add support for array_length(anyarray), which only works for one-dimensional arrays, returns 0 for empty arrays and complains if the array's lower bound isn't 1. In other words, does the right thing when used with the arrays people use 99% of the time. I'll add this to the next commit fest, but feel free to discuss it before that. Regards, Marko Tiikkaja *** a/doc/src/sgml/array.sgml --- b/doc/src/sgml/array.sgml *** *** 338,343 SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 'Carol'; --- 338,354 2 (1 row) + + The single-argument overload of array_length can be used + to get the length of a one-dimensional array: + + SELECT array_length(schedule) FROM sal_emp WHERE name = 'Carol'; + + array_length + -- + 2 + (1 row) + *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 11096,11101 SELECT NULLIF(value, '(none)') ... --- 11096,2 + array_length(anyarray) + + + int + returns the length of the array (array must be one-dimensional) + array_length(array[1,2,3]) + 3 + + + + array_lower(anyarray, int) *** a/src/backend/utils/adt/arrayfuncs.c --- b/src/backend/utils/adt/arrayfuncs.c *** *** 1740,1745 array_length(PG_FUNCTION_ARGS) --- 1740,1779 } /* + * array_length_single: + *Returns the length of a single-dimensional array. The array must be + *single-dimensional or empty and its lower bound must be 1. 
+ */
+ Datum
+ array_length_single(PG_FUNCTION_ARGS)
+ {
+ 	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+ 	int			result;
+ 	int		   *lb;
+ 	int		   *dimv;
+
+ 	/* empty array */
+ 	if (ARR_NDIM(v) == 0)
+ 		PG_RETURN_INT32(0);
+
+ 	if (ARR_NDIM(v) != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+ 				 errmsg("input array is not single-dimensional")));
+
+ 	lb = ARR_LBOUND(v);
+ 	if (lb[0] != 1)
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+ 				 errmsg("lower bound of array is not 1")));
+
+ 	dimv = ARR_DIMS(v);
+ 	result = dimv[0];
+ 	PG_RETURN_INT32(result);
+ }
+
+
 /*
  * array_ref :
  *	  This routine takes an array pointer and a subscript array and returns
  *	  the referenced item as a Datum. Note that for a pass-by-reference

*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*** 840,845 ****
DATA(insert OID = 2092 (  array_upper	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2
--- 840,847 ----
DESCR("array upper dimension");
DATA(insert OID = 2176 (  array_length	PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 23 "2277 23" _null_ _null_ _null_ _null_ array_length _null_ _null_ _null_ ));
DESCR("array length");
+ DATA(insert OID = 3179 (  array_length	PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 23 "2277" _null_ _null_ _null_ _null_ array_length_single _null_ _null_ _null_ ));
+ DESCR("array length");
DATA(insert OID = 378 (  array_append	PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ array_push _null_ _null_ _null_ ));
DESCR("append element onto end of array");
DATA(insert OID = 379 (  array_prepend	PGNSP PGUID 12 1 0 0 0 f f f f f f i 2 0 2277 "2283 2277" _null_ _null_ _null_ _null_ array_push _null_ _null_ _null_ ));

*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
*** 204,209 ****
extern Datum array_dims(PG_FUNCTION_ARGS);
--- 204,210 ----
extern Datum array_lower(PG_FUNCTION_ARGS);
extern Datum array_upper(PG_FUNCTION_ARGS);
extern Datum array_length(PG_FUNCTION_ARGS);
+ extern Datum array_length_single(PG_FUNCTION_ARGS);
extern Datum array_larger(PG_FUNCTION_ARGS);
extern Datum array_smaller(PG_FUNCTION_ARGS);
extern Datum generate_subscripts(PG_FUNCTION_ARGS);

*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
*** 1455,1460 ****
select array_length(array[[1,2,3], [4,5,6]], 3);
--- 1455,1482 ----
 (1 row)

+ select array_length(NULL::int[]);
+  array_length
+ --------------
+
+ (1 row)
+
+ select array_length(array[1,2,3]);
+  array_length
+ --------------
+             3
+ (1 row)
+
+ select array_length('{}'::int[]);
+  array_length
+ --------------
+             0
+ (1 row)
+
+ select array_length('[2:4]={5,6,7}'::int[]);
+ ERROR:  lower bound of array is not 1
+ select array_length('{{1,2}}'::int[]);
+ ERROR:  input array is not single-dimensional
 select array_agg(uni
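To make the proposed semantics easy to eyeball outside of the backend, here is a hypothetical stand-alone C model of the rules above (this is not the patch's code; `ereport(ERROR)` is modeled as a -1 return value):

```c
#include <assert.h>

/* Hypothetical stand-alone model of the proposed one-argument
 * array_length() semantics: 0 for an empty array, an error unless the
 * array is one-dimensional with lower bound 1. */
typedef struct
{
	int ndim;       /* number of dimensions; 0 means empty array */
	int lbound;     /* lower bound of the first dimension */
	int dim0;       /* number of elements in the first dimension */
} ModelArray;

/* Returns the length, or -1 standing in for ereport(ERROR). */
static int
model_array_length_single(const ModelArray *v)
{
	if (v->ndim == 0)
		return 0;       /* '{}'::int[] -> 0 */
	if (v->ndim != 1)
		return -1;      /* "input array is not single-dimensional" */
	if (v->lbound != 1)
		return -1;      /* "lower bound of array is not 1" */
	return v->dim0;     /* array[1,2,3] -> 3 */
}
```

This mirrors the regression tests in the patch: array[1,2,3] gives 3, '{}'::int[] gives 0, and both '[2:4]={5,6,7}'::int[] and '{{1,2}}'::int[] raise errors.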
Re: [HACKERS] 9.3 regression with dbt2
Hi,

Applied your patch (but not using -fno-omit-frame-pointer). It seems to recover the perf loss: 55659.72 notpm. FWIW, the profile is below. I will do a run with the flag.

Samples: 598K of event 'cycles', Event count (approx.): 6568957160059
+   4.03%  postgres  postgres            [.] hash_search_with_hash_value
+   3.74%  postgres  postgres            [.] heap_hot_search_buffer
+   3.17%  postgres  postgres            [.] LWLockAcquire
+   2.92%  postgres  postgres            [.] _bt_compare
+   2.64%  postgres  postgres            [.] PinBuffer
+   2.62%  postgres  postgres            [.] AllocSetAlloc
+   2.34%  postgres  postgres            [.] XLogInsert
+   2.27%  postgres  postgres            [.] SearchCatCache
+   1.81%  postgres  postgres            [.] HeapTupleSatisfiesMVCC
+   1.52%  postgres  postgres            [.] ExecInitExpr
+   1.45%  postgres  postgres            [.] heap_page_prune_opt
+   1.41%  swapper   [kernel.kallsyms]   [k] intel_idle
+   1.30%  postgres  postgres            [.] LWLockRelease
+   1.12%  postgres  postgres            [.] heapgetpage
+   0.96%  postgres  libc-2.17.so        [.] _int_malloc
+   0.86%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
+   0.84%  postgres  postgres            [.] hash_any
+   0.81%  postgres  postgres            [.] MemoryContextAllocZeroAligned
+   0.76%  postgres  postgres            [.] XidInMVCCSnapshot
+   0.74%  postgres  postgres            [.] _bt_checkkeys
+   0.70%  postgres  postgres            [.] fmgr_info_cxt_security
+   0.70%  postgres  postgres            [.] FunctionCall2Coll
+   0.66%  postgres  libc-2.17.so        [.] __strncpy_sse2_unaligned
+   0.63%  postgres  postgres            [.] tbm_iterate
+   0.58%  postgres  postgres            [.] base_yyparse
+   0.58%  postgres  libc-2.17.so        [.] __printf_fp
+   0.55%  postgres  postgres            [.] palloc
+   0.51%  postgres  postgres            [.] TransactionIdPrecedes
+   0.51%  postgres  postgres            [.] slot_deform_tuple
+   0.50%  postgres  postgres            [.] ReadBuffer_common
Re: [HACKERS] SQL objects UNITs
On 12/18/13, 4:22 AM, Dimitri Fontaine wrote:
> ALTER UNIT name SET SCHEMA ;

FWIW, with the "units" that we've developed we use schemas to differentiate between public objects and "internal" (private or protected) objects. So single-schema stuff becomes a PITA. Of course, since extensions already work that way I suppose that ship has sailed, but I thought I'd mention it.

For those who are curious... we make the distinction of public/protected/private via schemas because we don't want general users to need to wade through that stuff when looking at objects. So the convention we settled on is that public objects go in one schema, protected objects go in a schema of the same name that's prepended with "_", and private objects are in the protected schema but also prepend "_" to their names. IE:

CREATE SCHEMA awesome_feature;
CREATE VIEW awesome_feature.have_some_data

CREATE SCHEMA _awesome_feature; -- Protected / private stuff
CREATE VIEW _awesome_feature.stuff_for_database_code_to_see_but_not_users
CREATE FUNCTION _awesome_feature._do_not_run_this_function_anywhere_outside_of_awesome_feature()

--
Jim C. Nasby, Data Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 regression with dbt2
HI On Wed, Dec 18, 2013 at 3:03 PM, Andres Freund wrote: > > That looks like a postgres compiled without > -fno-omit-frame-pointer. Without that hierarchical profiles are > meaningless. Very new perfs can also do it using dwarf, but it's unusabl > slow... > > Let me complete current test without the flag (to serve as a checkpoint as well). I will do a run with the flag after. Thanks, Dong
Re: [HACKERS] 9.3 regression with dbt2
Hi,

On 2013-12-18 14:59:45 -0500, Dong Ye wrote:
> ~20 minutes each run with binary.
> Try your patch now..
> You are right I used -g in perf record. But what I reported was flat (meant
> as a start).
>
> Expand GetMultiXactIdMembers:
>
> 3.82% postgres postgres [.] GetMultiXactIdMembers

That looks like a postgres compiled without -fno-omit-frame-pointer. Without that, hierarchical profiles are meaningless. Very new versions of perf can also do it using DWARF, but it's unusably slow...

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
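For anyone reproducing the profile Andres is asking for, a rough sketch of the suggested workflow (the build flags and perf invocations below are illustrative assumptions, not commands from the thread):

```shell
# Rebuild PostgreSQL with frame pointers so perf can walk call stacks
./configure CFLAGS="-O2 -fno-omit-frame-pointer"
make && make install

# Record call graphs system-wide for a while, then inspect callers
perf record -g -a -- sleep 60
perf report            # expand GetMultiXactIdMembers to see its callers

# On newer perf, DWARF-based unwinding works without the rebuild,
# but is considerably slower:
# perf record -g --call-graph dwarf -a -- sleep 60
```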
Re: [HACKERS] 9.3 regression with dbt2
~20 minutes each run with binary. Try your patch now..

You are right, I used -g in perf record, but what I reported was flat (meant as a start).

Expand GetMultiXactIdMembers:

 3.82%  postgres  postgres  [.] GetMultiXactIdMembers
        |
        |--9.09%-- GetMultiXactIdMembers
        |
        |--0.84%-- 0x48fb894853f58948
        |
        |--0.74%-- 0x4296e0004296c
        |          GetMultiXactIdMembers
        |
        |--0.64%-- 0x52f8d00052f8d
        |          GetMultiXactIdMembers
        |
        |--0.64%-- 0xf6ce8000f6ce8
        |          GetMultiXactIdMembers
        |
        |--0.62%-- 0x41de300041de1
        |          GetMultiXactIdMembers
        |
        |--0.61%-- 0xf2c77000f2c71
        |          GetMultiXactIdMembers
        |
        |--0.60%-- 0x3127700031275
        |          GetMultiXactIdMembers
        |
        |--0.59%-- 0x10c98b0010c987
        |          GetMultiXactIdMembers
        |
        |--0.59%-- 0x31df31df0
        |          GetMultiXactIdMembers
        |
        |--0.59%-- 0xbefbd000befbd
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0xfe97c000fe976
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0x82501000824f9
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0x3a4410003a43c
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0x3b0cf0003b0c3
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0x5325f0005325b
        |          GetMultiXactIdMembers
        |
        |--0.58%-- 0x7b6b80007b6b8
        |          GetMultiXactIdMembers
        |
        |--0.57%-- 0x52e9b00052e9b
        |          GetMultiXactIdMembers
        |
        |--0.57%-- 0xf3d45000f3d40
        |          GetMultiXactIdMembers
        |
        |--0.57%-- 0x27afd00027afa
        |          GetMultiXactIdMembers
        |
        |--0.57%-- 0x3244d0003244d
        |          GetMultiXactIdMembers
        |
        |--0.56%-- 0x53e0d00053e06
        |          GetMultiXactIdMembers
        |
        |--0.56%-- 0xb64c6000b64bc
        |          GetMultiXactIdMembers
        |
        |--0.56%-- 0x423f1000423ef
        |          GetMultiXactIdMembers
        |
        |--0.56%-- 0xc18f2000c18ed
        |          GetMultiXactIdMembers
        |
        |--0.56%-- 0x6bdcf0006bdcd
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0xc6d25000c6d25
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0xf6534000f6534
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0x10bba80010bba0
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0xb5a76000b5a6e
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0x2d3c10002d3b5
        |          GetMultiXactIdMembers
        |
        |--0.55%-- 0xcc095000cc095
        |          GetMultiXactIdMembers
        |
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/13, 1:42 PM, Kevin Grittner wrote:
> Jim Nasby wrote:
>> This is another case where it would be very useful to restrict
>> what relations a transaction (or in this case, a subtransaction)
>> can access. If we had the ability to make that restriction then
>> we could force assertions that aren't plain SQL to explicitly
>> specify what tables the assert is going to hit, and if the assert
>> tries to do something different then we throw an error.
>>
>> The ability to restrict object access within a transaction would
>> also benefit VACUUM and possibly the Changeset stuff.
>
> I'm pretty sure that SSI could also optimize based on that,
> although there are probably about 10 other optimizations that would
> be bigger gains before getting to that.

Any ideas how hard this would be?

My thought is that we might be able to perform this check in the functions that do catalog lookups, but I'm worried that that wouldn't allow us to support subtransaction checks (which we'd need for assertions), and it runs the risk of long-lasting object references spanning the transaction (or subtransaction) and thereby thwarting the check.

Another option would be in heap accessor functions, but that means we could only restrict access to tables. For assertions, it would be nice to also disallow access to functions that could have unintended consequences that could violate the assertion (like dblink).

--
Jim C. Nasby, Data Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 2013-12-18 16:39:58 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> It would only force serialization for transactions that modify tables
>> covered by the assert, that doesn't seem too bad. Anything covered by an
>> assert shouldn't be modified frequently, otherwise you'll run into major
>> performance problems.
>
> Well, as presented there is no way (for the system) to tell which tables
> are covered by an assertion, is there? That's my point.

Well, the patch's syntax seems to only allow directly specifying a SQL query to check - we could iterate over the query tree to gather all related tables and reject any function we do not understand.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
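The iteration Andres describes could look roughly like the following toy model. This is a hypothetical stand-alone sketch, not PostgreSQL's real Query/RangeTblEntry structures: walk the tree, collect plain relations, and bail out on any node kind (such as a function call) that we cannot prove safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node kinds standing in for RTE_RELATION and friends. */
typedef enum { NODE_RELATION, NODE_SUBQUERY, NODE_FUNCTION } NodeKind;

typedef struct QueryNode
{
	NodeKind kind;
	unsigned relid;                 /* valid for NODE_RELATION */
	struct QueryNode *children[4];  /* NULL-terminated, for NODE_SUBQUERY */
} QueryNode;

/* Collect relation ids into relids[]; return false on any node we do
 * not understand, which would make the assertion rejected/misdeclared. */
static bool
collect_relids(const QueryNode *node, unsigned *relids, int *nrelids, int max)
{
	if (node == NULL)
		return true;
	switch (node->kind)
	{
		case NODE_RELATION:
			if (*nrelids < max)
				relids[(*nrelids)++] = node->relid;
			return true;
		case NODE_SUBQUERY:
			for (int i = 0; i < 4 && node->children[i] != NULL; i++)
				if (!collect_relids(node->children[i], relids, nrelids, max))
					return false;
			return true;
		default:
			return false;   /* reject functions we cannot analyze */
	}
}
```

A real implementation would instead recurse through the parsed query's range table and flag anything (like a dblink call) whose table footprint is unknowable.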
Re: [HACKERS] [PATCH] SQL assertions prototype
Jim Nasby wrote:

> This is another case where it would be very useful to restrict
> what relations a transaction (or in this case, a subtransaction)
> can access. If we had the ability to make that restriction then
> we could force assertions that aren't plain SQL to explicitly
> specify what tables the assert is going to hit, and if the assert
> tries to do something different then we throw an error.
>
> The ability to restrict object access within a transaction would
> also benefit VACUUM and possibly the Changeset stuff.

I'm pretty sure that SSI could also optimize based on that, although there are probably about 10 other optimizations that would be bigger gains before getting to that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
Andres Freund wrote:
> On 2013-12-18 13:44:15 -0300, Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>
>>> Ah, I see. You don't need to block anyone else from modifying the
>>> table, you just need to block anyone else from committing a
>>> transaction that had modified the table. So the locks shouldn't
>>> interfere with regular table locks. A ShareUpdateExclusiveLock on
>>> the assertion should do it.
>>
>> Causing serialization of transaction commit just because a single
>> assertion exists in the database seems too much of a hit, so looking for
>> optimization opportunities seems appropriate.
>
> It would only force serialization for transactions that modify tables
> covered by the assert, that doesn't seem too bad. Anything covered by an
> assert shouldn't be modified frequently, otherwise you'll run into major
> performance problems.

Well, as presented there is no way (for the system) to tell which tables are covered by an assertion, is there? That's my point.

--
Álvaro Herrera	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/13, 10:44 AM, Alvaro Herrera wrote:
> It might prove useful to note that any given assertion involves tables
> A, B, C. If a transaction doesn't modify any of those tables then
> there's no need to run the assertion when the transaction commits,
> skipping acquisition of the assertion lock. Probably this can only be
> implemented for SQL-language assertions, but even so it might be worth
> considering. (Assertions in any other language would be checked by
> every transaction.)

This is another case where it would be very useful to restrict what relations a transaction (or in this case, a subtransaction) can access. If we had the ability to make that restriction then we could force assertions that aren't plain SQL to explicitly specify what tables the assert is going to hit, and if the assert tries to do something different then we throw an error.

The ability to restrict object access within a transaction would also benefit VACUUM and possibly the Changeset stuff. From http://www.postgresql.org/message-id/52acaafd.6060...@nasby.net:

"This would be useful when you have some tables that see very frequent updates/deletes in a database that also has to support long-running transactions that don't hit those tables. You'd explicitly limit the tables your long-running transaction will touch and that way vacuum can ignore the long-running XID when calculating minimum tuple age for the heavy-hit tables."

--
Jim C. Nasby, Data Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/13, 11:57 AM, Josh Berkus wrote:
>> On 12/18/2013 08:44 AM, Alvaro Herrera wrote:
>> Another thought: at the initial run of the assertion, note which tables
>> it locked, and record this as an OID array in the catalog row for the
>> assertion; consider running the assertion only when those tables are
>> touched. This doesn't work if the assertion code locks some tables when
>> run under certain conditions and other tables under different
>> conditions. But then this can be checked too: if an assertion lists in
>> its catalog row that it involves tables A, B, C and then, under
>> different conditions, it tries to acquire lock on table D, have the
>> whole thing fail indicating that the assertion is misdeclared.
>
> This sounds like you're re-inventing SSI. SERIALIZABLE mode *exists* in
> order to be able to enforce constraints which potentially involve more
> than one transaction. "Balance can never go below 0", for example. The
> whole reason we have this really cool and unique SSI mode is so that we
> can do such things without killing performance.

These sorts of requirements are ideally suited to Assertions, so it's logically consistent to require Serializable mode in order to use Assertions.

The flip-side is that now you can get serialization failures, and I think there's a ton of software that has no clue how to deal with that. So now you don't get to use assertions at all unless you re-engineer your application (but see below). Now, if Postgres could re-attempt serialization failures, maybe that would become a non-issue... (though, there's probably a lot of dangers in doing that...)

> This is consistent with how we treat the interaction of constraints and
> triggers; under some circumstances, we allow triggers to violate CHECK
> and FK constraints.

We do? Under what circumstances?

> Alternately, we add a GUC assertion_serializable_mode, which can be
> "off", "warn" or "error". If it's set to "error", and the user triggers
> an assertion while in READ COMMITTED mode, an exception occurs. If it's
> set to "off", then assertions are disabled, in order to deal with buggy
> assertions. Now, it would be even better if we could prevent users from
> switching transaction mode, but that's a MUCH bigger and more
> complicated patch.

Another possibility is to allow for two different types of assertions, one based on SSI and one based on locking.

--
Jim C. Nasby, Data Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SQL assertions prototype
On 2013-12-18 13:44:15 -0300, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>
>> Ah, I see. You don't need to block anyone else from modifying the
>> table, you just need to block anyone else from committing a
>> transaction that had modified the table. So the locks shouldn't
>> interfere with regular table locks. A ShareUpdateExclusiveLock on
>> the assertion should do it.
>
> Causing serialization of transaction commit just because a single
> assertion exists in the database seems too much of a hit, so looking for
> optimization opportunities seems appropriate.

It would only force serialization for transactions that modify tables covered by the assert, that doesn't seem too bad. Anything covered by an assert shouldn't be modified frequently, otherwise you'll run into major performance problems.

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 regression with dbt2
Hello,

On 2013-12-18 10:24:56 -0800, Dong Ye wrote:
> It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit
> commit.

How long does a run take to verify the problem? Could you retry with the patch attached to http://www.postgresql.org/message-id/20131201114514.gg18...@alap2.anarazel.de ? Based on the theory that it creates many superfluous multixacts.

> Flat perf profiles of two such runs look like:

Those aren't really flat profiles tho ;)

> 0ac:
>
> Samples: 706K of event 'cycles', Event count (approx.): 6690377376522
>
> + 3.82% postgres postgres [.] GetMultiXactIdMembers

Could you expand that one some levels, so we see the callers?

Greetings,

Andres Freund

--
Andres Freund	http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.3 regression with dbt2
Hi,

We recently observed a ~15% performance regression with dbt2 from PG 9.3. We narrowed it down by testing master between the 9.2 cut and the 9.3 cut. It seems that 0ac5ad5134f2769ccbaefec73844f8504c4d6182 is the culprit commit. We did several runs and perf profiling comparing it against its parent (f925c79b9f36c54b67053ade5ad225a75b8dc803). We also tested a 12/16 commit on the master (3b97e6823b949624afdc3ce4c92b29a80429715f) once; it performed similarly to 0ac.

Regards,
Dong

Results:
f92: 53k-56k'ish notpm
0ac: 47k-48k'ish notpm

Server SUT:
HP ML350 G6
Two Xeon E5520 (4c/p, 8 cores total, hyper-threading disabled)
12GB DRAM
HP P410i RAID controller (256MB battery-backed cache)
- three 10k-rpm SAS: /
- three 10k-rpm SAS: /pgdata
- one 15k-rpm SAS: /pgxlog
- ext4 (rw,relatime,data=ordered) on all mounts.
Fedora 19 (3.11.10-200.fc19.x86_64)

max_connections=100
shared_buffers=8192MB
effective_cache_size=10GB
temp_buffers=8186kB
work_mem=4093kB
maintenance_work_mem=399MB
wal_buffers=-1
checkpoint_segments=300
checkpoint_completion_target=0.9
logging_collector=on
log_timezone=UTC
datestyle='iso, mdy'
lc_messages=C
lc_monetary=C
lc_numeric=C
lc_time=C
default_text_search_config='pg_catalog.english'
listen_addresses='*'
log_destination=csvlog
log_directory=pg_log
log_filename='pg-%a'
log_rotation_age=1440
log_truncate_on_rotation=on

Client and workload:
Dell 390. Two core. Direct connect with the Server SUT.
dbt2 (ToT)
40 warehouse
8 terminals, 8 connections
zero think/key time
12-min run

Flat perf profiles of two such runs look like:

f92:

Samples: 608K of event 'cycles', Event count (approx.): 6679607097416
+   4.04%  postgres  postgres            [.] heap_hot_search_buffer
+   3.63%  postgres  postgres            [.] AllocSetAlloc
+   3.37%  postgres  postgres            [.] hash_search_with_hash_value
+   2.85%  postgres  postgres            [.] _bt_compare
+   2.67%  postgres  postgres            [.] SearchCatCache
+   2.46%  postgres  postgres            [.] LWLockAcquire
+   2.16%  postgres  postgres            [.] XLogInsert
+   2.08%  postgres  postgres            [.] PinBuffer
+   1.32%  postgres  postgres            [.] ExecInitExpr
+   1.31%  postgres  libc-2.17.so        [.] _int_malloc
+   1.29%  swapper   [kernel.kallsyms]   [k] intel_idle
+   1.23%  postgres  postgres            [.] MemoryContextAllocZeroAligned
+   1.13%  postgres  postgres            [.] heap_page_prune_opt
+   1.06%  postgres  libc-2.17.so        [.] __memcpy_ssse3_back
+   1.02%  postgres  postgres            [.] LWLockRelease
+   0.94%  postgres  postgres            [.] copyObject
+   0.89%  postgres  postgres            [.] fmgr_info_cxt_security
+   0.82%  postgres  postgres            [.] _bt_checkkeys
+   0.81%  postgres  postgres            [.] hash_any
+   0.73%  postgres  postgres            [.] FunctionCall2Coll
+   0.69%  postgres  libc-2.17.so        [.] __strncpy_sse2_unaligned
+   0.67%  postgres  postgres            [.] HeapTupleSatisfiesMVCC
+   0.66%  postgres  postgres            [.] MemoryContextAlloc
+   0.65%  postgres  postgres            [.] expression_tree_walker
+   0.59%  postgres  postgres            [.] check_stack_depth
+   0.57%  postgres  libc-2.17.so        [.] __printf_fp
+   0.56%  postgres  libc-2.17.so        [.] _int_free
+   0.52%  postgres  postgres            [.] base_yyparse

0ac:

Samples: 706K of event 'cycles', Event count (approx.): 6690377376522
+   3.82%  postgres  postgres            [.] GetMultiXactIdMembers
+   3.43%  postgres  postgres            [.] LWLockAcquire
+   3.31%  postgres  postgres            [.] hash_search_with_hash_value
+   3.09%  postgres  postgres            [.] heap_hot_search_buffer
+   3.00%  postgres  postgres            [.] AllocSetAlloc
+   2.56%  postgres  postgres            [.] _bt_compare
+   2.19%  postgres  postgres            [.] PinBuffer
+   2.13%  postgres  postgres            [.] SearchCatCache
+   1.99%  postgres  postgres            [.] XLo
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 08:44 AM, Alvaro Herrera wrote:
> Another thought: at the initial run of the assertion, note which tables
> it locked, and record this as an OID array in the catalog row for the
> assertion; consider running the assertion only when those tables are
> touched. This doesn't work if the assertion code locks some tables when
> run under certain conditions and other tables under different
> conditions. But then this can be checked too: if an assertion lists in
> its catalog row that it involves tables A, B, C and then, under
> different conditions, it tries to acquire lock on table D, have the
> whole thing fail indicating that the assertion is misdeclared.

This sounds like you're re-inventing SSI. SERIALIZABLE mode *exists* in order to be able to enforce constraints which potentially involve more than one transaction. "Balance can never go below 0", for example. The whole reason we have this really cool and unique SSI mode is so that we can do such things without killing performance.

I'm leaning towards the alternative that Assertions require SERIALIZABLE mode, and throw a WARNING at the user and the log every time we create, modify, or trigger an assertion while not in SERIALIZABLE mode. And beyond that, we don't guarantee the integrity of Assertions if people choose to run in READ COMMITTED anyway. This is consistent with how we treat the interaction of constraints and triggers; under some circumstances, we allow triggers to violate CHECK and FK constraints.

Alternately, we add a GUC assertion_serializable_mode, which can be "off", "warn" or "error". If it's set to "error", and the user triggers an assertion while in READ COMMITTED mode, an exception occurs. If it's set to "off", then assertions are disabled, in order to deal with buggy assertions.

Now, it would be even better if we could prevent users from switching transaction mode, but that's a MUCH bigger and more complicated patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewarm status
On Tue, Dec 17, 2013 at 7:05 PM, Cédric Villemain wrote: > Le mardi 17 décembre 2013 21:14:44, Josh Berkus a écrit : >> On 12/17/2013 06:34 AM, Robert Haas wrote: >> > On Tue, Dec 17, 2013 at 12:09 AM, Amit Kapila > wrote: >> >> I have used pg_prewarm during some of work related to Buffer Management >> >> and other performance related work. It is quite useful utility. >> >> +1 for reviving this patch for 9.4 >> > >> > Any other votes? >> >> I still support this patch (as I did originally), and don't think that >> the overlap with pgFincore is of any consequence. pgFincore does more >> than pgrewarm ever will, but it's also platform-specific, so it still >> makes sense for both to exist. > > Just for information, pgFincore is NOT limited to linux (the most interesting > part, the memory snapshot, works also on BSD based kernels with mincore() > syscall). > Like for the PostgreSQL effective_io_concurrency (and pg_warm) it just doesn't > work when posix_fadvise is not available. This is a fair point, and I should not have implied that it was Linux-only. What I really meant was "does not support Windows", because that's a really big part of our user base. However, I shouldn't have phrased it in a way that slights BSD and other UNIX variants. Now that we have dynamic background workers, I've been thinking that it might be possible to write a background worker to do asynchronous prefetch on systems where we don't have OS support. We could store a small ring buffer in shared memory, say big enough for 1k entries. Each entry would consist of a relfilenode, a starting block number, and a block count. We'd also store a flag indicating whether the prefetch worker has been registered with the postmaster, and a pointer to the PGPROC of any running worker. When a process wants to do a prefetch, it locks the buffer, adds its prefetch request to the queue (overwriting the oldest existing request if the queue is already full), and checks the flag. 
If the flag is not set, it also registers the background worker. Then, it releases the lock and sets the latch of any running worker (whose PGPROC it remembered before releasing the lock). When the prefetch process starts up, it services requests from the queue by reading the requested blocks (or block ranges). When the queue is empty, it sleeps. If it receives no requests for some period of time, it unregisters itself and exits. This is sort of a souped-up version of the hibernation facility we already have for some auxiliary processes, in that we don't just make the process sleep for a longer period of time but actually get rid of it altogether. All of this might be overkill; we could also do it with a permanent auxiliary process. But it's sort of a shame to run an extra process for a facility that might never get used, or might be used only rarely. And I'm wary of cluttering things up with a thicket of auxiliary processes each of which caters only to a very specific, very narrow situation. Anyway, just thinking out loud here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
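The queue behavior described above (a fixed ~1k-entry ring where a full queue overwrites the oldest request) can be sketched as follows. This is a hypothetical stand-alone model, leaving out the shared-memory placement, locking, worker registration, and latch signaling:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PREFETCH_QUEUE_SIZE 1024

/* One prefetch request: which relation, where to start, how much. */
typedef struct
{
	uint32_t relfilenode;   /* which relation */
	uint32_t start_block;   /* first block to read */
	uint32_t block_count;   /* number of consecutive blocks */
} PrefetchRequest;

typedef struct
{
	PrefetchRequest entries[PREFETCH_QUEUE_SIZE];
	int head;      /* next slot to write */
	int count;     /* valid entries, capped at PREFETCH_QUEUE_SIZE */
} PrefetchQueue;

/* Add a request; if the queue is full, the oldest request is lost. */
static void
prefetch_enqueue(PrefetchQueue *q, PrefetchRequest req)
{
	q->entries[q->head] = req;
	q->head = (q->head + 1) % PREFETCH_QUEUE_SIZE;
	if (q->count < PREFETCH_QUEUE_SIZE)
		q->count++;
}

/* Worker side: take the oldest request, or return false and sleep. */
static bool
prefetch_dequeue(PrefetchQueue *q, PrefetchRequest *out)
{
	if (q->count == 0)
		return false;
	/* The oldest entry sits 'count' slots behind the write position. */
	int tail = (q->head - q->count + PREFETCH_QUEUE_SIZE) % PREFETCH_QUEUE_SIZE;
	*out = q->entries[tail];
	q->count--;
	return true;
}
```

In the design sketched in the mail, a backend would take a lock before calling the enqueue step, set the worker's latch afterwards, and register the background worker if the "worker registered" flag is not yet set.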
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Dec 10, 2013 at 6:26 PM, Tom Lane wrote:
> Noah Misch writes:
>> On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
>>> Let's not add more cases like that, if we can avoid it.
>>
>> Only if we can avoid it for a modicum of effort and feature compromise.
>> You're asking for PostgreSQL to reshape its use of persistent resources so you
>> can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
>> memory leak. That use case, not PostgreSQL, has the defect here.
>
> The larger point is that such a shutdown process has never in the history
> of Postgres been successful at removing shared-memory (or semaphore)
> resources. I do not feel a need to put a higher recoverability standard
> onto the DSM code than what we've had for fifteen years with SysV shmem.
>
> But, by the same token, it shouldn't be any *less* recoverable. In this
> context that means that starting a new postmaster should recover the
> resources, at least 90% of the time (100% if we still have the old
> postmaster lockfile). I think the idea of keeping enough info in the
> SysV segment to permit recovery of DSM resources is a good one. Then,
> any case where the existing logic is able to free the old SysV segment
> is guaranteed to also work for DSM.

So I'm taking a look at this. There doesn't appear to be anything intrinsically intractable here, but it seems important to time the order of operations so as to minimize the chances that things fail in awkward places. The point where we remove the old shared memory segment from the previous postmaster invocation is here:

	/*
	 * The segment appears to be from a dead Postgres process, or from a
	 * previous cycle of life in this same process. Zap it, if possible.
	 * This probably shouldn't fail, but if it does, assume the segment
	 * belongs to someone else after all, and continue quietly.
	 */
	shmdt(memAddress);
	if (shmctl(shmid, IPC_RMID, NULL) < 0)
		continue;

My first thought was to remember the control segment ID from the header just before the shmdt() there, and then later when the DSM module is starting, do cleanup. But that doesn't seem like a good idea, because then if startup fails after we remove the old shared memory segment and before we get to the DSM initialization code, we'll have lost the information on what control segment needs to be cleaned up. A subsequent startup attempt won't see the old shm again, because it's already gone. I'm fairly sure that this would be a net reduction in reliability vs. the status quo.

So now what I'm thinking is that we ought to actually perform the DSM cleanup before detaching the old segment and trying to remove it. That shouldn't fail, but if it does, or if we get killed before completing it, the next run will hopefully find the same old shm and finish the cleanup. But that kind of flies in the face of the comment above: if we perform DSM cleanup and then discover that the segment wasn't ours after all, that means we just stomped all over somebody else's stuff. That's not too good. But trying to remove the segment first and then perform the cleanup creates a window where, if we get killed, the next restart will have lost information about how to finish cleaning up. So it seems that some kind of logic tweak is needed here, but I'm not sure exactly what. As I see it, the options are:

1. Make failure to remove the shared memory segment we thought was ours an error. This will definitely show up any problems, but only after we've burned down some other process's dynamic shared memory segments. The most likely outcome is creating failure-to-start problems that don't exist today.

2. Make it a warning. We'll still burn down somebody else's DSMs, but at least we'll still start ourselves. Sadly, "WARNING: You have just done a bad thing. It's too late to fix it. Sorry!" is not very appealing.

3. Remove the old shared memory segment first, then perform the cleanup immediately afterwards. If we get killed before completing the cleanup, we'll leak the un-cleaned-up stuff. Decide that's OK and move on.

4. Adopt some sort of belt-and-suspenders approach, keeping the state file we have now and backstopping it with this mechanism, so that we really only need this to work when $PGDATA has been blown away and recreated. This seems pretty inelegant, and I'm not sure who it'll benefit other than those (few?) people who kill -9 the postmaster (or it segfaults or otherwise dies without running the code to remove shared memory segments) and then remove $PGDATA and then re-initdb and then expect cleanup to find the old DSMs.

5. Give up on this approach. We could keep what we have now, or make the DSM control segment land at a well-known address as we do for the main segment.

What do people prefer?

The other thing I'm a bit squidgy about is injecting more code that runs before we get the new shared memory segment up and r
Re: [HACKERS] pg_rewarm status
Le mardi 17 décembre 2013 21:14:44, Josh Berkus a écrit : > On 12/17/2013 06:34 AM, Robert Haas wrote: > > On Tue, Dec 17, 2013 at 12:09 AM, Amit Kapila wrote: > >> I have used pg_prewarm during some of work related to Buffer Management > >> and other performance related work. It is quite useful utility. > >> +1 for reviving this patch for 9.4 > > > > Any other votes? > > I still support this patch (as I did originally), and don't think that > the overlap with pgFincore is of any consequence. pgFincore does more > than pgrewarm ever will, but it's also platform-specific, so it still > makes sense for both to exist. Just for information, pgFincore is NOT limited to linux (the most interesting part, the memory snapshot, works also on BSD based kernels with mincore() syscall). Like for the PostgreSQL effective_io_concurrency (and pg_warm) it just doesn't work when posix_fadvise is not available. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation signature.asc Description: This is a digitally signed message part.
Re: [HACKERS] pg_rewarm status
Le mardi 17 décembre 2013 17:45:51, Robert Haas a écrit :
> On Tue, Dec 17, 2013 at 11:02 AM, Jim Nasby wrote:
>> On 12/17/13, 8:34 AM, Robert Haas wrote:
>>> On Tue, Dec 17, 2013 at 12:09 AM, Amit Kapila wrote:
>>>> I have used pg_prewarm during some work related to Buffer Management and
>>>> other performance related work. It is quite a useful utility.
>>>> +1 for reviving this patch for 9.4
>>>
>>> Any other votes?
>>
>> We've had to manually code something that runs EXPLAIN ANALYZE SELECT *
>> from a bunch of tables to warm our caches after a restart, but there are
>> numerous flaws to that approach obviously.
>>
>> Unfortunately, what we really need to warm isn't the PG buffers, it's the
>> FS cache, which I suspect this won't help. But I still see where just
>> pg_buffers would be useful for a lot of folks, so +1.
>
> It'll do either one. For the FS cache, on Linux, you can also use
> pgfincore.

On Linux, and *BSD (including OS X), like what's in PostgreSQL. Only Windows is out of scope so far. There is a solution for Windows too; there has just been no demand for it from pgfincore users. Maybe you can add Windows support in PostgreSQL now?

-- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Re: [HACKERS] RFC: Async query processing
On 11/04/2013 02:51 AM, Claudio Freire wrote:
> On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer wrote:
>> I would like to add truly asynchronous query processing to libpq,
>> enabling command pipelining. The idea is to allow applications to
>> auto-tune to the bandwidth-delay product and reduce the number of
>> context switches when running against a local server.
>> ...
>> If the application is not interested in intermediate query results, it
>> would use something like this: ... If there is no need to exit from the
>> loop early (say, because errors are expected to be extremely rare), the
>> PQgetResultNoWait call can be left out.
>
> It doesn't seem wise to me to make such a distinction. It sounds like
> you're oversimplifying, and that's why you need "modes", to overcome the
> evidently restrictive limits of the simplified interface, and it would
> only be a matter of (a short) time before some other limitation requires
> some other mode.

I need modes because I want to avoid unbounded buffering, which means that result data has to be consumed in the order queries are issued.

>>     PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
>>     bool more_data;
>>     do {
>>         more_data = ...;
>>         if (more_data) {
>>             int ret = PQsendQueryParams(conn,
>>                 "INSERT ... RETURNING ...", ...);
>>             if (ret == 0) {
>>                 // handle low-level error
>>             }
>>         }
>>         // Consume all pending results.
>>         while (1) {
>>             PGresult *res;
>>             if (more_data) {
>>                 res = PQgetResultNoWait(conn);
>>             } else {
>>                 res = PQgetResult(conn);
>>             }
>
> Somehow, that code looks backwards. I mean, really backwards. Wouldn't
> that be !more_data?

No, if more data is available to transfer to the server, the no-wait variant has to be used to avoid a needless synchronization with the server.

> In any case, pipelining like that, without a clear distinction, in the
> wire protocol, of which results pertain to which query, could be a recipe
> for trouble when subtle bugs, either in lib usage or implementation,
> mistakenly treat one query's result as another's.
We already use pipelining in libpq (see pqFlush, PQsendQueryGuts and pqParseInput3), the server is supposed to support it, and there is a lack of a clear tit-for-tat response mechanism anyway because of NOTIFY/LISTEN and the way certain errors are reported.

Instead of buffering the results, we could buffer the encoded command messages in PQASYNC_RESULT mode. This means that PQsendQueryParams would not block when it cannot send the (complete) command message, but store it in the connection object so that the subsequent PQgetResultNoWait and PQgetResult would send it.

This might work better with single-tuple result mode. We cannot avoid buffering either multiple queries or multiple responses if we want to utilize the link bandwidth, or we'd risk deadlocks.

This is a non-solution. Such an implementation, at least as described, would remove neither network latency nor context switches; it would be a pure API change with no externally visible behavior change.

Ugh, why?

An effective solution must include multi-command packets. Without knowing the wire protocol in detail, something like:

    PARSE: INSERT blah
    BIND: args
    EXECUTE with DISCARD
    PARSE: INSERT blah
    BIND: args
    EXECUTE with DISCARD
    PARSE: SELECT blah
    BIND: args
    EXECUTE with FETCH ALL

All in one packet, would be efficient and error-free (IMO).

No, because this doesn't scale automatically with the bandwidth-delay product. It also requires that the client buffer queries and their parameters even though the network has to do that anyway. In any case, I don't want to change the wire protocol, I just want to enable libpq clients to use more of its capabilities.

-- Florian Weimer / Red Hat Product Security Team

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 reference constraint regression
Alvaro Herrera wrote:
> Andres Freund wrote:
>
>> I have to say, it makes me really uncomfortable to take such
>> shortcuts. In other branches we are doing liveliness checks with
>> MultiXactIdIsRunning() et al. Why isn't that necessary here? And how
>> likely is that this won't cause breakage somewhere down the line because
>> somebody doesn't know of that subtlety?
>
> I came up with the attached last night, which should do the right thing
> in both cases.

That one had a silly bug, which I fixed and pushed now. Thanks for the feedback,

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] SQL assertions prototype
Heikki Linnakangas wrote:
> Ah, I see. You don't need to block anyone else from modifying the
> table, you just need to block anyone else from committing a
> transaction that had modified the table. So the locks shouldn't
> interfere with regular table locks. A ShareUpdateExclusiveLock on
> the assertion should do it.

Causing serialization of transaction commit just because a single assertion exists in the database seems too much of a hit, so looking for optimization opportunities seems appropriate. Here are some ideas for brainstorming.

It might prove useful to note that any given assertion involves tables A, B, C. If a transaction doesn't modify any of those tables then there's no need to run the assertion when the transaction commits, skipping acquisition of the assertion lock. Probably this can only be implemented for SQL-language assertions, but even so it might be worth considering. (Assertions in any other language would be checked by every transaction.)

Another thought: at the initial run of the assertion, note which tables it locked, and record this as an OID array in the catalog row for the assertion; consider running the assertion only when those tables are touched. This doesn't work if the assertion code locks some tables when run under certain conditions and other tables under different conditions. But then this can be checked too: if an assertion lists in its catalog row that it involves tables A, B, C and then, under different conditions, it tries to acquire lock on table D, have the whole thing fail indicating that the assertion is misdeclared.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] SQL assertions prototype
Josh Berkus wrote: > Serializable or not, *what* do we lock for assertions? It's not > rows. Tables? Which tables? What if the assertion is an > interpreted language function? Does the SSI reference counter > really take care of all of this? The simple answer is that, without adding any additional blocking, SSI guarantees that the behavior of running any set of concurrent serializable transactions is consistent with some serial (one-at-a-time) execution of those transactions. If the assertion is run as part of the transaction, it is automatically handled, regardless of the issues you are asking about. The longer answer is in the README and the papers it references: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/lmgr/README-SSI;hb=master For practical examples of how it works, this Wiki page includes examples of maintaining a multi-table invariant using triggers: http://wiki.postgresql.org/wiki/SSI -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logging WAL when updating hintbit
2013/12/14 0:14 "Tom Lane" :
> Heikki Linnakangas writes:
>> I'm not totally satisfied with the name of the GUC, wal_log_hintbits.
>
> Me either; at the very least, it's short an underscore: wal_log_hint_bits
> would be more readable. But how about just "wal_log_hints"?

+1 too, it's readable.

Masahiko Sawada
Re: [HACKERS] [PATCH] SQL assertions prototype
Heikki Linnakangas wrote: >On 12/18/2013 01:39 PM, Andres Freund wrote: >> On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote: >>> Here's another idea that doesn't involve SSI: >>> >>> At COMMIT, take a new snapshot and check that the assertion still passes >>> with that snapshot. Now, there's a race condition, if another transaction is >>> committing at the same time, and performs the same check concurrently. That >>> race condition can be eliminated by holding an exclusive lock while running >>> the assertion, until commit, effectively allowing the assertion to be >>> checked by only one transaction at a time. >>> >>> I think that would work, and would be simple, although it wouldn't scale too >>> well. >> >> It probably would also be very prone to deadlocks. > > Hmm, since this would happen at commit, you would know all the > assertions that need to be checked at that point. You could check them > e.g in Oid order to avoid deadlocks. So you would actually serialize all COMMITs, or at least all which had done anything which could affect the truth of an assertion? That seems like it would work, but I suspect that it would be worse for performance than SSI in many workloads, and better than SSI in other workloads. Maybe a GUC to determine which strategy is used? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.3 reference constraint regression
Andres Freund wrote:
> I have to say, it makes me really uncomfortable to take such
> shortcuts. In other branches we are doing liveliness checks with
> MultiXactIdIsRunning() et al. Why isn't that necessary here? And how
> likely is that this won't cause breakage somewhere down the line because
> somebody doesn't know of that subtlety?

I came up with the attached last night, which should do the right thing in both cases.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

*** a/src/backend/access/transam/multixact.c
--- b/src/backend/access/transam/multixact.c
***************
*** 1275,1280 **** retry:
--- 1275,1313 ----
  }
  
  /*
+  * MultiXactHasRunningRemoteMembers
+  *		Does the given multixact have still-live members from
+  *		transactions other than our own?
+  */
+ bool
+ MultiXactHasRunningRemoteMembers(MultiXactId multi)
+ {
+ 	MultiXactMember *members;
+ 	int			nmembers;
+ 	int			i;
+ 
+ 	nmembers = GetMultiXactIdMembers(multi, &members, true);
+ 	if (nmembers <= 0)
+ 		return false;
+ 
+ 	for (i = 0; i < nmembers; i++)
+ 	{
+ 		/* not interested in our own members */
+ 		if (TransactionIdIsCurrentTransactionId(members[i].xid))
+ 			continue;
+ 
+ 		if (TransactionIdIsInProgress(members[i].xid))
+ 		{
+ 			pfree(members);
+ 			return true;
+ 		}
+ 	}
+ 
+ 	pfree(members);
+ 	return false;
+ }
+ 
+ /*
  * mxactMemberComparator
  *		qsort comparison function for MultiXactMember
  *
*** a/src/backend/utils/time/tqual.c
--- b/src/backend/utils/time/tqual.c
***************
*** 686,693 **** HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  		return HeapTupleMayBeUpdated;
  
! 	if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))	/* not deleter */
! 		return HeapTupleMayBeUpdated;
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
--- 686,721 ----
  	if (tuple->t_infomask & HEAP_XMAX_INVALID)	/* xid invalid */
  		return HeapTupleMayBeUpdated;
  
! 	if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
! 	{
! 		TransactionId xmax;
! 
! 		xmax = HeapTupleHeaderGetRawXmax(tuple);
! 
! 		/*
! 		 * Careful here: even though this tuple was created by our own
! 		 * transaction, it might be locked by other transactions, if
! 		 * the original version was key-share locked when we updated
! 		 * it.
! 		 */
! 
! 		if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
! 		{
! 			if (MultiXactHasRunningRemoteMembers(xmax))
! 				return HeapTupleBeingUpdated;
! 			else
! 				return HeapTupleMayBeUpdated;
! 		}
! 
! 		/* if locker is gone, all's well */
! 		if (!TransactionIdIsInProgress(xmax))
! 			return HeapTupleMayBeUpdated;
! 
! 		if (!TransactionIdIsCurrentTransactionId(xmax))
! 			return HeapTupleBeingUpdated;
! 		else
! 			return HeapTupleMayBeUpdated;
! 	}
  
  	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
  	{
***************
*** 700,706 **** HeapTupleSatisfiesUpdate(HeapTupleHeader tuple, CommandId curcid,
--- 728,738 ----
  
  		/* updating subtransaction must have aborted */
  		if (!TransactionIdIsCurrentTransactionId(xmax))
+ 		{
+ 			if (MultiXactHasRunningRemoteMembers(xmax))
+ 				return HeapTupleBeingUpdated;
  			return HeapTupleMayBeUpdated;
+ 		}
  		else
  		{
  			if (HeapTupleHeaderGetCmax(tuple) >= curcid)
*** a/src/include/access/multixact.h
--- b/src/include/access/multixact.h
***************
*** 89,94 **** extern bool MultiXactIdIsRunning(MultiXactId multi);
--- 89,95 ----
  extern void MultiXactIdSetOldestMember(void);
  extern int	GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
  					  bool allow_old);
+ extern bool MultiXactHasRunningRemoteMembers(MultiXactId multi);
  extern bool MultiXactIdPrecedes(MultiXactId multi1, MultiXactId multi2);
  extern bool MultiXactIdPrecedesOrEquals(MultiXactId multi1,
  					MultiXactId multi2);
Re: [HACKERS] Logging WAL when updating hintbit
On Tue, Dec 17, 2013 at 10:22 PM, Amit Kapila wrote: >> Me either; at the very least, it's short an underscore: wal_log_hint_bits >> would be more readable. But how about just "wal_log_hints"? > > +1 for wal_log_hints, it sounds better. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] row security roadmap proposal
On Wed, Dec 18, 2013 at 3:30 AM, Craig Ringer wrote: > In my view the proposed patch doesn't offer a significant improvement in > declarative security, beyond what we can get by just adding update > support to s.b. views and using search_path to control whether a user > sees the view or the base table. > > It's a lot like Oracle Virtual Private Database (VPD): A set of low > level building blocks you can build your own flexible row security model > with. One where you have to very carefully write security-sensitive > predicates and define all your security model tables, etc yourself. > > That's why I'm now of the opinion that we should make it possible to > achieve the same thing with s.b. views and the search_path (by adding > update support)... then build a declarative row-security system that > doesn't require the user fiddling with delicate queries and sensitive > scripts on top of that. To be clear, I wasn't advocating for a declarative approach; I think predicates are fine. There are usability issues to worry about either way, and my concern is that we address those. A declarative approach would certainly be valuable in that, for those people for whom it is sufficiently flexible, it's probably quite a lot easier than writing predicates. But I fear that some people will want a lot more generality than a label-based system can accommodate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] row security roadmap proposal
On Tue, Dec 17, 2013 at 1:27 PM, Simon Riggs wrote: > Not sure I'd say required, but its certainly desirable to have > updateable security barrier views in themselves. And it comes across > to me as a cleaner and potentially more performant way of doing the > security checks for RLS. Yes, that's how I'm thinking of it. It's required in the sense that if we don't do it as a separate patch, we'll need to fold many of changes into the RLS patch, which IMV is not desirable. We'd end up with more complexity and less functionality with no real upside that I can see. But I think we are basically saying the same thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_rewarm status
On Tue, Dec 17, 2013 at 9:03 PM, KONDO Mitsumasa wrote:
> (2013/12/18 5:33), Robert Haas wrote:
>> Sounds like it might be worth dusting the patch off again...
>
> I'd like to request that you add an all_index option and a usage_count option.
> When the all_index option is selected, all indexes are prewarmed even though the
> user doesn't specify a relation name. And the usage_count option sets usage_count
> in shared_buffers, so useful buffers will stay resident longer and not be evicted
> easily. I think these are easy to implement and useful, so please add them if you
> like.

Prewarming indexes is useful, but I don't think we need to complicate the API for that. With the version I just posted, you can simply do something like this:

    SELECT pg_prewarm(indexrelid)
    FROM pg_index
    WHERE indrelid = 'pgbench_accounts'::regclass;

I seriously doubt whether being able to set the usage count is actually useful.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: SQL objects UNITs (was: [HACKERS] Extension Templates S03E11)
Stephen Frost escribió:
> * Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
>> Basically with building `UNIT` we realise with hindsight that we failed to
>> build a proper `EXTENSION` system, and we send that message to our users.
>
> Little difficult to draw conclusions about what our 'hindsight' will
> look like.

I haven't been keeping very close attention to this, but I fail to see why extensions are so much of a failure. Surely we can invent a new "kind" of extension, one whose contents specifically are dumped by pg_dump. Regular extensions, the kind we have today, still wouldn't be, but we could have a flag, say "CREATE EXTENSION ... (WITH DUMP)" or something. That way you don't have to come up with UNIT at all (or whatever). A whole new set of catalogs just to fix up a minor issue with extensions sounds a bit too much to me; we can just add this new thing on top of the existing infrastructure.

I didn't much like the WITH UNIT/END UNIT thingy. What's wrong with CREATE foo; ALTER EXTENSION ADD foo? There's a bit of a problem in that if you create the object and die before being able to add it to the extension, it would linger unreferenced; but that's easily fixable by doing the creation in a transaction, I think.

(Alternatively, we could have a single command that creates the extension and the contained objects in one fell swoop, similar to how CREATE SCHEMA can do it; but I'm not sure that's all that much better, and from a grammar POV it probably sucks.)

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_rewarm status
On Tue, Dec 17, 2013 at 12:35 PM, Jeff Janes wrote:
> Since it doesn't use directIO, you can't warm the PG buffers without also
> warming the FS cache as a side effect. That is why I like 'buffer' as the
> default--if the data fits in shared_buffers, it warms those, otherwise it at
> least warms the FS. If you want to only warm the FS cache, you can use
> either the 'prefetch' or 'read' modes instead.

All right, here is an updated patch. I swapped the second and third arguments, because I think overriding the prewarm mode will be a lot more common than overriding the relation fork. I also added defaults, so you can do this:

    SELECT pg_prewarm('pgbench_accounts');

Or this:

    SELECT pg_prewarm('pgbench_accounts', 'read');

I also fixed some oversights in the error checks. I'm not inclined to wait for the next CommitFest to commit this, because it's a very simple patch and has already had a lot more field testing than most patches get before they're committed. And it's just a contrib module, so the damage it can do if there is in fact a bug is pretty limited. All that having been said, any review is appreciated.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/contrib/Makefile b/contrib/Makefile
index 8a2a937..dd2683b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -32,6 +32,7 @@ SUBDIRS = \
 	pg_archivecleanup \
 	pg_buffercache \
 	pg_freespacemap \
+	pg_prewarm \
 	pg_standby \
 	pg_stat_statements \
 	pg_test_fsync \
diff --git a/contrib/pg_prewarm/Makefile b/contrib/pg_prewarm/Makefile
new file mode 100644
index 000..176a29a
--- /dev/null
+++ b/contrib/pg_prewarm/Makefile
@@ -0,0 +1,18 @@
+# contrib/pg_prewarm/Makefile
+
+MODULE_big = pg_prewarm
+OBJS = pg_prewarm.o
+
+EXTENSION = pg_prewarm
+DATA = pg_prewarm--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_prewarm
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_prewarm/pg_prewarm--1.0.sql b/contrib/pg_prewarm/pg_prewarm--1.0.sql
new file mode 100644
index 000..2bec776
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm--1.0.sql
@@ -0,0 +1,14 @@
+/* contrib/pg_prewarm/pg_prewarm--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_prewarm" to load this file. \quit
+
+-- Register the function.
+CREATE FUNCTION pg_prewarm(regclass,
+		mode text default 'buffer',
+		fork text default 'main',
+		first_block int8 default null,
+		last_block int8 default null)
+RETURNS int8
+AS 'MODULE_PATHNAME', 'pg_prewarm'
+LANGUAGE C;
diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
new file mode 100644
index 000..10317f3
--- /dev/null
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -0,0 +1,205 @@
+/*-
+ *
+ * pg_prewarm.c
+ *		prewarming utilities
+ *
+ * Copyright (c) 2010-2012, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/pg_prewarm/pg_prewarm.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include
+#include
+
+#include "access/heapam.h"
+#include "catalog/catalog.h"
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/smgr.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+extern Datum pg_prewarm(PG_FUNCTION_ARGS);
+
+PG_FUNCTION_INFO_V1(pg_prewarm);
+
+typedef enum
+{
+	PREWARM_PREFETCH,
+	PREWARM_READ,
+	PREWARM_BUFFER
+} PrewarmType;
+
+static char blockbuffer[BLCKSZ];
+
+/*
+ * pg_prewarm(regclass, mode text, fork text,
+ *			  first_block int8, last_block int8)
+ *
+ * The first argument is the relation to be prewarmed; the second controls
+ * how prewarming is done; legal options are 'prefetch', 'read', and 'buffer'.
+ * The third is the name of the relation fork to be prewarmed.  The fourth
+ * and fifth arguments specify the first and last block to be prewarmed.
+ * If the fourth argument is NULL, it will be taken as 0; if the fifth argument
+ * is NULL, it will be taken as the number of blocks in the relation.  The
+ * return value is the number of blocks successfully prewarmed.
+ */
+Datum
+pg_prewarm(PG_FUNCTION_ARGS)
+{
+	Oid			relOid;
+	text	   *forkName;
+	text	   *type;
+	int64		first_block;
+	int64		last_block;
+	int64		nblocks;
+	int64		blocks_done = 0;
+	int64		block;
+	Relation	rel;
+	ForkNumber	forkNumber;
+	char	   *forkString;
+	char	   *ttype;
+	PrewarmType ptype;
+	AclResult	aclresult;
+
+	/* Basic sanity checking. */
+	if (PG_ARGISNULL(0))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("relation cannot be null")));
+	relOid = PG_GETARG_OID(0);
+	if (PG_A
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
>> Is there any reason for the function to return int, as it always returns
>> 0 or 1? Maybe returning bool would be better?
>
> No, the return type should be bool; I have changed the same in the attached patch.

Confirmed.

>> 2) initdb.c
>>
>> + strcpy(tempautobuf, "# Do not edit this file manually! \n");
>> + autoconflines[0] = pg_strdup(tempautobuf);
>> + strcpy(tempautobuf, "# It will be overwritten by the ALTER SYSTEM command. \n");
>> + autoconflines[1] = pg_strdup(tempautobuf);
>>
>> Is there any reason to use "tempautobuf" here? I think we can simply change to this:
>>
>> + autoconflines[0] = pg_strdup("# Do not edit this file manually! \n");
>> + autoconflines[1] = pg_strdup("# It will be overwritten by the ALTER SYSTEM command. \n");
>
> You are right, I have changed the code as per your suggestion.

Confirmed.

>> 3) initdb.c
>>
>> It seems the memory allocated for autoconflines[0] and
>> autoconflines[1] by pg_strdup is never freed.
>
> I think it gets freed in writefile() in the code below.

Oh, I see. Sorry for the noise.

I have committed your patches. Thanks.

-- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] GIN improvements part 1: additional information
On 12/18/2013 01:45 PM, Alexander Korotkov wrote:
> On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas wrote:
>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>> 2) Storage would be easily extendable to hold additional information as
>>> well. Better compression shouldn't block more serious improvements.
>>
>> I'm not sure I agree with that. For all the cases where you don't care
>> about additional information - which covers all existing users, for
>> example - reducing disk size is pretty important. How are you planning to
>> store the additional information, and how does using another encoding get
>> in the way of that?
>
> I planned to store the additional information datums between the
> varbyte-encoded TIDs. I expected it would be hard to do with PFOR.
> However, I don't see significant problems in your implementation of
> Simple-9 encoding. I'm going to dig deeper into your version of the patch.

Ok, thanks.

I had another idea about the page format this morning. Instead of having the item-indexes at the end of the page, it would be more flexible to store a bunch of self-contained posting list "segments" on the page. So I propose that we get rid of the item-indexes, and instead store a bunch of independent posting lists on the page:

    typedef struct
    {
        ItemPointerData first;    /* first item in this segment (unpacked) */
        uint16          nwords;   /* number of words that follow */
        uint64          words[1]; /* var length */
    } PostingListSegment;

Each segment can be encoded and decoded independently. When searching for a particular item (like on insertion), you skip over segments where 'first' > the item you're searching for. This format offers a lot more flexibility compared to the separate item indexes. First, we don't need to have another fixed-size area on the page, which simplifies the page format. Second, we can more easily re-encode only one segment on the page, on insertion or vacuum.
The latter is particularly important with the Simple-9 encoding, which operates one word at a time rather than one item at a time; removing or inserting an item in the middle can require a complete re-encoding of everything that follows. Third, when a page is being inserted into and contains only uncompressed items, you don't waste any space for unused item indexes. While we're at it, I think we should use the above struct in the inline posting lists stored directly in entry tuples. That wastes a few bytes compared to the current approach in the patch (more alignment, and 'words' is redundant with the number of items stored on the tuple header), but it simplifies the functions handling these lists. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: SQL objects UNITs (was: [HACKERS] Extension Templates S03E11)
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote:
> Here's my attempt:
>
> # Inline Extension, Extension Templates
>
> The problem with *Inline Extension* is the dump and restore policy. The
> contents of an extension are not to be found in a `pg_dump` script, ever.

You keep coming back to this and I think you're taking too narrow a view of the comments made on the prior threads. No, we don't really want extensions which have .sql files out on disk somewhere as part of them to be dumped out through pg_dump, because then it becomes unclear which set of scripts should be used during restore. What we're talking about here is intended to not have that issue by using a different namespace, a flag, something which identifies these extensions as being defined through the catalog instead.

> # The new thing™
>
> A set of SQL objects that can be managed wholesale, with a version string
> attached to it. Objects are part of `pg_dump` output, the whole set can be
> relocatable, and has a version string attached.

I'd like to see more than just a single version string included, and I think that'd be beneficial for extensions too.

> Name: [...]
> I'll pick UNIT here. We can figure that later.
> Commands:
>
> CREATE UNIT name [ SCHEMA ... ] [ [ NOT ] RELOCATABLE ] [ REQUIRE ...];
>
> WITH UNIT name;
>
> END UNIT name;

Interesting approach - I had considered something similar by having a 'fake' schema created into which you built up the 'UNIT'. The reason I was thinking schema instead of begin/end style commands, as you have above, is because of questions around transactions. Do you think the syntax you have here would require the definition to be all inside of a single transaction? Do we feel that would even be an issue, or perhaps that it *should* be done that way? I don't currently have any strong feelings one way or the other on this and I'm curious what others think.

> The `UPDATE TO` command only sets a new version string.
So, one of the things I had been wondering about is if we could provide a 'diff' tool. Using your 'WITH UNIT' syntax above, an author might need to only write their initial implementation, build up a 'UNIT' based on it, then adjust that implementation with another 'WITH UNIT' clause and then ask PG for the differences. It's not clear if we could make that work, but there is definitely a set of desirable capabilities out there, which some other databases have, around automated upgrade script building and doing schema differences.

> # Implementation details
> # Event Trigger support

Not sure we're really ready to get into these yet.

> The main drawback is that rather than building on extensions, both in a
> technical way and in building user trust, we are basically going to
> deprecate extensions entirely, giving them a new name and an incompatible way
> to manage them.

I don't see this as ending up deprecating extensions, even if we build a new thing with a new name. I would argue that properly supported extensions, such as those in contrib and the other 'main' ones, like PostGIS, and others that have any external dependencies (eg: FDWs) would almost certainly continue as extensions and would be packaged through the normal OS packaging systems. While you plan to use the event trigger mechanism to build something on top of this which tries to act like extensions-but-not, I think that's an extremely narrow and limited use-case that very few people would have any interest in or use.

> Basically with building `UNIT` we realise with hindsight that we failed to
> build a proper `EXTENSION` system, and we send that message to our users.

Little difficult to draw conclusions about what our 'hindsight' will look like.

Thanks,

Stephen
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 06:00 AM, Heikki Linnakangas wrote:
> If you don't force everything to run in SSI mode, then you have to somehow
> figure out what parts do need to run in SSI mode to enforce the assertion.
> For example, if the assertion refers to tables A and B, perhaps you can get
> away without predicate locks on table C? But the assertion might simply run
> a function.

For non-trivial cases that's what I would expect people to do.

cheers

andrew
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
On 12/18/2013 03:35 AM, Tatsuo Ishii wrote: 3) initdb.c It seems the memory allocated for autoconflines[0] and autoconflines[1] by pg_strdup is never freed. (I think there's similar problem with "conflines" as well, though it was not introduced by the patch). Why would we care? initdb doesn't run for very long, and the memory will be freed when it exits, usually within a few seconds. My recollection from back when I originally rewrote initdb in C was that cleaning up the memory leaks wasn't worth the trouble. cheers andrew
Re: [HACKERS] stats for network traffic WIP
* Craig Ringer (cr...@2ndquadrant.com) wrote: > On 12/12/2013 02:51 AM, Tom Lane wrote: > > The thing that I'm wondering is why the database would be the right place > > to be measuring it at all. If you've got a network usage problem, > > aggregate usage across everything on the server is probably what you > > need to be worried about, and PG can't tell you that. > > I suspect this feature would be useful for when you want to try to drill > down and figure out what's having network issues - specifically, to > associate network behaviour with individual queries, individual users, > application_name, etc. > > One sometimes faces the same issue with I/O: I know PostgreSQL is doing > lots of I/O, but what exactly is causing the I/O? Especially if you > can't catch it at the time it happens, it can be quite tricky to go from > "there's lots of I/O" to "this query changed from using synchronized > seqscans to doing an index-only scan that's hammering the cache". Agreed. My other thought on this is that there's a lot to be said for having everything you need available through one tool- kinda like how Emacs users rarely go outside of it.. :) And then there's also the consideration that DBAs may not have access to the host system at all, or not to the level needed to do similar analysis there. Thanks, Stephen
Re: [HACKERS] sepgsql: label regression test failed
# semodule -l | grep sepgslq sepgsql-regtest 1.07 Full list of modules is in attachment. 2013/12/18 Kohei KaiGai > Could you show me semodule -l on your environment? > I believe security policy has not been changed between F19 and F20... > > Thanks, > > 2013/12/18 Sergey Muraviov : > > Hi > > > > I've tried to test postgres 9.3.2 and 9.4devel with selinux on Fedora 20 > and > > met with a label regression test failure. > > > > PS > > I've got some warning during build process. > > > > -- > > Best regards, > > Sergey Muraviov > -- > KaiGai Kohei > -- Best regards, Sergey Muraviov
Re: [HACKERS] sepgsql: label regression test failed
Could you show me semodule -l on your environment? I believe security policy has not been changed between F19 and F20... Thanks, 2013/12/18 Sergey Muraviov : > Hi > > I've tried to test postgres 9.3.2 and 9.4devel with selinux on Fedora 20 and > met with a label regression test failure. > > PS > I've got some warning during build process. > > -- > Best regards, > Sergey Muraviov -- KaiGai Kohei
[HACKERS] sepgsql: label regression test failed
Hi

I've tried to test postgres 9.3.2 and 9.4devel with selinux on Fedora 20 and met with a label regression test failure.

PS: I've got some warnings during the build process.

--
Best regards,
Sergey Muraviov
Re: [HACKERS] ALTER SYSTEM SET command to change postgresql.conf parameters
On Wed, Dec 18, 2013 at 2:05 PM, Tatsuo Ishii wrote:
> Hi,
>
> I have looked into this because it's marked as "ready for committer".

Thank you.

> It looks like it's working as advertised.

Great!

> However I have noticed a few minor issues.
>
> 1) validate_conf_option
>
> +/*
> + * Validates configuration parameter and value, by calling check hook functions
> + * depending on record's vartype. It validates if the parameter
> + * value given is in range of expected predefined value for that parameter.
> + *
> + * freemem - true indicates memory for newval and newextra will be
> + *           freed in this function, false indicates it will be freed by caller.
> + * Return value:
> + *   1: the value is valid
> + *   0: the name or value is invalid
> + */
> +int
> +validate_conf_option(struct config_generic * record, const char *name,
> +                     const char *value, GucSource source, int elevel,
> +                     bool freemem, void *newval, void **newextra)
>
> Is there any reason for the function to return int, as it always returns
> 0 or 1? Maybe returning bool is better?

No, the return type should be bool; I have changed that in the attached patch.

> 2) initdb.c
>
> +   strcpy(tempautobuf, "# Do not edit this file manually! \n");
> +   autoconflines[0] = pg_strdup(tempautobuf);
> +   strcpy(tempautobuf, "# It will be overwritten by the ALTER SYSTEM command. \n");
> +   autoconflines[1] = pg_strdup(tempautobuf);
>
> Is there any reason to use "tempautobuf" here? I think we can simply change to this:
>
> +   autoconflines[0] = pg_strdup("# Do not edit this file manually! \n");
> +   autoconflines[1] = pg_strdup("# It will be overwritten by the ALTER SYSTEM command. \n");

You are right; I have changed the code as per your suggestion.

> 3) initdb.c
>
> It seems the memory allocated for autoconflines[0] and
> autoconflines[1] by pg_strdup is never freed.

I think it gets freed in writefile(), in the code below.
for (line = lines; *line != NULL; line++)
{
    if (fputs(*line, out_file) < 0)
    {
        fprintf(stderr, _("%s: could not write file \"%s\": %s\n"),
                progname, path, strerror(errno));
        exit_nicely();
    }
    free(*line);
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] commit fest 2013-11 final report
On Tue, Dec 17, 2013 at 7:14 PM, Peter Eisentraut wrote: > On 12/17/13, 10:19 AM, Tom Lane wrote: >> Perhaps we should just move all the Needs Review and RFC patches forward >> to the next fest, so we don't forget about them? > > This was done the last few times, but it has caused some controversy. > One problem was that a number of patches arrived in this commit fest > without either the author or the reviewers knowing about it, which > caused the already somewhat stale patch to become completely abandoned. > > I think what I'll do is send an email to each of the affected patch > threads describing the situation. But I'd like someone involved in the > patch, either author or reviewer, to make the final call about moving > the patch forward. +1. And thanks for your work on this CommitFest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PoC: Partial sort
On Sat, Dec 14, 2013 at 11:47 PM, Andreas Karlsson wrote:
> On 12/14/2013 10:59 AM, Alexander Korotkov wrote:
>> This patch allows using an index for order-by if the order-by clause and the
>> index have a non-empty common prefix. So, the index gives the right ordering
>> for the first n order-by columns. In order to provide the right order for the
>> remaining m columns, a sort node is inserted. This sort node sorts groups of
>> tuples where the values of the first n order-by columns are equal.
>
> I recently looked at the same problem. I see that you solved the
> rescanning problem by simply forcing the sort to be redone on
> ExecReScanSort if you have done a partial sort.

Naturally, I'm not sure I solved it at all :) I just got a version of the patch working for very limited use-cases.

> My idea for a solution was to modify tuplesort to allow storing the
> already sorted keys in either memtuples or the sort result file, but
> setting a field so it does not sort the already sorted tuples again. This
> would allow the rescan to work as it used to, but I am unsure how clean or
> ugly this code would be. Was this something you considered?

I'm not sure. I believe the best answer depends on several parameters: how much memory we have for the sort, how expensive the underlying node is and how it performs rescan, and how big the groups in the partial sort are.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 01:50 PM, Andres Freund wrote: On 2013-12-18 13:46:59 +0200, Heikki Linnakangas wrote: On 12/18/2013 01:39 PM, Andres Freund wrote: On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote: Here's another idea that doesn't involve SSI: At COMMIT, take a new snapshot and check that the assertion still passes with that snapshot. I think that would work, and would be simple, although it wouldn't scale too well. It probably would also be very prone to deadlocks. Hmm, since this would happen at commit, you would know all the assertions that need to be checked at that point. You could check them e.g. in Oid order to avoid deadlocks. I think the real problem is the lock upgrades, because any earlier DML will have acquired less heavy locks. Ah, I see. You don't need to block anyone else from modifying the table, you just need to block anyone else from committing a transaction that had modified the table. So the locks shouldn't interfere with regular table locks. A ShareUpdateExclusiveLock on the assertion should do it. - Heikki
Re: [HACKERS] PoC: Partial sort
On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund wrote: > Hi, > > > > > Limit (cost=69214.06..69214.08 rows=10 width=16) (actual > > > > time=0.097..0.099 rows=10 loops=1) > > > >-> Sort (cost=69214.06..71714.06 rows=100 width=16) (actual > > > > time=0.096..0.097 rows=10 loops=1) > > > > Sort Key: v1, v2 > > > > Sort Method: top-N heapsort Memory: 25kB > > > > -> Index Scan using test_v1_idx on test > (cost=0.42..47604.42 > > > > rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1) > > > > Total runtime: 0.125 ms > > > > (6 rows) > > > > > > Is that actually all that beneficial when sorting with a bog standard > > > qsort() since that doesn't generally benefit from data being > pre-sorted? > > > I think we might need to switch to a different algorithm to really > > > benefit from mostly pre-sorted input. > > > > > > > In this patch I don't do full sort of dataset. For instance, index > returns > > data ordered by first column and we need to order them also by second > > column. > > Ah, that makes sense. > > > But, I don't think we should expect pre-sorted values of second column > > inside a group. > > Yes, if you do it that way, there doesn't seem to any need to assume > that any more than we usually do. > > I think you should make the explain output reflect the fact that we're > assuming v1 is presorted and just sorting v2. I'd be happy enough with: > Sort Key: v1, v2 > Partial Sort: v2 > or even just > "Partial Sort Key: [v1,] v2" > but I am sure others disagree. > Sure, I just didn't change explain output yet. It should look like what you propose. > > > *partial-knn-1.patch* > > > > Rechecking from the heap means adding a sort node though, which I don't > > > see here? Or am I misunderstanding something? > > > KNN-GiST contain RB-tree of scanned items. In this patch item is > rechecked > > inside GiST and reinserted into same RB-tree. It appears to be much > easier > > implementation for PoC and also looks very efficient. 
I'm not sure what is actually the right design for it. This is what I'd like to discuss.

> I don't have enough clue about gist to say whether it's the right design,
> but it doesn't look wrong to my eyes. It'd probably be useful to export
> the knowledge that we are rechecking and how often that happens to the
> outside.
> While I didn't really look into the patch, I noticed in passing that you
> pass an all_dead variable to heap_hot_search_buffer without using the
> result - just pass NULL instead, that performs a bit less work.

A useful observation, thanks.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] PoC: Partial sort
On Sat, Dec 14, 2013 at 6:39 PM, Martijn van Oosterhout wrote:
> On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:
>>> Is that actually all that beneficial when sorting with a bog standard
>>> qsort() since that doesn't generally benefit from data being pre-sorted?
>>> I think we might need to switch to a different algorithm to really
>>> benefit from mostly pre-sorted input.
>>
>> In this patch I don't do a full sort of the dataset. For instance, the index
>> returns data ordered by the first column and we need to order them also by the
>> second column. Then this node sorts groups (assumed to be small) where the
>> values of the first column are the same, by the value of the second column. And
>> with a limit clause only the required number of such groups will be processed.
>> But, I don't think we should expect pre-sorted values of the second column
>> inside a group.
>
> Nice. I imagine this would be mostly beneficial for fast-start plans,
> since you no longer need to sort the whole table prior to returning the
> first tuple.
>
> Reduced memory usage might be a factor, especially for large sorts
> where you otherwise might need to spool to disk.
>
> You can now use an index on (a) to improve sorting for (a,b).
>
> Cost of sorting n groups of size l goes from O(nl log nl) to just
> O(nl log l), useful for large n.

Agree. Your reasoning looks correct.

> Minor comments:
>
> I find cmpTuple a bad name. That's what it's doing but perhaps
> cmpSkipColumns would be clearer.
>
> I think it's worthwhile adding a separate path for the skipCols = 0
> case, to avoid extra copies.

Thanks. I'll take care of that.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] [PATCH] SQL assertions prototype
On 2013-12-18 13:46:59 +0200, Heikki Linnakangas wrote: > On 12/18/2013 01:39 PM, Andres Freund wrote: > >On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote: > >>Here's another idea that doesn't involve SSI: > >> > >>At COMMIT, take a new snapshot and check that the assertion still passes > >>with that snapshot. > >>I think that would work, and would be simple, although it wouldn't scale too > >>well. > > > >It probably would also be very prone to deadlocks. > > Hmm, since this would happen at commit, you would know all the assertions > that need to be checked at that point. You could check them e.g. in Oid order > to avoid deadlocks. I think the real problem is the lock upgrades, because any earlier DML will have acquired less heavy locks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] hstore ng index for <@ ?
Hi!

On Wed, Dec 18, 2013 at 2:35 PM, Kaare Rasmussen wrote:
> In many ways the new hstore (and perhaps json) format looks very exciting.
> But will there ever be (GIN/GIST) index support for <@ ?

It doesn't look hard to do with GiST. For GIN I don't have promising ideas: we likely can't use GIN effectively for this kind of query. I believe it's not implemented because it wasn't requested yet. Could you share your use-case?

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 01:39 PM, Andres Freund wrote: On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote: Here's another idea that doesn't involve SSI: At COMMIT, take a new snapshot and check that the assertion still passes with that snapshot. Now, there's a race condition, if another transaction is committing at the same time, and performs the same check concurrently. That race condition can be eliminated by holding an exclusive lock while running the assertion, until commit, effectively allowing the assertion to be checked by only one transaction at a time. I think that would work, and would be simple, although it wouldn't scale too well. It probably would also be very prone to deadlocks. Hmm, since this would happen at commit, you would know all the assertions that need to be checked at that point. You could check them e.g in Oid order to avoid deadlocks. - Heikki
Re: [HACKERS] GIN improvements part 1: additional information
On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas wrote:
> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>> When values are packed into small groups, we have to either insert an
>>>> inefficiently encoded value or re-encode the whole right part of the values.
>>>
>>> It would probably be simplest to store newly inserted items uncompressed,
>>> in a separate area in the page. For example, grow the list of uncompressed
>>> items downwards from pg_upper, and the compressed items upwards from
>>> pg_lower. When the page fills up, re-encode the whole page.
>
> I hacked together an implementation of a variant of Simple9, to see how it
> performs. Insertions are handled per the above scheme.
>
> In a limited pg_trgm test case I've been using a lot for this, this
> reduces the index size about 20%, compared to varbyte encoding. It might be
> possible to squeeze it a bit more; I handcrafted the "selectors" in the
> encoding algorithm to suit our needs, but I don't actually have a good
> idea of how to choose them optimally. Also, the encoding can encode 0
> values, but we never need to do that, so you could take advantage of that
> to pack items tighter.
>
> Compression and decompression speed seems to be about the same.
>
> Patch attached if you want to play with it. WAL replay is still broken,
> and there are probably bugs.
>
>> Good idea. But:
>> 1) We'll still need item indexes in the end of the page for fast scan.
>
> Sure.
>
>> 2) Storage would be easily extendable to hold additional information as well.
>> Better compression shouldn't block more serious improvements.
>
> I'm not sure I agree with that. For all the cases where you don't care
> about additional information - which covers all existing users, for example
> - reducing disk size is pretty important.
How are you planning to store the additional information, and how does using another encoding get in the way of that?

I planned to store the additional-information datums between the varbyte-encoded TIDs. I expected it would be hard to do with PFOR. However, I don't see significant problems in your implementation of Simple9 encoding. I'm going to dig deeper into your version of the patch.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Example query causing param_info to be set in plain rel path
I got an example where paths for a plain rel require param_info, i.e. plain rel scans need to take care of the lateral references. Here's the example from the PG regression tests:

explain verbose select v.* from (int8_tbl x left join (select q1,(select coalesce(q2,0)) q2 from int8_tbl) y on x.q2 = y.q1) left join int4_tbl z on z.f1 = x.q2, lateral (select x.q1,y.q1 from dual union all select x.q2,y.q2 from dual) v(vx,vy);

There is a note in create_scan_plan(), which says,

324  * If it's a parameterized otherrel, there might be lateral references
325  * in the tlist, which need to be replaced with Params. This cannot
326  * happen for regular baserels, though. Note use_physical_tlist()
327  * always fails for otherrels, so we don't need to check this above.
328  */

although it doesn't say why this cannot happen for regular baserels. So, we do have a testcase which tests this code path as well.

On Mon, Oct 28, 2013 at 12:30 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> No, adding OFFSET there didn't give the expected result either. The lateral
> was handled in the subquery and passed as a param to the underlying table scan.
>
> I am particularly interested in tables (unlike functions or subqueries)
> since the table scans are shipped to the datanodes and I wanted to test
> the effect of lateral in such cases. OTOH, functions involving access to the
> tables or subqueries are initiated on the coordinators, where lateral gets
> executed in the same way as in PostgreSQL.
>
> If it's so hard to come up with an example query which would cause
> lateral_relids to be set in the RelOptInfo of a table, then it's very likely
> that the relevant code is untested in PostgreSQL.
>
> On Fri, Oct 25, 2013 at 7:11 PM, Tom Lane wrote:
>> Ashutosh Bapat writes:
>>> In order to test various cases of LATERAL join in Postgres-XC, I am trying
>>> to find a query where RelOptInfo->lateral_relids would get set for plain
>>> base relations.
>> >> I think you need a lateral reference in a function or VALUES FROM-item. >> As you say, plain sub-selects are likely to get flattened. (Possibly >> if you stuck in a flattening fence such as OFFSET 0, you could get the >> case to happen with a sub-select FROM item, but I'm too lazy to check.) >> >> regards, tom lane >> > > > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] [PATCH] SQL assertions prototype
On 2013-12-18 13:07:51 +0200, Heikki Linnakangas wrote: > Here's another idea that doesn't involve SSI: > > At COMMIT, take a new snapshot and check that the assertion still passes > with that snapshot. Now, there's a race condition, if another transaction is > committing at the same time, and performs the same check concurrently. That > race condition can be eliminated by holding an exclusive lock while running > the assertion, until commit, effectively allowing the assertion to be > checked by only one transaction at a time. > > I think that would work, and would be simple, although it wouldn't scale too > well. It probably would also be very prone to deadlocks. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 02:59 AM, Josh Berkus wrote:
> On 12/17/2013 01:42 PM, Kevin Grittner wrote:
>> It works fine as long as you set default_transaction_isolation =
>> 'serializable' and never override that. :-) Of course, it sure would be
>> nice to have a way to prohibit overrides, but that's another issue.
>> Otherwise it is hard to see how to make it work in a general way without
>> a mutually exclusive lock mode on the table for the duration of any
>> transaction which modifies the table.
>
> Serializable or not, *what* do we lock for assertions? It's not rows.
> Tables? Which tables? What if the assertion is an interpreted language
> function? Does the SSI reference counter really take care of all of this?

Here's another idea that doesn't involve SSI:

At COMMIT, take a new snapshot and check that the assertion still passes with that snapshot. Now, there's a race condition, if another transaction is committing at the same time, and performs the same check concurrently. That race condition can be eliminated by holding an exclusive lock while running the assertion, until commit, effectively allowing the assertion to be checked by only one transaction at a time.

I think that would work, and would be simple, although it wouldn't scale too well.

- Heikki
Re: [HACKERS] [PATCH] SQL assertions prototype
On 12/18/2013 02:59 AM, Josh Berkus wrote:
> On 12/17/2013 01:42 PM, Kevin Grittner wrote:
>> Josh Berkus wrote:
>>> Going back over this patch, I haven't seen any further discussion of
>>> the point Heikki raises above, which seems like a bit of a showstopper.
>>> Heikki, did you have specific ideas on how to solve this? Right now my
>>> mind boggles.
>>
>> It works fine as long as you set default_transaction_isolation =
>> 'serializable' and never override that. :-) Of course, it sure would be
>> nice to have a way to prohibit overrides, but that's another issue.
>> Otherwise it is hard to see how to make it work in a general way without
>> a mutually exclusive lock mode on the table for the duration of any
>> transaction which modifies the table.
>
> Serializable or not, *what* do we lock for assertions? It's not rows.
> Tables? Which tables? What if the assertion is an interpreted language
> function? Does the SSI reference counter really take care of all of this?

SSI does make everything rosy, as long as all the transactions participate in it. The open questions revolve around what happens if a transaction is not running in SSI mode.

If you force everyone to run in SSI as soon as you create even a single assertion in your database, that's kind of crappy for performance. Also, if one user creates an assertion, it becomes a funny kind of partial denial-of-service attack on other users, if you force other users to also run in SSI mode.

If you don't force everything to run in SSI mode, then you have to somehow figure out what parts do need to run in SSI mode to enforce the assertion. For example, if the assertion refers to tables A and B, perhaps you can get away without predicate locks on table C?

- Heikki
Re: [HACKERS] Extension Templates S03E11
Tom Lane writes: > I keep telling you this, and it keeps not sinking in. How can you say that? I've been spending a couple of years on designing and implementing and arguing for a complete feature set where dealing with modules is avoided at all costs. The problem we have now is that I'm being told that the current feature is rejected if it includes anything about modules, and not interesting enough if it's not dealing with modules. I tried my best to make it so that nothing in-core changes wrt modules, yet finding out-of-core solutions to still cope with that. It's a failure, ok. I think we need a conclusion on this thread: Extension specs are frozen. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
[HACKERS] hstore ng index for <@ ?
Hi

In many ways the new hstore (and perhaps json) format looks very
exciting. But will there ever be (GIN/GiST) index support for <@ ?
SQL objects UNITs (was: [HACKERS] Extension Templates S03E11)
Simon Riggs writes:
> On 17 December 2013 23:42, Tom Lane wrote:
>>> We aim to have the simplest implementation that meets the stated need
>>> and reasonable extrapolations of that. Text in a catalog table is the
>>> simplest implementation. That is not a reason to reject it, especially
>>> when we aren't suggesting a viable alternative.
>>
>> The first part of this assertion is debatable, and the claim that no
>> viable alternative has been suggested is outright wrong.

With due respect, it's only wrong when you buy into implementing
something new rather than improving extensions.

> Sounds like we have a way forward for this feature then, just not with
> the current patch.
>
> Can someone attempt to summarise the way forward, with any caveats and
> necessary restrictions? It would save further column inches of debate.

Here's my attempt:

# Inline Extension, Extension Templates

The problem with *Inline Extension* is the dump and restore policy: the
contents of an extension are never to be found in a `pg_dump` script.

The problem with *Extension Templates* is that we store the extension
scripts (plain text blobs) in the catalogs, where we already have the
full SQL objects and the tools (such as `pg_dump` and `pg_depend`) to
manipulate and introspect them.

# The new thing™

A set of SQL objects that can be managed wholesale, with a version
string attached to it. The objects are part of the `pg_dump` output,
and the whole set can be relocatable.

Name:

  - not `PACKAGE`, Oracle
  - not `MODULE`, that's already the name of a .so file
  - not `SYSTEM`, already something else
  - `BUNDLE`
  - `LIBRARY`
  - `UNIT`

I'll pick UNIT here.

Commands:

    CREATE UNIT name [ SCHEMA ...
        ] [ [ NOT ] RELOCATABLE ] [ REQUIRE ... ];
    WITH UNIT name;
    END UNIT name;

    ALTER UNIT name OWNER TO <role>;
    ALTER UNIT name ADD <object>;
    ALTER UNIT name DROP <object>;
    ALTER UNIT name SET SCHEMA <schema>;
    ALTER UNIT name UPDATE TO <version>;
    ALTER UNIT name SET [ NOT ] RELOCATABLE;
    ALTER UNIT name REQUIRE a, b, c;

    COMMENT ON UNIT name IS '';

    DROP UNIT name [ CASCADE ];

The `UPDATE TO` command only sets a new version string.

# Implementation details

We need a new `pg_unit` catalog that looks almost exactly like the
`pg_extension` one, except for the `extconfig` and `extcondition`
fields. We need a way to `recordDependencyOnCurrentUnit()`, so another
pair of static variables `creating_unit` and `CurrentUnitObject`. Each
and every command we support for creating objects must be made aware of
the new `UNIT` concept, including `CREATE EXTENSION`.

The `pg_dump` dependencies have to be set so that all the objects are
restored independently first, as of today, and only then issue `CREATE
UNIT` and a bunch of `ALTER UNIT ADD` commands, one per object.

# Event Trigger support

Event Triggers are to be provided for all the `UNIT` commands.

# Life with Extensions and Units

PostgreSQL now includes two different ways to package SQL objects, with
about the same feature set. The only difference is the `pg_restore`
behavior: *Extensions* are re-created from external resources, *Units*
are re-created from what's in the dump.

The smarts about `ALTER EXTENSION ... UPDATE` are not available when
dealing with *UNITs*, leaving the user or the client scripts to care
about that entirely on their own. In principle, a client can prepare a
SQL script from a PGXN distribution and apply it surrounded by `WITH
UNIT` and `END UNIT` commands. Upgrade scripts, once identified, can be
run as straight SQL, adding a simple `ALTER UNIT ... UPDATE TO ...`
command before the `COMMIT` at the end of the script.
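To make the workflow above concrete, installing and then upgrading a
distribution might look like the following. This is an entirely
hypothetical sketch: none of these commands exist, the syntax is only
what this proposal suggests, and the `pair` objects are invented for
illustration.

```sql
-- Hypothetical: install a PGXN distribution's script inside a unit.
BEGIN;
CREATE UNIT pair SCHEMA pair NOT RELOCATABLE;
WITH UNIT pair;
  -- the distribution's install script runs here; every object it
  -- creates becomes part of the unit
  CREATE TYPE pair.pair AS (k text, v text);
  CREATE FUNCTION pair.pair(text, text) RETURNS pair.pair
    LANGUAGE sql AS 'SELECT ROW($1, $2)::pair.pair';
END UNIT pair;
COMMIT;

-- Later, an upgrade script is applied as straight SQL, and the version
-- string is bumped by hand before the COMMIT:
BEGIN;
ALTER FUNCTION pair.pair(text, text) COST 1;   -- the upgrade script
ALTER UNIT pair UPDATE TO '1.1';
COMMIT;
```

The point of the sketch is that the client drives the update logic; the
server only records membership and the version string.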
Identifying the upgrade script(s) may require implementing the current
extension-update smarts in whatever client-side program is going to be
built to support installing from PGXN etc.

# Conclusion

The main advantage of the `UNIT` proposal is that it copes very well
with relations and other usual schema objects, as the data are
preserved at `pg_restore` time.

A `UNIT` can also entirely replace an `EXTENSION`, including when it
needs a *module*, provided that the *module* is made available on the
server's file system before creating the functions in `LANGUAGE C` that
depend on it.

It is possible to write a *UNIT distribution network* where a client
software drives the installation of SQL objects within a UNIT, and this
client software needs to include UNIT update smarts too. It's also
possible to build that software as a set of Event Triggers on the
`CREATE UNIT` and `ALTER UNIT UPDATE TO` commands.

# Analysis

The main drawback is that rather than building on extensions, both in a
technical way and in building user trust, we are basically going to
deprecate extensions entirely, giving them a new name and an
incompatible way to manage them. Only *contribs* are going to be
shipped as extensions, as they are basically the only known extensions
following the same delivery rules as the PostgreSQL core p
Re: [HACKERS] stats for network traffic WIP
On 12/12/2013 02:51 AM, Tom Lane wrote:
> The thing that I'm wondering is why the database would be the right place
> to be measuring it at all. If you've got a network usage problem,
> aggregate usage across everything on the server is probably what you
> need to be worried about, and PG can't tell you that.

I suspect this feature would be useful when you want to drill down and
figure out what's having network issues - specifically, to associate
network behaviour with individual queries, individual users,
application_name, etc.

One sometimes faces the same issue with I/O: I know PostgreSQL is doing
lots of I/O, but what exactly is causing it? Especially if you can't
catch it at the time it happens, it can be quite tricky to go from
"there's lots of I/O" to "this query changed from using synchronized
seqscans to doing an index-only scan that's hammering the cache".

--
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Problem with displaying "wide" tables in psql
Hello

2013/12/18 Sameer Thakur
> On Wed, Dec 11, 2013 at 11:13 PM, Sergey Muraviov wrote:
> > Hi.
> >
> > I've improved the patch.
> > It works in expanded mode when either format option is set to wrapped
> > (\pset format wrapped), or we have no pager, or pager doesn't chop
> > long lines (so you can still use the trick).
> > Target output width is taken from either columns option (\pset
> > columns 70), or environment variable $COLUMNS, or terminal size.
> > And it's also compatible with any border style (\pset border 0|1|2).
> >
> > Here are some examples:
> >
> > postgres=# \x 1
> > postgres=# \pset format wrapped
> > postgres=# \pset border 0
> > postgres=# select * from wide_table;
> > * Record 1
> > value afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa df
> > sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f sadf sad fadsf
> > * Record 2
> > value afadsafasd fasdf asdfasd
> >
> > postgres=# \pset border 1
> > postgres=# \pset columns 70
> > postgres=# select * from wide_table;
> > -[ RECORD 1 ]-
> > value | afadsafasd fasdf asdfasd fsad fas df sadf sad f sadf sadf sa
> >       | df sadfsadfasd fsad fsa df sadf asd fa sfd sadfsadf asdf sad f
> >       | sadf sad fadsf
> > -[ RECORD 2 ]-
> > value | afadsafasd fasdf asdfasd
> >
> > postgres=# \pset border 2
> > postgres=# \pset columns 60
> > postgres=# select * from wide_table;
> > +-[ RECORD 1 ]-+
> > | value | afadsafasd fasdf asdfasd fsad fas df sadf sad f |
> > |       | sadf sadf sa df sadfsadfasd fsad fsa df sadf as |
> > |       | d fa sfd sadfsadf asdf sad f sadf sad fadsf     |
> > +-[ RECORD 2 ]-+
> > | value | afadsafasd fasdf asdfasd                        |
> > +---+--+
> >
> > Regards,
> > Sergey
>
> The patch applies and compiles cleanly. I tried the following:
> \pset format wrapped
> \pset columns 70
> Not in expanded mode,
> select * from wide_table works fine.
> select * from pg_stats has problems in viewing. Is it that pg_stats
> can be viewed easily only in expanded mode, i.e. if columns displayed
> are wrapped then there is no way to view results in non expanded mode?
> regards
> Sameer

The problem with non expanded mode is that all column headers have to
be displayed on one line. Otherwise, it is difficult to bind values to
columns. And I have no idea how to solve this problem.

--
Best regards,
Sergey Muraviov
Re: [HACKERS] Proposal: variant of regclass
>> For the pgpool-II use case, I'm happy to follow you because pgpool-II
>> always does a grammatical check (using the raw parser) on a query
>> first and such syntax problems will be pointed out, thus it never
>> reaches the state where toregclass is called.
>>
>> One concern is, other users may expect toregclass to detect such
>> syntax problems. Anybody?
>
> It seems fine to me if the new function ignores the specific error of
> "relation does not exist" while continuing to throw other errors.

Thanks. Here is the revised conceptual patch. I'm going to post a
concrete patch and register it in the 2014-01 CommitFest.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index c24a2c1..0406c30 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -45,6 +45,7 @@ static char *format_operator_internal(Oid operator_oid, bool force_qualify);
 static char *format_procedure_internal(Oid procedure_oid, bool force_qualify);
 static void parseNameAndArgTypes(const char *string, bool allowNone,
 					 List **names, int *nargs, Oid *argtypes);
+static Datum regclass_guts(char *class_name_or_oid, bool raiseError);
 
 
 /*
@@ -804,21 +805,50 @@
 Datum
 regclassin(PG_FUNCTION_ARGS)
 {
 	char	   *class_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result;
+
+	result = regclass_guts(class_name_or_oid, true);
+	PG_RETURN_OID(result);
+}
+
+/*
+ * toregclass	- converts "classname" to class OID
+ *
+ * Difference from regclassin is that this does not throw an error if the
+ * classname does not exist.
+ * Note: this is not an I/O function.
+ */
+Datum
+toregclass(PG_FUNCTION_ARGS)
+{
+	char	   *class_name_or_oid = PG_GETARG_CSTRING(0);
+	Oid			result;
+
+	result = regclass_guts(class_name_or_oid, false);
+	PG_RETURN_OID(result);
+}
+
+/*
+ * Guts of regclassin and toregclass.
+ * If raiseError is false, returns InvalidOid upon error.
+ */
+static Datum
+regclass_guts(char *class_name_or_oid, bool raiseError)
+{
 	Oid			result = InvalidOid;
 	List	   *names;
 
 	/* '-' ? */
 	if (strcmp(class_name_or_oid, "-") == 0)
-		PG_RETURN_OID(InvalidOid);
+		return result;
 
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
 		strspn(class_name_or_oid, "0123456789") == strlen(class_name_or_oid))
 	{
-		result = DatumGetObjectId(DirectFunctionCall1(oidin,
+		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 									  CStringGetDatum(class_name_or_oid)));
-		PG_RETURN_OID(result);
+		return result;
 	}
 
 	/* Else it's a name, possibly schema-qualified */
@@ -848,16 +878,19 @@ regclassin(PG_FUNCTION_ARGS)
 		if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
 			result = HeapTupleGetOid(tuple);
 		else
-			ereport(ERROR,
-					(errcode(ERRCODE_UNDEFINED_TABLE),
-					 errmsg("relation \"%s\" does not exist", class_name_or_oid)));
+			if (raiseError)
+				ereport(ERROR,
+						(errcode(ERRCODE_UNDEFINED_TABLE),
+						 errmsg("relation \"%s\" does not exist", class_name_or_oid)));
+			else
+				return InvalidOid;
 
 		/* We assume there can be only one match */
 
 		systable_endscan(sysscan);
 		heap_close(hdesc, AccessShareLock);
 
-		PG_RETURN_OID(result);
+		return result;
 	}
 
 	/*
@@ -865,11 +898,16 @@
 	 * pg_class entries in the current search path.
 	 */
 	names = stringToQualifiedNameList(class_name_or_oid);
+	if (names == NIL)
+		return InvalidOid;
 
 	/* We might not even have permissions on this relation; don't lock it. */
-	result = RangeVarGetRelid(makeRangeVarFromNameList(names), NoLock, false);
+	if (raiseError)
+		result = RangeVarGetRelid(makeRangeVarFromNameList(names), NoLock, false);
+	else
+		result = RangeVarGetRelid(makeRangeVarFromNameList(names), NoLock, true);
 
-	PG_RETURN_OID(result);
+	return result;
 }
 
 /*
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 0117500..472ccad 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -3304,13 +3304,14 @@
 DATA(insert OID = 2218 (  regclassin		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2205 "2275" _null_ _null_ _null_ _null_ regclassin _null_ _null_ _null_ ));
 DESCR("I/O");
 DATA(insert OID = 2219 (  regclassout		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2275 "2205" _null_ _null_ _null_ _null_ regclassout _null_ _null_ _null_ ));
 DESCR("I/O");
+DATA(insert OID = 3179 (  toregclass		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2205 "2275" _null_ _null_ _null_ _null_ toregclass _null_ _null_ _null_ ));
+DESCR("convert classname to regclass");
 DATA(insert OID = 2220 (  regtypein		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2206 "2275" _null_ _null_ _null_ _null_ regtypein _null_ _null_ _null_ ));
 DESCR("I/O");
 DATA(insert OID = 2221 (  regtypeout		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 2275 "2206" _null_ _null_ _null_ _null_ regtypeout _null_ _null_ _null_ ));
 DESCR(
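If a concrete patch keeps this shape, the behavioral difference could be
exercised as in the sketch below. This is hypothetical: `toregclass`
exists only in this proposal, and the displayed results follow from the
patch's stated intent (return InvalidOid instead of raising an error;
since the return type is regclass, InvalidOid would print as "-").

```sql
-- A regclass cast errors out on a missing relation:
SELECT 'no_such_table'::regclass;
-- ERROR:  relation "no_such_table" does not exist

-- The proposed toregclass returns InvalidOid instead, which the
-- regclass output function displays as "-":
SELECT toregclass('no_such_table');   -- "-"
SELECT toregclass('pg_class');        -- pg_class
```

This makes it safe for clients such as pgpool-II to probe for a
relation's existence without wrapping the query in error handling.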