[HACKERS] Parse more than bind and execute when connect to database by jdbc
Hi everyone,

We finally found that the JDBC function we had modified contributed to this phenomenon. Thanks to all.

Yours,
Wang Shuo
HighGo Software Co., Ltd.
November 18, 2013

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Wed, Oct 30, 2013 at 3:32 AM, Andres Freund <and...@2ndquadrant.com> wrote:

> On 2013-10-25 09:26:29 -0400, Robert Haas wrote:
>> In any case, it's very far from obvious to me that CLUSTER ought to throw away information by default, which is what you're proposing.
>>
>> I find it odd to refer to this as throwing away information. I know that you have a general concern about throwing away XIDs that are still needed for forensic purposes, but that is clearly the ONLY purpose that those XIDs serve, and the I/O advantages of freezing by default could be massive for many of our users. What's going to happen in practice is that experienced users will simply recommend CLUSTER FREEZE rather than plain CLUSTER, and you won't have the forensic information *anyway*.
>
> I think we should just apply your "preserve forensic information when freezing" patch. Then we're good to go without big arguments ;)

Ok, here's a recap of the thread as I see it now:

1. Thomas wrote the patch with the idea that FREEZE could be an option for CLUSTER to freeze the tuples during the cluster operation, which would save a vacuum freeze somewhere down the line. Sounds like a good idea.
2. Robert introduced the idea that this could perhaps be the default behaviour for CLUSTER.
3. Tom highlighted that making freeze the default would wipe out xmin values, so they wouldn't be available for any forensics which might take place in the event of a disaster.
4. Andres mentioned that we might want to wait for a patch which Robert has been working on. Currently I don't know much about it, but it sounds like it freezes without setting xmin to frozenXid?
5. Robert stated that he's not had much time to work on this patch due to having other commitments. In the meantime Thomas sent in a patch which gets rid of the FREEZE option from CLUSTER and makes it the default.

So now I'm wondering what the next move should be for this patch?

a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster-with-freeze-as-default patch?
b. Are we waiting on me to review one or both of the patches? Which one?

I think it probably sounds safer not to make freeze the default, but if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, we are left with a redundant syntax to support for ever and ever. Decision time?

Regards

David Rowley
Re: [HACKERS] writable FDWs / update targets confusion
Tom Lane wrote:
>> Tom, could you show us a rope if there is one?
>
> What is it you actually need to fetch? IIRC, the idea was that most FDWs would do the equivalent of fetching the primary-key columns to use in an update. If that's what you need, then AddForeignUpdateTargets should identify those columns and generate Vars for them. postgres_fdw is probably not a good model since it's using ctid (a non-portable thing) and piggybacking on the existence of a tuple header field to put that in.
>
> If you're dealing with some sort of hidden tuple identity column that works like CTID but doesn't fit in six bytes, there may not be any good solution in the current state of the FDW support. As I mentioned, we'd batted around the idea of letting FDWs define a system column with some other datatype besides TID, but we'd not figured out all the nitty-gritty details in time for 9.3.

I was hoping for the latter (a hidden column), but I'll have to settle for primary keys, which is also OK. Thanks for taking the time.

Yours,
Laurenz Albe
Re: [HACKERS] freeze cannot be finished
Committed, thanks for the report!

On 16.11.2013 22:05, Миша Тюрин wrote:
> Hello! Could anyone review the patch suggested by Jeff Janes? Initial thread:
> http://www.postgresql.org/message-id/flat/1384356585.995240612%40f50.i.mail.ru
> Thanks in advance!
>
> On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan <eshkin...@gmail.com> wrote:
>> If I'm not mistaken, it looks like lazy_scan_heap(), called from lazy_vacuum_rel() (see [1]), skips pages even when it runs with scan_all == true. lazy_scan_heap() does not increment scanned_pages if lazy_check_needs_freeze() returns false, so if this occurs during a wraparound vacuum it cannot update pg_class, because pg_class is updated via this code:
>>
>>     new_frozen_xid = FreezeLimit;
>>     if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
>>         new_frozen_xid = InvalidTransactionId;
>>     vac_update_relstats(onerel, new_rel_pages, new_rel_tuples,
>>                         new_rel_allvisible, vacrelstats->hasindex,
>>                         new_frozen_xid);
>>
>> So I think in our prevent-wraparound vacuum vacrelstats->scanned_pages is always less than vacrelstats->rel_pages, and pg_class.relfrozenxid is never updated.
>
> Jeff Janes <jeff.ja...@gmail.com> writes:
>> Yeah, I think that is a bug. If the clean-up lock is unavailable but the page is inspected without it and found not to need freezing, then the page needs to be counted as scanned, but is not so counted.
>>
>>     commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
>>     Date: Mon Nov 7 21:39:40 2011 -0500
>>
>> But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2 until recently, I don't know why it just started happening. It looks like a simple fix (to HEAD, attached), but I don't know how to test it. Also, it seems like it might be worth issuing a warning if scan_all is true but not all pages were scanned.
Cheers,

Jeff

- Heikki
Re: [HACKERS] Sequence Access Method WIP
On 15.11.2013 21:00, Simon Riggs wrote:
> On 15 November 2013 15:48, Peter Eisentraut <pete...@gmx.net> wrote:
>> Also, you set this to "returned with feedback" in the CF app. Please verify whether that was intentional.
>
> Not sure that was me; if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that it's not going to be committable in this commitfest. But I'm happy to keep the discussion going if you think otherwise.

- Heikki
Re: [HACKERS] Sequence Access Method WIP
On 15.11.2013 20:21, Andres Freund wrote:
> On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
>> It's pretty hard to review this without seeing the other implementation you're envisioning to use this API. But I'll try:
>
> We've written a distributed sequence implementation against it, so it works at least for that use case. While certainly not release-worthy, it still might be interesting to look at:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr
>
> bdr_sequencer_fill_sequence pre-allocates ranges of values, bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just like I was afraid it would.

I wonder if the level of abstraction is right. The patch assumes that the on-disk storage of all sequences is the same, so the access method can't change that. But then it leaves the details of actually updating the on-disk block, WAL-logging and all, as the responsibility of the access method. Surely that's going to look identical in all the seqams, if they all use the same on-disk format. That also means that the sequence access methods can't be implemented as plugins, as plugins can't do WAL-logging.

> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.
Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.

>> The comment in seqam.c says that there's a private column reserved for access-method-specific data, called am_data, but that seems to be the only mention of am_data in the patch.
>
> It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.

Stepping back a bit and looking at this problem from a higher level, why do you need to hack this stuff into sequences? Couldn't you just define a new set of functions, say bdr_currval() and bdr_nextval(), to operate on these new kinds of sequences? You're not making much use of the existing sequence infrastructure anyway, so it might be best to just keep the implementation completely separate. If you need it for compatibility with applications, you could create facade currval()/nextval() functions that call the built-in version or the bdr version depending on the argument.

- Heikki
Re: [HACKERS] additional json functionality
On 11/18/2013 05:19 AM, Andrew Dunstan wrote:
> On 11/17/2013 08:49 PM, Josh Berkus wrote:
>> Now, if it turns out that the new hstore is not dealing with json input and output, we could have json, jstore and hstore. Jstore isn't the worst name suggestion I've heard on this thread. The reason I prefer JSONB though, is that a new user looking for a place to put JSON data will clearly realize that JSON and JSONB are alternatives and related in some way. They won't necessarily expect that jstore has anything to do with JSON, especially when there is another type called JSON.
>
> Quite a few people are liable to think it's something to do with Java. Besides, we might get sued by these people: http://www.jstor.org/ ;-)
>
> I don't think any name that doesn't begin with "json" is acceptable. I could live with jsonb.

It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. How about jsondoc, or jsonobj? Either is still reasonably "json" but not too easy to confuse with the existing json type when typing, and it perhaps hints better at the main difference from string-json, namely that it is an object and not textual source code / notation / processing info.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 18 October 2013 17:07, Rajeev rastogi <rajeev.rast...@huawei.com> wrote:
> From the following mail, COPY behaviour between stdin and a normal file has some inconsistency:
> http://www.postgresql.org/message-id/ce85a517.4878e%tim.k...@gmail.com
>
> The issue was that if COPY executes from stdin, the client goes to the server to execute the command, the server requests the input, and control is sent back to the client to enter the data. Once the client sends the input to the server, the server executes the COPY command and sends back the result, but the client does not print the result; it just clears it out. The changes ensure the final result from the server gets printed before clearing the result. Please find the patch for the same and let me know your suggestions.

In this call:

    success = handleCopyIn(pset.db, pset.cur_cmd_source,
                           PQbinaryTuples(*results), &intres) && success;
    if (success && intres)
        success = PrintQueryResults(intres);

Instead of handling the result status this way, what if we use ProcessResult()'s 'results' argument to pass back the COPY result status to the caller? We already call PrintQueryResults(results) after the ProcessResult() call, so we wouldn't need a COPY-specific PrintQueryResults() call.

Also, if there is a subsequent SQL command in the same query string, the consequence of the patch is that the client prints both the COPY output and the last command's output. So my suggestion would also keep us consistent with the general behaviour that only the last SQL command's output is printed in case of multiple SQL commands. Here is how it gets printed with your patch:

    psql -d postgres -c "\copy tab from '/tmp/st.sql' delimiter ' '; insert into tab values ('lll', 3)"
    COPY 1
    INSERT 0 1

This is not harmful, just a matter of consistency.

Thanks and Regards,
Kumar Rajeev Rastogi
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 18 November 2013 11:19 Haribabu kommi wrote:
> On 17 November 2013 00:55 Fujii Masao wrote:
>> On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>> On 15 November 2013 17:26 Magnus Hagander wrote:
>>>> On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>>>> On 14 November 2013 23:59 Fujii Masao wrote:
>>>>>> On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi <haribabu.ko...@huawei.com> wrote:
>>>>>>> Please find attached the patch, adding a new option for pg_basebackup to specify a different directory for pg_xlog.
>>>>>>
>>>>>> Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir?
>>>>>
>>>>> I feel there is no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory.
>>>>
>>>> Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink.
>>>
>>> Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and the xlog directory being the same, there is a way on Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has an idea regarding this, please let me know.
>>
>> What about make_absolute_path() in miscinit.c?
>
> The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this is not giving a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory; it then compares the two and throws an error if they are the same. Please check it once. Is there any other way to identify that both data and xlog directories point to the same place, instead of comparing both absolute paths?

Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail.

Regards,
Hari babu.
Re: [HACKERS] Sequence Access Method WIP
On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
>> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.
>
> The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.

Why? It's *less* low-level than a good part of other crash recovery code we have. I have quite some doubts about the layering, but it's imo pretty sensible to centralize the WAL-logging code that plugins couldn't write.

If you look at what e.g. the _alloc callback currently gets passed:
a) Relation: important for metadata like TupleDesc, name etc.
b) SeqTable entry: important to set up state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuple's changed state

And if you then look at what gets modified in that callback:

    Form_pg_sequence->amdata, ->is_called, ->last_value, ->log_cnt;
    SeqTable->last, SeqTable->cached, SeqTable->last_valid

You need is_called and last_valid because of our current representation of sequence state (which we could change, to remove that need). The rest contains values that need to be set depending on how you want your sequence to behave:
* amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are that you WAL-log at once. Which e.g. you need to control if you want a sequence that doesn't leak values in crashes.
* cached is needed to control the per-backend caching.

> Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.

I don't think we can entirely abstract away the reality that they are now based on relations and might not be at some later point. Otherwise we'd have to invent an API that copied all the data that's stored in struct RelationData. Like name, owner, ... Which non-SQL-accessible API we provide has a chance of providing that level of consistency in the face of fundamental refactoring? I'd say none.

> Stepping back a bit and looking at this problem from a higher level, why do you need to hack this stuff into sequences? Couldn't you just define a new set of functions, say bdr_currval() and bdr_nextval(), to operate on these new kinds of sequences?

Basically two things:
a) User interface. For one, everyone would need to reimplement the entire sequence DDL from scratch. For another, it means it's hard to write (client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in userspace. How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway, so it might be best to just keep the implementation completely separate. If you need it for compatibility with applications, you could create facade currval/nextval() functions that call the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) The default functions created by sequences are pg_catalog-prefixed. So you'd need to hack up the catalog to get your own functions used if you want the application to work transparently. And then you need to remember the former function because you now cannot call it normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state across calls, the locking, the WAL logging, the DDL support. Pretty much the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_stat_statements: calls under-estimation propagation
Hello,

Please find v10 of the patch attached. This patch addresses the following review comments:

1. Removed errcode and used elog for the error "pg_stat_statements schema is not supported by its binary".
2. Removed comments and other code formatting not directly relevant to the patch functionality.
3. Changed the position of query_id in the view to userid, dbid, query_id, ...
4. Cleaned up the patch some more to avoid unnecessary whitespace and newlines.

I assume the usage of PGSS_TUP_LATEST is fine after the explanation given. Also, the mixing of PG_VERSION_NUM with query_id is OK after the explanation given.

regards
Sameer
Re: [HACKERS] Sequence Access Method WIP
On 18.11.2013 11:48, Andres Freund wrote:
> On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 20:21, Andres Freund wrote:
>>> Well, it exposes log_sequence_tuple() - together with the added "am private" column of pg_sequence that allows to do quite a bit of different things. I think unless we really implement pluggable RMGRs - which I don't really see coming - we cannot be radically different.
>>
>> The proposed abstraction leaks like a sieve. The plugin should not need to touch anything but the private amdata field. log_sequence_tuple() is way too low-level.
>
> Why? It's *less* low-level than a good part of other crash recovery code we have. I have quite some doubts about the layering, but it's imo pretty sensible to centralize the WAL-logging code that plugins couldn't write.

It doesn't go far enough; it's still too *low*-level. The sequence AM implementation shouldn't need direct access to the buffer page at all.

> If you look at what e.g. the _alloc callback currently gets passed:
> a) Relation: important for metadata like TupleDesc, name etc.
> b) SeqTable entry: important to set up state for cur/lastval
> c) Buffer of the tuple: LSN etc.
> d) HeapTuple itself: a place to store the tuple's changed state
>
> And if you then look at what gets modified in that callback:
>
>     Form_pg_sequence->amdata, ->is_called, ->last_value, ->log_cnt;
>     SeqTable->last, SeqTable->cached, SeqTable->last_valid
>
> You need is_called and last_valid because of our current representation of sequence state (which we could change, to remove that need). The rest contains values that need to be set depending on how you want your sequence to behave:
> * amdata is obvious.
> * You need log_cnt to influence/remember how big the chunks are that you WAL-log at once. Which e.g. you need to control if you want a sequence that doesn't leak values in crashes.
> * cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, i.e. the server code asks the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

    seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

    int64 value  - the first value allocated
    int nvalues  - the number of values allocated
    am_private   - updated private data

The backend code handles the caching and logging of values. When it has exhausted all the cached values (or doesn't have any yet), it calls the AM's alloc function to get a new batch. The AM returns the new batch and updates its private state as necessary. Then the backend code updates the relation file with the new values and the AM's private data. WAL-logging and checkpointing are the backend's responsibility.

>> Just as a thought-experiment, imagine that we decided to re-implement sequences so that all the sequence values are stored in one big table, or in a flat file in the data directory, instead of the current implementation where we have a one-block relation file for each sequence. If the sequence access method API is any good, it should stay unchanged. That's clearly not the case with the proposed API.
>
> I don't think we can entirely abstract away the reality that they are now based on relations and might not be at some later point. Otherwise we'd have to invent an API that copied all the data that's stored in struct RelationData. Like name, owner, ... Which non-SQL-accessible API we provide has a chance of providing that level of consistency in the face of fundamental refactoring? I'd say none.

I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact, that was just discussed in the other thread, titled "init_sequence spill to hash table".

> b) It's not actually easy to get similar semantics in userspace.
> How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code?

heap_inplace_update()

- Heikki
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On 18 October 2013 13:35 KONDO Mitsumasa wrote:
> Hi, I submit an improvement of the pg_stat_statements usage patch in CF3. In pg_stat_statements, I think the buffer hit ratio is a very important value. However, it is difficult to calculate, and it needs complicated SQL. This patch makes the usage and documentation simpler:
>
>     -bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
>     -        nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
>     +bench=# SELECT query, calls, total_time, rows,
>     +        shared_blks_hit_percent
>              FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
>
> It will be very simple :-)
>
> This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and that patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity.

This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. All changes are fine and working as described. Marked as ready for committer.

Regards,
Hari babu
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/18 20:16), Haribabu kommi wrote:
> On 18 October 2013 13:35 KONDO Mitsumasa wrote:
>> This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and that patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity.
>
> This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. All changes are fine and working as described. Marked as ready for committer.

Thank you for your review! However, I'd like to add the average time per statement, too. The attached patch is my latest one, adding shared_blks_hit_percent and avg_time. This is the main added code:

    + total_time / calls::float AS avg_time,

If this patch and the min/max and stddev patch are committed, we can see more detailed and simpler information in pg_stat_statements, with light-weight coding.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index 000..f0a8e0f
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
@@ -0,0 +1,63 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT userid,
+         dbid,
+         query,
+         calls,
+         total_time,
+         rows,
+         CASE WHEN shared_blks_hit + shared_blks_read > 0
+              THEN 100.0 * (shared_blks_hit::float / (shared_blks_hit + shared_blks_read))
+              ELSE 0 END AS shared_blks_hit_percent,
+         shared_blks_hit,
+         shared_blks_read,
+         shared_blks_dirtied,
+         shared_blks_written,
+         local_blks_hit,
+         local_blks_read,
+         local_blks_dirtied,
+         local_blks_written,
+         temp_blks_read,
+         temp_blks_written,
+         blk_read_time,
+         blk_write_time
+  FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
new file mode 100644
index 000..41dc16b
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
@@ -0,0 +1,65 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_stat_statements_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+-- Register a view on the function for ease of use.
+CREATE VIEW pg_stat_statements AS
+  SELECT userid,
+         dbid,
+         query,
+         calls,
+         total_time,
+         total_time / calls::float AS avg_time,
+         rows,
+         CASE WHEN shared_blks_hit + shared_blks_read > 0
+              THEN 100.0 * (shared_blks_hit::float / (shared_blks_hit + shared_blks_read))
+              ELSE 0 END AS shared_blks_hit_percent,
+         shared_blks_hit,
+         shared_blks_read,
+         shared_blks_dirtied,
+         shared_blks_written,
+
Re: [HACKERS] Sequence Access Method WIP
On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote: On 18.11.2013 11:48, Andres Freund wrote: I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. Sounds sane. I'm thinking that the alloc function should look something like this: seqam_alloc(Relation seqrel, int nrequested, Datum am_private) I don't think we can avoid giving access to the other columns of pg_sequence; stuff like increment and the limits needs to be available for reading, so that'd also need to get passed in. And we need to signal that am_private can be NULL, otherwise we'd end up with ugly ways to signal that. So I'd say to pass in the entire tuple, and return a copy? Alternatively we can return am_private as a 'struct varlena *', so we can properly signal empty values. We also need a way to set am_private from outside seqam_alloc/setval/... Many of the fancier sequences that people talked about will need preallocation somewhere in the background. As proposed that's easy enough to write using log_sequence_tuple(); this way we'd need something that calls a callback with the appropriate buffer lock held. So maybe a seqam_update(Relation seqrel, ...) callback that gets called when somebody executes pg_sequence_update(oid)? It'd probably be a good idea to provide a generic function for checking whether a new value falls within the boundaries of the sequence's min and max, plus error handling. I'm not thinking that we'd change sequences to not be relations. I'm thinking that we might not want to store the state as a one-page file anymore. In fact that was just discussed in the other thread titled "init_sequence spill to hash table". Yes, I read and even commented in that thread...
But nothing in the current proposed API would prevent you from going in that direction, you'd just get passed in a different tuple/buffer. b) It's not actually easy to get similar semantics in userspace. How would you emulate the crash-safe but non-transactional semantics of sequences without copying most of sequence.c? Without writing XLogInsert()s, which we cannot do from out-of-tree code? heap_inplace_update() That gets the crash-safe part partially (it doesn't allow making the tuple wider than before), but not the caching/stateful part et al. The point is that you need most of sequence.c again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
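Andres's sketch of a seqam_alloc() callback that takes a request size and opaque am_private state can be illustrated in miniature. Everything below is an illustrative stand-in, not PostgreSQL's actual headers: a real AM would deal with Relation, Datum, buffer locks, and WAL-logging.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the discussed allocation API: the backend asks the AM for
 * nrequested values in one call, and the AM returns the first value plus
 * how many consecutive values it granted.  am_private is opaque state
 * owned by the AM.  All names here are illustrative stand-ins.
 */
typedef struct SeqAmAllocResult
{
	int64_t		first;		/* first value handed out */
	int			ngranted;	/* how many consecutive values were granted */
} SeqAmAllocResult;

typedef struct SeqAm
{
	SeqAmAllocResult (*alloc) (void *am_private, int nrequested);
} SeqAm;

/* A trivial "local" AM whose private state is just the last value used. */
static SeqAmAllocResult
local_alloc(void *am_private, int nrequested)
{
	int64_t    *last = (int64_t *) am_private;
	SeqAmAllocResult r;

	r.first = *last + 1;
	r.ngranted = nrequested;	/* no MAXVALUE/cache limits in this sketch */
	*last += nrequested;
	return r;
}

/* Grab a batch of 32 values (like log_cnt batching) and return the last. */
int64_t
demo_nextval_batch(void)
{
	int64_t		state = 0;
	SeqAm		am = {local_alloc};
	SeqAmAllocResult r = am.alloc(&state, 32);

	return r.first + r.ngranted - 1;
}
```

The point of the batched shape is that caching and WAL-batching decisions stay in the backend, while the AM only decides what the next values are.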
Re: [HACKERS] Sequence Access Method WIP
On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote: It doesn't go far enough, it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all. I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. I can't see what the rationale of your arguments is. All index AMs write WAL and control buffer locking etc. Do you have a new use case that shows why changes should happen? We can't just redesign things based upon arbitrary decisions about what things should or should not be possible via the API. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Sequence Access Method WIP
On 18.11.2013 13:48, Simon Riggs wrote: On 18 November 2013 07:50, Heikki Linnakangas hlinnakan...@vmware.com wrote: It doesn't go far enough, it's still too *low*-level. The sequence AM implementation shouldn't need to have direct access to the buffer page at all. I don't think the sequence AM should be in control of 'cached'. The caching is done outside the AM. And log_cnt probably should be passed to the _alloc function directly as an argument, ie. the server code asks the AM to allocate N new values in one call. I can't see what the rationale of your arguments is. All index AMs write WAL and control buffer locking etc. Index AMs are completely responsible for the on-disk structure, while with the proposed API, both the AM and the backend are intimately aware of the on-disk representation. Such a shared responsibility is not a good thing in an API. I would also be fine with going 100% in the index AM direction, removing all knowledge of the on-disk layout from the backend code and moving it into the AMs. Then you could actually implement the discussed "store all sequences in a single file" change by writing a new sequence AM for it. - Heikki
Re: [HACKERS] REINDEX CONCURRENTLY 2.0
On 2013-11-18 19:52:29 +0900, Michael Paquier wrote: On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-15 11:40:17 +0900, Michael Paquier wrote: - 20131114_3_reindex_concurrently.patch, providing the core feature. Patch 3 needs to have patch 2 applied first. Regression tests, isolation tests and documentation are included with the patch. Have you addressed my concurrency concerns from the last version? I have added documentation in the patch with a better explanation about why those choices of implementation are made. The dropping still isn't safe: After phase 4 we are in the state:

  old index: valid, live, !isdead
  new index: !valid, live, !isdead

Then you do an index_concurrent_set_dead() from that state on in phase 5. There you do WaitForLockers(locktag, AccessExclusiveLock) before index_set_state_flags(INDEX_DROP_SET_DEAD). That's not sufficient. Consider what happens with the following sequence:

1) WaitForLockers(locktag, AccessExclusiveLock) -> GetLockConflicts() => virtualxact 1 -> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation, updates only the new index, the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the index already open the cache invalidations won't be processed.

Now the indexes are out of sync. There are entries only in the old index and entries only in the new index. Not good. I hate to repeat myself, but you really need to follow the current protocol for concurrently dropping indexes, which involves *first* marking the index as invalid so it won't be used for querying anymore, then waiting for everyone possibly still seeing that entry to finish, and only *after* that marking the index as dead.
You cannot argue away correctness concerns with potential deadlocks. c.f. http://www.postgresql.org/message-id/20130926103400.ga2471...@alap2.anarazel.de I am also still unconvinced that the logic in index_concurrent_swap() is correct. It very much needs to explain why no backend can see values that are inconsistent. E.g. what prevents a backend thinking the old and new indexes have the same relfilenode? MVCC snapshots don't seem to protect you against that. I am not sure there's a problem, but there certainly need to be more comments explaining why there are none. Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices() -> RelationGetIndexList (that list is consistent due to mvcc snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap() -> CommitTransaction() -> ProcArrayEndTransaction() (changes visible to others henceforth!)
Backend 2: -> index_open(new_index) => no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples() => updates the same relation twice, corrupt!
Backend 1: still in CommitTransaction() -> AtEOXact_Inval() sends out invalidations

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Sequence Access Method WIP
On 14.11.2013 22:10, Simon Riggs wrote: Includes test extension which allows sequences without gaps - gapless. I realize this is just for demonstration purposes, but it's worth noting that it doesn't actually guarantee that when you use the sequence to populate a column in the table, the column would not have gaps. Sequences are not transactional, so rollbacks will still produce gaps. The documentation is misleading on that point. Without a strong guarantee, it's a pretty useless extension. - Heikki
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 16 November 2013 09:43 Amit Kapila wrote: On Fri, Nov 15, 2013 at 10:18 PM, Peter Eisentraut pete...@gmx.net wrote: On 11/14/13, 2:50 AM, Amit Kapila wrote: Find the rebased version attached with this mail.

  ereport(ERROR,
          (errcode(ERRCODE_CONFIG_FILE_ERROR),
           errmsg("configuration file \"%s\" contains errors",
-                 ConfigFileName)));
+                 ErrorConfFile)));

The ErrorConfFile prints postgresql.auto.conf only if there is any parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf because of any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

  ProcessConfigFile(GucContext context)
  {
  ..
      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
  ..
  }

Code changes are fine. If two or three errors are present in the configuration file, it prints the configuration file name of the last error only. Is it required to be mentioned in the documentation? If any postmaster setting changed by ALTER SYSTEM leads to failure of server start, what is the solution for the user to proceed further and start the server, given that it is mentioned that the auto.conf file shouldn't be edited manually? Yeah, but in case of emergency the user can change it to get the server started. Now the question is whether to mention it in the documentation; I think we can leave this decision to the committer. If he thinks that it is better to document then I will update it. Ok fine. Regards, Hari babu.
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that, and it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and xlog directory from being the same, there is a way in Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this does not give a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, does the same for the transaction log directory, then compares the two and throws an error if they match. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same location, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error.

  $ cd /tmp
  $ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch
  pg_basebackup: could not change directory to "hoge": No such file or directory

Regards, -- Fujii Masao
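As an aside, on POSIX systems one way to test whether two paths name the same directory, without canonicalizing either path, is to compare the device and inode numbers from stat(2). The sketch below is only that alternative idea under stated assumptions; it is not what the proposed patch does, and it would need a different method on Windows, where inode numbers from stat() are not reliable.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Return true if the two paths refer to the same existing directory (or
 * file).  POSIX-only sketch: instead of building absolute paths, compare
 * device and inode numbers, which also catches symlinks and relative
 * spellings of the same location.
 */
bool
same_directory(const char *path1, const char *path2)
{
	struct stat st1,
				st2;

	if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
		return false;			/* at least one path doesn't exist */

	return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

For example, same_directory("/tmp", "/tmp/../tmp") is true even though the two strings differ, which is exactly the case that string comparison of user-supplied paths gets wrong.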
Re: [HACKERS] CREATE TABLE IF NOT EXISTS AS
On Sun, Nov 17, 2013 at 6:05 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 16, 2013, at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Co-worker asked a question I could not answer: Why is IF NOT EXISTS not supported by CREATE TABLE AS? That's an even worse idea than plain CREATE IF NOT EXISTS (which was put in over vocal objections from me and some other people). Not only would you not have the faintest clue as to the properties of the table afterwards, but no clue as to its contents either. You mean that, after running it, one cannot tell whether or not a new table was created without looking at it? I guess that makes sense, though sometimes I like to tell the system to assume that I know what I'm doing -- e.g., that either outcome works for me. Not essential as a core feature, mind you; I can use DO to accomplish the same thing. It's just a bit more work that way. And perhaps that's for the best. I'm planning to implement it for the next commit fest (2014-01)... Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA
On Mon, Nov 18, 2013 at 06:18:01PM +1300, David Rowley wrote: On Mon, Nov 18, 2013 at 1:01 AM, Marko Kreen mark...@gmail.com wrote: I am a bit suspicious of the performance impact of this patch, but think that it's still worthwhile as it cleans up a code style issue where a single string argument is given to a printf-style function without %s. This thread probably did not explain very well the point of this patch. All this kicked up from an earlier patch which added support for alignment in the log_line_prefix GUC. After some benchmarks were done on the proposed patch for that, it was discovered that replacing appendStringInfoString with appendStringInfo gave a big enough slowdown to matter in high-volume logging scenarios. That patch was revised so that appendStringInfo() was only used where it was really needed, and performance increased again. I then ran a few benchmarks, seen here: http://www.postgresql.org/message-id/caaphdvp2ulkpuhjnckonbgg2vxpvxolopzhrgxbs-m0r0v4...@mail.gmail.com to compare appendStringInfo(si, "%s", str) with appendStringInfoString(si, str) and appendStringInfo(si, str). The conclusion of those benchmarks was that appendStringInfoString() was the best function to use when no formatting was required, so I submitted a patch which replaced appendStringInfo() with appendStringInfoString() where that was possible, and that was accepted. appendPQExpBuffer() and appendPQExpBufferStr() are the front-end versions of appendStringInfo, so I spent an hour or so replacing these calls too, as it should show a similar speedup. Though in this case the performance is likely less critical, my thinking was more along the lines of: it increases performance a little bit with a total of 0 increase in code complexity. I was actually praising the patch in that it reduces complexity, so it's worth applying even if there is no performance win. With a performance win, it's doubleplus good. The patch applies and regtests work fine. So I mark it as ready for committer.
The findings in the benchmarks in the link above also showed that we might want to look into turning appendStringInfoString into a macro around appendBinaryStringInfo() to allow us to skip the strlen() call for string constants. It's unclear at the moment if this would be a good idea or much of an improvement, so it was left as something to think about for the future. In any case it should be a separate patch. Perhaps applying the same optimization for all such functions. -- marko
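The "macro around appendBinaryStringInfo()" idea can be sketched with a toy buffer type. Everything here is a simplified stand-in for the real StringInfo API, just to show how sizeof on a string literal avoids the runtime strlen():

```c
#include <assert.h>
#include <string.h>

/*
 * Minimal stand-in for StringInfo, just enough to show the idea; this is
 * not the real API from stringinfo.h.
 */
typedef struct Buf
{
	char		data[256];
	int			len;
} Buf;

static void
append_binary(Buf *buf, const char *s, int n)
{
	memcpy(buf->data + buf->len, s, n);
	buf->len += n;
	buf->data[buf->len] = '\0';
}

static void
append_string(Buf *buf, const char *s)
{
	append_binary(buf, s, (int) strlen(s));		/* runtime strlen() */
}

/*
 * For a string literal, sizeof() - 1 yields its length at compile time,
 * skipping strlen() entirely.  The sharp edge: applied to a plain
 * "char *" variable, sizeof() would wrongly give the pointer size, so
 * such a macro has to be used with care.
 */
#define append_literal(buf, lit) append_binary((buf), (lit), sizeof(lit) - 1)

const char *
demo_append(void)
{
	static Buf	buf;

	buf.len = 0;
	append_literal(&buf, "LOG: ");
	append_string(&buf, "duration: 1.234 ms");
	return buf.data;
}
```

That pointer-vs-literal pitfall is presumably part of why the idea was deferred rather than applied immediately.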
Re: [HACKERS] additional json functionality
On 11/15/13, 6:15 PM, Josh Berkus wrote: Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of validity in a typmod, e.g., json(loose), json(strict), etc.
[HACKERS] Force optimizer to use hash/nl/merge join?
Hi All, Is there any way to force the optimizer to use a specific join operator? For example, in SQL Server, I can do this way select * from (A inner hash join B on A.a = B.b) inner loop join C on A.a = C.c I did some search but didn't find PostgreSQL had similar join hints except for enable_* setting which doesn't help in my case. Best Regards, Zhan
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Mon, Nov 18, 2013 at 6:28 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi Find the rebased version attached with this mail.

  ereport(ERROR,
          (errcode(ERRCODE_CONFIG_FILE_ERROR),
           errmsg("configuration file \"%s\" contains errors",
-                 ConfigFileName)));
+                 ErrorConfFile)));

The ErrorConfFile prints postgresql.auto.conf only if there is any parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf because of any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

  ProcessConfigFile(GucContext context)
  {
  ..
      for (i = 0; i < num_guc_variables; i++)
      {
          struct config_generic *gconf = guc_variables[i];
  ..
  }

Code changes are fine. If two or three errors are present in the configuration file, it prints the configuration file name of the last error only. Is it required to be mentioned in the documentation? Do you mean parsing errors or some run-time error? Could you explain with an example? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Force optimizer to use hash/nl/merge join?
On Mon, Nov 18, 2013 at 7:58 PM, Zhan Li zhanl...@gmail.com wrote: Hi All, Is there any way to force the optimizer to use a specific join operator? For example, in SQL Server, I can do this way select * from (A inner hash join B on A.a = B.b) inner loop join C on A.a = C.c I did some search but didn't find PostgreSQL had similar join hints except for enable_* setting which doesn't help in my case. PostgreSQL doesn't have join hints. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] inherit support for foreign tables
On Thu, Nov 14, 2013 at 12:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against the foreign table. Really? It's one thing to say that somebody who adds a CHECK constraint to a foreign table is responsible to make sure that the foreign data will satisfy the constraint. It feels like a different thing to say that ALTER TABLE ADD CONSTRAINT applied to a parent table will silently assume that some child table that happens to be foreign doesn't need any enforcement. Perhaps more to the point, inheritance trees are the main place where the planner depends on the assumption that CHECK constraints represent reality. Are we really prepared to say that it's the user's fault if the planner generates an incorrect plan on the strength of a CHECK constraint that's not actually satisfied by the foreign data? If so, that had better be documented by this patch. But for a project that refuses to let people create a local CHECK or FOREIGN KEY constraint without mechanically checking it, it seems pretty darn weird to be so laissez-faire about constraints on foreign data. I can see both sides of this issue. We certainly have no way to force the remote side to enforce CHECK constraints defined on the local side, because the remote side could also be accepting writes from other sources that don't have any matching constraint. But having said that, I can't see any particularly principled reason why we shouldn't at least check the new rows we insert ourselves. After all, we could be in the situation proposed by KaiGai Kohei, where the foreign data wrapper API is being used as a surrogate storage engine API - i.e. there are no writers to the foreign side except ourselves. In that situation, it would seem odd to randomly fail to enforce the constraints. On the other hand, the performance costs of checking every row bound for the remote table could be quite steep.
Consider an update on an inheritance hierarchy that sets a = a + 1 for every row. If we don't worry about verifying that the resulting rows satisfy all local-side constraints, we can potentially ship a single update statement to the remote server and let it do all the work there. But if we DO have to worry about that, then we're going to have to ship every updated row over the wire in at least one direction, if not both. If the purpose of adding CHECK constraints was to enable constraint exclusion, that's a mighty steep price to pay for it. I think it's been previously proposed that we have some version of a CHECK constraint that effectively acts as an assertion for query optimization purposes, but isn't actually enforced by the system. I can see that being useful in a variety of situations, including this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] additional json functionality
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
[HACKERS] Review: pre-commit triggers
Review for Andrew Dunstan's patch in CF 2013-11:

  https://commitfest.postgresql.org/action/patch_view?id=1312

This review is more from a usage point of view; I would like to spend more time looking at the code but only so many hours in a day, etcetera. I hope this is useful as-is.

Submission review
-----------------

* Is the patch in a patch format which has context?

  Yes

* Does it apply cleanly to the current git master?

  Yes. No new files are introduced by this patch.

* Does it include reasonable tests, necessary doc patches, etc?

  Documentation patches included, but no tests. (I have a feeling it might
  be necessary to add a FAQ somewhere as to why there's no transaction
  rollback trigger).

Usability review
----------------

* Does the patch actually implement that?

  Yes.

* Do we want that?

  There is an item in the todo-list "Add database and transaction-level
  triggers"; the linked thread from 2008:

    http://archives.postgresql.org/pgsql-hackers/2008-05/msg00620.php

  doesn't seem too receptive to the idea, but this time round there don't
  seem to be any particular objections. Personally I don't have a use-case,
  but it certainly looks like a useful compatibility feature when porting
  from other databases. Andrew mentions porting from Firebird; for
  reference this is the relevant Firebird documentation:

    http://www.firebirdsql.org/refdocs/langrefupd21-ddl-trigger.html

* Do we already have it?

  No.

* Does it follow SQL spec, or the community-agreed behavior?

  Not sure about the SQL spec. The Compatibility section of the CREATE
  TRIGGER doc page doesn't mention anything along these lines:

    http://www.postgresql.org/docs/9.3/static/sql-createtrigger.html#SQL-CREATETRIGGER-COMPATIBILITY

* Does it include pg_dump support (if applicable)?

  Yes

Feature test
------------

* Does the feature work as advertised?

  Yes.

* Are there corner cases the author has failed to consider?
  I'm not sure whether it's intended behaviour, but if the commit trigger
  itself fails, there's an implicit rollback, e.g.:

    postgres=# BEGIN ;
    BEGIN
    postgres=*# INSERT INTO foo (id) VALUES (1);
    INSERT 0 1
    postgres=*# COMMIT ;
    NOTICE:  Pre-commit trigger called
    ERROR:  relation "bar" does not exist
    LINE 1: SELECT foo FROM bar
                            ^
    QUERY:  SELECT foo FROM bar
    CONTEXT:  PL/pgSQL function pre_commit_trigger() line 4 at EXECUTE statement
    postgres=#

  I'd expect this to lead to a failed transaction block, or at least some
  sort of notice that the transaction itself has been rolled back.

  Note that 'DROP FUNCTION broken_trigger_function() CASCADE' will remove
  the broken function without having to resort to single user mode, so it
  doesn't seem like an error in the commit trigger itself will necessarily
  lead to an intractable situation.

* Are there any assertion failures or crashes?

  No.

Performance review
------------------

* Does the patch slow down simple tests?

  No.

* If it claims to improve performance, does it?

  n/a

* Does it slow down other things?

  No.

Coding review
-------------

* Does it follow the project coding guidelines?

  Yes.

* Are there portability issues?

  I don't see any.

* Will it work on Windows/BSD etc?

  Tested on OS X and Linux. I don't see anything, e.g. system calls, which
  might prevent it working on Windows.

* Does it do what it says, correctly?

  As far as I can tell, yes.

* Does it produce compiler warnings?

  No.

* Can you make it crash?

  So far, no.

Architecture review
-------------------

* Is everything done in a way that fits together coherently with other features/modules?

  The changes are not all that complex and I don't see any issues; however,
  I'm not very familiar with that area of the code (but hey, that's why I'm
  taking a look) so I might be missing something.

* Are there interdependencies that can cause problems?

  I can't see any.
Additional notes
----------------

A sample transaction commit trigger:

    CREATE OR REPLACE FUNCTION pre_commit_trigger()
      RETURNS EVENT_TRIGGER
      LANGUAGE 'plpgsql'
    AS $$
      BEGIN
        RAISE NOTICE 'Pre-commit trigger called';
      END;
    $$;

    CREATE EVENT TRIGGER trg_pre_commit
      ON transaction_commit
      EXECUTE PROCEDURE pre_commit_trigger();

Some relevant links:

  http://www.postgresql.org/docs/9.3/interactive/event-triggers.html
  http://www.postgresql.org/docs/9.3/interactive/sql-createeventtrigger.html
  http://www.postgresql.org/docs/9.3/interactive/sql-prepare-transaction.html
  http://en.wikipedia.org/wiki/Database_trigger

Regards

Ian Barwick
Re: [HACKERS] additional json functionality
On 11/18/2013 09:38 AM, Merlin Moncure wrote: On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? The discussion has been around making a common library that would be used for both. *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Overloading is what I was planning to do. cheers andrew
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 14.10.2013 07:12, Peter Geoghegan wrote: On Wed, Oct 9, 2013 at 1:11 PM, Peter Geoghegan p...@heroku.com wrote: Unfortunately, I have a very busy schedule in the month ahead, including travelling to Ireland and Japan, so I don't think I'm going to get the opportunity to work on this too much. I'll try and produce a V4 that formally proposes some variant of my ideas around visibility of locked tuples. V4 is attached. Most notably, this adds the modifications to HeapTupleSatisfiesMVCC(), though they're neater than in the snippet I sent earlier. There is also some clean-up around row-level locking. That code has been simplified. I also try and handle serialization failures in a better way, though that really needs the attention of a subject matter expert. There are a few additional XXX comments highlighting areas of concern, particularly around serializable behavior. I've deferred making higher isolation levels care about wrongfully relying on the special HeapTupleSatisfiesMVCC() exception (e.g. they won't throw a serialization failure, mostly because I couldn't decide on where to do the test on time prior to travelling tomorrow). I've added code to do heap_prepare_insert before value locks are held. Whatever our eventual value locking implementation, that's going to be a useful optimization. Though unfortunately I ran out of time to give this the scrutiny it really deserves, I suppose that it's something that we can return to later. I ask that reviewers continue to focus on concurrency issues and broad design issues, and continue to defer discussion about an eventual value locking implementation. I continue to think that that's the most useful way of proceeding for the time being. My earlier points about probable areas of concern [1] remain a good place for reviewers to start. I think it's important to recap the design goals of this. 
I don't think these have been listed before, so let me try: * It should be usable and perform well for both large batch updates and small transactions. * It should perform well both when there are no duplicates, and when there are lots of duplicates And from that follows some finer requirements: * Performance when there are no duplicates should be close to raw INSERT performance. * Performance when all rows are duplicates should be close to raw UPDATE performance. * We should not leave behind large numbers of dead tuples in either case. Anything else I'm missing? What about exclusion constraints? I'd like to see this work for them as well. Currently, exclusion constraints are checked after the tuple is inserted, and you abort if the constraint was violated. We could still insert the heap and index tuples first, but instead of aborting on violation, we would kill the heap tuple we already inserted and retry. There are some complications there, like how to wake up any other backends that are waiting to grab a lock on the tuple we just killed, but it seems doable. That would, however, perform badly and leave garbage behind if there are duplicates. A refinement of that would be to first check for constraint violations, then insert the tuple, and then check again. That would avoid the garbage in most cases, but would perform much more poorly when there are no duplicates, because it needs two index scans for every insertion. A further refinement would be to keep track of how many duplicates there have been recently, and switch between the two strategies based on that. That cost of doing two scans could be alleviated by using markpos/restrpos to do the second scan. That is presumably cheaper than starting a whole new scan with the same key. (markpos/restrpos don't currently work for non-MVCC snapshots, so that'd need to be fixed, though) And that detour with exclusion constraints takes me back to the current patch :-). 
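The insert-then-check-and-retry scheme discussed here is essentially what applications already do today at the SQL level; a minimal sketch of that well-known retry loop (similar to the UPSERT example in the PL/pgSQL docs; table and column names invented for illustration):

```sql
-- Illustrative only: the classic application-level retry loop that an
-- INSERT ... ON DUPLICATE KEY feature aims to replace.
CREATE OR REPLACE FUNCTION merge_kv(k text, v text) RETURNS void AS $$
BEGIN
    LOOP
        UPDATE kv SET val = v WHERE key = k;
        IF FOUND THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO kv (key, val) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- someone inserted concurrently; loop and try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Each failed INSERT in this loop leaves a dead tuple behind, which is exactly the garbage-accumulation concern raised above for the naive insert-then-kill approaches.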
What if you implemented the unique check in a similar fashion too (when doing INSERT ON DUPLICATE KEY ...)? First, scan for a conflicting key, and mark the position. Then do the insertion to that position. If the insertion fails because of a duplicate key (which was inserted after we did the first scan), mark the heap tuple as dead, and start over. The indexam changes would be quite similar to the changes you made in your patch, but instead of keeping the page locked, you'd only hold a pin on the target page (if even that). The first indexam call would check that the key doesn't exist, and remember the insert position. The second call would re-find the previous position, and insert the tuple, checking again that there really wasn't a duplicate key violation. The locking aspects would be less scary than your current patch. I'm not sure if that would perform as well as your current patch. I must admit your current approach is pretty optimal performance-wise. But I'd like to see it, and that would be a solution for exclusion constraints in any case. One fairly limitation with your current approach
Re: [HACKERS] inherit support for foreign tables
Robert Haas robertmh...@gmail.com writes: On Thu, Nov 14, 2013 at 12:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against a foreign table. Really? I think it's been previously proposed that we have some version of a CHECK constraint that effectively acts as an assertion for query optimization purposes, but isn't actually enforced by the system. I can see that being useful in a variety of situations, including this one. Yeah, I think it would be much smarter to provide a different syntax to explicitly represent the notion that we're only assuming the condition is true, and not trying to enforce it. regards, tom lane
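Tom's suggestion can be illustrated with invented syntax (nothing below is implemented; the NOT ENFORCED keyword is made up purely to show the idea):

```sql
-- Invented syntax, for illustration only: a CHECK-like assertion the
-- planner could trust for constraint exclusion, but that the server
-- would never verify on INSERT/UPDATE against the foreign side.
CREATE FOREIGN TABLE orders_2013 (
    id        int,
    placed_on date
) SERVER remote_srv OPTIONS (table_name 'orders_2013');

ALTER FOREIGN TABLE orders_2013
    ADD CHECK (placed_on >= DATE '2013-01-01'
           AND placed_on <  DATE '2014-01-01')
    NOT ENFORCED;  -- hypothetical keyword marking an "assumed" constraint
```

A distinct syntax makes the trade-off explicit: if the remote data violates the assumption, queries may silently return wrong answers, which is why overloading plain CHECK for this was considered undesirable.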
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 18 November 2013 18:45 Fujii Masao wrote: On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify both data and xlog directories as the same. To prevent the data directory and xlog directory being the same, there is a way in windows (_fullpath api) to get an absolute path from a relative path, but I didn't find any such possibility in linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this is not giving a proper absolute path.
I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory. It compares both data and xlog directories and throws an error. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same location, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error.

$ cd /tmp
$ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch
pg_basebackup: could not change directory to "hoge": No such file or directory

Thanks. After finding the xlog directory's absolute path, returning back to the current working directory was missed; because of this it was failing while finding the absolute path for the data directory. Apart from this change, the code is reorganized a bit. Updated patch is attached in the mail. Please review it once. Regards, Hari babu.

[Attachment: UserSpecifiedxlogDir_v5.patch]
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 18 November 2013 20:01 Amit Kapila wrote: On Mon, Nov 18, 2013 at 6:28 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 17 November 2013 12:25 Amit Kapila wrote: On Sat, Nov 16, 2013 at 4:35 PM, Haribabu kommi Find the rebased version attached with this mail.

 ereport(ERROR,
         (errcode(ERRCODE_CONFIG_FILE_ERROR),
          errmsg("configuration file \"%s\" contains errors",
-                ConfigFileName)));
+                ErrorConfFile)));

ErrorConfFile prints postgresql.auto.conf only if there is a parsing problem with postgresql.auto.conf; otherwise it always prints postgresql.conf for any other error. Changed to ensure ErrorConfFile contains the proper config file name. Note: I have not assigned the file name in case of error in the below loop, as the file name in gconf is NULL in most cases, and moreover this loops over guc_variables, which doesn't contain values/parameters from auto.conf. So I don't think it is required to assign ErrorConfFile in this loop.

ProcessConfigFile(GucContext context)
{
    ..
    for (i = 0; i < num_guc_variables; i++)
    {
        struct config_generic *gconf = guc_variables[i];
        ..
    }

Code changes are fine. If two or three errors are present in the configuration file, it prints only the configuration file of the last error. Is it required to be mentioned in the documentation? Do you mean to say parsing errors or some run-time error? Could you explain with an example?

LOG: parameter "shared_buffers" cannot be changed without restarting the server
LOG: parameter "port" cannot be changed without restarting the server
LOG: configuration file "/home/hari/installation/bin/../../data/postgresql.auto.conf" contains errors; unaffected changes were applied

The shared_buffers parameter is changed in postgresql.conf and port is changed in postgresql.auto.conf. The error message reports only the file of the last error that occurred. Is it required to mention the above behavior in the document?
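For readers following the thread: the feature under review records settings in postgresql.auto.conf, which is applied on top of postgresql.conf at (re)load time, which is why a single reload can surface errors from either file:

```sql
-- ALTER SYSTEM writes the value into postgresql.auto.conf; it takes
-- effect on the next reload (or restart, for parameters such as port
-- that cannot be changed while the server is running).
ALTER SYSTEM SET port = '5433';
SELECT pg_reload_conf();
```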
Regards, Hari babu
Re: [HACKERS] Small improvement to json out functions by using cstring_to_text_with_len instead of cstring_to_text
On Thu, Nov 14, 2013 at 2:18 AM, David Rowley dgrowle...@gmail.com wrote: Hi, Here's a small patch which should speed up json out functions a little bit by removing a call to strlen() on what could be a long string. The length of the string is already known by the StringInfoData, so there's no point in letting cstring_to_text() loop over the whole string again. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Can we add sample systemd service file to git repo?
On 12 November 2013 20:54, Nigel Heron nhe...@querymetrics.com wrote: On Tue, Nov 12, 2013 at 11:47 AM, Devrim GÜNDÜZ dev...@gunduz.org wrote: ... http://svn.pgrpms.org/browser/rpm/redhat/9.3/postgresql/F-19/postgresql-9.3.service is an example of what we use in the RPMs. (if website fails, please just reload) Attached is a modified version that will work with the compile defaults. Hi, should we put PGPORT in the systemd file without an easy way to override it? As an example, when yum updating minor versions on fedora 18 (using the yum.postgresql.org rpms), any changes to the systemd service file are overwritten by the new rpm and the port is forced back to 5432. This makes having pg9.2 and pg9.3 on the same box conflict after each minor version update. IIRC there's a %config(noreplace) directive in the .spec that can cause the new file to be saved with an extension (.rpmnew?) instead of overwriting the old one.
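The %config(noreplace) behaviour Nigel recalls can be sketched as a .spec fragment (the path macro here is illustrative; the actual spec layout may differ):

```
# Hypothetical .spec fragment: with %config(noreplace), rpm leaves a
# locally modified unit file in place on upgrade, and installs the
# packaged copy as postgresql-9.3.service.rpmnew instead of
# overwriting the admin's edits (such as a changed PGPORT).
%files
%config(noreplace) %{_unitdir}/postgresql-9.3.service
```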
Re: [HACKERS] Freezing without write I/O
On 2013-09-25 12:31:20 +0300, Heikki Linnakangas wrote: Hmm, some of those are trivial, but others, rewrite_heap_tuple() are currently only passed the HeapTuple, with no reference to the page where the tuple came from. Instead of changing code to always pass that along with a tuple, I think we should add a boolean to HeapTupleData, to indicate if the tuple came from a mature page. If the flag is set, the tuple should be considered visible to everyone, without looking at the xmin/xmax. This might affect extensions, although usually external C code that has to deal with HeapTuples will use functions like heap_form_tuple to do so, and won't need to deal with low-level stuff or visibility themselves. Attached is a new version, which adds that field to HeapTupleData. Most of the issues you listed above have been fixed, plus a bunch of other bugs I found myself. The bug that Jeff ran into with his count.pl script has also been fixed. This seems a bit hacky to me. Change a widely used struct because a few functions don't get passed enough information? Visibility checks are properly done with a buffer passed along; that some places have cut corners around that doesn't mean we have to continue to do so. The pullups around rs_pagemature are imo indicative of this (there's even a bug because a page can be all visible before it's mature, right? That could then cause an assert somewhere down the line when we check page and tuple are coherent). Ok, making another scan through this:

* Why can we do PageSetLSN(page, RangeSwitchLSN) in heap_* when the action doesn't need WAL? And why is that correct? I can see doing that in vacuum itself, but doing it when we write *new* data to the page?
* heap_freeze_page(): The PageSetLSN(page, RangeSwitchLSN) if there's nothing to log is not WAL logged. Which means that if we crash it won't necessarily be set, so the VM and the heap lsn might get out of sync. That probably doesn't have any bad effects, but imo deserves a comment.
* heap_freeze_page(): PageUpdateNeedsFreezing() can now be true before and after. That's a bit confusing :/
* GetSafeClogTruncationPoint truncates the xid-lsn ranges, but there's also an, uncommented, TruncateXidLSNRanges. At least how they work together should be described better.
* There's quite some FIXMEs around
* Let's move the xid-lsn ranges code from GetNewTransactionId() into its own function.
* PageMatureLSN and RangeSwitchLSN should be documented somewhere. They are implicitly, but they are used widely enough that that doesn't seem sufficient.
* pg_controldata should output pageMatureLSN

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] nested hstore patch
Hi, On 2013-11-12 22:35:31 +0400, Teodor Sigaev wrote: Attached patch adds nesting feature, types (string, bool and numeric values), arrays and scalar to hstore type. I took a quick peek at this:

* You cannot simply catch and ignore errors by doing

+ PG_TRY();
+ {
+     n = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(s->val), 0, -1));
+ }
+ PG_CATCH();
+ {
+     n = NULL;
+ }
+ PG_END_TRY();

That skips cleanup and might ignore some errors (think memory allocation failures). But why do you even want to silently ignore errors there?
* Shouldn't the checks for v->size be done before filling the datastructures in makeHStoreValueArray() and makeHStoreValuePairs()?
* could you make ORDER_PAIRS() a function instead of a macro? It's pretty long and there's no reason not to use a function.
* You call numeric_recv via recvHStoreValue via recvHStore without checks on the input length. That seems - without having checked it in detail - a good way to read unrelated memory. Generally ISTM the input needs to be more carefully checked in the whole recv function.
* There's quite some new, completely uncommented, code. Especially in hstore_op.c.
* the _PG_init you added should probably do a EmitWarningsOnPlaceholders().
* why does hstore need its own atoi?
* shouldn't all the function prototypes be marked as externs?
* Lots of trailing whitespaces, quite some long lines, cuddly braces, ...
* I think hstore_compat.c's header should be updated.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA
On 18.11.2013 15:40, Marko Kreen wrote: On Mon, Nov 18, 2013 at 06:18:01PM +1300, David Rowley wrote: On Mon, Nov 18, 2013 at 1:01 AM, Marko Kreen mark...@gmail.com wrote: I am a bit suspicious of the performance impact of this patch, but think that it's still worthwhile as it improves code style where a single string argument is given to a printf-style function without %s. This thread probably did not explain very well the point of this patch. All this kicked up from an earlier patch which added alignment in the log_line_prefix GUC. After some benchmarks were done on the proposed patch for that, it was discovered that replacing appendStringInfoString with appendStringInfo gave a big enough slowdown to matter in high volume logging scenarios. That patch was revised and the appendStringInfo()'s were only used when they were really needed and performance increased again. I then ran a few benchmarks seen here: http://www.postgresql.org/message-id/CAApHDvp2uLKPuHJnCkonBGG2VXPvxoLOPzhrGXBS-M0r0v4wSA%40mail.gmail.com to compare appendStringInfo(si, "%s", str); with appendStringInfoString(si, str); and appendStringInfo(si, str);. The conclusion of those benchmarks was that appendStringInfoString() was the best function to use when no formatting was required, so I submitted a patch which replaced appendStringInfo() with appendStringInfoString() where that was possible, and that was accepted. appendPQExpBuffer() and appendPQExpBufferStr() are the frontend versions of appendStringInfo, so I spent an hour or so replacing these calls too as it should show a similar speedup; though in this case the performance is likely less critical, my thinking was more along the lines of: it increases performance a little bit with a total of 0 increase in code complexity. I was actually praising the patch that it reduces complexity, so it's worth applying even if there is no performance win. With a performance win, it's doubleplus good. The patch applies and regtests work fine.
So I mark it as ready for committer. Ok, committed. PS. I'm not 100% convinced that this kind of code churn is worthwhile. It doesn't make any difference to readability in my eyes, in general. In some cases it does, but in others it messes with indentation (describeOneTables in src/bin/psql/describe.c). It also makes backpatching more laborious. But it's done now, hopefully this is a one-off thing. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On Mon, Nov 18, 2013 at 09:22:58PM +1300, David Rowley wrote: So now I'm wondering what the next move should be for this patch? a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster with freeze as default? b. Are we waiting on me reviewing one or both of the patches? Which one? I think probably it sounds safer not to make freeze the default, but then if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, then we have a redundant syntax to support for ever and ever. Decision time? Yes, we probably should make a decision, unless Robert's idea can be implemented. We have to balance the ease of debugging MVCC failures with the interface we give to the user community. From my perspective, I don't see how we can justify the addition of a FREEZE option to CLUSTER. If we explain that you would always use FREEZE unless you want to preserve MVCC information for later debugging, users will reply with Huh, why would I want that? MVCC debugging is just too rare an event for them to understand its usefulness. If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that with full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. Many MVCC failures are reproducible, so if we provide a C define that disables the freezing so MVCC information can be preserved, that might be good enough. Developers could enable the macro, and the macro could be used in other places where MVCC information is lost. This will make some MVCC debugging harder, and will perhaps allow some MVCC bugs to exist longer. I believe there are other cases where rows could be frozen but we have avoided it for MVCC debugging reasons. Another idea would be the addition of a GUC that can disable optimistic freezing. This could be enabled by sites that suspect MVCC bugs.
-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] CLUSTER FREEZE
On 2013-11-18 11:39:44 -0500, Bruce Momjian wrote: On Mon, Nov 18, 2013 at 09:22:58PM +1300, David Rowley wrote: So now I'm wondering what the next move should be for this patch? a. Are we waiting on Robert's patch to be committed before we can apply Thomas's cluster with freeze as default? b. Are we waiting on me reviewing one or both of the patches? Which one? I think probably it sounds safer not to make freeze the default, but then if we release 9.4 with CLUSTER FREEZE and then in 9.5/10 decide it should be the default, then we have a redundant syntax to support for ever and ever. Decision time? Yes, we probably should make a decision, unless Robert's idea can be implemented. We have to balance the ease of debugging MVCC failures with the interface we give to the user community. Imo that patch really doesn't need too much further work. From my perspective, I don't see how we can justify the addition of a FREEZE option to CLUSTER. If we explain that you would always use FREEZE unless you want to preserve MVCC information for later debugging, users will reply with Huh, why would I want that? I tend to agree that we should work towards not needing that option. Many MVCC failures are reproducible, so if we provide a C define that disables the freezing so MVCC information can be preserved, that might be good enough. Developers could enable the macro, and the macro could be used in other places where MVCC information is lost. This will make some MVCC debugging harder, and will perhaps allow some MVCC bugs to exist longer. I believe there are other cases where rows could be frozen but we have avoided it for MVCC debugging reasons. Another idea would be the addition of a GUC that can disable optimistic freezing. This could be enabled by sites that suspect MVCC bugs. I think that'd be an enormous failure. You usually don't need to look at those because there's a bug in postgres' mvcc logic, but because there's one somewhere else (application, other postgres code).
And looking at the mvcc information helps you to narrow it down, so you have a chance of actually finding a reproducible bug. Besides, in many of the !rewrite cases it's far from clear in which cases it's a benefit to freeze earlier. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Review: pre-commit triggers
On 11/18/2013 09:39 AM, Ian Lawrence Barwick wrote: Review for Andrew Dunstan's patch in CF 2013-11: https://commitfest.postgresql.org/action/patch_view?id=1312 This review is more from a usage point of view, I would like to spend more time looking at the code but only so many hours in a day, etcetera; I hope this is useful as-is. Many thanks for the fast review. * Does it include reasonable tests, necessary doc patches, etc? Documentation patches included, but no tests. (I have a feeling it might be necessary to add a FAQ somewhere as to why there's no transaction rollback trigger). I'll add some tests very shortly, and see about adding that to the docs. * Are there corner cases the author has failed to consider? I'm not sure whether it's intended behaviour, but if the commit trigger itself fails, there's an implicit rollback, e.g.: postgres=# BEGIN ; BEGIN postgres=*# INSERT INTO foo (id) VALUES (1); INSERT 0 1 postgres=*# COMMIT ; NOTICE: Pre-commit trigger called ERROR: relation bar does not exist LINE 1: SELECT foo FROM bar ^ QUERY: SELECT foo FROM bar CONTEXT: PL/pgSQL function pre_commit_trigger() line 4 at EXECUTE statement postgres=# I'd expect this to lead to a failed transaction block, or at least some sort of notice that the transaction itself has been rolled back. I'll check on this. Note that 'DROP FUNCTION broken_trigger_function() CASCADE' will remove the broken function without having to resort to single user mode so it doesn't seem like an error in the commit trigger itself will necessarily lead to an intractable situation. Given that, do we want to keep the bar on these operating in single user mode? I can easily drop it and just document this way out of difficulty. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Turning recovery.conf into GUCs
Hi, On 2013-11-15 22:38:05 -0500, Jaime Casanova wrote: sorry, i clearly misunderstood you. attached a rebased patch with those functions removed.

* --write-standby-enable seems to lose quite some functionality in comparison to --write-recovery-conf since it doesn't seem to set primary_conninfo, standby anymore.
* CheckRecoveryReadyFile() doesn't seem to be a very descriptive function name.
* Why does StartupXLOG() now use ArchiveRecoveryRequested && StandbyModeRequested instead of just the former?
* I am not sure I like recovery.trigger as a name. It seems too close to what I've seen people use to trigger failover, and too close to trigger_file.
* You check for a trigger file in a way that's not compatible with it being NULL. Why did you change that?
- if (TriggerFile == NULL)
+ if (!trigger_file[0])
* You made recovery_target_inclusive/pause_at_recovery_target PGC_SIGHUP - did you review that actually works? Imo that should be changed in a separate commit.
* Maybe we should rename names like pause_at_recovery_target to recovery_pause_at_target? Since we already force everyone to bother changing their setup...
* the description of archive_cleanup_command seems wrong to me.
* Why did you change some of the recovery gucs to lowercase names, but left out XLogRestoreCommand?
* Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being really strangely formatted (multiline ?: inside a function?) it doesn't a) seem to be correct to ignore potential memory allocation errors by just switching back into the context that just errored out, and continue to work there b) forgo cleanup by just continuing as if nothing happened. That's unlikely to be acceptable.
* You access recovery_target_name[0] unconditionally, although it's initialized to NULL.
* Generally, ISTM there's too much logic in the assign hooks.
* Why do you include xlog_internal.h in guc.c and not xlog.h?
Ok, I think that's enough for now ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] additional json functionality
On 11/18/2013 06:13 AM, Peter Eisentraut wrote: On 11/15/13, 6:15 PM, Josh Berkus wrote: Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of validity in a typmod, e.g., json(loose), json(strict), etc. Doesn't work; with XML, the underlying storage format didn't change. With JSONB, it will ... so changing the typmod would require a total rewrite of the table. That's a POLS violation if I ever saw one. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On 2013-11-18 15:01:42 +, Haribabu kommi wrote:

/*
 * Returns a malloced string containing the current working directory.
 * The caller has to take care of freeing the memory.
 * On failure exits with error code.
 */
static char *
get_current_working_dir()
{
	char	   *buf;
	size_t		buflen;

	buflen = MAXPGPATH;
	for (;;)
	{
		buf = pg_malloc(buflen);
		if (getcwd(buf, buflen))
			break;
		else if (errno == ERANGE)
		{
			pg_free(buf);
			buflen *= 2;
			continue;
		}
		else
		{
			pg_free(buf);
			fprintf(stderr,
					_("%s: could not get current working directory: %s\n"),
					progname, strerror(errno));
			exit(1);
		}
	}

	return buf;
}

/*
 * Calculates the absolute path for the given path. The output absolute path
 * is a malloced string. The caller needs to take care of freeing the memory.
 */
static char *
get_absolute_path(const char *input_path)
{
	char	   *pwd = NULL;
	char	   *abspath = NULL;

	/* Get the present working directory */
	pwd = get_current_working_dir();

	if (chdir(input_path) < 0)
	{
		/* Directory doesn't exist */
		if (errno == ENOENT)
			return NULL;

		fprintf(stderr, _("%s: could not change directory to \"%s\": %s\n"),
				progname, input_path, strerror(errno));
		exit(1);
	}

	abspath = get_current_working_dir();

	/* Return to the old working directory */
	if (chdir(pwd) < 0)
	{
		fprintf(stderr, _("%s: could not change directory to \"%s\": %s\n"),
				progname, pwd, strerror(errno));
		exit(1);
	}

	pg_free(pwd);
	return abspath;
}

This looks like it should be replaced by moving make_absolute_path from pg_regress.c to path.[ch] and then use that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New option for pg_basebackup, to specify a different directory for pg_xlog
On Tue, Nov 19, 2013 at 12:01 AM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 18:45 Fujii Masao wrote: On Mon, Nov 18, 2013 at 6:31 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 18 November 2013 11:19 Haribabu kommi wrote: On 17 November 2013 00:55 Fujii Masao wrote: On Sat, Nov 16, 2013 at 2:27 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: on 15 November 2013 17:26 Magnus Hagander wrote: On Fri, Nov 15, 2013 at 12:10 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 14 November 2013 23:59 Fujii Masao wrote: On Thu, Nov 14, 2013 at 9:08 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: Please find attached the patch, for adding a new option for pg_basebackup, to specify a different directory for pg_xlog. Sounds good! Here are the review comments: Don't we need to prevent users from specifying the same directory in both --pgdata and --xlogdir? I feel no need to prevent it; even if the user specifies both --pgdata and --xlogdir as the same directory, all the transaction log files will be created in the base directory instead of the pg_xlog directory. Given how easy it would be to prevent that, I think we should. It would be an easy misunderstanding to specify that when you actually want it in wherever/pg_xlog. Specifying that would be redundant in the first place, but people can do that; it would also be very easy to do it by mistake, and you'd end up with something that's really bad, including a recursive symlink. Presently with initdb also the user can specify the data and xlog directories as the same. To prevent the data directory and xlog directory from being the same, there is a way on Windows (the _fullpath API) to get an absolute path from a relative path, but I didn't find any such possibility on Linux. I didn't find any other way to check it; if anyone has any idea regarding this please let me know. What about make_absolute_path() in miscinit.c? 
The make_absolute_path() function gets the current working directory and adds the relative path to the CWD; this does not give a proper absolute path. I have added a new function verify_data_and_xlog_dir_same() which changes the current working directory to the data directory and gets the CWD, and does the same for the transaction log directory. It compares the data and xlog directories and throws an error. Please check it once. Is there any other way to identify that both data and xlog directories are pointing to the same directory, instead of comparing both absolute paths? Updated patch attached in the mail. Failure when the xlogdir doesn't exist is fixed in the updated patch attached in the mail. Thanks for updating the patch! With the patch, when I specify the same directory in both -D and --xlogdir, I got the following error. $ cd /tmp $ pg_basebackup -D hoge --xlogdir=/tmp/hoge -X fetch pg_basebackup: could not change directory to hoge: No such file or directory Thanks. After finding the xlog directory absolute path, returning back to the current working directory was missed; because of this the patch was failing while finding the absolute path for the data directory. Apart from this change, the code is reorganized a bit. Updated patch is attached in the mail. Please review it once. Thanks for the newer version of the patch! I found that the empty base directory is created and remains even when the same directory is specified in both -D and --xlogdir and the error occurs. I think that it's better to throw an error in that case before creating any new directory. Thought? +xlogdir = get_absolute_path(xlog_dir); xlog_dir must be an absolute path. ISTM we don't do the above. No? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
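Fujii's question above — detecting that the data directory and xlogdir name the same directory without comparing textual absolute paths — can be illustrated in shell by comparing resolved physical paths. This is only a sketch, not the patch's code: the `same_dir` helper is an invented name, and a C implementation would more likely compare `st_dev`/`st_ino` from stat(2).

```shell
# Hypothetical helper: two existing paths name the same directory when their
# resolved physical paths match. `pwd -P` follows symlinks, so symlinked
# aliases are caught too. Assumes both arguments exist and are directories.
same_dir() {
    [ "$(cd "$1" 2>/dev/null && pwd -P)" = "$(cd "$2" 2>/dev/null && pwd -P)" ]
}
```

A caller could then refuse early, e.g. `same_dir "$pgdata" "$xlogdir" && echo "xlogdir must differ from the data directory" >&2`, before creating any directory — which also addresses the "empty base directory remains" complaint above.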
Re: [HACKERS] additional json functionality
Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
On 11/18/2013 08:39 AM, Bruce Momjian wrote: If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that with full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. +1 None of our users would willingly choose worse performance over the 0.1% possibility of needing to analyze a transaction failure. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: pre-commit triggers
Andrew Dunstan and...@dunslane.net writes: Given that, do we want to keep the bar on these operating in single user mode? I can easily drop it and just document this way out of difficulty. Currently Event Triggers are disabled in single user mode, in part because operating them requires access to a catalog index, which might be corrupted and the reason why you're in single user mode in the first place. So please keep your new event trigger disabled in single user mode. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cd3413ec3683918c9cb9cfb39ae5b2c32f231e8b Regards, and thanks for this new Event Trigger! -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 8:36 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: (2013/11/18 20:16), Haribabu kommi wrote: On 18 October 2013 13:35 KONDO Mitsumasa wrote: This patch conflicts with the pg_stat_statement_min_max_exectime patch which I submitted, and the pg_stat_statement_min_max_exectime patch also adds new columns, min_time and max_time. So I'd like to change it in this opportunity. This patch adds another column, shared_blks_hit_percent, to the pg_stat_statements view, which is very beneficial for the user to know what percentage of blocks are hit. The same idea was proposed before but not committed because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus j...@agliodbs.com wrote: Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. Hm, but it would matter, wouldn't it... the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... You'd need explicit jsonb-creating functions: record_to_jsonb, array_to_jsonb etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type if you don't count casts. 
It's tempting to *not* make those functions as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then get unsimplified json always. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now but you'd run mutation and searching through jsonb... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Three times in the last two weeks, we have experienced data corruption on secondary servers using streaming replication on client systems. The versions involved are 9.0.14, 9.2.5, and 9.3.1. Each incident was separate; in two cases they were for the same client (9.0.14 and 9.3.1), one for a different client (9.2.5). The details of each incident are similar, but not identical. The details of each incident are: INCIDENT #1: 9.0.14 -- A new secondary (S1) was initialized using rsync off of an existing, correct primary (P1) for the base backup, and using WAL-E for WAL segment shipping. Both the primary and secondary were running 9.0.14. S1 properly connected to the primary once it was caught up on WAL segments, and S1 was then promoted as a primary using the trigger file. No errors in the log files on either system. After promotion, it was discovered that there was significant data loss on S1. Rows that were present on P1 were missing on S1, and some rows were duplicated (including duplicates that violated primary key and other unique constraints). The indexes were corrupt, in that they seemed to think that the duplicates were not duplicated, and that the missing rows were still present. Because the client's schema included a last_updated field, we were able to determine that all of the rows that were either missing or duplicated had been updated on P1 shortly (3-5 minutes) before S1 was promoted. It's possible, but not confirmed, that there were active queries (including updates) running on P1 at the moment of S1's promotion. As a note, P1 was created from another system (let's call it P0) using just WAL shipping (no streaming replication), and no data corruption was observed. P1 and S1 were both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. P1 and S1 have been destroyed at this point. 
INCIDENT #2: 9.3.1 -- In order to repair the database, a pg_dump was taken of S1, after having dropped the primary and unique constraints, and restored into a new 9.3.1 server, P2. Duplicate rows were purged, and missing rows were added again. The database, a new primary, was then put back into production, and ran without incident. A new secondary, S2, was created off of the primary. This secondary was created using pg_basebackup using --xlog-method=stream, although the WAL-E archiving was still present. S2 attached to P2 without incident and no errors in the logs, but nearly-identical corruption was discovered (although this time without the duplicated rows, just missing rows). At this point, it's not clear if there was some clustering in the last_updated timestamp for the rows that are missing from S2. No duplicated rows were observed. P2 and S2 are both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. No errors in the log files on either system. P2 and S2 are still operational. INCIDENT #3: 9.2.5 -- A client was migrating a large database from a 9.2.2 system (P3) to a new 9.2.5 system (S3) using streaming replication. As I personally didn't do the steps on this one, I don't have quite as much information, but the basics are close to incident #2: When S3 was promoted using the trigger file, no errors were observed and the database came up normally, but rows were missing from S3 that were present on P3. P3 is running Centos 6.3 with ext4 as the file system. S3 is running Centos 6.4 with ext3 as the file system. Log shipping in this case was done via rsync. P3 and S3 are still operational. No errors in the log files on either system. -- Obviously, we're very concerned that a bug was introduced in the latest minor release. We're happy to gather data as required to assist in diagnosing this. 
-- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed because Itagaki thought that pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling with him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp +1 from me. I think that a higher level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 11/18/2013 10:58 AM, Christophe Pettus wrote: Three times in the last two weeks, we have experienced data corruption on secondary servers using streaming replication on client systems. The versions involved are 9.0.14, 9.2.5, and 9.3.1. Each incident was separate; in two cases they were for the same client (9.0.14 and 9.3.1), one for a different client (9.2.5). To emphasize a salient point: we have not previously seen data corruption with these exact symptoms prior to the recent patch release. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 10:58 AM, Christophe Pettus x...@thebuild.com wrote: As a note, P1 was created from another system (let's call it P0) using just WAL shipping (no streaming replication), and no data corruption was observed. As another data point, P0 was running 9.0.13, rather than 9.0.14. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Hi, On 2013-11-18 10:58:26 -0800, Christophe Pettus wrote: INCIDENT #1: 9.0.14 -- A new secondary (S1) was initialized using rsync off of an existing, correct primary (P1) for the base backup, and using WAL-E for WAL segment shipping. Both the primary and secondary were running 9.0.14. S1 properly connected to the primary once it was caught up on WAL segments, and S1 was then promoted as a primary using the trigger file. Could you detail how exactly the base backup was created? Including the *exact* logic for copying? No errors in the log files on either system. Do you have the log entries for the startup after the base backup? Because the client's schema included a last_updated field, we were able to determine that all of the rows that were either missing or duplicated had been updated on P1 shortly (3-5 minutes) before S1 was promoted. It's possible, but not confirmed, that there were active queries (including updates) running on P1 at the moment of S1's promotion. Any chance you have log_checkpoints enabled? If so, could you check whether there was a checkpoint around the time of the base backup? This server is gone, right? If not, could you do: SELECT ctid, xmin, xmax, * FROM whatever WHERE duplicate_row? INCIDENT #2: 9.3.1 -- In order to repair the database, a pg_dump was taken of S1, after having dropped the primary and unique constraints, and restored into a new 9.3.1 server, P2. Duplicate rows were purged, and missing rows were added again. The database, a new primary, was then put back into production, and ran without incident. A new secondary, S2, was created off of the primary. This secondary was created using pg_basebackup using --xlog-method=stream, although the WAL-E archiving was still present. S2 attached to P2 without incident and no errors in the logs, but nearly-identical corruption was discovered (although this time without the duplicated rows, just missing rows). 
At this point, it's not clear if there was some clustering in the last_updated timestamp for the rows that are missing from S2. No duplicated rows were observed. P2 and S2 are both AWS instances running Ubuntu 12.04, using EBS (with xfs as the file system) as the data volume. No errors in the log files on either system. Could you list the *exact* steps you did to startup the cluster? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote: Could you detail how exactly the base backup was created? Including the *exact* logic for copying? 0. Before any of this began, P1 was archiving WAL segments to AWS-S3. 1. pg_start_backup('', true) on P1. 2. Using rsync -av on P1, the entire $PGDATA directory was pushed from P1 to S1. 3. Once the rsync was complete, pg_stop_backup() on P1. 4. Create appropriate recovery.conf on S1. 5. Bring up PostgreSQL on S1. 6. PostgreSQL recovers normally (pulling WAL segments from WAL-E), and eventually connects to P1. Do you have the log entries for the startup after the base backup? Sadly, not anymore. This server is gone, right? Correct. Could you list the *exact* steps you did to start up the cluster? 0. Before any of this began, P2 was archiving WAL segments to AWS-S3. 1. Initial (empty) data directory deleted on S2. 2. New data directory created with: /usr/lib/postgresql/9.3/bin/pg_basebackup --verbose --progress --xlog-method=stream --host=ip --user=repluser --pgdata=/data/9.3/main 3. Once the pg_basebackup completed, create appropriate recovery.conf on S2. 4. Bring up PostgreSQL on S2. 5. PostgreSQL recovers normally (pulling a small number of WAL segments from WAL-E), and eventually connects to P2. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
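The rsync-based procedure Christophe describes can be collected into one sketch. This is a reconstruction of the reported steps, not the actual commands used: hostnames, paths, users, and the WAL-E restore invocation are placeholders, and it is a config/procedure fragment that assumes a live 9.0-era cluster rather than something to run as-is.

```shell
# 1. On P1: enter backup mode (second argument forces an immediate checkpoint).
psql -h p1.example.com -U postgres -c "SELECT pg_start_backup('base', true);"

# 2. Push the entire $PGDATA directory to the standby, without exclusions
#    (as reported in the thread -- nothing was deleted before the copy).
rsync -av /var/lib/postgresql/9.0/main/ s1.example.com:/var/lib/postgresql/9.0/main/

# 3. On P1: leave backup mode.
psql -h p1.example.com -U postgres -c "SELECT pg_stop_backup();"

# 4. On S1: clear the copied pg_xlog contents and write a minimal recovery.conf
#    (restore_command and primary_conninfo, per the follow-up messages).
rm -f /var/lib/postgresql/9.0/main/pg_xlog/0*
cat > /var/lib/postgresql/9.0/main/recovery.conf <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=p1.example.com user=repluser'
restore_command = 'wal-e wal-fetch "%f" "%p"'
EOF

# 5. Start PostgreSQL on S1; it replays archived WAL, then streams from P1.
pg_ctl -D /var/lib/postgresql/9.0/main start
```

Promotion, as described, was then done via the trigger file (or `pg_ctl promote` on later releases).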
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 11:38:43 -0800, Christophe Pettus wrote: On Nov 18, 2013, at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote: Could you detail how exactly the base backup was created? Including the *exact* logic for copying? 0. Before any of this began, P1 was archiving WAL segments to AWS-S3. 1. pg_start_backup('', true) on P1. 2. Using rsync -av on P1, the entire $PGDATA directory was pushed from P1 to S2. Without deleting any data, including pg_xlog/, backup.label, anything? Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Could you list the *exact* steps you did to startup the cluster? 0. Before any of this began, P2 was archiving WAL segments to AWS-S3. 1. Initial (empty) data directory deleted on S2. 2. New data directory created with: /usr/lib/postgresql/9.3/bin/pg_basebackup --verbose --progress --xlog-method=stream --host=ip --user=repluser --pgdata=/data/9.3/main 3. Once the pg_basebackup completed, create appropriate recovery.conf on S1. That was just recovery command and primary conninfo? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Without deleting any data, including pg_xlog/, backup.label, anything? Correct. Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Yes. That was just recovery command and primary conninfo? Correct. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Did you have hot_standby enabled on all of those machines? Even on the 9.0.13 cluster? Actually, it's a bit more complex than this: 1. We don't know about P0, the 9.0.13 machine. I assume it was, but it was managed elsewhere. 2. P1 never had hot_standby = 'on', as it was never intended to be a hot standby. 3. S1 did have hot_standby = 'on'. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: Without deleting any data, including pg_xlog/, backup.label, anything? One more correction: After rsync finished and the pg_base_backup() was issued, the contents of pg_xlog/ on S1 were deleted. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 12:00 PM, Christophe Pettus x...@thebuild.com wrote: One more correction: After rsync finished and the pg_base_backup() was issued, the contents of pg_xlog/ on S1 were deleted. pg_stop_backup(), sorry. -- -- Christophe Pettus x...@thebuild.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLUSTER FREEZE
Josh Berkus j...@agliodbs.com wrote: On 11/18/2013 08:39 AM, Bruce Momjian wrote: If we do add FREEZE, all we would be doing is delaying the time when all CLUSTER operations will use FREEZE, and hence debugging will be just as difficult. My point is that will full knowledge, everyone would use FREEZE unless they expect MVCC bugs, which is going to be an almost zero population. +1 +1 None of our users would willingly choose worse performance over the 0.1% possibility of needing to analyze a transaction failure. I assume that's intended to represent the lifetime probability that a typical user would ever benefit, not per year or per transaction. Even as a lifetime number, it seems high unless we're specifically talking about hackers. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changing pg_dump default file format
On Thu, Nov 7, 2013 at 02:40:17PM -0500, Peter Eisentraut wrote: On 11/7/13, 1:00 PM, Josh Berkus wrote: If we wanted to change the defaults, I think it would be easier to create a separate bin name (e.g. pg_backup) than to change the existing parameters for pg_dump. Note the following code in pg_dump.c: /* Set default options based on progname */ if (strcmp(progname, pg_backup) == 0) format = c; Wow, when was that added? git blame says 2002: 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 387) /* Set default options based on progname */ 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 388) if (strcmp(progname, pg_backup) == 0) 9f0ae0c8 (Tom Lane 2002-05-10 22:36:27 + 389) format = c; However, git log 9f0ae0c82060e3dcd1fa7ac8bbe35a3f9a44dbba does not show that line being added in the diff. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
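Bruce's puzzle — blame attributing a line to a commit whose own diff doesn't show it — typically means the line was moved rather than written there, and the pickaxe can chase that: `git log -S` lists only commits that change the number of occurrences of a string. A self-contained demo with an invented throwaway repository (file names and messages here are made up, not the PostgreSQL history):

```shell
# Build a throwaway repo and locate the commit that introduced a string
# using the pickaxe (-S).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@example.com -c user.name=t commit -q --allow-empty -m base
printf 'format = c;\n' > pg_dump.c
git add pg_dump.c
git -c user.email=t@example.com -c user.name=t commit -q -m 'add default format'
# -S reports only the commits that add or remove the string:
git log -S'format = c' --format=%s -- pg_dump.c
```

Against the real tree, something like `git log -S'pg_backup' -- src/bin/pg_dump/pg_dump.c` (path assumed) would find where the check truly originated, and `git blame -C` follows code moved between files.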
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Nov 15, 2013 at 11:42 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor rod.tay...@gmail.com wrote: On Fri, Nov 15, 2013 at 2:26 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:18 PM, Rod Taylor rod.tay...@gmail.com wrote: 2%. It's essentially sentence fragments from 1 to 5 words in length. I wasn't expecting it to be much smaller. 10 recent value selections: white vinegar reduce color running vinegar cure uti cane vinegar acidity depends parameter how remedy fir clogged shower use vinegar sensitive skin home remedies removing rust heating does non raw apple cider home remedies help maintain healthy can vinegar mess up your apple cide vineger ph balance I didn't get why it's not significantly smaller. Is it possible to share a dump? Sorry, I reported that incorrectly. It's not something I was actually looking for and didn't pay much attention to at the time. The patched index is 58% of the 9.4 master size: 212 MB instead of 365 MB. Good. That meets my expectations :) You mention that both master and patched versions were compiled with debug. Was cassert enabled? In the attached version of the patch the tsvector opclass is enabled to use fastscan. You can retry your tests. -- With best regards, Alexander Korotkov. gin-fast-scan.7.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Sat, Nov 9, 2013 at 02:15:52PM -0500, Robert Haas wrote: On Fri, Nov 8, 2013 at 5:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: [ I'm so far behind ... ] Bruce Momjian br...@momjian.us writes: Applied. Thank you for all your suggestions. I thought the suggestion had been to issue a *warning*. How did that become an error? This patch seems likely to break applications that may have just been harmlessly sloppy about when they were issuing SETs and/or what flavor of SET they use. We don't for example throw an error for START TRANSACTION with an open transaction or COMMIT or ROLLBACK without one --- how can it possibly be argued that these operations are more dangerous than those cases? I'd personally have voted for using NOTICE. Well, LOCK TABLE throws an error, so it's not without precedent. Yeah, I just copied the LOCK TABLE usage. You could argue that SET SESSION ISOLATION only affects later commands and doesn't do something like LOCK, so it should be a warning. Not sure how to interpret the COMMIT/ROLLBACK behavior you mentioned. Considering we are doing this outside of a transaction, a WARNING or ERROR is pretty much the same from a behavioral perspective. Should we change this and LOCK to be a warning? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 10:58:26 -0800, Christophe Pettus wrote: After promotion, it was discovered that there was significant data loss on S1. Rows that were present on P1 were missing on S1, and some rows were duplicated (including duplicates that violated primary key and other unique constraints). The indexes were corrupt, in that they seemed to think that the duplicates were not duplicated, and that the missing rows were still present. Were there any kind of patterns in the lost data? What kind of workload are they running? I have an idea what the issue might be... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unaccent module - two params function should be immutable
On Fri, Nov 8, 2013 at 06:00:53PM -0500, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: [ mark unaccent functions immutable ] Applied. This patch is flat out wrong and needs to be reverted. The functions were correctly marked (by you!) in commit c0577c92a84cc477a88fe6868c16c4a7e3348b11 on the basis of the discussion of bug #5781, http://www.postgresql.org/message-id/201012021544.ob2fitn1041...@wwwmaster.postgresql.org which was a request exactly like this one and was denied for good and sufficient reasons. There was absolutely no reasoning given in this thread that explained why we should ignore the previous objections. In particular, marking the single-argument version of unaccent() as immutable is the height of folly because its behavior depends on the setting of search_path. Changing the two-argument function is maybe a bit more debatable, but that's not what you did. If we were going to change the behavior, this patch would still be wrong because it fails to provide an upgrade path. The objections saying you needed a 1.1 migration script were completely correct. Thanks, patch reverted. If people still want this, it needs to be resubmitted with this feedback in mind. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 12:57 PM, Andres Freund and...@2ndquadrant.com wrote: Were there any kind of patterns in the lost data? What kind of workload are they running? I have an idea what the issue might be... On the P1 S1 case, the data corrupted was data modified in the last few minutes before the switchover. I don't want to over-analyze, but it was within the checkpoint_timeout value for that server. On the P2 S2 case, it's less obvious what the pattern is, since there was no cutover. Insufficient information on the P3 S3 case. Each of them is a reasonably high-volume OLTP-style workload. The P1/P2 client has a very high level of writes; the P3 is more read-heavy, but still has a fair number of writes. -- Christophe Pettus x...@thebuild.com
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Hi, Afaics it's likely a combination/interaction of bugs and fixes between: * the initial HS code * 5a031a5556ff83b8a9646892715d7fef415b83c3 * f44eedc3f0f347a856eea8590730769125964597 But that'd mean nobody noticed it during 9.3's beta... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Great! If there's any further data I can supply to help, let me know. On Nov 18, 2013, at 2:15 PM, Andres Freund and...@2ndquadrant.com wrote: Hi, Afaics it's likely a combination/interaction of bugs and fixes between: * the initial HS code * 5a031a5556ff83b8a9646892715d7fef415b83c3 * f44eedc3f0f347a856eea8590730769125964597 But that'd mean nobody noticed it during 9.3's beta... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Christophe Pettus x...@thebuild.com
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On 2013-11-18 14:25:58 -0800, Christophe Pettus wrote: Great! If there's any further data I can supply to help, let me know. Trying to reproduce the issue with and without hot_standby=on would be very helpful, but I guess that's time consuming? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
On Nov 18, 2013, at 2:26 PM, Andres Freund and...@2ndquadrant.com wrote: Trying to reproduce the issue with and without hot_standby=on would be very helpful, but I guess that's time consuming? I've been working on it, but I haven't gotten it to fail yet. I'll keep at it. -- Christophe Pettus x...@thebuild.com
[HACKERS] Review: HStore Gin Speedup
Following are my initial comments for https://commitfest.postgresql.org/action/patch_view?id=1203 1. It no longer applies, probably due to commit a18167510f4c385329697588ce5132cbf95779c3 error: contrib/hstore/hstore--1.1.sql: No such file or directory 2. Compatibility with HStore v2.0 (https://commitfest.postgresql.org/action/patch_view?id=1289) As long as the separate operator class gin_hstore_combined_ops is used, there should be no conflict. I'll verify as soon as the $subject patch is applicable to master. 3. Regression tests Shouldn't EXPLAIN be there too? I suggest a test that creates 2 indexes on the hstore column - one using the existing GIN (default) opclass and one using the new one (combined). The execution plan will then show if an index is used and if it's the correct one. // Antonin Houska (Tony)
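A sketch of the regression test suggested in point 3 might look like the following (table and index names here are hypothetical; only the opclass name gin_hstore_combined_ops comes from the patch under review):

```sql
-- Hypothetical test: two GIN indexes on the same hstore column, one with
-- the default opclass and one with the patch's combined opclass.
CREATE TABLE testhstore_combined (h hstore);
CREATE INDEX hidx_default  ON testhstore_combined USING gin (h);
CREATE INDEX hidx_combined ON testhstore_combined USING gin (h gin_hstore_combined_ops);

SET enable_seqscan = off;
-- COSTS OFF keeps the expected output stable across machines; the plan
-- line names whichever index the planner actually picked.
EXPLAIN (COSTS OFF) SELECT * FROM testhstore_combined WHERE h ? 'key';
```

Whether the planner prefers the combined opclass here would itself be part of what the test verifies.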
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The attached patches are the revised custom-scan APIs. My initial review on this feature: - The patches apply and build, but produce a warning: ctidscan.c: In function ‘CTidInitCustomScanPlan’: ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable] I'd recommend that you split the part1 patch containing the ctidscan contrib into its own patch. It is more than half of the patch and it certainly stands on its own. IMO, I think ctidscan fits a very specific use case and would be better off being an extension instead of in contrib. - Custom-scan.sgml was added to introduce the way to write a custom-scan provider in the official documentation. - Much code duplication in postgres_fdw.c was eliminated. I split some fdw handlers into two parts: a common portion and an fdw-specific one. The executor callbacks of the custom-scan code utilize the common portion above because most of the implementations are equivalent. I'd like to see comments regarding the way to handle a Var reference onto a custom-scan that replaced a relations join. The varno of a Var that references a join relation is the rtindex of either the right or left relation, and setrefs.c adjusts it well; INNER_VAR or OUTER_VAR shall be set instead. However, it does not work well if a custom-scan that just references the result of a remote join query was chosen instead of a local join, because its result is usually set in the ps_ResultTupleSlot of PlanState, thus ExecEvalScalarVar references neither the inner nor the outer slot. Instead of the existing solution, I added one more special varno, CUSTOM_VARNO, that just references the result-tuple-slot of the target relation. If CUSTOM_VARNO is given, EXPLAIN(verbose) generates column names from the TupleDesc of the underlying ps_ResultTupleSlot. I'm not 100% certain whether it is the best approach for us, but it works well.
Also, I'm uncertain about the usage of param_info in the Path structure, even though I followed the manner of other portions. So, please point out if my usage is not appropriate. Thanks, 2013/11/11 Kohei KaiGai kai...@kaigai.gr.jp: Hi, I tried to write up a wiki page to introduce how custom-scan works. https://wiki.postgresql.org/wiki/CustomScanAPI Any comments please. 2013/11/6 Kohei KaiGai kai...@kaigai.gr.jp: The attached patches provide a feature to implement a custom scan node that allows an extension to replace a part of the plan tree with its own code instead of the built-in logic. In addition to the previous proposition, it enables us to integrate a custom scan as one of the candidate paths to be chosen by the optimizer. Here are two patches. The first one (pgsql-v9.4-custom-scan-apis) offers a set of API stuff and a simple demonstration module that implements a regular table scan using an inequality operator on the ctid system column. The second one (pgsql-v9.4-custom-scan-remote-join) enhances postgres_fdw to support remote join capability. Below is an example to show how custom-scan works. We usually run a sequential scan even if the clause has an inequality operator that references the ctid system column. postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid; QUERY PLAN Seq Scan on t1 (cost=0.00..209.00 rows= width=43) Filter: (ctid < '(10,0)'::tid) (2 rows) An extension that performs as a custom-scan provider suggests an alternative path, and its cost is less than the sequential scan, thus the optimizer chooses it. postgres=# LOAD 'ctidscan'; LOAD postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid; QUERY PLAN -- Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows= width=43) Filter: (ctid < '(10,0)'::tid) (2 rows) Of course, a more cost-effective plan will win if one exists.
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid < '(10,0)'::tid AND a = 200; QUERY PLAN --- Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43) Index Cond: (a = 200) Filter: (ctid < '(10,0)'::tid) (3 rows) One other worthwhile example is the remote-join enhancement on postgres_fdw as follows. Both ft1 and ft2 are foreign tables being managed by the same foreign server. postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x WHERE f_leak(b) AND y like '%aaa%'; QUERY PLAN -- Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72) Output: a, b, x, y Filter:
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Nov 18, 2013 at 6:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think it's important to recap the design goals of this. Seems reasonable to list them out. * It should be usable and perform well for both large batch updates and small transactions. I think that that's a secondary goal, a question to be considered but perhaps deferred during this initial effort. I agree that it certainly is important. * It should perform well both when there are no duplicates, and when there are lots of duplicates I think this is very important. And from that follows some finer requirements: * Performance when there are no duplicates should be close to raw INSERT performance. * Performance when all rows are duplicates should be close to raw UPDATE performance. * We should not leave behind large numbers of dead tuples in either case. I agree with all that. Anything else I'm missing? I think so, yes. I'll add: * Should not deadlock unreasonably. If the UPDATE case is to work and perform almost as well as a regular UPDATE, that must mean that it has essentially the same characteristics as plain UPDATE. In particular, I feel fairly strongly that it is not okay for upserts to deadlock with each other unless the possibility of each transaction locking multiple rows (in an inconsistent order) exists. I don't want to repeat the mistakes of MySQL here. This is a point that I stressed to Robert on a previous occasion [1]. It's why value locks and row locks cannot be held at the same time. Incidentally, that implies that all alternative schemes involving bloat will bloat once per attempt, I believe. I'll also add: * Should anticipate a day when Postgres needs plumbing for SQL MERGE, which is still something we want, particularly for batch operations. 
I realize that the standard doesn't strictly require MERGE to handle the concurrency issues, but even still I don't think that an implementation that doesn't is practicable - does such an implementation currently exist in any other system? What about exclusion constraints? I'd like to see this work for them as well. Currently, exclusion constraints are checked after the tuple is inserted, and you abort if the constraint was violated. We could still insert the heap and index tuples first, but instead of aborting on violation, we would kill the heap tuple we already inserted and retry. There are some complications there, like how to wake up any other backends that are waiting to grab a lock on the tuple we just killed, but it seems doable. I agree that it's at least doable. That would, however, perform badly and leave garbage behind if there are duplicates. A refinement of that would be to first check for constraint violations, then insert the tuple, and then check again. That would avoid the garbage in most cases, but would perform much more poorly when there are no duplicates, because it needs two index scans for every insertion. A further refinement would be to keep track of how many duplicates there have been recently, and switch between the two strategies based on that. Seems like an awful lot of additional mechanism. That cost of doing two scans could be alleviated by using markpos/restrpos to do the second scan. That is presumably cheaper than starting a whole new scan with the same key. (markpos/restrpos don't currently work for non-MVCC snapshots, so that'd need to be fixed, though) Well, it seems like we could already use a pick up where you left off mechanism in the case of regular btree index tuple insertions into unique indexes -- after all, we don't do that in the event of blocking pending the outcome of the other transaction (that inserted a duplicate that we need to conclusively know has or has not committed) today. 
The fact that this doesn't already exist leaves me less than optimistic about the prospect of making it work to facilitate a scheme such as the one you describe here. (Today we still need to catch a committed version of the tuple that would make our tuple a duplicate from a fresh index scan, only *after* waiting for a transaction to commit/abort at the end of our original index scan). So we're already pretty naive about this, even though it would pay to not be. Making something like markpos work for the purposes of an upsert implementation seems not only hard, but also like a possible modularity violation. Are we not unreasonably constraining the implementation going forward? My patch respects the integrity of the am abstraction, and doesn't really add any knowledge to the core system about how amcanunique index methods might go about implementing the new amlock method. The core system worries a bit about the low level locks (as it naively refers to value locks), and doesn't consider that it has the right to hold on to them for more than an instant, but that's about it. Plus we don't have to worry about whether something does or does not work for a certain snapshot type with my approach,
Re: [HACKERS] GIN improvements part2: fast scan
On Fri, Nov 15, 2013 at 2:42 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 11:39 PM, Rod Taylor rod.tay...@gmail.com wrote: The patched index is 58% of the 9.4 master size. 212 MB instead of 365 MB. Good. That meets my expectations :) You mention that both master and patched versions were compiled with debug. Was cassert enabled? Just debug. I try not to do performance tests with assertions on. Patch 7 gives the results I was looking for on this small sampling of data. gin-fast-scan.6.patch/9.4 master performance = the: 1147.413 ms 1159.360 ms 1122.549 ms and: 1035.540 ms 999.514 ms 1003.042 ms hotel: 57.670 ms 61.152 ms 58.862 ms and the hotel: 266.121 ms 256.711 ms 267.011 ms hotel and the: 260.213 ms 254.055 ms 255.611 ms gin-fast-scan.7.patch = the: 1091.735 ms 1068.909 ms 1076.474 ms and: 985.690 ms 972.833 ms 948.286 ms hotel: 60.756 ms 59.028 ms 57.836 ms and the hotel: 50.391 ms 38.715 ms 46.168 ms hotel and the: 45.395 ms 40.880 ms 43.978 ms Thanks, Rod
[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accidentally or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed, making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Suggestion-Issue-warning-when-calling-SET-TRANSACTION-outside-transaction-block-tp5743139p5778994.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
[HACKERS] More legacy code: pg_ctl
Folks, Speaking of legacy code with bad default behaviors: pg_ctl. The current utility is designed to faithfully reproduce the rather hackish shell script we originally had for postgres startup. This results in a couple of unintuitive defaults which give sysadmins and config management developers headaches: a) by default, it returns to the caller without waiting for postgres to actually start/stop/restart. In this mode, it also always returns success regardless of result. b) by default, it remains attached to the caller's tty for stdout and stderr, even after it has switched to the regular postgres log. Yes, one can work around both of these with -w and -l, but the only reason those are non-default settings is that's the way the 7.X era shell script behaved. So at this point we're preserving unintuitive default behavior in order to be backwards compatible with a 1999-era shell script. I don't know if the answer is to rename the utility like we're discussing with pg_dump/pg_backup, or something else. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] More legacy code: pg_ctl
On 11/18/2013 05:09 PM, Josh Berkus wrote: Folks, Speaking of legacy code with bad default behaviors: pg_ctl. The current utility is designed to faithfully reproduce the rather hackish shell script we originally had for postgres startup. This results in a couple of unintuitive defaults which give sysadmins and config management developers headaches: a) by default, it returns to the caller without waiting for postgres to actually start/stop/restart. In this mode, it also always returns success regardless of result. b) by default, it remains attached to the caller's tty for stdout and stderr, even after it has switched to the regular postgres log. Oh, and one more: c) that stop defaults to smart mode, instead of fast mode. Yes, one can work around both of these with -w and -l, but the only reason those are non-default settings is that's the way the 7.X era shell script behaved. So at this point we're preserving unintuitive default behavior in order to be backwards compatible with a 1999-era shell script. I don't know if the answer is to rename the utility like we're discussing with pg_dump/pg_backup, or something else. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
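For reference, the workarounds being discussed look like this in practice (the data and log paths are placeholders; -w, -l, and -m are existing pg_ctl options):

```shell
# Wait for startup to actually complete and report real success/failure (-w),
# and send stdout/stderr to a log file (-l) instead of holding the caller's tty.
pg_ctl -D /path/to/data -w -l /path/to/logfile start

# Use fast shutdown instead of the default "smart" mode (point c above).
pg_ctl -D /path/to/data -w -m fast stop
```

This is exactly the boilerplate every init script and config management tool ends up carrying, which is the complaint about the defaults.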
Re: [HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
On Tue, 2013-11-05 at 18:29 +0200, Oskari Saarenmaa wrote: This can be used to tag custom built packages with an extra version string such as the git describe id or distribution package release version. I think this is a reasonable feature, and the implementation is OK, but I don't see why the format of the extra version information needs to be exactly PG_VERSION=$PACKAGE_VERSION ($withval) I'd imagine, for example, that someone will want to do -something or +something. So I'd just make this PG_VERSION=$PACKAGE_VERSION$withval Comments? +# Allow adding a custom string to PG_VERSION +PGAC_ARG_REQ(with, extra-version, [NAME], [extra version information], +[PG_VERSION=$PACKAGE_VERSION ($withval)], [PG_VERSION=$PACKAGE_VERSION]) This could be indented better. It was a bit confusing at first.
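To illustrate the difference Peter is pointing at (version strings below are made-up examples): with the submitted format the parentheses and space are forced, while with the proposed format the packager supplies whatever separator they want.

```shell
# Submitted patch: PG_VERSION=$PACKAGE_VERSION ($withval)
./configure --with-extra-version="Debian 9.4.0-1"
# -> version reported as: 9.4.0 (Debian 9.4.0-1)

# Proposed change: PG_VERSION=$PACKAGE_VERSION$withval
./configure --with-extra-version="-git20131119"
# -> version reported as: 9.4.0-git20131119
```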
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Nov 18, 2013 at 4:37 PM, Peter Geoghegan p...@heroku.com wrote: You're right that the value locking is scary. I think we need to very carefully consider it, once I have buy-in on the basic approach. I really do think it's the least-worst approach described to date. It isn't like we can't discuss making it inherently less scary, but I hesitate to do that now, given that I don't know if that discussion will go anywhere. One possible compromise would be promise tuples where we know we'll be able to keep our promise. In other words: 1. We lock values in the first phase, in more or less the manner of the extant patch. 2. When a consensus exists that heap tuple insertion proceeds, we proceed with insertion of these promise index tuples (and probably keep just a pin on the relevant pages). 3. Proceed with insertion of the heap tuple (with no value locks of any kind held). 4. Go back to the unique indexes, update the heap tid and unset the index tuple flag (that indicates that the tuples are in this promise state). Probably we can even be bright about re-finding the existing promise tuples with their proper heap tid (e.g. maybe we can avoid doing a regular index scan at least some of the time - chances are pretty good that the index tuple is on the same page as before, so it's generally well worth a shot looking there first). As with the earlier promise tuple proposals, we store our xid in the ItemPointer. 5. Finally, insertion of non-unique index tuples occurs in the regular manner. Obviously the big advantage here is that we don't have to worry about value locking across heap tuple insertion at all, and yet we don't have to worry about bloating, because we really do know that insertion proper will proceed when inserting *this* type of promise index tuple. Maybe that even makes it okay to just use buffer locks, if we think some more about the other edge cases.
Regular index scans take the aforementioned flag as a kind of visibility hint, perhaps, so we don't have to worry about them. And VACUUM would kill any dead promise tuples - this would be much less of a concern than with the earlier promise tuple proposals, because it is extremely non routine. Maybe it's fine to not make autovacuum concerned about a whole new class of (index-only) bloat, which seemed like a big problem with those earlier proposals, simply because crashes within this tiny window are hopefully so rare that it couldn't possibly amount to much bloat in the grand scheme of things (at least before a routine VACUUM - UPDATEs tend to necessitate those). If you have 50 upserting backends in this tiny window during a crash, that would be only 50 dead index tuples. Given the window is so tiny, I doubt it would be much of a problem at all - even 50 seems like a very high number. The track_counts counts that drive autovacuum here are already not crash safe, so I see no regression. Now, you still have to value lock across multiple btree unique indexes, and I understand there are reservations about this. But the surface area is made significantly smaller at reasonably low cost. Furthermore, doing TOASTing out-of-line and so on ceases to be necessary. The LOCK FOR UPDATE case is the same as before. Nothing else changes. FWIW, without presuming anything about value locking implementation, I'm not too worried about making the implementation scale to very large numbers of unique indexes, with very low shared_buffer settings. We already have a fairly similar situation with max_locks_per_transaction and so on, no? -- Peter Geoghegan
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/19 3:56), Peter Geoghegan wrote: On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway we should listen to more opinions from other people, though. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp I read that discussion between Itagaki-san and Mr Cerdic. He said that raw values are just simple. However, were his own changes just simple? I cannot understand his aesthetic sense, or yours either :-( PG8.4 before: 012 CREATE FUNCTION pg_stat_statements( 013 OUT userid oid, 014 OUT dbid oid, 015 OUT query text, 016 OUT calls int8, 017 OUT total_time float8, 018 OUT rows int8 019 ) PG9.0 after: 012 CREATE FUNCTION pg_stat_statements( 013 OUT userid oid, 014 OUT dbid oid, 015 OUT query text, 016 OUT calls int8, 017 OUT total_time float8, 018 OUT rows int8, 019 OUT shared_blks_hit int8, 020 OUT shared_blks_read int8, 021 OUT shared_blks_written int8, 022 OUT local_blks_hit int8, 023 OUT local_blks_read int8, 024 OUT local_blks_written int8, 025 OUT temp_blks_read int8, 026 OUT temp_blks_written int8 027 ) It's too complicated - do you know how to tune PG from the local_* and temp_* information? At least, I think most users cannot tune from this information, and only part of it might be useful. +1 from me. I think that a higher level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution. No. It's not only for geek tools and people with pre-packaged solutions in big companies, but also for common DBA tools. By the way, MySQL and Oracle, which are very popular databases, have these statistics.
I think that your argument might discourage people who want to migrate from those databases, and will only make those databases more popular. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. Could you tell me how to get min/max statistics at low cost? I'm not sure about the details of your patch in the CF, but it seems very high cost. To repeat myself, I think that if we want drastically more detailed statistics, we have to create a separate statistics tool. Your patch would be such a tool. However, pg_stat_statements should stay lightweight. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
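For what it's worth, the buffer hit ratio being debated in this thread can already be derived from the raw columns the view exposes today; a sketch of such a query (not part of any patch) might be:

```sql
-- Derive a per-statement buffer hit ratio from pg_stat_statements'
-- raw counters; NULLIF avoids division by zero for statements that
-- touched no shared buffers.
SELECT query,
       calls,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit /
             NULLIF(shared_blks_hit + shared_blks_read, 0), 2) AS hit_pct
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;
```

This is essentially the "higher level tool on top of raw values" position: the ratio is computable client-side, so the question is whether the view itself should also precompute it.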
Re: [HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 18, 2013 at 05:05:45PM -0800, David Johnston wrote: Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accedentially or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be WARNING, we should change LOCK too, so on backward-compatibility grounds, ERROR makes more sense. Personally, I am fine with changing them all to WARNING. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. 
+
Re: [HACKERS] [PATCH] configure: allow adding a custom string to PG_VERSION
On Tue, Nov 19, 2013 at 10:48 AM, Peter Eisentraut pete...@gmx.net wrote: I think this is a reasonable feature, and the implementation is OK, but I don't see why the format of the extra version information needs to be exactly PG_VERSION=$PACKAGE_VERSION ($withval) I'd imagine, for example, that someone will want to do -something or +something. So I'd just make this PG_VERSION=$PACKAGE_VERSION$withval Comments? It makes sense and brings more flexibility. So +1 for this modification. -- Michael
[HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
Bruce Momjian wrote On Mon, Nov 18, 2013 at 05:05:45PM -0800, David Johnston wrote: Bruce Momjian wrote Considering we are doing this outside of a transaction, and WARNING or ERROR is pretty much the same, from a behavioral perspective. Should we change this and LOCK to be a warning? From the calling application's perspective an error and a warning are definitely behaviorally different. For this I'd vote for a warning (haven't pondered the LOCK scenario) as using SET out of context means the user has a fairly serious mis-understanding of the code path they have written (accedentially or otherwise). Notice makes sense (speaking generally and without much research here) for stuff where the ultimate outcome matches the statement but the statement itself didn't actually do anything. Auto-sequence and index generation fell into this but even notice was too noisy. In this case we'd expect that the no-op statement was issued in error and thus should be changed making a warning the level of incorrect-ness to communicate. A notice would be more appropriate if there were valid use-cases for the user doing this and we just want to make sure they are conscious of the unusualness of the situation. I dislike error for backward compatibility reasons. And saving the user from this kind of mistake doesn't warrant breaking what could be properly functioning code. Just because PostgreSQL isn't in a transaction does not mean the client is expecting the current code to work correctly - even if by accident - as part of a sequence of queries. Well, ERROR is what LOCK returns, so if we change SET TRANSACTION to be WARNING, we should change LOCK too, so on backward-compatibility grounds, ERROR makes more sense. Personally, I am fine with changing them all to WARNING. Error makes more sense if the goal is internal consistency. That goal should be subservient to backward compatibility. 
Changing LOCK to a warning is less problematic, since it does not seem probable that current code functions in such a way that, after upgrade, it would begin working differently in the absence of an error; basically, someone would have to be trapping on the error and conditionally branching their logic. That said, if this were a day-0 decision I'd likely raise an error. Weakening LOCK doesn't make sense since erroring is its day-0 behavior. Document the warning for SET as being weaker than ideal because of backward compatibility and call it a day (i.e. leave LOCK at error). The documentation, not the code, then enforces the feeling that such usage is considered wrong, without possibly breaking wrong-but-working code. David J.
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
(2013/11/19 11:12), KONDO Mitsumasa wrote: (2013/11/19 3:56), Peter Geoghegan wrote: On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao masao.fu...@gmail.com wrote: The same idea was proposed before but not committed, because Itagaki thought that the pg_stat_statements view should report only raw values. Please read the following thread. I have the same feeling as him. Anyway, we should listen to more opinions from other people. http://www.postgresql.org/message-id/20091222172719.8b86.52131...@oss.ntt.co.jp

I went over the discussion between Itagaki-san and Mr. Cerdic. He said that raw values are simply better. But were his changes really that simple? I cannot understand his aesthetic sense, nor yours :-(

PG 8.4 and before:

CREATE FUNCTION pg_stat_statements(
    OUT userid oid,
    OUT dbid oid,
    OUT query text,
    OUT calls int8,
    OUT total_time float8,
    OUT rows int8
)

PG 9.0 and after:

CREATE FUNCTION pg_stat_statements(
    OUT userid oid,
    OUT dbid oid,
    OUT query text,
    OUT calls int8,
    OUT total_time float8,
    OUT rows int8,
    OUT shared_blks_hit int8,
    OUT shared_blks_read int8,
    OUT shared_blks_written int8,
    OUT local_blks_hit int8,
    OUT local_blks_read int8,
    OUT local_blks_written int8,
    OUT temp_blks_read int8,
    OUT temp_blks_written int8
)

That is already complicated, and do you know how to tune PG from the local_* and temp_* information? I think most users cannot tune from it, and only part of it may be useful.

+1 from me. I think that a higher-level tool needs to be built on pg_stat_statements to make things easy for those that want a slick, pre-packaged solution.

No. It's not only for geek tools and people with pre-packaged solutions at big companies, but also for common DBA tools. By the way, MySQL and Oracle, which are very popular databases, have these statistics.
I think that your argument might discourage people who want to migrate from those databases, and will only accelerate their popularity further.

As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better.

Could you tell me how to get min/max statistics at low cost? I'm not sure about the details of your patch in the CF, but its cost seems very high. To repeat myself: if we want drastically more detailed statistics, we should create separate statistics tools; your patch could become one of those tools. pg_stat_statements itself should stay lightweight. Regards,

Oh, I forgot to say: I believe the essence of information technology is to convey the most informative picture with the least information, at low cost. If that is wrong, please let me know why. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
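For what it's worth, the per-statement buffer hit ratio the patch reports can already be derived in userspace from the raw shared_blks_hit and shared_blks_read columns, which is the position Itagaki-san took. A minimal sketch (Python; the helper name is mine, not from any patch):

```python
def shared_hit_percent(shared_blks_hit, shared_blks_read):
    """Buffer hit ratio for one pg_stat_statements row.

    shared_blks_hit + shared_blks_read is the total number of
    shared-buffer accesses; returns None when the statement
    touched no shared blocks at all.
    """
    total = shared_blks_hit + shared_blks_read
    if total == 0:
        return None
    return 100.0 * shared_blks_hit / total
```

A monitoring query can compute the same expression directly in SQL over the view, which lets the extension keep reporting only raw, cumulative values.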
[HACKERS] Remove use of obsolescent Autoconf macros
According to the Autoconf documentation, the following macros are obsolescent because no reasonably current system has the defect that they are testing for. - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP Therefore, I propose to remove their use, with the attached patch. From 21de25975e7ea66e562d9b320c85cef53d0b Mon Sep 17 00:00:00 2001 From: Peter Eisentraut pete...@gmx.net Date: Tue, 12 Nov 2013 06:46:01 -0500 Subject: [PATCH] Remove use of obsolescent Autoconf macros Remove the use of the following macros, which are obsolescent according to the Autoconf documentation: - AC_C_CONST - AC_C_STRINGIZE - AC_C_VOLATILE - AC_FUNC_MEMCMP --- configure | 297 - configure.in | 6 - src/include/c.h| 22 +--- src/include/pg_config.h.in | 10 -- src/port/memcmp.c | 70 --- 5 files changed, 3 insertions(+), 402 deletions(-) delete mode 100644 src/port/memcmp.c diff --git a/configure b/configure index 74e34eb..0165c3c 100755 --- a/configure +++ b/configure @@ -15326,113 +15326,6 @@ $as_echo $as_me: error: unknown endianness { (exit 1); exit 1; }; } ;; esac -{ $as_echo $as_me:$LINENO: checking for an ANSI C-conforming const 5 -$as_echo_n checking for an ANSI C-conforming const... 6; } -if test ${ac_cv_c_const+set} = set; then - $as_echo_n (cached) 6 -else - cat conftest.$ac_ext _ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h conftest.$ac_ext -cat conftest.$ac_ext _ACEOF -/* end confdefs.h. */ - -int -main () -{ -/* FIXME: Include the comments suggested by Paul. */ -#ifndef __cplusplus - /* Ultrix mips cc rejects this. */ - typedef int charset[2]; - const charset cs; - /* SunOS 4.1.1 cc rejects this. */ - char const *const *pcpcc; - char **ppc; - /* NEC SVR4.0.2 mips cc rejects this. */ - struct point {int x, y;}; - static struct point const zero = {0,0}; - /* AIX XL C 1.02.0.0 rejects this. 
- It does not let you subtract one const X* pointer from another in - an arm of an if-expression whose if-part is not a constant - expression */ - const char *g = string; - pcpcc = g + (g ? g-g : 0); - /* HPUX 7.0 cc rejects these. */ - ++pcpcc; - ppc = (char**) pcpcc; - pcpcc = (char const *const *) ppc; - { /* SCO 3.2v4 cc rejects this. */ -char *t; -char const *s = 0 ? (char *) 0 : (char const *) 0; - -*t++ = 0; -if (s) return 0; - } - { /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */ -int x[] = {25, 17}; -const int *foo = x[0]; -++foo; - } - { /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */ -typedef const int *iptr; -iptr p = 0; -++p; - } - { /* AIX XL C 1.02.0.0 rejects this saying - k.c, line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */ -struct s { int j; const int *ap[3]; }; -struct s *b; b-j = 5; - } - { /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */ -const int foo = 10; -if (!foo) return 0; - } - return !cs[0] !zero.x; -#endif - - ; - return 0; -} -_ACEOF -rm -f conftest.$ac_objext -if { (ac_try=$ac_compile -case (($ac_try in - *\* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo=\\$as_me:$LINENO: $ac_try_echo\ -$as_echo $ac_try_echo) 5 - (eval $ac_compile) 2conftest.er1 - ac_status=$? - grep -v '^ *+' conftest.er1 conftest.err - rm -f conftest.er1 - cat conftest.err 5 - $as_echo $as_me:$LINENO: \$? = $ac_status 5 - (exit $ac_status); } { - test -z $ac_c_werror_flag || - test ! 
-s conftest.err - } test -s conftest.$ac_objext; then - ac_cv_c_const=yes -else - $as_echo $as_me: failed program was: 5 -sed 's/^/| /' conftest.$ac_ext 5 - - ac_cv_c_const=no -fi - -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -{ $as_echo $as_me:$LINENO: result: $ac_cv_c_const 5 -$as_echo $ac_cv_c_const 6; } -if test $ac_cv_c_const = no; then - -cat confdefs.h \_ACEOF -#define const /**/ -_ACEOF - -fi - { $as_echo $as_me:$LINENO: checking for inline 5 $as_echo_n checking for inline... 6; } if test ${ac_cv_c_inline+set} = set; then @@ -15572,40 +15465,6 @@ _ACEOF fi -{ $as_echo $as_me:$LINENO: checking for preprocessor stringizing operator 5 -$as_echo_n checking for preprocessor stringizing operator... 6; } -if test ${ac_cv_c_stringize+set} = set; then - $as_echo_n (cached) 6 -else - cat conftest.$ac_ext _ACEOF -/* confdefs.h. */ -_ACEOF -cat confdefs.h conftest.$ac_ext -cat conftest.$ac_ext _ACEOF -/* end confdefs.h. */ -#define x(y) #y - -char *s = x(teststring); -_ACEOF -if (eval $ac_cpp conftest.$ac_ext) 25 | - $EGREP #teststring /dev/null 21; then - ac_cv_c_stringize=no -else - ac_cv_c_stringize=yes -fi -rm -f conftest* - -fi -{ $as_echo $as_me:$LINENO: result: $ac_cv_c_stringize 5 -$as_echo $ac_cv_c_stringize 6; } -if test $ac_cv_c_stringize = yes; then - -cat confdefs.h \_ACEOF -#define HAVE_STRINGIZE 1 -_ACEOF - -fi - { $as_echo
Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks
On 13 November 2013 03:17, David Johnston wrote: Having had this same thought WRT the FOR UPDATE in LOOP bug posting, the lack of a listing of outstanding bugs does leave some gaps. I would imagine people would appreciate something like:

Frequency: Rare
Severity: Low
Fix Complexity: Moderate
Work Around: Easy - create an actual function; create some form of loop
Status: Confirmed - Awaiting Volunteers to Fix

This problem is fixed as explained below:
1. Created an own simple_eval_estate for every DO block in plpgsql_inline_handler and freed it after the execution is finished.
2. While executing a simple expression, if func->simple_eval_estate is not null (meaning it is a DO block) then use it, otherwise use the global one.

The patch is attached to this mail. Please let me know whether this approach is fine. Regards, Dilip transaction-lifespan_memory_leak_fix.patch Description: transaction-lifespan_memory_leak_fix.patch
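The idea behind the fix, abstracted away from the plpgsql internals: give each anonymous DO block its own short-lived evaluation state instead of the shared, transaction-lifetime one. A rough Python analogy (names are illustrative, not the actual C symbols; run_do_block returns its state only so it can be inspected):

```python
class EvalState:
    """Stand-in for a plpgsql simple_eval_estate."""
    def __init__(self):
        self.allocations = []

    def alloc(self, obj):
        self.allocations.append(obj)

    def free(self):
        self.allocations.clear()

# Shared state that lives until the end of the transaction.
GLOBAL_ESTATE = EvalState()

def run_do_block(body):
    """DO block: private state, freed as soon as the block finishes,
    so repeated blocks no longer accumulate memory within a transaction."""
    estate = EvalState()
    try:
        body(estate)
    finally:
        estate.free()  # released immediately, even on error
    return estate

def run_function(body):
    """Regular function: keeps using the shared transaction-lifetime state."""
    body(GLOBAL_ESTATE)
```

The key property is the `finally`: the per-block state is released when the block exits, whereas allocations in the global state persist until the transaction ends.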
Re: [HACKERS] Improvement of pg_stat_statement usage about buffer hit ratio
On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: I went over the discussion between Itagaki-san and Mr. Cerdic. He said that raw values are simply better. But were his changes really that simple? I cannot understand his aesthetic sense, nor yours :-( It's too complicated, and do you know how to tune PG from the local_* and temp_* information? I think most users cannot tune from it, and only part of it may be useful.

All of those costs are cumulative aggregates. If we didn't aggregate them, then the user couldn't possibly determine them on their own, to any approximation. That's the difference. If you think the local_* and temp_* aren't very useful, I'm inclined to agree, but it's too late to do anything about that now.

No. It's not only for geek tools and people with pre-packaged solutions at big companies, but also for common DBA tools.

I don't think that the tool needs to be expensive. If selecting from the pg_stat_statements view every 1-3 seconds is too expensive for such a tool, we can have a discussion about being smarter, because there certainly are ways to optimize it. Regarding your min/max patch: I'm opposed to adding even more to the spinlock-protected counters struct so that we can get an exact answer to a question where an approximate answer would very likely be good enough. And as Itagaki-san said 4 years ago, who is to say that what you've done here for buffers (or equally, what you've done in your min/max patch) is more interesting than the same thing but for another cost? The point of having what you've removed from the pg_stat_statements docs about calculating averages is that it is an example that can be generalized from. I certainly think there should be better tooling to make displaying costs over time easier, or characterizing the distribution, but unfortunately this isn't it. Something like pg_stat_statements is allowed to be approximate.
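The kind of consumer tool being discussed - one that polls the view every few seconds - only needs to difference successive snapshots of the cumulative counters. A sketch of that idea (a hypothetical helper, not part of pg_stat_statements):

```python
def snapshot_delta(prev, curr):
    """Per-interval activity from two snapshots of cumulative counters.

    Each snapshot maps an entry key, e.g. (userid, dbid, query), to a
    dict of counters. Unchanged entries are dropped, which is also what
    makes "only send changed entries" attractive on the server side.
    """
    delta = {}
    for key, counters in curr.items():
        old = prev.get(key, {})
        diff = {name: val - old.get(name, 0) for name, val in counters.items()}
        if any(diff.values()):
            delta[key] = diff
    return delta
```

Repeated differencing like this recovers per-interval rates (and, in userspace, approximate min/max over intervals) from purely cumulative raw values.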
That's considered an appropriate trade-off. Most obviously, today there can be hash table collisions, and some of the entries can therefore be plain wrong. Previously, I put the probability of 1 collision in the hash table at about 1% when pg_stat_statements.max is set to 10,000. So if your min/max patch was implemented in userspace, and an outlier is lost in the noise with just one second of activity, I'm not terribly concerned about that. It's a trade-off, and if you don't think it's the correct one, well then I'm afraid that's just where you and I differ. As I've said many times, if you want to have a discussion about making aggressive snapshotting of the pg_stat_statements view more practical, I think that would be very useful. By the way, MySQL and Oracle database which are very popular have these statistics. I think that your argument might disturb people who wants to migration from these database and will accelerate popularity of these database more. I think that there should be better tooling built on top of pg_stat_statements. I don't know what Oracle does, but I'm pretty sure that MySQL has nothing like pg_stat_statements. Please correct me if I'm mistaken. As I said on the min/max thread, if we're not doing enough to help people who would like to build such a tool, we should discuss how we can do better. Could you tell me how to get min/max statistics with low cost? See my previous comments on the other thread about making pg_stat_statements only return changed entries, and only sending the query text once. I'm not sure about detail of your patch in CF, but it seems very high cost. I think you should actually go and read the code and read my explanation of it, and refrain from making uninformed remarks like this. 
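The roughly-1% figure for pg_stat_statements.max = 10,000 is consistent with a birthday-problem estimate over a 32-bit query hash; a quick check (the standard approximation, not code from the extension, and the 32-bit hash width is my assumption):

```python
import math

def collision_probability(n_entries, hash_bits=32):
    """Birthday-problem approximation: chance that at least two of
    n_entries distinct queries collide in a hash_bits-bit hash space."""
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-n_entries * (n_entries - 1) / (2.0 * space))
```

With n_entries = 10,000 this comes out just over 1.1%, matching the quoted estimate.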
Whatever overhead my patch may add, the important point is that it obviously and self-evidently adds exactly zero overhead to maintaining statistics for existing entries - we only care about the query text when first creating an entry, or when viewing an entry when the view is selected from. With the reduced shared memory consumption, more entries can be created, making the cost of creating new entries (and thus whatever might have been added to that cost) matter less. Having said that, the additional cost is thought to be very low anyway. If you read my mail to the list about this, you'd know that I specifically worried about the implications of what I'd proposed for tools like your tool. That's part of the reason I keep urging you to think about making pg_stat_statements cheaper for consumer tools. There is no reason to send query texts to such tools more than once per entry, and no reason to send unchanged entries. Repeatedly, I think that if we want to get drastic detail statistics, we have to create another tools of statistics. Your patch will be these statistics tools. However, pg_stat_statement sholud be just light weight. This is incomprehensible. As with the cumulative aggregate costs, how is a consumer
[HACKERS] Standalone synchronous master
This patch implements the following TODO item:

Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period. This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

This patch implementation is along the same lines as given in the earlier thread. Some of the important additional changes are:

1. Added two GUC variables to take commands from the user to be executed:
   a. master_to_standalone_cmd: executed before the master switches from sync mode to standalone mode.
   b. master_to_sync_cmd: executed before the master switches from standalone mode back to sync mode.
2. A master mode switch will happen only if the corresponding command executes successfully.
3. Took care of the replication timeout to decide whether the synchronous standby has gone down, i.e. only after expiry of wal_sender_timeout will the master switch from sync mode to standalone mode.

Please provide your opinion or any other expectations for this patch. I will add it to the November commitfest. Thanks and Regards, Kumar Rajeev Rastogi replication_new_modeV1.patch Description: replication_new_modeV1.patch
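Abstracting the proposal, the mode switching reduces to a small state machine: stay synchronous while standby replies arrive within wal_sender_timeout, and switch modes only when the corresponding user command succeeds. A Python sketch under those assumptions, with callables standing in for the two GUC shell commands (the method names are mine, not from the patch):

```python
class SyncMaster:
    """Sketch of the proposed sync <-> standalone mode switching."""

    def __init__(self, wal_sender_timeout, to_standalone_cmd, to_sync_cmd):
        self.timeout = wal_sender_timeout
        self.to_standalone_cmd = to_standalone_cmd  # master_to_standalone_cmd
        self.to_sync_cmd = to_sync_cmd              # master_to_sync_cmd
        self.mode = "sync"
        self.last_standby_reply = 0.0

    def standby_reply(self, now):
        """A synchronous standby responded (or reconnected)."""
        self.last_standby_reply = now
        if self.mode == "standalone" and self.to_sync_cmd():
            self.mode = "sync"  # switch back only if the command succeeded

    def tick(self, now):
        """Periodic check: has wal_sender_timeout expired with no reply?"""
        if (self.mode == "sync"
                and now - self.last_standby_reply > self.timeout
                and self.to_standalone_cmd()):  # switch only on success
            self.mode = "standalone"
```

If the user command fails, the master deliberately stays in its current mode, which is the behavior point 2 of the mail describes.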
Re: [HACKERS] Add \i option to bring in the specified file as a quoted literal
On Wed, 2013-11-13 at 10:26 -0500, Bruce Momjian wrote: On Wed, Nov 13, 2013 at 08:58:07AM +0530, Amit Kapila wrote: On Tue, Nov 12, 2013 at 9:37 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Oct 23, 2013 at 10:31:39AM +0530, Amit Kapila wrote: On Tue, Oct 22, 2013 at 3:04 AM, Piotr Marcinczyk pmarc...@gmail.com wrote: Hi, I would like to implement an item from the TODO marked as easy: "Add \i option to bring in the specified file as a quoted literal". I understand the intent of this item: to be able to have parts of a query written in separate files (now it is impossible, because \i tries to execute the content of the file as a separate command via the function process_file). For the use case discussed in the mail chain of that TODO item, Robert Haas has provided an alternative way to achieve it; please check the link below: http://www.postgresql.org/message-id/AANLkTi=7c8xfyf7uqw0y+si8ondkoy2nx8jc4bu0g...@mail.gmail.com If you think that alternative is not sufficient for the use case, then adding a new option/syntax is worthwhile; otherwise it would just be a shortcut or another form of some existing facility, which can be useful depending on how frequently users need it. So, can we remove this TODO item? The TODO item was created before Robert Haas provided an alternative way to achieve the same thing. In some cases there are multiple ways to achieve the same thing (example: shortcut options in psql) when something is used quite frequently and people want an easy way of doing it. In this case I don't think this is used frequently, so I don't see the need for it. We should remove this TODO item. OK, removed.

Well, I wrote it a few days ago. I'm sure it is not critical, but I suppose it may be useful. This is my first patch, so I think it is a good idea to send it and have it reviewed anyway. The first reason to send it is to see what kinds of errors I made, so as not to repeat them in the next patches. Second, if it (flexibly appending a file to the buffer) appears interesting to a reviewer, it may be committed.
*** a/doc/src/sgml/ref/psql-ref.sgml --- b/doc/src/sgml/ref/psql-ref.sgml *** *** 1731,1736 hello 10 --- 1731,1749 varlistentry + termliteral\ib replaceable class=parameterfilename/replaceable [ replaceable class=parameterquote_string/replaceable ] /literal/term + listitem + para + The literal\ib/ command appends content of file literalfilename/literal + to current query buffer. If parameter literalquote_string/literal + is not set, no quotation is used. If it is set, content of file will be + quoted by literalquote_string/literal enclosed in literal$/literal. + /para + /listitem + /varlistentry + + + varlistentry termliteral\ir replaceable class=parameterfilename/replaceable/literal/term listitem para *** a/src/bin/psql/command.c --- b/src/bin/psql/command.c *** *** 59,64 static backslashResult exec_command(const char *cmd, --- 59,65 PQExpBuffer query_buf); static bool do_edit(const char *filename_arg, PQExpBuffer query_buf, int lineno, bool *edited); + static bool do_append(const char *filename_arg, const char *quote_string, PQExpBuffer query_buf); static bool do_connect(char *dbname, char *user, char *host, char *port); static bool do_shell(const char *command); static bool do_watch(PQExpBuffer query_buf, long sleep); *** *** 798,806 exec_command(const char *cmd, } ! /* \i and \ir include files */ else if (strcmp(cmd, i) == 0 || strcmp(cmd, include) == 0 ! || strcmp(cmd, ir) == 0 || strcmp(cmd, include_relative) == 0) { char *fname = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); --- 799,808 } ! /* \i, \ib and \ir include files */ else if (strcmp(cmd, i) == 0 || strcmp(cmd, include) == 0 ! || strcmp(cmd, ir) == 0 || strcmp(cmd, include_relative) == 0 ! || strcmp(cmd, ib) == 0 || strcmp(cmd, include_buffer) == 0) { char *fname = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); *** *** 812,823 exec_command(const char *cmd, } else { ! 
bool include_relative; - include_relative = (strcmp(cmd, ir) == 0 - || strcmp(cmd, include_relative) == 0); - expand_tilde(fname); - success = (process_file(fname, false, include_relative) == EXIT_SUCCESS); free(fname); } } --- 814,845 } else { ! bool include_buffer; ! ! include_buffer = (strcmp(cmd, ib) == 0 ! || strcmp(cmd, include_buffer) == 0); ! ! ! if (include_buffer) ! { ! char *quote_string = psql_scan_slash_option(scan_state, ! OT_NORMAL, NULL, true); ! expand_tilde(fname); ! success = !do_append(fname, quote_string, query_buf); ! if
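The quoting behaviour the \ib patch documents - the file's content wrapped in quote_string enclosed in $ - can be sketched independently of psql. A minimal Python analogy (the helper name and buffer representation are mine):

```python
def append_file_quoted(query_buf, filename, quote_string=None):
    """Append a file's content to the query buffer, optionally as a
    dollar-quoted literal: $tag$...$tag$ survives embedded quotes."""
    with open(filename) as f:
        content = f.read()
    if quote_string is None:
        query_buf.append(content)
    else:
        tag = "$" + quote_string + "$"
        query_buf.append(tag + content + tag)
```

With quote_string set to "body", a file containing O'Reilly is appended as $body$O'Reilly$body$, which can sit directly inside an SQL statement without escaping.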
Re: [HACKERS] GIN improvements part2: fast scan
I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans. A small percentage in this case means 10 million records randomly selected; has a few billion records. Tests ran for master successfully and I recorded timings. Applied the patch included here to master along with gin-packed-postinglists-14.patch. Run make clean; ./configure; make; make install. make check (All 141 tests passed.) initdb, import dump The GIN index fails to build with a segfault. DETAIL: Failed process was running: CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL); #0 XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920, bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 ginFindLeafPage+529, rdata=0x468f11 ginFindLeafPage+529) at xlog.c:2339 #1 0x004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r', info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936 #2 0x00468a9e in createPostingTree (index=0x7fa4e8d31030, items=items@entry=0xfb55680, nitems=nitems@entry=762, buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324 #3 0x004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0, nitem=762, items=0xfb55680, category=optimized out, key=34078256, attnum=optimized out, ginstate=0x7fffcf341df0) at gininsert.c:281 #4 ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0, attnum=optimized out, key=34078256, category=optimized out, items=0xfb55680, nitem=762, buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351 #5 0x004635b0 in ginbuild (fcinfo=optimized out) at gininsert.c:531 #6 0x00718637 in OidFunctionCall3Coll (functionId=functionId@entry=2738, collation=collation@entry=0, arg1=arg1@entry=140346257507968, arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at fmgr.c:1649 #7 0x004ce1da in index_build (heapRelation=heapRelation@entry=0x7fa4e8d30680, 
indexRelation=indexRelation@entry=0x7fa4e8d31030, indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:1963 #8 0x004ceeaa in index_create (heapRelation=heapRelation@entry=0x7fa4e8d30680, indexRelationName=indexRelationName@entry=0x1f4e660 textsearch_gin_knn_idx, indexRelationId=16395, indexRelationId@entry=0, relFileNode=optimized out, indexInfo=indexInfo@entry=0x1f4e440, indexColNames=indexColNames@entry=0x1f4f728, accessMethodObjectId=accessMethodObjectId@entry=2742, tableSpaceId=tableSpaceId@entry=0, collationObjectId=collationObjectId@entry=0x1f4fcc8, classObjectId=classObjectId@entry=0x1f4fce0, coloptions=coloptions@entry=0x1f4fcf8, reloptions=reloptions@entry=0, isprimary=0 '\000', isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000', is_internal=0 '\000') at index.c:1082 #9 0x00546a78 in DefineIndex (stmt=optimized out, indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0 '\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594 #10 0x0065147e in ProcessUtilitySlow (parsetree=parsetree@entry=0x1f7fb68, queryString=0x1f7eb10 CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);, context=optimized out, params=params@entry=0x0, completionTag=completionTag@entry=0x7fffcf344c10 , dest=optimized out) at utility.c:1163 #11 0x0065079e in standard_ProcessUtility (parsetree=0x1f7fb68, queryString=optimized out, context=optimized out, params=0x0, dest=optimized out, completionTag=0x7fffcf344c10 ) at utility.c:873 #12 0x0064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350, utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:1187 
#13 0x0064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:1318 #14 0x0064f459 in PortalRun (portal=portal@entry=0x1f4c350, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10 ) at pquery.c:816 #15 0x0064d2d5 in exec_simple_query ( query_string=0x1f7eb10 CREATE INDEX textsearch_gin_idx ON kp USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);) at postgres.c:1048 #16 PostgresMain (argc=optimized out, argv=argv@entry=0x1f2ad40, dbname=0x1f2abf8 rbt, username=optimized out) at
Re: [HACKERS] GIN improvements part2: fast scan
I tried again this morning using gin-packed-postinglists-16.patch and gin-fast-scan.6.patch. No crashes. It is about a 0.1% random sample of production data (10,000,000 records) with the below structure. Pg was compiled with debug enabled in both cases.

        Table "public.kp"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | bigint  | not null
 string | text    | not null
 score1 | integer |
 score2 | integer |
 score3 | integer |
 score4 | integer |
Indexes:
    "kp_pkey" PRIMARY KEY, btree (id)
    "kp_string_key" UNIQUE CONSTRAINT, btree (string)
    "textsearch_gin_idx" gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL)

This is the query tested. All data is in the Pg buffer cache for these timings. Words like 'the' and 'and' are very common (~9% of entries, each) and a word like 'hotel' is much less common (~0.2% of entries).

SELECT id, string
  FROM kp
 WHERE score1 IS NOT NULL
   AND to_tsvector('simple', string) @@ to_tsquery('simple', ?) -- ? is substituted with the query strings
 ORDER BY score1 DESC, score2 ASC
 LIMIT 1000;

 Limit  (cost=56.04..56.04 rows=1 width=37) (actual time=250.010..250.032 rows=142 loops=1)
   ->  Sort  (cost=56.04..56.04 rows=1 width=37) (actual time=250.008..250.017 rows=142 loops=1)
         Sort Key: score1, score2
         Sort Method: quicksort  Memory: 36kB
         ->  Bitmap Heap Scan on kp  (cost=52.01..56.03 rows=1 width=37) (actual time=249.711..249.945 rows=142 loops=1)
               Recheck Cond: ((to_tsvector('simple'::regconfig, string) @@ '''hotel'' & ''and'' & ''the'''::tsquery) AND (score1 IS NOT NULL))
               ->  Bitmap Index Scan on textsearch_gin_idx  (cost=0.00..52.01 rows=1 width=0) (actual time=249.681..249.681 rows=142 loops=1)
                     Index Cond: (to_tsvector('simple'::regconfig, string) @@ '''hotel'' & ''and'' & ''the'''::tsquery)
 Total runtime: 250.096 ms

Times are from \timing on.

MASTER
======
the:            888.436 ms   926.609 ms   885.502 ms
and:            944.052 ms   937.732 ms   920.050 ms
hotel:           53.992 ms    57.039 ms    65.581 ms
and the hotel:  260.308 ms   248.275 ms   248.098 ms

These numbers roughly match what we get with Pg 9.2.
The time savings between 'the' and 'and the hotel' is mostly heap lookups for the score and the final sort. The size of the index on disk is about 2% smaller in the patched version.

PATCHED
=======
the:            1055.169 ms  1081.976 ms  1083.021 ms
and:             912.173 ms   949.364 ms   965.261 ms
hotel:            62.591 ms    64.341 ms    62.923 ms
and the hotel:   268.577 ms   259.293 ms   257.408 ms
hotel and the:   253.574 ms   258.071 ms   250.280 ms

I was hoping that the 'and the hotel' case would improve with this patch to be closer to the 'hotel' search, as I thought that was the kind of thing it targeted. Unfortunately, it did not. I actually applied the patches, compiled, ran initdb, loaded the data, and ran it again, thinking I had made a mistake. Reordering the terms to 'hotel and the' doesn't change the result.

On Fri, Nov 15, 2013 at 1:51 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 15, 2013 at 3:25 AM, Rod Taylor r...@simple-knowledge.com wrote: I checked out master and put together a test case using a small percentage of production data for a known problem we have with Pg 9.2 and text search scans. A small percentage in this case means 10 million records randomly selected; the full data set has a few billion records. Tests ran for master successfully and I recorded timings. Applied the patch included here to master along with gin-packed-postinglists-14.patch. Ran make clean; ./configure; make; make install. make check (All 141 tests passed.) initdb, import dump. The GIN index fails to build with a segfault.

Thanks for testing. See the fixed version in the thread about packed posting lists. -- With best regards, Alexander Korotkov.
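A back-of-envelope check on why fast scan ought to help the 'and the hotel' case: under a term-independence assumption (mine, for illustration), the expected number of matching rows is tiny compared to the posting lists of the common terms, so most of the work spent on 'the' and 'and' is wasted.

```python
def expected_matches(n_rows, *term_frequencies):
    """Independence estimate of rows matching an AND-of-terms tsquery."""
    estimate = float(n_rows)
    for f in term_frequencies:
        estimate *= f
    return estimate

# 10M rows; 'the' ~9%, 'and' ~9%, 'hotel' ~0.2% of entries each.
```

That predicts about 162 matching rows for 'hotel & and & the', in line with the 142 rows the plan actually returned, while a scan driven by the rare term alone would touch roughly 20,000 postings instead of roughly 900,000 for each common term.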
Re: [HACKERS] pg_upgrade: delete_old_cluster.sh issues
On Tue, Nov 12, 2013 at 10:35:58AM +, Marc Mamin wrote: Hello, IMHO there is a serious issue in the script to clean the old data directory when running pg_upgrade in link mode. In short: when working with symbolic links, the first step in delete_old_cluster.sh is to delete the old $PGDATA folder, which may contain tablespaces used by the new instance. In long, our use case:

Rather than removing files/directories individually, which would be difficult to maintain, we decided in pg_upgrade 9.3 to detect tablespaces in the old data directory, report that, and not create a delete script. Here is the commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4765dd79219b9697d84f5c2c70f3fe00455609a1

The problem with your setup is that while you didn't pass symbolic links to pg_upgrade, you did use symbolic links when defining the tablespaces, so pg_upgrade couldn't recognize that the symbolic links pointed inside the old data directory. We could use readlink() to walk over all symbolic links, but that seems quite complex. We could use stat() and make sure there are no matching inodes in the old data directory, or that they are on a different file system. We could look for a directory named after the PG catversion in the old data directory. We could update the docs. I am not sure what to do. We never expected people would put tablespaces in the data directory.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
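Of the options Bruce lists, the symlink-resolution check is easy to sketch in userspace terms (Python; os.path.realpath resolves every link the way a readlink() walk would, and the function name is mine):

```python
import os

def tablespace_inside_datadir(tablespace_path, old_datadir):
    """True when the tablespace ultimately resolves to somewhere inside
    the old data directory, even if it was defined through symlinks."""
    ts = os.path.realpath(tablespace_path)
    dd = os.path.realpath(old_datadir)
    return os.path.commonpath([ts, dd]) == dd
```

pg_upgrade could refuse to emit delete_old_cluster.sh whenever any tablespace passes this check; the stat()/inode comparison Bruce mentions is the equivalent test one filesystem level down.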
Re: [HACKERS] Re: Suggestion: Issue warning when calling SET TRANSACTION outside transaction block
On Mon, Nov 18, 2013 at 06:30:32PM -0800, David Johnston wrote: Personally, I am fine with changing them all to WARNING. Error makes more sense if the goal is internal consistency. That goal should be subservient to backward compatibility. Changing LOCK to a warning is less problematic, since it does not seem probable that current code functions in such a way that, after upgrade, it would begin working differently in the absence of an error; basically, someone would have to be trapping on the error and conditionally branching their logic. That said, if this were a day-0 decision I'd likely raise an error. Weakening LOCK doesn't make sense since erroring is its day-0 behavior. Document the warning for SET as being weaker than ideal because of backward compatibility and call it a day (i.e. leave LOCK at error). The documentation, not the code, then enforces the feeling that such usage is considered wrong, without possibly breaking wrong-but-working code.

We normally don't approach warts with documentation --- we usually just fix them and document them in the release notes. If we did, our docs would be a whole lot uglier.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] COPY table FROM STDIN doesn't show count tag
On 18 November 2013, Amit Khandekar wrote:
>> On 18 October 2013 17:07, Rajeev Rastogi rajeev.rast...@huawei.com wrote:
>> From the following mail, COPY behaviour between stdin and a normal file
>> has some inconsistency:
>> http://www.postgresql.org/message-id/ce85a517.4878e%tim.k...@gmail.com
>> The issue was that when COPY executes from stdin, the command goes to the
>> server, the server requests the input, and control returns to the client
>> to enter the data. Once the client sends the input, the server executes
>> the COPY command and sends back the result, but the client does not print
>> the result; it just clears it out. The changes ensure the final result
>> from the server gets printed before the result is cleared. Please find
>> the patch for the same and let me know your suggestions.
>
> In this call:
>
>     success = handleCopyIn(pset.db, pset.cur_cmd_source,
>                            PQbinaryTuples(*results), &intres) && success;
>     if (success && intres)
>         success = PrintQueryResults(intres);
>
> instead of handling the result status this way, what if we use the
> ProcessResult() argument 'result' to pass back the COPY result status to
> the caller? We already call PrintQueryResults(results) after the
> ProcessResult() call, so we don't have to have a COPY-specific
> PrintQueryResults() call. Also, if there is a subsequent SQL command in
> the same query string, the consequence of the patch is that the client
> prints both the COPY output and the last command's output. So my
> suggestion would also allow us to be consistent with the general behaviour
> that only the last SQL command's output is printed in case of multiple SQL
> commands. Here is how it gets printed with your patch:
>
>     psql -d postgres -c "\copy tab from '/tmp/st.sql' delimiter ' '; insert into tab values ('lll', 3)"
>     COPY 1
>     INSERT 0 1
>
> This is not harmful, but just a matter of consistency.

Thank you for the valuable comments. Your suggestion is absolutely correct. I hope you meant to write the test case as psql -d postgres -c "\copy tab from stdin; insert into tab values ('lll', 3)", since if we are reading from a file, the above issue does not come up. I have modified the patch as per your comment, and the same is attached with this mail. Please let me know in case of any other issues or suggestions.

Thanks and Regards,
Kumar Rajeev Rastogi

copydefectV2.patch
Description: copydefectV2.patch
Re: [HACKERS] LISTEN / NOTIFY enhancement request for Postgresql
Thank you all for considering my feature request. Dimitri's suggestion is a very good one -- I feel it will accomplish the goal of allowing more granularity in LISTEN. We might also want to add a flag in postgresql.conf to disable this enhancement so that we don't break existing code.

On 11/15/2013 8:19 AM, Pavel Golub wrote:
> Hello, Dimitri. You wrote:
> DF> Bruce Momjian br...@momjian.us writes:
> >> . is used to separate names in a path
> >> * is used to match any name in a path
> >> > is used to recursively match any destination starting from this name
> >>
> >> For example, using the example above, these subscriptions are possible:
> >>
> >>   Subscription          Meaning
> >>   PRICE.>               Any price for any product on any exchange
> >>   PRICE.STOCK.>         Any price for a stock on any exchange
> >>   PRICE.STOCK.NASDAQ.*  Any stock price on NASDAQ
> >>   PRICE.STOCK.*.IBM     Any IBM stock price on any exchange
> >>
> >> My request is to implement the same or similar feature in Postgresql.
> >
> > This does seem useful and pretty easy to implement. Should we add a TODO?
>
> DF> I think we should consider the ltree syntax in that case, as documented
> DF> in the following link:
> DF> http://www.postgresql.org/docs/9.3/interactive/ltree.html
>
> Great idea! Thanks for the link.
>
> DF> Regards,
> DF> --
> DF> Dimitri Fontaine
> DF> http://2ndQuadrant.fr  PostgreSQL : Expertise, Formation et Support