Re: [HACKERS] C# reading result from a function
Hello, do you have a good C# mailing list? Or do you mean it is the wrong channel here? The function is called, but the code isn't here. I get the result from par1+par2 into the database, but do not get the result on the C# command line ;)

Best regards from Hamburg, Germany
[HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
Hello,

First: kudos!!! The release notes are extremely promising with regard to performance improvements :-)

I've made some (dirty) tests with BRIN and btree_gin (on a small Windows laptop ...). Just a few remarks:

- btree_gin deserves a better description than this one: "However, they are useful for GIN testing and as a base for developing other GIN operator classes." I came to similar times between btree and gin for indexes on category columns (ca. 20 to 200 distinct values). For me, gin clearly wins here thanks to the index size difference. You should really consider moving btree_gin to core ...

- btree_gin on integers doesn't cope well with BETWEEN; it seems to always lead to a full index scan. I think I understand why, but maybe this is worth a comment in the docs to underline the difference from btree.

    SELECT * from tgin_1 WHERE cat_id between 1 and 2: http://explain.depesz.com/s/fmqn
    SELECT * from tgin_1 WHERE cat_id IN (1,2): http://explain.depesz.com/s/bYg

- BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges:

    INSERT into tbrin_1 (cat_id, ) SELECT s%20, ... FROM generate_series(1,300 ) s;
    CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id) with (pages_per_range=64);
    SELECT * from tbrin_1 WHERE cat_id=10; http://explain.depesz.com/s/9YQR

  There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan, but the time overhead remains small :)

best regards,
Marc Mamin
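For reference, a minimal sketch of the btree_gin comparison described above (table and column names follow the queries in the message; the row count and category spread are illustrative, not the original test data):

    CREATE EXTENSION IF NOT EXISTS btree_gin;
    CREATE TABLE tgin_1 (cat_id int);
    INSERT INTO tgin_1 SELECT s % 100 FROM generate_series(1, 1000000) s;
    CREATE INDEX tgin_1_btree_idx ON tgin_1 USING btree (cat_id);
    CREATE INDEX tgin_1_gin_idx   ON tgin_1 USING gin (cat_id);
    -- compare plans: the IN form can use the GIN index directly,
    -- while BETWEEN tends to degenerate into a full index scan
    EXPLAIN ANALYZE SELECT * FROM tgin_1 WHERE cat_id BETWEEN 1 AND 2;
    EXPLAIN ANALYZE SELECT * FROM tgin_1 WHERE cat_id IN (1, 2);
    -- compare index sizes
    SELECT relname, pg_size_pretty(pg_relation_size(oid))
      FROM pg_class
     WHERE relname IN ('tgin_1_btree_idx', 'tgin_1_gin_idx');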
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On 7/5/2015 11:35 PM, Andrew Dunstan wrote:

  On 07/04/2015 11:02 AM, Tom Lane wrote: I wondered how come we had not seen this problem in the buildfarm, but the answer appears to be that our only working Cygwin critter (brolga) doesn't build any of the optional PLs, so it skips these modules altogether. Seems like we need to improve that situation.

  brolga has been on life support for quite a long time. The reason it hasn't been retired is that for a long time I was unable to get a buildfarm member running successfully on a more modern Cygwin. That now appears to have resolved itself, so in a week or so I will set up a Cygwin buildfarm member running on a modern Cygwin on Windows 8.1, and build with perl, python, openssl, libxml and libxslt. Note that I have only tested 32-bit Cygwin - 64-bit might well be a whole different story.

Hi Andrew,

I have not seen any particular difference between the two Cygwin flavors; I am running both versions on my W7 64-bit. The only trick is to have two different names for the cygserver service to avoid a collision. Those services are capable of running at the same time. E.g., from the 64-bit point of view:

    $ cygrunsrv.exe -L
    (cygserver)
    cygserver64
    (sshd)

The 32-bit services are shown in parentheses. Let me know if you need any support.

cheers
andrew

Regards
Marco
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 4:23 PM, Michael Paquier michael.paqu...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 10:00 AM, Joe Conway m...@joeconway.com wrote: I wonder if it isn't better to just loop through all the args with get_fn_expr_argtype() every time and then test for the exact signature match? Another alternative might be to create a wrapper C function for each variant SQL function, but that seems like it could also get messy, especially with dblink_record_internal() since we would have to deal with every variant of all the external facing callers. Thoughts?

Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow; hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

    if (PG_NARGS() == 5)
    {
        if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
            var = PG_GETARG_BOOL(1);
        [...]
    }
    else if (PG_NARGS() == 4)
    {
        [...]
    }
    else
        [...]

-- Michael
Re: [HACKERS] pg_stat_*_columns?
On Mon, Jun 29, 2015 at 11:14 PM, Jim Nasby jim.na...@bluetreble.com wrote:

  What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes)

I think the idea sounds promising. If near real-time data is required, we could just update once every second, which should be often enough for everybody. Each backend process could then simply INSERT the stats for each txn that committed/rolled back into an UNLOGGED table, and then the collector would do one single UPDATE of the collector stats based on the aggregate of the rows inserted since the previous update a second ago, and then delete the processed rows (naturally in one operation, using DELETE FROM .. RETURNING *). That way we could get rid of the legacy communication protocol between the backends and the collector and instead rely on unlogged tables for the submission of data from the backends to the collector. INSERTing 100 000 rows into an unlogged table takes 70 ms on my laptop, so it should be fast enough to handle the tens of thousands of updates per second we need to handle.
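A rough SQL sketch of that flow (all table and column names here are illustrative, not an actual catalog proposal):

    -- backends append per-transaction stats
    CREATE UNLOGGED TABLE stats_queue (
        relid     oid,
        n_tup_ins bigint,
        n_tup_upd bigint,
        n_tup_del bigint
    );

    -- collector, once per second: drain the queue and fold it into the
    -- (hypothetical) running totals in one statement
    WITH drained AS (
        DELETE FROM stats_queue RETURNING *
    )
    UPDATE stats_totals t
       SET n_tup_ins = t.n_tup_ins + d.n_tup_ins,
           n_tup_upd = t.n_tup_upd + d.n_tup_upd,
           n_tup_del = t.n_tup_del + d.n_tup_del
      FROM (SELECT relid,
                   sum(n_tup_ins) AS n_tup_ins,
                   sum(n_tup_upd) AS n_tup_upd,
                   sum(n_tup_del) AS n_tup_del
              FROM drained
             GROUP BY relid) d
     WHERE t.relid = d.relid;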
Re: [HACKERS] dblink: add polymorphic functions.
On Thu, Feb 19, 2015 at 4:06 PM, Corey Huinker corey.huin...@gmail.com wrote:

  In the course of writing a small side project which hopefully will make its way onto PGXN soon, I was writing functions that had a polymorphic result set:

    create function foo( p_row_type anyelement, p_param1 ...) returns setof anyelement

  Inside that function would be multiple calls to dblink() in both synchronous and async modes. It is a requirement of the function that each query return a result set conforming to the structure passed in as p_row_type, but there was no way for me to express that. Unfortunately, there's no way to say

    select * from dblink_get_result('connname') as polymorphic record type;

  Instead, I had to generate the query as a string, like this:

    with x as (
        select a.attname || ' ' || pg_catalog.format_type(a.atttypid, a.atttypmod) as sql_text
        from   pg_catalog.pg_attribute a
        where  a.attrelid = pg_typeof(p_row_type)::text::regclass
        and    a.attisdropped is false
        and    a.attnum > 0
        order by a.attnum
    )
    select format('select * from dblink_get_result($1) as t(%s)', string_agg(x.sql_text, ','))
    from   x;

  Moreover, I'm now executing this string dynamically, incurring reparsing and replanning each time (and if all goes well, this would be executed many times). Granted, I could avoid that by rewriting the stored procedure in C and using prepared statements (not available in PL/pgSQL), but it seemed a shame that dblink couldn't itself handle this polymorphism. So with a little help, we were able to come up with polymorphic set-returning dblink functions. Below are the results of the patch applied to a stock 9.4 installation:

    [local]:ubuntu@dblink_test# create extension dblink;
    CREATE EXTENSION
    Time: 12.778 ms
    [local]:ubuntu@dblink_test# \df dblink
                                      List of functions
     Schema |  Name  | Result data type |       Argument data types       |  Type
    --------+--------+------------------+---------------------------------+--------
     public | dblink | SETOF record     | text                            | normal
     public | dblink | SETOF anyelement | text, anyelement                | normal
     public | dblink | SETOF record     | text, boolean                   | normal
     public | dblink | SETOF anyelement | text, boolean, anyelement       | normal
     public | dblink | SETOF record     | text, text                      | normal
     public | dblink | SETOF anyelement | text, text, anyelement          | normal
     public | dblink | SETOF record     | text, text, boolean             | normal
     public | dblink | SETOF anyelement | text, text, boolean, anyelement | normal
    (8 rows)

Sorry for the late reply. I'm a little concerned about the state of overloading here. If I'm not mistaken, you may have introduced a pretty serious backwards compatibility issue. Having the two signatures (text, anyelement) and (text, boolean) will now fail anytime (unknown, unknown) is passed, and that's a pretty common invocation. If I'm right, quickly scanning the function list, I don't think there's an easy solution to this issue other than adding an alternately named call.

merlin
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
Heikki Linnakangas hlinn...@iki.fi writes:

  I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with: tc qdisc add dev eth2 root netem delay 100ms. With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

Why does the committed patch do getsockopt() on server_fd rather than port->sock? This seems at best confusing and at worst wrong.

regards, tom lane
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 07/04/2015 02:03 PM, chenhj wrote:

  2015-07-03 16:49:44, David Rowley david.row...@2ndquadrant.com wrote: I'm wondering what the original test setup was. I'm assuming psql and postgres both running on separate Windows machines? I've tested the patch just connecting to a database running on localhost and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I don't have enough hardware here to have client and server on separate machines, at least not with a stable network that goes through copper.

  My original test environments are as follows:

  Environment 1:
    Server: Windows 2012 (x64); the host is a VM in a private cloud
    Client: RHEL6 (x64); the host is another VM in the same private cloud
    Network: 1Gbit LAN

  Environment 2:
    Server: Windows 2012 (x64); the host is a VM in a private cloud
    Client: Windows 7 (x64); the host is a physical machine (in fact it is my PC)
    Network: 1Gbit LAN

  This patch should only speed up environments which satisfy the following conditions:
  1. The OS of the server is Windows 2012 or Win8 (but I only tested it on Windows 2012).
  2. The client and the server are separate machines.
  3. The performance bottleneck is network throughput.
  4. The utilization rate of network bandwidth is not full (such as only 50% or lower).

I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with:

    tc qdisc add dev eth2 root netem delay 100ms

With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

- Heikki
Re: [HACKERS] pg_stat_*_columns?
On 2015-06-29 16:14:34 -0500, Jim Nasby wrote:

  What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes).

Uh. Hot-Standby?
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
* Tom Lane (t...@sss.pgh.pa.us) wrote:

  Jan de Visser j...@de-visser.net writes: Attached a new patch, rebased against the current head. Errors in pg_hba.conf and pg_ident.conf are now also noticed. I checked the documentation for pg_ctl reload, and the only place where it's explained seems to be runtime.sgml, and that description is so high-level that adding this new bit of functionality wouldn't make much sense.

  BTW, it's probably worth pointing out that the recent work on the pg_file_settings view has taken away a large part of the use-case for this, in that you can find out more with less risk by inspecting pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and hoping you didn't break anything too nastily. Also, you can use pg_file_settings remotely, unlike pg_ctl (though admittedly you still need contrib/adminpack or something to allow uploading a new config file if you're doing remote admin). I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf.

Yes. That's definitely something that I'd been hoping someone would work on. Also, thanks for the work on the pg_file_settings code; I agree with all you did there.

Thanks again!

Stephen
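For example, a quick pre-reload sanity check against the view mentioned above might look like this (a sketch; the error column is NULL for entries that would apply cleanly):

    -- show any settings in the config files that would fail to apply on SIGHUP
    SELECT sourcefile, sourceline, name, setting, error
      FROM pg_file_settings
     WHERE error IS NOT NULL;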
Re: [HACKERS] pg_stat_*_columns?
Joel Jacobson j...@trustly.com writes:

  On Mon, Jun 29, 2015 at 11:14 PM, Jim Nasby jim.na...@bluetreble.com wrote: What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes)

  I think the idea sounds promising. If near real-time data is required, we could just update once every second, which should be often enough for everybody.

I'd bet a good lunch that performance will be absolutely disastrous. Even with unlogged tables, the vacuuming cost would be intolerable. Why would we insist on pushing the envelope in what's known to be Postgres' weakest area performance-wise?

  Each backend process could then simply INSERT the stats for each txn that committed/rollbacked into an UNLOGGED table,

... and if its transaction failed, how would it do that? Regular tables are *not* what we want here, either from a semantics or a performance standpoint.

regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
On Sat, Jul 4, 2015 at 2:26 AM, Andres Freund and...@anarazel.de wrote:

  On 2015-07-03 19:02:29 +0200, Andres Freund wrote: Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm right now missing how the whole "skip WAL logging if relation has just been truncated" optimization can ever actually be crashsafe unless we use a new relfilenode (which we don't!).

Agreed... When I ran the following test scenario, I found that the loaded data disappeared after crash recovery.

1. Start the PostgreSQL server with wal_level = minimal.
2. Execute the following SQL statements:

    \copy (SELECT num FROM generate_series(1,10) num) to /tmp/num.csv with csv
    BEGIN;
    CREATE TABLE test (i int primary key);
    TRUNCATE TABLE test;
    \copy test from /tmp/num.csv with csv
    COMMIT;
    SELECT COUNT(*) FROM test;  -- returns 10

3. Shut down the server with immediate mode.
4. Restart the server.
5. Execute the following SQL statement after crash recovery ends:

    SELECT COUNT(*) FROM test;  -- returns 0..

In #2, 10 rows were copied and the transaction was committed. The subsequent SELECT COUNT(*) obviously returned 10. However, after crash recovery, in #5, the same statement returned 0. That is, the loaded (and committed) 10 rows were lost after the crash.

  We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869

    commit cab9a0656c36739f59277b34fea8ab9438395869
    Author: Tom Lane t...@sss.pgh.pa.us
    Date:   Sun Aug 23 19:23:41 2009 +0000

        Make TRUNCATE do truncate-in-place when processing a relation that
        was created or previously truncated in the current (sub)transaction.
        This is safe since if the (sub)transaction later rolls back, we'd
        just discard the rel's current physical file anyway. This avoids
        unreasonable growth in the number of transient files when a relation
        is repeatedly truncated. Per a performance gripe a couple weeks ago
        from Todd Cook.

  to me the reasoning here looks flawed.

Before this commit, when I ran the above test scenario, no data loss happened.

Regards,

-- Fujii Masao
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On July 6, 2015 09:23:12 AM Stephen Frost wrote:

  I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf.

  Yes. That's definitely something that I'd been hoping someone would work on.

There actually is a patch in the current CF that provides a view for pg_hba. I could look at one for pg_ident, which seems much simpler.
Re: [HACKERS] [PATCH v1] GSSAPI encryption support
Robbie,

* Robbie Harwood (rharw...@redhat.com) wrote:

  As previously discussed on this list, I have coded up GSSAPI encryption support. If it is easier for anyone, this code is also available for viewing on my github: https://github.com/postgres/postgres/compare/master...frozencemetery:feature/gssencrypt

  Fallback support is present in both directions for talking to old clients and old servers; GSSAPI encryption is by default auto-upgraded to where available (for compatibility), but both client and server contain settings for requiring it. There are 8 commits in this series; I have tried to err on the side of creating too much separation rather than too little. A patch for each is attached. This is v1 of the series.

Excellent, thanks! I've got other things to tend to at the moment, but this is definitely something I'm interested in and will work on (likely in August). If we could get a reviewer or two to take a look and take the patch out for a test-drive, at least, that would be a huge help.

Thanks again!

Stephen
Re: [HACKERS] Information of pg_stat_ssl visible to all users
* Magnus Hagander (mag...@hagander.net) wrote:

  On Thu, Jul 2, 2015 at 10:06 PM, Andres Freund and...@anarazel.de wrote: On 2015-07-02 16:52:01 -0300, Alvaro Herrera wrote: If there's interest in closing these holes, this might be a first

  I don't think such an isolated attempt buys us anything except maybe unsatisfied users. I can see a benefit in allowing to restrict information about users and such in other clusters, but changing stat_ssl seems to be an inconsequentially small problem on that path.

  We discussed earlier having a monitoring role or attribute or something like that, and I think this would be another case of that. We definitely want to go towards something like that, but that's not happening in 9.5...

Agreed, but if we make this visible to all in 9.5 then we're going to have a tough time restricting it to just the monitoring role in 9.6, I'm afraid... We realize it's a problem; for my 2c, I'd rather not double-down on it by providing more information which should really be limited to privileged users.

Thanks,

Stephen
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 07/06/2015 04:30 PM, Tom Lane wrote:

  Heikki Linnakangas hlinn...@iki.fi writes: I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with: tc qdisc add dev eth2 root netem delay 100ms. With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

  Why does the committed patch do getsockopt() on server_fd rather than port->sock? This seems at best confusing and at worst wrong.

Good catch, fixed!

- Heikki
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
* Jan de Visser (j...@de-visser.net) wrote:

  On July 6, 2015 09:23:12 AM Stephen Frost wrote: I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf. Yes. That's definitely something that I'd been hoping someone would work on.

  There actually is a patch in the current CF that provides a view for pg_hba. I could look at one for pg_ident, which seems much simpler.

That would be good, as would a review of the patch for pg_hba with particular consideration of what's been done with the pg_file_settings view.

Thanks!

Stephen
Re: [HACKERS] Inconsistent style in pgbench's error messages
Fabien COELHO coe...@cri.ensmp.fr writes:

  Personally I think "pgbench: " adds very little and so I'd rather get rid of it, but I'm sure others feel differently.

  I think that one of the reasons for this is that once pgbench has started to run it is desirable to differentiate error messages that come from libpq and those that come from pgbench itself.

Don't really see the issue. Errors reported by the server are easy to distinguish by formatting (as long as we don't go out of our way to make pgbench's errors look like them, which is one reason I do not like the HINT proposal in the bug #12379 thread). Errors reported by libpq itself might look like they came from pgbench, but I don't exactly see what's wrong with that. I don't think users much care about the difference, as long as they can tell server-side errors apart from client-side errors.

regards, tom lane
[HACKERS] Missing extension locks in the nbtree code
Hi,

There've recently been more and more reports of "unexpected data beyond EOF in block %u of relation %s" - too many for me to think that it's likely to be caused by a kernel bug. It's now been reproduced at least on somewhat recent Linux and FreeBSD versions. So I started looking around for causes. Not for the first time.

One, probably harmless, thing is that _bt_getroot() creates the initial root page without an extension lock. That's not pretty, but it should happen on the first write and be safe due to the content lock on the metapage. ISTM we should still not do that, but it's probably not the explanation.

The fix is just to change

    if (fd == -1 || XLByteInSeg(change->lsn, curOpenSegNo))

into

    if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))

The bug doesn't have any correctness implications afaics, just performance. We read all the spilled files till the end, so even a change spilled to the wrong segment gets picked up.

Greetings,

Andres Freund
Re: [HACKERS] Missing extension locks in the nbtree code
On 2015-07-06 23:21:12 +0200, Andres Freund wrote:

  There've recently been more and more reports of "unexpected data beyond EOF in block %u of relation %s" - too many for me to think that it's likely to be caused by a kernel bug. It's now been reproduced at least on somewhat recent Linux and FreeBSD versions. So I started looking around for causes. Not for the first time.

  One, probably harmless, thing is that _bt_getroot() creates the initial root page without an extension lock. That's not pretty, but it should happen on the first write and be safe due to the content lock on the metapage. ISTM we should still not do that, but it's probably not the explanation.

Uh, this was a mix-up; I didn't want to send this email yet. The below obviously is from a different thread.

  The fix is just to change

      if (fd == -1 || XLByteInSeg(change->lsn, curOpenSegNo))

  into

      if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))

  The bug doesn't have any correctness implications afaics, just performance. We read all the spilled files till the end, so even a change spilled to the wrong segment gets picked up.
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
On Tue, Jul 7, 2015 at 4:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

  On 07/06/2015 04:38 AM, Michael Paquier wrote: On Sat, Jul 4, 2015 at 3:57 AM, Heikki Linnakangas wrote: Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper that just calls install.pl, passing all arguments? I guess we just haven't noticed it. And indeed it makes everything more simple, and fixes as well the error reported when the install path contains a space.

  Ok, committed that way.

Shouldn't this patch be backpatched? In the backbranches install.bat does not work correctly with paths containing spaces.

-- Michael
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 11:13 AM, Corey Huinker corey.huin...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

      postgres=# select * from dblink('a', 'b', 'c');
      ERROR:  function dblink(unknown, unknown, unknown) is not unique

  Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic solutions. One was to build support for:

      SELECT * FROM srf() AS TYPE OF(foo);

  The second idea I think is actually SQL standard if I recall correctly:

      SELECT * FROM CAST(srf() AS foo) x;

  Currently this works:

      select * from cast (row(11,'l','{a11,b11,c11}') as foo);
       f1 | f2 |      f3
      ----+----+---------------
       11 | l  | {a11,b11,c11}
      (1 row)

  But this fails:

      select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
      ERROR:  cannot cast type record to foo

  Comments in the source have this to say:

      /*
       * coerce_record_to_complex
       *		Coerce a RECORD to a specific composite type.
       *
       * Currently we only support this for inputs that are RowExprs or
       * whole-row Vars.
       */

  That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

  Yeah. FWIW, here's my 0.02$: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled over all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable because of using syntax to describe the record. Corey's proposal, overloading issues aside, appears to neatly deal with this problem because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

      CREATE FUNCTION remote_call(...) RETURNS ... AS
      $$
        SELECT dblink(...) AS r(...)
      $$ language sql;

  ...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea...I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'...then you could:

      SELECT * FROM remote_call(...) RETURNS SETOF foo;

  Inside remote_call, you'd see something like:

      SELECT dblink(...) AS OUT;

  As to the proposed syntax, I would vote to support the SQL standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you because FWICT it does not support wrapping.

  merlin

  Your experiences with dblink are very similar to mine. The whole issue arose for me as an outcropping of my Poor Man's Parallel Processing extension (still not released but currently working for us in production internally). At some point I had to do dblink_get_result(...) as t(...), and not only did I have to render the structure as a string, I was going to have to execute that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was going to have to re-code in C or plv8. Overall those calls aren't terribly expensive (it's working in production - for us - without this dblink modification), but a cleaner solution would be better.

Another option is to handle the intermediate result in json if you're willing to sacrifice a little bit of performance. For example, suppose you wanted to log every dblink call through a wrapper without giving up the ability to handle arbitrary result sets:

    CREATE OR REPLACE FUNCTION dblink_log(_query TEXT) RETURNS SETOF JSON AS
    $$
    BEGIN
      RAISE WARNING 'got dblink %', _query;
      RETURN QUERY SELECT * FROM dblink(
        'host=localhost',
        format('SELECT row_to_json(q) from (%s) q', _query)) AS R(j json);
    END;
    $$ LANGUAGE PLPGSQL;

    postgres=# select * from dblink_log('select 0 as value');
    WARNING:  got dblink select 0 as value
     dblink_log
    ─────────────
     {"value":0}
    (1 row)

With json, you have a number of good options to convert back to a record. For example:

    postgres=# CREATE TYPE foo AS (value int);
    CREATE TYPE
    postgres=# SELECT p.*
                 FROM dblink_log('SELECT s AS value FROM generate_series(1,3) s') j
                CROSS JOIN LATERAL json_populate_record(null::foo, j) p;
    WARNING:  got dblink select s as value from generate_series(1,3) s
     value
    ───────
         1
         2
         3
Re: [HACKERS] copy.c handling for RLS is insecure
Noah,

* Noah Misch (n...@leadboat.com) wrote:

  On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote: Alright, I've done the change to use the RangeVar from CopyStmt, but also added a check wherein we verify that the relation's OID returned from the planned query is the same as the relation's OID that we did the RLS check on - if they're different, we throw an error. Please let me know if there are any remaining concerns.

  Here is the check in question (added in commit 143b39c):

      plan = planner(query, 0, NULL);

      /*
       * If we were passed in a relid, make sure we got the same one back
       * after planning out the query. It's possible that it changed
       * between when we checked the policies on the table and decided to
       * use a query and now.
       */
      if (queryRelId != InvalidOid)
      {
          Oid relid = linitial_oid(plan->relationOids);

          /*
           * There should only be one relationOid in this case, since we
           * will only get here when we have changed the command for the
           * user from a "COPY relation TO" to "COPY (SELECT * FROM
           * relation) TO", to allow row level security policies to be
           * applied.
           */
          Assert(list_length(plan->relationOids) == 1);

          if (relid != queryRelId)
              ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                       errmsg("relation referenced by COPY statement has changed")));
      }

  That's clearly an improvement, but I'm not sure it's water-tight. What if the name that originally referenced a table ended up referencing a view? Then you could get list_length(plan->relationOids) != 1.

  I'll test it out and see what happens. Certainly a good question and if there's an issue there then I'll get it addressed.

  Yes, it can be made to reference a view and trip the assertion.

Ok. We can certainly make that assertion be a run-time consideration instead, though I'm not thrilled with that being the only safe-guard.

  (And, in that case, I also wonder if you could get eval_const_expressions() to do evil things on your behalf while planning.)

  If it can be made to reference a view then there's an issue, as the view might include a function call itself which is provided by the attacker..

  Indeed. As the parenthetical remark supposed, the check happens too late to prevent a security breach. planner() has run eval_const_expressions(), executing code of the view owner's choosing.

Right. Clearly, if we found a relation originally then we need that same relation with the same OID after the conversion to a query.

  That is necessary but not sufficient. CREATE RULE can convert a table to a view without changing the OID, thereby fooling the check. Test procedure:

Ugh, yes, rules would cause a problem for this..

    -- as superuser (or createrole)
    create user blackhat;
    create user alice;

    -- as blackhat
    begin;
    create table exploit_rls_copy (c int);
    alter table exploit_rls_copy enable row level security;
    grant select on exploit_rls_copy to public;
    commit;

    -- as alice
    -- first, set breakpoint on BeginCopy
    copy exploit_rls_copy to stdout;

    -- as blackhat
    begin;
    create or replace function leak() returns int immutable as
    $$begin raise notice 'in leak()'; return 7; end$$ language plpgsql;
    create rule "_RETURN" as on select to exploit_rls_copy do instead
    select leak() as c from (values (0)) dummy;
    commit;

    -- Release breakpoint. leak() function call happens. After that, assertion
    -- fires if enabled. ERROR does not fire in any case.

Thanks for pointing this out. I'll look into it.

Stephen
Re: [HACKERS] ERROR: unexpected data beyond EOF
Hi,

On 2015-04-30 10:05:33 -0700, Joshua D. Drake wrote:

  postgres[21118]: [8-1] ERROR: unexpected data beyond EOF in block 9 of relation base/430666195/430666206

  FreeBSD 10, ZFS, iSCSI, RAID 50 (don't start, I didn't spec it). fsync on, full_page_writes on. The restart of PostgreSQL makes the error go away and things progress normally. We don't experience further errors etc...

A couple questions:

1) Do you know what 430666206 refers to? Something like

    SELECT oid::regclass, relkind FROM pg_class WHERE pg_relation_filenode(oid) = 430666206;

   in the correct database should answer that.
2) Any chance you have zero_damaged_pages enabled?
3) Does the issue still occur? Is there any chance of manual investigation before restarting the database?

Andres
Re: [HACKERS] WAL logging problem in 9.4.3?
Fujii Masao masao.fu...@gmail.com writes:

  On Sat, Jul 4, 2015 at 2:26 AM, Andres Freund and...@anarazel.de wrote: We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869

    commit cab9a0656c36739f59277b34fea8ab9438395869
    Author: Tom Lane t...@sss.pgh.pa.us
    Date:   Sun Aug 23 19:23:41 2009 +0000

        Make TRUNCATE do truncate-in-place when processing a relation that
        was created or previously truncated in the current (sub)transaction.
        This is safe since if the (sub)transaction later rolls back, we'd
        just discard the rel's current physical file anyway. This avoids
        unreasonable growth in the number of transient files when a relation
        is repeatedly truncated. Per a performance gripe a couple weeks ago
        from Todd Cook.

  to me the reasoning here looks flawed.

  Before this commit, when I ran the above test scenario, no data loss happened.

Actually I think what is broken here is COPY's test to decide whether it can omit writing WAL:

     * Check to see if we can avoid writing WAL
     *
     * If archive logging/streaming is not enabled *and* either
     *	- table was created in same transaction as this COPY
     *	- data is being written to relfilenode created in this transaction
     * then we can skip writing WAL. It's safe because if the transaction
     * doesn't commit, we'll discard the table (or the new relfilenode file).
     * If it does commit, we'll have done the heap_sync at the bottom of this
     * routine first.

The problem with that analysis is that it supposes that, if we crash and recover, the WAL replay sequence will not touch the data. What's killing us in this example is the replay of the TRUNCATE, but that is not the only possibility. For example, consider this modification of Fujii-san's test case:

    BEGIN;
    CREATE TABLE test (i int primary key);
    INSERT INTO test VALUES(-1);
    \copy test from /tmp/num.csv with csv
    COMMIT;
    SELECT COUNT(*) FROM test;

The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

We might have to give up on this COPY optimization :-(. I'm not sure what would be a safe rule for deciding that we can skip WAL logging in this situation, but I am pretty sure that it would require keeping information we don't currently keep about what's happened earlier in the transaction.

regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
Andres Freund and...@anarazel.de writes:

  On 2015-07-06 11:14:40 -0400, Tom Lane wrote: The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

  ISTM any WAL logged action that touches a relfilenode essentially needs to disable further optimization based on the knowledge that the relation is new.

After a bit more thought, I think it's not so much "any WAL logged action" as "any unconditionally-replayed action". INSERT+INIT breaks this example because heap_xlog_insert will unconditionally replay the action, even if the page is valid and has a same or newer LSN. Similarly, TRUNCATE is problematic because we redo it unconditionally (and in that case it's hard to see an alternative).

  It'd not be impossible to add more state to the relcache entry for the relation. Whether it's likely that we'd find all the places that'd need updating that state, I'm not sure.

Yeah, the sticking point is mainly being sure that the state is correctly tracked, both now and after future changes. We'd need to identify a state invariant that we could be pretty confident we'd not break.

One idea I had was to allow the COPY optimization only if the heap file is physically zero-length at the time the COPY starts. That would still be able to optimize in all the cases we care about making COPY fast for. Rather than reverting cab9a0656c36739f, which would re-introduce a different performance problem, perhaps we could have COPY create a new relfilenode when it does this. That should be safe if the table was previously empty.

regards, tom lane
Re: [HACKERS] Freeze avoidance of very large table.
On 3 July 2015 at 09:25, Sawada Masahiko sawada.m...@gmail.com wrote:

  On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote: Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible.

  Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state?

  It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently, but eventually, when the all-frozen bit is set, all-visible is also set.

And my understanding is that if you clear all-visible you would also clear all-frozen... So I don't understand why you have two separate calls to visibilitymap_clear(). Surely the logic should be to clear both bits at the same time?

In my understanding the state logic is:

1. Both bits unset: ~(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN), which can be changed to state 2 only
2. VISIBILITYMAP_ALL_VISIBLE only, which can be changed to state 1 or state 3
3. VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN, which can be changed to state 1 only

If that is the case, please simplify the logic for setting and unsetting the bits so they are set together efficiently. At the same time, please also put in Asserts to ensure that the state logic is maintained when it is set and when it is tested.

I would also like to see the visibilitymap_test function exposed in SQL, so we can write code to examine the map contents for particular ctids. By doing that we can then write a formal test that shows the evolution of tuples through insertion, vacuuming and freezing, testing that the map has been set correctly at each stage. I guess that needs to be done as an isolation test so we have an observer that constrains the xmin in various ways. In light of the multixact bugs, any code that changes the on-disk tuple metadata needs formal tests.

Other than that, the overall concept seems sound. I think we need something for pg_upgrade to rewrite existing VMs. Otherwise a large read-only database would suddenly require a massive revacuum after upgrade, which seems bad. That can wait for now until we all agree this patch is sound.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
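To illustrate the SQL exposure being asked for, usage might look like the sketch below. Note that everything here is hypothetical: no such SQL-level visibilitymap_test() function exists today, and the name, argument list, and the assumed 8192-byte block size are all made up for illustration:

    -- hypothetical SQL wrapper around visibilitymap_test(); not an existing function
    SELECT blkno,
           visibilitymap_test('mytable'::regclass, blkno, 'all_visible') AS all_visible,
           visibilitymap_test('mytable'::regclass, blkno, 'all_frozen')  AS all_frozen
      FROM generate_series(0, pg_relation_size('mytable') / 8192 - 1) AS blkno;

A test could then assert, per page, the invariant discussed above: all_frozen implies all_visible.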
Re: [HACKERS] Bug in bttext_abbrev_convert()
Kevin Grittner wrote:

  Alvaro Herrera alvhe...@2ndquadrant.com wrote: Peter Geoghegan wrote: It would be nice to always have an html report from gcov available on the internet. That would be something useful to automate, IMV.

  http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

  What would it take to get something like that which uses the check-world target instead of just the check target?

I suggest CC'ing Peter as a first measure. I already suggested this (or something similar) to him months ago.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Redundant error messages in policy.c
Daniele,

* Daniele Varrazzo (daniele.varra...@gmail.com) wrote:

  There are 5 different strings (one has a whitespace error); they could be 2. Patch attached.

Fair point. I did try to address the language around policy vs. row security vs. row level security, but didn't look as closely as I should have at the error messages overall. Will address.

Thanks!

Stephen
[HACKERS] Redundant error messages in policy.c
There are 5 different strings (one has a whitespace error); they could be 2. Patch attached.

    postgresql/src/backend$ grep errmsg commands/policy.c | grep policy | sed 's/^ *//'
    errmsg("policy \"%s\" for relation \"%s\" already exists",
    errmsg("policy \"%s\" on table \"%s\" does not exist",
    errmsg("policy \"%s\" for table \"%s\" already exists",
    errmsg("policy \"%s\" for table \"%s\" does not exist",
    errmsg("policy \"%s\" for table \"%s\" does not exist",

-- Daniele

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 6e95ba2..11efc9f 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -563,7 +563,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
 	if (HeapTupleIsValid(policy_tuple))
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("policy \"%s\" for relation \"%s\" already exists",
+				 errmsg("policy \"%s\" for table \"%s\" already exists",
 						stmt->policy_name, RelationGetRelationName(target_table))));
 
 	values[Anum_pg_policy_polrelid - 1] = ObjectIdGetDatum(table_id);
@@ -735,7 +735,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
 	if (!HeapTupleIsValid(policy_tuple))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
-				 errmsg("policy \"%s\" on table \"%s\" does not exist",
+				 errmsg("policy \"%s\" for table \"%s\" does not exist",
 						stmt->policy_name,
 						RelationGetRelationName(target_table))));
 
@@ -977,7 +977,7 @@ get_relation_policy_oid(Oid relid, const char *policy_name, bool missing_ok)
 		if (!missing_ok)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_OBJECT),
-					 errmsg("policy \"%s\" for table \"%s\" does not exist",
+					 errmsg("policy \"%s\" for table \"%s\" does not exist",
 							policy_name, get_rel_name(relid))));
 
 		policy_oid = InvalidOid;
Re: [HACKERS] raw output from copy
Hi

here is a version with both direction support.

    postgres=# copy foo from '/tmp/1.jpg' (format raw);
    COPY 1
    Time: 93.021 ms
    postgres=# \dt+ foo
                        List of relations
    ┌────────┬──────┬───────┬───────┬────────┬─────────────┐
    │ Schema │ Name │ Type  │ Owner │  Size  │ Description │
    ╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
    │ public │ foo  │ table │ pavel │ 256 kB │             │
    └────────┴──────┴───────┴───────┴────────┴─────────────┘
    (1 row)

    postgres=# \copy foo to '~/3.jpg' (format raw)
    COPY 1
    Time: 2.401 ms

Regards

Pavel

2015-07-02 17:02 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

  Andrew Dunstan and...@dunslane.net writes: Does the COPY line protocol even support binary data?

  The protocol, per se, just transmits a byte stream. There is a field in the CopyInResponse/CopyOutResponse messages that indicates whether a text or binary copy is being done. One thing we'd have to consider is whether raw mode is sufficiently different from binary to justify an additional value for this field, and if so whether that constitutes a protocol break. IIRC, psql wouldn't really care; it just transfers the byte stream to or from the target file, regardless of text or binary mode. But there might be other client libraries that are smarter and expect binary mode to mean the binary file format specified in the COPY reference page. So there may be value in being explicit about raw mode in these messages. A key point in all this is that people who need raw transfer probably need it in both directions, a point that your SELECT proposal cannot satisfy, but hacking COPY could. So I lean towards the latter really.

  regards, tom lane

commit 5599347d6b0b29a2674d465b3ff03164fce59810
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Mon Jul 6 23:18:18 2015 +0200

    COPY FROM/TO (FORMAT RAW)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2850b47..4b7b64d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -190,7 +190,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</>, <literal>csv</> (Comma Separated Values),
-      or <literal>binary</>.
+      <literal>binary</> or <literal>raw</literal>.
       The default is <literal>text</>.
      </para>
     </listitem>
@@ -881,6 +881,23 @@
    OIDs to be shown as null if that ever proves desirable.
   </para>
  </refsect3>
 </refsect2>
+
+ <refsect2>
+  <title>Raw Format</title>
+
+  <para>
+   The <literal>raw</literal> format option causes all data to be
+   stored/read as binary format rather than as text. It shares format
+   for data with <literal>binary</literal> format. This format doesn't
+   use any metadata - only row data in network byte order are exported
+   or imported.
+  </para>
+
+  <para>
+   Because this format doesn't support any delimiter, only one value
+   can be exported or imported. NULL values are not allowed.
+  </para>
+ </refsect2>
 </refsect1>
 
 <refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8904676..2ad7eb1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -92,6 +92,11 @@ typedef enum EolType
  * it's faster to make useless comparisons to trailing bytes than it is to
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
  * when we have to do it the hard way.
+ *
+ * COPY supports three modes: text, binary and raw. The text format is plain
+ * text multiline format with specified delimiter. The binary format holds
+ * metadata (numbers, sizes) and data. The raw format holds data only and
+ * only one non NULL value can be processed.
  */
 typedef struct CopyStateData
 {
@@ -113,6 +118,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	bool		binary;			/* binary format? */
+	bool		raw;			/* required raw binary? */
 	bool		oids;			/* include OIDs? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -202,6 +208,9 @@ typedef struct CopyStateData
 	char	   *raw_buf;
 	int			raw_buf_index;	/* next byte to process */
 	int			raw_buf_len;	/* total # of bytes stored */
+
+	/* field for RAW mode */
+	bool		row_processed;	/* true, when first row was processed */
 } CopyStateData;
 
 /* DestReceiver for COPY (SELECT) TO */
@@ -345,9 +354,16 @@ SendCopyBegin(CopyState cstate)
 		/* new way */
 		StringInfoData buf;
 		int			natts = list_length(cstate->attnumlist);
-		int16		format = (cstate->binary ? 1 : 0);
+		int16		format;
 		int			i;
 
+		if (cstate->raw)
+			format = 2;
+		else if (cstate->binary)
+			format = 1;
+		else
+			format = 0;
+
 		pq_beginmessage(&buf, 'H');
 		pq_sendbyte(&buf, format);	/* overall format */
 		pq_sendint(&buf, natts, 2);
@@ -359,7 +375,7 @@ SendCopyBegin(CopyState cstate)
 	else if
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure mmonc...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

      postgres=# select * from dblink('a', 'b', 'c');
      ERROR:  function dblink(unknown, unknown, unknown) is not unique

  Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic solutions. One was to build support for:

      SELECT * FROM srf() AS TYPE OF(foo);

  The second idea I think is actually SQL standard if I recall correctly:

      SELECT * FROM CAST(srf() AS foo) x;

  Currently this works:

      select * from cast (row(11,'l','{a11,b11,c11}') as foo);
       f1 | f2 |      f3
      ----+----+---------------
       11 | l  | {a11,b11,c11}
      (1 row)

  But this fails:

      select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
      ERROR:  cannot cast type record to foo

  Comments in the source have this to say:

      /*
       * coerce_record_to_complex
       *		Coerce a RECORD to a specific composite type.
       *
       * Currently we only support this for inputs that are RowExprs or
       * whole-row Vars.
       */

  That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

  Yeah. FWIW, here's my 0.02$: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled over all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable because of using syntax to describe the record. Corey's proposal, overloading issues aside, appears to neatly deal with this problem because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

      CREATE FUNCTION remote_call(...) RETURNS ... AS
      $$
        SELECT dblink(...) AS r(...)
      $$ language sql;

  ...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea...I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'...then you could:

      SELECT * FROM remote_call(...) RETURNS SETOF foo;

  Inside remote_call, you'd see something like:

      SELECT dblink(...) AS OUT;

  As to the proposed syntax, I would vote to support the SQL standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you because FWICT it does not support wrapping.

  merlin

Your experiences with dblink are very similar to mine. The whole issue arose for me as an outcropping of my Poor Man's Parallel Processing extension (still not released but currently working for us in production internally). At some point I had to do dblink_get_result(...) as t(...), and not only did I have to render the structure as a string, I was going to have to execute that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was going to have to re-code in C or plv8. Overall those calls aren't terribly expensive (it's working in production - for us - without this dblink modification), but a cleaner solution would be better.
[HACKERS] Repeated pg_upgrade buildfarm failures on binturong
Hi,

Binturong has repeatedly failed with errors like:
ERROR: could not open file "base/16400/32052": No such file or directory
E.g. http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A20%3A24

It's not just master that's failing; even older branches report odd errors:
connection to database failed: FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A53%3A10

Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bypassing SQL lexer and parser
On Jul 6, 2015 at 7:16 PM, Данила Поярков d...@dannote.net wrote:

Hello! What is the best starting point into PostgreSQL's internal APIs for operating directly on the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to the MySQL Cluster NDB API or the InnoDB internal API (the late HailDB and Embedded InnoDB). Does PostgreSQL support any other types of plugins/extensions besides FDWs and custom data types? I mean, is it possible to start another daemon within Postgres without modifying the main codebase?

That sounds a lot like a background worker.

In case you are wondering why anyone could need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of the popular RDBMSs, without the extra URL-to-SQL conversion. Yes, I know that a lot of hard work went into query planning, optimization, etc., but I still really wish to do this for some very specific needs. I'm not going to completely replace SQL, and thus will be happy to do those manipulations on a SQL:2008-compliant DBMS.

Good luck with that :-) -- Guillaume
[HACKERS] [PATCH] Add missing \ddp psql command
Quickie patch for spotted missing psql \ddp tab-completion. 0001-Add-missing-ddp-psql-tab-completion.patch Description: Binary data -- David Christensen PostgreSQL Team Manager End Point Corporation da...@endpoint.com 785-727-1171 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 10:03 AM, Sawada Masahiko wrote:

The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed grammar. If we set the remote disaster recovery site up as a synch replica, we would get some big latencies even though we use quorum commit. So I think this case Fujii-san suggested is a good configuration, and many users would want to use it. I tend to agree with combining quorum and prioritization into one GUC parameter while keeping backward compatibility.

OK, so here are the arguments pro-JSON and anti-JSON:

pro-JSON:
* standard syntax which is recognizable to sysadmins and devops.
* can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config.
* can add group labels (see below)

anti-JSON:
* more verbose
* syntax is not backwards-compatible, we'd need a switch
* people will want to use line breaks, which we can't support

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show the sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

This says: as the "remotes" group, synch with a quorum of 2 servers in london and a quorum of 1 server in NC. This assumes for backwards-compatibility reasons that we support a priority list of groups of quorums, and not some other combination (see below for more on this).

The advantage of having these labels is that it becomes easy to represent statuses for them:

sync_group       state     definition
remotes          waiting   { "london_servers" : { "quorum" ...
london_servers   synced    { "quorum" : 2, "servers" : ...
nc_servers       waiting   { "quorum" : 1, "servers" : [ ...

Without labels, we force the DBA to track groups by raw definitions, which would be difficult. Also, there's the question of what we do on reload with the statuses of any synch groups which are currently in-process, if we don't have a stable key with which to identify groups.

The other grammar issue has to do with the nesting nature of quorums and priorities. A theoretical user could want:

* a priority list of quorum groups
* a quorum group of priority lists
* a quorum group of quorum groups
* a priority list of quorum groups of quorum groups
* a quorum group of quorum groups of priority lists
... etc.

I don't really see any end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
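For what it's worth, the JSONB-manipulation argument can be made concrete. A sketch only, assuming the blob really were stored in synchronous_standby_names as the proposal suggests, and using jsonb_set() (new in 9.5); the result would still have to be written back with ALTER SYSTEM SET by the calling tool:

-- bump nc_servers' quorum from 1 to 2 in the blob above
-- (assumes the GUC holds the JSON document; proposal only)
SELECT jsonb_set(current_setting('synchronous_standby_names')::jsonb,
                 '{remotes,nc_servers,quorum}', '2');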
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On Mon, Jul 6, 2015 at 12:20 AM, Marc Mamin m.ma...@intershop.de wrote: Hello, First: KUDO !!! The release notes are extremely promising in regard to performance improvements :-) I've made some (dirty) tests with BRIN and btree_gin. (on a smalll Windows laptop ...) just a few remarks: - btree_gin deserve a better description than that: However, they are useful for GIN testing and as a base for developing other GIN operator classes. I came to similar times between btree and gin for indexes on category columns (ca 20 to 200 distinct values) For me, gin clearly wins here thanks to the index size difference.

I reached the same conclusion for things with a higher number of distinct values, but still several copies of each distinct value. That said, I don't think we should change the description until the BETWEEN issue is resolved. That is a pretty serious limitation, I think. Or at least, if we invite people to use it for this purpose, we would have to warn them that it is not suitable for range queries. It wouldn't be so bad if it merely didn't support them well, but as things are now it positively pulls the planner away from better options, because it looks falsely attractive. I've looked into fixing it myself, but I'm afraid it is beyond me.

Cheers, Jeff
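Until that is fixed, a workaround for small category domains is to expand the range into an explicit IN list, which the gin opclass handles as a set of equality probes (tgin_1 being the test table from upthread):

-- degrades to a full index scan with btree_gin:
SELECT * FROM tgin_1 WHERE cat_id BETWEEN 1 AND 2;

-- uses the index as intended:
SELECT * FROM tgin_1 WHERE cat_id IN (1, 2);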
[HACKERS] Bypassing SQL lexer and parser
Hello! What is the best starting point into PostgreSQL's internal APIs for operating directly on the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to the MySQL Cluster NDB API or the InnoDB internal API (the late HailDB and Embedded InnoDB). Does PostgreSQL support any other types of plugins/extensions besides FDWs and custom data types? I mean, is it possible to start another daemon within Postgres without modifying the main codebase?

In case you are wondering why anyone could need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of the popular RDBMSs, without the extra URL-to-SQL conversion. Yes, I know that a lot of hard work went into query planning, optimization, etc., but I still really wish to do this for some very specific needs. I'm not going to completely replace SQL, and thus will be happy to do those manipulations on a SQL:2008-compliant DBMS.

Thanks in advance, dannote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote: Or the other way to look at it could be separate out fields which are required for parallel scan which is done currently by forming a separate structure ParallelHeapScanDescData. I was suggesting that you separate out both the normal scan fields and the partial scan fields, that way we're sure that rs_nblocks is not accessed during a parallel scan. Or, you could try wrapping the parts of heapam.c that are affected by parallelism into new static functions. The reason why partial scan can't be mixed with sync scan is that in parallel scan, it performs the scan of heap by synchronizing blocks (each parallel worker scans a block and then asks for a next block to scan) among parallel workers. Now if we try to make sync scans work along with it, the synchronization among parallel workers will go for a toss. It might not be impossible to make that work in some way, but not sure if it is important enough for sync scans to work along with parallel scan. I haven't tested it, but I think it would still be helpful. The block accesses are still in order even during a partial scan, so why wouldn't it help? You might be concerned about the reporting of a block location, which would become more noisy with increased parallelism. But in my original testing, sync scans weren't very sensitive to slight deviations, because of caching effects. tqueue.c is mainly designed to pass tuples between parallel workers and currently it is used in Funnel operator to gather the tuples generated by all the parallel workers. I think we can use it for any other operator which needs tuple communication among parallel workers. Some specifics of the Funnel operator seem to be a part of tqueue, which doesn't make sense to me. For instance, reading from the set of queues in a round-robin fashion is part of the Funnel algorithm, and doesn't seem suitable for a generic tuple communication mechanism (that would never allow order-sensitive reading, for example). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Enhanced ALTER OPERATOR
Hello hackers. This is the fifth version of the patch (the fourth was unsuccessful :)). I added documentation and did a small refactoring. Thanks. -- Uriy Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/ref/alter_operator.sgml b/doc/src/sgml/ref/alter_operator.sgml
index bdb2d02..4aaeed0 100644
--- a/doc/src/sgml/ref/alter_operator.sgml
+++ b/doc/src/sgml/ref/alter_operator.sgml
@@ -26,6 +26,11 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } ) SET SCHEMA <replaceable>new_schema</replaceable>
+
+ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } ) SET (
+[, RESTRICT = { <replaceable class="parameter">res_proc</replaceable> | NULL } ]
+[, JOIN = { <replaceable class="parameter">join_proc</replaceable> | NULL } ]
+)
 </synopsis>
 </refsynopsisdiv>
@@ -34,8 +39,7 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 <para>
  <command>ALTER OPERATOR</command> changes the definition of
- an operator. The only currently available functionality is to change the
- owner of the operator.
+ an operator.
 </para>
@@ -98,6 +102,25 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
    </para>
   </listitem>
  </varlistentry>
+
+ <varlistentry>
+  <term><replaceable class="parameter">res_proc</replaceable></term>
+  <listitem>
+   <para>
+    The restriction selectivity estimator function for this operator.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><replaceable class="parameter">join_proc</replaceable></term>
+  <listitem>
+   <para>
+    The join selectivity estimator function for this operator.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
</refsect1>
@@ -109,6 +132,13 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 <programlisting>
ALTER OPERATOR @@ (text, text) OWNER TO joe;
</programlisting></para>
+
+ <para>
+Change the restriction and join selectivity estimator function of a custom operator <literal>a && b</literal> for type <type>int[]</type>:
+<programlisting>
+ALTER OPERATOR && (_int4, _int4) SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);
+</programlisting></para>
+
</refsect1>

<refsect1>
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..f5ff381 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -28,8 +28,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/parse_func.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -53,7 +55,7 @@
 static Oid OperatorShellMake(const char *operatorName,
 			Oid leftTypeId,
 			Oid rightTypeId);
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
+static void ShellOperatorUpd(Oid baseId, Oid commId, Oid negId);
 static Oid get_other_operator(List *otherOp,
 			Oid otherLeftTypeId, Oid otherRightTypeId,
@@ -563,7 +565,7 @@
 	commutatorId = operatorObjectId;

 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		ShellOperatorUpd(operatorObjectId, commutatorId, negatorId);

 	return address;
 }
@@ -633,7 +635,7 @@
 }

 /*
- * OperatorUpd
+ * ShellOperatorUpd
  *
  * For a given operator, look up its negator and commutator operators.
  * If they are defined, but their negator and commutator fields
@@ -642,7 +644,7 @@
  * which are the negator or commutator of each other.
  */
 static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+ShellOperatorUpd(Oid baseId, Oid commId, Oid negId)
 {
 	int			i;
 	Relation	pg_operator_desc;
@@ -864,3 +866,150 @@
 	return myself;
 }
+
+/*
+ * Operator update aka ALTER OPERATOR for RESTRICT, JOIN
+ */
+void OperatorUpd(Oid classId,
+				 Oid baseId,
+				 List *operator_params)
+{
+	int			i;
+	ListCell   *pl;
+	Relation	catalog;
+	HeapTuple	tup;
+	Oid			operator_param_id = 0;
+	bool		nulls[Natts_pg_operator];
+	bool		replaces[Natts_pg_operator];
+	Datum		values[Natts_pg_operator];
+
+	for (i = 0; i < Natts_pg_operator; ++i)
+	{
+		values[i] = (Datum) 0;
+		replaces[i] = false;
+		nulls[i] = false;
+	}
+
+	catalog = heap_open(classId, RowExclusiveLock);
+	tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(baseId));
+	if (HeapTupleIsValid(tup))
+	{
+		/*
+		 * loop over the definition list and
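As a quick sanity check of the new command, the catalog can be inspected before and after. A sketch using the intarray operator from the patch's docs example (assumes contrib/intarray is installed):

-- run the new command, then confirm oprrest/oprjoin were updated
ALTER OPERATOR && (_int4, _int4)
    SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);

SELECT oprname, oprrest, oprjoin
FROM pg_operator
WHERE oprname = '&&'
  AND oprleft = '_int4'::regtype
  AND oprright = '_int4'::regtype;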
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Jul 2, 2015 at 9:31 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Thu, Jul 2, 2015 at 5:44 PM, Beena Emerson memissemer...@gmail.com wrote:

Hello, There has been a lot of discussion. It has become a bit confusing. I am summarizing my understanding of the discussion till now. Kindly let me know if I missed anything important.

Backward compatibility: We have to provide support for the current format and behavior for synchronous replication (the first running standby from the list s_s_names). In case the new format does not include a GUC, then a special value is to be specified for s_s_names to indicate that.

Priority and quorum: Quorum treats all the standbys with the same priority, while in priority behavior each one has a different priority and ACKs must be received from the specified k lowest-priority servers. I am not sure how combining both will work out. Mostly we would like to have some standbys from each data center to be in sync. Can it not be achieved by quorum only?

So you're wondering if there is a use case where both quorum and priority are used together?

For example, please imagine the case where you have two standby servers (say A and B) in the local site, and one standby server (say C) in a remote disaster recovery site. You want to set up sync replication so that the master waits for ACK from either A or B, i.e., the setting of 1(A, B). Also, only when either A or B crashes, you want to make the master wait for ACK from either the remaining local standby or C. On the other hand, you don't want to use a setting like 1(A, B, C), because in this setting C can be the sync standby when the master crashes, and both A and B might be far behind C. In that case, you would need to promote the remote standby server C to the new master... this is what you'd like to avoid. The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed grammar.

If we set the remote disaster recovery site up as a synch replica, we would get some big latencies even though we use quorum commit. So I think this case Fujii-san suggested is a good configuration, and many users would want to use it. I tend to agree with combining quorum and prioritization into one GUC parameter while keeping backward compatibility.

Regards, -- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
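In configuration terms, that setup would be written roughly as follows. Note this is the proposed grammar only; released servers accept nothing but a flat priority list in synchronous_standby_names:

-- hypothetical syntax from Michael's proposal, not committed anywhere
ALTER SYSTEM SET synchronous_standby_names = '1(1[A, C], 1[B, C])';
SELECT pg_reload_conf();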
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
On 07/06/2015 04:38 AM, Michael Paquier wrote: On Sat, Jul 4, 2015 at 3:57 AM, Heikki Linnakangas wrote: Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper that just calls install.pl, passing all arguments? I guess we just haven't noticed it. And indeed it makes everything more simple, and fixes as well the error reported when install path contains a space. Ok, committed that way. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Mon, Jul 6, 2015 at 12:27 PM, Kevin Grittner kgri...@ymail.com wrote: What would it take to get something like that which uses the check-world target instead of just the check target? Without the additional tests (like the isolation tests), some of these numbers don't reflect the coverage of regularly run tests. While it is of some interest what coverage we get on the 30 second `make check` runs, I would be a lot more interested in what our coverage is when the full 8.5 minute set of regression tests we have in git is run. I agree. Obviously in some cases the coverage is bad because concurrency is naturally required. It's even more glaring that recovery codepaths are always all-red in coverage reports. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in bttext_abbrev_convert()
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Peter Geoghegan wrote: It would be nice to always have an HTML report from gcov available on the internet. That would be something useful to automate, IMV. http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/ What would it take to get something like that which uses the check-world target instead of just the check target? Without the additional tests (like the isolation tests), some of these numbers don't reflect the coverage of regularly run tests. While it is of some interest what coverage we get on the 30-second `make check` runs, I would be a lot more interested in what our coverage is when the full 8.5-minute set of regression tests we have in git is run. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeated pg_upgrade buildfarm failures on binturong
Andres Freund and...@anarazel.de writes: Binturon has repeatedly failed with errors like: ERROR: could not open file base/16400/32052: No such file or directory I agree that binturong seems to have something odd going on; but there are a lot of other intermittent pg_upgrade test failures in the buildfarm history, and the general situation is that you can't tell what actually happened because the buildfarm script doesn't capture all the relevant log files. It would be real nice to improve that, but I lack the necessary Perl-fu. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On 07/06/2015 12:20 AM, Marc Mamin wrote: There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan. But the time overhead remind small :) When have we ever stopped users from creating useless indexes? For one thing, just because the index isn't useful *now* doesn't mean it won't be in the future. Now, it would be useful to have a brin_index_effectiveness() function so that DBAs could check for themselves whether they should dump indexes. However, I don't see needing that for 9.5. Are there usage stats in pg_stat_user_indexes for BRIN? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
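Checking whether a BRIN index earns its keep is then the usual stats query, e.g. for the test index from upthread:

-- usage counters for the BRIN index (zero idx_scan = never used)
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'cat_brin_1';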
Re: [HACKERS] Add checksums without --initdb
On Jul 2, 2015, at 3:43 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/02/2015 11:28 PM, Andres Freund wrote: On 2015-07-02 22:53:40 +0300, Heikki Linnakangas wrote: Add an enabling-checksums mode to the server where it calculates checksums for anything it writes, but doesn't check or complain about incorrect checksums on reads. Put the server into that mode, and then have a background process that reads through all data in the cluster, calculates the checksum for every page, and writes all the data back. Once that's completed, checksums can be fully enabled.

You'd need, afaics, a bgworker that connects to every database to read pg_class, to figure out what type of page a relfilenode has. And this probably should call back into the relevant AM or such.

Nah, we already assume that every relation data file follows the standard page format, at least enough to have the checksum field at the right location. See FlushBuffer() - it just unconditionally calculates the checksum before writing out the page. (I'm not totally happy about that, but that ship has sailed) - Heikki

So, thinking some more about the design needed to support enabling checksums post-initdb, what about the following?

Introduce a new field in pg_control, data_checksum_state (0 = disabled, 1 = enabling in process, 2 = enabled). This could be set via (say) a pg_upgrade flag when creating a new cluster with --enable-checksums, or by a standalone program that adjusts that field in pg_control. Checksum-enforcing behavior would depend on that setting: 0 is non-enforcing on read and write, 1 enforces checksums on buffer write but ignores them on read, and 2 is the normal enforcing read/write mode. Disabling checksums could be done with this tool as well; it would trivially just cause the checksums to be ignored (or alternately set to 0 on page write, depending on whether we think it matters).

Add new catalog fields pg_database.dathaschecksum and pg_class.relhaschecksum, initially set to false, or true if checksums were enabled at initdb time.

Augment autovacuum to check whether we are currently enabling checksums based on the value in pg_control; if so, loop over any database with !pg_database.dathaschecksum. For any relation in said database with !pg_class.relhaschecksum, read/dirty/write each block to force the checksum to be written out for each page. As each relation is completely checksummed, set relhaschecksum = t. When no relations remain, set pg_database.dathaschecksum = t. (There may need to be some additional considerations for storing the checksum state of global relations or anything else that uses the standard page format and lives outside a specific database, i.e. all shared catalogs, and quite possibly some things I haven't considered yet.)

If data_checksum_state is "enabling" and no databases remain to be enabled, then we can set data_checksum_state to "enabled"; everything then works as expected in the normal enforcing state.

External programs needing to be adjusted:
- pg_resetxlog -- add persistence of the data_checksum_state
- pg_controldata -- add display of the data_checksum_state
- pg_upgrade -- add an --enable-checksums flag to transition a new cluster with the data pages; may need some adjustments for the data_checksum_state field

Possible new tool:
- pg_enablechecksums -- basic tool to set the data_checksum_state flag of pg_control

Other thoughts:
- Do we need a periodic CRC-scanning background worker just to check buffers periodically?
- If so, does this cause any interference with frozen relations?
- What additional changes would be required, or what wrinkles would we have to work out?

David -- David Christensen PostgreSQL Team Manager End Point Corporation da...@endpoint.com 785-727-1171 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
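To make the per-database pass concrete, the worker would essentially be driven by a scan like this. A sketch only: relhaschecksum is the proposed, not-yet-existing catalog column from the design above:

-- relations whose pages still need to be rewritten with checksums
-- (relhaschecksum is hypothetical; it does not exist in any release)
SELECT c.oid::regclass
FROM pg_class c
WHERE NOT c.relhaschecksum
  AND c.relkind IN ('r', 'i', 't', 'm');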
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jul 7, 2015 at 4:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Ok, committed that way. Shouldn't this patch be backpatched? In the back branches install.bat does not work correctly with paths containing spaces. I was about to ask the same. AFAICS, the other scripts have been similar one-liners at least since 9.0. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Spurious full-stop in message
postgresql/src/backend$ grep "must be superuser to change bypassrls attribute" commands/user.c | sed 's/ \+//'
errmsg("must be superuser to change bypassrls attribute.")));
errmsg("must be superuser to change bypassrls attribute")));

Patch to fix attached.

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 3b381c5..5b20994 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -301,7 +301,7 @@ CreateRole(CreateRoleStmt *stmt)
 		if (!superuser())
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must be superuser to change bypassrls attribute.")));
+					 errmsg("must be superuser to change bypassrls attribute")));
 	}
 	else
 	{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spurious full-stop in message
* Daniele Varrazzo (daniele.varra...@gmail.com) wrote: postgresql/src/backend$ grep must be superuser to change bypassrls attribute commands/user.c | sed 's/ \+//' errmsg(must be superuser to change bypassrls attribute.))); errmsg(must be superuser to change bypassrls attribute))); Patch to fix attached. Will fix. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On Tue, Jul 7, 2015 at 9:04 AM, Josh Berkus j...@agliodbs.com wrote: Are there usage stats in pg_stat_user_indexes for BRIN? Yes, they are here. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bypassing SQL lexer and parser
On 7/6/15 12:14 PM, Данила Поярков wrote: Hello! What is the best starting point to PostgreSQL internal APIs for operating directly with the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to MySQL Cluster NDB API or InnoDB internal API (the late HailDB and Embedded InnoDB). There is support for plugging into the parser and executor, so that might be a possibility, but... Does the PostgreSQL support any other type of plugins/extensions other than FDW and custom data types? I mean, is it possible to start another daemon within Postgres without slightly modifying the main codebase? In case you are wondering why could anyone need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of popular RDBMS without extra URL-to-SQL conversion. Yes, I know that a lot of hard work was done for making query plans, optimizing etc. but I'm still really wish to do this for some very specific needs. I'm not going to completely replace the SQL and thus will be happy to do those manipulations on a SQL 2008-compliant DBMS. AFAIK HTSQL is very happy producing SQL; why not just let it hand SQL to Postgres (which it already does...) Have you by chance talked to Clark or Kirill about this? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com

I ran some performance tests on a 16-core machine with large shared buffers, so there is no IO involved. With the default value of cpu_tuple_comm_cost, a parallel plan is not generated even when selecting 100K records out of 40 million. So I changed the value to '0' and collected the performance readings.

Here are the performance numbers:

selectivity   seq scan          parallel scan (ms)
(millions)    (ms)        2 workers   4 workers   8 workers
0.1           11498.93     4821.40     3305.84     3291.90
0.4           10942.98     4967.46     3338.58     3374.00
0.8           11619.44     5189.61     3543.86     3534.40
1.5           12585.51     5718.07     4162.71     2994.90
2.7           14725.66     8346.96    10429.05     8049.11
5.4           18719.00    20212.33    21815.19    19026.99
7.2           21955.79    28570.74    28217.60    27042.27

The average table row size is around 500 bytes and the query's selected column width is around 36 bytes. When the query selectivity goes above 10% of the total table records, parallel scan performance drops off.

Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
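For anyone reproducing these numbers: the runs simply zeroed the new GUC before issuing the test query. A sketch with hypothetical table and column names; cpu_tuple_comm_cost exists only with the patch under review applied:

-- cpu_tuple_comm_cost is the GUC added by the parallel seq scan patch
SET cpu_tuple_comm_cost = 0;
-- big_tbl, payload and cat_id are placeholder names for the test schema
EXPLAIN ANALYZE SELECT payload FROM big_tbl WHERE cat_id = 10;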
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote:

pro-JSON:
* standard syntax which is recognizable to sysadmins and devops.
* can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config.
* can add group labels (see below)

If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should live in a separate place than postgresql.conf, though not within the catalog tables, be manageable using an SQL interface, and be reloaded in backends using SIGHUP.

anti-JSON:
* more verbose
* syntax is not backwards-compatible, we'd need a switch

This point is valid as well in the pro-JSON portion.

* people will want to use line breaks, which we can't support

Yes, this is caused by the fact of using a GUC. For a simple formula this seems fine to me though; that's what we have today for s_s_names, and using a formula is not much longer in character size than what we have now.

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

The JSON blob managing sync node information could contain additional JSON objects that register a set of nodes as a given group. More easily, you could use, say, the following structure to store the blobs:
- pg_syncinfo/global, to store the root of the formula, which could use groups.
- pg_syncinfo/groups/$GROUP_NAME to store a set of JSON blobs representing a group.

The advantage of having these labels is that it becomes easy to represent statuses for them:

sync_group       state     definition
remotes          waiting   { "london_servers" : { "quorum" ...
london_servers   synced    { "quorum" : 2, "servers" : ...
nc_servers       waiting   { "quorum" : 1, "servers" : [ ...

Without labels, we force the DBA to track groups by raw definitions, which would be difficult. Also, there's the question of what we do on reload with any statuses of synch groups which are currently in-process, if we don't have a stable key with which to identify groups.

Well, yes.

The other grammar issue has to do with the nesting nature of quorums and priorities. A theoretical user could want:
* a priority list of quorum groups
* a quorum group of priority lists
* a quorum group of quorum groups
* a priority list of quorum groups of quorum groups
* a quorum group of quorum groups of priority lists
... etc.

I don't really see any possible end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5.

I am not sure I agree that this will simplify the work. Currently s_s_names already has 1 level, and we want to append groups to each element of it as well, meaning that we'll need at least 2 levels of nesting.
-- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 2015-07-07 AM 02:56, Josh Berkus wrote:

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

What if we write the above as:

remotes-1 (london_servers-2 [london1, london2, london3], nc_servers-1 [nc1, nc2])

That requires only slightly altering the proposed format, that is, prepending the sync group label string to the quorum number. The monitoring view can be made to internally generate JSON output (if needed) from it. It does not seem very ALTER SYSTEM SET friendly, but there are trade-offs either way. Just my 2c.

Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Asynchronous execution on FDW
Hello, thank you for looking at this.

If it is acceptable to reconstruct the executor nodes to have an additional return state PREP_RUN or such (which means one more call is needed before the first tuple), I'll modify the whole executor in the next patch to handle that state. I hadn't taken the advice I received so far in this direction, but I have come to think that it is the most reasonable way to solve this.

==

- It was a problem when to give the first kick for async exec. It is not in the ExecInit phase, and the ExecProc phase does not fit either. An extra phase ExecPreProc or something would be too invasive. So I tried a pre-exec callback. Any init-node can register callbacks on its turn, and the registered callbacks are called just before the ExecProc phase in the executor. The first patch adds functions and structs to enable this.

At a quick glance, I think this has all the same problems as starting the execution at ExecInit phase. The correct way to do this is to kick off the queries in the first IterateForeignScan() call. You said that ExecProc phase does not fit - why not?

Execution nodes are expected to return the first tuple if available, but asynchronous execution cannot return the first tuple immediately. Simultaneous execution for the first tuple on every foreign node is more crucial than asynchronous fetching in many cases, especially for cases like sort/agg pushdown on FDWs. The reason ExecProc does not fit is that a first loop that returns no tuple looked like it would impact too large a portion of the executor.

It was my mistake that the patch doesn't address the problem with parameterized paths. Parameterized paths should be executed within ExecProc loops, so the patch would become like the following:

- To gain the advantage of kicking off execution before the first ExecProc loop, non-parameterized paths are started using the callback feature this patch provides.

- Parameterized paths need the upper nodes executed before they can start execution, so they should be started in the ExecProc loop, but run asynchronously if possible.

This is rather a makeshift solution for the problem, but considering the current trend of parallelism, it might be time to make the executor fit parallel execution. If it is acceptable to reconstruct the executor nodes to have an additional return state PREP_RUN or such (which means one more call is needed before the first tuple), I'll modify the whole executor in the next patch to handle that state. I'd hate my stupidity if you had already suggested this kind of solution with "do it in ExecProc" :(

regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
Marc Mamin wrote: - BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges: INSERT into tbrin_1 (cat_id, ) SELECT s%20, ... FROM generate_series(1,300 )s; CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id)with (pages_per_range=64); SELECT * from tbrin_1 WHERE cat_id=10; http://explain.depesz.com/s/9YQR There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan. But the time overhead remind small :) Hmm, I guess the costing function for brin could stand some improvement. Clearly we're not covering all bases. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Mon, Jul 6, 2015 at 10:54 PM, Jeff Davis pg...@j-davis.com wrote: On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote: Or the other way to look at it could be separate out fields which are required for parallel scan which is done currently by forming a separate structure ParallelHeapScanDescData. I was suggesting that you separate out both the normal scan fields and the partial scan fields, that way we're sure that rs_nblocks is not accessed during a parallel scan.

In the patch, rs_nblocks is used in partial scans as well; only the way it is initialized has changed.

Or, you could try wrapping the parts of heapam.c that are affected by parallelism into new static functions.

Sounds sensible to me, but I would like to hear from Robert before making this change, in case he has different opinions about this point, as he originally wrote this part of the patch.

The reason why partial scan can't be mixed with sync scan is that in parallel scan, it performs the scan of heap by synchronizing blocks (each parallel worker scans a block and then asks for a next block to scan) among parallel workers. Now if we try to make sync scans work along with it, the synchronization among parallel workers will go for a toss. It might not be impossible to make that work in some way, but not sure if it is important enough for sync scans to work along with parallel scan. I haven't tested it, but I think it would still be helpful. The block accesses are still in order even during a partial scan, so why wouldn't it help? You might be concerned about the reporting of a block location, which would become more noisy with increased parallelism. But in my original testing, sync scans weren't very sensitive to slight deviations, because of caching effects.

I am not sure how many blocks of difference could be considered an acceptable deviation. In theory, making a parallel scan perform a sync scan could lead to a difference of multiple blocks; consider the case where there are 32 or more workers participating in the scan and each gets one block to scan: it is possible that the first worker performs the scan of the 1st block after the 32nd worker performs the scan of the 32nd block (and it could lead to even bigger differences).

tqueue.c is mainly designed to pass tuples between parallel workers and currently it is used in Funnel operator to gather the tuples generated by all the parallel workers. I think we can use it for any other operator which needs tuple communication among parallel workers. Some specifics of the Funnel operator seem to be a part of tqueue, which doesn't make sense to me. For instance, reading from the set of queues in a round-robin fashion is part of the Funnel algorithm, and doesn't seem suitable for a generic tuple communication mechanism (that would never allow order-sensitive reading, for example).

Okay, this makes sense to me; I think it is better to move the Funnel-operator-specific parts out of tqueue.c unless Robert or anybody else feels otherwise.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP.

I'm not following this at all. What are you saying here?

I don't really see any possible end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5. I am not sure I agree that this will simplify the work. Currently s_s_names already has 1 level, and we want to append groups to each element of it as well, meaning that we'll need at least 2 levels of nesting.

Well, we have to draw a line somewhere, unless we're going to support infinite recursion. And if we are going to support infinite recursion, any kind of compact syntax for a GUC isn't even worth talking about ...

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jul 7, 2015 at 6:19 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com I ran some performance tests on a 16 core machine with large shared buffers, so there is no IO involved. With the default value of cpu_tuple_comm_cost, parallel plan is not getting generated even if we are selecting 100K records from 40 million records. So I changed the value to '0' and collected the performance readings. For reasonable default values for these parameters, still more testing is required. I think instead of 0, tests with 0.001 or 0.0025 for default of cpu_tuple_comm_cost and 100 or 1000 for default of parallel_setup_cost would have been more interesting. Here are the performance numbers: The average table row size is around 500 bytes and query selection column width is around 36 bytes. when the query selectivity goes more than 10% of total table records, the parallel scan performance is dropping. These are quite similar to what I have seen in my initial tests, now I think if you add some complex condition in the filter, you will see gains for even 25% or more selectivity (I have added factorial 10 calculation in filter to mimic the complex filter condition). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Memory Accounting v11
On 15 June 2015 at 07:43, Jeff Davis pg...@j-davis.com wrote: This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). It also adds a new function, MemoryContextMemAllocated() which can either retrieve the total allocated for the context, or it can recurse and sum up the allocations for all subcontexts as well. This is intended to be used by HashAgg in an upcoming patch that will bound the memory and spill to disk. Previous discussion here: http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop Previous concerns: * There was a slowdown reported of around 1-3% (depending on the exact version of the patch) on an IBM power machine when doing an index rebuild. The results were fairly noisy for me, but it seemed to hold up. See http://www.postgresql.org/message-id/CA+Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com * Adding a struct field to MemoryContextData may be bad for the CPU caching behavior, and may be the cause of the above slowdown.

I've been looking at this patch and trying to reproduce the reported slowdown by using Tomas' function to exercise palloc() with minimal overhead from other code: https://github.com/tvondra/palloc_bench

I also wanted to attempt to determine whether the slowdown was due to the mem_allocated maintenance or down to the struct size increasing. I decided to test this by simply commenting out all of the context->mem_allocated += / -= updates while keeping the mem_allocated field in the struct, but I've been really struggling to see any sort of performance decrease anywhere. I'm just getting too much variation in the test results to get any sort of idea. I've attached a spreadsheet of the results I saw.

It would be really good if Robert were able to test on the IBM PPC with just the extra field in the struct, so we can see whether the 1-3% lies there or in the overhead of keeping mem_allocated up-to-date.

My main question here is: how sure are you that none of your intended use cases will need to call MemoryContextMemAllocated with recurse = true? I'd imagine if you ever have to use MemoryContextMemAllocated(ctx, true) then we'll all be sitting around with our stopwatches out to see if the call incurs too much of a performance penalty.

Other small issues:

+ oldblksize = block->endptr - ((char *) block);
+
  block = (AllocBlock) realloc(block, blksize);
  if (block == NULL)
      return NULL;
+
+ context->mem_allocated += blksize - oldblksize;
+

Shouldn't you be setting oldblksize after the if? I'd imagine the realloc() failure is rare, but it just seems better to do it just before it's required and only when required.

+ if (recurse)
+ {
+     MemoryContext child = context->firstchild;
+     for (child = context->firstchild;
+          child != NULL;

Here there's a double assignment of child.

*** 632,637 **** AllocSetDelete(MemoryContext context)
--- 637,644 ----
  {
      AllocBlock next = block->next;

+     context->mem_allocated -= block->endptr - ((char *) block);
+

I might be mistaken here, but can't you just set context->mem_allocated = 0; after that loop? Or maybe it would be an improvement to only do the decrement if MEMORY_CONTEXT_CHECKING is defined, then Assert() that mem_allocated == 0 after the loop...

One other thing that I don't remember seeing discussed would be to just store a List in each context which would hold all of the mem_allocated counters that need to be updated during each allocation and deallocation of memory.
The list would be set up once when the context is created. When a child context is set up we'd just walk up the parent context chain and list_concat() all of the parents' lists onto the child's list. Would this method add too much overhead when a context is deleted? Has this been explored already?

Regards

David Rowley

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

mem_accounting_benchmark.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Tue, Jul 7, 2015 at 12:51 PM, Josh Berkus j...@agliodbs.com wrote: On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP. I'm not following this at all. What are you saying here?

A JSON string is longer in terms of number of characters than a formula because it contains key names, and those key names are usually repeated several times, making it harder to read in a configuration file. So what I am saying is that we do not save it as a GUC, but as separate metadata that can be accessed with a set of SQL functions to manipulate it.

-- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 09:56 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 12:51 PM, Josh Berkus j...@agliodbs.com wrote: On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP. I'm not following this at all. What are you saying here? A JSON string is longer in terms of number of characters than a formula because it contains key names, and those key names are usually repeated several times, making it harder to read in a configuration file. So what I am saying is that we do not save it as a GUC, but as separate metadata that can be accessed with a set of SQL functions to manipulate it.

Where, though? Someone already pointed out the issues with storing it in a system catalog, and adding an additional .conf file with a different format is too horrible to contemplate.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
Josh Berkus wrote:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

This says: as the "remotes" group, synch with a quorum of 2 servers in london and a quorum of 1 server in NC.

I wanted to clarify the format. The remotes group does not specify any quorum; only its individual elements mention a quorum. remotes is said to sync in london_servers and NC. Would the absence of a quorum number in a group mean all elements? Or would the above be represented as follows, to imply AND between the two DCs?

'{ "remotes" : {
    "quorum" : 2,
    "servers" : {
      "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
      "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
    }
} }'

- Beena Emerson -- View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856868.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
Amit wrote:

What if we write the above as:

remotes-1 (london_servers-2 [london1, london2, london3], nc_servers-1 [nc1, nc2])

Yes, we can consider this.

Thanks, - Beena Emerson -- View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856869.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Sun, Jul 5, 2015 at 9:00 PM, Joe Conway m...@joeconway.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/05/2015 12:25 PM, Joe Conway wrote: On 02/22/2015 10:26 PM, Corey Huinker wrote: Changes in this patch: - added polymorphic versions of dblink_fetch() - upped dblink version # to 1.2 because of new functions - migration 1.1 - 1.2 - DocBook changes for dblink(), dblink_get_result(), dblink_fetch() The previous patch was missing dblink--1.1--1.2.sql and dblink--1.2.sql. I have added them, so it should apply cleanly against git master, but not done any actual review yet. Looking at the argument handling logic in dblink_fetch() and dblink_record_internal(), it has been messy for a while, but this patch serves to make it even worse. I wonder if it isn't better to just loop through all the args with get_fn_expr_argtype() every time and then test for the exact signature match? Another alternative might be to create a wrapper C function for each variant SQL function, but that seems like it could also get messy, especially with dblink_record_internal() since we would have to deal with every variant of all the external facing callers. Thoughts? When I was digging into it, it seemed that the parameter sniffing was on the verge of combinatorical chaos, which I assumed was due to parameter checking being expensive. And while there was no processing for the anyelement parameters, it did throw off the parameter counts, thus the extra complexity. If it's possible to suss out whether the last parameter is the polymorphic type, then we could bump down the arg-count by one, and have the old messy logic tree. I also didn't delve into whether or not it was dangerous to ask get_fn_expr_argtype() for parameters beyond PG_NARGS(). If it is, then we would still have the complicated signature-testing tree, but at least we could separate out the signature poking from the actual processing, and that might add some clarity. We could walk the arglist and handle each arg as you go, but that gets complicated with the text params because their meaning is often determined by whether the next param is also text. That has the potential to be every bit as messy. So from this, I see five options: 1. Leave it. It's a mess, but we have test case coverage to show it works. 2. A bloody minded approach if ((PG_NARGS() == 3) (get_fn_expr_argtype(...,0) == TEXTOID) (get_fn_expr_argtype(...,1) == TEXTOID) (get_fn_expr_argtype(...,2) == BOOLOID)) { } else if ((PG_NARGS() == 3) (get_fn_expr_argtype(...,0) == TEXTOID) (get_fn_expr_argtype(...,1) == BOOLOID) (get_fn_expr_argtype(...,2) == WHATEVER_ANYELEMENTS_ARE_OID)) { } else if ... Pro: It's utterly clear which signature was found Con: A lot of param checks, though most will shortcircuit The number of checks itself might become clutter. 3. Separate signature testing from using the parameters. if (PG_NARGS() == 3) { if (get_fn_expr_argtype(fcinfo-flinfo, 2) == BOOLOID) { signature = TEXTTEXTBOOL } else if (get_fn_expr_argtype(fcinfo-flinfo, 1) == BOOLOID) { signature = TEXTBOOLANY } else { signature = TEXTTEXTANY } } switch(signature) { case TEXTTEXTBOOL: t1 = text_to_cstring(PG_GETARG_TEXT_PP(0); t2 = text_to_cstring(PG_GETARG_TEXT_PP(1); fail = PG_GETARG_BOOL(2); break; Pro: Still getting to the signature with a minimum of comparisons. Con: A fair number of #defines for each signature combination (minus anyelements, as we never have to process those). Big(?) waterfall of options. The mess has just been segregated, not eliminated. 4. 
4. Walk the arg list, assigning as you go, on the assumption that if the current arg is a text field it will be the last one encountered, and re-assigning the pointers when we discover otherwise.

Pro: A tight loop, probably the least amount of code. I *think* this is what Joe was suggesting earlier.
Con: We just made a mini state engine, sorta like what you have to do with event-driven parsers (XML SAX, for example).

5. Walk the arg list like in #4, assigning bool/int parameters immediately, but just counting the text parameters, and assigning them based on how many we found.

Pro: Same as #4, but my high school CS teacher won't come out of retirement just to scold me for repurposing variables.
Con: If at some point in the future we add more text parameters such that simple ordering is ambiguous (i.e. were those text fields at positions 1,2,3 or 1,2,5?), this would break down.

I'm happy to try to code whichever of these people think is the most promising.
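For concreteness, option 5 might look something like the following sketch. The variable names are illustrative rather than dblink's actual ones, and it assumes the text parameters always precede the other arguments, as they do in the existing signatures:

    Datum
    dblink_fetch(PG_FUNCTION_ARGS)
    {
        char       *conname = NULL;
        char       *curname = NULL;
        int         howmany = 0;
        bool        fail = true;
        int         ntext = 0;
        int         i;

        for (i = 0; i < PG_NARGS(); i++)
        {
            Oid         argtype = get_fn_expr_argtype(fcinfo->flinfo, i);

            if (argtype == TEXTOID)
                ntext++;                /* defer: meaning depends on final count */
            else if (argtype == INT4OID)
                howmany = PG_GETARG_INT32(i);
            else if (argtype == BOOLOID)
                fail = PG_GETARG_BOOL(i);
            /* a trailing anyelement contributes only its type, never a value */
        }

        /* text args lead: one means cursor name, two means conname + cursor */
        if (ntext == 2)
        {
            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
            curname = text_to_cstring(PG_GETARG_TEXT_PP(1));
        }
        else if (ntext == 1)
            curname = text_to_cstring(PG_GETARG_TEXT_PP(0));

        /* ... hand conname/curname/howmany/fail to the common implementation ... */
    }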
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 4:25 AM, Joe Conway m...@joeconway.com wrote: On 02/22/2015 10:26 PM, Corey Huinker wrote:

Changes in this patch:
- added polymorphic versions of dblink_fetch()
- upped dblink version # to 1.2 because of new functions
- migration 1.1 -> 1.2
- DocBook changes for dblink(), dblink_get_result(), dblink_fetch()

The previous patch was missing dblink--1.1--1.2.sql and dblink--1.2.sql. I have added them, so it should apply cleanly against git master, but I have not done any actual review yet.

This extension format is still incorrect: there is no need for dblink--1.1.sql, as only the latest version of an extension needs to be shipped, and the Makefile needs to be updated in consequence. Please find attached an updated patch.

+-- fetch using anyelement, which will change the column names
 SELECT *
-FROM dblink_fetch('rmt_foo_cursor',4) AS t(a int, b text, c text[]);
- a | b |     c
----+---+------------
- 4 | e | {a4,b4,c4}
- 5 | f | {a5,b5,c5}
- 6 | g | {a6,b6,c6}
- 7 | h | {a7,b7,c7}
+FROM dblink_fetch('rmt_foo_cursor',4,null::foo) AS t;
+ f1 | f2 |     f3
+----+----+------------
+  4 | e  | {a4,b4,c4}
+  5 | f  | {a5,b5,c5}
+  6 | g  | {a6,b6,c6}
+  7 | h  | {a7,b7,c7}
 (4 rows)

Why isn't this test case left alone? It looks like a regression to me not to test it.

Regards,
-- Michael

diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index b8d5157..5189758 100644
--- a/contrib/dblink/Makefile
+++ b/contrib/dblink/Makefile
@@ -6,7 +6,8 @@
 PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK = $(libpq)
 
 EXTENSION = dblink
-DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql
+DATA = dblink--1.2.sql dblink--1.1--1.2.sql dblink--1.0--1.1.sql \
+	dblink--unpackaged--1.0.sql
 PGFILEDESC = dblink - connect to other PostgreSQL databases
 
 REGRESS = paths dblink
diff --git a/contrib/dblink/dblink--1.1--1.2.sql b/contrib/dblink/dblink--1.1--1.2.sql
new file mode 100644
index 000..a82e507
--- /dev/null
+++ b/contrib/dblink/dblink--1.1--1.2.sql
@@ -0,0 +1,44 @@
+/* contrib/dblink/dblink--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION dblink UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION dblink (text, text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_get_result (text, anyelement)
+RETURNS SETOF anyelement
+AS 'MODULE_PATHNAME', 'dblink_get_result'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_get_result (text, bool, anyelement)
+RETURNS SETOF anyelement
+AS 'MODULE_PATHNAME', 'dblink_get_result'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
diff --git a/contrib/dblink/dblink--1.1.sql b/contrib/dblink/dblink--1.2.sql
similarity index 84%
rename from contrib/dblink/dblink--1.1.sql
rename to contrib/dblink/dblink--1.2.sql
index 8733553..1d84df2 100644
--- a/contrib/dblink/dblink--1.1.sql
+++ b/contrib/dblink/dblink--1.2.sql
@@ -1,4 +1,4 @@
-/* contrib/dblink/dblink--1.1.sql */
+/* contrib/dblink/dblink--1.2.sql */
 
 -- complain if script is sourced in psql, rather than via CREATE EXTENSION
 \echo Use "CREATE EXTENSION dblink" to load this file. \quit
@@ -71,6 +71,16 @@
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_fetch'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION dblink_fetch (text, int, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
 CREATE FUNCTION dblink_fetch (text, text, int)
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_fetch'
@@ -121,6 +131,26 @@
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_record'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION dblink (text, text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
 CREATE FUNCTION dblink_exec
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-06 11:14:40 -0400, Tom Lane wrote:

BEGIN;
CREATE TABLE test (i int primary key);
INSERT INTO test VALUES(-1);
\copy test from /tmp/num.csv with csv
COMMIT;
SELECT COUNT(*) FROM test;

The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

ISTM any WAL-logged action that touches a relfilenode essentially needs to disable further optimization based on the knowledge that the relation is new.

We might have to give up on this COPY optimization :-(. A crazy, not well thought through, bandaid for the INSERT+INIT case would be to force COPY to use a new page when using the SKIP_WAL codepath.

I'm not sure what would be a safe rule for deciding that we can skip WAL logging in this situation, but I am pretty sure that it would require keeping information we don't currently keep about what's happened earlier in the transaction.

It'd not be impossible to add more state to the relcache entry for the relation. Whether it's likely that we'd find all the places that'd need to update that state, I'm not sure.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
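One possible shape for that extra relcache state, as a rough sketch only: rd_walLogged is an invented field name, and a real fix would also have to handle subtransaction abort and cache invalidation. rd_createSubid and HEAP_INSERT_SKIP_WAL are the existing symbols.

    typedef struct RelationData
    {
        /* ... existing fields ... */
        bool        rd_walLogged;   /* invented: has any WAL record been
                                     * emitted for this relfilenode in the
                                     * current transaction? */
    } RelationData;

    /* in heap_insert() and friends, right after XLogInsert(): */
    relation->rd_walLogged = true;

    /* in COPY's fast path, before choosing to skip WAL: */
    if (relation->rd_createSubid != InvalidSubTransactionId &&
        !relation->rd_walLogged)
        hi_options |= HEAP_INSERT_SKIP_WAL;     /* still safe to skip */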
Re: [HACKERS] [RFC] LSN Map
On 02/24/2015 04:55 AM, Robert Haas wrote: On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Dunno, but Jim's got a point. This is a maintenance burden on all indexams if they all have to remember to update the LSN map separately. It needs to be done in some common code, like PageSetLSN or XLogInsert or something.

Aside from that, isn't this horrible from a performance point of view? The patch doubles the buffer manager traffic, because any update to any page will also need to modify the LSN map. This code is copied from the visibility map code, but we got away with it there because the VM only needs to be updated the first time a page is modified. Subsequent updates will know the visibility bit is already cleared, and don't need to access the visibility map.

And scalability: whether you store one value for every N pages, or the LSN of every page, this is going to have a huge effect of focusing contention on the LSN pages. Currently, if ten backends operate on ten different heap pages, for example, they can run in parallel. There will be some contention on the WAL insertions (much less in 9.4 than before). But with this patch, they will all fight for the exclusive lock on the single LSN map page. You'll need to find a way to not update the LSN map on every update. For example, only update the LSN page on the first update after a checkpoint (although that would still have a big contention-focusing effect right after a checkpoint).

I think it would make more sense to do this in the background. Suppose there's a background process that reads the WAL and figures out which buffers it touched, and then updates the LSN map accordingly. Then the contention-focusing effect disappears, because all of the updates to the LSN map are being made by the same process. You need some way to make sure the WAL sticks around until you've scanned it for changed blocks - but that is mighty close to what a physical replication slot does, so it should be manageable.

If you implement this as a background process that reads WAL, as Robert suggested, you could perhaps implement this completely in an extension. That'd be nice, even if we later want to integrate this in the backend, in order to get you started quickly.

This is marked in the commitfest as Needs Review, but ISTM this got its fair share of review back in February. Marking as Returned with Feedback.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
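For context, the contention-focusing pattern criticized above looks roughly like this (an illustrative sketch, not the patch's code; lsnmap_set and ENTRIES_PER_MAP_PAGE are invented names). Every writer to any heap page covered by one map page serializes on the same exclusive buffer lock:

    /* invented sketch of a per-page LSN map update, using real buffer-manager calls */
    static void
    lsnmap_set(Relation rel, BlockNumber heapBlk, XLogRecPtr lsn)
    {
        BlockNumber mapBlk = heapBlk / ENTRIES_PER_MAP_PAGE;    /* invented constant */
        Buffer      buf = ReadBuffer(rel, mapBlk);  /* assume a dedicated map fork */
        XLogRecPtr *entries;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);     /* <-- the contention focus */
        entries = (XLogRecPtr *) PageGetContents(BufferGetPage(buf));
        entries[heapBlk % ENTRIES_PER_MAP_PAGE] = lsn;
        MarkBufferDirty(buf);
        UnlockReleaseBuffer(buf);
    }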
Re: [HACKERS] dblink: add polymorphic functions.
On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic, solutions. One was to build support for:

SELECT * FROM srf() AS TYPE OF(foo);

The second idea, I think, is actually SQL standard if I recall correctly:

SELECT * FROM CAST(srf() AS foo) x;

Currently this works:

8<----------------
select * from cast (row(11,'l','{a11,b11,c11}') as foo);
 f1 | f2 |      f3
----+----+---------------
 11 | l  | {a11,b11,c11}
(1 row)
8<----------------

But this fails:

8<----------------
select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
ERROR:  cannot cast type record to foo
8<----------------

Comments in the source have this to say:

8<----------------
/*
 * coerce_record_to_complex
 *      Coerce a RECORD to a specific composite type.
 *
 * Currently we only support this for inputs that are RowExprs or
 * whole-row Vars.
 */
8<----------------

That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

Joe

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On 07/06/2015 12:39 AM, Michael Paquier wrote: Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow, hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

if (PG_NARGS() == 5)
{
    if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
        var = PG_GETARG_BOOL(1)
[...]

Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

-- Joe Conway

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 10:08 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 12:39 AM, Michael Paquier wrote: Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow, hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

if (PG_NARGS() == 5)
{
    if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
        var = PG_GETARG_BOOL(1)
[...]

Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

Utterly reasonable and doable. I'll get to work on that soonish, pending resolution of others' concerns.
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic, solutions. One was to build support for:

SELECT * FROM srf() AS TYPE OF(foo);

The second idea, I think, is actually SQL standard if I recall correctly:

SELECT * FROM CAST(srf() AS foo) x;

Currently this works:

8<----------------
select * from cast (row(11,'l','{a11,b11,c11}') as foo);
 f1 | f2 |      f3
----+----+---------------
 11 | l  | {a11,b11,c11}
(1 row)
8<----------------

But this fails:

8<----------------
select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
ERROR:  cannot cast type record to foo
8<----------------

Comments in the source have this to say:

8<----------------
/*
 * coerce_record_to_complex
 *      Coerce a RECORD to a specific composite type.
 *
 * Currently we only support this for inputs that are RowExprs or
 * whole-row Vars.
 */
8<----------------

That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

Yeah. FWIW, here's my $0.02: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled at all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable, because the record is described in syntax. Corey's proposal, overloading issues aside, appears to neatly deal with this problem, because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

CREATE FUNCTION remote_call(...) RETURNS ... AS
$$
  SELECT dblink(...) AS r(...)
$$ LANGUAGE sql;

...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea... I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'... then you could:

SELECT * FROM remote_call(...) RETURNS SETOF foo;

Inside remote_call, you'd see something like:

SELECT dblink(...) AS OUT;

As to the proposed syntax, I would vote to support the SQL-standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you, because FWICT it does not support wrapping.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
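On Merlin's gripe #1, libpq itself provides the pieces needed to propagate a local cancel to each open remote connection. This sketch shows the relevant libpq API only; it is not anything dblink currently does:

    #include "postgres.h"
    #include "libpq-fe.h"

    /* propagate a local cancel to one remote connection (sketch) */
    static void
    cancel_remote_query(PGconn *conn)
    {
        PGcancel   *cancel = PQgetCancel(conn);
        char        errbuf[256];

        if (cancel != NULL)
        {
            /* PQcancel returns 0 and fills errbuf on failure */
            if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
                elog(WARNING, "could not send cancel request: %s", errbuf);
            PQfreeCancel(cancel);
        }
    }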
Re: [HACKERS] Bug in bttext_abbrev_convert()
Peter Geoghegan wrote: It would be nice to always have an HTML report from gcov available on the internet. That would be something useful to automate, IMV.

http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

-- Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Joe Conway m...@joeconway.com writes: Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

If the set of allowed argument-type combinations is so easily enumerable, I don't understand why this is being done at all. Create a separate SQL function for each combination. You can still let the called C functions call a common implementation routine if that's helpful.

However, this might all be moot in view of Merlin's objection. It is definitely completely uncool to have both of these:

 public | dblink | SETOF anyelement | text, anyelement | normal
 public | dblink | SETOF record     | text, boolean    | normal

It's quite unclear which one will get called for cases like, say, second argument is a domain over boolean. And even if the second arg is just a boolean, maybe the user wanted the first case --- how will he get that behavior, if so? These need to have different names, and that might well help resolve the implementation-level issue...

yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freeze avoidance of very large table.
On Fri, Jul 3, 2015 at 5:25 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:

Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible.

Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state?

It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently, but eventually, when the all-frozen bit is set, all-visible is also set.

Attached is the latest version, including some bug fixes. Please review it.

Regards,
-- Sawada Masahiko

000_add_frozen_bit_into_visibilitymap_v5.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
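The invariant stated there (all-frozen implies all-visible, never the reverse) can be written as a one-line sanity check. The flag names and bit values below follow the patch's two-bits-per-heap-page scheme but are assumptions, not quotes from it:

    /* assumed bit assignments for the two VM bits per heap page */
    #define VISIBILITYMAP_ALL_VISIBLE   0x01
    #define VISIBILITYMAP_ALL_FROZEN    0x02

    /* a page may be all-visible without being all-frozen, never the reverse */
    Assert(((flags & VISIBILITYMAP_ALL_FROZEN) == 0) ||
           ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0));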
Re: [HACKERS] dblink: add polymorphic functions.
Joe Conway m...@joeconway.com writes: Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc.

If the set of allowed argument-type combinations is so easily enumerable, I don't understand why this is being done at all. Create a separate SQL function for each combination. You can still let the called C functions call a common implementation routine if that's helpful.

However, this might all be moot in view of Merlin's objection. It is definitely completely uncool to have both of these:

 public | dblink | SETOF anyelement | text, anyelement | normal
 public | dblink | SETOF record     | text, boolean    | normal

It's quite unclear which one will get called for cases like, say, second argument is a domain over boolean. And even if the second arg is just a boolean, maybe the user wanted the first case --- how will he get that behavior, if so? These need to have different names, and that might well help resolve the implementation-level issue...

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more RLS oversights
Noah,

* Noah Misch (n...@leadboat.com) wrote: On Wed, Feb 25, 2015 at 11:37:24PM -0500, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: I agree that it's great that we're catching issues prior to when the feature is released and look forward to anything else you (or anyone else!) finds. I've pushed a fix for this. Please let me know if you see any other issues or run into any problems.

(1) CreatePolicy() and AlterPolicy() omit to call assign_expr_collations() on the node trees. Test case:

Will fix.

(2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Test case:

Will fix.

+static void
+dumpPolicy(Archive *fout, PolicyInfo *polinfo)
...
+	if (polinfo->polqual != NULL)
+		appendPQExpBuffer(query, " USING %s", polinfo->polqual);

(3) The USING clause needs parentheses; a dump+reload failed like so:

Will fix.

Add the same parentheses to psql \d output also, keeping that output similar to the SQL syntax.

Yup.

(3a) I found this by hacking the rowsecurity.sql regression test to not drop its objects, then running the pg_upgrade test suite. New features that affect pg_dump should leave objects in the regression database to test the pg_dump support via that suite.

Will fix.

(4) When DefineQueryRewrite() is about to convert a table to a view, it checks the table for features unavailable to views. For example, it rejects tables having triggers. It omits to reject tables having relrowsecurity or a pg_policy record. Test case:

Will fix.

+ para
+  Referential integrity checks, such as unique or primary key constraints
+  and foreign key references, will bypass row security to ensure that
+  data integrity is maintained. Care must be taken when developing
+  schemas and row level policies to avoid a covert channel leak of
+  information through these referential integrity checks.
...
+	/*
+	 * Row-level security should be disabled in the case where a foreign-key
+	 * relation is queried to check existence of tuples that references the
+	 * primary-key being modified.
+	 */
+	temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
+	if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK
+		|| qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK
+		|| qkey->constr_queryno == RI_PLAN_RESTRICT_DEL_CHECKREF
+		|| qkey->constr_queryno == RI_PLAN_RESTRICT_UPD_CHECKREF)
+		temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;

(5) This code does not use SECURITY_ROW_LEVEL_DISABLED for foreign keys with CASCADE, SET NULL or SET DEFAULT actions. The associated documentation says nothing about this presumably-important distinction.

Agreed, will fix.

Is SECURITY_ROW_LEVEL_DISABLED mode even needed? We switch users to the owner of the FROM-clause table before running an RI query. That means use of this mode can only matter under row_security=force, right? I would rest easier if this mode went away, because it is a security landmine. If user code managed to run in this mode, it would bypass every policy in the database. (I find no such vulnerability today, because we use the mode only for parse analysis of ri_triggers.c queries.)

That's a very interesting point. At first blush, I agree, it shouldn't be necessary. I'll play with it and see if I can get everything to work properly without it.

(6) AlterPolicy() calls InvokeObjectPostAlterHook(PolicyRelationId, ...), but CreatePolicy() and DropPolicy() lack their respective hook invocations.

Will fix.
(7) Using an aggregate function in a policy predicate elicits an inapposite error message due to use of EXPR_KIND_WHERE for parse analysis. Need a new ParseExprKind. Test case:

Will fix.

Thanks a lot for your review! Much appreciated.

Stephen
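For items (1) and (7), the fix would presumably land where CreatePolicy()/AlterPolicy() run parse analysis on the policy quals. A sketch of the expected shape, where EXPR_KIND_POLICY stands in for the not-yet-existing ParseExprKind and the surrounding names are illustrative:

    /* in CreatePolicy()/AlterPolicy(), after building the ParseState: */
    Node   *qual;

    qual = transformWhereClause(qual_pstate,
                                copyObject(stmt->qual),
                                EXPR_KIND_POLICY,   /* hypothetical; today this
                                                     * is EXPR_KIND_WHERE */
                                "POLICY");

    /* item (1): the call that was omitted */
    assign_expr_collations(qual_pstate, qual);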
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Fri, Jun 26, 2015 at 12:39 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 25, 2015 at 9:23 AM, Peter Eisentraut pete...@gmx.net wrote: On 6/22/15 1:37 PM, Robert Haas wrote:

Currently, the only time we report a process as waiting is when it is waiting for a heavyweight lock. I'd like to make that somewhat more fine-grained, by reporting the type of heavyweight lock it's awaiting (relation, relation extension, transaction, etc.). Also, I'd like to report when we're waiting for a lwlock, and report either the specific fixed lwlock for which we are waiting, or else the type of lock (lock manager lock, buffer content lock, etc.) for locks of which there is more than one. I'm less sure about this next part, but I think we might also want to report ourselves as waiting when we are doing an OS read or an OS write, because it's pretty common for people to think that a PostgreSQL bug is to blame when in fact it's the operating system that isn't servicing our I/O requests very quickly.

Could that also cover waiting on network?

Possibly. My approach requires that the number of wait states be kept relatively small, ideally fitting in a single byte. And it also requires that we insert pgstat_report_waiting() calls around the thing that is notionally blocking. So, if there are a small number of places in the code where we do network I/O, we could stick those calls around those places, and this would work just fine. But if a foreign data wrapper, or any other piece of code, does network I/O - or any other blocking operation - without calling pgstat_report_waiting(), we just won't know about it.

Itagaki-san's very similar proposal and patch would probably be useful in considering what wait events to track. http://www.postgresql.org/message-id/20090309125146.913c.52131...@oss.ntt.co.jp

According to his patch, the wait events that he was thinking to add were:

+ typedef enum PgCondition
+ {
+     PGCOND_UNUSED               = 0,        /* unused */
+
+     /* 10000 - CPU */
+     PGCOND_CPU                  = 10000,    /* generic cpu operations */
+     /* 11000 - CPU:PARSE */
+     PGCOND_CPU_PARSE            = 11000,    /* pg_parse_query */
+     PGCOND_CPU_PARSE_ANALYZE    = 11100,    /* parse_analyze */
+     /* 12000 - CPU:REWRITE */
+     PGCOND_CPU_REWRITE          = 12000,    /* pg_rewrite_query */
+     /* 13000 - CPU:PLAN */
+     PGCOND_CPU_PLAN             = 13000,    /* pg_plan_query */
+     /* 14000 - CPU:EXECUTE */
+     PGCOND_CPU_EXECUTE          = 14000,    /* PortalRun or PortalRunMulti */
+     PGCOND_CPU_TRIGGER          = 14100,    /* ExecCallTriggerFunc */
+     PGCOND_CPU_SORT             = 14200,    /* (generic sort operation) */
+     PGCOND_CPU_SORT_HEAP        = 14210,    /* tuplesort_begin_heap */
+     PGCOND_CPU_SORT_INDEX       = 14220,    /* tuplesort_begin_index_btree */
+     PGCOND_CPU_SORT_DATUM       = 14230,    /* tuplesort_begin_datum */
+     /* 15000 - CPU:UTILITY */
+     PGCOND_CPU_UTILITY          = 15000,    /* ProcessUtility */
+     PGCOND_CPU_COMMIT           = 15100,    /* CommitTransaction */
+     PGCOND_CPU_ROLLBACK         = 15200,    /* AbortTransaction */
+     /* 16000 - CPU:TEXT */
+     PGCOND_CPU_TEXT             = 16000,    /* (generic text operation) */
+     PGCOND_CPU_DECODE           = 16100,    /* pg_client_to_server */
+     PGCOND_CPU_ENCODE           = 16200,    /* pg_server_to_client */
+     PGCOND_CPU_LIKE             = 16310,    /* GenericMatchText */
+     PGCOND_CPU_ILIKE            = 16320,    /* Generic_Text_IC_like */
+     PGCOND_CPU_RE               = 16400,    /* (generic regexp operation) */
+     PGCOND_CPU_RE_COMPILE       = 16410,    /* RE_compile_and_cache */
+     PGCOND_CPU_RE_EXECUTE       = 16420,    /* RE_execute */
+
+     /* 20000 - NETWORK */
+     PGCOND_NETWORK              = 20000,    /* (generic network operation) */
+     PGCOND_NETWORK_RECV         = 21000,    /* secure_read */
+     PGCOND_NETWORK_SEND         = 22000,    /* secure_write */
+
+     /* 30000 - IDLE (should be larger than network to distinguish idle or recv) */
+     PGCOND_IDLE                 = 30000,    /* IDLE */
+     PGCOND_IDLE_IN_TRANSACTION  = 31000,    /* IDLE in transaction */
+     PGCOND_IDLE_SLEEP           = 32000,    /* pg_usleep */
+
+     /* 40000 - XLOG */
+     PGCOND_XLOG                 = 40000,    /* (generic xlog operation) */
+     PGCOND_XLOG_CRC             = 41000,    /* crc calculation in XLogInsert */
+     PGCOND_XLOG_INSERT          = 42000,    /* insert in XLogInsert */
+     PGCOND_XLOG_OPEN            = 43000,    /* XLogFileOpen */
+     PGCOND_XLOG_CLOSE           = 44000,    /* XLogFileClose */
+     PGCOND_XLOG_WRITE           = 45000,    /* write in XLogWrite */
+     PGCOND_XLOG_FLUSH           = 46000,    /* issue_xlog_fsync */
+
+     /* 50000 - DATA */
+     PGCOND_DATA                 = 50000,    /* (generic
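Tying that list back to Robert's proposal upthread: with wait codes kept to a single byte, each notionally blocking operation would be bracketed with reports, along these lines. This is a sketch assuming a hypothetically extended pgstat_report_waiting() that takes a wait code rather than today's boolean, plus invented WAIT_* codes:

    /* hypothetical one-byte wait codes */
    typedef enum WaitEventType
    {
        WAIT_NONE = 0,
        WAIT_LOCK_RELATION,         /* heavyweight lock: relation */
        WAIT_LWLOCK_BUFFER_CONTENT, /* lwlock: buffer content lock */
        WAIT_IO_READ,               /* blocking OS read */
        WAIT_IO_WRITE               /* blocking OS write */
    } WaitEventType;

    /* bracket the blocking call with reports, e.g. around an OS read: */
    pgstat_report_waiting(WAIT_IO_READ);    /* assumed extended signature */
    returnCode = read(fd, buffer, amount);
    pgstat_report_waiting(WAIT_NONE);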