Re: [HACKERS] C# reading result from a function
Hello, do you have a good C# mailing list? Or do you mean it is the wrong channel here? The function is called, but the code isn't here. I get the result from par1+par2 into the database, but do not get the result on the C# command line ;)

Best regards from Hamburg, Germany
[HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
Hello,

First: kudos!!! The release notes are extremely promising with regard to performance improvements :-)

I've made some (dirty) tests with BRIN and btree_gin (on a small Windows laptop ...). Just a few remarks:

- btree_gin deserves a better description than this one: "However, they are useful for GIN testing and as a base for developing other GIN operator classes." I came to similar times between btree and gin for indexes on category columns (ca. 20 to 200 distinct values). For me, gin clearly wins here thanks to the index size difference. You should really consider moving btree_gin to core ...

- btree_gin on integers doesn't cope well with BETWEEN; it seems to always lead to a full index scan. I think I understand why, but maybe this is worth a comment in the docs to underline the difference from btree.

    SELECT * from tgin_1 WHERE cat_id between 1 and 2: http://explain.depesz.com/s/fmqn
    SELECT * from tgin_1 WHERE cat_id IN (1,2): http://explain.depesz.com/s/bYg

- BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges:

    INSERT into tbrin_1 (cat_id, ) SELECT s%20, ... FROM generate_series(1,300 ) s;
    CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id) with (pages_per_range=64);
    SELECT * from tbrin_1 WHERE cat_id=10; http://explain.depesz.com/s/9YQR

  There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan, but the time overhead remains small :)

best regards,
Marc Mamin
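For reference, a minimal sketch of the btree_gin comparison described above (table and column names follow the queries in the message; the row count and category spread are illustrative, not the original test data):

    CREATE EXTENSION IF NOT EXISTS btree_gin;
    CREATE TABLE tgin_1 (cat_id int);
    INSERT INTO tgin_1 SELECT s % 100 FROM generate_series(1, 1000000) s;
    CREATE INDEX tgin_1_btree_idx ON tgin_1 USING btree (cat_id);
    CREATE INDEX tgin_1_gin_idx   ON tgin_1 USING gin (cat_id);
    -- compare plans: the IN form can use the GIN index directly,
    -- while BETWEEN tends to degenerate into a full index scan
    EXPLAIN ANALYZE SELECT * FROM tgin_1 WHERE cat_id BETWEEN 1 AND 2;
    EXPLAIN ANALYZE SELECT * FROM tgin_1 WHERE cat_id IN (1, 2);
    -- compare index sizes
    SELECT relname, pg_size_pretty(pg_relation_size(oid))
      FROM pg_class
     WHERE relname IN ('tgin_1_btree_idx', 'tgin_1_gin_idx');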
Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22
On 7/5/2015 11:35 PM, Andrew Dunstan wrote:

  On 07/04/2015 11:02 AM, Tom Lane wrote: I wondered how come we had not seen this problem in the buildfarm, but the answer appears to be that our only working Cygwin critter (brolga) doesn't build any of the optional PLs, so it skips these modules altogether. Seems like we need to improve that situation.

  brolga has been on life support for quite a long time. The reason it hasn't been retired is that for a long time I was unable to get a buildfarm member running successfully on a more modern Cygwin. That now appears to have resolved itself, so in a week or so I will set up a Cygwin buildfarm member running on a modern Cygwin on Windows 8.1, and build with perl, python, openssl, libxml and libxslt. Note that I have only tested 32-bit Cygwin - 64-bit might well be a whole different story.

Hi Andrew,

I have not seen any particular difference between the two Cygwin flavors; I am running both versions on my W7 64-bit. The only trick is to have two different names for the cygserver service to avoid a collision. Those services are capable of running at the same time. E.g., from the 64-bit point of view:

    $ cygrunsrv.exe -L
    (cygserver)
    cygserver64
    (sshd)

The 32-bit services are shown in parentheses. Let me know if you need any support.

cheers
andrew

Regards
Marco
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 4:23 PM, Michael Paquier michael.paqu...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 10:00 AM, Joe Conway m...@joeconway.com wrote: I wonder if it isn't better to just loop through all the args with get_fn_expr_argtype() every time and then test for the exact signature match? Another alternative might be to create a wrapper C function for each variant SQL function, but that seems like it could also get messy, especially with dblink_record_internal() since we would have to deal with every variant of all the external facing callers. Thoughts?

Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow; hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

    if (PG_NARGS() == 5)
    {
        if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
            var = PG_GETARG_BOOL(1);
        [...]
    }
    else if (PG_NARGS() == 4)
    {
        [...]
    }
    else
        [...]

-- Michael
Re: [HACKERS] pg_stat_*_columns?
On Mon, Jun 29, 2015 at 11:14 PM, Jim Nasby jim.na...@bluetreble.com wrote:

  What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes)

I think the idea sounds promising. If near real-time data is required, we could just update once every second, which should be often enough for everybody. Each backend process could then simply INSERT the stats for each txn that committed/rolled back into an UNLOGGED table, and then the collector would do one single UPDATE of the collector stats based on the aggregate of the rows inserted since the previous update a second ago, and then delete the processed rows (naturally in one operation, using DELETE FROM .. RETURNING *). That way we could get rid of the legacy communication protocol between the backends and the collector and instead rely on unlogged tables for the submission of data from the backends to the collector. INSERTing 100 000 rows into an unlogged table takes 70 ms on my laptop, so it should be fast enough to handle the tens of thousands of updates per second we need to handle.
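A rough SQL sketch of that flow (all table and column names here are illustrative, not an actual catalog proposal):

    -- backends append per-transaction stats
    CREATE UNLOGGED TABLE stats_queue (
        relid     oid,
        n_tup_ins bigint,
        n_tup_upd bigint,
        n_tup_del bigint
    );

    -- collector, once per second: drain the queue and fold it into the
    -- (hypothetical) running totals in one statement
    WITH drained AS (
        DELETE FROM stats_queue RETURNING *
    )
    UPDATE stats_totals t
       SET n_tup_ins = t.n_tup_ins + d.n_tup_ins,
           n_tup_upd = t.n_tup_upd + d.n_tup_upd,
           n_tup_del = t.n_tup_del + d.n_tup_del
      FROM (SELECT relid,
                   sum(n_tup_ins) AS n_tup_ins,
                   sum(n_tup_upd) AS n_tup_upd,
                   sum(n_tup_del) AS n_tup_del
              FROM drained
             GROUP BY relid) d
     WHERE t.relid = d.relid;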
Re: [HACKERS] dblink: add polymorphic functions.
On Thu, Feb 19, 2015 at 4:06 PM, Corey Huinker corey.huin...@gmail.com wrote:

  In the course of writing a small side project which hopefully will make its way onto PGXN soon, I was writing functions that had a polymorphic result set:

    create function foo( p_row_type anyelement, p_param1 ...) returns setof anyelement

  Inside that function would be multiple calls to dblink() in both synchronous and async modes. It is a requirement of the function that each query return a result set conforming to the structure passed in as p_row_type, but there was no way for me to express that. Unfortunately, there's no way to say

    select * from dblink_get_result('connname') as polymorphic record type;

  Instead, I had to generate the query as a string, like this:

    with x as (
        select a.attname || ' ' || pg_catalog.format_type(a.atttypid, a.atttypmod) as sql_text
        from   pg_catalog.pg_attribute a
        where  a.attrelid = pg_typeof(p_row_type)::text::regclass
        and    a.attisdropped is false
        and    a.attnum > 0
        order by a.attnum
    )
    select format('select * from dblink_get_result($1) as t(%s)', string_agg(x.sql_text, ','))
    from   x;

  Moreover, I'm now executing this string dynamically, incurring reparsing and replanning each time (and if all goes well, this would be executed many times). Granted, I could avoid that by rewriting the stored procedure in C and using prepared statements (not available in PL/pgSQL), but it seemed a shame that dblink couldn't itself handle this polymorphism. So with a little help, we were able to come up with polymorphic set-returning dblink functions. Below are the results of the patch applied to a stock 9.4 installation:

    [local]:ubuntu@dblink_test# create extension dblink;
    CREATE EXTENSION
    Time: 12.778 ms
    [local]:ubuntu@dblink_test# \df dblink
                                      List of functions
     Schema |  Name  | Result data type |       Argument data types       |  Type
    --------+--------+------------------+---------------------------------+--------
     public | dblink | SETOF record     | text                            | normal
     public | dblink | SETOF anyelement | text, anyelement                | normal
     public | dblink | SETOF record     | text, boolean                   | normal
     public | dblink | SETOF anyelement | text, boolean, anyelement       | normal
     public | dblink | SETOF record     | text, text                      | normal
     public | dblink | SETOF anyelement | text, text, anyelement          | normal
     public | dblink | SETOF record     | text, text, boolean             | normal
     public | dblink | SETOF anyelement | text, text, boolean, anyelement | normal
    (8 rows)

Sorry for the late reply. I'm a little concerned about the state of overloading here. If I'm not mistaken, you may have introduced a pretty serious backwards compatibility issue. Having the two signatures (text, anyelement) and (text, boolean) will now fail anytime (unknown, unknown) is passed, and that's a pretty common invocation. If I'm right, quickly scanning the function list, I don't think there's an easy solution to this issue other than adding an alternately named call.

merlin
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
Heikki Linnakangas hlinn...@iki.fi writes:

  I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with: tc qdisc add dev eth2 root netem delay 100ms. With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

Why does the committed patch do getsockopt() on server_fd rather than port->sock? This seems at best confusing and at worst wrong.

regards, tom lane
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 07/04/2015 02:03 PM, chenhj wrote:

  2015-07-03 16:49:44, David Rowley david.row...@2ndquadrant.com wrote: I'm wondering what the original test setup was. I'm assuming psql and postgres both running on separate Windows machines? I've tested the patch just connecting to a database running on localhost and I'm not getting much of a speedup. Perhaps 1%, if that's not noise. I don't have enough hardware here to have client and server on separate machines, at least not with a stable network that goes through copper.

  My original test environments are as follows:

  Environment 1:
    Server: Windows 2012 (x64); the host is a VM in a private cloud
    Client: RHEL6 (x64); the host is another VM in the same private cloud
    Network: 1Gbit LAN

  Environment 2:
    Server: Windows 2012 (x64); the host is a VM in a private cloud
    Client: Windows 7 (x64); the host is a physical machine (in fact it is my PC)
    Network: 1Gbit LAN

  This patch should only speed up environments which satisfy the following conditions:
  1. The OS of the server is Windows 2012 or Win8 (but I only tested it on Windows 2012).
  2. The client and the server are separate machines.
  3. The performance bottleneck is network throughput.
  4. The utilization rate of network bandwidth is not full (such as only 50% or lower).

I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with:

    tc qdisc add dev eth2 root netem delay 100ms

With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

- Heikki
Re: [HACKERS] pg_stat_*_columns?
On 2015-06-29 16:14:34 -0500, Jim Nasby wrote:

  What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes).

Uh. Hot-Standby?
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
* Tom Lane (t...@sss.pgh.pa.us) wrote:

  Jan de Visser j...@de-visser.net writes: Attached a new patch, rebased against the current head. Errors in pg_hba.conf and pg_ident.conf are now also noticed. I checked the documentation for pg_ctl reload, and the only place where it's explained seems to be runtime.sgml, and that description is so high-level that adding this new bit of functionality wouldn't make much sense.

  BTW, it's probably worth pointing out that the recent work on the pg_file_settings view has taken away a large part of the use-case for this, in that you can find out more with less risk by inspecting pg_file_settings before issuing SIGHUP, instead of SIGHUP'ing and hoping you didn't break anything too nastily. Also, you can use pg_file_settings remotely, unlike pg_ctl (though admittedly you still need contrib/adminpack or something to allow uploading a new config file if you're doing remote admin). I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf.

Yes. That's definitely something that I'd been hoping someone would work on. Also, thanks for the work on the pg_file_settings code; I agree with all you did there.

Thanks again!

Stephen
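For example, a quick pre-reload sanity check against the view mentioned above might look like this (a sketch; the error column is NULL for entries that would apply cleanly):

    -- show any settings in the config files that would fail to apply on SIGHUP
    SELECT sourcefile, sourceline, name, setting, error
      FROM pg_file_settings
     WHERE error IS NOT NULL;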
Re: [HACKERS] pg_stat_*_columns?
Joel Jacobson j...@trustly.com writes:

  On Mon, Jun 29, 2015 at 11:14 PM, Jim Nasby jim.na...@bluetreble.com wrote: What might be interesting is setting things up so the collector simply inserted into history tables every X seconds and then had a separate process to prune that data. The big problem with that is I see no way for that to easily allow access to real-time data (which is certainly necessary sometimes)

  I think the idea sounds promising. If near real-time data is required, we could just update once every second, which should be often enough for everybody.

I'd bet a good lunch that performance will be absolutely disastrous. Even with unlogged tables, the vacuuming cost would be intolerable. Why would we insist on pushing the envelope in what's known to be Postgres' weakest area performance-wise?

  Each backend process could then simply INSERT the stats for each txn that committed/rollbacked into an UNLOGGED table,

... and if its transaction failed, how would it do that? Regular tables are *not* what we want here, either from a semantics or a performance standpoint.

regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
On Sat, Jul 4, 2015 at 2:26 AM, Andres Freund and...@anarazel.de wrote:

  On 2015-07-03 19:02:29 +0200, Andres Freund wrote: Maybe I'm just daft right now (35C outside, 32 inside, so ...), but I'm right now missing how the whole "skip WAL logging if relation has just been truncated" optimization can ever actually be crashsafe unless we use a new relfilenode (which we don't!).

Agreed... When I ran the following test scenario, I found that the loaded data disappeared after crash recovery.

1. Start the PostgreSQL server with wal_level = minimal.
2. Execute the following SQL statements:

    \copy (SELECT num FROM generate_series(1,10) num) to /tmp/num.csv with csv
    BEGIN;
    CREATE TABLE test (i int primary key);
    TRUNCATE TABLE test;
    \copy test from /tmp/num.csv with csv
    COMMIT;
    SELECT COUNT(*) FROM test;  -- returns 10

3. Shut down the server with immediate mode.
4. Restart the server.
5. Execute the following SQL statement after crash recovery ends:

    SELECT COUNT(*) FROM test;  -- returns 0..

In #2, 10 rows were copied and the transaction was committed. The subsequent SELECT COUNT(*) obviously returned 10. However, after crash recovery, in #5, the same statement returned 0. That is, the loaded (and committed) 10 rows were lost after the crash.

  We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869

    commit cab9a0656c36739f59277b34fea8ab9438395869
    Author: Tom Lane t...@sss.pgh.pa.us
    Date:   Sun Aug 23 19:23:41 2009 +0000

        Make TRUNCATE do truncate-in-place when processing a relation that
        was created or previously truncated in the current (sub)transaction.
        This is safe since if the (sub)transaction later rolls back, we'd
        just discard the rel's current physical file anyway. This avoids
        unreasonable growth in the number of transient files when a relation
        is repeatedly truncated. Per a performance gripe a couple weeks ago
        from Todd Cook.

  to me the reasoning here looks flawed.

Before this commit, when I ran the above test scenario, no data loss happened.

Regards,

-- Fujii Masao
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
On July 6, 2015 09:23:12 AM Stephen Frost wrote:

  I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf.

  Yes. That's definitely something that I'd been hoping someone would work on.

There actually is a patch in the current CF that provides a view for pg_hba. I could look at one for pg_ident, which seems much simpler.
Re: [HACKERS] [PATCH v1] GSSAPI encryption support
Robbie,

* Robbie Harwood (rharw...@redhat.com) wrote:

  As previously discussed on this list, I have coded up GSSAPI encryption support. If it is easier for anyone, this code is also available for viewing on my github: https://github.com/postgres/postgres/compare/master...frozencemetery:feature/gssencrypt

  Fallback support is present in both directions for talking to old clients and old servers; GSSAPI encryption is by default auto-upgraded to where available (for compatibility), but both client and server contain settings for requiring it. There are 8 commits in this series; I have tried to err on the side of creating too much separation rather than too little. A patch for each is attached. This is v1 of the series.

Excellent, thanks! I've got other things to tend to at the moment, but this is definitely something I'm interested in and will work on (likely in August). If we could get a reviewer or two to take a look and take the patch out for a test-drive, at least, that would be a huge help.

Thanks again!

Stephen
Re: [HACKERS] Information of pg_stat_ssl visible to all users
* Magnus Hagander (mag...@hagander.net) wrote:

  On Thu, Jul 2, 2015 at 10:06 PM, Andres Freund and...@anarazel.de wrote: On 2015-07-02 16:52:01 -0300, Alvaro Herrera wrote: If there's interest in closing these holes, this might be a first

  I don't think such an isolated attempt buys us anything except maybe unsatisfied users. I can see a benefit in allowing to restrict information about users and such in other clusters, but changing stat_ssl seems to be an inconsequentially small problem on that path.

  We discussed earlier having a monitoring role or attribute or something like that, and I think this would be another case of that. We definitely want to go towards something like that, but that's not happening in 9.5...

Agreed, but if we make this visible to all in 9.5 then we're going to have a tough time restricting it to just the monitoring role in 9.6, I'm afraid... We realize it's a problem; for my 2c, I'd rather not double-down on it by providing more information which should really be limited to privileged users.

Thanks,

Stephen
Re: [HACKERS] PATCH:do not set Win32 server-side socket buffer size on windows 2012
On 07/06/2015 04:30 PM, Tom Lane wrote:

  Heikki Linnakangas hlinn...@iki.fi writes: I was able to reproduce huge gains from this, after I introduced an artificial latency to all network packets with: tc qdisc add dev eth2 root netem delay 100ms. With that, and with the client on a different host so that the traffic goes through that high-latency network, I saw over 10x difference with the same psql test you ran. Committed, thanks!

  Why does the committed patch do getsockopt() on server_fd rather than port->sock? This seems at best confusing and at worst wrong.

Good catch, fixed!

- Heikki
Re: [HACKERS] Idea: closing the loop for pg_ctl reload
* Jan de Visser (j...@de-visser.net) wrote:

  On July 6, 2015 09:23:12 AM Stephen Frost wrote: I wonder whether we should consider inventing similar views for pg_hba.conf and pg_ident.conf. Yes. That's definitely something that I'd been hoping someone would work on.

  There actually is a patch in the current CF that provides a view for pg_hba. I could look at one for pg_ident, which seems much simpler.

That would be good, as would a review of the patch for pg_hba with particular consideration of what's been done with the pg_file_settings view.

Thanks!

Stephen
Re: [HACKERS] Inconsistent style in pgbench's error messages
Fabien COELHO coe...@cri.ensmp.fr writes:

  Personally I think "pgbench: " adds very little and so I'd rather get rid of it, but I'm sure others feel differently.

  I think that one of the reasons for this is that once pgbench has started to run it is desirable to differentiate error messages that come from libpq and those that come from pgbench itself.

Don't really see the issue. Errors reported by the server are easy to distinguish by formatting (as long as we don't go out of our way to make pgbench's errors look like them, which is one reason I do not like the HINT proposal in the bug #12379 thread). Errors reported by libpq itself might look like they came from pgbench, but I don't exactly see what's wrong with that. I don't think users much care about the difference, as long as they can tell server-side errors apart from client-side errors.

regards, tom lane
[HACKERS] Missing extension locks in the nbtree code
Hi,

There've recently been more and more reports of "unexpected data beyond EOF in block %u of relation %s" - too many for me to think that it's likely to be caused by a kernel bug. It's now been reproduced at least on somewhat recent Linux and FreeBSD versions. So I started looking around for causes. Not for the first time.

One, probably harmless, thing is that _bt_getroot() creates the initial root page without an extension lock. That's not pretty, but it should happen on the first write and be safe due to the content lock on the metapage. ISTM we should still not do that, but it's probably not the explanation.

The fix is just to change

    if (fd == -1 || XLByteInSeg(change->lsn, curOpenSegNo))

into

    if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))

The bug doesn't have any correctness implications afaics, just performance. We read all the spilled files till the end, so even a change spilled to the wrong segment gets picked up.

Greetings,

Andres Freund
Re: [HACKERS] Missing extension locks in the nbtree code
On 2015-07-06 23:21:12 +0200, Andres Freund wrote:

  There've recently been more and more reports of "unexpected data beyond EOF in block %u of relation %s" - too many for me to think that it's likely to be caused by a kernel bug. It's now been reproduced at least on somewhat recent Linux and FreeBSD versions. So I started looking around for causes. Not for the first time.

  One, probably harmless, thing is that _bt_getroot() creates the initial root page without an extension lock. That's not pretty, but it should happen on the first write and be safe due to the content lock on the metapage. ISTM we should still not do that, but it's probably not the explanation.

Uh, this was a mix-up; I didn't want to send this email yet. The below obviously is from a different thread.

  The fix is just to change

      if (fd == -1 || XLByteInSeg(change->lsn, curOpenSegNo))

  into

      if (fd == -1 || !XLByteInSeg(change->lsn, curOpenSegNo))

  The bug doesn't have any correctness implications afaics, just performance. We read all the spilled files till the end, so even a change spilled to the wrong segment gets picked up.
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
On Tue, Jul 7, 2015 at 4:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

  On 07/06/2015 04:38 AM, Michael Paquier wrote: On Sat, Jul 4, 2015 at 3:57 AM, Heikki Linnakangas wrote: Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper that just calls install.pl, passing all arguments? I guess we just haven't noticed it. And indeed it makes everything more simple, and fixes as well the error reported when the install path contains a space.

  Ok, committed that way.

Shouldn't this patch be backpatched? In the backbranches install.bat does not work correctly with paths containing spaces.

-- Michael
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 11:13 AM, Corey Huinker corey.huin...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

      postgres=# select * from dblink('a', 'b', 'c');
      ERROR:  function dblink(unknown, unknown, unknown) is not unique

  Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic solutions. One was to build support for:

      SELECT * FROM srf() AS TYPE OF(foo);

  The second idea I think is actually SQL standard if I recall correctly:

      SELECT * FROM CAST(srf() AS foo) x;

  Currently this works:

      select * from cast (row(11,'l','{a11,b11,c11}') as foo);
       f1 | f2 |      f3
      ----+----+---------------
       11 | l  | {a11,b11,c11}
      (1 row)

  But this fails:

      select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
      ERROR:  cannot cast type record to foo

  Comments in the source have this to say:

      /*
       * coerce_record_to_complex
       *		Coerce a RECORD to a specific composite type.
       *
       * Currently we only support this for inputs that are RowExprs or
       * whole-row Vars.
       */

  That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

  Yeah. FWIW, here's my 0.02$: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled over all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable because of using syntax to describe the record. Corey's proposal, overloading issues aside, appears to neatly deal with this problem because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

      CREATE FUNCTION remote_call(...) RETURNS ... AS
      $$
        SELECT dblink(...) AS r(...)
      $$ language sql;

  ...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea...I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'...then you could:

      SELECT * FROM remote_call(...) RETURNS SETOF foo;

  Inside remote_call, you'd see something like:

      SELECT dblink(...) AS OUT;

  As to the proposed syntax, I would vote to support the SQL standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you because FWICT it does not support wrapping.

  merlin

  Your experiences with dblink are very similar to mine. The whole issue arose for me as an outcropping of my Poor Man's Parallel Processing extension (still not released but currently working for us in production internally). At some point I had to do dblink_get_result(...) as t(...), and not only did I have to render the structure as a string, I was going to have to execute that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was going to have to re-code in C or plv8. Overall those calls aren't terribly expensive (it's working in production - for us - without this dblink modification), but a cleaner solution would be better.

Another option is to handle the intermediate result in json if you're willing to sacrifice a little bit of performance. For example, suppose you wanted to log every dblink call through a wrapper without giving up the ability to handle arbitrary result sets:

    CREATE OR REPLACE FUNCTION dblink_log(_query TEXT) RETURNS SETOF JSON AS
    $$
    BEGIN
      RAISE WARNING 'got dblink %', _query;
      RETURN QUERY SELECT * FROM dblink(
        'host=localhost',
        format('SELECT row_to_json(q) from (%s) q', _query)) AS R(j json);
    END;
    $$ LANGUAGE PLPGSQL;

    postgres=# select * from dblink_log('select 0 as value');
    WARNING:  got dblink select 0 as value
     dblink_log
    ─────────────
     {"value":0}
    (1 row)

With json, you have a number of good options to convert back to a record. For example:

    postgres=# CREATE TYPE foo AS (value int);
    CREATE TYPE
    postgres=# SELECT p.*
                 FROM dblink_log('SELECT s AS value FROM generate_series(1,3) s') j
                CROSS JOIN LATERAL json_populate_record(null::foo, j) p;
    WARNING:  got dblink select s as value from generate_series(1,3) s
     value
    ───────
         1
         2
         3
Re: [HACKERS] copy.c handling for RLS is insecure
Noah,

* Noah Misch (n...@leadboat.com) wrote:

  On Tue, Dec 02, 2014 at 11:32:27AM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Thu, Nov 27, 2014 at 2:03 AM, Stephen Frost sfr...@snowman.net wrote: Alright, I've done the change to use the RangeVar from CopyStmt, but also added a check wherein we verify that the relation's OID returned from the planned query is the same as the relation's OID that we did the RLS check on - if they're different, we throw an error. Please let me know if there are any remaining concerns.

  Here is the check in question (added in commit 143b39c):

      plan = planner(query, 0, NULL);

      /*
       * If we were passed in a relid, make sure we got the same one back
       * after planning out the query. It's possible that it changed
       * between when we checked the policies on the table and decided to
       * use a query and now.
       */
      if (queryRelId != InvalidOid)
      {
          Oid relid = linitial_oid(plan->relationOids);

          /*
           * There should only be one relationOid in this case, since we
           * will only get here when we have changed the command for the
           * user from a "COPY relation TO" to "COPY (SELECT * FROM
           * relation) TO", to allow row level security policies to be
           * applied.
           */
          Assert(list_length(plan->relationOids) == 1);

          if (relid != queryRelId)
              ereport(ERROR,
                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                       errmsg("relation referenced by COPY statement has changed")));
      }

  That's clearly an improvement, but I'm not sure it's water-tight. What if the name that originally referenced a table ended up referencing a view? Then you could get list_length(plan->relationOids) != 1.

  I'll test it out and see what happens. Certainly a good question and if there's an issue there then I'll get it addressed.

  Yes, it can be made to reference a view and trip the assertion.

Ok. We can certainly make that assertion be a run-time consideration instead, though I'm not thrilled with that being the only safe-guard.

  (And, in that case, I also wonder if you could get eval_const_expressions() to do evil things on your behalf while planning.)

  If it can be made to reference a view then there's an issue, as the view might include a function call itself which is provided by the attacker..

  Indeed. As the parenthetical remark supposed, the check happens too late to prevent a security breach. planner() has run eval_const_expressions(), executing code of the view owner's choosing.

Right. Clearly, if we found a relation originally then we need that same relation with the same OID after the conversion to a query.

  That is necessary but not sufficient. CREATE RULE can convert a table to a view without changing the OID, thereby fooling the check. Test procedure:

Ugh, yes, rules would cause a problem for this..

    -- as superuser (or createrole)
    create user blackhat;
    create user alice;

    -- as blackhat
    begin;
    create table exploit_rls_copy (c int);
    alter table exploit_rls_copy enable row level security;
    grant select on exploit_rls_copy to public;
    commit;

    -- as alice
    -- first, set breakpoint on BeginCopy
    copy exploit_rls_copy to stdout;

    -- as blackhat
    begin;
    create or replace function leak() returns int immutable as
    $$begin raise notice 'in leak()'; return 7; end$$ language plpgsql;
    create rule "_RETURN" as on select to exploit_rls_copy do instead
    select leak() as c from (values (0)) dummy;
    commit;

    -- Release breakpoint. leak() function call happens. After that, assertion
    -- fires if enabled. ERROR does not fire in any case.

Thanks for pointing this out. I'll look into it.

Stephen
Re: [HACKERS] ERROR: unexpected data beyond EOF
Hi,

On 2015-04-30 10:05:33 -0700, Joshua D. Drake wrote:

  postgres[21118]: [8-1] ERROR: unexpected data beyond EOF in block 9 of relation base/430666195/430666206

  FreeBSD 10, ZFS, iSCSI, RAID 50 (don't start, I didn't spec it). fsync on, full_page_writes on. The restart of PostgreSQL makes the error go away and things progress normally. We don't experience further errors etc...

A couple questions:

1) Do you know what 430666206 refers to? Something like

    SELECT oid::regclass, relkind FROM pg_class WHERE pg_relation_filenode(oid) = 430666206;

   in the correct database should answer that.
2) Any chance you have zero_damaged_pages enabled?
3) Does the issue still occur? Is there any chance of manual investigation before restarting the database?

Andres
Re: [HACKERS] WAL logging problem in 9.4.3?
Fujii Masao masao.fu...@gmail.com writes:

  On Sat, Jul 4, 2015 at 2:26 AM, Andres Freund and...@anarazel.de wrote: We actually used to use a different relfilenode, but optimized that away: cab9a0656c36739f59277b34fea8ab9438395869

    commit cab9a0656c36739f59277b34fea8ab9438395869
    Author: Tom Lane t...@sss.pgh.pa.us
    Date:   Sun Aug 23 19:23:41 2009 +0000

        Make TRUNCATE do truncate-in-place when processing a relation that
        was created or previously truncated in the current (sub)transaction.
        This is safe since if the (sub)transaction later rolls back, we'd
        just discard the rel's current physical file anyway. This avoids
        unreasonable growth in the number of transient files when a relation
        is repeatedly truncated. Per a performance gripe a couple weeks ago
        from Todd Cook.

  to me the reasoning here looks flawed.

  Before this commit, when I ran the above test scenario, no data loss happened.

Actually I think what is broken here is COPY's test to decide whether it can omit writing WAL:

     * Check to see if we can avoid writing WAL
     *
     * If archive logging/streaming is not enabled *and* either
     *	- table was created in same transaction as this COPY
     *	- data is being written to relfilenode created in this transaction
     * then we can skip writing WAL. It's safe because if the transaction
     * doesn't commit, we'll discard the table (or the new relfilenode file).
     * If it does commit, we'll have done the heap_sync at the bottom of this
     * routine first.

The problem with that analysis is that it supposes that, if we crash and recover, the WAL replay sequence will not touch the data. What's killing us in this example is the replay of the TRUNCATE, but that is not the only possibility. For example, consider this modification of Fujii-san's test case:

    BEGIN;
    CREATE TABLE test (i int primary key);
    INSERT INTO test VALUES(-1);
    \copy test from /tmp/num.csv with csv
    COMMIT;
    SELECT COUNT(*) FROM test;

The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

We might have to give up on this COPY optimization :-(. I'm not sure what would be a safe rule for deciding that we can skip WAL logging in this situation, but I am pretty sure that it would require keeping information we don't currently keep about what's happened earlier in the transaction.

regards, tom lane
Re: [HACKERS] WAL logging problem in 9.4.3?
Andres Freund and...@anarazel.de writes:

  On 2015-07-06 11:14:40 -0400, Tom Lane wrote: The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

  ISTM any WAL logged action that touches a relfilenode essentially needs to disable further optimization based on the knowledge that the relation is new.

After a bit more thought, I think it's not so much "any WAL logged action" as "any unconditionally-replayed action". INSERT+INIT breaks this example because heap_xlog_insert will unconditionally replay the action, even if the page is valid and has a same or newer LSN. Similarly, TRUNCATE is problematic because we redo it unconditionally (and in that case it's hard to see an alternative).

  It'd not be impossible to add more state to the relcache entry for the relation. Whether it's likely that we'd find all the places that'd need updating that state, I'm not sure.

Yeah, the sticking point is mainly being sure that the state is correctly tracked, both now and after future changes. We'd need to identify a state invariant that we could be pretty confident we'd not break.

One idea I had was to allow the COPY optimization only if the heap file is physically zero-length at the time the COPY starts. That would still be able to optimize in all the cases we care about making COPY fast for. Rather than reverting cab9a0656c36739f, which would re-introduce a different performance problem, perhaps we could have COPY create a new relfilenode when it does this. That should be safe if the table was previously empty.

regards, tom lane
Re: [HACKERS] Freeze avoidance of very large table.
On 3 July 2015 at 09:25, Sawada Masahiko sawada.m...@gmail.com wrote:

  On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote: Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible.

  Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state?

  It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently, but eventually, when the all-frozen bit is set, all-visible is also set.

And my understanding is that if you clear all-visible you would also clear all-frozen... So I don't understand why you have two separate calls to visibilitymap_clear(). Surely the logic should be to clear both bits at the same time?

In my understanding the state logic is:

1. Both bits unset: ~(VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN), which can be changed to state 2 only
2. VISIBILITYMAP_ALL_VISIBLE only, which can be changed to state 1 or state 3
3. VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN, which can be changed to state 1 only

If that is the case, please simplify the logic for setting and unsetting the bits so they are set together efficiently. At the same time, please also put in Asserts to ensure that the state logic is maintained when it is set and when it is tested.

I would also like to see the visibilitymap_test function exposed in SQL, so we can write code to examine the map contents for particular ctids. By doing that we can then write a formal test that shows the evolution of tuples through insertion, vacuuming and freezing, testing that the map has been set correctly at each stage. I guess that needs to be done as an isolation test so we have an observer that constrains the xmin in various ways. In light of the multixact bugs, any code that changes the on-disk tuple metadata needs formal tests.

Other than that, the overall concept seems sound. I think we need something for pg_upgrade to rewrite existing VMs. Otherwise a large read-only database would suddenly require a massive revacuum after upgrade, which seems bad. That can wait for now until we all agree this patch is sound.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
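To illustrate the SQL exposure being asked for, usage might look like the sketch below. Note that everything here is hypothetical: no such SQL-level visibilitymap_test() function exists today, and the name, argument list, and the assumed 8192-byte block size are all made up for illustration:

    -- hypothetical SQL wrapper around visibilitymap_test(); not an existing function
    SELECT blkno,
           visibilitymap_test('mytable'::regclass, blkno, 'all_visible') AS all_visible,
           visibilitymap_test('mytable'::regclass, blkno, 'all_frozen')  AS all_frozen
      FROM generate_series(0, pg_relation_size('mytable') / 8192 - 1) AS blkno;

A test could then assert, per page, the invariant discussed above: all_frozen implies all_visible.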
Re: [HACKERS] Bug in bttext_abbrev_convert()
Kevin Grittner wrote:

  Alvaro Herrera alvhe...@2ndquadrant.com wrote: Peter Geoghegan wrote: It would be nice to always have an html report from gcov available on the internet. That would be something useful to automate, IMV.

  http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

  What would it take to get something like that which uses the check-world target instead of just the check target?

I suggest CC'ing Peter as a first measure. I already suggested this (or something similar) to him months ago.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Redundant error messages in policy.c
Daniele,

* Daniele Varrazzo (daniele.varra...@gmail.com) wrote:

  There are 5 different strings (one has a whitespace error); they could be 2. Patch attached.

Fair point. I did try to address the language around policy vs. row security vs. row level security, but didn't look as closely as I should have at the error messages overall. Will address.

Thanks!

Stephen
[HACKERS] Redundant error messages in policy.c
There are 5 different strings (one has a whitespace error); they could be 2. Patch attached.

    postgresql/src/backend$ grep errmsg commands/policy.c | grep policy | sed 's/^ *//'
    errmsg("policy \"%s\" for relation \"%s\" already exists",
    errmsg("policy \"%s\" on table \"%s\" does not exist",
    errmsg("policy \"%s\" for table \"%s\" already exists",
    errmsg("policy \"%s\" for table \"%s\" does not exist",
    errmsg("policy \"%s\" for table \"%s\" does not exist",

-- Daniele

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 6e95ba2..11efc9f 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -563,7 +563,7 @@ CreatePolicy(CreatePolicyStmt *stmt)
 	if (HeapTupleIsValid(policy_tuple))
 		ereport(ERROR,
 				(errcode(ERRCODE_DUPLICATE_OBJECT),
-				 errmsg("policy \"%s\" for relation \"%s\" already exists",
+				 errmsg("policy \"%s\" for table \"%s\" already exists",
 						stmt->policy_name, RelationGetRelationName(target_table))));
 
 	values[Anum_pg_policy_polrelid - 1] = ObjectIdGetDatum(table_id);
@@ -735,7 +735,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
 	if (!HeapTupleIsValid(policy_tuple))
 		ereport(ERROR,
 				(errcode(ERRCODE_UNDEFINED_OBJECT),
-				 errmsg("policy \"%s\" on table \"%s\" does not exist",
+				 errmsg("policy \"%s\" for table \"%s\" does not exist",
 						stmt->policy_name,
 						RelationGetRelationName(target_table))));
 
@@ -977,7 +977,7 @@ get_relation_policy_oid(Oid relid, const char *policy_name, bool missing_ok)
 		if (!missing_ok)
 			ereport(ERROR,
 					(errcode(ERRCODE_UNDEFINED_OBJECT),
-					 errmsg("policy \"%s\" for table \"%s\" does not exist",
+					 errmsg("policy \"%s\" for table \"%s\" does not exist",
 							policy_name, get_rel_name(relid))));
 
 		policy_oid = InvalidOid;
Re: [HACKERS] raw output from copy
Hi

here is a version with both direction support.

    postgres=# copy foo from '/tmp/1.jpg' (format raw);
    COPY 1
    Time: 93.021 ms
    postgres=# \dt+ foo
                        List of relations
    ┌────────┬──────┬───────┬───────┬────────┬─────────────┐
    │ Schema │ Name │ Type  │ Owner │  Size  │ Description │
    ╞════════╪══════╪═══════╪═══════╪════════╪═════════════╡
    │ public │ foo  │ table │ pavel │ 256 kB │             │
    └────────┴──────┴───────┴───────┴────────┴─────────────┘
    (1 row)

    postgres=# \copy foo to '~/3.jpg' (format raw)
    COPY 1
    Time: 2.401 ms

Regards

Pavel

2015-07-02 17:02 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

  Andrew Dunstan and...@dunslane.net writes: Does the COPY line protocol even support binary data?

  The protocol, per se, just transmits a byte stream. There is a field in the CopyInResponse/CopyOutResponse messages that indicates whether a text or binary copy is being done. One thing we'd have to consider is whether raw mode is sufficiently different from binary to justify an additional value for this field, and if so whether that constitutes a protocol break. IIRC, psql wouldn't really care; it just transfers the byte stream to or from the target file, regardless of text or binary mode. But there might be other client libraries that are smarter and expect binary mode to mean the binary file format specified in the COPY reference page. So there may be value in being explicit about raw mode in these messages. A key point in all this is that people who need raw transfer probably need it in both directions, a point that your SELECT proposal cannot satisfy, but hacking COPY could. So I lean towards the latter really.

  regards, tom lane

commit 5599347d6b0b29a2674d465b3ff03164fce59810
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Mon Jul 6 23:18:18 2015 +0200

    COPY FROM/TO (FORMAT RAW)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2850b47..4b7b64d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -190,7 +190,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Selects the data format to be read or written:
       <literal>text</>, <literal>csv</> (Comma Separated Values),
-      or <literal>binary</>.
+      <literal>binary</> or <literal>raw</literal>.
       The default is <literal>text</>.
      </para>
     </listitem>
@@ -881,6 +881,23 @@
    OIDs to be shown as null if that ever proves desirable.
   </para>
  </refsect3>
 </refsect2>
+
+ <refsect2>
+  <title>Raw Format</title>
+
+  <para>
+   The <literal>raw</literal> format option causes all data to be
+   stored/read as binary format rather than as text. It shares format
+   for data with <literal>binary</literal> format. This format doesn't
+   use any metadata - only row data in network byte order are exported
+   or imported.
+  </para>
+
+  <para>
+   Because this format doesn't support any delimiter, only one value
+   can be exported or imported. NULL values are not allowed.
+  </para>
+ </refsect2>
 </refsect1>
 
 <refsect1>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8904676..2ad7eb1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -92,6 +92,11 @@ typedef enum EolType
  * it's faster to make useless comparisons to trailing bytes than it is to
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
  * when we have to do it the hard way.
+ *
+ * COPY supports three modes: text, binary and raw. The text format is plain
+ * text multiline format with specified delimiter. The binary format holds
+ * metadata (numbers, sizes) and data. The raw format holds data only and
+ * only one non NULL value can be processed.
  */
 typedef struct CopyStateData
 {
@@ -113,6 +118,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	bool		binary;			/* binary format? */
+	bool		raw;			/* required raw binary? */
 	bool		oids;			/* include OIDs? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -202,6 +208,9 @@ typedef struct CopyStateData
 	char	   *raw_buf;
 	int			raw_buf_index;	/* next byte to process */
 	int			raw_buf_len;	/* total # of bytes stored */
+
+	/* field for RAW mode */
+	bool		row_processed;	/* true, when first row was processed */
 } CopyStateData;
 
 /* DestReceiver for COPY (SELECT) TO */
@@ -345,9 +354,16 @@ SendCopyBegin(CopyState cstate)
 		/* new way */
 		StringInfoData buf;
 		int			natts = list_length(cstate->attnumlist);
-		int16		format = (cstate->binary ? 1 : 0);
+		int16		format;
 		int			i;
 
+		if (cstate->raw)
+			format = 2;
+		else if (cstate->binary)
+			format = 1;
+		else
+			format = 0;
+
 		pq_beginmessage(&buf, 'H');
 		pq_sendbyte(&buf, format);	/* overall format */
 		pq_sendint(&buf, natts, 2);
@@ -359,7 +375,7 @@ SendCopyBegin(CopyState cstate)
 	else if
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure mmonc...@gmail.com wrote:

  On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

      postgres=# select * from dblink('a', 'b', 'c');
      ERROR:  function dblink(unknown, unknown, unknown) is not unique

  Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic solutions. One was to build support for:

      SELECT * FROM srf() AS TYPE OF(foo);

  The second idea I think is actually SQL standard if I recall correctly:

      SELECT * FROM CAST(srf() AS foo) x;

  Currently this works:

      select * from cast (row(11,'l','{a11,b11,c11}') as foo);
       f1 | f2 |      f3
      ----+----+---------------
       11 | l  | {a11,b11,c11}
      (1 row)

  But this fails:

      select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
      ERROR:  cannot cast type record to foo

  Comments in the source have this to say:

      /*
       * coerce_record_to_complex
       *		Coerce a RECORD to a specific composite type.
       *
       * Currently we only support this for inputs that are RowExprs or
       * whole-row Vars.
       */

  That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

  Yeah. FWIW, here's my 0.02$: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled over all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable because of using syntax to describe the record. Corey's proposal, overloading issues aside, appears to neatly deal with this problem because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

      CREATE FUNCTION remote_call(...) RETURNS ... AS
      $$
        SELECT dblink(...) AS r(...)
      $$ language sql;

  ...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea...I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'...then you could:

      SELECT * FROM remote_call(...) RETURNS SETOF foo;

  Inside remote_call, you'd see something like:

      SELECT dblink(...) AS OUT;

  As to the proposed syntax, I would vote to support the SQL standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you because FWICT it does not support wrapping.

  merlin

Your experiences with dblink are very similar to mine. The whole issue arose for me as an outcropping of my Poor Man's Parallel Processing extension (still not released but currently working for us in production internally). At some point I had to do dblink_get_result(...) as t(...), and not only did I have to render the structure as a string, I was going to have to execute that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was going to have to re-code in C or plv8. Overall those calls aren't terribly expensive (it's working in production - for us - without this dblink modification), but a cleaner solution would be better.
[HACKERS] Repeated pg_upgrade buildfarm failures on binturong
Hi,

Binturong has repeatedly failed with errors like:
ERROR: could not open file "base/16400/32052": No such file or directory
E.g. http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A20%3A24

It's not just master that's failing; even older branches report odd errors:
connection to database failed: FATAL: could not open relation mapping file "global/pg_filenode.map": No such file or directory
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=binturong&dt=2015-07-06%2014%3A53%3A10

Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bypassing SQL lexer and parser
On Jul 6, 2015 at 7:16 PM, Данила Поярков d...@dannote.net wrote:

Hello! What is the best starting point into PostgreSQL's internal APIs for operating directly on the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to the MySQL Cluster NDB API or the InnoDB internal API (the late HailDB and Embedded InnoDB). Does PostgreSQL support any other types of plugins/extensions besides FDWs and custom data types? I mean, is it possible to start another daemon within Postgres without modifying the main codebase?

That sounds a lot like a background worker.

In case you are wondering why anyone could need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of the popular RDBMSs, without the extra URL-to-SQL conversion. Yes, I know that a lot of hard work went into query planning, optimization, etc., but I still really wish to do this for some very specific needs. I'm not going to completely replace SQL, and thus will be happy to do those manipulations on a SQL:2008-compliant DBMS.

Good luck with that :-) -- Guillaume
[HACKERS] [PATCH] Add missing \ddp psql command
Quickie patch for spotted missing psql \ddp tab-completion. 0001-Add-missing-ddp-psql-tab-completion.patch Description: Binary data -- David Christensen PostgreSQL Team Manager End Point Corporation da...@endpoint.com 785-727-1171 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 10:03 AM, Sawada Masahiko wrote:

The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed grammar. If we set the remote disaster recovery site up as a synch replica, we would get some big latencies even though we use quorum commit. So I think this case Fujii-san suggested is a good configuration, and many users would want to use it. I tend to agree with combining quorum and prioritization into one GUC parameter while keeping backward compatibility.

OK, so here are the arguments pro-JSON and anti-JSON:

pro-JSON:
* standard syntax which is recognizable to sysadmins and devops.
* can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config.
* can add group labels (see below)

anti-JSON:
* more verbose
* syntax is not backwards-compatible, we'd need a switch
* people will want to use line breaks, which we can't support

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show the sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

This says: as the "remotes" group, synch with a quorum of 2 servers in london and a quorum of 1 server in NC. This assumes for backwards-compatibility reasons that we support a priority list of groups of quorums, and not some other combination (see below for more on this).

The advantage of having these labels is that it becomes easy to represent statuses for them:

sync_group       state     definition
remotes          waiting   { "london_servers" : { "quorum" ...
london_servers   synced    { "quorum" : 2, "servers" : ...
nc_servers       waiting   { "quorum" : 1, "servers" : [ ...

Without labels, we force the DBA to track groups by raw definitions, which would be difficult. Also, there's the question of what we do on reload with the statuses of any synch groups which are currently in-process, if we don't have a stable key with which to identify groups.

The other grammar issue has to do with the nesting nature of quorums and priorities. A theoretical user could want:

* a priority list of quorum groups
* a quorum group of priority lists
* a quorum group of quorum groups
* a priority list of quorum groups of quorum groups
* a quorum group of quorum groups of priority lists
... etc.

I don't really see any end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
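For what it's worth, the JSONB-manipulation argument can be made concrete. A sketch only, assuming the blob really were stored in synchronous_standby_names as the proposal suggests, and using jsonb_set() (new in 9.5); the result would still have to be written back with ALTER SYSTEM SET by the calling tool:

-- bump nc_servers' quorum from 1 to 2 in the blob above
-- (assumes the GUC holds the JSON document; proposal only)
SELECT jsonb_set(current_setting('synchronous_standby_names')::jsonb,
                 '{remotes,nc_servers,quorum}', '2');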
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On Mon, Jul 6, 2015 at 12:20 AM, Marc Mamin m.ma...@intershop.de wrote: Hello, First: KUDO !!! The release notes are extremely promising in regard to performance improvements :-) I've made some (dirty) tests with BRIN and btree_gin. (on a smalll Windows laptop ...) just a few remarks: - btree_gin deserve a better description than that: However, they are useful for GIN testing and as a base for developing other GIN operator classes. I came to similar times between btree and gin for indexes on category columns (ca 20 to 200 distinct values) For me, gin clearly wins here thanks to the index size difference.

I reached the same conclusion for things with a higher number of distinct values, but still several copies of each distinct value. That said, I don't think we should change the description until the BETWEEN issue is resolved. That is a pretty serious limitation, I think. Or at least, if we invite people to use it for this purpose, we would have to warn them that it is not suitable for range queries. It wouldn't be so bad if it merely didn't support them well, but as things are now it positively pulls the planner away from better options, because it looks falsely attractive. I've looked into fixing it myself, but I'm afraid it is beyond me.

Cheers, Jeff
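Until that is fixed, a workaround for small category domains is to expand the range into an explicit IN list, which the gin opclass handles as a set of equality probes (tgin_1 being the test table from upthread):

-- degrades to a full index scan with btree_gin:
SELECT * FROM tgin_1 WHERE cat_id BETWEEN 1 AND 2;

-- uses the index as intended:
SELECT * FROM tgin_1 WHERE cat_id IN (1, 2);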
[HACKERS] Bypassing SQL lexer and parser
Hello! What is the best starting point into PostgreSQL's internal APIs for operating directly on the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to the MySQL Cluster NDB API or the InnoDB internal API (the late HailDB and Embedded InnoDB). Does PostgreSQL support any other types of plugins/extensions besides FDWs and custom data types? I mean, is it possible to start another daemon within Postgres without modifying the main codebase?

In case you are wondering why anyone could need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of the popular RDBMSs, without the extra URL-to-SQL conversion. Yes, I know that a lot of hard work went into query planning, optimization, etc., but I still really wish to do this for some very specific needs. I'm not going to completely replace SQL, and thus will be happy to do those manipulations on a SQL:2008-compliant DBMS.

Thanks in advance, dannote -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote: Or the other way to look at it could be separate out fields which are required for parallel scan which is done currently by forming a separate structure ParallelHeapScanDescData. I was suggesting that you separate out both the normal scan fields and the partial scan fields, that way we're sure that rs_nblocks is not accessed during a parallel scan. Or, you could try wrapping the parts of heapam.c that are affected by parallelism into new static functions. The reason why partial scan can't be mixed with sync scan is that in parallel scan, it performs the scan of heap by synchronizing blocks (each parallel worker scans a block and then asks for a next block to scan) among parallel workers. Now if we try to make sync scans work along with it, the synchronization among parallel workers will go for a toss. It might not be impossible to make that work in some way, but not sure if it is important enough for sync scans to work along with parallel scan. I haven't tested it, but I think it would still be helpful. The block accesses are still in order even during a partial scan, so why wouldn't it help? You might be concerned about the reporting of a block location, which would become more noisy with increased parallelism. But in my original testing, sync scans weren't very sensitive to slight deviations, because of caching effects. tqueue.c is mainly designed to pass tuples between parallel workers and currently it is used in Funnel operator to gather the tuples generated by all the parallel workers. I think we can use it for any other operator which needs tuple communication among parallel workers. Some specifics of the Funnel operator seem to be a part of tqueue, which doesn't make sense to me. For instance, reading from the set of queues in a round-robin fashion is part of the Funnel algorithm, and doesn't seem suitable for a generic tuple communication mechanism (that would never allow order-sensitive reading, for example). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Enhanced ALTER OPERATOR
Hello hackers. This is the fifth version of the patch (the fourth was unsuccessful :)). I added documentation and did a small refactoring. Thanks. -- Uriy Zhuravlev Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/doc/src/sgml/ref/alter_operator.sgml b/doc/src/sgml/ref/alter_operator.sgml
index bdb2d02..4aaeed0 100644
--- a/doc/src/sgml/ref/alter_operator.sgml
+++ b/doc/src/sgml/ref/alter_operator.sgml
@@ -26,6 +26,11 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } ) SET SCHEMA <replaceable>new_schema</replaceable>
+
+ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } ) SET (
+[, RESTRICT = { <replaceable class="parameter">res_proc</replaceable> | NULL } ]
+[, JOIN = { <replaceable class="parameter">join_proc</replaceable> | NULL } ]
+)
 </synopsis>
 </refsynopsisdiv>
@@ -34,8 +39,7 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 <para>
  <command>ALTER OPERATOR</command> changes the definition of
- an operator. The only currently available functionality is to change the
- owner of the operator.
+ an operator.
 </para>
@@ -98,6 +102,25 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
    </para>
   </listitem>
  </varlistentry>
+
+ <varlistentry>
+  <term><replaceable class="parameter">res_proc</replaceable></term>
+  <listitem>
+   <para>
+    The restriction selectivity estimator function for this operator.
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
+  <term><replaceable class="parameter">join_proc</replaceable></term>
+  <listitem>
+   <para>
+    The join selectivity estimator function for this operator.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
</refsect1>
@@ -109,6 +132,13 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 <programlisting>
ALTER OPERATOR @@ (text, text) OWNER TO joe;
</programlisting></para>
+
+ <para>
+Change the restriction and join selectivity estimator function of a custom operator <literal>a && b</literal> for type <type>int[]</type>:
+<programlisting>
+ALTER OPERATOR && (_int4, _int4) SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);
+</programlisting></para>
+
</refsect1>

<refsect1>
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..f5ff381 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -28,8 +28,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/parse_func.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -53,7 +55,7 @@
 static Oid OperatorShellMake(const char *operatorName,
 			Oid leftTypeId,
 			Oid rightTypeId);
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
+static void ShellOperatorUpd(Oid baseId, Oid commId, Oid negId);
 static Oid get_other_operator(List *otherOp,
 			Oid otherLeftTypeId, Oid otherRightTypeId,
@@ -563,7 +565,7 @@
 	commutatorId = operatorObjectId;

 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		ShellOperatorUpd(operatorObjectId, commutatorId, negatorId);

 	return address;
 }
@@ -633,7 +635,7 @@
 }

 /*
- * OperatorUpd
+ * ShellOperatorUpd
  *
  * For a given operator, look up its negator and commutator operators.
  * If they are defined, but their negator and commutator fields
@@ -642,7 +644,7 @@
  * which are the negator or commutator of each other.
  */
 static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+ShellOperatorUpd(Oid baseId, Oid commId, Oid negId)
 {
 	int			i;
 	Relation	pg_operator_desc;
@@ -864,3 +866,150 @@
 	return myself;
 }
+
+/*
+ * Operator update aka ALTER OPERATOR for RESTRICT, JOIN
+ */
+void OperatorUpd(Oid classId,
+				 Oid baseId,
+				 List *operator_params)
+{
+	int			i;
+	ListCell   *pl;
+	Relation	catalog;
+	HeapTuple	tup;
+	Oid			operator_param_id = 0;
+	bool		nulls[Natts_pg_operator];
+	bool		replaces[Natts_pg_operator];
+	Datum		values[Natts_pg_operator];
+
+	for (i = 0; i < Natts_pg_operator; ++i)
+	{
+		values[i] = (Datum) 0;
+		replaces[i] = false;
+		nulls[i] = false;
+	}
+
+	catalog = heap_open(classId, RowExclusiveLock);
+	tup = SearchSysCacheCopy1(OPEROID, ObjectIdGetDatum(baseId));
+	if (HeapTupleIsValid(tup))
+	{
+		/*
+		 * loop over the definition list and
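As a quick sanity check of the new command, the catalog can be inspected before and after. A sketch using the intarray operator from the patch's docs example (assumes contrib/intarray is installed):

-- run the new command, then confirm oprrest/oprjoin were updated
ALTER OPERATOR && (_int4, _int4)
    SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);

SELECT oprname, oprrest, oprjoin
FROM pg_operator
WHERE oprname = '&&'
  AND oprleft = '_int4'::regtype
  AND oprright = '_int4'::regtype;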
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Thu, Jul 2, 2015 at 9:31 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Thu, Jul 2, 2015 at 5:44 PM, Beena Emerson memissemer...@gmail.com wrote:

Hello, There has been a lot of discussion. It has become a bit confusing. I am summarizing my understanding of the discussion till now. Kindly let me know if I missed anything important.

Backward compatibility: We have to provide support for the current format and behavior for synchronous replication (the first running standby from the list s_s_names). In case the new format does not include a GUC, then a special value is to be specified for s_s_names to indicate that.

Priority and quorum: Quorum treats all the standbys with the same priority, while in priority behavior each one has a different priority and ACKs must be received from the specified k lowest-priority servers. I am not sure how combining both will work out. Mostly we would like to have some standbys from each data center to be in sync. Can it not be achieved by quorum only?

So you're wondering if there is a use case where both quorum and priority are used together?

For example, please imagine the case where you have two standby servers (say A and B) in the local site, and one standby server (say C) in a remote disaster recovery site. You want to set up sync replication so that the master waits for ACK from either A or B, i.e., the setting of 1(A, B). Also, only when either A or B crashes, you want to make the master wait for ACK from either the remaining local standby or C. On the other hand, you don't want to use a setting like 1(A, B, C), because in this setting C can be the sync standby when the master crashes, and both A and B might be far behind C. In that case, you would need to promote the remote standby server C to the new master... this is what you'd like to avoid. The setting that you need is 1(1[A, C], 1[B, C]) in Michael's proposed grammar.

If we set the remote disaster recovery site up as a synch replica, we would get some big latencies even though we use quorum commit. So I think this case Fujii-san suggested is a good configuration, and many users would want to use it. I tend to agree with combining quorum and prioritization into one GUC parameter while keeping backward compatibility.

Regards, -- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
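In configuration terms, that setup would be written roughly as follows. Note this is the proposed grammar only; released servers accept nothing but a flat priority list in synchronous_standby_names:

-- hypothetical syntax from Michael's proposal, not committed anywhere
ALTER SYSTEM SET synchronous_standby_names = '1(1[A, C], 1[B, C])';
SELECT pg_reload_conf();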
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
On 07/06/2015 04:38 AM, Michael Paquier wrote: On Sat, Jul 4, 2015 at 3:57 AM, Heikki Linnakangas wrote: Hmm. Why is install.bat not like build.bat, i.e. just a thin wrapper that just calls install.pl, passing all arguments? I guess we just haven't noticed it. And indeed it makes everything more simple, and fixes as well the error reported when install path contains a space. Ok, committed that way. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in bttext_abbrev_convert()
On Mon, Jul 6, 2015 at 12:27 PM, Kevin Grittner kgri...@ymail.com wrote: What would it take to get something like that which uses the check-world target instead of just the check target? Without the additional tests (like the isolation tests), some of these numbers don't reflect the coverage of regularly run tests. While it is of some interest what coverage we get on the 30 second `make check` runs, I would be a lot more interested in what our coverage is when the full 8.5 minute set of regression tests we have in git is run. I agree. Obviously in some cases the coverage is bad because concurrency is naturally required. It's even more glaring that recovery codepaths are always all-red in coverage reports. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in bttext_abbrev_convert()
Alvaro Herrera alvhe...@2ndquadrant.com wrote: Peter Geoghegan wrote: It would be nice to always have an HTML report from gcov available on the internet. That would be something useful to automate, IMV. http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/ What would it take to get something like that which uses the check-world target instead of just the check target? Without the additional tests (like the isolation tests), some of these numbers don't reflect the coverage of regularly run tests. While it is of some interest what coverage we get on the 30-second `make check` runs, I would be a lot more interested in what our coverage is when the full 8.5-minute set of regression tests we have in git is run. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Repeated pg_upgrade buildfarm failures on binturong
Andres Freund and...@anarazel.de writes: Binturon has repeatedly failed with errors like: ERROR: could not open file base/16400/32052: No such file or directory I agree that binturong seems to have something odd going on; but there are a lot of other intermittent pg_upgrade test failures in the buildfarm history, and the general situation is that you can't tell what actually happened because the buildfarm script doesn't capture all the relevant log files. It would be real nice to improve that, but I lack the necessary Perl-fu. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On 07/06/2015 12:20 AM, Marc Mamin wrote: There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan. But the time overhead remind small :) When have we ever stopped users from creating useless indexes? For one thing, just because the index isn't useful *now* doesn't mean it won't be in the future. Now, it would be useful to have a brin_index_effectiveness() function so that DBAs could check for themselves whether they should dump indexes. However, I don't see needing that for 9.5. Are there usage stats in pg_stat_user_indexes for BRIN? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
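Checking whether a BRIN index earns its keep is then the usual stats query, e.g. for the test index from upthread:

-- usage counters for the BRIN index (zero idx_scan = never used)
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'cat_brin_1';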
Re: [HACKERS] Add checksums without --initdb
On Jul 2, 2015, at 3:43 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 07/02/2015 11:28 PM, Andres Freund wrote: On 2015-07-02 22:53:40 +0300, Heikki Linnakangas wrote: Add an enabling-checksums mode to the server where it calculates checksums for anything it writes, but doesn't check or complain about incorrect checksums on reads. Put the server into that mode, and then have a background process that reads through all data in the cluster, calculates the checksum for every page, and writes all the data back. Once that's completed, checksums can be fully enabled.

You'd need, afaics, a bgworker that connects to every database to read pg_class, to figure out what type of page a relfilenode has. And this probably should call back into the relevant AM or such.

Nah, we already assume that every relation data file follows the standard page format, at least enough to have the checksum field at the right location. See FlushBuffer() - it just unconditionally calculates the checksum before writing out the page. (I'm not totally happy about that, but that ship has sailed) - Heikki

So, thinking some more about the design needed to support enabling checksums post-initdb, what about the following?

Introduce a new field in pg_control, data_checksum_state (0 = disabled, 1 = enabling in process, 2 = enabled). This could be set via (say) a pg_upgrade flag when creating a new cluster with --enable-checksums, or by a standalone program that adjusts that field in pg_control. Checksum-enforcing behavior would depend on that setting: 0 is non-enforcing on read and write, 1 enforces checksums on buffer write but ignores them on read, and 2 is the normal enforcing read/write mode. Disabling checksums could be done with this tool as well; it would trivially just cause the checksums to be ignored (or alternately set to 0 on page write, depending on whether we think it matters).

Add new catalog fields pg_database.dathaschecksum and pg_class.relhaschecksum, initially set to false, or true if checksums were enabled at initdb time.

Augment autovacuum to check whether we are currently enabling checksums based on the value in pg_control; if so, loop over any database with !pg_database.dathaschecksum. For any relation in said database with !pg_class.relhaschecksum, read/dirty/write each block to force the checksum to be written out for each page. As each relation is completely checksummed, set relhaschecksum = t. When no relations remain, set pg_database.dathaschecksum = t. (There may need to be some additional considerations for storing the checksum state of global relations or anything else that uses the standard page format and lives outside a specific database, i.e. all shared catalogs, and quite possibly some things I haven't considered yet.)

If data_checksum_state is "enabling" and no databases remain to be enabled, then we can set data_checksum_state to "enabled"; everything then works as expected in the normal enforcing state.

External programs needing to be adjusted:
- pg_resetxlog -- add persistence of the data_checksum_state
- pg_controldata -- add display of the data_checksum_state
- pg_upgrade -- add an --enable-checksums flag to transition a new cluster with the data pages; may need some adjustments for the data_checksum_state field

Possible new tool:
- pg_enablechecksums -- basic tool to set the data_checksum_state flag of pg_control

Other thoughts:
- Do we need a periodic CRC-scanning background worker just to check buffers periodically?
- If so, does this cause any interference with frozen relations?
- What additional changes would be required, or what wrinkles would we have to work out?

David -- David Christensen PostgreSQL Team Manager End Point Corporation da...@endpoint.com 785-727-1171 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
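To make the per-database pass concrete, the worker would essentially be driven by a scan like this. A sketch only: relhaschecksum is the proposed, not-yet-existing catalog column from the design above:

-- relations whose pages still need to be rewritten with checksums
-- (relhaschecksum is hypothetical; it does not exist in any release)
SELECT c.oid::regclass
FROM pg_class c
WHERE NOT c.relhaschecksum
  AND c.relkind IN ('r', 'i', 't', 'm');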
Re: [HACKERS] Fix broken Install.bat when target directory contains a space
Michael Paquier michael.paqu...@gmail.com writes: On Tue, Jul 7, 2015 at 4:19 AM, Heikki Linnakangas hlinn...@iki.fi wrote: Ok, committed that way. Shouldn't this patch be backpatched? In the back branches install.bat does not work correctly with paths containing spaces. I was about to ask the same. AFAICS, the other scripts have been similar one-liners at least since 9.0. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Spurious full-stop in message
postgresql/src/backend$ grep "must be superuser to change bypassrls attribute" commands/user.c | sed 's/ \+//'
errmsg("must be superuser to change bypassrls attribute.")));
errmsg("must be superuser to change bypassrls attribute")));

Patch to fix attached.

diff --git a/src/backend/commands/user.c b/src/backend/commands/user.c
index 3b381c5..5b20994 100644
--- a/src/backend/commands/user.c
+++ b/src/backend/commands/user.c
@@ -301,7 +301,7 @@ CreateRole(CreateRoleStmt *stmt)
 		if (!superuser())
 			ereport(ERROR,
 					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-					 errmsg("must be superuser to change bypassrls attribute.")));
+					 errmsg("must be superuser to change bypassrls attribute")));
 	}
 	else
 	{

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spurious full-stop in message
* Daniele Varrazzo (daniele.varra...@gmail.com) wrote: postgresql/src/backend$ grep must be superuser to change bypassrls attribute commands/user.c | sed 's/ \+//' errmsg(must be superuser to change bypassrls attribute.))); errmsg(must be superuser to change bypassrls attribute))); Patch to fix attached. Will fix. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
On Tue, Jul 7, 2015 at 9:04 AM, Josh Berkus j...@agliodbs.com wrote: Are there usage stats in pg_stat_user_indexes for BRIN? Yes, they are here. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bypassing SQL lexer and parser
On 7/6/15 12:14 PM, Данила Поярков wrote: Hello! What is the best starting point to PostgreSQL internal APIs for operating directly with the storage (performing basic INSERTs, UPDATEs, SELECTs and simple JOINs by hand)? I'm looking for something similar to MySQL Cluster NDB API or InnoDB internal API (the late HailDB and Embedded InnoDB). There is support for plugging into the parser and executor, so that might be a possibility, but... Does the PostgreSQL support any other type of plugins/extensions other than FDW and custom data types? I mean, is it possible to start another daemon within Postgres without slightly modifying the main codebase? In case you are wondering why could anyone need something like that: I'm looking for a way to implement a small subset of HTSQL (see [http://htsql.org/], of course not with the whole HTTP protocol) natively in one of popular RDBMS without extra URL-to-SQL conversion. Yes, I know that a lot of hard work was done for making query plans, optimizing etc. but I'm still really wish to do this for some very specific needs. I'm not going to completely replace the SQL and thus will be happy to do those manipulations on a SQL 2008-compliant DBMS. AFAIK HTSQL is very happy producing SQL; why not just let it hand SQL to Postgres (which it already does...) Have you by chance talked to Clark or Kirill about this? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com

I ran some performance tests on a 16-core machine with large shared buffers, so there is no IO involved. With the default value of cpu_tuple_comm_cost, a parallel plan is not generated even when selecting 100K records out of 40 million. So I changed the value to '0' and collected the performance readings.

Here are the performance numbers:

selectivity   seq scan          parallel scan (ms)
(millions)    (ms)        2 workers   4 workers   8 workers
0.1           11498.93     4821.40     3305.84     3291.90
0.4           10942.98     4967.46     3338.58     3374.00
0.8           11619.44     5189.61     3543.86     3534.40
1.5           12585.51     5718.07     4162.71     2994.90
2.7           14725.66     8346.96    10429.05     8049.11
5.4           18719.00    20212.33    21815.19    19026.99
7.2           21955.79    28570.74    28217.60    27042.27

The average table row size is around 500 bytes and the query's selected column width is around 36 bytes. When the query selectivity goes above 10% of the total table records, parallel scan performance drops off.

Regards, Hari Babu Fujitsu Australia -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
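For anyone reproducing these numbers: the runs simply zeroed the new GUC before issuing the test query. A sketch with hypothetical table and column names; cpu_tuple_comm_cost exists only with the patch under review applied:

-- cpu_tuple_comm_cost is the GUC added by the parallel seq scan patch
SET cpu_tuple_comm_cost = 0;
-- big_tbl, payload and cat_id are placeholder names for the test schema
EXPLAIN ANALYZE SELECT payload FROM big_tbl WHERE cat_id = 10;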
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote:

pro-JSON:
* standard syntax which is recognizable to sysadmins and devops.
* can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config.
* can add group labels (see below)

If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should live in a separate place than postgresql.conf, though not within the catalog tables, be manageable using an SQL interface, and be reloaded in backends using SIGHUP.

anti-JSON:
* more verbose
* syntax is not backwards-compatible, we'd need a switch

This point is valid as well in the pro-JSON portion.

* people will want to use line breaks, which we can't support

Yes, this is caused by the fact of using a GUC. For a simple formula this seems fine to me though; that's what we have today for s_s_names, and using a formula is not much longer in character size than what we have now.

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

The JSON blob managing sync node information could contain additional JSON objects that register a set of nodes as a given group. More easily, you could use, say, the following structure to store the blobs:
- pg_syncinfo/global, to store the root of the formula, which could use groups.
- pg_syncinfo/groups/$GROUP_NAME to store a set of JSON blobs representing a group.

The advantage of having these labels is that it becomes easy to represent statuses for them:

sync_group       state     definition
remotes          waiting   { "london_servers" : { "quorum" ...
london_servers   synced    { "quorum" : 2, "servers" : ...
nc_servers       waiting   { "quorum" : 1, "servers" : [ ...

Without labels, we force the DBA to track groups by raw definitions, which would be difficult. Also, there's the question of what we do on reload with any statuses of synch groups which are currently in-process, if we don't have a stable key with which to identify groups.

Well, yes.

The other grammar issue has to do with the nesting nature of quorums and priorities. A theoretical user could want:
* a priority list of quorum groups
* a quorum group of priority lists
* a quorum group of quorum groups
* a priority list of quorum groups of quorum groups
* a quorum group of quorum groups of priority lists
... etc.

I don't really see any possible end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5.

I am not sure I agree that this will simplify the work. Currently s_s_names already has 1 level, and we want to append groups to each element of it as well, meaning that we'll need at least 2 levels of nesting.
-- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 2015-07-07 AM 02:56, Josh Berkus wrote:

Re: group labels: I see a lot of value in being able to add names to quorum groups. Think about how this will be represented in system views; it will be difficult to show sync status of any quorum group in any meaningful way if the group has no label, and any system-assigned label would change unpredictably from the user's perspective. To give a JSON example, let's take the case of needing to sync to two of the servers in either London or NC:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

What if we write the above as:

remotes-1 (london_servers-2 [london1, london2, london3], nc_servers-1 [nc1, nc2])

That requires only slightly altering the proposed format, that is, prepending the sync group label string to the quorum number. The monitoring view can be made to internally generate JSON output (if needed) from it. It does not seem very ALTER SYSTEM SET friendly, but there are trade-offs either way. Just my 2c.

Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Asynchronous execution on FDW
Hello, thank you for looking at this.

If it is acceptable to reconstruct the executor nodes to have an additional return state PREP_RUN or such (which means one more call is needed before the first tuple), I'll modify the whole executor in the next patch to handle that state. I hadn't taken the advice I received so far in this direction, but I have come to think that it is the most reasonable way to solve this.

==

- It was a problem when to give the first kick for async exec. It is not in the ExecInit phase, and the ExecProc phase does not fit either. An extra phase ExecPreProc or something would be too invasive. So I tried a pre-exec callback. Any init-node can register callbacks on its turn, and the registered callbacks are called just before the ExecProc phase in the executor. The first patch adds functions and structs to enable this.

At a quick glance, I think this has all the same problems as starting the execution at ExecInit phase. The correct way to do this is to kick off the queries in the first IterateForeignScan() call. You said that ExecProc phase does not fit - why not?

Execution nodes are expected to return the first tuple if available, but asynchronous execution cannot return the first tuple immediately. Simultaneous execution for the first tuple on every foreign node is more crucial than asynchronous fetching in many cases, especially for cases like sort/agg pushdown on FDWs. The reason ExecProc does not fit is that a first loop that returns no tuple looked like it would impact too large a portion of the executor.

It was my mistake that the patch doesn't address the problem with parameterized paths. Parameterized paths should be executed within ExecProc loops, so the patch would become like the following:

- To gain the advantage of kicking off execution before the first ExecProc loop, non-parameterized paths are started using the callback feature this patch provides.

- Parameterized paths need the upper nodes executed before they can start execution, so they should be started in the ExecProc loop, but run asynchronously if possible.

This is rather a makeshift solution for the problem, but considering the current trend of parallelism, it might be time to make the executor fit parallel execution. If it is acceptable to reconstruct the executor nodes to have an additional return state PREP_RUN or such (which means one more call is needed before the first tuple), I'll modify the whole executor in the next patch to handle that state. I'd hate my stupidity if you had already suggested this kind of solution with "do it in ExecProc" :(

regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.5 alpha: some small comments on BRIN and btree_gin
Marc Mamin wrote: - BRIN cost: I've made a silly test, where all distinct values exist in all BRIN page ranges: INSERT into tbrin_1 (cat_id, ) SELECT s%20, ... FROM generate_series(1,300 )s; CREATE INDEX cat_brin_1 on tbrin_1 using BRIN (cat_id)with (pages_per_range=64); SELECT * from tbrin_1 WHERE cat_id=10; http://explain.depesz.com/s/9YQR There seems to be no fence against useless BRIN indexes that would allow a fallback on a table scan. But the time overhead remind small :) Hmm, I guess the costing function for brin could stand some improvement. Clearly we're not covering all bases. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Mon, Jul 6, 2015 at 10:54 PM, Jeff Davis pg...@j-davis.com wrote: On Mon, 2015-07-06 at 10:37 +0530, Amit Kapila wrote: Or the other way to look at it could be separate out fields which are required for parallel scan which is done currently by forming a separate structure ParallelHeapScanDescData. I was suggesting that you separate out both the normal scan fields and the partial scan fields, that way we're sure that rs_nblocks is not accessed during a parallel scan.

In the patch, rs_nblocks is used in partial scans as well; only the way it is initialized has changed.

Or, you could try wrapping the parts of heapam.c that are affected by parallelism into new static functions.

Sounds sensible to me, but I would like to hear from Robert before making this change, in case he has different opinions about this point, as he originally wrote this part of the patch.

The reason why partial scan can't be mixed with sync scan is that in parallel scan, it performs the scan of heap by synchronizing blocks (each parallel worker scans a block and then asks for a next block to scan) among parallel workers. Now if we try to make sync scans work along with it, the synchronization among parallel workers will go for a toss. It might not be impossible to make that work in some way, but not sure if it is important enough for sync scans to work along with parallel scan. I haven't tested it, but I think it would still be helpful. The block accesses are still in order even during a partial scan, so why wouldn't it help? You might be concerned about the reporting of a block location, which would become more noisy with increased parallelism. But in my original testing, sync scans weren't very sensitive to slight deviations, because of caching effects.

I am not sure how many blocks of difference could be considered an acceptable deviation. In theory, making a parallel scan perform a sync scan could lead to a difference of multiple blocks; consider the case where there are 32 or more workers participating in the scan and each gets one block to scan: it is possible that the first worker performs the scan of the 1st block after the 32nd worker performs the scan of the 32nd block (and it could lead to even bigger differences).

tqueue.c is mainly designed to pass tuples between parallel workers and currently it is used in Funnel operator to gather the tuples generated by all the parallel workers. I think we can use it for any other operator which needs tuple communication among parallel workers. Some specifics of the Funnel operator seem to be a part of tqueue, which doesn't make sense to me. For instance, reading from the set of queues in a round-robin fashion is part of the Funnel algorithm, and doesn't seem suitable for a generic tuple communication mechanism (that would never allow order-sensitive reading, for example).

Okay, this makes sense to me; I think it is better to move the Funnel-operator-specific parts out of tqueue.c unless Robert or anybody else feels otherwise.

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP.

I'm not following this at all. What are you saying here?

I don't really see any possible end to the possible permutations, which is why it would be good to establish some real use cases now in order to figure out what we really want to support. Absent that, my inclination is that we should implement the simplest possible thing (i.e. no nesting) for 9.5. I am not sure I agree that this will simplify the work. Currently s_s_names already has 1 level, and we want to append groups to each element of it as well, meaning that we'll need at least 2 levels of nesting.

Well, we have to draw a line somewhere, unless we're going to support infinite recursion. And if we are going to support infinite recursion, any kind of compact syntax for a GUC isn't even worth talking about ...

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jul 7, 2015 at 6:19 AM, Haribabu Kommi kommi.harib...@gmail.com wrote: On Fri, Jul 3, 2015 at 10:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: Attached, find the rebased version of patch. Note - You need to first apply the assess-parallel-safety patch which you can find at: http://www.postgresql.org/message-id/CAA4eK1JjsfE_dOsHTr_z1P_cBKi_X4C4X3d7Nv=vwx9fs7q...@mail.gmail.com I ran some performance tests on a 16 core machine with large shared buffers, so there is no IO involved. With the default value of cpu_tuple_comm_cost, parallel plan is not getting generated even if we are selecting 100K records from 40 million records. So I changed the value to '0' and collected the performance readings. For reasonable default values for these parameters, still more testing is required. I think instead of 0, tests with 0.001 or 0.0025 for default of cpu_tuple_comm_cost and 100 or 1000 for default of parallel_setup_cost would have been more interesting. Here are the performance numbers: The average table row size is around 500 bytes and query selection column width is around 36 bytes. when the query selectivity goes more than 10% of total table records, the parallel scan performance is dropping. These are quite similar to what I have seen in my initial tests, now I think if you add some complex condition in the filter, you will see gains for even 25% or more selectivity (I have added factorial 10 calculation in filter to mimic the complex filter condition). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Memory Accounting v11
On 15 June 2015 at 07:43, Jeff Davis pg...@j-davis.com wrote: This patch tracks memory usage (at the block level) for all memory contexts. Individual palloc()s aren't tracked; only the new blocks allocated to the memory context with malloc(). It also adds a new function, MemoryContextMemAllocated() which can either retrieve the total allocated for the context, or it can recurse and sum up the allocations for all subcontexts as well. This is intended to be used by HashAgg in an upcoming patch that will bound the memory and spill to disk. Previous discussion here: http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop Previous concerns: * There was a slowdown reported of around 1-3% (depending on the exact version of the patch) on an IBM power machine when doing an index rebuild. The results were fairly noisy for me, but it seemed to hold up. See http://www.postgresql.org/message-id/CA+Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajp3cro9db...@mail.gmail.com * Adding a struct field to MemoryContextData may be bad for the CPU caching behavior, and may be the cause of the above slowdown.

I've been looking at this patch and trying to reproduce the reported slowdown by using Tomas' function to exercise palloc() with minimal overhead from other code: https://github.com/tvondra/palloc_bench

I also wanted to attempt to determine whether the slowdown was due to the mem_allocated maintenance or down to the struct size increasing. I decided to test this by simply commenting out all of the context->mem_allocated += / -= updates while keeping the mem_allocated field in the struct, but I've been really struggling to see any sort of performance decrease anywhere. I'm just getting too much variation in the test results to get any sort of idea. I've attached a spreadsheet of the results I saw.

It would be really good if Robert were able to test on the IBM PPC with just the extra field in the struct, so we can see whether the 1-3% lies there or in the overhead of keeping mem_allocated up-to-date.

My main question here is: how sure are you that none of your intended use cases will need to call MemoryContextMemAllocated with recurse = true? I'd imagine if you ever have to use MemoryContextMemAllocated(ctx, true) then we'll all be sitting around with our stopwatches out to see if the call incurs too much of a performance penalty.

Other small issues:

+ oldblksize = block->endptr - ((char *) block);
+
  block = (AllocBlock) realloc(block, blksize);
  if (block == NULL)
      return NULL;
+
+ context->mem_allocated += blksize - oldblksize;
+

Shouldn't you be setting oldblksize after the if? I'd imagine the realloc() failure is rare, but it just seems better to do it just before it's required and only when required.

+ if (recurse)
+ {
+     MemoryContext child = context->firstchild;
+     for (child = context->firstchild;
+          child != NULL;

Here there's a double assignment of child.

*** 632,637 **** AllocSetDelete(MemoryContext context)
--- 637,644 ----
  {
      AllocBlock next = block->next;

+     context->mem_allocated -= block->endptr - ((char *) block);
+

I might be mistaken here, but can't you just set context->mem_allocated = 0; after that loop? Or maybe it would be an improvement to only do the decrement if MEMORY_CONTEXT_CHECKING is defined, then Assert() that mem_allocated == 0 after the loop...

One other thing that I don't remember seeing discussed would be to just store a List in each context which would hold all of the mem_allocated counters that need to be updated during each allocation and deallocation of memory.
The list would be set up once when the context is created. When a child context is set up we'd just walk up the parent context chain and list_concat() all of the parents' lists onto the child's list. Would this method add too much overhead when a context is deleted? Has this been explored already?

Regards

David Rowley

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

mem_accounting_benchmark.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Tue, Jul 7, 2015 at 12:51 PM, Josh Berkus j...@agliodbs.com wrote: On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP. I'm not following this at all. What are you saying here?

A JSON string is longer in terms of number of characters than a formula because it contains key names, and those key names are usually repeated several times, making it harder to read in a configuration file. So what I am saying is that we do not save it as a GUC, but as separate metadata that can be accessed with a set of SQL functions to manipulate it.

-- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On 07/06/2015 09:56 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 12:51 PM, Josh Berkus j...@agliodbs.com wrote: On 07/06/2015 06:40 PM, Michael Paquier wrote: On Tue, Jul 7, 2015 at 2:56 AM, Josh Berkus j...@agliodbs.com wrote: pro-JSON: * standard syntax which is recognizable to sysadmins and devops. * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make additions/deletions from the synch rep config. * can add group labels (see below) If we go this way, I think that managing a JSON blob with a GUC parameter is crazy; it is way longer in character size than a simple formula because of the key names. Hence, this JSON blob should be in a separate place than postgresql.conf, not within the catalog tables, manageable using an SQL interface, and reloaded in backends using SIGHUP. I'm not following this at all. What are you saying here? A JSON string is longer in terms of number of characters than a formula because it contains key names, and those key names are usually repeated several times, making it harder to read in a configuration file. So what I am saying is that we do not save it as a GUC, but as separate metadata that can be accessed with a set of SQL functions to manipulate it.

Where, though? Someone already pointed out the issues with storing it in a system catalog, and adding an additional .conf file with a different format is too horrible to contemplate.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
Josh Berkus wrote:

'{ "remotes" : {
    "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
    "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
} }'

This says: as the "remotes" group, synch with a quorum of 2 servers in london and a quorum of 1 server in NC.

I wanted to clarify the format. The remotes group does not specify any quorum; only its individual elements mention a quorum. remotes is said to sync in london_servers and NC. Would the absence of a quorum number in a group mean all elements? Or would the above be represented as follows, to imply AND between the two DCs?

'{ "remotes" : {
    "quorum" : 2,
    "servers" : {
      "london_servers" : { "quorum" : 2, "servers" : [ "london1", "london2", "london3" ] },
      "nc_servers" : { "quorum" : 1, "servers" : [ "nc1", "nc2" ] }
    }
} }'

- Beena Emerson -- View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856868.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
Amit wrote:

What if we write the above as:

remotes-1 (london_servers-2 [london1, london2, london3], nc_servers-1 [nc1, nc2])

Yes, we can consider this.

Thanks, - Beena Emerson -- View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5856869.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Sun, Jul 5, 2015 at 9:00 PM, Joe Conway m...@joeconway.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 07/05/2015 12:25 PM, Joe Conway wrote: On 02/22/2015 10:26 PM, Corey Huinker wrote: Changes in this patch: - added polymorphic versions of dblink_fetch() - upped dblink version # to 1.2 because of new functions - migration 1.1 - 1.2 - DocBook changes for dblink(), dblink_get_result(), dblink_fetch() The previous patch was missing dblink--1.1--1.2.sql and dblink--1.2.sql. I have added them, so it should apply cleanly against git master, but not done any actual review yet. Looking at the argument handling logic in dblink_fetch() and dblink_record_internal(), it has been messy for a while, but this patch serves to make it even worse. I wonder if it isn't better to just loop through all the args with get_fn_expr_argtype() every time and then test for the exact signature match? Another alternative might be to create a wrapper C function for each variant SQL function, but that seems like it could also get messy, especially with dblink_record_internal() since we would have to deal with every variant of all the external facing callers. Thoughts? When I was digging into it, it seemed that the parameter sniffing was on the verge of combinatorical chaos, which I assumed was due to parameter checking being expensive. And while there was no processing for the anyelement parameters, it did throw off the parameter counts, thus the extra complexity. If it's possible to suss out whether the last parameter is the polymorphic type, then we could bump down the arg-count by one, and have the old messy logic tree. I also didn't delve into whether or not it was dangerous to ask get_fn_expr_argtype() for parameters beyond PG_NARGS(). If it is, then we would still have the complicated signature-testing tree, but at least we could separate out the signature poking from the actual processing, and that might add some clarity. We could walk the arglist and handle each arg as you go, but that gets complicated with the text params because their meaning is often determined by whether the next param is also text. That has the potential to be every bit as messy. So from this, I see five options: 1. Leave it. It's a mess, but we have test case coverage to show it works. 2. A bloody minded approach if ((PG_NARGS() == 3) (get_fn_expr_argtype(...,0) == TEXTOID) (get_fn_expr_argtype(...,1) == TEXTOID) (get_fn_expr_argtype(...,2) == BOOLOID)) { } else if ((PG_NARGS() == 3) (get_fn_expr_argtype(...,0) == TEXTOID) (get_fn_expr_argtype(...,1) == BOOLOID) (get_fn_expr_argtype(...,2) == WHATEVER_ANYELEMENTS_ARE_OID)) { } else if ... Pro: It's utterly clear which signature was found Con: A lot of param checks, though most will shortcircuit The number of checks itself might become clutter. 3. Separate signature testing from using the parameters. if (PG_NARGS() == 3) { if (get_fn_expr_argtype(fcinfo-flinfo, 2) == BOOLOID) { signature = TEXTTEXTBOOL } else if (get_fn_expr_argtype(fcinfo-flinfo, 1) == BOOLOID) { signature = TEXTBOOLANY } else { signature = TEXTTEXTANY } } switch(signature) { case TEXTTEXTBOOL: t1 = text_to_cstring(PG_GETARG_TEXT_PP(0); t2 = text_to_cstring(PG_GETARG_TEXT_PP(1); fail = PG_GETARG_BOOL(2); break; Pro: Still getting to the signature with a minimum of comparisons. Con: A fair number of #defines for each signature combination (minus anyelements, as we never have to process those). Big(?) waterfall of options. The mess has just been segregated, not eliminated. 4. 
4. Walk the arg list, assigning as you go, on the assumption that if the current arg is a text field it will be the last one encountered, and re-assigning the pointers when we discover otherwise.

Pro: A tight loop, probably the least amount of code. I *think* this is what Joe was suggesting earlier.
Con: We just made a mini state engine, sorta like what you have to do with event-driven parsers (XML SAX, for example).

5. Walk the arg list like in #4, assigning bool/int parameters immediately, but just counting the text parameters, and assigning them based on how many we found.

Pro: Same as #4, but my high school CS teacher won't come out of retirement just to scold me for repurposing variables.
Con: If at some point in the future we add more text parameters such that simple ordering is ambiguous (i.e. were those text fields at positions 1,2,3 or 1,2,5?), this would break down.

I'm happy to try to code whichever of these people think is the most promising.
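For concreteness, option 5 might look something like the following sketch. The variable names are illustrative rather than dblink's actual ones, and it assumes the text parameters always precede the other arguments, as they do in the existing signatures:

    Datum
    dblink_fetch(PG_FUNCTION_ARGS)
    {
        char       *conname = NULL;
        char       *curname = NULL;
        int         howmany = 0;
        bool        fail = true;
        int         ntext = 0;
        int         i;

        for (i = 0; i < PG_NARGS(); i++)
        {
            Oid         argtype = get_fn_expr_argtype(fcinfo->flinfo, i);

            if (argtype == TEXTOID)
                ntext++;                /* defer: meaning depends on final count */
            else if (argtype == INT4OID)
                howmany = PG_GETARG_INT32(i);
            else if (argtype == BOOLOID)
                fail = PG_GETARG_BOOL(i);
            /* a trailing anyelement contributes only its type, never a value */
        }

        /* text args lead: one means cursor name, two means conname + cursor */
        if (ntext == 2)
        {
            conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
            curname = text_to_cstring(PG_GETARG_TEXT_PP(1));
        }
        else if (ntext == 1)
            curname = text_to_cstring(PG_GETARG_TEXT_PP(0));

        /* ... hand conname/curname/howmany/fail to the common implementation ... */
    }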
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 4:25 AM, Joe Conway m...@joeconway.com wrote: On 02/22/2015 10:26 PM, Corey Huinker wrote:

Changes in this patch:
- added polymorphic versions of dblink_fetch()
- upped dblink version # to 1.2 because of new functions
- migration 1.1 -> 1.2
- DocBook changes for dblink(), dblink_get_result(), dblink_fetch()

The previous patch was missing dblink--1.1--1.2.sql and dblink--1.2.sql. I have added them, so it should apply cleanly against git master, but I have not done any actual review yet.

This extension format is still incorrect: there is no need for dblink--1.1.sql, as only the latest version of an extension needs to be shipped, and the Makefile needs to be updated in consequence. Please find attached an updated patch.

+-- fetch using anyelement, which will change the column names
 SELECT *
-FROM dblink_fetch('rmt_foo_cursor',4) AS t(a int, b text, c text[]);
- a | b |     c
----+---+------------
- 4 | e | {a4,b4,c4}
- 5 | f | {a5,b5,c5}
- 6 | g | {a6,b6,c6}
- 7 | h | {a7,b7,c7}
+FROM dblink_fetch('rmt_foo_cursor',4,null::foo) AS t;
+ f1 | f2 |     f3
+----+----+------------
+  4 | e  | {a4,b4,c4}
+  5 | f  | {a5,b5,c5}
+  6 | g  | {a6,b6,c6}
+  7 | h  | {a7,b7,c7}
 (4 rows)

Why isn't this test case left alone? It looks like a regression to me not to test it.

Regards,
-- Michael

diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index b8d5157..5189758 100644
--- a/contrib/dblink/Makefile
+++ b/contrib/dblink/Makefile
@@ -6,7 +6,8 @@
 PG_CPPFLAGS = -I$(libpq_srcdir)
 SHLIB_LINK = $(libpq)
 
 EXTENSION = dblink
-DATA = dblink--1.1.sql dblink--1.0--1.1.sql dblink--unpackaged--1.0.sql
+DATA = dblink--1.2.sql dblink--1.1--1.2.sql dblink--1.0--1.1.sql \
+	dblink--unpackaged--1.0.sql
 PGFILEDESC = dblink - connect to other PostgreSQL databases
 
 REGRESS = paths dblink
diff --git a/contrib/dblink/dblink--1.1--1.2.sql b/contrib/dblink/dblink--1.1--1.2.sql
new file mode 100644
index 000..a82e507
--- /dev/null
+++ b/contrib/dblink/dblink--1.1--1.2.sql
@@ -0,0 +1,44 @@
+/* contrib/dblink/dblink--1.1--1.2.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION dblink UPDATE TO '1.2'" to load this file. \quit
+
+CREATE FUNCTION dblink (text, text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_get_result (text, anyelement)
+RETURNS SETOF anyelement
+AS 'MODULE_PATHNAME', 'dblink_get_result'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_get_result (text, bool, anyelement)
+RETURNS SETOF anyelement
+AS 'MODULE_PATHNAME', 'dblink_get_result'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
diff --git a/contrib/dblink/dblink--1.1.sql b/contrib/dblink/dblink--1.2.sql
similarity index 84%
rename from contrib/dblink/dblink--1.1.sql
rename to contrib/dblink/dblink--1.2.sql
index 8733553..1d84df2 100644
--- a/contrib/dblink/dblink--1.1.sql
+++ b/contrib/dblink/dblink--1.2.sql
@@ -1,4 +1,4 @@
-/* contrib/dblink/dblink--1.1.sql */
+/* contrib/dblink/dblink--1.2.sql */
 
 -- complain if script is sourced in psql, rather than via CREATE EXTENSION
 \echo Use "CREATE EXTENSION dblink" to load this file. \quit
@@ -71,6 +71,16 @@
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_fetch'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION dblink_fetch (text, int, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
+CREATE FUNCTION dblink_fetch (text, int, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_fetch'
+LANGUAGE C;
+
 CREATE FUNCTION dblink_fetch (text, text, int)
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_fetch'
@@ -121,6 +131,26 @@
 RETURNS setof record
 AS 'MODULE_PATHNAME','dblink_record'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION dblink (text, text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
+CREATE FUNCTION dblink (text, boolean, anyelement)
+RETURNS setof anyelement
+AS 'MODULE_PATHNAME','dblink_record'
+LANGUAGE C;
+
 CREATE FUNCTION dblink_exec
Re: [HACKERS] WAL logging problem in 9.4.3?
On 2015-07-06 11:14:40 -0400, Tom Lane wrote:

BEGIN;
CREATE TABLE test (i int primary key);
INSERT INTO test VALUES(-1);
\copy test from /tmp/num.csv with csv
COMMIT;
SELECT COUNT(*) FROM test;

The COUNT() correctly says 11 rows, but after crash-and-recover, only the row with -1 is there. This is because the INSERT writes out an INSERT+INIT WAL record, which we happily replay, clobbering the data added later by COPY.

ISTM any WAL-logged action that touches a relfilenode essentially needs to disable further optimization based on the knowledge that the relation is new.

We might have to give up on this COPY optimization :-(. A crazy, not well thought through, bandaid for the INSERT+INIT case would be to force COPY to use a new page when using the SKIP_WAL codepath.

I'm not sure what would be a safe rule for deciding that we can skip WAL logging in this situation, but I am pretty sure that it would require keeping information we don't currently keep about what's happened earlier in the transaction.

It'd not be impossible to add more state to the relcache entry for the relation. Whether it's likely that we'd find all the places that'd need to update that state, I'm not sure.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
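One possible shape for that extra relcache state, as a rough sketch only: rd_walLogged is an invented field name, and a real fix would also have to handle subtransaction abort and cache invalidation. rd_createSubid and HEAP_INSERT_SKIP_WAL are the existing symbols.

    typedef struct RelationData
    {
        /* ... existing fields ... */
        bool        rd_walLogged;   /* invented: has any WAL record been
                                     * emitted for this relfilenode in the
                                     * current transaction? */
    } RelationData;

    /* in heap_insert() and friends, right after XLogInsert(): */
    relation->rd_walLogged = true;

    /* in COPY's fast path, before choosing to skip WAL: */
    if (relation->rd_createSubid != InvalidSubTransactionId &&
        !relation->rd_walLogged)
        hi_options |= HEAP_INSERT_SKIP_WAL;     /* still safe to skip */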
Re: [HACKERS] [RFC] LSN Map
On 02/24/2015 04:55 AM, Robert Haas wrote: On Mon, Feb 23, 2015 at 12:52 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:

Dunno, but Jim's got a point. This is a maintenance burden on all indexams if they all have to remember to update the LSN map separately. It needs to be done in some common code, like PageSetLSN or XLogInsert or something.

Aside from that, isn't this horrible from a performance point of view? The patch doubles the buffer manager traffic, because any update to any page will also need to modify the LSN map. This code is copied from the visibility map code, but we got away with it there because the VM only needs to be updated the first time a page is modified. Subsequent updates will know the visibility bit is already cleared, and don't need to access the visibility map.

And scalability: whether you store one value for every N pages, or the LSN of every page, this is going to have a huge effect of focusing contention on the LSN pages. Currently, if ten backends operate on ten different heap pages, for example, they can run in parallel. There will be some contention on the WAL insertions (much less in 9.4 than before). But with this patch, they will all fight for the exclusive lock on the single LSN map page. You'll need to find a way to not update the LSN map on every update. For example, only update the LSN page on the first update after a checkpoint (although that would still have a big contention-focusing effect right after a checkpoint).

I think it would make more sense to do this in the background. Suppose there's a background process that reads the WAL and figures out which buffers it touched, and then updates the LSN map accordingly. Then the contention-focusing effect disappears, because all of the updates to the LSN map are being made by the same process. You need some way to make sure the WAL sticks around until you've scanned it for changed blocks - but that is mighty close to what a physical replication slot does, so it should be manageable.

If you implement this as a background process that reads WAL, as Robert suggested, you could perhaps implement this completely in an extension. That'd be nice, even if we later want to integrate this in the backend, in order to get you started quickly.

This is marked in the commitfest as Needs Review, but ISTM this got its fair share of review back in February. Marking as Returned with Feedback.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
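For context, the contention-focusing pattern criticized above looks roughly like this (an illustrative sketch, not the patch's code; lsnmap_set and ENTRIES_PER_MAP_PAGE are invented names). Every writer to any heap page covered by one map page serializes on the same exclusive buffer lock:

    /* invented sketch of a per-page LSN map update, using real buffer-manager calls */
    static void
    lsnmap_set(Relation rel, BlockNumber heapBlk, XLogRecPtr lsn)
    {
        BlockNumber mapBlk = heapBlk / ENTRIES_PER_MAP_PAGE;    /* invented constant */
        Buffer      buf = ReadBuffer(rel, mapBlk);  /* assume a dedicated map fork */
        XLogRecPtr *entries;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);     /* <-- the contention focus */
        entries = (XLogRecPtr *) PageGetContents(BufferGetPage(buf));
        entries[heapBlk % ENTRIES_PER_MAP_PAGE] = lsn;
        MarkBufferDirty(buf);
        UnlockReleaseBuffer(buf);
    }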
Re: [HACKERS] dblink: add polymorphic functions.
On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic, solutions. One was to build support for:

SELECT * FROM srf() AS TYPE OF(foo);

The second idea, I think, is actually SQL standard if I recall correctly:

SELECT * FROM CAST(srf() AS foo) x;

Currently this works:

8<----------------
select * from cast (row(11,'l','{a11,b11,c11}') as foo);
 f1 | f2 |      f3
----+----+---------------
 11 | l  | {a11,b11,c11}
(1 row)
8<----------------

But this fails:

8<----------------
select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
ERROR:  cannot cast type record to foo
8<----------------

Comments in the source have this to say:

8<----------------
/*
 * coerce_record_to_complex
 *      Coerce a RECORD to a specific composite type.
 *
 * Currently we only support this for inputs that are RowExprs or
 * whole-row Vars.
 */
8<----------------

That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

Joe

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On 07/06/2015 12:39 AM, Michael Paquier wrote: Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow, hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

if (PG_NARGS() == 5)
{
    if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
        var = PG_GETARG_BOOL(1)
[...]

Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

-- Joe Conway

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 10:08 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 12:39 AM, Michael Paquier wrote: Yeah, particularly the use of first_optarg makes things harder to follow in the code with this patch. A C wrapper has the disadvantage of decentralizing the argument checks to many places, making the flow harder to follow, hence using get_fn_expr_argtype() with PG_NARGS() would be the way to go, at least to me. This way, you can easily find how many arguments there are, and which value is assigned to which variable, before moving on to the real processing. Just to be clear, I mean that:

if (PG_NARGS() == 5)
{
    if (get_fn_expr_argtype(fcinfo->flinfo, 1) == TYPEOID)
        var = PG_GETARG_BOOL(1)
[...]

Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

Utterly reasonable and doable. I'll get to work on that soonish, pending resolution of others' concerns.
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway m...@joeconway.com wrote: On 07/06/2015 07:37 AM, Merlin Moncure wrote: yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

Hmm, that is an issue, possibly a fatal one. When Corey first mentioned this to me over a year ago we discussed some other, arguably better and more generic, solutions. One was to build support for:

SELECT * FROM srf() AS TYPE OF(foo);

The second idea, I think, is actually SQL standard if I recall correctly:

SELECT * FROM CAST(srf() AS foo) x;

Currently this works:

8<----------------
select * from cast (row(11,'l','{a11,b11,c11}') as foo);
 f1 | f2 |      f3
----+----+---------------
 11 | l  | {a11,b11,c11}
(1 row)
8<----------------

But this fails:

8<----------------
select * from cast (dblink('dbname=contrib_regression','select * from foo') as foo);
ERROR:  cannot cast type record to foo
8<----------------

Comments in the source have this to say:

8<----------------
/*
 * coerce_record_to_complex
 *      Coerce a RECORD to a specific composite type.
 *
 * Currently we only support this for inputs that are RowExprs or
 * whole-row Vars.
 */
8<----------------

That explains why the first example works while the second does not. I'm not sure how hard it would be to fix that, but it appears that that is where we should focus.

Yeah. FWIW, here's my $0.02: I use dblink all the time, for all kinds of reasons, vastly preferring to have control over the query string (vs. FDW-type solutions). I have two basic gripes with it. #1 is that remote queries are not cancelled at all call sites when cancelled locally (off-topic for this thread), and #2 is that the SRF record-describing mechanics are not abstractable, because the record is described in syntax. Corey's proposal, overloading issues aside, appears to neatly deal with this problem, because anyelement can be passed down through a wrapping API. IOW, I'd like to do:

CREATE FUNCTION remote_call(...) RETURNS ... AS
$$
  SELECT dblink(...) AS r(...)
$$ LANGUAGE sql;

...which can't be done (discounting dynamic SQL acrobatics) because of the syntax-based expression of the 'r' record. So I like Corey's idea... I just think the functions need to be named differently (maybe 'dblink_any' and 'dblink_get_result_any'?). TBH, to do better than that you'd need SQL-level support for handling the return type in the vein of NEW/OLD. For fun, let's call it 'OUT'... then you could:

SELECT * FROM remote_call(...) RETURNS SETOF foo;

Inside remote_call, you'd see something like:

SELECT dblink(...) AS OUT;

As to the proposed syntax, I would vote to support the SQL-standard variant if it could be handled during parse. I don't see what AS TYPE OF really buys you, because FWICT it does not support wrapping.

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
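On Merlin's gripe #1, libpq itself provides the pieces needed to propagate a local cancel to each open remote connection. This sketch shows the relevant libpq API only; it is not anything dblink currently does:

    #include "postgres.h"
    #include "libpq-fe.h"

    /* propagate a local cancel to one remote connection (sketch) */
    static void
    cancel_remote_query(PGconn *conn)
    {
        PGcancel   *cancel = PQgetCancel(conn);
        char        errbuf[256];

        if (cancel != NULL)
        {
            /* PQcancel returns 0 and fills errbuf on failure */
            if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
                elog(WARNING, "could not send cancel request: %s", errbuf);
            PQfreeCancel(cancel);
        }
    }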
Re: [HACKERS] Bug in bttext_abbrev_convert()
Peter Geoghegan wrote: It would be nice to always have an HTML report from gcov available on the internet. That would be something useful to automate, IMV.

http://pgci.eisentraut.org/jenkins/job/postgresql_master_coverage/Coverage/

-- Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dblink: add polymorphic functions.
On Mon, Jul 6, 2015 at 9:35 AM, Tom Lane t...@sss.pgh.pa.us wrote: Joe Conway m...@joeconway.com writes: Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc. If the highest number of arguments is always checked first, the pattern is not ambiguous even with the extra anyelement.

If the set of allowed argument-type combinations is so easily enumerable, I don't understand why this is being done at all. Create a separate SQL function for each combination. You can still let the called C functions call a common implementation routine if that's helpful.

However, this might all be moot in view of Merlin's objection. It is definitely completely uncool to have both of these:

 public | dblink | SETOF anyelement | text, anyelement | normal
 public | dblink | SETOF record     | text, boolean    | normal

It's quite unclear which one will get called for cases like, say, second argument is a domain over boolean. And even if the second arg is just a boolean, maybe the user wanted the first case --- how will he get that behavior, if so? These need to have different names, and that might well help resolve the implementation-level issue...

yup, and at least one case now fails where previously it ran through:

postgres=# select * from dblink('a', 'b', 'c');
ERROR:  function dblink(unknown, unknown, unknown) is not unique

merlin

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Freeze avoidance of very large table.
On Fri, Jul 3, 2015 at 5:25 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Jul 3, 2015 at 1:23 AM, Simon Riggs si...@2ndquadrant.com wrote: On 2 July 2015 at 16:30, Sawada Masahiko sawada.m...@gmail.com wrote:

Also, the flags of each heap page header might be set PD_ALL_FROZEN, as well as all-visible.

Is it possible to have VM bits set to frozen but not visible? The description makes those two states sound independent of each other. Are they? Or not? Do we test for an impossible state?

It's impossible to have VM bits set to frozen but not visible. These bits are controlled independently, but eventually, when the all-frozen bit is set, all-visible is also set.

Attached is the latest version, including some bug fixes. Please review it.

Regards,
-- Sawada Masahiko

000_add_frozen_bit_into_visibilitymap_v5.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
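The invariant stated there (all-frozen implies all-visible, never the reverse) can be written as a one-line sanity check. The flag names and bit values below follow the patch's two-bits-per-heap-page scheme but are assumptions, not quotes from it:

    /* assumed bit assignments for the two VM bits per heap page */
    #define VISIBILITYMAP_ALL_VISIBLE   0x01
    #define VISIBILITYMAP_ALL_FROZEN    0x02

    /* a page may be all-visible without being all-frozen, never the reverse */
    Assert(((flags & VISIBILITYMAP_ALL_FROZEN) == 0) ||
           ((flags & VISIBILITYMAP_ALL_VISIBLE) != 0));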
Re: [HACKERS] dblink: add polymorphic functions.
Joe Conway m...@joeconway.com writes: Actually, I had in mind something like:

8<----------------
    int         i;
    int         numargs;
    int        *argtypes;

    numargs = PG_NARGS();
    argtypes = palloc(numargs * sizeof(int));
    for (i = 0; i < numargs; i++)
        argtypes[i] = get_fn_expr_argtype(fcinfo->flinfo, i);

    if ((numargs == 4 || numargs == 5) &&
        argtypes[0] == TEXTOID &&
        argtypes[1] == TEXTOID &&
        argtypes[2] == INT4OID &&
        argtypes[3] == BOOLOID)
    {
        [...]
    }
    else if ((numargs == 3 || numargs == 4) &&
             argtypes[0] == TEXTOID &&
             argtypes[1] == INT4OID &&
             argtypes[2] == BOOLOID)
    {
        [...]
8<----------------

etc.

If the set of allowed argument-type combinations is so easily enumerable, I don't understand why this is being done at all. Create a separate SQL function for each combination. You can still let the called C functions call a common implementation routine if that's helpful.

However, this might all be moot in view of Merlin's objection. It is definitely completely uncool to have both of these:

 public | dblink | SETOF anyelement | text, anyelement | normal
 public | dblink | SETOF record     | text, boolean    | normal

It's quite unclear which one will get called for cases like, say, second argument is a domain over boolean. And even if the second arg is just a boolean, maybe the user wanted the first case --- how will he get that behavior, if so? These need to have different names, and that might well help resolve the implementation-level issue...

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] more RLS oversights
Noah,

* Noah Misch (n...@leadboat.com) wrote: On Wed, Feb 25, 2015 at 11:37:24PM -0500, Stephen Frost wrote: * Stephen Frost (sfr...@snowman.net) wrote: I agree that it's great that we're catching issues prior to when the feature is released and look forward to anything else you (or anyone else!) finds. I've pushed a fix for this. Please let me know if you see any other issues or run into any problems.

(1) CreatePolicy() and AlterPolicy() omit to call assign_expr_collations() on the node trees. Test case:

Will fix.

(2) CreatePolicy() and AlterPolicy() omit to create a pg_shdepend entry for each role in the TO clause. Test case:

Will fix.

+static void
+dumpPolicy(Archive *fout, PolicyInfo *polinfo)
...
+	if (polinfo->polqual != NULL)
+		appendPQExpBuffer(query, " USING %s", polinfo->polqual);

(3) The USING clause needs parentheses; a dump+reload failed like so:

Will fix.

Add the same parentheses to psql \d output also, keeping that output similar to the SQL syntax.

Yup.

(3a) I found this by hacking the rowsecurity.sql regression test to not drop its objects, then running the pg_upgrade test suite. New features that affect pg_dump should leave objects in the regression database to test the pg_dump support via that suite.

Will fix.

(4) When DefineQueryRewrite() is about to convert a table to a view, it checks the table for features unavailable to views. For example, it rejects tables having triggers. It omits to reject tables having relrowsecurity or a pg_policy record. Test case:

Will fix.

+ para
+  Referential integrity checks, such as unique or primary key constraints
+  and foreign key references, will bypass row security to ensure that
+  data integrity is maintained. Care must be taken when developing
+  schemas and row level policies to avoid a covert channel leak of
+  information through these referential integrity checks.
...
+	/*
+	 * Row-level security should be disabled in the case where a foreign-key
+	 * relation is queried to check existence of tuples that references the
+	 * primary-key being modified.
+	 */
+	temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
+	if (qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK
+		|| qkey->constr_queryno == RI_PLAN_CHECK_LOOKUPPK_FROM_PK
+		|| qkey->constr_queryno == RI_PLAN_RESTRICT_DEL_CHECKREF
+		|| qkey->constr_queryno == RI_PLAN_RESTRICT_UPD_CHECKREF)
+		temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;

(5) This code does not use SECURITY_ROW_LEVEL_DISABLED for foreign keys with CASCADE, SET NULL or SET DEFAULT actions. The associated documentation says nothing about this presumably-important distinction.

Agreed, will fix.

Is SECURITY_ROW_LEVEL_DISABLED mode even needed? We switch users to the owner of the FROM-clause table before running an RI query. That means use of this mode can only matter under row_security=force, right? I would rest easier if this mode went away, because it is a security landmine. If user code managed to run in this mode, it would bypass every policy in the database. (I find no such vulnerability today, because we use the mode only for parse analysis of ri_triggers.c queries.)

That's a very interesting point. At first blush, I agree, it shouldn't be necessary. I'll play with it and see if I can get everything to work properly without it.

(6) AlterPolicy() calls InvokeObjectPostAlterHook(PolicyRelationId, ...), but CreatePolicy() and DropPolicy() lack their respective hook invocations.

Will fix.
(7) Using an aggregate function in a policy predicate elicits an inapposite error message due to use of EXPR_KIND_WHERE for parse analysis. Need a new ParseExprKind. Test case:

Will fix.

Thanks a lot for your review! Much appreciated.

Stephen
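For items (1) and (7), the fix would presumably land where CreatePolicy()/AlterPolicy() run parse analysis on the policy quals. A sketch of the expected shape, where EXPR_KIND_POLICY stands in for the not-yet-existing ParseExprKind and the surrounding names are illustrative:

    /* in CreatePolicy()/AlterPolicy(), after building the ParseState: */
    Node   *qual;

    qual = transformWhereClause(qual_pstate,
                                copyObject(stmt->qual),
                                EXPR_KIND_POLICY,   /* hypothetical; today this
                                                     * is EXPR_KIND_WHERE */
                                "POLICY");

    /* item (1): the call that was omitted */
    assign_expr_collations(qual_pstate, qual);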
Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive
On Fri, Jun 26, 2015 at 12:39 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 25, 2015 at 9:23 AM, Peter Eisentraut pete...@gmx.net wrote: On 6/22/15 1:37 PM, Robert Haas wrote:

Currently, the only time we report a process as waiting is when it is waiting for a heavyweight lock. I'd like to make that somewhat more fine-grained, by reporting the type of heavyweight lock it's awaiting (relation, relation extension, transaction, etc.). Also, I'd like to report when we're waiting for a lwlock, and report either the specific fixed lwlock for which we are waiting, or else the type of lock (lock manager lock, buffer content lock, etc.) for locks of which there is more than one. I'm less sure about this next part, but I think we might also want to report ourselves as waiting when we are doing an OS read or an OS write, because it's pretty common for people to think that a PostgreSQL bug is to blame when in fact it's the operating system that isn't servicing our I/O requests very quickly.

Could that also cover waiting on network?

Possibly. My approach requires that the number of wait states be kept relatively small, ideally fitting in a single byte. And it also requires that we insert pgstat_report_waiting() calls around the thing that is notionally blocking. So, if there are a small number of places in the code where we do network I/O, we could stick those calls around those places, and this would work just fine. But if a foreign data wrapper, or any other piece of code, does network I/O - or any other blocking operation - without calling pgstat_report_waiting(), we just won't know about it.

Itagaki-san's very similar proposal and patch would probably be useful in considering what wait events to track. http://www.postgresql.org/message-id/20090309125146.913c.52131...@oss.ntt.co.jp

According to his patch, the wait events that he was thinking to add were:

+ typedef enum PgCondition
+ {
+     PGCOND_UNUSED               = 0,        /* unused */
+
+     /* 10000 - CPU */
+     PGCOND_CPU                  = 10000,    /* generic cpu operations */
+     /* 11000 - CPU:PARSE */
+     PGCOND_CPU_PARSE            = 11000,    /* pg_parse_query */
+     PGCOND_CPU_PARSE_ANALYZE    = 11100,    /* parse_analyze */
+     /* 12000 - CPU:REWRITE */
+     PGCOND_CPU_REWRITE          = 12000,    /* pg_rewrite_query */
+     /* 13000 - CPU:PLAN */
+     PGCOND_CPU_PLAN             = 13000,    /* pg_plan_query */
+     /* 14000 - CPU:EXECUTE */
+     PGCOND_CPU_EXECUTE          = 14000,    /* PortalRun or PortalRunMulti */
+     PGCOND_CPU_TRIGGER          = 14100,    /* ExecCallTriggerFunc */
+     PGCOND_CPU_SORT             = 14200,    /* (generic sort operation) */
+     PGCOND_CPU_SORT_HEAP        = 14210,    /* tuplesort_begin_heap */
+     PGCOND_CPU_SORT_INDEX       = 14220,    /* tuplesort_begin_index_btree */
+     PGCOND_CPU_SORT_DATUM       = 14230,    /* tuplesort_begin_datum */
+     /* 15000 - CPU:UTILITY */
+     PGCOND_CPU_UTILITY          = 15000,    /* ProcessUtility */
+     PGCOND_CPU_COMMIT           = 15100,    /* CommitTransaction */
+     PGCOND_CPU_ROLLBACK         = 15200,    /* AbortTransaction */
+     /* 16000 - CPU:TEXT */
+     PGCOND_CPU_TEXT             = 16000,    /* (generic text operation) */
+     PGCOND_CPU_DECODE           = 16100,    /* pg_client_to_server */
+     PGCOND_CPU_ENCODE           = 16200,    /* pg_server_to_client */
+     PGCOND_CPU_LIKE             = 16310,    /* GenericMatchText */
+     PGCOND_CPU_ILIKE            = 16320,    /* Generic_Text_IC_like */
+     PGCOND_CPU_RE               = 16400,    /* (generic regexp operation) */
+     PGCOND_CPU_RE_COMPILE       = 16410,    /* RE_compile_and_cache */
+     PGCOND_CPU_RE_EXECUTE       = 16420,    /* RE_execute */
+
+     /* 20000 - NETWORK */
+     PGCOND_NETWORK              = 20000,    /* (generic network operation) */
+     PGCOND_NETWORK_RECV         = 21000,    /* secure_read */
+     PGCOND_NETWORK_SEND         = 22000,    /* secure_write */
+
+     /* 30000 - IDLE (should be larger than network to distinguish idle or recv) */
+     PGCOND_IDLE                 = 30000,    /* IDLE */
+     PGCOND_IDLE_IN_TRANSACTION  = 31000,    /* IDLE in transaction */
+     PGCOND_IDLE_SLEEP           = 32000,    /* pg_usleep */
+
+     /* 40000 - XLOG */
+     PGCOND_XLOG                 = 40000,    /* (generic xlog operation) */
+     PGCOND_XLOG_CRC             = 41000,    /* crc calculation in XLogInsert */
+     PGCOND_XLOG_INSERT          = 42000,    /* insert in XLogInsert */
+     PGCOND_XLOG_OPEN            = 43000,    /* XLogFileOpen */
+     PGCOND_XLOG_CLOSE           = 44000,    /* XLogFileClose */
+     PGCOND_XLOG_WRITE           = 45000,    /* write in XLogWrite */
+     PGCOND_XLOG_FLUSH           = 46000,    /* issue_xlog_fsync */
+
+     /* 50000 - DATA */
+     PGCOND_DATA                 = 50000,    /* (generic
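Tying that list back to Robert's proposal upthread: with wait codes kept to a single byte, each notionally blocking operation would be bracketed with reports, along these lines. This is a sketch assuming a hypothetically extended pgstat_report_waiting() that takes a wait code rather than today's boolean, plus invented WAIT_* codes:

    /* hypothetical one-byte wait codes */
    typedef enum WaitEventType
    {
        WAIT_NONE = 0,
        WAIT_LOCK_RELATION,         /* heavyweight lock: relation */
        WAIT_LWLOCK_BUFFER_CONTENT, /* lwlock: buffer content lock */
        WAIT_IO_READ,               /* blocking OS read */
        WAIT_IO_WRITE               /* blocking OS write */
    } WaitEventType;

    /* bracket the blocking call with reports, e.g. around an OS read: */
    pgstat_report_waiting(WAIT_IO_READ);    /* assumed extended signature */
    returnCode = read(fd, buffer, amount);
    pgstat_report_waiting(WAIT_NONE);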