Re: [HACKERS] Timeline following for logical slots

2016-03-31 Thread Andres Freund
Hi,

On 2016-04-01 08:46:01 +0200, Andres Freund wrote:
> That's a fundamental misunderstanding on your part (perhaps created by
> imprecise docs).

> > Speaking of which, did you see the proposed README I sent for
> > src/backend/replication/logical ?
> 
> I skimmed it. But given we have a CF full of patches, some submitted
> over a year ago, it seems unfair to spend time on a patch submitted a
> few days ago.

For that purpos

WRT design readme, it might be interesting to look at 0009 in
http://archives.postgresql.org/message-id/20140127162006.GA25670%40awork2.anarazel.de

That's obviously not up to date, but it still might help.

Andres




Re: [HACKERS] Timeline following for logical slots

2016-03-31 Thread Andres Freund
Hi,

On 2016-04-01 13:16:18 +0800, Craig Ringer wrote:
> I think it's pretty unsafe from SQL, to be sure.
> 
> Unless failover slots get into 9.6 we'll need to do exactly that from
> internal C stuff in pglogical to support following physical failover,

I know. And this makes me scared shitless.


I'll refuse to debug anything related to decoding when this "feature"
has been used. And I think there'll be plenty to debug; if you notice
the issues, that is.


The more I think about it, the more I think we should rip this out
again until the actual feature is in, with proper review.


> However: It's safe for the slot state on the replica to be somewhat behind
> the local replay from the master, so the slot state on the replica is older
> than what it would've been at an equivalent time on the master... so long
> as it's not so far behind that the replica has replayed vacuum activity
> that removes rows still required by the slot's declared catalog_xmin. Even
> then it should actually be OK in practice because the failing-over client
> will always have replayed past that point on the master (otherwise
> catalog_xmin couldn't have advanced on the master), so it'll always ask to
> start replay at a point at or after the lsn where the catalog_xmin was
> advanced to its most recent value on the old master before failover. It's
> only safe for walsender based decoding though, since there's no way to
> specify the startpoint for sql-level decoding.

This whole logic falls flat on its face given the fact that even
if you specify a startpoint, we read older WAL and process the
records. The startpoint filters out every transaction that commits earlier
than the "startpoint". But with a long-running transaction you will still
have old changes being processed, and those need the corresponding
catalog state.


> My only real worry there is whether invalidations are a concern - if
> internal replay starts at the restart_lsn and replays over part of history
> where the catalog entries have been vacuumed before it starts collecting
> rows for return to the decoding client at the requested decoding
> startpoint, can that cause problems with invalidation processing?

Yes.


> It's also OK for the slot state to be *ahead* of the replica's replay
> position, i.e. from the future. restart_lsn and catalog_xmin will be higher
> than they were on the master at the same point in time, but that doesn't
> matter at all, since the client will always be requesting to start from
> that point or later, having replayed up to there on the old master before
> failover.

I don't think that's true. See my point above about startpoint.



> > > Andres, I tried to address your comments as best I could. The main one
> > > that I think stayed open was about the loop that finds the last timeline
> > > on a segment. If you think that's better done by directly scanning the
> > > List* of timeline history entries I'm happy to prep a follow-up.
> >
> > Have to look again.
> >
> > +* We start reading xlog from the restart lsn, even though in
> > +* CreateDecodingContext we set the snapshot builder up using the
> > +* slot's confirmed_flush. This means we might read xlog we don't
> > +* actually decode rows from, but the snapshot builder might need it
> > +* to get to a consistent point. The point we start returning data to
> > +* *users* at is the confirmed_flush lsn set up in the decoding
> > +* context.
> > +*/
> > still seems pretty misleading - and pretty much unrelated to the
> > callsite.
> >
> 
> 
> I think it's pretty confusing that the function uses the slot's
> restart_lsn and never refers to confirmed_flush, which is where it actually
> starts decoding from as far as the user's concerned. It took me a while to
> figure out what was going on there.

That's a fundamental misunderstanding on your part (perhaps created by
imprecise docs). The startpoint isn't about where to start decoding. It's
about skipping the output-plugin calls for a transaction if it *commits*
before the startpoint. We almost always will *decode* rows from before the
startpoint, and that's pretty much unavoidable: the consumer of a decoding
slot can really only track replay progress in terms of commit LSNs. But at
any remembered progress LSN there can be a lot of in-progress xacts (about
which said client doesn't know anything), and they start *earlier* than the
commit LSN of the just-replayed xact. So we have to be able to re-process
their state and send it out.
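
To be concrete, the rule is roughly this (an illustrative sketch only,
not the actual decode.c code):

#include "postgres.h"
#include "access/xlogdefs.h"

/*
 * Illustrative only: changes are decoded and queued no matter what;
 * output-plugin callbacks are suppressed for a transaction only when
 * its *commit* record lies before the startpoint.
 */
static bool
skip_output_plugin(XLogRecPtr commit_lsn, XLogRecPtr start_decoding_at)
{
	return commit_lsn < start_decoding_at;
}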


> I think the comment would be less necessary, and could be moved up to the
> CreateDecodingContext call, if we passed the slot's confirmed_flush
> explicitly to CreateDecodingContext instead of passing InvalidXLogRecPtr
> and having the CreateDecodingContext call infer the start point. That way
> what's going on would be a lot clearer when reading the code.

How would that solve anything? We still need to process the old records,
hence

[HACKERS] EPQ recheck across HashJoin, what does it actuall check?

2016-03-31 Thread Kouhei Kaigai
Folks,

Could you correct me if I misunderstand the execution path.

We may have the following example. Query try to get row level locking
on the table 't1' that is under the 'Hash' node.

postgres=# EXPLAIN
postgres-# SELECT * FROM t0 NATURAL JOIN t1 WHERE ax between 10 and 30 FOR UPDATE OF t1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 LockRows  (cost=2682.53..263118.01 rows=1980172 width=93)
   ->  Hash Join  (cost=2682.53..243316.29 rows=1980172 width=93)
         Hash Cond: (t0.aid = t1.aid)
         ->  Seq Scan on t0  (cost=0.00..183332.58 rows=858 width=46)
         ->  Hash  (cost=2435.00..2435.00 rows=19802 width=51)
               ->  Seq Scan on t1  (cost=0.00..2435.00 rows=19802 width=51)
                     Filter: ((ax >= '10'::double precision) AND (ax <= '30'::double precision))
(7 rows)

A concurrent update on 't1' by another backend may trigger an EPQ recheck
to ensure the source tuple is still visible. In this example, if somebody
concurrently updated 't1.ax' to 45, LockRows should discard the joined
tuple and not acquire a row lock.

ExecLockRows() fills the es_epqTuple[] slots with the locked/non-locked
tuples, then calls EvalPlanQualNext() to re-evaluate the tuple.
EvalPlanQualNext() is a thin wrapper around ExecProcNode(); thus, it walks
down into the child plan nodes.

If it is some kind of scan node, ExecScan() -> ExecScanFetch() handles
the EPQ recheck appropriately. No problem there.
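
For reference, the scan-node path is roughly the following (abridged
from my reading of execScan.c's ExecScanFetch(); the es_epqScanDone
bookkeeping is omitted, so check the real source):

if (estate->es_epqTuple != NULL)
{
	Index		scanrelid = ((Scan *) node->ps.plan)->scanrelid;

	if (estate->es_epqTupleSet[scanrelid - 1])
	{
		TupleTableSlot *slot = node->ss_ScanTupleSlot;

		/* no test tuple stored: the scan returns nothing */
		if (estate->es_epqTuple[scanrelid - 1] == NULL)
			return ExecClearTuple(slot);

		/* substitute the EPQ test tuple for a fresh fetch ... */
		ExecStoreTuple(estate->es_epqTuple[scanrelid - 1],
					   slot, InvalidBuffer, false);

		/* ... and re-apply the node's access-method-specific checks */
		if (!(*recheckMtd) (node, slot))
			ExecClearTuple(slot);	/* tuple fails recheck */

		return slot;
	}
}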

In the case of HashJoin, it seems to me ExecHashJoin() takes no special
action. If a record that had 't1.ax' = 20.0 when it was fetched and loaded
into the hash table was updated to 45.0, it ought to be handled as an
invisible row. However, ExecHashJoin() does not walk down to the inner side
again, so the updated tuple might be considered visible, and locked.

Unless this is my oversight, ExecHashJoin() and ExecHash() need to call
ExecProcNode() and then re-evaluate the join condition against the latest
tuples, not just their hash key values.

Thoughts?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 






Re: [HACKERS] Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

2016-03-31 Thread Craig Ringer
On 1 April 2016 at 12:47, Craig Ringer  wrote:


> I'll prep a follow-up patch.
>
>
Done and attached.

Note that I can't use PG_GETARG_TRANSACTIONID directly since it's a macro
defined only in xid.c. It didn't seem worth extracting it and moving it to
postgres.h (where the other non-ADT-specific PG_GETARG_ macros are) or to a
new header just for this, so I've spelled it out each time.
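
Spelled out, it's just the macro's expansion; as far as I can see the
xid.c-private definition amounts to:

    TransactionId new_xmin = DatumGetTransactionId(PG_GETARG_DATUM(1));

with DatumGetTransactionId coming from postgres.h, so that's what each
call site now does.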

I now remember that that's part of why I used bigint as an argument type.
The other part is that txid_current() returns bigint and there's no cast
from bigint to xid. So the tests would have to CREATE CAST or cast via
'text'. They now do the latter.

We should probably have a cast from bigint to/from xid, but the type is so
little-used that it's not much fuss.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5e63a5f8e467c7855ab22fa7e5fbdac94d9c87e7 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 1 Apr 2016 13:36:31 +0800
Subject: [PATCH] Fix test_slot_timelines buildfarm failure on 32-bit ARM

The code was reading a NULL 64-bit Datum that is pass-by-value
on 64-bit platforms but pass-by-reference on 32-bit. On 64-bit
this just produces zero, but on 32-bit it'll dereference a NULL
pointer. Since InvalidXLogRecPtr is zero this went unnoticed.

Test for NULL arguments.

In the process, switch to taking 'xid'-typed arguments instead of
bigint. Since txid_current() returns bigint and there's no cast from
bigint to xid this requires a cast through 'text' in our tests.
---
 .../expected/load_extension.out|  2 +-
 .../test_slot_timelines/sql/load_extension.sql |  2 +-
 .../test_slot_timelines--1.0.sql   |  4 +--
 .../test_slot_timelines/test_slot_timelines.c  | 31 +++---
 4 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/src/test/modules/test_slot_timelines/expected/load_extension.out b/src/test/modules/test_slot_timelines/expected/load_extension.out
index 14a414a..7c2ad9d 100644
--- a/src/test/modules/test_slot_timelines/expected/load_extension.out
+++ b/src/test/modules/test_slot_timelines/expected/load_extension.out
@@ -5,7 +5,7 @@ SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
  
 (1 row)
 
-SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current(), txid_current(), pg_current_xlog_location(), pg_current_xlog_location());
+SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current()::text::xid, txid_current()::text::xid, pg_current_xlog_location(), pg_current_xlog_location());
  test_slot_timelines_advance_logical_slot 
 --
  
diff --git a/src/test/modules/test_slot_timelines/sql/load_extension.sql b/src/test/modules/test_slot_timelines/sql/load_extension.sql
index a71127d..2440355 100644
--- a/src/test/modules/test_slot_timelines/sql/load_extension.sql
+++ b/src/test/modules/test_slot_timelines/sql/load_extension.sql
@@ -2,6 +2,6 @@ CREATE EXTENSION test_slot_timelines;
 
 SELECT test_slot_timelines_create_logical_slot('test_slot', 'test_decoding');
 
-SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current(), txid_current(), pg_current_xlog_location(), pg_current_xlog_location());
+SELECT test_slot_timelines_advance_logical_slot('test_slot', txid_current()::text::xid, txid_current()::text::xid, pg_current_xlog_location(), pg_current_xlog_location());
 
 SELECT pg_drop_replication_slot('test_slot');
diff --git a/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql b/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
index 31d7f8e..50449bf 100644
--- a/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
+++ b/src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
@@ -8,9 +8,9 @@ LANGUAGE c AS 'MODULE_PATHNAME';
 COMMENT ON FUNCTION test_slot_timelines_create_logical_slot(text, text)
 IS 'Create a logical slot at a particular lsn and xid. Do not use in production servers, it is not safe. The slot is created with an invalid xmin and lsn.';
 
-CREATE OR REPLACE FUNCTION test_slot_timelines_advance_logical_slot(slot_name text, new_xmin bigint, new_catalog_xmin bigint, new_restart_lsn pg_lsn, new_confirmed_lsn pg_lsn)
+CREATE OR REPLACE FUNCTION test_slot_timelines_advance_logical_slot(slot_name text, new_xmin xid, new_catalog_xmin xid, new_restart_lsn pg_lsn, new_confirmed_lsn pg_lsn)
 RETURNS void
 LANGUAGE c AS 'MODULE_PATHNAME';
 
-COMMENT ON FUNCTION test_slot_timelines_advance_logical_slot(text, bigint, bigint, pg_lsn, pg_lsn)
+COMMENT ON FUNCTION test_slot_timelines_advance_logical_slot(text, xid, xid, pg_lsn, pg_lsn)
 IS 'Advance a logical slot directly. Do not use this in production servers, it is not safe.';
diff --git a/src/test/modules/test_slot_timelines/test_slot_timelines.c b/src/test/modules/test_slot_timelines/test_slot_timelines.c
index 74dd1a0..e818705 100644
--- a/src/test/module

Re: [HACKERS] [PATCH v10] GSSAPI encryption support

2016-03-31 Thread Michael Paquier
On Fri, Apr 1, 2016 at 12:31 PM, Robbie Harwood  wrote:
> - Fixed Windows build.  Thanks to Michael for patches.

This looks fine. I am not seeing build failures now.

> - Fixed buffering of large replies on the serverside.  This should fix
>   the traceback that was being seen.  The issue had to do with the
>   difference between the server and client calling conventions for the
>   _read and _write functions.

This does not fix the issue. I am still able to crash the server with
the same trace.

> - Move gss_encrypt out of the GUCs and into connection-specific logic.
>   Thanks to Tom Lane for pointing me in the right direction here.

+   /* delay processing until after AUTH_REQ_OK has been sent */
+   if (port->gss->gss_encrypt != NULL)
+   port->gss->encrypt = !strcmp(port->gss->gss_encrypt, "on");
You should use parse_bool here because, unlike libpq, clients should be
able to use other values like "1", "0", "off", 'N', 'Y', etc.
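
Roughly like this, say (a sketch, not a tested patch; parse_bool() is
the existing helper declared in utils/builtins.h):

/* sketch: accept the usual boolean spellings via parse_bool() */
if (port->gss->gss_encrypt != NULL)
{
	bool		encrypt;

	if (!parse_bool(port->gss->gss_encrypt, &encrypt))
		ereport(FATAL,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("invalid value for parameter \"gss_encrypt\": \"%s\"",
						port->gss->gss_encrypt)));
	port->gss->encrypt = encrypt;
}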

> - Replace writev() with two calls to _raw_write().  I'm not attached to
>   this design; if someone has a preference for allocating a buffer and
>   making a single write from that, I could be persuaded.  I don't know
>   what the performance tradeoffs are.

Relying on pqsecure_raw_write and pqsecure_raw_read is a better bet
IMO; there is already some low-level error handling there.

 static ConfigVariable *ProcessConfigFileInternal(GucContext context,
  bool applySettings, int elevel);

-
 /*
Spurious noise.
-- 
Michael




Re: [HACKERS] Timeline following for logical slots

2016-03-31 Thread Craig Ringer
On 31 March 2016 at 16:09, Andres Freund  wrote:


> > This gives us an option for failover of logical replication in 9.6, even
> > if it's a bit cumbersome and complex for the client, in case failover
> > slots don't make the cut. And, of course, it's a pre-req for failover
> > slots, which I'll rebase on top of it shortly.
>
> FWIW, I think it's dangerous to use this that way. If people manipulate
> slots that way we'll have hellishly hard to debug issues.


I think it's pretty unsafe from SQL, to be sure.

Unless failover slots get into 9.6 we'll need to do exactly that from
internal C stuff in pglogical to support following physical failover,
though. I'm not thrilled by that, especially since we have no way (in 9.6)
to insert generic WAL records in xlog to allow the slot updates to
accurately pace replay. I don't mean logical WAL messages here; they
wouldn't help. It'd actually be pluggable generic WAL that we'd need in
order to sync slots fully consistently in the absence of failover slots.

However: It's safe for the slot state on the replica to be somewhat behind
the local replay from the master, so the slot state on the replica is older
than what it would've been at an equivalent time on the master... so long
as it's not so far behind that the replica has replayed vacuum activity
that removes rows still required by the slot's declared catalog_xmin. Even
then it should actually be OK in practice because the failing-over client
will always have replayed past that point on the master (otherwise
catalog_xmin couldn't have advanced on the master), so it'll always ask to
start replay at a point at or after the lsn where the catalog_xmin was
advanced to its most recent value on the old master before failover. It's
only safe for walsender based decoding though, since there's no way to
specify the startpoint for sql-level decoding.

My only real worry there is whether invalidations are a concern - if
internal replay starts at the restart_lsn and replays over part of history
where the catalog entries have been vacuumed before it starts collecting
rows for return to the decoding client at the requested decoding
startpoint, can that cause problems with invalidation processing? I haven't
got my head around what, exactly, logical decoding is doing with its
invalidations stuff yet. I didn't find any problems in testing, but wasn't
sure how to create the conditions under which failures might occur.

It's also OK for the slot state to be *ahead* of the replica's replay
position, i.e. from the future. restart_lsn and catalog_xmin will be higher
than they were on the master at the same point in time, but that doesn't
matter at all, since the client will always be requesting to start from
that point or later, having replayed up to there on the old master before
failover.

It's even possible for the restart_lsn and catalog_xmin to be in the
replica's future, i.e. the lsn > the current replay lsn and the
catalog_xmin greater than the next xid. In that case we know the logical
client replayed state from the old master that hasn't been applied to the
replica by the time it's promoted. If this state of affairs exists when we
promote a replica to a master we should really kill the slot, since the
client has obviously replayed from the old master past the point of
divergence with the promoted replica and there's potentially an
unresolvable timeline fork.  I'd like to add this if I can manage it,
probably by WARNing a complaint then dropping the slot.

Needless to say I'd really rather just have failover slots, where we know
the slot will be consistent with the DB state and updated in sync with it.
This is a fallback plan and I don't like it too much.


> The test code needs
> a big disclaimer to never ever be used in production, and we should
> "disclaim any warranty" if somebody does that. To the point of not
> fixing issues around it in back branches.
>

I agree. The README has just that, and there are comments above the
individual functions like:

 * Note that this is test harness code. You shouldn't expose slot internals
 * to SQL like this for any real world usage. See the README.

> > Andres, I tried to address your comments as best I could. The main one
> > that I think stayed open was about the loop that finds the last timeline
> > on a segment. If you think that's better done by directly scanning the
> > List* of timeline history entries I'm happy to prep a follow-up.
>
> Have to look again.
>
> +* We start reading xlog from the restart lsn, even though in
> +* CreateDecodingContext we set the snapshot builder up using the
> +* slot's confirmed_flush. This means we might read xlog we don't
> +* actually decode rows from, but the snapshot builder might need it
> +* to get to a consistent point. The point we start returning data to
> +* *users* at is the confirmed_flush lsn set up in the decoding
> +* context.
> +*/
> still seems pretty misleading - and pre

Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-31 Thread Michael Paquier
On Fri, Apr 1, 2016 at 12:48 PM, Tom Lane  wrote:
> I thought about this patch a bit more...
>
> When I first looked at the patch, I didn't believe that it worked at
> all: it failed to return a PGRES_COPY_XXX result to the application,
> and yet nonetheless switched libpq's asyncStatus to PGASYNC_COPY_XXX?
> Wouldn't things be hopelessly confused?  I tried it out and saw that
> indeed it seemed to work in psql, and after tracing through that found
> that psql has no idea what's going on, but when psql issues its next
> command PQexecStart manages to get us out of the copy state (barring
> more OOM failures, anyway).  That seems a bit accidental, though,
> and for sure it wasn't documented in the patch.

From the patch, that's mentioned:
+* Stop if we are in copy mode and error has occurred, the pending results
+* will be discarded during next execution in PQexecStart.

> In any case, having
> libpq in that state would have side-effects on how it responds to
> application actions, which is the core of my complaint about
> PQgetResult behavior.  Those side effects only make sense if the app
> knows (or should have known) that the connection is in COPY state.

OK. So we'd want to avoid relying on subsequent PQ* calls and return to a
clean state once the OOM shows up.

> I think it might be possible to get saner behavior if we invent a
> set of new asyncStatus values PGASYNC_COPY_XXX_FAILED, having the
> semantics that we got the relevant CopyStart message from the backend
> but were unable to report it to the application.  PQexecStart would
> treat these the same as the corresponding normal PGASYNC_COPY_XXX
> states and do what's needful to get out of the copy mode.  But in
> PQgetResult, we would *not* treat those states as a reason to return a
> PGRES_COPY_XXX result to the application.  Probably we'd just return
> NULL, with the implication being "we're done so far as you're concerned,
> please give a new command".  I might be underthinking it though.

I raised something like that a couple of months back, based on the
fact that PGASYNC_* are flags internal to libpq on the frontend side:
http://www.postgresql.org/message-id/cab7npqtaedwprcqak-czzxwwapdeyr1gug0vwvg7rhzhmaj...@mail.gmail.com
In that case what was debated was PGASYNC_FATAL... Something quite
different in your proposal is that we return NULL or something else, and
that it does not discard any pending data from the client. For applications
running PQgetResult in a loop that would work.

> Anyway, the short of my review is that we need more clarity of thought
> about what state libpq is in after a failure like this, and what that
> state looks like to the application, and how that should affect how
> libpq reacts to application calls.

Hm. I would have thought that returning to the caller a result
generated by PQmakeEmptyPGresult(PGRES_FATAL_ERROR), with an error
string marked "out of memory", would be the best thing to do instead of
NULL... If getCopyStart() complains because of a lack of data, we loop
back into pqParseInput3 to try again. If an OOM happens, we still loop
into pqParseInput3, but all the pending data received from the client needs
to be discarded so that we don't rely on the next call of PQexecStart
to do the cleanup; once all the data is received we get out of the
loop and generate the result with PGRES_FATAL_ERROR. Does that match
what you have in mind?
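
In code terms the end state would be roughly this (a sketch of the idea
only, not a patch; PQmakeEmptyPGresult() is the existing public
constructor and, for an error status, copies the connection's current
error message into the result):

/* sketch: after discarding pending COPY data, build a normal error result */
static PGresult *
copy_start_oom_result(PGconn *conn)
{
	return PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
}
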
-- 
Michael




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-31 Thread Amit Kapila
On Sat, Mar 19, 2016 at 5:45 PM, Magnus Hagander 
wrote:

> On Wed, Mar 2, 2016 at 6:49 PM, Marco Nenciarini <
> marco.nenciar...@2ndquadrant.it> wrote:
>
>>
>> I've attached an updated patch, which is rebased on current master and
>> includes the oid fix.
>
>

+   Finish performing exclusive on-line backup (restricted to
+   superusers or replication roles)
+
+   pg_stop_backup(exclusive boolean)
+   setof record
+   Finish performing exclusive or non-exclusive on-line backup
+   (restricted to superusers or replication roles)


Isn't it better to indicate that the user needs to form backup_label and
tablespace_map files from the output of this API, and that those need to be
dropped into the data directory?

Also, I think below part of documentation for pg_start_backup() needs to be
modified:



pg_start_backup accepts an
arbitrary user-defined label for the backup.  (Typically this would be
the name under which the backup dump file will be stored.)  The function
writes a backup label file (backup_label) and, if there
are any links in the pg_tblspc/ directory, a tablespace map
file (tablespace_map) into the database cluster's data
directory, performs a checkpoint, and then returns the backup's starting
transaction log location as text.  The user can ignore this result value,
but it is provided in case it is useful.


Similarly, there is a description for pg_stop_backup which needs to be
modified.


CREATE OR REPLACE FUNCTION
-  pg_start_backup(label text, fast boolean DEFAULT false)
+  pg_start_backup(label text, fast boolean DEFAULT false, exclusive boolean DEFAULT true)
   RETURNS pg_lsn STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';


One thing that might be slightly inconvenient for users: to use this API
for non-exclusive backups, they need to pass the value of the second
parameter as well, which doesn't seem to be a big issue.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

2016-03-31 Thread Craig Ringer
On 1 April 2016 at 11:13, Petr Jelinek  wrote:


>
> The function does the following:
> TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);
>

This should be reasonable enough though; down-casting it will discard the
high bits but that's fine when we know there's nothing interesting there.

TransactionId is a uint32 anyway, but I had a reason for the above. There's
no cast from integer to xid, which is why I used bigint here, since we
don't have a uint32 native SQL type. What I *should've* done is simply used
quoted-literal arguments like

XID '1234'

or cast via text

1234::text::xid

so there was no need to use a bigint instead. I'll adjust appropriately so
it uses PG_GETARG_TRANSACTIONID(1) and is declared as accepting 'xid'.


> And we are passing NULL as that parameter, that could explain this.
>

If we're on an int64-by-value platform that'd be wrong but still work, it'd
just be zero. On an int64-by-reference platform it could indeed segfault.
So yes, I'd say that's the cause.


> Also while reading it I wonder if the function should be defined with
> xid type rather than bigint and use similar input code as xid.c.
>

Yes, it should.

I'll prep a follow-up patch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Incorrect format in error message

2016-03-31 Thread Tom Lane
David Rowley  writes:
> The attached fixes an error message which is incorrectly using an
> unsigned format specifier instead of a signed one.

I think that's the tip of the iceberg :-(.  For starters, the code
allows ObjectIdAttributeNumber without regard for the fact that the
next line will dump core on a negative attno.  Really though, what
astonishes me about this example is that we allow indexes at all on
system columns other than OID.  None of the other ones can possibly
have either a use-case or sensible semantics, can they?  We certainly
would not stop to update indexes after changing xmax, for example.

regards, tom lane




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-31 Thread David Rowley
On 30 March 2016 at 09:14, Robert Haas  wrote:
> On Tue, Mar 29, 2016 at 3:48 PM, Paul Ramsey  
> wrote:
>>> I have no idea why the cost adjustments that you need are different
>>> for the scan case and the aggregate case.  That does seem problematic,
>>> but I just don't know why it's happening.
>>
>> What might be a good way to debug it? Is there a piece of code I can
>> look at to try and figure out the contribution of COST in either case?
>
> Well, the cost calculations are mostly in costsize.c, but I dunno how
> much that helps.  Maybe it would help if you posted some EXPLAIN
> ANALYZE output for the different cases, with and without parallelism?
>
> One thing I noticed about this output (from your blog)...
>
> Finalize Aggregate
>  (cost=16536.53..16536.79 rows=1 width=8)
>  (actual time=2263.638..2263.639 rows=1 loops=1)
>->  Gather
>(cost=16461.22..16461.53 rows=3 width=32)
>(actual time=754.309..757.204 rows=4 loops=1)
>  Number of Workers: 3
>  ->  Partial Aggregate
>  (cost=15461.22..15461.23 rows=1 width=32)
>  (actual time=676.738..676.739 rows=1 loops=4)
>->  Parallel Seq Scan on pd
>(cost=0.00..13856.38 rows=64 width=2311)
>(actual time=3.009..27.321 rows=42 loops=4)
>  Filter: (fed_num = 47005)
>  Rows Removed by Filter: 17341
>  Planning time: 0.219 ms
>  Execution time: 2264.684 ms
>
> ...is that the finalize aggregate phase is estimated to be very cheap,
> but it's actually wicked expensive.  We get the results from the
> workers in only 750 ms, but it takes another second and a half to
> aggregate those 4 rows???

hmm, actually I've just realised that create_grouping_paths() should
be accounting for agg_costs differently depending on whether it's partial
aggregation, finalize aggregation, or just normal aggregation.
count_agg_clauses() needs to be passed the aggregate type information to
allow the walker function to cost the correct portions of the aggregate
based on what type of aggregation the costs will be used for.  In short,
please don't bother to spend too much time tuning your costs until I
fix this.

As of now the Partial Aggregate is including the cost of the final
function... that's certainly broken, as it does not call that
function.

I will try to get something together over the weekend to fix this, but
I have other work to do until then.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PQsendQuery+PQgetResult+PQsetSingleRowMode limitations and support

2016-03-31 Thread Shulgin, Oleksandr
On Apr 1, 2016 02:57, "Karl O. Pinc"  wrote:
>
> I assume there are no questions about supporting a
> similar functionality only without PQsetSingleRowMode,
> as follows:

Sorry, but I don't see what your actual question is here.

Both code examples are going to compile and work, AFAICS. The difference is
that the latter will try to fetch the whole result set into the client's
memory before returning you a PGresult.
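
For completeness, the single-row-mode variant looks like this (minimal
sketch, error handling trimmed; PQsetSingleRowMode must be called right
after PQsendQuery succeeds, before the first PQgetResult):

#include <stdio.h>
#include <libpq-fe.h>

static void
stream_rows(PGconn *conn, const char *query)
{
	PGresult   *res;

	if (!PQsendQuery(conn, query))
		return;
	if (!PQsetSingleRowMode(conn))
		fprintf(stderr, "could not enter single-row mode\n");

	while ((res = PQgetResult(conn)) != NULL)
	{
		/* one PGRES_SINGLE_TUPLE per row, then an empty PGRES_TUPLES_OK */
		if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
			printf("%s\n", PQgetvalue(res, 0, 0));
		PQclear(res);
	}
}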

--
Alex


Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-03-31 Thread Pavel Stehule
2016-04-01 4:52 GMT+02:00 Kyotaro HORIGUCHI :

> At Thu, 31 Mar 2016 11:22:20 +0200, Pavel Stehule 
> wrote in <
> cafj8prd7vadunoiapb8exwc+c5ccis-rj2dphvzcwzkgxjb...@mail.gmail.com>
> > 2016-03-30 10:34 GMT+02:00 Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp>:
> >
> > > Hi,
> > >
> > > At Wed, 30 Mar 2016 09:23:49 +0200, Pavel Stehule <
> pavel.steh...@gmail.com>
> > > wrote in <
> > > cafj8prbvka6ng4jwz2qmro7inudfjws5w0+demvgzznuf-h...@mail.gmail.com>
> > > > Hi
> > > >
> > > > ...
> > > > >> =# alter table if
> > > > >> =# alter table if exists
> > > > >> ==
> > > > >> =# alter table I
> > > > >> =# alter table IF EXISTS// "information_schema" doesn't match.
> > > > >>
> > > > >> Since this is another problem from IF (NOT) EXISTS, this is
> > > > >> in separate form.
> > > > >>
> > > > >> What do you think about this?
> > > > >>
> > > > >
> > > > > +1
> > > > >
> > > >
> > > > The new behavior is much better.
> > >
> > > I'm glad to hear that.
> > >
> > > > I found new warning
> > > >
> > > >  tab-complete.c:1438:87: warning: right-hand operand of comma
> expression
> > > > has no effect [-Wunused-value]
> > >
> > > Mmm. Google tells me that gcc 4.9 does so. I'm using 4.8.5 so I
> > > haven't seen the warning.
> > >
> > > https://gcc.gnu.org/gcc-4.9/porting_to.html
> > >
> > > 1436:   else if (HeadMatches2("CREATE", "SCHEMA") &&
> > > 1437:SHIFT_TO_LAST1("CREATE") &&
> > > 1438:false) {} /* FALL THROUGH */
> > >
> > > > #define SHIFT_TO_LAST1(p1) \
> > > > (HEADSHIFT(find_last_index_of(p1, previous_words,
> > > previous_words_count)), \
> > > >  true)
> > >
> > > > #define HEADSHIFT(n) \
> > > > (head_shift += n, true)
> > >
> > > expanding the macros the lines at the error will be
> > >
> > > else if (HeadMatches2("CREATE", "SCHEMA") &&
> > >  (head_shift +=
> > >   find_last_index_of("CREATE", previous_words,
> > > previous_words_count),
> > >true) &&
> > >  false) {} /* FALL THROUGH */
> > >
> > > But the right hand value (true) is actually "used" in the
> > > expression (even though not effective). Perhaps (true && false)
> > > was optimized as false and the true is regarded as unused?
> > > That's stupid.. Using functions instead of macros seems to solve
> > > this but they'd need to be wrapped by macros such as
> > > additional_kw_query(). That's a pain..
> > >
> > > Any thoughts?
> > >
> > > > There is a small minor issue - I don't know if it is solvable.
> > > > Autocompletion works only for the "if" keyword. When I am writing
> > > > "if " or "if " or "if exi" - then autocompletion doesn't work. But
> > > > this issue is exactly the same for other multi-word completions like
> > > > "alter foreign data wrapper". So if it is fixable, it can be out of
> > > > scope for this patch.
> > >
> > > Yes.  It can be solved only by adding completion for every word,
> > > as some of the similar completions do.
> > >
> > > > everything else looks good.
> > >
> > > Thanks.
> > >
> >
> > please, final patch
>
> Addressing this needs gcc 4.9, but CentOS 7 doesn't have the
> devtools-2 repo, so I'm now building a CentOS 6 environment for this
> purpose. Please wait for a while.
>
>
sure, ok

regards

Pavel


> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
>


Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-31 Thread Tom Lane
I thought about this patch a bit more...

When I first looked at the patch, I didn't believe that it worked at
all: it failed to return a PGRES_COPY_XXX result to the application,
and yet nonetheless switched libpq's asyncStatus to PGASYNC_COPY_XXX?
Wouldn't things be hopelessly confused?  I tried it out and saw that
indeed it seemed to work in psql, and after tracing through that found
that psql has no idea what's going on, but when psql issues its next
command PQexecStart manages to get us out of the copy state (barring
more OOM failures, anyway).  That seems a bit accidental, though,
and for sure it wasn't documented in the patch.  In any case, having
libpq in that state would have side-effects on how it responds to
application actions, which is the core of my complaint about
PQgetResult behavior.  Those side effects only make sense if the app
knows (or should have known) that the connection is in COPY state.

I think it might be possible to get saner behavior if we invent a
set of new asyncStatus values PGASYNC_COPY_XXX_FAILED, having the
semantics that we got the relevant CopyStart message from the backend
but were unable to report it to the application.  PQexecStart would
treat these the same as the corresponding normal PGASYNC_COPY_XXX
states and do what's needful to get out of the copy mode.  But in
PQgetResult, we would *not* treat those states as a reason to return a
PGRES_COPY_XXX result to the application.  Probably we'd just return
NULL, with the implication being "we're done so far as you're concerned,
please give a new command".  I might be underthinking it though.
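
For concreteness, the shape of that might be something like this (the
_FAILED values are hypothetical additions; the existing values are the
libpq-internal ones from libpq-int.h, comments paraphrased):

typedef enum
{
	PGASYNC_IDLE,				/* nothing's happening */
	PGASYNC_BUSY,				/* query in progress */
	PGASYNC_READY,				/* result ready for PQgetResult */
	PGASYNC_COPY_IN,			/* Copy In transfer in progress */
	PGASYNC_COPY_OUT,			/* Copy Out transfer in progress */
	PGASYNC_COPY_BOTH,			/* Copy In/Out transfer in progress */
	/* hypothetical: CopyStart received but not reported due to OOM */
	PGASYNC_COPY_IN_FAILED,
	PGASYNC_COPY_OUT_FAILED,
	PGASYNC_COPY_BOTH_FAILED
} PGAsyncStatusType;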

Anyway, the short of my review is that we need more clarity of thought
about what state libpq is in after a failure like this, and what that
state looks like to the application, and how that should affect how
libpq reacts to application calls.

I'll set this back to Waiting on Author ...

regards, tom lane




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-31 Thread Amit Kapila
On Fri, Apr 1, 2016 at 12:49 AM, Paul Ramsey 
wrote:
>
> On Tue, Mar 29, 2016 at 12:51 PM, Paul Ramsey 
wrote:
> > On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey 
wrote:
> >
> >>> On the join case, I wonder if it's possible that _st_intersects is not
> >>> marked parallel-safe?  If that's not the problem, I don't have a
> >>> second guess, but the thing to do would be to figure out whether
> >>> consider_parallel is false for the RelOptInfo corresponding to either
> >>> of pd and pts, or whether it's true for both but false for the
> >>> joinrel's RelOptInfo, or whether it's true for all three of them but
> >>> you don't get the desired path anyway.
> >>
> >> _st_intersects is definitely marked parallel safe, and in fact will
> >> generate a parallel plan if used alone (without the operator though,
> >> it's impossibly slow). It's the && operator that is the issue... and I
> >> just noticed that the PROCEDURE bound to the && operator
> >> (geometry_overlaps) is *not* marked parallel safe: could be the
> >> problem?
> >
> > Asked and answered: marking the geometry_overlaps as parallel safe
> > gets me a parallel plan! Now to play with costs and see how it behaves
> > when force_parallel_mode is not set.
>
> For the record I can get a non-forced parallel join plan, *only* if I
> reduce the parallel_join_cost by a factor of 10, from 0.1 to 0.01.
>

I think here you mean parallel_tuple_cost.

>
> http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html
>
> This seems non-optimal. No amount of cranking up the underlying
> function COST seems to change this, perhaps because the join cost is
> entirely based on the number of expected tuples in the join relation?
>

Is the function cost not being considered when given as a join clause, or
did you want to point out that it is not considered for any parallel plan
in general?  I think it should be considered when given as a clause for a
single-table scan.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH v10] GSSAPI encryption support

2016-03-31 Thread Robbie Harwood
Hello friends,

Here is another version of GSSAPI encryption support, both on this email
and on my github:
https://github.com/frozencemetery/postgres/tree/feature/gssencrypt10

This version is intended to address Michael's review:

- Fixed Windows build.  Thanks to Michael for patches.

- Fixed buffering of large replies on the serverside.  This should fix
  the traceback that was being seen.  The issue had to do with the
  difference between the server and client calling conventions for the
  _read and _write functions.

- gss_enc_require renamed to gss_require_encrypt.  Slightly longer, but
  half the stomach churn.

- Move gss_encrypt out of the GUCs and into connection-specific logic.
  Thanks to Tom Lane for pointing me in the right direction here.

- Replace writev() with two calls to _raw_write().  I'm not attached to
  this design; if someone has a preference for allocating a buffer and
  making a single write from that, I could be persuaded.  I don't know
  what the performance tradeoffs are.

- Typo fixes.  I need to figure out spellchecking in my editor.

- More use of .

- Change _should_crypto() functions to return bool.  Also rename them to
  be the _should_encrypt functions.

- Error message cleanup.

Thanks!
From 945805d45e8021f92ad73518b3a74ac6bab89525 Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Fri, 26 Feb 2016 16:07:05 -0500
Subject: [PATCH 1/3] Move common GSSAPI code into its own files

On both the frontend and backend, prepare for GSSAPI encryption support
by moving common code for error handling into a common file.  Other than
build-system changes, no code changes occur in this patch.

Thanks to Michael Paquier for the Windows fixes.
---
 configure   |  2 +
 configure.in|  1 +
 src/Makefile.global.in  |  1 +
 src/backend/libpq/Makefile  |  4 ++
 src/backend/libpq/auth.c| 63 +---
 src/backend/libpq/be-gssapi-common.c| 74 +
 src/include/libpq/be-gssapi-common.h| 26 
 src/interfaces/libpq/Makefile   |  4 ++
 src/interfaces/libpq/fe-auth.c  | 48 +
 src/interfaces/libpq/fe-gssapi-common.c | 63 
 src/interfaces/libpq/fe-gssapi-common.h | 21 ++
 src/tools/msvc/Mkvcbuild.pm | 18 ++--
 12 files changed, 212 insertions(+), 113 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi-common.c
 create mode 100644 src/include/libpq/be-gssapi-common.h
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.c
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.h

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e94d6a5..3dbc5c2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -183,6 +183,7 @@ with_perl	= @with_perl@
 with_python	= @with_python@
 with_tcl	= @with_tcl@
 with_openssl	= @with_openssl@
+with_gssapi	= @with_gssapi@
 with_selinux	= @with_selinux@
 with_systemd	= @with_systemd@
 with_libxml	= @with_libxml@
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 09410c4..a8cc9a0 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_gssapi),yes)
+OBJS += be-gssapi-common.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 57c2f48..73d493e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -136,11 +136,7 @@ bool		pg_krb_caseins_users;
  *
  */
 #ifdef ENABLE_GSS
-#if defined(HAVE_GSSAPI_H)
-#include <gssapi.h>
-#else
-#include <gssapi/gssapi.h>
-#endif
+#include "libpq/be-gssapi-common.h"
 
 static int	pg_GSS_recvauth(Port *port);
 #endif   /* ENABLE_GSS */
@@ -715,63 +711,6 @@ recv_and_check_password_packet(Port *port, char **logdetail)
  */
 #ifdef ENABLE_GSS
 
-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- * from src/athena/auth/krb5/src/lib/gssapi/generic/gssapi_generic.c
- */
-static const gss_OID_desc GSS_C_NT_USER_NAME_desc =
-{1

Re: [HACKERS] Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

2016-03-31 Thread Petr Jelinek
On 01/04/16 03:49, Alvaro Herrera wrote:
> Moving thread to -hackers, CC'ing Craig.
>
> Michael Paquier wrote:
>
>> hamster complains here:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-03-31%2016%3A00%3A06
>> [...]
>> # Copying slots to replica
>> after_basebackup|test_decoding||547|0/560|0/598
> # Copying slot 'after_basebackup','test_decoding',NULL,'547','0/560','0/598'
>> connection error: 'psql::1: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> psql::1: connection to server was lost'
>
> Ahem.
>
> So, I see this:
>
> DEBUG:  server process (PID 24166) exited with exit code 0
> DEBUG:  forked new backend, pid=24168 socket=7
> LOG:  statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/560', '0/598');
> DEBUG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
> DETAIL:  Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/560', '0/598');
> LOG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
>
> in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log
>
> Could you have a look at whether you have core dumps from these?  If so,
> backtraces would be very useful.
>

The function does the following:
TransactionId new_xmin = (TransactionId) PG_GETARG_INT64(1);

And we are passing NULL as that parameter, that could explain this.
Also while reading it I wonder if the function should be defined with
xid type rather than bigint and use similar input code as xid.c.
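
A minimal sketch of the null-safe pattern (assuming the function is not
declared STRICT; the fmgr boilerplate and actual slot-update logic are
elided):

Datum
test_slot_timelines_advance_logical_slot(PG_FUNCTION_ARGS)
{
	TransactionId new_xmin;

	if (PG_ARGISNULL(1))
		new_xmin = InvalidTransactionId;
	else
		new_xmin = (TransactionId) PG_GETARG_INT64(1);

	(void) new_xmin;			/* slot-update logic elided */
	PG_RETURN_VOID();
}

On 64-bit platforms PG_GETARG_INT64() on a NULL argument just reads a
zero Datum, but on 32-bit platforms int64 is pass-by-reference, so it
dereferences a NULL pointer.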

-- 
   Petr Jelinek  http://www.2ndQuadrant.com/
   PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 5:49 PM, Magnus Hagander 
wrote:

> On Thu, Mar 31, 2016 at 4:00 AM, David Steele  wrote:
> So maybe we should at least start that way. And just document that
> clearly, and then use the patch that we have right now?
>
> Except we add it so that it still returns the stoppoint() as well (which is
> returned from the current pg_stop_backup, but not in the new one).
>
> We can always extend the function with more columns later if we need -
> it's changing the ones we have that's a big problem there :)
>
>
+1 for doing that way for now.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Thomas Munro
On Fri, Apr 1, 2016 at 11:44 AM, Alvaro Herrera
 wrote:
> The huge_pages feature is fairly new, and as I recall the intention is
> that other operating systems will be supported as we get patches for it.

BTW about other operating systems:

It looks like FreeBSD is already clever enough to use superpages, as it
calls them [1][2].  This is something like Linux's 'transparent huge
pages' except apparently somewhat more transparent.  Though I wonder
if there is some advantage in adding a MAP_ALIGNED_SUPER flag to the
mmap request.
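
Presumably something as simple as this would do it (untested sketch;
MAP_ALIGNED_SUPER is FreeBSD's mmap(2) alignment-request flag):

#include <stddef.h>
#include <sys/mman.h>

/* untested FreeBSD-only sketch: request superpage-aligned anon memory */
static void *
map_shmem_superpage_aligned(size_t size)
{
	void	   *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
						   MAP_SHARED | MAP_ANON | MAP_ALIGNED_SUPER,
						   -1, 0);

	return (ptr == MAP_FAILED) ? NULL : ptr;
}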

It looks like MacOSX can do it with VM_FLAGS_SUPERPAGE_SIZE_ANY but
such mappings are not inherited by child processes.

[1] https://lists.freebsd.org/pipermail/freebsd-hackers/2012-June/039018.html
[2] https://www.youtube.com/watch?v=0wIxny-n_Mg <-- a really good talk
about this stuff!

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] IF (NOT) EXISTS in psql-completion

2016-03-31 Thread Kyotaro HORIGUCHI
At Thu, 31 Mar 2016 11:22:20 +0200, Pavel Stehule  
wrote in 
> 2016-03-30 10:34 GMT+02:00 Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp>:
> 
> > Hi,
> >
> > At Wed, 30 Mar 2016 09:23:49 +0200, Pavel Stehule 
> > wrote in <
> > cafj8prbvka6ng4jwz2qmro7inudfjws5w0+demvgzznuf-h...@mail.gmail.com>
> > > Hi
> > >
> > > ...
> > > >> =# alter table if
> > > >> =# alter table if exists
> > > >> ==
> > > >> =# alter table I
> > > >> =# alter table IF EXISTS// "information_schema" doesn't match.
> > > >>
> > > >> Since this is another problem from IF (NOT) EXISTS, this is
> > > >> in separate form.
> > > >>
> > > >> What do you think about this?
> > > >>
> > > >
> > > > +1
> > > >
> > >
> > > The new behavior is much better.
> >
> > I'm glad to hear that.
> >
> > > I found new warning
> > >
> > >  tab-complete.c:1438:87: warning: right-hand operand of comma expression
> > > has no effect [-Wunused-value]
> >
> > Mmm. Google tells me that gcc 4.9 does so. I'm using 4.8.5 so I
> > haven't seen the warning.
> >
> > https://gcc.gnu.org/gcc-4.9/porting_to.html
> >
> > 1436:   else if (HeadMatches2("CREATE", "SCHEMA") &&
> > 1437:SHIFT_TO_LAST1("CREATE") &&
> > 1438:false) {} /* FALL THROUGH */
> >
> > > #define SHIFT_TO_LAST1(p1) \
> > > (HEADSHIFT(find_last_index_of(p1, previous_words,
> > previous_words_count)), \
> > >  true)
> >
> > > #define HEADSHIFT(n) \
> > > (head_shift += n, true)
> >
> > expanding the macros the lines at the error will be
> >
> > else if (HeadMatches2("CREATE", "SCHEMA") &&
> >  (head_shift +=
> >   find_last_index_of("CREATE", previous_words,
> > previous_words_count),
> >true) &&
> >  false) {} /* FALL THROUGH */
> >
> > But the right hand value (true) is actually "used" in the
> > expression (even though not effective). Perhaps (true && false)
> > was optimized as false and the true is regarded as unused?
> > That's stupid.. Using functions instead of macros seems to solve
> > this but they'd need to be wrapped by macros such as
> > additional_kw_query(). That's a pain..
> >
> > Any thoughts?
> >
> > > There is a small minor issue - I don't know if it is solvable.
> > > Autocompletion works only for the "if" keyword. When I am writing "if "
> > > or "if " or "if exi" - then autocompletion doesn't work. But this issue
> > > is exactly the same for other multi-word completions like "alter foreign
> > > data wrapper". So if it is fixable, it can be out of scope for this patch.
> >
> > Yes.  It can be solved only by adding completion for every word,
> > as some of the similar completions do.
> >
> > > everything else looks good.
> >
> > Thanks.
> >
> 
> please, final patch

Addressing this needs gcc 4.9, but CentOS 7 doesn't have the
devtools-2 repo, so I'm now building a CentOS 6 environment for this
purpose. Please wait for a while.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-31 Thread Kyotaro HORIGUCHI
At Thu, 31 Mar 2016 14:51:02 -0400, Tom Lane  wrote in 
<19589.1459450...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > Thank you for the comment. The new version is attached.
> 
> Committed with rather heavy editorialization and a batch of regression
> test cases.
> 
>   regards, tom lane

Many thanks for the editorialization (or refactoring), the many
descriptive comments, the testing, and the committing.

That all seems fine to me. The following are my review notes and
thoughts on it, FWIW.


==
check_index_predicates:
 + * At one time it was possible for this to get re-run after adding more
 + * restrictions to the rel, thus possibly letting us prove more indexes OK.
 + * That doesn't happen any more (at least not in the core code's usage),
!+ * but this code still supports it in case extensions want to mess with the
!+ * baserestrictinfo list.  We assume that adding more restrictions can't make
 + * an index not predOK.  We must recompute indrestrictinfo each time, though,
 + * to make sure any newly-added restrictions get into it if needed.

I didn't imagine that the assumption was for the sake of extensions.


+check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
...
+index->indrestrictinfo = rel->baserestrictinfo;

Ah. This has been returned here.

+is_target_rel = (rel->relid == root->parse->resultRelation ||
+ get_plan_rowmark(root->rowMarks, rel->relid) != NULL);

This is very descriptive even for me.


- * We can also discard quals that are implied by a partial index's
- * predicate, but only in a plain SELECT; when scanning a target relation
- * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
- * plan so that they'll be properly rechecked by EvalPlanQual testing.
- *

Uggg. I'm sorry that I couldn't find this comment just above what
I removed.

+ * way because predicate conditions need to be rechecked if the scan
+ * becomes lossy, so they have to be included in bitmapqualorig.

Of course 'lossy' means 'may contain unqualified data'; my brain
must have been in power-saving mode.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






[HACKERS] Re: [COMMITTERS] pgsql: Enable logical slots to follow timeline switches

2016-03-31 Thread Alvaro Herrera
Moving thread to -hackers, CC'ing Craig.

Michael Paquier wrote:

> hamster complains here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-03-31%2016%3A00%3A06
> [...]
> # Copying slots to replica
> after_basebackup|test_decoding||547|0/560|0/598
> # Copying slot 'after_basebackup','test_decoding',NULL,'547','0/560','0/598'
> connection error: 'psql::1: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql::1: connection to server was lost'

Ahem.

So, I see this:

DEBUG:  server process (PID 24166) exited with exit code 0
DEBUG:  forked new backend, pid=24168 socket=7
LOG:  statement: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/560', '0/598');
DEBUG:  server process (PID 24168) was terminated by signal 11: Segmentation fault
DETAIL:  Failed process was running: SELECT test_slot_timelines_advance_logical_slot('after_basebackup', NULL, '547', '0/560', '0/598');
LOG:  server process (PID 24168) was terminated by signal 11: Segmentation fault

in pgsql.build/src/test/recovery/tmp_check/log/006_logical_decoding_timelines_replica2.log

Could you have a look at whether you have core dumps from these?  If so,
backtraces would be very useful.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Thomas Munro
On Fri, Apr 1, 2016 at 11:44 AM, Alvaro Herrera
 wrote:
> Thomas Munro wrote:
>> On Fri, Apr 1, 2016 at 2:53 AM, Andrew Dunstan  wrote:
>> > It would also be nice to find out why we can't usefully scale shared
>> > buffers higher like we can on *nix.
>>
>> Has anyone ever looked into whether asking for SEC_LARGE_PAGES would
>> help with that?
>
> The huge_pages feature is fairly new, and as I recall the intention is
> that other operating systems will be supported as we get patches for it.
> Feel free to submit something.

I don't have a Windows development stack or any plans to get one but I
did make some notes and write some blind code for this a while back
when I was studying the shmem code.  Here it is, in the hope that it
might be a useful starting point for someone else...  I suppose the
first step is to find a benchmark that improves as you increase
shared_buffers on Windows and Linux but plateaus at a much lower
setting on Windows, and then get a patch like this working and see if
huge_pages = on changes anything...

-- 
Thomas Munro
http://www.enterprisedb.com


windows-large-pages-probably-doesnt-even-compile.patch
Description: Binary data



[HACKERS] Incorrect format in error message

2016-03-31 Thread David Rowley
The attached fixes an error message which is incorrectly using an
unsigned format specifier instead of a signed one.

# create table t ();
# create unique index t_idx on t (ctid);
# alter table t replica identity using index t_idx;
ERROR:  internal column 4294967295 in unique index "t_idx"

I was just going to submit a patch to change the %u to a %d, but this
error message should use ereport() instead of elog(), so I fixed that
up too and added the missing regression test.

I picked ERRCODE_INVALID_COLUMN_REFERENCE as I thought it was the most
suitable. I'm not sure if it's correct though.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 96dc923..36b5448 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -11052,8 +11052,10 @@ ATExecReplicaIdentity(Relation rel, ReplicaIdentityStmt *stmt, LOCKMODE lockmode
 
     /* Of the system columns, only oid is indexable. */
     if (attno <= 0 && attno != ObjectIdAttributeNumber)
-        elog(ERROR, "internal column %u in unique index \"%s\"",
-             attno, RelationGetRelationName(indexRel));
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+                 errmsg("index \"%s\" cannot be used as replica identity because column %d is a system column",
+                        RelationGetRelationName(indexRel), attno)));
 
     attr = rel->rd_att->attrs[attno - 1];
     if (!attr->attnotnull)
diff --git a/src/test/regress/expected/replica_identity.out b/src/test/regress/expected/replica_identity.out
index 60d9a42..47678db 100644
--- a/src/test/regress/expected/replica_identity.out
+++ b/src/test/regress/expected/replica_identity.out
@@ -9,6 +9,7 @@ CREATE TABLE test_replica_identity (
 CREATE TABLE test_replica_identity_othertable (id serial primary key);
 CREATE INDEX test_replica_identity_keyab ON test_replica_identity (keya, keyb);
 CREATE UNIQUE INDEX test_replica_identity_keyab_key ON test_replica_identity (keya, keyb);
+CREATE UNIQUE INDEX test_replica_indentity_syscol ON test_replica_identity (ctid);
 CREATE UNIQUE INDEX test_replica_identity_nonkey ON test_replica_identity (keya, nonkey);
 CREATE INDEX test_replica_identity_hash ON test_replica_identity USING hash (nonkey);
 WARNING:  hash indexes are not WAL-logged and their use is discouraged
@@ -40,6 +41,9 @@ SELECT relreplident FROM pg_class WHERE oid = 'pg_constraint'::regclass;
 -- fail, not unique
 ALTER TABLE test_replica_identity REPLICA IDENTITY USING INDEX test_replica_identity_keyab;
 ERROR:  cannot use non-unique index "test_replica_identity_keyab" as replica identity
+-- fail, index can't contain system columns
+ALTER TABLE test_replica_identity REPLICA IDENTITY USING INDEX test_replica_indentity_syscol;
+ERROR:  index "test_replica_indentity_syscol" cannot be used as replica identity because column -1 is a system column
 -- fail, not a candidate key, nullable column
 ALTER TABLE test_replica_identity REPLICA IDENTITY USING INDEX test_replica_identity_nonkey;
 ERROR:  index "test_replica_identity_nonkey" cannot be used as replica identity because column "nonkey" is nullable
@@ -91,6 +95,7 @@ Indexes:
     "test_replica_identity_partial" UNIQUE, btree (keya, keyb) WHERE keyb <> '3'::text
     "test_replica_identity_unique_defer" UNIQUE CONSTRAINT, btree (keya, keyb) DEFERRABLE
     "test_replica_identity_unique_nondefer" UNIQUE CONSTRAINT, btree (keya, keyb)
+    "test_replica_indentity_syscol" UNIQUE, btree (ctid)
     "test_replica_identity_hash" hash (nonkey)
     "test_replica_identity_keyab" btree (keya, keyb)
 
@@ -121,6 +126,7 @@ Indexes:
     "test_replica_identity_partial" UNIQUE, btree (keya, keyb) WHERE keyb <> '3'::text
     "test_replica_identity_unique_defer" UNIQUE CONSTRAINT, btree (keya, keyb) DEFERRABLE
     "test_replica_identity_unique_nondefer" UNIQUE CONSTRAINT, btree (keya, keyb)
+    "test_replica_indentity_syscol" UNIQUE, btree (ctid)
     "test_replica_identity_hash" hash (nonkey)
     "test_replica_identity_keyab" btree (keya, keyb)
 
@@ -169,6 +175,7 @@ Indexes:
     "test_replica_identity_partial" UNIQUE, btree (keya, keyb) WHERE keyb <> '3'::text
     "test_replica_identity_unique_defer" UNIQUE CONSTRAINT, btree (keya, keyb) DEFERRABLE
     "test_replica_identity_unique_nondefer" UNIQUE CONSTRAINT, btree (keya, keyb)
+    "test_replica_indentity_syscol" UNIQUE, btree (ctid)
     "test_replica_identity_hash" hash (nonkey)
     "test_replica_identity_keyab" btree (keya, keyb)
 Replica Identity: FULL
diff --git a/src/test/regress/sql/replica_identity.sql b/src/test/regress/sql/replica_identity.sql
index 20b6826..436851c 100644
--- a/src/tes

Re: [HACKERS] OOM in libpq and infinite loop with getCopyStart()

2016-03-31 Thread Tom Lane
Aleksander Alekseev  writes:
> +1, same here. Let's see what the committer's opinion is.

I fooled around with this patch quite a bit but could not bring myself
to pull the trigger, because I think it fundamentally breaks applications
that follow the "repeat PQgetResult until NULL" rule.  The only reason
that psql manages not to fail is that it doesn't do that, it just calls
PQexec; and you've hacked PQexecFinish so that it falls out without
pumping PQgetResult till dry.  But that's not a good solution, it's just
a hack that makes the behavior unprincipled and unlike direct use of
PQgetResult.  The key problem is that, assuming that the changes in
getCopyStart() succeed in returning a PGRES_FATAL_ERROR PGresult, the
application may just follow the rule of doing nothing with it unless it's
the last non-null result from PQgetResult.  And it won't be, because
you've switched libpq's asyncStatus into one or another COPY status, which
will cause PQgetResult to continually create and return PGRES_COPY_XXX
results, which is what it's supposed to do in that situation (cf the last
step in getCopyResult).
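
(For context, the rule referred to here is the standard libpq draining
loop -- a minimal sketch for the simple, non-COPY case, assuming an open
PGconn *conn; applications typically act only on the last non-NULL
result:)

  PGresult *res;
  PGresult *last = NULL;

  /* Drain every result; remember only the last non-NULL one. */
  while ((res = PQgetResult(conn)) != NULL)
    {
      if (last != NULL)
        PQclear(last);
      last = res;
    }
  if (last != NULL)
    {
      /* inspect PQresultStatus(last), report errors, etc. */
      PQclear(last);
    }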

Now, this will accidentally fail to fail if PQgetResult's attempt to
manufacture a PGRES_COPY_XXX result fails and returns null, which is
certainly possible if we're up against an OOM situation.  But what if
it doesn't fail --- which is also possible, because a PGRES_COPY_XXX
with no attached data will not need as much space as a PGRES_FATAL_ERROR
with attached error message.  The app probably throws away the
PGRES_FATAL_ERROR and tries to use the PGRES_COPY_XXX result, which
is almost okay, except that it will lack the copy format information
which will be fatal if the application is relying on that.

So AFAICT, this patch kinda sorta works for psql but it is not going
to make things better for other apps.

The other problem is that I don't have a lot of faith in the theory
that getCopyStart is going to be able to make an error PGresult when
it couldn't make a COPY PGresult.  The existing message-receipt routines
that this is modeled on are dealing with PGresults that are expected to
grow large, so throwing away the accumulated PGresult is highly likely
to free enough memory to let us allocate an error PGresult.  Not so
here.

I have to run, but the bottom line is I don't feel comfortable with
this.  It's trying to fix what's a very corner-case problem (since
COPY PGresults aren't large) but it's introducing new corner case
behaviors of its own.

regards, tom lane




Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-31 Thread Michael Paquier
On Fri, Apr 1, 2016 at 4:14 AM, Alvaro Herrera  wrote:
> Noah Misch wrote:
>
>> The above-described topic is currently a PostgreSQL 9.6 open item.  Alvaro,
>> since you committed the patch believed to have created it, you own this open
>> item.
>
> That's correct.  Since we already had a patch available, I pushed it.
> I'll wait for a few days before marking the open item as closed in the
> wiki, to make sure that hamster reports success a few times.

Thanks. I just did a couple of additional manual tests on hamster, and
the sporadic failure does not show up anymore, so the daily runs
should be in good shape now for this test.
-- 
Michael




[HACKERS] PQsendQuery+PQgetResult+PQsetSingleRowMode limitations and support

2016-03-31 Thread Karl O. Pinc
Hi,

Bruce Momjian suggested I write and ask about using libpq
to submit multiple SQL statements to the backend, and
then get results for each of the submitted statements,
row-by-row without server-side caching of the results.

Bruce wrote:
> I think this would be good
> to post to hackers to get a general discussion of the limitations of
> this approach and allow to communicate if this is something we want
> those interfaces to support.  My guess is this never used to work, but
> now it does.

As I read the documentation this functionality is supported.
(Although I do believe that the wording could be more clear.)

http://www.postgresql.org/docs/9.5/static/libpq-single-row-mode.html

And (I suppose):
http://www.postgresql.org/docs/9.5/static/libpq-async.html

FWIW, I would use such functionality to support an
interactive interface for users wishing to write SQL
and query the db directly.  Like psql does, only not
from the command line.

The following example program exhibits this functionality.
It runs on Debian Jessie (8.3) with PostgreSQL 9.4 (from the
Debian repos).

--
/*
 * byrow.c
 *
 *  Test that libpq, the PostgreSQL frontend library, can be given
 *  multiple statements and get the results of executing each,
 *  row-by-row without server side buffering.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int stmtcnt = 0;

static void
exit_nicely(PGconn *conn)
{
  PQfinish(conn);
  exit(1);
}

static void
setting_failed(PGconn *conn)
{
  fprintf(stderr, "Unable to enter single row mode: %s\n",
  PQerrorMessage(conn));
  exit_nicely(conn);
}

int
main(int argc, char **argv)
{
  const char *conninfo;
  PGconn *conn;
  PGresult   *res;
  int first = 1;
  int nFields;
  int i,
j;

  /* Construct some statements to execute. */
  char *stmts = "select * from pg_database;\n"
"select * from pg_roles;\n"
"select count(*) from pg_tables;\n";

  /*
   * If the user supplies a parameter on the command line, use it as
   * the conninfo string; otherwise default to setting
   * dbname=postgres and using environment variables or defaults for
   * all other connection parameters.
   */
  if (argc > 1)
conninfo = argv[1];
  else
conninfo = "dbname = postgres";

  /* Make a connection to the database */
  conn = PQconnectdb(conninfo);

  /* Check to see that the backend connection was successfully made */
  if (PQstatus(conn) != CONNECTION_OK)
{
  fprintf(stderr, "Connection to database failed: %s",
  PQerrorMessage(conn));
  exit_nicely(conn);
}

  /* Send our statements off to the server. */
  if (!PQsendQuery(conn, stmts))
{
  fprintf(stderr, "Sending statements to server failed: %s\n",
  PQerrorMessage(conn));
  exit_nicely(conn);
}
  
  /* We want results row-by-row. */
  if (!PQsetSingleRowMode(conn))
{
  setting_failed(conn);
}

  /* Loop through the results of our statements. */
  while ((res = PQgetResult(conn)) != NULL)
{

  switch (PQresultStatus(res))
{
case PGRES_TUPLES_OK:  /* No more rows from current query. */
  {
/* We want the next statement's results row-by-row also. */
if (!PQsetSingleRowMode(conn))
  {
PQclear(res);
setting_failed(conn);
  }
first = 1;
break;
  }

case PGRES_SINGLE_TUPLE:
  {
if (first)
  {
/* Produce a "nice" header */
printf("\n%s\nResults of statement number %d:\n\n",
   "-"
   "-",
   stmtcnt++);

/* print out the attribute names */
nFields = PQnfields(res);
for (i = 0; i < nFields; i++)
  printf("%-15s", PQfname(res, i));
printf("\n\n");

first = 0;
  }

/* print out the row */
for (j = 0; j < nFields; j++)
  printf("%-15s", PQgetvalue(res, 0, j));
printf("\n");

break;
  }
default:
  /* Always call PQgetResult until it returns null, even on
   * error. */
  {
fprintf(stderr,
"Query execution failed: %s", PQerrorMessage(conn));
  }
}

  PQclear(res);
}

  /* close the connection to the database and cleanup */
  PQfinish(conn);

  return 0;
}
--

(You may recognize much of the code above because it was cribbed
from the libpq docs example #1.)


I assume there are no questions about supporting similar
functionality without PQsetSingleRowMode,
as follows:
--
/*
 * testmultistmt.c
 *
 *  Test that libpq

Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Michael Paquier
On Fri, Apr 1, 2016 at 9:05 AM, Craig Ringer  wrote:
> I completely agree there. I wrote some documentation on how to make project
> files to build extensions, but it'd be nice to *generate* them instead, like
> we do for in-tree builds.

Yes, I maintain some code that builds extensions on Windows; it uses
the msvc scripts by linking to the tree directly, with some
custom-made wrappers for the calls that cannot be exported. That's
ugly, but anybody would have guessed that...

> The problem is that we don't install all the Perl build stuff on Windows,
> it's only in the build tree. The time consuming part is / will be
> cleaning that up so it can run outside src/tools/msvc, getting it
> to use pg_config, and making it installable. Then of course there's the
> issue that it won't be available before 9.7 at the soonest... unless some
> brave soul backports it as an installable add-on. Which means extension
> authors are still screwed for 5+ years. That's the main reason I haven't
> tackled it - so far I've got by pretty well with VS project files + property
> sheets, and any PGXS-like capability almost certainly couldn't be backported
> so it's nearly useless for about 5 years.

Still, it would be interesting to see how cmake affects this area of
the code before tackling anything; doing something now may turn out to
be wasted effort. So I would refrain from even putting
resources into this project.
-- 
Michael




Re: [HACKERS] Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

2016-03-31 Thread Noah Misch
On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch  wrote:
> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> >> As a result of looking into the code around recovery, ISTM that the
> >> cause is related to relation cache clearing.
> >> In heap_xlog_visible, if the standby server receives a WAL record then
> >> the relation cache is eventually cleared in vm_extend, but if the standby
> >> server receives an FPI then the relation cache would not be cleared.
> >> For example, after I applied attached patch to HEAD, (it might not be
> >> right way but) this problem seems to be resolved.
> >>
> >> Is this a bug? or not?
> >
> > It's a bug.  I don't expect it causes queries to return wrong answers, 
> > because
> > visibilitymap.c says "it's always safe to clear a bit in the map from
> > correctness point of view."  (The bug makes a visibility map bit temporarily
> > appear to have been cleared.)  I still call it a bug, because recovery
> > behavior becomes too difficult to verify when xlog replay produces 
> > conditions
> > that don't happen outside of recovery.  Even if there's no way to get a 
> > wrong
> > query answer today, this would be too easy to break later.  I wonder if we
> > make the same omission in other xlog replay functions.  Similar omissions 
> > may
> > cause wrong query answers, even if this particular one does not.
> >
> > Would you like to bisect for the commit, or at least the major release, at
> > which the bug first appeared?
> >
> > I wonder if your discovery has any relationship to this recently-reported 
> > case
> > of insufficient smgr invalidation:
> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=nh2o1fxkkj6yvobtvy...@mail.gmail.com
> >
> 
> I'm not sure this bug has relationship to another issue you mentioned
> but after further investigation, this bug seems to be reproduced even
> on more older version.
> At least I reproduced it at 9.0.0.

Would you try PostgreSQL 9.2.16?  The visibility map was not crash safe and
had no correctness implications until 9.2.  If 9.2 behaves this way, it's
almost certainly not a recent regression.




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Craig Ringer
On 31 March 2016 at 21:53, Andrew Dunstan  wrote:


>
> The worst thing about developing from my POV isn't something that actually
> affects core developers so much, namely the lack of a nice MSVC equivalent
> of PGXS.
>

I completely agree there. I wrote some documentation on how to make project
files to build extensions, but it'd be nice to *generate* them instead,
like we do for in-tree builds.

The problem is that we don't install all the Perl build stuff on Windows,
it's only in the build tree. The time consuming part is / will be
cleaning that up so it can run outside src/tools/msvc, getting it
to use pg_config, and making it installable. Then of course there's the
issue that it won't be available before 9.7 at the soonest... unless some
brave soul backports it as an installable add-on. Which means extension
authors are still screwed for 5+ years. That's the main reason I haven't
tackled it - so far I've got by pretty well with VS project files +
property sheets, and any PGXS-like capability almost certainly couldn't be
backported so it's nearly useless for about 5 years.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PATCH v9] GSSAPI encryption support

2016-03-31 Thread Michael Paquier
On Fri, Apr 1, 2016 at 6:10 AM, Tom Lane  wrote:
> If that's what it is, it seems fairly broken to have it connected up to a
> GUC variable.  Especially one that's USERSET; some people will wonder why
> frobbing it with SET does nothing, and others will bitch that they think
> it should be superuser-only or some such.  I'd keep it localized to the
> connection logic, myself.  There's already logic in ProcessStartupPacket
> for connection options that aren't GUC variables, so I'd suggest adding
> another case there instead of pretending this is a settable GUC variable.

Argh, yes right. That's what we should look for.
-- 
Michael




Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-03-31 Thread Michael Paquier
On Thu, Mar 31, 2016 at 11:18 PM, Alvaro Herrera
 wrote:
> Andres Freund wrote:
>> On 2016-03-31 10:15:21 +0900, Ian Barwick wrote:
>
>> > Patch changes the error message to:
>> >
>> >   ERROR:  replication slots can only be used if wal_level is "replica" or 
>> > "logical"
>> >
>> > Explicitly naming the valid WAL levels matches the wording in the wal_level
>> > error hint used in a couple of places, i.e.
>>
>> The explicit naming makes it much more verbose to change anything around
>> wal level though, so consider me not a fan of spelling out all levels.
>
> I thought we had agreed that we weren't going to consider the wal_level
> values as a linear scale -- in other words, wordings such as "greater
> than FOO" are discouraged.  That's always seemed a bit odd to me.

Yes, that's what I thought as well.
-- 
Michael




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Alvaro Herrera
Thomas Munro wrote:
> On Fri, Apr 1, 2016 at 2:53 AM, Andrew Dunstan  wrote:
> > It would also be nice to find out why we can't usefully scale shared buffers
> > higher like we can on *nix.
> 
> Has anyone ever looked into whether asking for SEC_LARGE_PAGES would
> help with that?

The huge_pages feature is fairly new, and as I recall the intention is
that other operating systems will be supported as we get patches for it.
Feel free to submit something.

If you can actually research the performance problem with large
shared_buffers, that would also be welcome.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Andres Freund


On March 31, 2016 11:13:46 PM GMT+02:00, Jesper Pedersen 
 wrote:

>I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.

Yes please. I think the lock variant is realistic, the lockless one isn't.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Draft release notes for next week's releases

2016-03-31 Thread Peter Geoghegan
On Thu, Mar 31, 2016 at 2:59 PM, Tom Lane  wrote:
> Well, too late for 9.5.2 anyway.  It still makes sense to correct that
> text for future releases.  I'm inclined to wait a little bit though and
> see what other improvements become apparent.  For instance, I think the
> point about non-first index columns not being affected is of greater
> weight than you seem to place on it.

The SQL query on the Wiki page does the right thing there now, so
users will have the benefit of not unnecessarily reindexing when text
was not the leading/first pg_index attribute. We have that covered, I
suppose, because everyone will look to the Wiki page for guidance.

I also noted quite a few non-obvious safe cases on the Wiki page, as
pointed out already over on the other thread.

-- 
Peter Geoghegan




Re: [HACKERS] Draft release notes for next week's releases

2016-03-31 Thread Tom Lane
Peter Geoghegan  writes:
> I just noticed that the release notes mention char(n) as affected.
> That's not actually true, because char(n) SortSupport only came in
> 9.6. The Wiki page now shows this, which may be the most important
> place, but ideally we'd fix this in the release notes. I guess it's
> too late.

Well, too late for 9.5.2 anyway.  It still makes sense to correct that
text for future releases.  I'm inclined to wait a little bit though and
see what other improvements become apparent.  For instance, I think the
point about non-first index columns not being affected is of greater
weight than you seem to place on it.

regards, tom lane




Re: [HACKERS] Draft release notes for next week's releases

2016-03-31 Thread Peter Geoghegan
On Sat, Mar 26, 2016 at 4:34 PM, Tom Lane  wrote:
> Probably the most discussion-worthy item is whether we can say
> anything more about the strxfrm mess.  Should we make a wiki
> page about that and have the release note item link to it?

I just noticed that the release notes mention char(n) as affected.
That's not actually true, because char(n) SortSupport only came in
9.6. The Wiki page now shows this, which may be the most important
place, but ideally we'd fix this in the release notes. I guess it's
too late.

-- 
Peter Geoghegan




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Jesper Pedersen

Hi,

On 03/30/2016 07:09 PM, Andres Freund wrote:

Yes. That looks good. My testing shows that increasing the number of
buffers can increase both throughput and reduce latency variance. The
former is a smaller effect with one of the discussed patches applied,
the latter seems to actually increase in scale (with increased
throughput).


I've attached patches to:
0001: Increase the max number of clog buffers
0002: Implement 64bit atomics fallback and optimize read/write
0003: Edited version of Simon's clog scalability patch

WRT 0003 - still clearly WIP - I've:
- made group_lsn pg_atomic_u64*, to allow for tear-free reads
- split content from IO lock
- made SimpleLruReadPage_optShared always return with only share lock
   held
- Implement a different, experimental, concurrency model for
   SetStatusBit using cmpxchg. A define USE_CONTENT_LOCK controls which
   bit is used.

I've tested this and saw this outperform Amit's approach. Especially so
when using a read/write mix, rather than only reads. I saw over 30%
increase on a large EC2 instance with -btpcb-like@1 -bselect-only@3. But
that's in a virtualized environment, not very good for reproducibility.

Amit, could you run benchmarks on your bigger hardware? Both with
USE_CONTENT_LOCK commented out and in?

I think we should go for 1) and 2) unconditionally. And then evaluate
whether to go with your, or 3) from above. If the latter, we've to do
some cleanup :)



I have been testing Amit's patch in various setups and workloads, with 
up to 400 connections on a 2 x Xeon E5-2683 (28C/56T @ 2 GHz), not 
seeing an improvement, but no regression either.


Testing with 0001 and 0002 does show up to a 5% improvement when using a 
HDD for data + wal - about 1% when using 2 x RAID10 SSD - unlogged.


I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.

Thanks for your work on this !

Best regards,
 Jesper





Re: [HACKERS] [PATCH v9] GSSAPI encryption support

2016-03-31 Thread Tom Lane
Robbie Harwood  writes:
>>> +   {
>>> +   {"gss_encrypt", PGC_USERSET, CONN_AUTH_SECURITY,
>>> +gettext_noop("Require encryption for all GSSAPI connections."),
>>> +NULL,
>>> +GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
>>> +   },
>>> +   &gss_encrypt, false, check_gss_encrypt, assign_gss_encrypt, NULL
>>> +   },

>> Why is this marked NOT_IN_SAMPLE?

> Well, because it's not something that's supposed to be set in the file
> (and indeed, you can't set it there, if I understand
> GUC_DISALLOW_IN_FILE).  It's used only as a connection parameter, and I
> use its presence / absence for the client and server to negotiate GSSAPI
> encryption support.

If that's what it is, it seems fairly broken to have it connected up to a
GUC variable.  Especially one that's USERSET; some people will wonder why
frobbing it with SET does nothing, and others will bitch that they think
it should be superuser-only or some such.  I'd keep it localized to the
connection logic, myself.  There's already logic in ProcessStartupPacket
for connection options that aren't GUC variables, so I'd suggest adding
another case there instead of pretending this is a settable GUC variable.
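
(To illustrate the shape of that suggestion -- a sketch only, set inside
the startup-packet option loop: the "database" and "user" branches mirror
the existing special cases, while the gss_encrypt branch and the Port
field are hypothetical:)

    else if (strcmp(nameptr, "database") == 0)      /* existing special case */
        port->database_name = pstrdup(valptr);
    else if (strcmp(nameptr, "user") == 0)          /* existing special case */
        port->user_name = pstrdup(valptr);
    else if (strcmp(nameptr, "gss_encrypt") == 0)   /* hypothetical new case */
    {
        /* hypothetical Port field; reject anything that isn't a boolean */
        if (!parse_bool(valptr, &port->gss_encrypt))
            ereport(FATAL,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("invalid value for parameter \"%s\": \"%s\"",
                            nameptr, valptr)));
    }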

regards, tom lane




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> See "Approximating block accesses in database organizations", S. B. Yao,
>> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261

> That sounds nice all right, but I'm not sure it's actually helpful,
> because the article text is not available anywhere.  I doubt most people
> will spend 15 bucks to buy that paper ... so we don't actually know
> whether the paper supports the chosen formula :-)  unless you have a
> CACM subscription and can verify it.

Well, I do and I did, see my previous response.

> I think it's good to have the ACM reference anyway, for posterity, but
> it'd be good to (additionally) have something that people can read.

I'm just concerned about what happens when the Dellera paper stops being
available.  I don't mind including that URL as a backup to the written-out
argument I just suggested.

regards, tom lane




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane
Dean Rasheed  writes:
> Yeah, that makes sense. In fact, if we only apply the adjustment when
> reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first
> argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it
> is guaranteed to be non-negative. If rel->rows >= rel->tuples (not
> sure if it can be greater), then we just want the original
> reldistinct.

Works for me.

regards, tom lane




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane
Dean Rasheed  writes:
> On 31 March 2016 at 21:40, Tom Lane  wrote:
>> Let's use something like this:
>> See "Approximating block accesses in database organizations", S. B. Yao,
>> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261

Actually, having now looked at both the Dellera paper and the Yao one,
what the latter shows seems to be equivalent to Dellera's equation (2)
(the first one in his section 2.2).  But what the code is actually
using is the approximation in the second equation in 2.2, and that
one is certainly not in Yao, although it's not hard to see how you
get from the first to the second if you assume i << Nr.

I think it'd be worth writing out equation (2), attributing it to the Yao
paper, and then saying something like "Alberto Dellera points out that if
Nd is large, so that all the values of i are much less than Nr, then this
formula can be approximated by [the formula used in the code].  It turns
out that that formula also works well even when Nd is small."
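
(Written out, my reading of the two formulas being discussed -- treat
this as a sketch rather than the papers' exact notation; n is the number
of selected rows, N_r the total rows, N_d the number of distinct values:)

    D(n) \;=\; N_d \left(1 - \prod_{i=1}^{n}
                 \frac{N_r - N_r/N_d - i + 1}{N_r - i + 1}\right)
         \;\approx\; N_d \left(1 - \left(\frac{N_r - n}{N_r}\right)^{N_r/N_d}\right)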

regards, tom lane




Re: [HACKERS] [PATCH v9] GSSAPI encryption support

2016-03-31 Thread Robbie Harwood
Alvaro Herrera  writes:

> Robbie Harwood wrote:
>> Michael Paquier  writes:
>
>> > +   iov[0].iov_base = lenbuf;
>> > +   iov[0].iov_len = 4;
>> > +   iov[1].iov_base = output.value;
>> > +   iov[1].iov_len = output.length;
>> > +
>> > +   ret = writev(port->sock, iov, 2);
>> >
>> > writev and iovec are not present on Windows, so this code would never
>> > compile there, and it does not sound that this patch is a reason
>> > sufficient enough to drop support of GSSAPI on Windows.
>> 
>> Um.  Okay.  I guess on Windows I'll make two write calls then, since the
>> only other option I see is to hit alloc again here.
>
> Hmm, I wouldn't push my luck by using writev here at all.  We don't use
> writev/readv anywhere, and it's quite possible that they are not present
> on older Unixen which we still support.
> http://pubs.opengroup.org/onlinepubs/009695399/functions/writev.html
> says writev was introduced in "issue 4 version 2", which AFAICT is the
> 2004 version, but our baseline is SUSv2 (1997).  So it's definitely not
> workable.

Understood.  What do you suggest instead?  To give some context here,
writev() is being used here because I have a GSSAPI payload that is
(effectively) opaque and need to include length in it.  The only
alternatives I can see are either allocating a new buffer and reading
the payload + length into it (incurs additional memory overhead), or
calling a write/send function twice (incurs syscall overhead at
minimum).
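
(For what it's worth, the two-call variant would look something like the
sketch below; port->sock, output.value and output.length are taken from
the patch context, and real code would also have to cope with partial
writes and EINTR:)

    uint32      netlen = htonl((uint32) output.length);
    ssize_t     ret;

    /* length prefix first, then the opaque GSSAPI payload */
    ret = send(port->sock, (char *) &netlen, sizeof(netlen), 0);
    if (ret == sizeof(netlen))
        ret = send(port->sock, output.value, output.length, 0);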

>> > +   {
>> > +   {"gss_encrypt", PGC_USERSET, CONN_AUTH_SECURITY,
>> > +gettext_noop("Require encryption for all GSSAPI connections."),
>> > +NULL,
>> > +GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
>> > +   },
>> > +   &gss_encrypt, false, check_gss_encrypt, assign_gss_encrypt, NULL
>> > +   },
>
> Why is this marked NOT_IN_SAMPLE?

Well, because it's not something that's supposed to be set in the file
(and indeed, you can't set it there, if I understand
GUC_DISALLOW_IN_FILE).  It's used only as a connection parameter, and I
use its presence / absence for the client and server to negotiate GSSAPI
encryption support.




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Alvaro Herrera
Tom Lane wrote:

> > The article text refers to this 1977 S. B. Yao paper "Approximating
> > block accesses in database organizations" which doesn't appear to be
> > available online, except behind ACM's paywall at
> > http://dl.acm.org/citation.cfm?id=359475 
> 
> Well, a CACM citation is perfectly fine by my lights (especially one
> that's that far back and therefore certainly patent-free ...)
> 
> Let's use something like this:
> 
> See "Approximating block accesses in database organizations", S. B. Yao,
> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261

That sounds nice all right, but I'm not sure it's actually helpful,
because the article text is not available anywhere.  I doubt most people
will spend 15 bucks to buy that paper ... so we don't actually know
whether the paper supports the chosen formula :-)  unless you have a
CACM subscription and can verify it.

I think it's good to have the ACM reference anyway, for posterity, but
it'd be good to (additionally) have something that people can read.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed
On 31 March 2016 at 21:40, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Another minor gripe is the use of a random URL as justification.  This
>>> code will still be around when that URL exists nowhere but the Wayback
>>> Machine.  Can't we find a more formal citation to use?
>
>> The article text refers to this 1977 S. B. Yao paper "Approximating
>> block accesses in database organizations" which doesn't appear to be
>> available online, except behind ACM's paywall at
>> http://dl.acm.org/citation.cfm?id=359475
>
> Well, a CACM citation is perfectly fine by my lights (especially one
> that's that far back and therefore certainly patent-free ...)
>
> Let's use something like this:
>
> See "Approximating block accesses in database organizations", S. B. Yao,
> Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261
>

Sounds good.

Regards,
Dean




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed
On 31 March 2016 at 20:18, Tom Lane  wrote:
> Also, I wonder if it'd be a good idea to provide a guard against division
> by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
> reldistinct can't be zero.  In the same vein, I'm worried about the first
> argument of pow() being slightly negative due to roundoff error, leading
> to a NaN result.

Yeah, that makes sense. In fact, if we only apply the adjustment when
reldistinct > 0 and rel->rows < rel->tuples, and rewrite the first
argument to pow() as (rel->tuples - rel->rows) / rel->tuples, then it
is guaranteed to be non-negative. If rel->rows >= rel->tuples (not
sure if it can be greater), then we just want the original
reldistinct.

> Maybe we should also consider clamping the final reldistinct estimate to
> an integer with clamp_row_est().  The existing code doesn't do that but
> it seems like a good idea on general principles.

OK, that seems sensible.
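
(Putting the two suggestions together, a sketch of the guarded form --
reldistinct, rel and clamp_row_est() as in the planner source:)

    if (reldistinct > 0 && rel->rows < rel->tuples)
    {
        /*
         * Write the pow() argument as the fraction of unselected rows, so
         * it is guaranteed to lie in [0, 1] and cannot go negative from
         * roundoff; rel->tuples > 0 is already known at this point.
         */
        double      unselected = (rel->tuples - rel->rows) / rel->tuples;

        reldistinct *= 1.0 - pow(unselected, rel->tuples / reldistinct);
    }
    reldistinct = clamp_row_est(reldistinct);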

Regards,
Dean




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Another minor gripe is the use of a random URL as justification.  This
>> code will still be around when that URL exists nowhere but the Wayback
>> Machine.  Can't we find a more formal citation to use?

> The article text refers to this 1977 S. B. Yao paper "Approximating
> block accesses in database organizations" which doesn't appear to be
> available online, except behind ACM's paywall at
> http://dl.acm.org/citation.cfm?id=359475 

Well, a CACM citation is perfectly fine by my lights (especially one
that's that far back and therefore certainly patent-free ...)

Let's use something like this:

See "Approximating block accesses in database organizations", S. B. Yao,
Communications of the ACM, Volume 20 Issue 4, April 1977 Pages 260-261

regards, tom lane




Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Thomas Munro
On Fri, Apr 1, 2016 at 2:53 AM, Andrew Dunstan  wrote:
> It would also be nice to find out why we can't usefully scale shared buffers
> higher like we can on *nix.

Has anyone ever looked into whether asking for SEC_LARGE_PAGES would
help with that?  I noticed that another popular RDBMS recommends
enabling this to see a gain when its buffer pool is "several
gigabytes".

I don't do Windows myself, but from poking around in the docs, it
looks like you need to grant SeLockMemoryPrivilege (Start > Control
Panel > Administrative Tools > Local Security Policy > User Rights
Assignment > Lock pages in memory > Action > Properties) because large
pages can't be swapped out (just like on other OSs).  So maybe it
could work like huge_pages = try on Linux so that it works out of the
box with 4K pages, but starts using 2MB (?) pages if you grant
SeLockMemoryPrivilege.

https://msdn.microsoft.com/en-us/library/windows/desktop/aa366543(v=vs.85).aspx
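
(To make the "try" idea concrete, here's a blind sketch against the
documented Win32 API -- untested, and the function name is purely
illustrative:)

#include <windows.h>

/* Returns a large-page-backed region, or NULL so the caller can fall
 * back to regular pages, mimicking huge_pages = try. */
static void *
try_large_page_mapping(SIZE_T size)
{
    SIZE_T  large = GetLargePageMinimum();  /* 0 if unsupported */
    HANDLE  hmap;
    void   *ptr = NULL;

    if (large == 0)
        return NULL;

    /* the mapping size must be a multiple of the large-page minimum */
    size = ((size + large - 1) / large) * large;

    /* needs SeLockMemoryPrivilege ("Lock pages in memory") */
    hmap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                             PAGE_READWRITE | SEC_COMMIT | SEC_LARGE_PAGES,
                             (DWORD) ((unsigned __int64) size >> 32),
                             (DWORD) size, NULL);
    if (hmap != NULL)
    {
        ptr = MapViewOfFile(hmap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
        CloseHandle(hmap);      /* the view keeps the mapping alive */
    }
    return ptr;                 /* NULL => retry with regular pages */
}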

Just a thought.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Please correct/improve wiki page about abbreviated keys bug

2016-03-31 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 6:23 PM, Josh berkus  wrote:
> Based on that concept, I wrote a query which is now on the wiki page.
> Please fix it if it's not showing what we want it to show.

Sorry, I got caught up with something last night. I did this this
morning Pacific time.

I've made a number of passes over this. I've listed more non-affected
cases, since there are a number of non-obvious cases that are not at
risk, and fixed-up the SQL query, which didn't seem to be doing the
right thing about default collations or domains. I also made it
explicit that non-leading column indexes are not affected (although
that's just mentioned in passing when discussing the SQL query, as
it's a bit esoteric).

-- 
Peter Geoghegan




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Alvaro Herrera
Tom Lane wrote:

> Another minor gripe is the use of a random URL as justification.  This
> code will still be around when that URL exists nowhere but the Wayback
> Machine.  Can't we find a more formal citation to use?

The article text refers to this 1977 S. B. Yao paper "Approximating
block accesses in database organizations" which doesn't appear to be
available online, except behind ACM's paywall at
http://dl.acm.org/citation.cfm?id=359475 

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
On Thu, Mar 31, 2016 at 8:21 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

>
> I think these changes are worth running the benchmark again.  I'm going
> to run it on 4x18 Intel.
>

The results are as follows.

clients master  v3  v5  v9
1   11671   12507   12679   12408
2   24650   26005   25010   25495
4   49631   48863   49811   50150
8   96790   96441   99946   98383
10  121275  119928  124100  124180
20  243066  243365  246432  248723
30  359616  342241  357310  378881
40  431375  415310  441619  425653
50  489991  489896  500590  502549
60  538057  636473  554069  685010
70  588659  714426  738535  719383
80  405008  923039  902632  909126
90  295443  1181247 1155918 1179163
100 258695  1323125 1325019 1351578
110 238842  1393767 1410274 1421531
120 226018  1432504 1474982 1497122
130 215102  1465459 1503241 1521619
140 206415  1470454 1505380 1541816
150 197850  1475479 1519908 1515017
160 190935  1420915 1484868 1523150
170 185835  1438965 1453128 1499913
180 182519  1416252 1453098 1472945

It appears that atomic OR for LockBufHdr() gives a small but measurable
effect.  Great idea, Andres!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH v9] GSSAPI encryption support

2016-03-31 Thread Alvaro Herrera
Robbie Harwood wrote:
> Michael Paquier  writes:

> > +   iov[0].iov_base = lenbuf;
> > +   iov[0].iov_len = 4;
> > +   iov[1].iov_base = output.value;
> > +   iov[1].iov_len = output.length;
> > +
> > +   ret = writev(port->sock, iov, 2);
> >
> > writev and iovec are not present on Windows, so this code would never
> > compile there, and it does not sound that this patch is a reason
> > sufficient enough to drop support of GSSAPI on Windows.
> 
> Um.  Okay.  I guess on Windows I'll make two write calls then, since the
> only other option I see is to hit alloc again here.

Hmm, I wouldn't push my luck by using writev here at all.  We don't use
writev/readv anywhere, and it's quite possible that they are not present
on older Unixen which we still support.
http://pubs.opengroup.org/onlinepubs/009695399/functions/writev.html
says writev was introduced in "issue 4 version 2", which AFAICT is the
2004 version, but our baseline is SUSv2 (1997).  So it's definitely not
workable.

> > +   {
> > +   {"gss_encrypt", PGC_USERSET, CONN_AUTH_SECURITY,
> > +gettext_noop("Require encryption for all GSSAPI connections."),
> > +NULL,
> > +GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
> > +   },
> > +   &gss_encrypt, false, check_gss_encrypt, assign_gss_encrypt, NULL
> > +   },

Why is this marked NOT_IN_SAMPLE?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel Queries and PostGIS

2016-03-31 Thread Paul Ramsey
On Tue, Mar 29, 2016 at 12:51 PM, Paul Ramsey  wrote:
> On Tue, Mar 29, 2016 at 12:48 PM, Paul Ramsey  
> wrote:
>
>>> On the join case, I wonder if it's possible that _st_intersects is not
>>> marked parallel-safe?  If that's not the problem, I don't have a
>>> second guess, but the thing to do would be to figure out whether
>>> consider_parallel is false for the RelOptInfo corresponding to either
>>> of pd and pts, or whether it's true for both but false for the
>>> joinrel's RelOptInfo, or whether it's true for all three of them but
>>> you don't get the desired path anyway.
>>
>> _st_intersects is definitely marked parallel safe, and in fact will
>> generate a parallel plan if used alone (without the operator though,
>> it's impossibly slow). It's the && operator that is the issue... and I
>> just noticed that the PROCEDURE bound to the && operator
>> (geometry_overlaps) is *not* marked parallel safe: could be the
>> problem?
>
> Asked and answered: marking the geometry_overlaps as parallel safe
> gets me a parallel plan! Now to play with costs and see how it behaves
> when force_parallel_mode is not set.

For the record I can get a non-forced parallel join plan, *only* if I
reduce the parallel_join_cost by a factor of 10, from 0.1 to 0.01.

http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html

This seems non-optimal. No amount of cranking up the underlying
function COST seems to change this, perhaps because the join cost is
entirely based on the number of expected tuples in the join relation?

In general it seems like function COST values have been considered a
relatively unimportant input to planning in the past, but with
parallel processing it seems like they are now a lot more
determinative about what makes a good plan.

P.




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Tom Lane
Dean Rasheed  writes:
> On 30 March 2016 at 14:03, Tomas Vondra  wrote:
>> Attached is v4 of the patch

> Thanks, I think this is good to go, except that I think we need to use
> pow() rather than powl() because AIUI powl() is new to C99, and so
> won't necessarily be available on all supported platforms. I don't
> think we need worry about loss of precision, since that would only be
> an issue if rel->rows / rel->tuples were smaller than maybe 10^-14 or
> so, and it seems unlikely we'll get anywhere near that any time soon.

I took a quick look.  I concur with using pow() not powl(); the latter
is not in SUS v2 which is our baseline portability expectation, and in
fact there is *noplace* where we expect long double to work.  Moreover,
I don't believe that any of the estimates we're working with are so
accurate that a double-width power result would be a useful improvement.

Also, I wonder if it'd be a good idea to provide a guard against division
by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
reldistinct can't be zero.  In the same vein, I'm worried about the first
argument of pow() being slightly negative due to roundoff error, leading
to a NaN result.

Maybe we should also consider clamping the final reldistinct estimate to
an integer with clamp_row_est().  The existing code doesn't do that but
it seems like a good idea on general principles.

Another minor gripe is the use of a random URL as justification.  This
code will still be around when that URL exists nowhere but the Wayback
Machine.  Can't we find a more formal citation to use?

regards, tom lane




Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-31 Thread Alvaro Herrera
Noah Misch wrote:

> The above-described topic is currently a PostgreSQL 9.6 open item.  Alvaro,
> since you committed the patch believed to have created it, you own this open
> item.

That's correct.  Since we already had a patch available, I pushed it.
I'll wait for a few days before marking the open item as closed in the
wiki, to make sure that hamster reports success a few times.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-31 Thread Alvaro Herrera
Michael Paquier wrote:

> Actually, the attached is better. This one relies on time() to perform
> the delay checks, and takes care of things even for slow machines.

Thanks, pushed with some minor adjustments.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-31 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Thank you for the comment. The new version is attached.

Committed with rather heavy editorialization and a batch of regression
test cases.

regards, tom lane




Re: [HACKERS] improving GROUP BY estimation

2016-03-31 Thread Dean Rasheed
On 30 March 2016 at 14:03, Tomas Vondra  wrote:
> Attached is v4 of the patch

Thanks, I think this is good to go, except that I think we need to use
pow() rather than powl() because AIUI powl() is new to C99, and so
won't necessarily be available on all supported platforms. I don't
think we need worry about loss of precision, since that would only be
an issue if rel->rows / rel->tuples were smaller than maybe 10^-14 or
so, and it seems unlikely we'll get anywhere near that any time soon.

I think this is a good, well thought-out change, so unless anyone
objects I'll push it (probably this weekend).

Regards,
Dean




Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-31 Thread Alvaro Herrera
What led you to choose the ? operator for the FOLLOWED BY semantics?
It doesn't seem a terribly natural choice -- most other things seems to
use ? as some sort of wildcard.  What about something like "...", so you
would do
  SELECT q @@ to_tsquery('fatal ... error');
and 
  SELECT q @@ (tsquery 'fatal' ... tsquery 'error');

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-31 Thread Dmitry Ivanov
Hi Teodor,

I've looked through your version and made a few adjustments.

> Pls, remove tsquery_setweight from patch because it's not a part of phrase
> search and it isn't mentioned in docs. Please, make a separate patch for it.

I've removed tsquery_setweight as you requested. I'm going to add it to the 
following commitfest later.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

diff --git a/contrib/tsearch2/expected/tsearch2.out b/contrib/tsearch2/expected/tsearch2.out
index 972f764..97379e7 100644
--- a/contrib/tsearch2/expected/tsearch2.out
+++ b/contrib/tsearch2/expected/tsearch2.out
@@ -278,15 +278,15 @@ SELECT '(!1|2)&3'::tsquery;
 (1 row)
 
 SELECT '1|(2|(4|(5|6)))'::tsquery;
- tsquery 
--
- '1' | ( '2' | ( '4' | ( '5' | '6' ) ) )
+   tsquery   
+-
+ '1' | '2' | '4' | '5' | '6'
 (1 row)
 
 SELECT '1|2|4|5|6'::tsquery;
- tsquery 
--
- ( ( ( '1' | '2' ) | '4' ) | '5' ) | '6'
+   tsquery   
+-
+ '1' | '2' | '4' | '5' | '6'
 (1 row)
 
 SELECT '1&(2&(4&(5&6)))'::tsquery;
@@ -340,7 +340,7 @@ select 'a' > 'b & c'::tsquery;
 select 'a | f' < 'b & c'::tsquery;
  ?column? 
 --
- t
+ f
 (1 row)
 
 select 'a | ff' < 'b & c'::tsquery;
@@ -443,9 +443,9 @@ select count(*) from test_tsquery where keyword >  'new & york';
 
 set enable_seqscan=on;
 select rewrite('foo & bar & qq & new & york',  'new & york'::tsquery, 'big & apple | nyc | new & york & city');
- rewrite  
---
- 'foo' & 'bar' & 'qq' & ( 'city' & 'new' & 'york' | ( 'nyc' | 'big' & 'apple' ) )
+   rewrite
+--
+ 'foo' & 'bar' & 'qq' & ( 'nyc' | 'big' & 'apple' | 'city' & 'new' & 'york' )
 (1 row)
 
 select rewrite('moscow', 'select keyword, sample from test_tsquery'::text );
@@ -461,9 +461,9 @@ select rewrite('moscow & hotel', 'select keyword, sample from test_tsquery'::tex
 (1 row)
 
 select rewrite('bar &  new & qq & foo & york', 'select keyword, sample from test_tsquery'::text );
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select rewrite( ARRAY['moscow', keyword, sample] ) from test_tsquery;
@@ -479,9 +479,9 @@ select rewrite( ARRAY['moscow & hotel', keyword, sample] ) from test_tsquery;
 (1 row)
 
 select rewrite( ARRAY['bar &  new & qq & foo & york', keyword, sample] ) from test_tsquery;
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select keyword from test_tsquery where keyword @> 'new';
@@ -520,9 +520,9 @@ select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('e
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'bar &  new & qq & foo & york') as query where keyword <@ query;
-   rewrite   
--
- 'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | ( 'big' & 'appl' | 'new' & 'york' ) )
+ rewrite 
+-
+ ( 'nyc' | 'big' & 'appl' | 'new' & 'york' ) & 'citi' & 'foo' & ( 'bar' | 'qq' )
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'moscow') as query where query @> keyword;
@@ -538,9 +538,9 @@ select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('e
 (1 row)
 
 select rewrite( ARRAY[query, keyword, sample] ) from test_tsquery, to_tsquery('english', 'bar &  new & qq & foo & york') as query where query @> keyw

Re: [HACKERS] WIP: Access method extendability

2016-03-31 Thread Markus Nullmeier
On 03/31/16 17:29, Alexander Korotkov wrote:

> New revision of patches is attached.

> 1) API of generic xlog was changed: now there is no static variables,
>GenericXLogStart() returns palloc'd struct.

Attached are two trivial documentation editing fixes for this, as an
incremental patch.

-- 
Markus Nullmeier    http://www.g-vo.org
German Astrophysical Virtual Observatory (GAVO)

diff --git a/doc/src/sgml/generic-wal.sgml b/doc/src/sgml/generic-wal.sgml
index 6655f22..b3388ba 100644
--- a/doc/src/sgml/generic-wal.sgml
+++ b/doc/src/sgml/generic-wal.sgml
@@ -43,7 +43,7 @@
 
 
  
-  GenericXLogAbort(state) — finish construction of
+  GenericXLogFinish(state) — finish construction of
   a generic xlog record.
  
 
@@ -52,7 +52,7 @@
 
   
The xlog record construction can be canceled between any of the above
-   steps by calling GenericXLogAbort().  This will discard all
+   steps by calling GenericXLogAbort(state).  This will discard all
changes to the page image copies.
   
 



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
On Thu, Mar 31, 2016 at 7:14 PM, Andres Freund  wrote:

> > +/*
> > + * The following two macros are aimed to simplify buffer state
> modification
> > + * in CAS loop.  It's assumed that variable "uint32 state" is defined
> outside
> > + * of this loop.  It should be used as following:
> > + *
> > + * BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> > + * modifications of state variable;
> > + * END_BUFSTATE_CAS_LOOP(bufHdr);
> > + *
> > + * For local buffers, these macros shouldn't be used.  Since there is
> > + * no concurrency, local buffer state could be changed directly by atomic
> > + * read/write operations.
> > + */
> > +#define BEGIN_BUFSTATE_CAS_LOOP(bufHdr) \
> > + do { \
> > + SpinDelayStatus delayStatus =
> init_spin_delay((Pointer)(bufHdr)); \
>
> > + uint32  oldstate; \
> > + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> > + for (;;) { \
> > + while (oldstate & BM_LOCKED) \
> > + { \
> > + make_spin_delay(&delayStatus); \
> > + oldstate =
> pg_atomic_read_u32(&(bufHdr)->state); \
> > + } \
> > + state = oldstate
> > +
> > +#define END_BUFSTATE_CAS_LOOP(bufHdr) \
> > + if (pg_atomic_compare_exchange_u32(&bufHdr->state,
> &oldstate, state)) \
> > + break; \
> > + } \
> > + finish_spin_delay(&delayStatus); \
> > + } while (0)
>
> Hm. Not sure if that's not too much magic. Will think about it.
>

I'm not sure either...


> >   /*
> > +  * Lock buffer header - set BM_LOCKED in buffer state.
> > +  */
> > + uint32
> > + LockBufHdr(BufferDesc *desc)
> > + {
> > + uint32  state;
> > +
> > + BEGIN_BUFSTATE_CAS_LOOP(desc);
> > + state |= BM_LOCKED;
> > + END_BUFSTATE_CAS_LOOP(desc);
> > +
> > + return state;
> > + }
> > +
>
> Hm. It seems a bit over the top to do the full round here. How about
>
> uint32 oldstate;
> while (true)
> {
> oldstate = pg_atomic_fetch_or(..., BM_LOCKED);
> if (!(oldstate & BM_LOCKED))
> break;
> perform_spin_delay();
> }
> return oldstate | BM_LOCKED;
>
> Especially if we implement atomic xor on x86, that'd likely be a good
> bit more efficient.
>

Nice idea.  Done.
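
For reference, a minimal sketch of what the fetch_or-based approach looks
like once combined with the spin-delay API (a sketch under this patch
revision's assumptions -- BM_LOCKED, the atomic state field, and the renamed
perform_spin_delay(); the committed form may differ):

uint32
LockBufHdr(BufferDesc *desc)
{
	SpinDelayStatus delayStatus = init_spin_delay((Pointer) desc);
	uint32		old_state;

	while (true)
	{
		/* set BM_LOCKED; if it was clear before, we now hold the lock */
		old_state = pg_atomic_fetch_or_u32(&desc->state, BM_LOCKED);
		if (!(old_state & BM_LOCKED))
			break;
		/* somebody else holds the header lock: back off and retry */
		perform_spin_delay(&delayStatus);
	}
	finish_spin_delay(&delayStatus);
	return old_state | BM_LOCKED;
}

Compared with the compare-and-exchange loop, a single atomic fetch-or both
tests and sets the lock bit, which is why a native atomic or/xor on x86
would make this cheaper.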


> >  typedef struct BufferDesc
> >  {
> >   BufferTag   tag;/* ID of page contained in
> buffer */
> > - BufFlagsflags;  /* see bit definitions
> above */
> > - uint8   usage_count;/* usage counter for clock sweep
> code */
> > - slock_t buf_hdr_lock;   /* protects a subset of fields,
> see above */
> > - unsignedrefcount;   /* # of backends holding
> pins on buffer */
> > - int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >
> > + /* state of the tag, containing flags, refcount and usagecount */
> > + pg_atomic_uint32 state;
> > +
> > + int wait_backend_pid;   /* backend
> PID of pin-count waiter */
> >   int buf_id; /* buffer's index
> number (from 0) */
> >   int freeNext;   /* link in
> freelist chain */
>
> Hm. Won't matter on most platforms, but it seems like a good idea to
> move to an 8 byte aligned boundary. Move buf_id up?
>

I think so.  Done.
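
For illustration, one field ordering that achieves this -- BufferTag is 20
bytes, so pulling buf_id up places the atomic state at an 8-byte-aligned
offset of 24 (a sketch; remaining fields are elided and the actual patch may
order things differently):

typedef struct BufferDesc
{
	BufferTag	tag;			/* ID of page contained in buffer (20 bytes) */
	int			buf_id;			/* buffer's index number (from 0) */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;		/* now at offset 24, 8-byte aligned */

	int			wait_backend_pid;	/* backend PID of pin-count waiter */
	int			freeNext;		/* link in freelist chain */

	/* further fields (e.g. the content lwlock) omitted in this sketch */
} BufferDesc;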


> > +/*
> > + * Support for spin delay which could be useful in other places where
> > + * spinlock-like procedures take place.
> > + */
> > +typedef struct
> > +{
> > + int spins;
> > + int delays;
> > + int cur_delay;
> > + Pointer ptr;
> > + const char *file;
> > + int line;
> > +} SpinDelayStatus;
> > +
> > +#define init_spin_delay(ptr) {0, 0, 0, (ptr), __FILE__, __LINE__}
> > +
> > +void make_spin_delay(SpinDelayStatus *status);
> > +void finish_spin_delay(SpinDelayStatus *status);
> > +
> >  #endif/* S_LOCK_H */
>
> s/make_spin_delay/perform_spin_delay/? The former sounds like it's
> allocating something or such.
>

Done.

I think these changes are worth running the benchmark again.  I'm going to run
it on a 4 x 18-core Intel machine.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v9] GSSAPI encryption support

2016-03-31 Thread Robbie Harwood
Michael Paquier  writes:

> On Thu, Mar 31, 2016 at 2:14 PM, Michael Paquier
>  wrote:
>> On Wed, Mar 30, 2016 at 1:01 PM, Robbie Harwood  wrote:
>>> A new version of my GSSAPI encryption patchset is available, both in
>>> this email and on my github:
>>> https://github.com/frozencemetery/postgres/tree/feature/gssencrypt9
>>>
>> postgres.h should be included *before* be-gssapi-common.h. As I
>> suggested the refactoring and I guess you don't have a Windows
>> environment at hand, attached is a patch that fixes the build with and
>> without gssapi for Visual Studio, that can be applied on top of your
>> 0001. Feel free to rebase using it.

Thank you for the patch to fix 0001.  There is basically no way I would
have been able to make it work on Windows myself.

> +   iov[0].iov_base = lenbuf;
> +   iov[0].iov_len = 4;
> +   iov[1].iov_base = output.value;
> +   iov[1].iov_len = output.length;
> +
> +   ret = writev(port->sock, iov, 2);
>
> writev and iovec are not present on Windows, so this code would never
> compile there, and it does not sound that this patch is a reason
> sufficient enough to drop support of GSSAPI on Windows.

Um.  Okay.  I guess on Windows I'll make two write calls then, since the
only other option I see is to hit alloc again here.
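
For illustration, a hedged sketch of that fallback -- a 4-byte network-order
length prefix followed by the wrapped token, using writev() where it exists
and two sends on Windows. The function name and framing details are
assumptions based on the quoted hunk, not the patch's actual code:

/* sketch only; assumes <gssapi/gssapi.h>, <arpa/inet.h>, <sys/uio.h> */
static ssize_t
send_gss_token(int sock, gss_buffer_desc *output)
{
	char		lenbuf[4];
	uint32_t	netlen = htonl((uint32_t) output->length);

	memcpy(lenbuf, &netlen, 4);

#ifndef WIN32
	struct iovec iov[2];

	iov[0].iov_base = lenbuf;
	iov[0].iov_len = 4;
	iov[1].iov_base = output->value;
	iov[1].iov_len = output->length;
	return writev(sock, iov, 2);		/* one syscall for both pieces */
#else
	/* no writev()/iovec on Windows: two write calls instead */
	if (send(sock, lenbuf, 4, 0) != 4)
		return -1;
	return send(sock, (const char *) output->value, (int) output->length, 0);
#endif
}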

> +   {
> +   {"gss_encrypt", PGC_USERSET, CONN_AUTH_SECURITY,
> +gettext_noop("Require encryption for all GSSAPI connections."),
> +NULL,
> +GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE
> +   },
> +   &gss_encrypt, false, check_gss_encrypt, assign_gss_encrypt, NULL
> +   },
>
> Hm. I think that this would be better as PGC_POSTMASTER, the idea
> behind this parameter being that the server enforces any connections
> from clients to be encrypted. Clients should not have the right to
> disable that at will depending on their session, and this is used in
> authentication code paths. Also, this parameter is not documented, the
> new parameters in pg_hba.conf and client-side are though.

Negative, setting PGC_POSTMASTER breaks the fallback support; I get

"psql: FATAL:  parameter "gss_encrypt" cannot be changed without
restarting the server"

when trying to connect a new client to a new server.

I will add documentation.

> +   /*
> +* We tried to request GSSAPI encryption, but the
> +* server doesn't support it.  Retries are permitted
> +* here, so hang up and try again.  A connection that
> +* doesn't support appname will also not support
> +* GSSAPI encryption.
> +*/
> +   const char *sqlstate;
> Hm... I'm having second thoughts regarding gss_enc_require... This
> code path would happen with a 9.6 client and a ~9.5 server; it seems a
> bit treacherous to not ERROR here and fall back to an unencrypted
> connection when the client wants an encrypted connection in the end.

I'm confused by what you're saying here.

1. If you have a 9.6 client and a 9.5 server, with no additional
   connection parameters the client will reconnect and get an
   unencrypted connection.  This is the "fallback" support.

2. If the client passes gss_enc_require=1, then it will just fail
   because the server cannot provide encryption (see the sketch below).
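
To make case 2 concrete, a minimal libpq sketch; gss_enc_require is the
connection option proposed by this patchset (not a core libpq parameter),
and the host/database names are placeholders:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/* refuse to proceed unless the server provides GSSAPI encryption */
	PGconn	   *conn =
		PQconnectdb("host=db.example.com dbname=test gss_enc_require=1");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		/* expected outcome against a server without encryption support */
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return EXIT_FAILURE;
	}
	PQfinish(conn);
	return EXIT_SUCCESS;
}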

> I ran as well some tests, and I was able to crash the server. For
> example take this SQL sequence:
> CREATE TABLE aa (a int);
> INSERT INTO aa VALUES (generate_series(1,100));
> COPY aa TO STDOUT; -- crash
> Which resulted in the following crash:
> 1046AssertArg(MemoryContextIsValid(context));
> (gdb) bt
> #0  0x00980552 in repalloc (pointer=0x2aa1bb0, size=8192) at 
> mcxt.c:1046
> #1  0x006b75d5 in enlargeStringInfo (str=0x2aa6b70,
> needed=7784) at stringinfo.c:286
> #2  0x006b7447 in appendBinaryStringInfo (str=0x2aa6b70,
> data=0x2b40f25
> "\337\321\261\026jy\352\334#\275l)\030\021s\235\f;\237\336\222PZsd\025>ioS\313`9C\375\a\340Z\354E\355\235\276y\307)D\261\344$D\347\323\036\177S\265\374\373\332~\264\377\317\375<\017\330\214P\a\237\321\375\002$=6\326\263\265{\237\344\214\344.W\303\216\373\206\325\257E\223N\t\324\223\030\363\252&\374\241T\322<\343,\233\203\320\252\343\344\f\036*\274\311\066\206\n\337\300\320L,>-A\016D\346\263pv+A>y\324\254k\003)\264\212zc\344\n\223\224\211\243\"\224\343\241Q\264\233\223\303\"\b\275\026%\302\352\065]8\207\244\304\353\220p\364\272\240\307\247l\216}N\325\aUO6\322\352\273"...,
> datalen=7783) at stringinfo.c:213
> #3  0x006c8359 in be_gssapi_write (port=0x2aa31d0,
> ptr=0x2aa8b48, len=8192) at be-secure-gssapi.c:158
> #4  0x006b9697 in secure_write (port=0x2aa31d0, ptr=0x2aa8b48,
> len=8192) at be-secure.c:248
>
> Which is actually related to your use of StringInfoData, pointing out
> that the buffer handling should be reworked.

This crash is not deterministic.  I do observe it though and will try to
debug.  Any tips on debugging this are appreciated.

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-03-31 Thread Teodor Sigaev
Looking at the patch, I'm inclined to commit it in the current commitfest; it
looks workable and many people want it.  I've made some changes, mostly in
formatting and comments.


The patch makes some user-visible changes:
1. It changes the sort order for tsquery. I assume that's acceptable; only a
few users store tsquery in a table and have a btree index on it. They will
need to reindex.

2. Fewer parentheses in tsquery output, so tsquery becomes more readable.

Please remove tsquery_setweight from the patch: it's not part of phrase search
and it isn't mentioned in the docs. Please submit it as a separate patch.



Dmitry Ivanov wrote:

Sorry for the delay, I desperately needed some time to finish a bunch of
dangling tasks.

I've added some new comments and clarified the ones that were obscure.
Moreover, I felt an urge to recheck most parts of the code since apparently
nobody (besides myself) has gone so far yet.

On 25.03.16 18:42 MSK, David Steele wrote:

> Time is short and it's not encouraging that you say there is "still much
> work to be done".

Indeed, I was inaccurate. I am more than interested in a positive outcome.






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


phrase-0.4.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund
Hi,

> +/*
> + * The following two macros are aimed to simplify buffer state modification
> + * in CAS loop.  It's assumed that variable "uint32 state" is defined outside
> + * of this loop.  It should be used as following:
> + *
> + * BEGIN_BUFSTATE_CAS_LOOP(bufHdr);
> + * modifications of state variable;
> + * END_BUFSTATE_CAS_LOOP(bufHdr);
> + *
> + * For local buffers, these macros shouldn't be used.  Since there is
> + * no concurrency, local buffer state could be changed directly by atomic
> + * read/write operations.
> + */
> +#define BEGIN_BUFSTATE_CAS_LOOP(bufHdr) \
> + do { \
> + SpinDelayStatus delayStatus = 
> init_spin_delay((Pointer)(bufHdr)); \

> + uint32  oldstate; \
> + oldstate = pg_atomic_read_u32(&(bufHdr)->state); \
> + for (;;) { \
> + while (oldstate & BM_LOCKED) \
> + { \
> + make_spin_delay(&delayStatus); \
> + oldstate = 
> pg_atomic_read_u32(&(bufHdr)->state); \
> + } \
> + state = oldstate
> +
> +#define END_BUFSTATE_CAS_LOOP(bufHdr) \
> + if (pg_atomic_compare_exchange_u32(&bufHdr->state, 
> &oldstate, state)) \
> + break; \
> + } \
> + finish_spin_delay(&delayStatus); \
> + } while (0)

Hm. Not sure if that's not too much magic. Will think about it.

>   /*
> +  * Lock buffer header - set BM_LOCKED in buffer state.
> +  */
> + uint32
> + LockBufHdr(BufferDesc *desc)
> + {
> + uint32  state;
> + 
> + BEGIN_BUFSTATE_CAS_LOOP(desc);
> + state |= BM_LOCKED;
> + END_BUFSTATE_CAS_LOOP(desc);
> + 
> + return state;
> + }
> + 

Hm. It seems a bit over the top to do the full round here. How about

uint32 oldstate;
while (true)
{
oldstate = pg_atomic_fetch_or(..., BM_LOCKED);
if (!(oldstate & BM_LOCKED))
break;
perform_spin_delay();
}
return oldstate | BM_LOCKED;

Especially if we implement atomic xor on x86, that'd likely be a good
bit more efficient.


>  typedef struct BufferDesc
>  {
>   BufferTag   tag;/* ID of page contained in 
> buffer */
> - BufFlagsflags;  /* see bit definitions above */
> - uint8   usage_count;/* usage counter for clock sweep code */
> - slock_t buf_hdr_lock;   /* protects a subset of fields, see 
> above */
> - unsignedrefcount;   /* # of backends holding pins 
> on buffer */
> - int wait_backend_pid;   /* backend PID 
> of pin-count waiter */
>  
> + /* state of the tag, containing flags, refcount and usagecount */
> + pg_atomic_uint32 state;
> +
> + int wait_backend_pid;   /* backend PID 
> of pin-count waiter */
>   int buf_id; /* buffer's index 
> number (from 0) */
>   int freeNext;   /* link in freelist 
> chain */

Hm. Won't matter on most platforms, but it seems like a good idea to
move to an 8 byte aligned boundary. Move buf_id up?

> +/*
> + * Support for spin delay which could be useful in other places where
> + * spinlock-like procedures take place.
> + */
> +typedef struct
> +{
> + int spins;
> + int delays;
> + int cur_delay;
> + Pointer ptr;
> + const char *file;
> + int line;
> +} SpinDelayStatus;
> +
> +#define init_spin_delay(ptr) {0, 0, 0, (ptr), __FILE__, __LINE__}
> +
> +void make_spin_delay(SpinDelayStatus *status);
> +void finish_spin_delay(SpinDelayStatus *status);
> +
>  #endif/* S_LOCK_H */

s/make_spin_delay/perform_spin_delay/? The former sounds like it's
allocating something or such.


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-03-31 Thread Vitaly Burovoy
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I have reviewed this patch.
It applies cleanly on top of current master (3501f71), compiles without
warnings, and implements the declared feature.

The documentation describes the behavior of the function, pointing out the
difference between inserting into a JsonbObject and a JsonbArray.

The code is clean and commented. The linked comment is updated too.

Regression tests cover possible use cases and edge cases.


Notes for a committer:
1. pg_proc.h has changed, so the CATALOG_VERSION_NO must also be changed.
2. Code comments and changes in the documentation need proof-reading by a native
English speaker, which neither the author nor I am.


The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Joe Conway
On 03/30/2016 09:17 PM, Tom Lane wrote:
> Craig Ringer  writes:
>> On 31 March 2016 at 07:49, Josh berkus  wrote:
>>> So, can we stop supporting Windows native now?
> 
>> Why would we want to?
> 
>> The cost is small.
> 
> Surely you jest.  Windows is the single biggest PITA platform from a
> portability perspective, and has been in every release cycle since we
> first had a Windows port, and there is no close second.

Add to that the cost on extension authors. I would surely *love* to dump
Windows support in PL/R as it is a major league PITA. It is probably an
understatement to say that over the last 10+ years, 95+% of the time I
have spent maintaining and supporting PL/R has been directly
attributable to the Windows port.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Joe Conway
On 03/31/2016 06:53 AM, Andrew Dunstan wrote:
> The worst thing about developing from my POV isn't something that
> actually affects core developers so much, namely the lack of a nice MSVC
> equivalent of PGXS.

Bingo!

+1

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] [PATCH] we have added support for box type in SP-GiST index

2016-03-31 Thread Emre Hasegeli
> Thank you, pushed

Thank you for working on this.

I noticed that this was overlooked:

 static RectBox *
-initRectBox()
+initRectBox(void)
 {


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Alexander Korotkov
Hi!

On Thu, Mar 31, 2016 at 4:59 PM, Amit Kapila 
wrote:

> On Tue, Mar 29, 2016 at 10:52 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Hi, Andres!
>>
>> Please, find next revision of patch in attachment.
>>
>>
> Couple of minor comments:
>
> +  * The following two macroses
>
> is macroses right word to be used here?
>
> +  * of this loop.  It should be used as fullowing:
>
> /fullowing/following
>
> +  * For local buffers usage of these macros shouldn't be used.
>
> isn't it better to write it as
>
> For local buffers, these macros shouldn't be used.
>
>
>   static int ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);
>
> -
>
> Spurious line deletion.
>

All of the above is fixed.

> +  * Since buffers are pinned/unpinned very frequently, this functions tries
> +  * to pin buffer as cheap as possible.
>
> /this functions tries
>
> which functions are you referring to here? The comment seems to be slightly
> unclear.
>

I meant just PinBuffer() there.  UnpinBuffer() has another comment in the
body.  Fixed.


> ! if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(&bufHdr->state) &
> BM_PERMANENT))
>
> Is there a reason that you have kept macros to read refcount and
> usagecount, but not for flags?
>

We could switch the flag handling to GET/SET macros as well.  But I guess such
a change would make review more complicated, because it would touch some
places which are otherwise unchanged.  I think it could be done in a separate
refactoring patch.
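
For what it's worth, a hedged sketch of what such accessors over the packed
state word could look like. The bit layout here (18 bits of refcount, 4 bits
of usage count, flags in the high bits) is an assumption for illustration,
not necessarily this patch's layout:

/* sketch only: hypothetical split of the 32-bit state word */
#define BUF_REFCOUNT_BITS		18
#define BUF_USAGECOUNT_BITS		4

#define BUF_REFCOUNT_MASK		((1U << BUF_REFCOUNT_BITS) - 1)
#define BUF_USAGECOUNT_SHIFT	BUF_REFCOUNT_BITS
#define BUF_USAGECOUNT_MASK \
	(((1U << BUF_USAGECOUNT_BITS) - 1) << BUF_USAGECOUNT_SHIFT)
#define BUF_FLAG_MASK			(~(BUF_REFCOUNT_MASK | BUF_USAGECOUNT_MASK))

#define BUF_STATE_GET_REFCOUNT(state)	((state) & BUF_REFCOUNT_MASK)
#define BUF_STATE_GET_USAGECOUNT(state) \
	(((state) & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT)
#define BUF_STATE_GET_FLAGS(state)		((state) & BUF_FLAG_MASK)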

> Apart from this, I have verified that the patch compiles on Windows and
> passes the regression tests (make check)!
>

Thank you!  I didn't manage to try this on Windows.


> Nice work!
>

Thank you!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pinunpin-cas-8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Access method extendability

2016-03-31 Thread Alexander Korotkov
Hi!

New revision of patches is attached.

Changes are following:
1) The generic xlog API was changed: there are no static variables any more;
GenericXLogStart() returns a palloc'd struct (see the sketch after this list).
2) Generic xlog uses elog instead of ereport, since it reports internal errors
which shouldn't happen normally.
3) Error messages don't contain the name of the function.
4) The bloom contrib was pgindented.
5) More comments for bloom.
6) One more assert was added to bloom.
7) makeDefaultBloomOptions was renamed to adjustBloomOptions.  Now it only
modifies its parameter; palloc is done outside of this function.
8) BloomBuildState is explicitly zeroed.
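
For orientation, a hedged sketch of how an index AM might use the revised
API. GenericXLogStart/Finish/Abort appear in this thread; the
buffer-registration call and the helper are assumptions and may not match
the final interface:

void
example_update_page(Relation index, Buffer buffer)
{
	GenericXLogState *state = GenericXLogStart(index);

	/* edits go to a working copy of the page, not the page itself */
	Page		page = GenericXLogRegisterBuffer(state, buffer, 0);

	if (!modify_page_somehow(page))		/* hypothetical helper */
	{
		/* discard all changes to the page image copies */
		GenericXLogAbort(state);
		return;
	}

	/* WAL-log the delta and apply it to the real page */
	GenericXLogFinish(state);
}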

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0002-generic-xlog.15.patch
Description: Binary data


0003-bloom-contrib.15.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Correction for replication slot creation error message in 9.6

2016-03-31 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-03-31 10:15:21 +0900, Ian Barwick wrote:

> > Patch changes the error message to:
> > 
> >   ERROR:  replication slots can only be used if wal_level is "replica" or 
> > "logical"
> > 
> > Explicitly naming the valid WAL levels matches the wording in the wal_level
> > error hint used in a couple of places, i.e.
> 
> The explicit naming makes it much more verbose to change anything around
> wal level though, so consider me not a fan of spelling out all levels.

I thought we had agreed that we weren't going to consider the wal_level
values as a linear scale -- in other words, wordings such as "greater
than FOO" are discouraged.  That's always seemed a bit odd to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Tom Lane
Andres Freund  writes:
> Oh. I confused my approaches. I was thinking about going for 2):

>> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>> embedding the spinlock, and actually might allow to avoid one atomic
>> op in a number of cases.

> precisely because of that concern.

Oh, okay, ignore my comment just now in the other thread.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-31 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund  wrote:
>> On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
>>> To which proposal are you referring?

>> 1) in 
>> http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de

> OK.  So, Noah, my proposed strategy is to wait and see if Andres can
> make that work, and if not, then revisit the issue of what to do.

I thought that proposal had already crashed and burned, on the grounds
that byte-size spinlocks require instructions that many PPC machines
don't have.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Amit Kapila
On Tue, Mar 29, 2016 at 10:52 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Andres!
>
> Please, find next revision of patch in attachment.
>
>
Couple of minor comments:

+  * The following two macroses

is macroses right word to be used here?

+  * of this loop.  It should be used as fullowing:

/fullowing/following

+  * For local buffers usage of these macros shouldn't be used.

isn't it better to write it as

For local buffers, these macros shouldn't be used.


  static int ts_ckpt_progress_comparator(Datum a, Datum b, void *arg);

-

Spurious line deletion.



+  * Since buffers are pinned/unpinned very frequently, this functions tries
+  * to pin buffer as cheap as possible.

/this functions tries

which functions are you referring to here? The comment seems to be slightly
unclear.


! if (XLogHintBitIsNeeded() && (pg_atomic_read_u32(&bufHdr->state) &
BM_PERMANENT))

Is there a reason that you have kept macros to read refcount and
usagecount, but not for flags?


Apart from this, I have verified that the patch compiles on Windows and passes
the regression tests (make check)!

Nice work!

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Andrew Dunstan



On 03/31/2016 06:38 AM, Andres Freund wrote:

On 2016-03-31 13:30:58 +0300, Yury Zhuravlev wrote:

Craig Ringer wrote:

Yeah, you're right. He's not the only one either.

I was reacting to the original post, and TBH didn't think it through. The
commit logs suggest there's a decent amount of work that goes in, and I'm
sure a lot of it isn't visible when just looking for 'windows', 'win32',
'msvc', etc.

Even the build system affects people who don't use it, if they're adding
features. I recently backported a bunch of 9.3 functionality to 9.1, and
in the process simply stripped out all the Windows build system changes as
"meh, too hard, don't care".

So yeah. I casually handwaved away a lot of work that's not highly
visible, but still happens and is important, and was wrong to do so. I've
done a bit on Windows myself but didn't fully recognise the burden support
for it places on patches to core infrastructure and on committers.

Maybe the community should appoint a chief for Windows?

If somebody wanted to be that, they'd just have to start doing it. It's
not like Windows problems are an infrequent occurrence.




For the most part Magnus and I have been that de facto, as the 
committers  most involved with Windows over the 11 or so years we've had 
the port.


We also have a few other prominent developers who help out - e.g. with 
the recent VS2015 stuff.


The worst thing about developing from my POV isn't something that 
actually affects core developers so much, namely the lack of a nice MSVC 
equivalent of PGXS.


It would also be nice to find out why we can't usefully scale shared 
buffers higher like we can on *nix.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2016-03-31 Thread Kevin Grittner
On Wed, Mar 30, 2016 at 9:19 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>
>> Just a note: I began looking at the tests, but finished looking at the
>> patch entirely at the end by curiosity. Regarding the integration of
>> this patch for 9.6, I think that bumping that to 9.7 would be wiser
>> because the patch needs to be re-written largely, and that's never a
>> good sign at this point of the development cycle.
>
> Not rewritten surely?  It will need a very large mechanical change to
> existing BufferGetPage calls, but to me that doesn't equate to "rewriting"
> it.

I'll submit patches later today to make the mechanical change to
the nearly 500 BufferGetPage() calls and to tweak to the 36 places
to use the new "test" flag with the new signature rather than
adding a line for the test.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Dilip Kumar
On Thu, Mar 31, 2016 at 4:28 PM, Robert Haas  wrote:

> Yeah, kind of.  But obviously if we could make the limit smaller
> without hurting performance, that would be better.
>
> Per my note yesterday about performance degradation with parallel
> COPY, I wasn't able to demonstrate that this patch gives a consistent
> performance benefit on hydra - the best result I got was speeding up a
> 9.5 minute load to 8 minutes where linear scalability would have been
> 2 minutes.  And I found cases where it was actually slower with the
> patch.  Now maybe hydra is just a crap machine, but I'm feeling
> nervous.
>


I see the performance gain when either the complete data is on SSD, or
the data is on MD with WAL on SSD, or with unlogged tables.


What machines did you use to test this?  Have you tested really large
> data loads, like many MB or even GB of data?
>


With the INSERT script, a 2-minute run produces 18GB of data; I am running
5-10 minutes, which means at least 85GB of data.
(Inserts 1000 1KB tuples in each transaction.)

With the COPY script, a 2-minute run produces 23GB of data, and 5-10 minutes
means at least 100GB of data.
(Inserts 1 tuples in each transaction; tuple size is 1 byte to 5 bytes.)

Machine Details
---
I tested on an 8-socket NUMA machine with 64 cores:
[dilip.kumar@cthulhu ~]$ lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):128
On-line CPU(s) list:   0-127
Thread(s) per core:2
Core(s) per socket:8
Socket(s): 8
NUMA node(s):  8
Vendor ID: GenuineIntel
CPU family:6
Model: 47
Model name:Intel(R) Xeon(R) CPU E7- 8830  @ 2.13GHz
Stepping:  2
CPU MHz:   1064.000
BogoMIPS:  4266.62


If you need any more information, please let me know.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-03-31 Thread Dmitry Dolgov
On 31 March 2016 at 17:31, Vitaly Burovoy  wrote:

> it is logical to insert new value if "before", then current value, then new
> value if "after".
>

Oh, I see now. The logic is slightly different: `v` is the current value
and `newval` is the new value. So basically we push the current item in the
"after" case, then the new value (if it's not a delete operation), then the
current item in the "before" case. But I agree, this code can be more
straightforward. I've attached a new version, please take a look (it contains
the same logic that you've mentioned).
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1bc9fbc..8fb2624 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10903,6 +10903,9 @@ table2-mapping
jsonb_set
   
   
+   jsonb_insert
+  
+  
jsonb_pretty
   
 
@@ -11184,6 +11187,40 @@ table2-mapping
 

   
+   
+   
+   jsonb_insert(target jsonb, path text[], new_value jsonb, insert_after boolean)
+   
+   
+   jsonb
+   
+ Returns target with
+ new_value inserted.
+ If target section designated by
+ path is in a JSONB array,
+ new_value will be inserted before target or
+ after if insert_after is true (default is
+ false). If target section
+ designated by path is in JSONB object,
+ jsonb_insert behaves as
+ jsonb_set function where
+ create_missing is true. As with the
+ path oriented operators, negative integers that appear in
+ path count from the end of JSON arrays.
+   
+   
+   
+   jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"')
+   
+   
+   jsonb_insert('{"a": [0,1,2]}', '{a, 1}', '"new_value"', true)
+   
+   
+   {"a": [0, "new_value", 1, 2]}
+ {"a": [0, 1, "new_value", 2]}
+
+   
+  
jsonb_pretty(from_json jsonb)
  
text
@@ -11235,10 +11272,11 @@ table2-mapping
   
 
   All the items of the path parameter of jsonb_set
-  must be present in the target, unless
-  create_missing is true, in which case all but the last item
-  must be present. If these conditions are not met the target
-  is returned unchanged.
+  as well as jsonb_insert except the last item must be present
+  in the target. If create_missing is false, all
+  items of the path parameter of jsonb_set must be
+  present. If these conditions are not met the target is
+  returned unchanged.
 
 
   If the last path item is an object key, it will be created if it
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9ae1ef4..a6e661c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -997,3 +997,11 @@ RETURNS text[]
 LANGUAGE INTERNAL
 STRICT IMMUTABLE
 AS 'parse_ident';
+
+CREATE OR REPLACE FUNCTION
+  jsonb_insert(jsonb_in jsonb, path text[] , replacement jsonb,
+insert_after boolean DEFAULT false)
+RETURNS jsonb
+LANGUAGE INTERNAL
+STRICT IMMUTABLE
+AS 'jsonb_insert';
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index 97e0e8e..97bb08e 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -33,6 +33,15 @@
 #include "utils/memutils.h"
 #include "utils/typcache.h"
 
+/* Operations available for setPath */
+#define JB_PATH_NOOP	0x
+#define JB_PATH_CREATE	0x0001
+#define JB_PATH_DELETE	0x0002
+#define JB_PATH_INSERT_BEFORE			0x0004
+#define JB_PATH_INSERT_AFTER			0x0008
+#define JB_PATH_CREATE_OR_INSERT (JB_PATH_INSERT_BEFORE | JB_PATH_INSERT_AFTER \
+		| JB_PATH_CREATE)
+
 /* semantic action functions for json_object_keys */
 static void okeys_object_field_start(void *state, char *fname, bool isnull);
 static void okeys_array_start(void *state);
@@ -130,14 +139,14 @@ static JsonbValue *IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
 static JsonbValue *setPath(JsonbIterator **it, Datum *path_elems,
 		bool *path_nulls, int path_len,
 		JsonbParseState **st, int level, Jsonb *newval,
-		bool create);
+		int op_type);
 static void setPathObject(JsonbIterator **it, Datum *path_elems,
 			  bool *path_nulls, int path_len, JsonbParseState **st,
 			  int level,
-			  Jsonb *newval, uint32 npairs, bool create);
+			  Jsonb *newval, uint32 npairs, int op_type);
 static void setPathArray(JsonbIterator **it, Datum *path_elems,
 			 bool *path_nulls, int path_len, JsonbParseState **st,
-			 int level, Jsonb *newval, uint32 nelems, bool create);
+			 int level, Jsonb *newval, uint32 nelems, int op_type);
 static void addJsonbToParseState(JsonbParseState **jbps, Jsonb *jb);
 
 /* state for json_object_keys */
@@ -3544,7 +3553,7 @@ jsonb_set(PG_FUNCTION_ARGS)
 	it = JsonbIteratorInit(&in->root);
 
 	res = setPath(&it, path_elems, path_nulls, path_len, &st,
-  0, newval, 

Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-31 Thread Kyotaro HORIGUCHI
Thank you for the comments. The new version is attached.

Some issues have not been addressed yet, but the rest will be addressed
in the next version.

At Thu, 31 Mar 2016 08:42:50 +0200, Tomas Vondra  
wrote in <90d885f2-e5ce-6668-226f-c817154e4...@2ndquadrant.com>
> On 03/31/2016 01:36 AM, Tom Lane wrote:
> > Kevin Grittner  writes:
> >> I'm taking my name off as committer and marking it "Ready for
> >> Committer".  If someone else wants to comment on the issues where
> >> Tom and Kyotaro-san still seem unsatisfied to the point where I
> >> can get my head around it, I could maybe take it back on as
> >> committer -- if anyone feels that could be a net win.
> >
> > I'll pick it up.  In a quick scan I see some things I don't like, but
> > nothing insoluble:
> >
> > * Having plancat.c initialize the per-IndexOptInfo restriction lists
> > * seems
> > fairly bizarre.  I realize that Tomas or Kyotaro probably did that in
> > response to my complaint about having check_partial_indexes have
> > side-effects on non-partial indexes, but this isn't an improvement.

I felt the same about it; it is open for discussion. plancat.c is the
earliest point where we can initialize the baserestrictinfo of each
IndexOptInfo, and the original location is just after it and is the
latest point. Reverted.

> > For one thing, it means we will produce an incorrect plan if
> > more restriction clauses are added to the rel after plancat.c
> > runs, as the comment for check_partial_indexes contemplates.

Mmm. Thank you, I overlooked that. The following code seems to say
something related:

> if (index->predOK)
> continue;/* don't repeat work if already proven OK */

So the index restrictinfo should be calculated even if
index->predOK is already true. But the refactoring suggested by the
following comment will fix this.

> > (I *think* that comment may be obsolete, but I'm not sure.)
> > I think a better answer to that complaint is to rename
> > check_partial_indexes to something else, and more importantly
> > adjust its header comment to reflect its expanded
> > responsibilities --- as the patch I was commenting on at the
> > time failed to do.

OK, I moved the creation of the index baserestrictinfo out of
check_partial_indexes and named the new function
rebuild_index_restrictinfo (better names welcome).

This loosens the restriction on when an index restrictinfo can be
trimmed; that is now done in create_index_paths.

> > * It took me some time to convince myself that the patch doesn't break
> > generation of plans for EvalPlanQual.  I eventually found where it
> > skips removal of restrictions for target relations, but that's
> > drastically undercommented.

The code was originally in create_indexscan_plan and had no
documentation about EPQ. But I also want such documentation there,
so I added it in rebuild_index_restrictinfo.

> > * Likewise, there's inadequate documentation of why it doesn't break
> > bitmap scans, which also have to be able to recheck correctly.

This trims clauses that are implied by the index predicate, which
don't need a recheck even in a bitmap scan, I believe. Is that wrong?
The original comment on check_index_only said:

>  * XXX this is overly conservative for partial indexes, since we will
>  * consider attributes involved in the index predicate as required even
>  * though the predicate won't need to be checked at runtime.  (The same is
>  * true for attributes used only in index quals, if we are certain that
>  * the index is not lossy.)  However, it would be quite expensive to
>  * determine that accurately at this point, so for now we take the easy
>  * way out.

This says to me that such clauses can be safely ignored, but not
attributes used only in index quals, because of the possibly lossy
character of the index. That seems quite reasonable. This
optimization won't hold if that comment is wrong.

> > * I'm not sure that we have any regression test cases covering the
> > above two points, but we should.

I agree. I will try to add them in the next version, probably posted tomorrow.

> > * The comments leave a lot to be desired in general, eg there are
> > repeated failures to update comments only a few lines away from a
> > change.

I'll recheck that and fix in the next version.

> Kyotaro,
> 
> I may look into fixing those issues early next week, but that's fairly
> close to the freeze. Also, I'll have to look into parts that I'm not
> very familiar with (like the EvalPlanQual), so feel free to address
> those issues ;-)

Aye sir!

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From aee7790b3f02826a94b4e7195ffb50afa398cf26 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 9 Mar 2016 16:37:00 +0900
Subject: [PATCH] v11

---
 src/backend/nodes/outfuncs.c |  1 +
 src/backend/optimizer/path/costsize.c|  8 ++--
 src/backend/optimizer/path/indxpath.c| 80 +++-
 src/backend/optimizer/plan/createplan.c  | 4

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Andres Freund
On 2016-03-31 17:52:12 +0530, Amit Kapila wrote:
> On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund  wrote:
> >
> > On 2016-03-31 15:07:22 +0530, Amit Kapila wrote:
> > > On Thu, Mar 31, 2016 at 4:39 AM, Andres Freund 
> wrote:
> > > >
> > > > On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> > > > > On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <
> amit.kapil...@gmail.com>
> > > > > wrote:
> > > > > >
> > > >
> > > > Amit, could you run benchmarks on your bigger hardware? Both with
> > > > USE_CONTENT_LOCK commented out and in?
> > > >
> > >
> > > Yes.
> >
> > Cool.
> >
> >
> > > > I think we should go for 1) and 2) unconditionally.
> >
> > > Yes, that makes sense.  On 20 min read-write pgbench --unlogged-tables
> > > benchmark, I see that with HEAD Tps is 36241 and with increase the clog
> > > buffers patch, Tps is 69340 at 128 client count (very good performance
> > > boost) which indicates that we should go ahead with 1) and 2) patches.
> >
> > Especially considering the line count... I do wonder about going crazy
> > and increasing to 256 immediately. It otherwise seems likely that we'll
> > have the same issue in a year.  Could you perhaps run your test
> > against that as well?
> >
> 
> Unfortunately, it dipped to 65005 with 256 clog bufs.  So I think 128 is
> the appropriate number.

Ah, interesting. Then let's go with that.
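
For context, the change under discussion is essentially a bump of the CLOG
buffer cap. A hedged sketch of its shape (the scaling formula is assumed
from memory of clog.c and may not match exactly):

/* sketch only: shared CLOG buffers scale with shared_buffers, capped */
Size
CLOGShmemBuffers(void)
{
	/* cap raised to 128; 256 was measured slower on this workload */
	return Min(128, Max(4, NBuffers / 512));
}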


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund
On 2016-03-31 12:58:55 +0200, Andres Freund wrote:
> On 2016-03-31 06:54:02 -0400, Robert Haas wrote:
> > On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> > > Yea, as Tom pointed out that's not going to work.  I'll try to write a
> > > patch for approach 1).
> > 
> > Does this mean that any platform that wants to perform well will now
> > need a sub-4-byte spinlock implementation?  That's has a somewhat
> > uncomfortable sound to it.
> 
> Oh. I confused my approaches. I was thinking about going for 2):
> 
> > 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
> >embedding the spinlock, and actually might allow to avoid one atomic
> >op in a number of cases.
> 
> precisely because of that concern.

Here's a WIP patch to evaluate. Dilip/Ashutosh, could you perhaps run
some benchmarks, to see whether this addresses the performance issues?

I guess it'd both be interesting to compare master with master + patch,
and this thread's latest patch with the patch additionally applied.

Thanks!

Andres
>From 623581574409bdf5cbfbba005bb2e961cb689573 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 31 Mar 2016 14:14:20 +0200
Subject: [PATCH 1/4] WIP: Avoid the use of a separate spinlock to protect
 LWLock's wait queue.

Previously we used a spinlock, in addition to the atomically manipulated
->state field, to protect the wait queue. But it's pretty simple to
instead perform the locking using a flag in state.

This tries to address a performance regression on PPC due to
6150a1b0. Our PPC spinlocks are 4 bytes each, which makes BufferDesc
exceed 64 bytes, causing cacheline-sharing issues.

Discussion: caa4ek1+zeb8pmwwktf+3brs0pt4ux6rs6aom0uip8c6shjw...@mail.gmail.com
20160327121858.zrmrjegmji2ym...@alap3.anarazel.de
---
 src/backend/storage/lmgr/lwlock.c | 178 --
 src/include/storage/lwlock.h  |   1 -
 2 files changed, 94 insertions(+), 85 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 53ae7d5..ec6baf6 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -97,6 +97,7 @@ extern slock_t *ShmemLock;
 
 #define LW_FLAG_HAS_WAITERS			((uint32) 1 << 30)
 #define LW_FLAG_RELEASE_OK			((uint32) 1 << 29)
+#define LW_FLAG_LOCKED((uint32) 1 << 28)
 
 #define LW_VAL_EXCLUSIVE			((uint32) 1 << 24)
 #define LW_VAL_SHARED1
@@ -711,7 +712,6 @@ RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
 void
 LWLockInitialize(LWLock *lock, int tranche_id)
 {
-	SpinLockInit(&lock->mutex);
 	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
 #ifdef LOCK_DEBUG
 	pg_atomic_init_u32(&lock->nwaiters, 0);
@@ -842,6 +842,56 @@ LWLockAttemptLock(LWLock *lock, LWLockMode mode)
 	pg_unreachable();
 }
 
+static void
+LWLockWaitListLock(LWLock *lock)
+{
+	uint32 old_state;
+#ifdef LWLOCK_STATS
+	lwlock_stats *lwstats;
+	uint32 delays = 0;
+
+	lwstats = get_lwlock_stats_entry(lock);
+#endif
+
+	old_state = pg_atomic_read_u32(&lock->state);
+	while (true)
+	{
+		if (old_state & LW_FLAG_LOCKED)
+		{
+			/* FIXME: add exponential backoff */
+			pg_spin_delay();
+			old_state = pg_atomic_read_u32(&lock->state);
+#ifdef LWLOCK_STATS
+			delays++;
+#endif
+		}
+		else
+		{
+			old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+			if (!(old_state & LW_FLAG_LOCKED))
+			{
+/* got lock */
+break;
+			}
+		}
+	}
+
+#ifdef LWLOCK_STATS
+	lwstats->spin_delay_count += delays;
+#endif
+
+}
+
+static void
+LWLockWaitListUnlock(LWLock *lock)
+{
+	uint32 old_state PG_USED_FOR_ASSERTS_ONLY;
+
+	old_state = pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_LOCKED);
+
+	Assert(old_state & LW_FLAG_LOCKED);
+}
+
 /*
  * Wakeup all the lockers that currently have a chance to acquire the lock.
  */
@@ -852,22 +902,13 @@ LWLockWakeup(LWLock *lock)
 	bool		wokeup_somebody = false;
 	dlist_head	wakeup;
 	dlist_mutable_iter iter;
-#ifdef LWLOCK_STATS
-	lwlock_stats *lwstats;
-
-	lwstats = get_lwlock_stats_entry(lock);
-#endif
 
 	dlist_init(&wakeup);
 
 	new_release_ok = true;
 
 	/* Acquire mutex.  Time spent holding mutex should be short! */
-#ifdef LWLOCK_STATS
-	lwstats->spin_delay_count += SpinLockAcquire(&lock->mutex);
-#else
-	SpinLockAcquire(&lock->mutex);
-#endif
+	LWLockWaitListLock(lock);
 
 	dlist_foreach_modify(iter, &lock->waiters)
 	{
@@ -904,19 +945,34 @@ LWLockWakeup(LWLock *lock)
 
 	Assert(dlist_is_empty(&wakeup) || pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS);
 
-	/* Unset both flags at once if required */
-	if (!new_release_ok && dlist_is_empty(&wakeup))
-		pg_atomic_fetch_and_u32(&lock->state,
-~(LW_FLAG_RELEASE_OK | LW_FLAG_HAS_WAITERS));
-	else if (!new_release_ok)
-		pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_RELEASE_OK);
-	else if (dlist_is_empty(&wakeup))
-		pg_atomic_fetch_and_u32(&lock->state, ~LW_FLAG_HAS_WAITERS);
-	else if (new_release_ok)
-		pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_RELEASE_OK);
+	/* unse

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 3:48 PM, Andres Freund  wrote:
>
> On 2016-03-31 15:07:22 +0530, Amit Kapila wrote:
> > On Thu, Mar 31, 2016 at 4:39 AM, Andres Freund 
wrote:
> > >
> > > On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> > > > On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <
amit.kapil...@gmail.com>
> > > > wrote:
> > > > >
> > >
> > > Amit, could you run benchmarks on your bigger hardware? Both with
> > > USE_CONTENT_LOCK commented out and in?
> > >
> >
> > Yes.
>
> Cool.
>
>
> > > I think we should go for 1) and 2) unconditionally.
>
> > Yes, that makes sense.  On 20 min read-write pgbench --unlogged-tables
> > benchmark, I see that with HEAD Tps is 36241 and with increase the clog
> > buffers patch, Tps is 69340 at 128 client count (very good performance
> > boost) which indicates that we should go ahead with 1) and 2) patches.
>
> Especially considering the line count... I do wonder about going crazy
> and increasing to 256 immediately. It otherwise seems likely that we'll
> have the same issue in a year.  Could you perhaps run your test
> against that as well?
>

Unfortunately, it dipped to 65005 with 256 clog bufs.  So I think 128 is
the appropriate number.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-31 Thread Magnus Hagander
On Thu, Mar 31, 2016 at 2:19 PM, Magnus Hagander 
wrote:

> On Thu, Mar 31, 2016 at 4:00 AM, David Steele  wrote:
>
>> On 3/30/16 4:18 AM, Magnus Hagander wrote:
>> >
>> > On Wed, Mar 30, 2016 at 4:10 AM, David Steele > > > wrote:
>> >
>> > This certainly looks like it would work but it raises the barrier
>> for
>> > implementing backups by quite a lot.  It's fine for backrest or
>> barman
>> > but it won't be pleasant for anyone who has home-grown scripts.
>> >
>> >
>> > How much does it really raise the bar, though?
>> >
>> > It would go from "copy all files and make damn sure you copy pg_control
>> > last, and rename it to pg_control.backup" to "take this binary blob you
>> > got from the server and write it to pg_control.backup"?
>> >
>> > Also, the target of these APIs is specifically the backup tools and not
>> > homewritten scripts.
>>
>> Then what would home-grown scripts use, the deprecated API that we know
>> has issues?
>>
>
> They should use either pg_basebackup/pg_receivexlog, or they should use a
> tool like backrest or barman.
>
>
>
>> > A simple shellscript will have trouble enough using
>> > it in the first place since it requires a persistent connection to the
>> > database.
>>
>> All that's required is to spawn a psql process.  I'm no shell expert but
>> that's simple enough.
>>
>
> Oh, it's certainly doable, but you also need to come back and talk to it
> at a later time (to call stop backup), and do something useful with a
> multiline output. None of that is particularly easy. Certainly doable, but
> it becomes the wrong tool for the job.
>
>
>
>> > But those scripts are likely broken anyway.
>>
>> Yes, probably.  Backup and especially archiving correctly are harder
>> than most people realize.
>>
>
> Exactly. Which is why we should discourage people from doing that, and
> instead point them to the tools that do the job right. This is the whole
> reason we're making this change in the first place.
>
>
>
>> > The main reason for Heikki to suggest this one over the other basic one
>> > is that it brings protection against the "backup script/program crashed
>> > halfway through but the user still tried to restore from that". They
>> will
>> > outright fail because there is no pg_control.backup in that case.
>>
>> But if we are going to make this complicated I'm not sure it's a big
>> deal just to require pg_control to be copied last.  pgBackRest already
>> does that and Barman probably does, too.
>>
>
> It does (I did doublecheck that at some point).
>
>
>
>> I don't see "don't copy pg_control" and "copy pg_control last" as all
>> that different in terms of complexity.
>>
>> pgBackRest also *restores* pg_control last which I think is pretty
>> important and would not be addressed by this solution.
>>
>
> So maybe we should at least start that way. And just document that
> clearly, and then use the patch that we have right now?
>
> Except we add so it still returns the stoppoint() as well (which is
> returned from the current pg_stop_backup, but not in the new one).
>

Eh, never mind. We do already return the stop point; I was looking at the
wrong version of the patch. So basically that's the current version of the
patch, but with proper docs of course.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Updated backup APIs for non-exclusive backups

2016-03-31 Thread Magnus Hagander
On Thu, Mar 31, 2016 at 4:00 AM, David Steele  wrote:

> On 3/30/16 4:18 AM, Magnus Hagander wrote:
> >
> > On Wed, Mar 30, 2016 at 4:10 AM, David Steele  > > wrote:
> >
> > This certainly looks like it would work but it raises the barrier for
> > implementing backups by quite a lot.  It's fine for backrest or
> barman
> > but it won't be pleasant for anyone who has home-grown scripts.
> >
> >
> > How much does it really raise the bar, though?
> >
> > It would go from "copy all files and make damn sure you copy pg_control
> > last, and rename it to pg_control.backup" to "take this binary blob you
> > got from the server and write it to pg_control.backup"?
> >
> > Also, the target of these APIs is specifically the backup tools and not
> > homewritten scripts.
>
> Then what would home-grown scripts use, the deprecated API that we know
> has issues?
>

They should use either pg_basebackup/pg_receivexlog, or they should use a
tool like backrest or barman.



> > A simple shellscript will have trouble enough using
> > it in the first place since it requires a persistent connection to the
> > database.
>
> All that's required is to spawn a psql process.  I'm no shell expert but
> that's simple enough.
>

Oh, it's certainly doable, but you also need to come back and talk to it at
a later time (to call stop backup), and do something useful with a
multiline output. None of that is particularly easy. Certainly doable, but
it becomes the wrong tool for the job.



> > But those scripts are likely broken anyway.
>
> Yes, probably.  Backup and especially archiving correctly are harder
> than most people realize.
>

Exactly. Which is why we should discourage people from doing that, and
instead point them to the tools that do the job right. This is the whole
reason we're making this change in the first place.



> > The main reason for Heikki to suggest this one over the other basic one
> > is that it brings protection against the "backup script/program crashed
> > halfway through but the user still tried to restore from that". They will
> > outright fail because there is no pg_control.backup in that case.
>
> But if we are going to make this complicated I'm not sure it's a big
> deal just to require pg_control to be copied last.  pgBackRest already
> does that and Barman probably does, too.
>

It does (I did doublecheck that at some point).



> I don't see "don't copy pg_control" and "copy pg_control last" as all
> that different in terms of complexity.
>
> pgBackRest also *restores* pg_control last which I think is pretty
> important and would not be addressed by this solution.
>

So maybe we should at least start that way. And just document that clearly,
and then use the patch that we have right now?

Except we add so it still returns the stoppoint() as well (which is
returned from the current pg_stop_backup, but not in the new one).

We can always extend the function with more columns later if we need - it's
changing the ones we have that's a big problem there :)

Then we start this way and we can keep improving if necessary. But the
advantage of sticking to this for now is we don't have to touch the
recovery side at all. And we're pretty late in the cycle ATM.



> > If we
> > don't care about that, then we can go back to just saying "copy
> > pg_control last and we're done". But you yourself complained about that
> > requirement because it's too easy to get wrong (though you advocated
> > using backup_label to transfer the data over -- but that has the
> > potential for getting more complicated if we now or at any point in the
> > future want more than one field to transfer, for example).
>
> Perhaps.  I'm still not convinced that getting some information from
> backup_label and other information from pg_control is a good solution.
> I would rather write the recovery point into the backup_label and use
> that instead of the value in pg_control.  Text files are much easier to
> parse and push around accurately (and test).
>

What about all the *other* functionality that's then in pg_control?

We could also do what Heikki at one point suggested in our chat, which is
basically to write all of pg_control out to the backup_label file and
reconstruct it from that. But that makes the file a lot more complicated...

//Magnus


Re: [HACKERS] [PATH] Jsonb, insert a new value into an array at arbitrary position

2016-03-31 Thread Vitaly Burovoy
On 3/30/16, Dmitry Dolgov <9erthali...@gmail.com> wrote:
> On 31 March 2016 at 05:04, Vitaly Burovoy  wrote:
>> The documentation changes still has to be fixed.
> Thanks for the help. Looks like I'm not so good at text formulation. Fixed.
Never mind. I'm also not so good at it. It still should be
rewritten by a native English speaker, but it is better to give them a
good source to work from.

===
I'm almost ready to mark it as "Ready for committer".

A few notes again.
1.
> a current item was pushed to parse state after JB_PATH_INSERT_BEFORE

I noticed that the function name 'pushJsonbValue' suggests it works
with a stack (push/pop), but there is no "pop*" function in either
jsonfuncs.c or jsonb_util.c.
That's why I wrote "it seems the logic in the code is correct" - it is
logical to insert the new value if "before", then the current value, then
the new value if "after".
But yes, following the execution path to the "pushState" function,
one understands that "JsonbParseState" is constructed as a stack, not a
queue. So an additional comment is needed on why the "JB_PATH_INSERT_AFTER"
check is placed before inserting the current value and the
JB_PATH_INSERT_BEFORE check.

The comment can be located just before "if (op_type &
JB_PATH_INSERT_AFTER)" (line 3986) and look similar to:

/*
 * JsonbParseState struct behaves as a stack -- see the "pushState" function,
 * so check for JB_PATH_INSERT_BEFORE/JB_PATH_INSERT_AFTER in a reverse order.
 */

2.
The comment for the function "setPath" is obsolete: the "If newval is null"
check and the "If create is true" parameter no longer exist.
Since I answered so late last time, I propose the following wording to
avoid one more (possible) review round:

/*
 * Do most of the heavy work for jsonb_set/jsonb_insert
 *
 * If JB_PATH_DELETE bit is set in op_type, the element is to be removed.
 *
 * If any bit mentioned in JB_PATH_CREATE_OR_INSERT is set in op_type,
 * we create the new value if the key or array index does not exist.
 *
 * Bits JB_PATH_INSERT_BEFORE and JB_PATH_INSERT_AFTER in op_type
 * behave as JB_PATH_CREATE if new value is inserted in JsonbObject.
 *
 * All path elements before the last must already exist
 * whatever bits in op_type are set, or nothing is done.
 */

===
I hope these notes are final (an additional check of the wording is appreciated).

P.S.: if it is not hard for you, please rebase the patch onto
current master to avoid offsets in func.sgml.

-- 
Best regards,
Vitaly Burovoy


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 4:28 PM, Robert Haas  wrote:

> On Thu, Mar 31, 2016 at 12:59 AM, Dilip Kumar 
> wrote:
> > On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
> > wrote:
> >> Yes, that makes sense.  One more point is that if the reason for v13
> >> giving better performance is extra blocks (which we believe in certain
> cases
> >> can leak till the time Vacuum updates the FSM tree), do you think it
> makes
> >> sense to once test by increasing lockWaiters * 20 limit to may be
> >> lockWaiters * 25 or lockWaiters * 30.
> >
> >
> > I tested COPY 1 record, by increasing the number of blocks just to
> find
> > out why we are not as good as V13
> >  with extraBlocks = Min( lockWaiters * 40, 2048) and got below results..
> >
> > COPY 1
> > 
> > Client  Patch(extraBlocks = Min( lockWaiters * 40, 2048))
> >    -
> > 16 752
> > 32 708
> >
> > This proves that main reason of v13 being better is its adding extra
> blocks
> > without control.
> > though v13 is better than these results, I think we can get that also by
> > changing multiplier and max limit .
> >
> > But I think we are ok with the max size as 4MB (512 blocks) right?
>
> Yeah, kind of.  But obviously if we could make the limit smaller
> without hurting performance, that would be better.
>
> Per my note yesterday about performance degradation with parallel
> COPY, I wasn't able to demonstrate that this patch gives a consistent
> performance benefit on hydra - the best result I got was speeding up a
> 9.5 minute load to 8 minutes where linear scalability would have been
> 2 minutes.


Is this test for unlogged tables?  As far as I understand, this patch
will show a benefit only if data and WAL are on separate disks or if you
test with unlogged tables; otherwise WAL contention swamps the benefit
of this patch.
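
For context, the heuristic whose constants are being tuned in this
thread can be sketched as follows; this is only a sketch, with the Min
macro inlined, and the multiplier (20 vs. 40) and the cap (512 blocks
= 4MB with 8kB pages, vs. 2048) being exactly the knobs benchmarked
above:

#define Min(x, y)  ((x) < (y) ? (x) : (y))

static int
extra_blocks_for_waiters(int lockWaiters)
{
	/* add blocks in proportion to the observed extension-lock waiters */
	return Min(lockWaiters * 20, 512);
}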


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Andres Freund
On 2016-03-31 06:54:02 -0400, Robert Haas wrote:
> On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> > Yea, as Tom pointed out that's not going to work.  I'll try to write a
> > patch for approach 1).
> 
> Does this mean that any platform that wants to perform well will now
> need a sub-4-byte spinlock implementation?  That has a somewhat
> uncomfortable sound to it.

Oh. I confused my approaches. I was thinking about going for 2):

> 2) Replace the lwlock spinlock by a bit in LWLock->state. That'd avoid
>embedding the spinlock, and actually might allow to avoid one atomic
>op in a number of cases.

precisely because of that concern.
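
A rough sketch of approach 2), assuming a spare bit in the 32-bit
LWLock state word and the pg_atomic_* API from port/atomics.h; the
flag value and the bare spin loop are illustrative, not the committed
design:

#include "port/atomics.h"

#define LW_FLAG_LOCKED	((uint32) 1 << 28)	/* assumed spare bit */

static void
lwlock_state_lock(pg_atomic_uint32 *state)
{
	/* fetch-or returns the old value; loop until we were the setter */
	while (pg_atomic_fetch_or_u32(state, LW_FLAG_LOCKED) & LW_FLAG_LOCKED)
	{
		/* a real implementation would use the spin-delay machinery */
		while (pg_atomic_read_u32(state) & LW_FLAG_LOCKED)
			;
	}
}

static void
lwlock_state_unlock(pg_atomic_uint32 *state)
{
	pg_atomic_fetch_and_u32(state, ~LW_FLAG_LOCKED);
}

The attraction is that the lock bit and the rest of the state share one
atomic word, so on some paths the flag manipulation can be folded into
an atomic operation the lock already performs, saving an atomic op.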


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-31 Thread Robert Haas
On Thu, Mar 31, 2016 at 12:59 AM, Dilip Kumar  wrote:
> On Tue, Mar 29, 2016 at 10:08 AM, Amit Kapila 
> wrote:
>> Yes, that makes sense.  One more point is that if the reason for v13
>> giving better performance is extra blocks (which we believe in certain cases
>> can leak till the time Vacuum updates the FSM tree), do you think it makes
>> sense to test once more by increasing the lockWaiters * 20 limit to maybe
>> lockWaiters * 25 or lockWaiters * 30.
>
>
> I tested COPY 1 record, by increasing the number of blocks, just to find
> out why we are not as good as V13 with
> extraBlocks = Min(lockWaiters * 40, 2048), and got the results below:
>
> COPY 1
> 
> Client   Patch (extraBlocks = Min(lockWaiters * 40, 2048))
> ------   -------------------------------------------------
> 16       752
> 32       708
>
> This proves that the main reason v13 is better is that it adds extra
> blocks without control.
> Though v13 is better than these results, I think we can get there too by
> changing the multiplier and the max limit.
>
> But I think we are OK with the max size of 4MB (512 blocks), right?

Yeah, kind of.  But obviously if we could make the limit smaller
without hurting performance, that would be better.

Per my note yesterday about performance degradation with parallel
COPY, I wasn't able to demonstrate that this patch gives a consistent
performance benefit on hydra - the best result I got was speeding up a
9.5 minute load to 8 minutes where linear scalability would have been
2 minutes.  And I found cases where it was actually slower with the
patch.  Now maybe hydra is just a crap machine, but I'm feeling
nervous.

What machines did you use to test this?  Have you tested really large
data loads, like many MB or even GB of data?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-31 Thread Robert Haas
On Wed, Mar 30, 2016 at 3:16 AM, Andres Freund  wrote:
> On 2016-03-30 07:13:16 +0530, Dilip Kumar wrote:
>> On Tue, Mar 29, 2016 at 10:43 PM, Andres Freund  wrote:
>>
>> > My gut feeling is that we should do both 1) and 2).
>> >
>> > Dilip, could you test performance of reducing ppc's spinlock to 1 byte?
>> > Cross-compiling suggests that doing so "just works".  I.e. replace the
>> > #if defined(__ppc__) typedef from an int to a char.
>> >
>>
>> I set that, but after that it hangs; even initdb hangs.
>
> Yea, as Tom pointed out that's not going to work.  I'll try to write a
> patch for approach 1).

Does this mean that any platform that wants to perform well will now
need a sub-4-byte spinlock implementation?  That has a somewhat
uncomfortable sound to it.
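
For reference, the failed experiment mentioned above amounts to
something like the following change in s_lock.h (a sketch; the hang is
plausible because the ppc lock sequences use the word-sized
lwarx/stwcx. instructions, which do not match a 1-byte slock_t):

#if defined(__ppc__) || defined(__powerpc__)
typedef char slock_t;		/* was a 4-byte integer type; hangs as reported */
#endif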

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-31 Thread Robert Haas
On Thu, Mar 31, 2016 at 6:45 AM, Andres Freund  wrote:
> On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
>> On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund  wrote:
>> >>My attribution above was incorrect.  Robert Haas is the committer and
>> >>owner of
>> >>this one.  I apologize.
>> >
>> > Fine in this case I guess. I've posted a proposal nearby either way, it 
>> > appears to be a !x86 problem.
>>
>> To which proposal are you referring?
>
> 1) in 
> http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de

OK.  So, Noah, my proposed strategy is to wait and see if Andres can
make that work, and if not, then revisit the issue of what to do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-31 Thread Andres Freund
On 2016-03-31 06:43:19 -0400, Robert Haas wrote:
> On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund  wrote:
> >>My attribution above was incorrect.  Robert Haas is the committer and
> >>owner of
> >>this one.  I apologize.
> >
> > Fine in this case I guess. I've posted a proposal nearby either way, it 
> > appears to be a !x86 problem.
> 
> To which proposal are you referring?

1) in 
http://www.postgresql.org/message-id/20160328130904.4mhugvkf4f3wg...@awork2.anarazel.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance degradation in commit 6150a1b0

2016-03-31 Thread Robert Haas
On Thu, Mar 31, 2016 at 3:51 AM, Andres Freund  wrote:
>>My attribution above was incorrect.  Robert Haas is the committer and
>>owner of
>>this one.  I apologize.
>
> Fine in this case I guess. I've posted a proposal nearby either way, it 
> appears to be a !x86 problem.

To which proposal are you referring?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Andres Freund
On 2016-03-31 13:30:58 +0300, Yury Zhuravlev wrote:
> Craig Ringer wrote:
> >Yeah, you're right. He's not the only one either.
> >
> >I was reacting to the original post, and TBH didn't think it through. The
> >commit logs suggest there's a decent amount of work that goes in, and I'm
> >sure a lot of it isn't visible when just looking for 'windows', 'win32',
> >'msvc', etc.
> >
> >Even the build system affects people who don't use it, if they're adding
> >features. I recently backported a bunch of 9.3 functionality to 9.1, and
> >in the process simply stripped out all the Windows build system changes as
> >"meh, too hard, don't care".
> >
> >So yeah. I casually handwaved away a lot of work that's not highly
> >visible, but still happens and is important, and was wrong to do so. I've
> >done a bit on Windows myself but didn't fully recognise the burden support
> >for it places on patches to core infrastructure and on committers.
> 
> Maybe the community should appoint someone to be in charge of Windows?

If somebody wanted to be that, they'd just have to start doing it. It's
not like Windows problems are an infrequent occurrence.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Yury Zhuravlev

Craig Ringer wrote:

Yeah, you're right. He's not the only one either.

I was reacting to the original post, and TBH didn't think it through.
The commit logs suggest there's a decent amount of work that goes in,
and I'm sure a lot of it isn't visible when just looking for 'windows',
'win32', 'msvc', etc.

Even the build system affects people who don't use it, if they're
adding features. I recently backported a bunch of 9.3 functionality to
9.1, and in the process simply stripped out all the Windows build
system changes as "meh, too hard, don't care".

So yeah. I casually handwaved away a lot of work that's not highly
visible, but still happens and is important, and was wrong to do so.
I've done a bit on Windows myself but didn't fully recognise the burden
support for it places on patches to core infrastructure and on
committers.


Maybe the community should appoint someone to be in charge of Windows?

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] So, can we stop supporting Windows native now?

2016-03-31 Thread Craig Ringer
On 31 March 2016 at 16:20, Andres Freund  wrote:

> On 2016-03-31 09:04:35 +0800, Craig Ringer wrote:
> > The cost is small.
>
> First off I agree we don't want to drop proper windows support.
>
> But I think "the cost is small" is a pretty bad mischaracterization. I
> don't do Windows, and yet I've spent a lot of time figuring out
> Windows-only stuff, even writing Windows-only things (atomics, latches,
> recent bugfixes). There's a lot of architectural gunk in Postgres just
> geared towards supporting Windows (c.f. EXEC_BACKEND), and that makes
> new development harder in a number of cases. E.g. background workers,
> parallelism and such had quite some extra work cut out for them because
> of that.
>

Fair point. It's not just about fixing Windows or needing Windows-specific
*features*, it's about making it harder to develop things because of
Windows. I've seen that myself.



> > but is a burden carried mainly by those who care about Windows
> > support.
>
> I don't think that's true. Tom e.g. seems to fight battles with it on a
> regular basis.
>

Yeah, you're right. He's not the only one either.

I was reacting to the original post, and TBH didn't think it through. The
commit logs suggest there's a decent amount of work that goes in, and I'm
sure a lot of it isn't visible when just looking for 'windows', 'win32',
'msvc', etc.

Even the build system affects people who don't use it, if they're adding
features. I recently backported a bunch of 9.3 functionality to 9.1, and in
the process simply stripped out all the Windows build system changes as
"meh, too hard, don't care".

So yeah. I casually handwaved away a lot of work that's not highly visible,
but still happens and is important, and was wrong to do so. I've done a bit
on Windows myself but didn't fully recognise the burden support for it
places on patches to core infrastructure and on committers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Andres Freund
On 2016-03-31 15:07:22 +0530, Amit Kapila wrote:
> On Thu, Mar 31, 2016 at 4:39 AM, Andres Freund  wrote:
> >
> > On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> > > On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila 
> > > wrote:
> > > >
> >
> > Amit, could you run benchmarks on your bigger hardware? Both with
> > USE_CONTENT_LOCK commented out and in?
> >
> 
> Yes.

Cool.


> > I think we should go for 1) and 2) unconditionally.

> Yes, that makes sense.  On a 20-min read-write pgbench --unlogged-tables
> benchmark, I see that with HEAD TPS is 36241, and with the
> increase-clog-buffers patch TPS is 69340, at a 128 client count (a very
> good performance boost), which indicates that we should go ahead with
> patches 1) and 2).

Especially considering the line count... I do wonder about going crazy
and increasing to 256 immediately. It otherwise seems likely that we'll
have the same issue in a year.  Could you perhaps run your test
against that as well?
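
To make the caps concrete, here is the formula from the patch with
worked values (a sketch, with the tree's Min/Max inlined as plain
macros):

#define Min(x, y)  ((x) < (y) ? (x) : (y))
#define Max(x, y)  ((x) > (y) ? (x) : (y))

static int
clog_shmem_buffers(int NBuffers, int cap)
{
	/* NBuffers is shared_buffers expressed in 8kB pages */
	return Min(cap, Max(4, NBuffers / 512));
}

/*
 * Worked examples:
 *   shared_buffers = 128MB -> NBuffers =  16384 -> 32  (no cap binds)
 *   shared_buffers = 512MB -> NBuffers =  65536 -> 128 (a 128 cap binds)
 *   shared_buffers =   1GB -> NBuffers = 131072 -> 256 (a 256 cap binds)
 */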


> I think we should change the comments on top of this function.

Yes, definitely.


> 0001-Improve-64bit-atomics-support
> 
> +#if 0
> +#ifndef PG_HAVE_ATOMIC_READ_U64
> +#define PG_HAVE_ATOMIC_READ_U64
> +static inline uint64
> 
> What is the purpose of the above #if 0?  Other than that, the patch
> looks good to me.

I think I was investigating something. Other than that obviously there's
no point. Sorry for that.
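
For background, the generic fallback such a block would provide can be
sketched as below; a plain 64-bit load is not guaranteed to be atomic
on 32-bit platforms, so the usual trick is a compare-exchange against a
dummy value (a sketch, assuming the pg_atomic_*_impl naming of the
atomics headers):

#ifndef PG_HAVE_ATOMIC_READ_U64
#define PG_HAVE_ATOMIC_READ_U64
static inline uint64
pg_atomic_read_u64_impl(volatile pg_atomic_uint64 *ptr)
{
	uint64		old = 0;

	/*
	 * CAS with expected == newval == 0: if the stored value is 0 the
	 * swap is a no-op; otherwise the CAS fails and writes the current
	 * value into "old".  Either way "old" is an atomic snapshot.
	 */
	pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);

	return old;
}
#endif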

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Small patch: --disable-setproctitle flag

2016-03-31 Thread Andres Freund
On 2016-03-31 13:06:05 +0300, Aleksander Alekseev wrote:
> Hello
> 
> Recently I discovered that renaming processes using the setproctitle()
> call on BSD systems may sometimes cause problems. For instance there is
> currently a bug in all versions of LLDB which makes it impossible to
> debug a process that called setproctitle():
> 
> https://llvm.org/bugs/show_bug.cgi?id=26924#c3
> 
> Since the LLVM stack is used by default on FreeBSD, I believe it's
> quite a severe problem. In case there is other software that doesn't
> handle setproctitle() well, and for users of LLDB <= 3.8 (the most
> recent version), I propose to add a --disable-setproctitle flag to the
> configure script. The corresponding patch is attached.

It seems more appropriate to simply add a manual #undef HAVE_SETPROCTITLE
to pg_config_manual.h in that case. Adding configure flags for ephemeral
debugger issues seems like a high-churn activity.
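
A minimal sketch of that suggestion, assuming the usual
autoconf-generated define name:

/* in src/include/pg_config_manual.h -- sketch of the manual override */
#ifdef HAVE_SETPROCTITLE
#undef HAVE_SETPROCTITLE
#endif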

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small patch: --disable-setproctitle flag

2016-03-31 Thread Aleksander Alekseev
Hello

Recently I discovered that renaming processes using the setproctitle()
call on BSD systems may sometimes cause problems. For instance there is
currently a bug in all versions of LLDB which makes it impossible to
debug a process that called setproctitle():

https://llvm.org/bugs/show_bug.cgi?id=26924#c3

Since the LLVM stack is used by default on FreeBSD, I believe it's quite
a severe problem. In case there is other software that doesn't handle
setproctitle() well, and for users of LLDB <= 3.8 (the most recent
version), I propose to add a --disable-setproctitle flag to the
configure script. The corresponding patch is attached.

-- 
Best regards,
Aleksander Alekseev
http://eax.me/
diff --git a/configure b/configure
index 24655dc..87e845e 100755
--- a/configure
+++ b/configure
@@ -819,6 +819,7 @@ with_wal_segsize
 with_CC
 enable_depend
 enable_cassert
+enable_setproctitle
 enable_thread_safety
 with_tcl
 with_tclconfig
@@ -1485,6 +1486,8 @@ Optional Features:
   --enable-tap-tests  enable TAP tests (requires Perl and IPC::Run)
   --enable-depend turn on automatic dependency tracking
   --enable-cassert    enable assertion checks (for debugging)
+  --disable-setproctitle  do not use setproctitle() procedure for changing
+                          process name
   --disable-thread-safety disable thread-safety in client libraries
   --disable-largefile omit support for large files
   --disable-float4-byval  disable float4 passed by value
@@ -5283,6 +5286,47 @@ done
 IFS=$ac_save_IFS
 
 
+#
+# setproctitle()
+#
+
+
+# Check whether --enable-setproctitle was given.
+if test "${enable_setproctitle+set}" = set; then :
+  enableval=$enable_setproctitle;
+  case $enableval in
+yes)
+  :
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --enable-setproctitle option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  enable_setproctitle=yes
+
+fi
+
+
+# calling setproctitle() makes it impossible to connect to PostgreSQL backend
+# using LLDB <= 3.8, see https://llvm.org/bugs/show_bug.cgi?id=26924#c3
+if test "$enable_setproctitle" = yes; then
+  for ac_func in setproctitle
+do :
+  ac_fn_c_check_func "$LINENO" "setproctitle" "ac_cv_func_setproctitle"
+if test "x$ac_cv_func_setproctitle" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_SETPROCTITLE 1
+_ACEOF
+
+fi
+done
+
+fi
 
 #
 # Library directories
@@ -12425,7 +12468,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
+for ac_func in cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.in b/configure.in
index c564a76..40daa32 100644
--- a/configure.in
+++ b/configure.in
@@ -580,6 +579,16 @@ done
 IFS=$ac_save_IFS
 AC_SUBST(INCLUDES)
 
+#
+# setproctitle()
+#
+PGAC_ARG_BOOL(enable, setproctitle, yes,
+  [do not use setproctitle() procedure for changing process name])
+# calling setproctitle() makes it impossible to connect to PostgreSQL backend
+# using LLDB <= 3.8, see https://llvm.org/bugs/show_bug.cgi?id=26924#c3
+if test "$enable_setproctitle" = yes; then
+  AC_CHECK_FUNCS([setproctitle])
+fi
 
 #
 # Library directories
@@ -1432,7 +1440,7 @@ PGAC_FUNC_WCSTOMBS_L
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setproctitle setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
+AC_CHECK_FUNCS([cbrt dlopen fdatasync getifaddrs getpeerucred getrlimit mbstowcs_l memmove poll pstat pthread_is_threaded_np readlink setsid shm_open symlink sync_file_range towlower utime utimes wcstombs wcstombs_l])
 
 AC_REPLACE_FUNCS(fseeko)
 case $host_os in

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-03-31 Thread Amit Kapila
On Thu, Mar 31, 2016 at 4:39 AM, Andres Freund  wrote:
>
> On 2016-03-28 22:50:49 +0530, Amit Kapila wrote:
> > On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila 
> > wrote:
> > >
>
> Amit, could you run benchmarks on your bigger hardware? Both with
> USE_CONTENT_LOCK commented out and in?
>

Yes.

> I think we should go for 1) and 2) unconditionally.


Yes, that makes sense.  On a 20-min read-write pgbench --unlogged-tables
benchmark, I see that with HEAD TPS is 36241, and with the
increase-clog-buffers patch TPS is 69340, at a 128 client count (a very
good performance boost), which indicates that we should go ahead with
patches 1) and 2).

0002-Increase-max-number-of-buffers-in-clog-SLRU-to-128

 Size
 CLOGShmemBuffers(void)
 {
-	return Min(32, Max(4, NBuffers / 512));
+	return Min(128, Max(4, NBuffers / 512));
 }


I think we should change the comments on top of this function.  I have
changed the comments as per my previous patch and attached the modified
patch to this mail; see if that makes sense.


0001-Improve-64bit-atomics-support

+#if 0
+#ifndef PG_HAVE_ATOMIC_READ_U64
+#define PG_HAVE_ATOMIC_READ_U64
+static inline uint64

What is the purpose of the above #if 0?  Other than that, the patch
looks good to me.


> And then evaluate
> whether to go with yours, or 3) from above. If the latter, we have to
> do some cleanup :)
>

Yes, that makes sense to me, so lets go with 1) and 2) first.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


increase_clog_bufs_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

