Re: [HACKERS] Microvacuum support for Hash Index

2017-03-16 Thread Ashutosh Sharma
On Fri, Mar 17, 2017 at 8:20 AM, Amit Kapila  wrote:
> On Thu, Mar 16, 2017 at 9:39 PM, Ashutosh Sharma  
> wrote:

>>>
>>> Don't you think, we should also clear it during the replay of
>>> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
>>> WAL record for XLOG_HASH_DELETE.
>>>
>>
>> Yes, it should be cleared. I completely missed this part in a hurry.
>> Thanks for informing. I have taken care of it in the attached v2
>> patch.
>>
>
> + /*
> + * Mark the page as not containing any LP_DEAD items. See comments
> + * in hashbucketcleanup() for details.
> + */
> + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
> + pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
>
> Your comment here says to refer to hashbucketcleanup(), and in that
> function the comment says "Clearing this flag is just a hint; replay
> won't redo this."  The two seem contradictory.  You need to change the
> comment in hashbucketcleanup().

Done. Please check the attached v3 patch.

> As I said in my previous e-mail, I think you need
> to record clearing of this flag in WAL record XLOG_HASH_DELETE as you
> are not doing this unconditionally and then during replay clear it
> only when the WAL record indicates the same.

Thank you so much for raising that point. I too think that we should
record the flag status in the WAL record and clear it only when
required during replay.
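
For illustration, one way to shape that is to carry a flag in the
XLOG_HASH_DELETE record and act on it during replay. A minimal sketch
(the field name clear_dead_marking is an assumption, not necessarily
the final layout):

    /* hypothetical addition to the XLOG_HASH_DELETE record */
    typedef struct xl_hash_delete
    {
        bool        clear_dead_marking;     /* was LH_PAGE_HAS_DEAD_TUPLES
                                             * cleared on the primary? */
        bool        is_primary_bucket_page;
    } xl_hash_delete;

    /* in hash_xlog_delete(), during replay */
    if (xldata->clear_dead_marking)
    {
        pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
        pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;
    }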

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


0001-Reset-LH_PAGE_HAS_DEAD_TUPLES-flag-during-replay-v3.patch
Description: binary/octet-stream



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-16 Thread Amit Kapila
On Sun, Mar 12, 2017 at 8:11 AM, Robert Haas  wrote:
> On Fri, Mar 10, 2017 at 7:39 PM, Amit Kapila  wrote:
>> I agree that more analysis can help us to decide if we can use subxids
>> from PGPROC and if so under what conditions.  Have you considered the
>> another patch I have posted to fix the issue which is to do this
>> optimization only when subxids are not present?  In that patch, it
>> will remove the dependency of relying on subxids in PGPROC.
>
> Well, that's an option, but it narrows the scope of the optimization
> quite a bit.  I think Simon previously opposed handling only the
> no-subxid cases (although I may be misremembering) and I'm not that
> keen about it either.
>
> I was wondering about doing an explicit test: if the XID being
> committed matches the one in the PGPROC, and nsubxids matches, and the
> actual list of XIDs matches, then apply the optimization.  That could
> replace the logic that you've proposed to exclude non-commit cases,
> gxact cases, etc. and it seems fundamentally safer.  But it might be a
> more expensive test, too, so I'm not sure.
>

I think if the number of subxids is very small, let us say under 5 or
so, then such a check might not matter, but otherwise it could be
expensive.
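
For concreteness, the explicit test Robert describes might look like
this (a sketch only; it assumes the PGXACT-era layout, with the
top-level XID and subxid count in PGXACT and the cached subxids in
proc->subxids.xids):

    /* apply the group-clear optimization only on an exact match */
    if (TransactionIdEquals(pgxact->xid, xid) &&
        pgxact->nxids == nsubxids &&
        memcmp(proc->subxids.xids, subxids,
               nsubxids * sizeof(TransactionId)) == 0)
    {
        /* PGPROC describes exactly the transaction being committed */
    }

The memcmp() over the subxid array is the part whose cost grows with
the number of subtransactions, which is the concern above.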

> It would be nice to get some other opinions on how (and whether) to
> proceed with this.  I'm feeling really nervous about this right at the
> moment, because it seems like everybody including me missed some
> fairly critical points relating to the safety (or lack thereof) of
> this patch, and I want to make sure that if it gets committed again,
> we've really got everything nailed down tight.
>

I think the basic thing that is missing in the last patch was that we
can't apply this optimization during WAL replay, because during
recovery/hot standby the xids/subxids are tracked in KnownAssignedXids.
The same is mentioned in the header comments in procarray.c and in
GetSnapshotData (look at the else branch of the
if (!snapshot->takenDuringRecovery) check).  As far as I can see, the
patch considered that in the initial versions, but then the check got
dropped in one of the later revisions by mistake.  Patch version-5 [1]
has the check for recovery, but during some code rearrangement it got
dropped in version-6 [2].  Having said that, I think the improvement
in the case where there are subtransactions will be smaller, because
having subtransactions means more work under the LWLock and hence
fewer context switches.  This optimization is all about reducing
frequent context switches, so even if we don't optimize the
subtransaction case we are not leaving much on the table, and it
will make this optimization much safer.  To substantiate this theory
with data, see the difference in performance when subtransactions are
used [3] and when they are not used [4].

So we have four ways to proceed:
1. Have this optimization for subtransactions and make it safe with
additional conditions: a check for recovery, and an explicit check
that the actual transaction ids match the ids stored in the proc.
2. Have this optimization only when there are no subtransactions. In
this case, we can have a very simple check for this optimization.
3. Drop this patch and idea.
4. Consider it for the next version.

I personally think the second way is okay for this release, as it
looks safe and gets us the maximum benefit we can safely achieve by
this optimization; we can then consider adding the optimization for
subtransactions (the first way) in a future version if we think it is
safe and beneficial.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1KUVPxBcGTdOuKyvf5p1sQ0HeUbSMbTxtQc%3DP65OxiZog%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1L4iV-2qe7AyMVsb%2Bnz7SiX8JvCO%2BCqhXwaiXgm3CaBUw%40mail.gmail.com
[3] - 
https://www.postgresql.org/message-id/CAFiTN-u3%3DXUi7z8dTOgxZ98E7gL1tzL%3Dq9Yd%3DCwWCtTtS6pOZw%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAFiTN-u-XEzhd%3DhNGW586fmQwdTy6Qy6_SXe09tNB%3DgBcVzZ_A%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-16 Thread Fabien COELHO


Hello Corey & Tom,


What is not done:
- skipped slash commands still consume the rest of the line

That last part is big, to quote Tom:

* More generally, I do not think that the approach of having exec_command
simply fall out immediately when in a false branch is going to work,
because it ignores the fact that different backslash commands have
different argument parsing rules.  Some will eat the rest of the line and
some won't.  I'm afraid that it might be necessary to remove that code
block and add a test to every single backslash command that decides
whether to actually perform its action after it's consumed its arguments.
That would be tedious :-(.  But as it stands, backslash commands will get
parsed differently (ie with potentially-different ending points) depending
on whether they're in a live branch or not, and that seems just way too
error-prone to be allowed to stand.
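
For illustration, the per-command test Tom describes might look like
this inside exec_command() (a hypothetical sketch; is_branch_active()
stands in for whatever the conditional-stack API ends up being called):

    else if (strcmp(cmd, "echo") == 0)
    {
        char       *opt = psql_scan_slash_option(scan_state,
                                                 OT_NORMAL, NULL, false);

        /* the argument is always consumed, so parsing stays consistent */
        if (is_branch_active(cond_stack))
            puts(opt ? opt : "");       /* but act only in a live branch */

        free(opt);
    }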


ISTM that I've tried to suggest working around that complexity by:
 - documenting that \if-related commands should only occur at line start
   (and extend to end of line);
 - detecting and complaining when this is not the case;
 - if some border cases are not detected, calling it a feature.

ISTM that Tom did not respond to this possibly simpler approach... Maybe a 
"no" would be enough before starting heavy work which would touch all 
other commands...


Tom?

--
Fabien.




Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-16 Thread Kyotaro HORIGUCHI
At Fri, 10 Mar 2017 12:15:52 +0300, Nikita Glukhov  
wrote in <48f6934b-b994-4aa2-b6ad-aaa4f5a12...@postgrespro.ru>
> On 10.03.2017 02:13, Tels wrote:
> 
> > I can't comment on the code, but the grammar on the comments caught my
> > eye:
> >> +/* Can any range from range_box does not extend higher than this
> >> argument? */
> >> +static bool
> >> +overLower2D(RangeBox *range_box, Range *query)
> >> +{
> >> +  return FPle(range_box->left.low, query->high) &&
> >> +  FPle(range_box->right.low, query->high);
> >> +}
> > The sentence sounds quite garbled in English. I'm not entirely sure
> > what
> > it should be, but given the comment below "/* Can any range from
> > range_box
> > to be higher than this argument? */" maybe something like:
> >
> > /* Does any range from range_box extend to the right side of the
> > query? */
> >
> > If used, an analog wording should be used for overHigher2D's comment
> > like:
> >
> > /* Does any range from range_box extend to the left side of the query?
> > */
> 
> I've changed comments as you proposed, but I've added missing "not"
> and left "Can":
> 
> /* Can any range from range_box not extend to the right side of the
> query? */
> /* Can any range from range_box not extend to the left side of the
> query? */
> 
> See also comments on overhigher and overlower operators from
> documentation:
> 
> &<   Does not extend to the right of?
> &>   Does not extend to the left of?
> 
> > Another question: Does it make sense to add the "minimal bad example
> > for
> > the '&<' case" as test case, too? After all, it should pass the test
> > after
> > the patch.
> 
> Yes, it would make sense, but Kyotaro's test case doesn't work for
> me, and I still don't know how to force SP-GiST to create inner nodes
> without inserting hundreds of rows.

I admit that the case is quite unstable, or environment-dependent,
and the minimal bad case no longer reproduces even for me. On the
other hand, inserting some hundreds of boxes makes it more stable
than only one box. I'm not sure that it is enough, but it seems to
be the best we can do. As I mentioned previously, box cannot be used
as the key for an outer join, so comparing results one by one is
beyond the reach of the regression test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] increasing the default WAL segment size

2017-03-16 Thread Beena Emerson
Hello,

Thank you for your comments, I will post an updated patch soon.

On Fri, Mar 17, 2017 at 6:40 AM, Robert Haas  wrote:
>
>
> +assign_wal_segment_size(int newval, void *extra)
>
> Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
> should only be there to expose the value; it shouldn't have
> calculation logic associated with it.
>

The CheckPointSegments and UsableBytesInSegment values had to be
changed depending on the wal_segment_size set during initdb. I will
figure out another way to assign these values without using this
assign_hook.


> +   wal_segment_size = atoi(str_wal_segment_size);
>
> So, you're comfortable interpreting --wal-segsize=1TB or
> --wal-segsize=1GB as 1?  Implicitly, 1MB?
>

The option was intended to accept values only in MB, like the original
configure --with-wal-segsize option. Unfortunately, the patch does not
throw an error when units are specified, the way the configure option
does.

Error with config option --with-wal-segsize=1MB
configure: error: Invalid WAL segment size. Allowed values are
1,2,4,8,16,32,64.

Should we imitate this behaviour and just add a check that the value
contains only digits? Or would it be better to allow the use of units
and make the appropriate code changes?
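
For the first option, the check could be as small as this (a sketch;
it reuses the patch's str_wal_segment_size variable, and the error
wording is illustrative):

    char       *endptr;
    long        segsize = strtol(str_wal_segment_size, &endptr, 10);

    if (*endptr != '\0' || segsize <= 0)
    {
        fprintf(stderr,
                _("%s: --wal-segsize must be a plain number of megabytes\n"),
                progname);
        exit(1);
    }
    wal_segment_size = (int) segsize;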

-- 
Thank you,

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-16 Thread Fabien COELHO


Hello David,


Repost from bugs.


This patch does not apply at cccbdde:


Indeed. It should not. The fix is for the 9.6 branch. The issue has been 
fixed by some heavy but very welcome restructuring in master.



Marked as "Waiting for Author".


I put it back to "Needs review".

--
Fabien.




[HACKERS] <> join selectivity estimate question

2017-03-16 Thread Thomas Munro
Hi hackers,

While studying a regression reported[1] against my parallel hash join
patch, I noticed that we can also reach a good and a bad plan in
unpatched master.  One of the causes seems to be the estimated
selectivity of a semi-join with an extra <> filter qual.

Here are some times I measured for TPCH Q21 at scale 10 and work_mem
of 1GB.  That is a query with a large anti-join and a large semi-join.

  8 workers = 8.3s
  7 workers = 8.2s
  6 workers = 8.5s
  5 workers = 8.9s
  4 workers = 9.5s
  3 workers = 39.7s
  2 workers = 36.9s
  1 worker  = 38.2s
  0 workers = 47.9s

Please see the attached query plans showing the change in plan from
Hash Semi Join to Nested Loop Semi Join that happens only once we
reach 4 workers and the (partial) base relation size becomes smaller.
The interesting thing is that row estimate for the semi-join and
anti-join come out as 1 (I think this is 0 clamped to 1).

The same thing can be seen with a simple semi-join, if you happen to
have TPCH loaded.  Compare these two queries:

 SELECT *
   FROM lineitem l1
  WHERE EXISTS (SELECT *
  FROM lineitem l2
 WHERE l1.l_orderkey = l2.l_orderkey);

 -> estimates 59986012 rows, actual rows 59,986,052 (scale 10 TPCH)

 SELECT *
   FROM lineitem l1
  WHERE EXISTS (SELECT *
  FROM lineitem l2
 WHERE l1.l_orderkey = l2.l_orderkey
   AND l1.l_suppkey <> l2.l_suppkey);

 -> estimates 1 row, actual rows 57,842,090 (scale 10 TPCH)

Or for a standalone example:

  CREATE TABLE foo AS
  SELECT (generate_series(1, 1000000) / 4)::int AS a,
         (generate_series(1, 1000000) % 100)::int AS b;

  ANALYZE foo;

  SELECT *
FROM foo f1
   WHERE EXISTS (SELECT *
   FROM foo f2
  WHERE f1.a = f2.a);

 -> estimates 1,000,000 rows

  SELECT *
FROM foo f1
   WHERE EXISTS (SELECT *
   FROM foo f2
  WHERE f1.a = f2.a
AND f1.b <> f2.b);

 -> estimates 1 row

I'm trying to wrap my brain around the selectivity code, but am too
green to grok this part of the planner, which I haven't previously
focused on, and I'd like to understand whether this is expected
behaviour so that I can figure out how to tackle the reported
regression with my patch.  What is happening here?
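
For reference, the piece of the planner involved appears to be
neqjoinsel() in src/backend/utils/adt/selfuncs.c, which (paraphrased,
not verbatim) estimates "<>" as the complement of "=":

    Datum
    neqjoinsel(PG_FUNCTION_ARGS)
    {
        float8      eqsel;

        /* estimate the matching "=" clause via its join selectivity */
        eqsel = estimate_eq_join_selectivity(fcinfo);   /* paraphrase */

        /* "<>" is assumed to select exactly what "=" rejects */
        PG_RETURN_FLOAT8(1.0 - eqsel);
    }

If the equality estimate for the semi-join comes out near 1.0 (almost
every outer row has some match on l_orderkey), the <> selectivity
collapses toward 0.0, which would be consistent with the 1-row
estimates above.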

Thanks for reading.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D3Og-7-b3WOkiT%3Dc%2B6y3eZ0VVSyb1K%2BSOvF17BO5KAt0A%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com

                                         QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=4642943.81..4642943.82 rows=1 width=34) (actual time=39806.068..39806.080 rows=100 loops=1)
   ->  Sort  (cost=4642943.81..4642943.82 rows=1 width=34) (actual time=39806.064..39806.070 rows=100 loops=1)
         Sort Key: (count(*)) DESC, supplier.s_name
         Sort Method: top-N heapsort  Memory: 38kB
         ->  GroupAggregate  (cost=4642943.78..4642943.80 rows=1 width=34) (actual time=39796.916..39804.615 rows=3945 loops=1)
               Group Key: supplier.s_name
               ->  Sort  (cost=4642943.78..4642943.79 rows=1 width=26) (actual time=39796.906..39800.147 rows=38905 loops=1)
                     Sort Key: supplier.s_name
                     Sort Method: quicksort  Memory: 4576kB
                     ->  Nested Loop Anti Join  (cost=2861152.85..4642943.77 rows=1 width=26) (actual time=19851.808..39565.632 rows=38905 loops=1)
                           ->  Nested Loop  (cost=2861152.29..4642900.65 rows=1 width=42) (actual time=19850.550..35747.936 rows=696628 loops=1)
                                 ->  Gather  (cost=2861151.85..4642893.24 rows=1 width=50) (actual time=19850.323..29654.121 rows=1441593 loops=1)
                                       Workers Planned: 3
                                       Workers Launched: 3
                                       ->  Hash Semi Join  (cost=2860151.85..4641893.14 rows=1 width=50) (actual time=21036.398..30323.561 rows=360398 loops=4)
                                             Hash Cond: (l1.l_orderkey = l2.l_orderkey)
                                             Join Filter: (l2.l_suppkey <> l1.l_suppkey)
                                             Rows Removed by Join Filter: 93610
                                             ->  Hash Join  (cost=2585.58..1486212.61 rows=258004 width=42) (actual time=17.681..3985.606 rows=373726 loops=4)
                                                   Hash Cond: (l1.l_suppkey = supplier.s_suppkey)
                                                   ->  Parallel Seq Scan on lineitem l1  [remainder of the plan truncated in the archive]

Re: [HACKERS] Protect syscache from bloating with negative cache entries

2017-03-16 Thread Kyotaro HORIGUCHI
At Tue, 7 Mar 2017 19:23:14 -0800, David Steele  wrote in 
<3b7b7f90-db46-8c37-c4f7-443330c3a...@pgmasters.net>
> On 3/3/17 4:54 PM, David Steele wrote:
> 
> > On 2/1/17 1:25 AM, Kyotaro HORIGUCHI wrote:
> >> Hello, thank you for moving this to the next CF.
> >>
> >> At Wed, 1 Feb 2017 13:09:51 +0900, Michael Paquier
> >>  wrote in
> >> 
> >>> On Tue, Jan 24, 2017 at 4:58 PM, Kyotaro HORIGUCHI
> >>>  wrote:
>  Six new syscaches in 665d1fa was conflicted and 3-way merge
>  worked correctly. The new syscaches don't seem to be targets of
>  this patch.
> >>> To be honest, I am not completely sure what to think about this patch.
> >>> Moved to next CF as there is a new version, and no new reviews to make
> >>> the discussion perhaps move on.
> >> I'm thinking the following is the status of this topic.
> >>
> >> - The patch still applies without conflicts.
> >>
> >> - This is not a holistic measure for the memory leak, but it surely
> >>    saves some existing cases.
> >>
> >> - Shared catcache is another discussion (and won't really be
> >>    proposed in a short time due to the locking issue.)
> >>
> >> - As I mentioned, a patch that caps the number of negative
> >>    entries is available (in a first-created, first-deleted manner),
> >>    but it has a loose end: how to determine the limit.
> > While preventing bloat in the syscache is a worthwhile goal, it
> > appears
> > there are a number of loose ends here and a new patch has not been
> > provided.
> >
> > It's a pretty major change so I recommend moving this patch to the
> > 2017-07 CF.
> 
> Not hearing any opinions pro or con, I'm moving this patch to the
> 2017-07 CF.

Ah. Yes, I agree on this. Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Radix tree for character conversion

2017-03-16 Thread Kyotaro HORIGUCHI
Thank you for committing this.

At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas  wrote 
in 
> On 03/13/2017 08:53 PM, Tom Lane wrote:
> > Heikki Linnakangas  writes:
> >> It would be nice to run the map_checker tool one more time, though, to
> >> verify that the mappings match those from PostgreSQL 9.6.
> >
> > +1
> >
> >> Just to be sure, and after that the map checker can go to the dustbin.
> >
> > Hm, maybe we should keep it around for the next time somebody has a
> > bright
> > idea in this area?
> 
> The map checker compares old-style maps with the new radix maps. The
> next time 'round, we'll need something that compares the radix maps
> with the next great thing. Not sure how easy it would be to adapt.
> 
> Hmm. A somewhat different approach might be more suitable for testing
> across versions, though. We could modify the perl scripts slightly to
> print out SQL statements that exercise every mapping. For every
> supported conversion, the SQL script could:
> 
> 1. create a database in the source encoding.
> 2. set client_encoding=''
> 3. SELECT a string that contains every character in the source
> encoding.

There are many encodings that can be a client encoding but cannot be
a database encoding. And some encodings such as UTF-8 have several
one-way conversions. If we do something like this, it would look like
the following.

1. Encoding test
1-1. create a database in UTF-8
1-2. set client_encoding=''
1-3. INSERT all characters defined in the source encoding.
1-4. set client_encoding='UTF-8'
1-5. SELECT a string that contains every character in UTF-8.
2. Decoding test: the same steps in the reverse direction.

This sucks!


I would like to use the convert() function. It can be a large
PL/pgSQL function or a series of "SELECT convert(...)" calls. The
latter is doable on the fly (by not generating/storing the whole
script).

| -- Test for SJIS->UTF-8 conversion
| ...
| SELECT convert('\', 'SJIS', 'UTF-8'); -- results in error
| ...
| SELECT convert('\897e', 'SJIS', 'UTF-8');

> You could then run those SQL statements against old and new server
> version, and verify that you get the same results.

Including the result files in the repository would make this easy but
would bloat it unacceptably. Perhaps put them under
mb/Unicode/README.sanity_check?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Speedup twophase transactions

2017-03-16 Thread Michael Paquier
On Thu, Mar 16, 2017 at 9:25 PM, Michael Paquier
 wrote:
> On Thu, Mar 16, 2017 at 7:18 PM, Nikhil Sontakke
>  wrote:
>>> + *  * RecoverPreparedTransactions(), StandbyRecoverPreparedTransactions()
>>> + *    and PrescanPreparedTransactions() have been modified to go through
>>> + *    gxact->inredo entries that have not made it to disk yet.
>>>
>>> It seems to me that there should be an initial scan of pg_twophase at
>>> the beginning of recovery, discarding on the way with a WARNING
>>> entries that are older than the checkpoint redo horizon. This should
>>> fill in shmem entries using something close to PrepareRedoAdd(), and
>>> mark those entries as inredo. Then, at the end of recovery,
>>> PrescanPreparedTransactions does not need to look at the entries in
>>> pg_twophase. And that's the case as well of
>>> RecoverPreparedTransaction(). I think that you could get the patch
>>> much simplified this way, as any 2PC data can be fetched directly from
>>> WAL segments and there is no need to rely on scans of pg_twophase,
>>> this is replaced by scans of entries in TwoPhaseState.
>>>
>>
>> I don't think this will work. We cannot replace pg_twophase with shmem
>> entries + WAL pointers. This is because we cannot expect to have WAL
>> entries around for long-running prepared transactions, which survive
>> across checkpoints.
>
> But at the beginning of recovery, we can mark such entries with ondisk
> and inredo, in which case the WAL pointers stored in the shmem entries
> do not matter because the data is already on disk.

Nikhil, do you mind if I try something like that? As we already know
the first XID when beginning redo, via ShmemVariableCache->nextXid, it
is possible to discard 2PC files that should not be there. What makes
me worry is controlling the maximum number of entries in shared
memory. If there are legit 2PC files flushed to disk at checkpoint,
you could finish with potentially more 2PC transactions than should be
possible (even if updates of max_prepared_xacts were WAL-logged).
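
The initial scan could be shaped roughly like this (a sketch under the
assumptions above; the function name, the discard policy, and the
absence of error handling are simplifications):

    void
    restoreTwoPhaseData(void)       /* name is an assumption */
    {
        DIR        *cldir = AllocateDir(TWOPHASE_DIR);
        struct dirent *clde;

        while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL)
        {
            if (strlen(clde->d_name) == 8 &&
                strspn(clde->d_name, "0123456789ABCDEF") == 8)
            {
                TransactionId xid = (TransactionId)
                    strtoul(clde->d_name, NULL, 16);

                /* discard, with a WARNING, files from "future" XIDs */
                if (TransactionIdFollowsOrEquals(xid,
                                            ShmemVariableCache->nextXid))
                    continue;

                /* else create a shmem entry marked ondisk + inredo */
            }
        }
        FreeDir(cldir);
    }
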
-- 
Michael




Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-16 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> On Fri, Mar 17, 2017 at 1:47 PM, Tsunakawa, Takayuki
>  wrote:
> > BTW, does the developer of each feature have to modify the catalog version
> in catversion.h?  It's a bit annoying to see the patch application failure
> on catversion.h.
> 
> Committers take care of this part.

I understood the committer modifies the catalog version based on the patch 
content, so the patch submitter doesn't have to modify it.  I'm relieved.

> > Isn't it enough to modify the catalog version only when
> alpha/beta/RC/final versions are released?
> 
> That's useful, at least for developers, even during a development
> cycle: a hard failure at startup tells you whether a data folder you
> have created is compatible with the new binaries or not.

Oh, you're absolutely right.

Regards
Takayuki Tsunakawa




Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-16 Thread Michael Paquier
On Fri, Mar 17, 2017 at 1:47 PM, Tsunakawa, Takayuki
 wrote:
> BTW, does the developer of each feature have to modify the catalog version in 
> catversion.h?  It's a bit annoying to see the patch application failure on 
> catversion.h.

Committers take care of this part.

> Isn't it enough to modify the catalog version only when alpha/beta/RC/final 
> versions are released?

That's useful, at least for developers, even during a development
cycle: a hard failure at startup tells you whether a data folder you
have created is compatible with the new binaries or not.
-- 
Michael




Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-03-16 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of David Steele
> The attached patch updates the docs per your suggestion and has been
> rebased on master at d69fae2.

I made this ready for committer.  The patch applied except for catversion.h, 
the patch content looks good, and the target test passed as follows:

I set archive_command to 'sleep 10'.  pg_stop_backup() with archive wait took 
about 10 seconds, emitting NOTICE messages.

postgres=# select pg_stop_backup(false, true);
NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to be 
archived

NOTICE:  pg_stop_backup complete, all required WAL segments have been archived
  pg_stop_backup
---
 (0/BF8,"START WAL LOCATION: 0/B28 (file 0001000B)+
 CHECKPOINT LOCATION: 0/B60   +
 BACKUP METHOD: streamed  +
 BACKUP FROM: master  +
 START TIME: 2017-03-17 13:26:47 JST  +
 LABEL: a +
 ","")
(1 row)


pg_stop_backup() without archive wait returned immediately without displaying 
any NOTICE messages.


postgres=# select pg_stop_backup(false, false);
  pg_stop_backup
---
 (0/D000130,"START WAL LOCATION: 0/D28 (file 0001000D)+
 CHECKPOINT LOCATION: 0/D60   +
 BACKUP METHOD: streamed  +
 BACKUP FROM: master  +
 START TIME: 2017-03-17 13:29:46 JST  +
 LABEL: a +
 ","")
(1 row)


BTW, does the developer of each feature have to modify the catalog version in 
catversion.h?  It's a bit annoying to see the patch application failure on 
catversion.h.  Isn't it enough to modify the catalog version only when 
alpha/beta/RC/final versions are released?


Regards
Takayuki Tsunakawa




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-16 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-16 13:00:54 +0100, Petr Jelinek wrote:
>> Looks like that didn't help either.
>> 
>> I setup my own Windows machine and can reproduce the issue. I played
>> around a bit and could not really find a fix other than adding
>> WL_TIMEOUT and short timeout to WaitLatchOrSocket (it does wait a very
>> long time on the WaitLatchOrSocket otherwise before failing).

> Hm. Could you use process explorer or such to see the exact events
> happening?  Seeing that might help us to nail this down.

At this point it's hard to escape the conclusion that we're looking at
some sort of bug in the Windows version of the WaitEventSet infrastructure
--- or at least, the WaitEventSet-based waits for socket events are
evidently behaving differently from the implementation on the libpq side,
which is based on pqWaitTimed and thence pqSocketCheck and thence
pqSocketPoll.
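
For reference, what connectDBComplete() honors is the documented
PQconnectPoll() contract, roughly this loop (a minimal sketch; the
wait_for_*_ready() helpers are placeholders for select()/poll() or
WaitLatchOrSocket):

    PostgresPollingStatusType st = PGRES_POLLING_WRITING;

    while (st != PGRES_POLLING_OK && st != PGRES_POLLING_FAILED)
    {
        int         sock = PQsocket(conn);

        /* wait for exactly the readiness PQconnectPoll asked for */
        if (st == PGRES_POLLING_READING)
            wait_for_read_ready(sock);
        else
            wait_for_write_ready(sock);

        st = PQconnectPoll(conn);
    }

The walreceiver loop, by contrast, may call PQconnectPoll() after any
other wakeup, which is the discrepancy under discussion.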

Noticing that bowerbird builds with openssl, I'm a bit suspicious that
the issue might have something to do with the fact that pqSocketCheck
has a special case for SSL, which I don't think WaitEventSet knows about.
But I don't think SSL is actually active in the buildfarm run, so it's
hard to see how we get to that scenario exactly --- or even less why it
would only matter on Windows.

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

2017-03-16 Thread Andres Freund
On 2017-03-16 13:00:54 +0100, Petr Jelinek wrote:
> On 15/03/17 17:55, Tom Lane wrote:
> > Andrew Dunstan  writes:
> >> On 03/03/2017 11:11 PM, Tom Lane wrote:
> >>> Yeah, I was wondering if this is just exposing a pre-existing bug.
> >>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
> >>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
> >>> would've escaped detection.
> > 
> >> (After a long period of fruitless empirical testing I turned to the code)
> >> Maybe I'm missing something, but connectDBComplete() handles a return of
> >> PGRESS_POLLING_OK as a success while connectDBStart() seems not to. I
> >> don't find anywhere in our code other than libpqwalreceiver that
> >> actually uses that interface, so it's not surprising if it's now
> >> failing. So my bet is it is indeed a long-standing bug.
> > 
> > Meh ... that argument doesn't hold water, because the old code here called
> > PQconnectdbParams which is just PQconnectStartParams then
> > connectDBComplete.  So the problem cannot be in connectDBStart; that's
> > common to both paths.  It has to be some discrepancy between what
> > connectDBComplete does and what the new loop in libpqwalreceiver is doing.
> > 
> > The original loop coding in 1e8a85009 was not very close to the documented
> > spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
> > still not really the same: connectDBComplete doesn't call PQconnectPoll
> > until the socket is known read-ready or write-ready.  The walreceiver loop
> > does not guarantee that, but would make an additional call after any
> > random other wakeup.  It's not very clear why bowerbird, and only
> > bowerbird, would be seeing such wakeups --- but I'm having a really hard
> > time seeing any other explanation for the change in behavior.  (I wonder
> > whether bowerbird is telling us that WaitLatchOrSocket can sometimes
> > return prematurely on Windows.)
> > 
> > I'm also pretty sure that the ResetLatch call is in the wrong place which
> > could lead to missed wakeups, though that's the opposite of the immediate
> > problem.
> > 
> > I'll try correcting these things and we'll see if it gets any better.
> > 
> 
> Looks like that didn't help either.
> 
> I setup my own Windows machine and can reproduce the issue. I played
> around a bit and could not really find a fix other than adding
> WL_TIMEOUT and short timeout to WaitLatchOrSocket (it does wait a very
> long time on the WaitLatchOrSocket otherwise before failing).

Hm. Could you use process explorer or such to see the exact events
happening?  Seeing that might help us to nail this down.

Greetings,

Andres Freund




Re: [HACKERS] scram and \password

2017-03-16 Thread Michael Paquier
On Thu, Mar 16, 2017 at 10:52 PM, Heikki Linnakangas  wrote:
> On 03/14/2017 11:14 PM, Tom Lane wrote:
>>
>> In short, I don't think that argument refutes my position that "md5"
>> in pg_hba.conf should be understood as allowing SCRAM passwords too.
>
>
> Yeah, let's do that. Here's a patch.

At least this has the merit of making \password simpler from psql
without a kind of --method option: if the backend is 9.6 or older,
just generate an MD5 hash, and a SCRAM hash for newer versions.
PQencryptPassword still needs to be extended so that it accepts a hash
method, though.
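
A hypothetical extended signature, for the sake of discussion (this is
not an existing libpq function; today PQencryptPassword() takes only
the password and user name and always produces an MD5 hash):

    /* method would be e.g. "md5" or "scram-sha-256" */
    char *PQencryptPasswordMethod(const char *passwd, const char *user,
                                  const char *method);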

> I had some terminology trouble with the docs. What do you call a user that
> has "md5X" in pgauthid.rolpassword? What about someone with a SCRAM
> verifier? I used the terms "those users that have an MD5 hash set in the
> system catalog", and "users that have set their password as a SCRAM
> verifier", but it feels awkward.

MD5-hashed values are verifiers as well, so the use of the term
"password" looks weird to me.

> The behavior when a user doesn't exist, or doesn't have a valid password, is
> a bit subtle. Previously, with 'md5' authentication, we would send the
> client an MD5 challenge, and fail with "invalid password" error after
> receiving the response. And with 'scram' authentication, we would perform a
> dummy authentication exchange, with a made-up salt. This is to avoid
> revealing to an unauthenticated client whether or not the user existed.
>
> With this patch, the dummy authentication logic for 'md5' is a bit more
> complicated. I made it look at the password_encryption GUC, and send the
> client a dummy MD5 or SCRAM challenge based on that. The idea is that most
> users presumably have a password of that type, so we use that method for the
> dummy authentication, to make it look as "normal" as possible. It's not
> perfect, if password_encryption is set to 'scram', and you probe for a user
> that has an MD5 password set, you can tell that it's a valid user from the
> fact that the server sends an MD5 challenge.

No objections to that. If password_encryption is set to off or plain,
it is definitely better to switch to scram as this patch does.

> In practice, I'm not sure how good this dummy authentication thing really is
> anyway. Even on older versions, I'd wager a guess that if you tried hard
> enough, you could tell if a user exists or not based on timing, for example.
> So I think this is good enough. But it's worth noting and discussing.

Regression tests are proving to be useful here (it would be nice to
get those committed first!). I am noticing that this patch breaks
connections for users with a cleartext or MD5-hashed verifier when
"password" is used in pg_hba.conf. The case where a user has a
SCRAM-hashed verifier when "md5" is used in the matching hba entry
works, though. The missing piece is visibly in CheckPWChallengeAuth(),
which should also be used with uaPassword.

-# Most users use SCRAM authentication, but some users use older clients
-# that don't support SCRAM authentication, and need to be able to log
-# in using MD5 authentication. Such users are put in the @md5users
-# group, everyone else must use SCRAM.
+# Require SCRAM authentication for most users, but make an exception
+# for user 'mike', who uses an older client that doesn't support SCRAM
+# authentication.
 #
 # TYPE  DATABASEUSERADDRESS METHOD
-hostall @md5users   .example.commd5
+hostall mike.example.commd5
Why not still use @md5users?
-- 
Michael




Re: [HACKERS] WAL Consistency checking for hash indexes

2017-03-16 Thread Amit Kapila
On Thu, Mar 16, 2017 at 1:15 PM, Ashutosh Sharma  wrote:
> Hi,
>
> Attached is the patch that allows the WAL consistency tool to mask
> the 'LH_PAGE_HAS_DEAD_TUPLES' flag in hash indexes. The flag was added
> as part of the 'Microvacuum support for Hash index' patch. I have
> already tested it using Kuntal's WAL consistency tool and it works
> fine.
>

+ * unlogged. So, mask it. See _hash_kill_items(), MarkBufferDirtyHint()
+ * for details.
+ */

I think in the above comment, a reference to _hash_kill_items() is
sufficient.  Other than that, the patch looks okay.
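
For reference, the masking itself is tiny; a sketch of what the hash
masking function has to do with this flag (using the same page helpers
as the patch hunk quoted above):

    HashPageOpaque opaque = (HashPageOpaque) PageGetSpecialPointer(page);

    /*
     * LH_PAGE_HAS_DEAD_TUPLES is set by _hash_kill_items() without being
     * WAL-logged, so primary and standby may legitimately differ here.
     */
    opaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;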

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-16 Thread Peter Eisentraut
On 3/16/17 14:55, Andres Freund wrote:
> I indeed think that's the right consequence.  One question is what to
> replace it with exactly - are we guaranteed we can dynamically look up
> symbols by name in the main binary on every platform?

I think there is probably a way to do this on all platforms.  But it
seems that at least the Windows port of pg_dlopen would need to be
updated to support this.

> Alternatively we
> can just hardcode a bunch of bgw_function_name values that are matched
> to specific functions if bgw_library_name is NULL - I suspect that'd be
> the easiest / least worrisome portability-wise.

Basically a variant of fmgrtab, which addresses the same sort of problem.
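
A sketch of that variant (hypothetical names; the idea is a fixed
lookup table consulted when bgw_library_name is NULL):

    typedef struct InternalWorkerEntry
    {
        const char *fn_name;
        bgworker_main_type fn_addr;
    } InternalWorkerEntry;

    static const InternalWorkerEntry internal_workers[] = {
        {"ApplyLauncherMain", ApplyLauncherMain},
        {"ApplyWorkerMain", ApplyWorkerMain},
    };

    static bgworker_main_type
    LookupInternalWorkerMain(const char *fn_name)
    {
        int         i;

        for (i = 0; i < lengthof(internal_workers); i++)
        {
            if (strcmp(internal_workers[i].fn_name, fn_name) == 0)
                return internal_workers[i].fn_addr;
        }
        return NULL;    /* caller reports an unknown internal function */
    }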


-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] possible encoding issues with libxml2 functions

2017-03-16 Thread Noah Misch
On Sun, Mar 12, 2017 at 10:26:33PM +0100, Pavel Stehule wrote:
> 2017-03-12 21:57 GMT+01:00 Noah Misch :
> > On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > > 2017-03-12 0:56 GMT+01:00 Noah Misch :
> > Please add a test case.
> 
> It needs a application - currently there is not possibility to import XML
> document via recv API :(

I think xml_in() can create every value that xml_recv() can create; xml_recv()
is just more convenient given diverse source encodings.  If you make your
application store the value into a table, does "pg_dump --inserts" emit code
that reproduces the same value?  If so, you can use that in your test case.
If not, please provide precise instructions (code, SQL commands) for
reproducing the bug manually.

> > Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
> > The
> > answer is probably in the archives, because someone understood the problem
> > enough to document "Some XML-related functions may not work at all on
> > non-ASCII data when the server encoding is not UTF-8. This is known to be
> > an
> > issue for xpath() in particular."
> 
> 
> Probably there are two possible issues

Would you research in the archives to confirm?

> 1. What I touched: the recv function converts to the database
> encoding, but the document's encoding declaration is not updated.

Using xml_parse() would fix that, right?

> 2. There is no way to convert from the document encoding to the
> database encoding.

Both xml_in() and xml_recv() require the value to be representable in the
database encoding, so I don't think this particular problem can remain by the
time we reach an xpath_internal() call.




[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-16 Thread Peter Eisentraut
On 3/16/17 11:56, Alvaro Herrera wrote:
> Michael Paquier wrote:
> 
>> What are you using as CFLAGS? As both typenames should be normally
>> set, what about initializing those fields with NULL and add an
>> assertion like the attached?
> 
> Actually, my compiler was right -- this was an ancient bug I introduced
> in 9.5 (commit a61fd533), and this new warning was my compiler being a
> bit smarter now for some reason.  The problem is we were trying to
> extract a String value from a TypeName node, which obviously doesn't
> work very well.
> 
> I pushed a real fix, not just a compiler-silencer, along with a few
> lines in object_address.sql to make sure it works properly.  Maybe we
> need a few more tests cases for other parts of pg_get_object_address.
> 
> Pushed fix, backpatched to 9.5.

I am now seeing the warnings that Michael was reporting *after* your
commit in 9.5 and 9.6.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics (v25)

2017-03-16 Thread David Rowley
On 17 March 2017 at 11:20, Alvaro Herrera  wrote:
> (I think I lost some regression test files.  I couldn't make up my mind
> about putting each statistic type's tests in a separate file, or all
> together in stats_ext.sql.)

+1 for stats_ext.sql. I wanted to add some tests for
pg_statisticsextdef(), but I didn't see a suitable location.
stats_ext.sql would have been a good spot.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-16 Thread Amit Kapila
On Thu, Mar 16, 2017 at 9:39 PM, Ashutosh Sharma  wrote:
>>>
>>
>> Don't you think, we should also clear it during the replay of
>> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
>> WAL record for XLOG_HASH_DELETE.
>>
>
> Yes, it should be cleared. I completely missed this part in a hurry.
> Thanks for informing. I have taken care of it in the attached v2
> patch.
>

+ /*
+ * Mark the page as not containing any LP_DEAD items. See comments
+ * in hashbucketcleanup() for details.
+ */
+ pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
+ pageopaque->hasho_flag &= ~LH_PAGE_HAS_DEAD_TUPLES;

Your comment here says to refer to hashbucketcleanup(), and in that
function the comment says "Clearing this flag is just a hint; replay
won't redo this."  The two seem contradictory.  You need to change the
comment in hashbucketcleanup().  As I said in my previous e-mail, I
think you need to record the clearing of this flag in the WAL record
XLOG_HASH_DELETE, as you are not doing this unconditionally, and then
during replay clear it only when the WAL record indicates the same.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-16 Thread Craig Ringer
On 16 March 2017 at 19:52, Stas Kelvich  wrote:

>
> I’m working right now on an issue with building snapshots for decoding
> prepared transactions. I hope I'll send an updated patch later today.


Great.

What approach are you taking?

It looks like the snapshot builder actually does most of the work we
need for this already, maintaining a stack of snapshots we can use. It
might be as simple as invalidating the relcache/syscache when we exit
(and enter?) decoding of a prepared 2pc xact, since it violates the
usual assumption of logical decoding that we decode things strictly in
commit-time order.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-16 Thread Craig Ringer
On 17 March 2017 at 08:10, Stas Kelvich  wrote:

> While working on this I’ve spotted quite a nasty corner case with
> aborted prepared transactions. I have some not-that-great ideas how to
> fix it, but maybe I blurred my view and missed something, so I want to
> ask here first.
>
> Suppose we create a table, then alter it in a 2PC transaction, and
> after that abort the transaction. pg_class will have something like
> this:
>
> xmin | xmax | relname
> 100  | 200  | mytable
> 200  | 0    | mytable
>
> After the abort, the tuple (100,200,mytable) becomes visible again,
> and if we alter the table again, the xmax of the first tuple will be
> set to the current xid, resulting in the following table:
>
> xmin | xmax | relname
> 100  | 300  | mytable
> 200  | 0    | mytable
> 300  | 0    | mytable
>
> In that moment we’ve lost the information that the first tuple was
> deleted by our prepared tx.

Right. And while the prepared xact has aborted, we don't control when
it aborts and when those overwrites can start happening. We can and
should check if a 2pc xact is aborted before we start decoding it so
we can skip decoding it if it's already aborted, but it could be
aborted *while* we're decoding it, then have data needed for its
snapshot clobbered.

This hasn't mattered in the past because prepared xacts (and
especially aborted 2pc xacts) have never needed snapshots, we've never
needed to do something from the perspective of a prepared xact.

I think we'll probably need to lock the 2PC xact so it cannot be
aborted or committed while we're decoding it, until we finish decoding
it. So we lock it, then check if it's already aborted/already
committed/in progress. If it's aborted, treat it like any normal
aborted xact. If it's committed, treat it like any normal committed
xact. If it's in progress, keep the lock and decode it.
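
In pseudo-C, that dance would be (a sketch; the lock call and the
decode helpers are made-up placeholders, not existing APIs):

    LockPreparedXactForDecoding(xid);   /* block COMMIT/ROLLBACK PREPARED */

    if (TransactionIdDidCommit(xid))
        decode_as_committed(xid);       /* like any committed xact */
    else if (TransactionIdDidAbort(xid))
        skip_decoding(xid);             /* like any aborted xact */
    else
        decode_in_progress_2pc(xid);    /* hold the lock until done */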

People using logical decoding for 2PC will presumably want to control
2PC via logical decoding, so they're not so likely to mind such a
lock.

> * Try at first to scan the catalog filtering out tuples with xmax
> bigger than snapshot->xmax, as they were possibly deleted by our tx.
> Then, if nothing is found, scan in the usual way.

I don't think that'll be at all viable with the syscache/relcache
machinery. Way too intrusive.

> * Do not decode such transaction at all.

Yes, that's what I'd like to do, per above.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Potential data loss of 2PC files

2017-03-16 Thread Ashutosh Bapat
On Thu, Mar 16, 2017 at 10:17 PM, David Steele  wrote:
> On 2/13/17 12:10 AM, Michael Paquier wrote:
>> On Tue, Jan 31, 2017 at 11:07 AM, Michael Paquier
>>  wrote:
>>> On Mon, Jan 30, 2017 at 10:52 PM, Heikki Linnakangas  
>>> wrote:
 If that can happen, don't we have the same problem in many other places?
 Like, all the SLRUs? They don't fsync the directory either.
>>>
>>> Right, pg_commit_ts and pg_clog enter in this category.
>>
>> Implemented as attached.
>>
 Is unlink() guaranteed to be durable, without fsyncing the directory? If
 not, then we need to fsync() the directory even if there are no files in it
 at the moment, because some might've been removed earlier in the checkpoint
 cycle.
>>>
>>> Hm... I am not an expert in file systems. At least on ext4 I can see
>>> that unlink() is atomic, but not durable. So if an unlink() is
>>> followed by a power failure, the previously unlinked file could still
>>> be there if the parent directory is not fsync'd.
>>
>> So I have been doing more work on this patch, with the following things done:
>> - Flush pg_clog, pg_commit_ts and pg_twophase at checkpoint phase to
>> ensure their durability.
>> - Create a durable_unlink() routine to give a way to perform a durable
>> file removal.
>> I am now counting 111 calls to unlink() in the backend code, and
>> looking at all of them, most look fine with a plain unlink() even if
>> they are not made durable, as they work on temporary files (see
>> timeline.c for example), with some exceptions:
>> - In pg_stop_backup, the old backup_label and tablespace_map removal
>> should be durable to avoid putting the system in a wrong state after
>> power loss. Other calls of unlink() are followed by durable_rename, so
>> they are fine if left as such.
>> - Removal of old WAL segments should be durable as well. There is
>> already an effort to rename them durably in the case of a recycled
>> segment. In case of a power loss, a file that should have been
>> removed could remain in pg_xlog.
>>
>> Looking around, I have bumped as well on the following bug report for
>> SQlite which is in the same category of things:
>> http://sqlite.1065341.n5.nabble.com/Potential-bug-in-crash-recovery-code-unlink-and-friends-are-not-synchronous-td68885.html
>> Scary to see that in this case durability can be a problem at
>> transaction commit...
>
> This patch applies cleanly and compiles at cccbdde.
>
> Ashutosh, do you know when you'll have a chance to review?

The scope of this work has expanded since the last time I reviewed it
and marked it as RFC. Right now I am busy with partition-wise joins
and do not have sufficient time to take a look at the expanded scope.
However, I can come back to this after partition-wise join, but that
may stretch to the end of the commitfest. Sorry.
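
For reference, the durable_unlink() routine described upthread could
look roughly like this (a sketch assuming the existing fsync_fname()
and get_parent_directory() helpers; error handling abbreviated):

    int
    durable_unlink(const char *fname, int elevel)
    {
        char        parentpath[MAXPGPATH];

        if (unlink(fname) < 0)
        {
            ereport(elevel,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m", fname)));
            return -1;
        }

        /* make the removal itself persistent: fsync the parent directory */
        strlcpy(parentpath, fname, MAXPGPATH);
        get_parent_directory(parentpath);
        if (strlen(parentpath) == 0)
            strlcpy(parentpath, ".", MAXPGPATH);
        fsync_fname(parentpath, true);

        return 0;
    }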

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-03-16 Thread Michael Paquier
On Fri, Mar 17, 2017 at 11:17 AM, Robert Haas  wrote:
> I understand that the point of renaming pg_clog to pg_xact is that
> pg_clog contains the dreaded letters l-o-g, which we hypothesize
> causes DBAs to remove it.  (Alternate hypothesis: "So, that's what's
> clogging my database!")
>
> Renaming pg_subtrans to pg_subxact has no such redeeming properties.
>
> More, with each of these renamings, we're further separating what
> things are called in the code (xlog, clog, subtrans) from what they're
> called in the filesystem (wal, xact, subxact).
>
> So if we must rename pg_clog, OK, but can't we leave pg_subtrans
> alone?  It's not hurting anybody.

The only argument behind the renaming of pg_subtrans is really
consistency with pg_xact, because both deal with transactions. I don't
personally mind if this portion of the renaming is left out; as you
say, anything labelled with "log" is at the origin of this thread.
-- 
Michael




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 9:31 PM, Michael Paquier
 wrote:
> On Fri, Mar 17, 2017 at 12:19 AM, David Steele  wrote:
>> This patch does not apply cleanly at cccbdde:
>>
>> $ git apply ../other/0001-Rename-pg_clog-to-pg_xact.patch
>> error: doc/src/sgml/ref/pg_resetxlog.sgml: No such file or directory
>> error: patch failed: src/backend/postmaster/autovacuum.c:2468
>> error: src/backend/postmaster/autovacuum.c: patch does not apply
>
> This has rotten again...
>
>> Marked "Waiting on Author".
>>
>> I'd really like to see the rest of the renames happen for v10.  It seems
>> like the process got stalled after the pg_wal rename.
>
> Let's see what happens, attached are refreshed versions. Uncertainty
> is the fun part of a CF.

I understand that the point of renaming pg_clog to pg_xact is that
pg_clog contains the dreaded letters l-o-g, which we hypothesize
causes DBAs to remove it.  (Alternate hypothesis: "So, that's what's
clogging my database!")

Renaming pg_subtrans to pg_subxact has no such redeeming properties.

More, with each of these renamings, we're further separating what
things are called in the code (xlog, clog, subtrans) from what they're
called in the filesystem (wal, xact, subxact).

So if we must rename pg_clog, OK, but can't we leave pg_subtrans
alone?  It's not hurting anybody.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-16 Thread Michael Paquier
On Fri, Mar 17, 2017 at 3:38 AM, Michael Banck
 wrote:
>> Your patch would work with the stream mode though.
>
> Or, if not requesting the "WAL" option of the replication protocol's
> BASE_BACKUP command.
>
> I agree it doesn't make sense to start messing with fetch mode, but I
> don't think we guarantee any ordering of tablespaces (to wit, Bernd was
> pretty sure it was the other way around all the time), nor do I think
> having the main tablespace be first for non-WAL/stream and last for
> WAL/fetch would be confusing personally, though I understand this is
> debatable.

From the docs:
https://www.postgresql.org/docs/9.6/static/protocol-replication.html
"After the second regular result set, one or more CopyResponse results
will be sent, one for the main data directory and one for each
additional tablespace other than pg_default and pg_global."
So yes there is no guarantee about the order tablespaces are sent.

> So I've updated the patch to only switch the main tablespace to be first
> in case WAL isn't included, please find it attached.

-   /* Add a node for the base directory at the end */
+   /* Add a node for the base directory, either at the end or (if
+* WAL is not included) at the beginning (if WAL is not
+* included).  This means the backup_label file is the first
+* file to be sent in the latter case. */
ti = palloc0(sizeof(tablespaceinfo));
ti->size = opt->progress ? sendDir(".", 1, true, tablespaces,
true) : -1;
-   tablespaces = lappend(tablespaces, ti);
+   if (opt->includewal)
+   tablespaces = lappend(tablespaces, ti);
+   else
+   tablespaces = lcons(ti, tablespaces);
The comment block format is incorrect. I would think as well that this
comment should say that it is important to have the main tablespace
listed last when it includes the WAL segments, as those need to contain
all the latest WAL segments for a consistent backup.
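
Something along these lines, say (a suggested rewrite, not committed
text):

    /*
     * Add a node for the base directory.  If WAL is included, send it
     * last, so that it carries all the required WAL segments; otherwise
     * send it first, so that backup_label is the first file the client
     * receives.
     */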

FWIW, I have no issue with changing the ordering of backups the way
you are proposing here as long as the comment of this code path is
clear.
-- 
Michael




Re: [HACKERS] ANALYZE command progress checker

2017-03-16 Thread Robert Haas
On Fri, Mar 10, 2017 at 2:46 AM, vinayak
 wrote:
> Thank you for reviewing the patch.
>
> The attached patch incorporated Michael and Amit comments also.

I reviewed this tonight.

+/* Report compute index stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+
PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS);

Hmm, is there really any point to this?  And is it even accurate?  It
doesn't look to me like we are computing any index statistics here;
we're only allocating a few in-memory data structures, I think, which
is not a statistics computation and probably doesn't take any
significant time.

+/* Report compute heap stats phase */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS);

The phase you label as computing heap statistics also includes the
computation of index statistics.  I wouldn't bother separating the
computation of heap and index statistics; I'd just have a "compute
statistics" phase that begins right after the comment that starts
"Compute the statistics."

+/* Report that we are now doing index cleanup */
+pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP);

OK, this doesn't make any sense either.  We are not cleaning up the
indexes here.  We are just closing them and releasing resources.  I
don't see why you need this phase at all; it can't last more than some
tiny fraction of a second.  It seems like you're trying to copy the
exact same phases that exist for vacuum instead of thinking about what
makes sense for ANALYZE.

+/* Report number of heap blocks scanned so far */
+pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED,
targblock);

While I don't think this is necessarily a bad thing to report, I find
it very surprising that you're not reporting what seems to me like the
most useful thing here - namely, the number of blocks that will be
sampled in total and the number of those that you've selected.
Instead, you're just reporting the length of the relation and the
last-sampled block, which is a less-accurate guide to total progress.
There are enough counters to report both things, so maybe we should.

+/* Report total number of sample rows */
+pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, numrows);

As an alternative to the previous paragraph, yet another thing you
could report is number of rows sampled out of total rows to be
sampled.  But this isn't the way to do it: you are reporting the
number of rows you sampled only once you've finished sampling.  The
point of progress reporting is to report progress -- that is, to
report this value as it's updated, not just to report it when ANALYZE
is practically done and the value has reached its maximum.
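For concreteness, a minimal sketch of the incremental reporting I mean,
assuming it lives in acquire_sample_rows() (TOTAL_SAMPLE_ROWS is from the
patch; the SAMPLE_ROWS_SCANNED slot is an invented name for illustration):

    /* before sampling begins: report how many rows we intend to collect */
    pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, targrows);

    /* ... and every time a row is added to the sample ... */
    numrows++;
    pgstat_progress_update_param(PROGRESS_ANALYZE_SAMPLE_ROWS_SCANNED, numrows);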

The documentation for this patch isn't very good, either.  You forgot
to update the part of the documentation that says that VACUUM is the
only command that currently supports progress reporting, and the
descriptions of the phases and progress counters are brief and not
entirely accurate - e.g. heap_blks_scanned is not actually the number
of heap blocks scanned, because we are sampling, not reading all the
blocks.

When adding calls to the progress reporting functions, please consider
whether a blank line should be added before or after the new code, or
both, or neither.  I'd say some blank lines are needed in a few places
where you didn't add them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2017-03-16 Thread Michael Paquier
On Fri, Mar 17, 2017 at 12:19 AM, David Steele  wrote:
> This patch does not apply cleanly at cccbdde:
>
> $ git apply ../other/0001-Rename-pg_clog-to-pg_xact.patch
> error: doc/src/sgml/ref/pg_resetxlog.sgml: No such file or directory
> error: patch failed: src/backend/postmaster/autovacuum.c:2468
> error: src/backend/postmaster/autovacuum.c: patch does not apply

This has rotten again...

> Marked "Waiting on Author".
>
> I'd really like to see the rest of the renames happen for v10.  It seems
> like the process got stalled after the pg_wal rename.

Let's see what happens, attached are refreshed versions. Uncertainty
is the fun part of a CF.
-- 
Michael


0001-Rename-pg_clog-to-pg_xact.patch
Description: Binary data


0002-Rename-pg_subtrans-to-pg_subxact.patch
Description: Binary data



Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-16 Thread Corey Huinker
Attached is the latest work. Not everything is done yet. I post it because
the next step is likely to be "tedious" as Tom put it, and if there's a way
around it, I want to find it first.

What is done:
- all changes here built off the v22 patch
- any function which had scan_state and cond_stack passed in now only has
scan_state, and cond_stack is extracted from the cb_passthrough pointer.
- ConditionalStack is now only explicitly passed to get_prompt ... which
doesn't have scan state
- Conditional commands no longer reset scan state, nor do they clear the
query buffer
- boolean expressions consume all options, but only evaluate variables and
backticks in situations where those would be active
- invalid boolean arguments are treated as false
- contextually wrong \else, \endif, \elif are still errors

What is not done:
- TAP tests are not converted to regular regression test(s)
- skipped slash commands still consume the rest of the line

That last part is big, to quote Tom:

* More generally, I do not think that the approach of having exec_command
simply fall out immediately when in a false branch is going to work,
because it ignores the fact that different backslash commands have
different argument parsing rules.  Some will eat the rest of the line and
some won't.  I'm afraid that it might be necessary to remove that code
block and add a test to every single backslash command that decides
whether to actually perform its action after it's consumed its arguments.
That would be tedious :-(.  But as it stands, backslash commands will get
parsed differently (ie with potentially-different ending points) depending
on whether they're in a live branch or not, and that seems just way too
error-prone to be allowed to stand.


If that's what needs to be done, does it make sense to first commit a
pre-patch that encapsulates each command family (\c and \connect are a
family, all \d* commands are one family) into its own static function? It
would make the follow-up if-endif patch cleaner and easier to review.


On Thu, Mar 16, 2017 at 5:14 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> > Ok, I've got some time now and I'm starting to dig into this. I'd like to
> > restate what I *think* my feedback is, in case I missed or misunderstood
> > something.
> > ...
> > 3. Change command scans to scan the whole boolean expression, not just
> > OT_NORMAL.
> > There's a couple ways to go about this. My gut reaction is to create a
> new
> > scan type OT_BOOL_EXPR, which for the time being is the same as
> > OT_WHOLE_LINE, but could one day be something different.
>
> OT_WHOLE_LINE is not what you want because that results in verbatim
> copying, without variable expansion or anything.  My vote would be to
> repeatedly do OT_NORMAL until you get a NULL, thereby consuming as
> many regular arguments as the backslash command has.  (After which,
> if it wasn't exactly one argument, complain, for the moment.  But this
> leaves the door open for something like "\if :foo = :bar".)  Note that
> this implies that "\if some-expression \someothercommand" will be allowed,
> but I think that's fine, as I see no reason to allow backslashes in
> whatever if-expression syntax we invent later.  OT_WHOLE_LINE is a bit of
> a bastard child and I'd just as soon not define it as being the lexing
> behavior of any new commands.
>
> > 5. Allow contextually-correct invalid boolean expressions to map to
> false.
>
> > Out-of-context \endif, \else, and \elif commands remain as errors to be
> > ignored, invalid expressions in an \if or legallyl-placed \elif are just
> > treated as false.
>
> WFM.
>
> regards, tom lane
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 2a9c412..7743fb0 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2064,6 +2064,102 @@ hello 10
 
 
   
+\if expression
+\elif expression
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks.
+A conditional block must begin with an \if and end
+with an \endif.  In between there may be any number
+of \elif clauses, which may optionally be followed
+by a single \else clause.  Ordinary queries and
+other types of backslash commands may (and usually do) appear between
+the commands forming a conditional block.
+
+
+The \if and \elif commands read
+the rest of the line and evaluate it as a boolean expression.  If the
+expression is true then processing continues
+normally; otherwise, lines are skipped until a
+matching \elif, \else,
+or \endif is reached.  Once
+an \if or \elif has succeeded,
+later matching \elif commands are not evaluated but
+are treated as false.  Lines following an \else are
+processed only if no earlier matching \if
+or \elif succeeded.
+
+   

Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-16 Thread Michael Paquier
On Fri, Mar 17, 2017 at 2:30 AM, Jeff Janes  wrote:
> On Thu, Mar 9, 2017 at 4:59 AM, Michael Paquier 
> wrote:
>>
>> On Thu, Mar 9, 2017 at 1:17 AM, Joe Conway  wrote:
>> > On 03/07/2017 08:29 PM, Tom Lane wrote:
>> >> Michael Paquier  writes:
>> >>> here is a separate thread dedicated to the following extension for
>> >>> CREATE/ALTER ROLE: PASSWORD ('value' USING 'method').
>> >>
>> >> The parentheses seem weird ... do we really need those?
>> >
>> > +1
>>
>> Seeing 3 opinions in favor of that, let's do so then. I have updated
>> the patch to not use parenthesis.
>
> The regression tests only exercise the CREATE ROLE...USING version, not the
> ALTER ROLE...USING version.

Done.

> +and plain for a non-hashed password.  If the password
> +string is already MD5-hashed or SCRAM-hashed, then it is
> +stored hashed as-is.
>
> In the last line, I think "stored as-is" sounds better.

Okay.

> Other than that, it looks good to me.

Thanks for the review. Attached is an updated patch.
-- 
Michael


0001-Add-clause-PASSWORD-val-USING-protocol-to-CREATE-ALT.patch
Description: Binary data



Re: [HACKERS] increasing the default WAL segment size

2017-03-16 Thread Robert Haas
On Tue, Mar 14, 2017 at 1:44 AM, Beena Emerson  wrote:
> Attached is the updated patch. It fixes the issues and also updates a few
> code comments.

I did an initial readthrough of this patch tonight just to get a
feeling for what's going on.  Based on that, here are a few review
comments:

The changes to pg_standby seem to completely break the logic to wait
until the file has attained the correct size.  I don't know how to
salvage that logic off-hand, but just breaking it isn't acceptable.

+ Note that changing this value requires an initdb.

Instead, maybe say something like "Note that this value is fixed for
the lifetime of the database cluster."

-int         max_wal_size = 64;              /* 1 GB */
-int         min_wal_size = 5;               /* 80 MB */
+int         wal_segment_size = 2048;        /* 16 MB */
+int         max_wal_size = 1024 * 1024;     /* 1 GB */
+int         min_wal_size = 80 * 1024;       /* 80 MB */

If wal_segment_size is now measured in multiple of XLOG_BLCKSZ, then
it's not the case that 2048 is always 16MB.  If the other values are
now measured in kB, perhaps rename the variables to add _kb, to avoid
confusion with the way it used to work (and in general).  The problem
with leaving this as-is is that any existing references to
max_wal_size in core or extension code will silently break; you want
it to break in a noticeable way so that it gets fixed.
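That is, something along these lines (the _kb names are illustrative):

    int         wal_segment_size = 2048;       /* in XLOG_BLCKSZ units */
    int         max_wal_size_kb = 1024 * 1024; /* 1 GB */
    int         min_wal_size_kb = 80 * 1024;   /* 80 MB */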

+ * UsableBytesInSegment: It is set in assign_wal_segment_size and stores the
+ * number of bytes in a WAL segment usable for WAL data.

The comment doesn't need to say where it gets set, and it doesn't need
to repeat the variable name.  Just say "The number of bytes in a..."

+assign_wal_segment_size(int newval, void *extra)

Why does a PGC_INTERNAL GUC need an assign hook?  I think the GUC
should only be there to expose the value; it shouldn't have
calculation logic associated with it.

 /*
+ * initdb passes the WAL segment size in an environment variable. We don't
+ * bother doing any sanity checking, we already check in initdb that the
+ * user gives a sane value.
+ */
+XLogSegSize = pg_atoi(getenv("XLOG_SEG_SIZE"), sizeof(uint32), 0);

I think we should bother.  I don't like the idea of the postmaster
crashing in flames without so much as a reasonable error message if
this parameter-passing mechanism goes wrong.
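Something like the following defensive sketch would already be friendlier
(the error wording and the validity test are illustrative, not committed
code; the test mirrors what initdb itself enforces):

    char       *segsize_str = getenv("XLOG_SEG_SIZE");

    if (segsize_str == NULL)
        elog(FATAL, "XLOG_SEG_SIZE environment variable was not inherited from initdb");
    XLogSegSize = pg_atoi(segsize_str, sizeof(uint32), 0);
    /* must be a power of 2 in a sane range, per initdb's own check */
    if (XLogSegSize < XLOG_BLCKSZ ||
        (XLogSegSize & (XLogSegSize - 1)) != 0)
        elog(FATAL, "invalid WAL segment size %u", XLogSegSize);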

+{"wal-segsize", required_argument, NULL, 'Z'},

When adding an option with no documented short form, generally one
picks a number that isn't a character for the value at the end.  See
pg_regress.c or initdb.c for examples.

+   wal_segment_size = atoi(str_wal_segment_size);

So, you're comfortable interpreting --wal-segsize=1TB or
--wal-segsize=1GB as 1?  Implicitly, 1MB?
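A stricter, hedged sketch of parsing that rejects trailing garbage instead
(logic illustrative, not the patch's code):

    char       *endptr;
    long        segsize = strtol(str_wal_segment_size, &endptr, 10);

    if (endptr == str_wal_segment_size || *endptr != '\0' || segsize <= 0)
    {
        fprintf(stderr, _("%s: invalid --wal-segsize value \"%s\"\n"),
                progname, str_wal_segment_size);
        exit(1);
    }
    wal_segment_size = (int) segsize;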

+ * ControlFile is not accessible here so use SHOW wal_segment_size command
+ * to set the XLogSegSize

Breaks compatibility with pre-9.6 servers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-16 Thread Tsunakawa, Takayuki
From: David Steele [mailto:da...@pgmasters.net]
> Any idea when you'll have a chance to review?

I'll do it by early next week.

Regards
Takayuki Tsunakawa




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-16 Thread Stas Kelvich

>> On 2 Mar 2017, at 11:00, Craig Ringer  wrote:
>> 
>> BTW, I've been reviewing the patch in more detail. Other than a bunch
>> of copy-and-paste that I'm cleaning up, the main issue I've found is
>> that in DecodePrepare, you call:
>> 
>>   SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>>  parsed->nsubxacts, parsed->subxacts);
>> 
>> but I am not convinced it is correct to call it at PREPARE TRANSACTION
>> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
>> xact's state when decoding it, but there might be later commits that
>> cannot yet see that state and shouldn't have it visible in their
>> snapshots. 
> 
> Agree, that is a problem. That allows decoding this PREPARE, but after that
> it is better to mark this transaction as running in the snapshot, or to
> perform prepare decoding with some kind of copied-and-edited snapshot. I'll
> have a look at this.
> 

While working on this I've spotted quite a nasty corner case with aborted
prepared transactions. I have some not-so-great ideas about how to fix it,
but maybe I've blurred my view and missed something, so I want to ask here
first.

Suppose we create a table, then alter it in a 2PC transaction, and after
that abort the transaction. pg_class will then have something like this:

xmin | xmax | relname
 100 |  200 | mytable
 200 |    0 | mytable

After the abort, the tuple (100,200,mytable) becomes visible again, and if
we alter the table once more, the xmax of the first tuple will be set to
the current xid, resulting in the following table:

xmin | xmax | relname
 100 |  300 | mytable
 200 |    0 | mytable
 300 |    0 | mytable

At that moment we've lost the information that the first tuple was deleted
by our prepared tx. From the POV of the historic snapshot that will be
constructed to decode the prepare, the first tuple is visible, when actually
the second tuple should be used. Moreover, such a snapshot could see both
tuples, violating oid uniqueness, but the heap scan stops after finding the
first one.

I see here two possible workarounds:

* First try to scan the catalog filtering out tuples whose xmax is bigger
than snapshot->xmax, as they were possibly deleted by our tx. Then, if
nothing is found, scan in the usual way.

* Do not decode such transactions at all. If by the time we decode the
prepare record we already know that the transaction is aborted, then such
decoding doesn't make a lot of sense. IMO the intended usage of logical 2PC
decoding is to decide about commit/abort based on answers from logical
subscribers/replicas. So there will be a barrier between prepare and
commit/abort, and such situations shouldn't happen.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] \h tab-completion

2017-03-16 Thread Andreas Karlsson

On 03/17/2017 12:01 AM, Peter Eisentraut wrote:

Committed with some tweaking.


Thanks!

Andreas





Re: [HACKERS] \h tab-completion

2017-03-16 Thread Peter Eisentraut
On 3/15/17 22:46, Andreas Karlsson wrote:
> On 03/01/2017 02:47 PM, Peter Eisentraut wrote:
>> Instead of creating another copy of list_ALTER, let's use the
>> words_after_create list and write a version of
>> create_command_generator/drop_command_generator.
> 
> Good idea. Here is a patch with that.

Committed with some tweaking.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] [PATCH] Remove defunct and unnecessary link

2017-03-16 Thread David Christensen
The HA docs reference a “glossary” link which is no longer accessible, nor is 
it likely to be useful in general to link off-site IMHO.  This simple patch 
removes this link.

Best,

David
--
David Christensen
End Point Corporation
da...@endpoint.com
785-727-1171




0001-Remove-defunct-and-unnecessary-doc-link.patch
Description: Binary data



Re: [HACKERS] BUG #13755: pgwin32_is_service not checking if SECURITY_SERVICE_SID is disabled

2017-03-16 Thread MauMau
From: Heikki Linnakangas
So, I think we still need the check for Local System.


Thanks, fixed and confirmed that the error message is output in the
event log.

Regards
MauMau



win32-security-service-v7.patch
Description: Binary data



Re: [HACKERS] Size vs size_t

2017-03-16 Thread Tom Lane
Thomas Munro  writes:
> Naive replacement in new files (present in master but not in 9.6) with
> the attached script, followed by a couple of manual corrections where
> Size was really an English word in a comment, gets the attached diff.

In the case of mmgr/slab.c, a lot of those uses of Size probably
correspond to instantiations of the MemoryContext APIs; so blindly
changing them to "size_t" seems like a bit of a type violation
(and might indeed draw warnings from pickier compilers).  Don't
know if any of the other things you've identified here have similar
entanglements.

regards, tom lane




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Thomas Munro
On Fri, Mar 17, 2017 at 10:39 AM, Andres Freund  wrote:
> On 2017-03-16 17:24:17 -0400, Tom Lane wrote:
>> Robert Haas  writes:
>> > On Thu, Mar 16, 2017 at 5:01 PM, Andres Freund  wrote:
>> >> On 2017-03-16 16:59:29 -0400, Robert Haas wrote:
>> > Well, I don't think we want to end up with a mix of Size and size_t in
>> > related code.  That buys nobody anything.  I'm fine with replacing
>> > Size with size_t if they are always equivalent, but there's no sense
>> > in having a jumble of styles.
>>
>> I'm not in a hurry to do "s/Size/size_t/g" because I'm afraid it'll create
>> a lot of merge pain for back-patching, while not actually buying anything
>> much concretely.  I think this falls under the same policy we use for many
>> other stylistic details, ie make new code look like the code right around
>> it.  But I'm fine with entirely-new files standardizing on size_t.
>
> That seems like sane policy.  I'm a bit doubtful that the pain would be
> all that bad, but I'm also not wild about trying.

Naive replacement in new files (present in master but not in 9.6) with
the attached script, followed by a couple of manual corrections where
Size was really an English word in a comment, gets the attached diff.

 src/backend/access/hash/hash_xlog.c        |  26 ++--
 src/backend/replication/logical/launcher.c |   4 +-
 src/backend/utils/misc/backend_random.c    |   4 +-
 src/backend/utils/mmgr/dsa.c               |  94 ++---
 src/backend/utils/mmgr/freepage.c          | 202 ++--
 src/backend/utils/mmgr/slab.c              |  34 ++---
 src/include/lib/simplehash.h               |   6 +-
 src/include/replication/logicallauncher.h  |   2 +-
 src/include/utils/backend_random.h         |   2 +-
 src/include/utils/dsa.h                    |  10 +-
 src/include/utils/freepage.h               |  24 ++--
 src/include/utils/relptr.h                 |   4 +-
 12 files changed, 206 insertions(+), 206 deletions(-)

That might be just about enough for size_t to catch up...

-- 
Thomas Munro
http://www.enterprisedb.com


Size-to-size_t.sh
Description: Bourne shell script


Size-to-size_t.patch
Description: Binary data



Re: [HACKERS] Size vs size_t

2017-03-16 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-16 17:24:17 -0400, Tom Lane wrote:
>> The short answer to that is that "Size" predates the universal acceptance
>> of size_t.  If we were making these decisions today, or anytime since the
>> early 2000s, we'd surely have just gone with size_t.  But it wasn't a
>> realistic option in the 90s.

> Just out of curiosity I checked when we switched to backing Size with
> size_t:
> 1998 - 0ad5d2a3a886e72b429ea2b84bfcb36c0680f84d

Yeah.  We inherited the previous definition (as "unsigned int") from
Berkeley.  I wasn't involved then, of course, but I follow their reasoning
perfectly because I remember fighting the same type of portability battles
with libjpeg in the early 90s.  "size_t" was invented by the ANSI C
committee (hence, 1989 or 1990) and had only very haphazard penetration
until the late 90s.  If you wanted to write portable code you couldn't
depend on it.

regards, tom lane




Re: [HACKERS] Making clausesel.c Smarter

2017-03-16 Thread Tom Lane
David Steele  writes:
> Anyone familiar with the planner available to review this patch?

FWIW, it's on my radar but I don't expect to get to it real soon,
as there's other stuff I deem higher priority.  In the meantime,
I don't want to stand in the way of someone else looking at it.

regards, tom lane




Re: [HACKERS] Making clausesel.c Smarter

2017-03-16 Thread David Steele
On 2/26/17 1:41 AM, Robert Haas wrote:
> On Fri, Feb 24, 2017 at 3:30 PM, David Rowley
>  wrote:
>> It would be good to improve the situation here in the back branches
>> too, but I'm thinking that the attached might be a little invasive for
>> that?
> 
> My experience has been that customers - at least EnterpriseDB
> customers - do not appreciate plan changes when they upgrade to a new
> minor release.  Most people have production installations that are
> basically working; if not, they wouldn't be in production.  Even if
> they're getting a good plan only by some happy accident, they're still
> getting it, and a change can cause a good plan to flop over into a bad
> plan, which can easily turn into a service outage.  The people who had
> a bad plan and get flipped to a good plan will be happy once they
> realize what has changed, of course, but that's not really enough to
> make up for the panicked calls from customers whose stuff falls over
> when they try to apply the critical security update.
> 
> I think the basic thing you are trying to accomplish is sensible,
> though.  I haven't reviewed the patch.

This patch applies cleanly and compiles at cccbdde.  I agree with Robert
that back patching would likely not be a good idea.

Anyone familiar with the planner available to review this patch?

It also seems like this would be a (relatively) simple patch to start
with for anyone that is interested in learning more about the planner.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Andres Freund
On 2017-03-16 17:24:17 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Mar 16, 2017 at 5:01 PM, Andres Freund  wrote:
> >> On 2017-03-16 16:59:29 -0400, Robert Haas wrote:
> >>> I guess I assumed that we wouldn't have defined PG-specific types if
> >>> we wanted to just use the OS-supplied ones.
> 
> >> I think, in this case, defining Size in the first place was a bad call
> >> on behalf of the project.
> 
> The short answer to that is that "Size" predates the universal acceptance
> of size_t.  If we were making these decisions today, or anytime since the
> early 2000s, we'd surely have just gone with size_t.  But it wasn't a
> realistic option in the 90s.

Just out of curiosity I checked when we switched to backing Size with
size_t:
1998 - 0ad5d2a3a886e72b429ea2b84bfcb36c0680f84d


> > Well, I don't think we want to end up with a mix of Size and size_t in
> > related code.  That buys nobody anything.  I'm fine with replacing
> > Size with size_t if they are always equivalent, but there's no sense
> > in having a jumble of styles.
> 
> I'm not in a hurry to do "s/Size/size_t/g" because I'm afraid it'll create
> a lot of merge pain for back-patching, while not actually buying anything
> much concretely.  I think this falls under the same policy we use for many
> other stylistic details, ie make new code look like the code right around
> it.  But I'm fine with entirely-new files standardizing on size_t.

That seems like sane policy.  I'm a bit doubtful that the pain would be
all that bad, but I'm also not wild about trying.

Greetings,

Andres Freund




Re: [HACKERS] PL/Python: Add cursor and execute methods to plan object

2017-03-16 Thread David Steele
On 2/25/17 1:27 PM, Peter Eisentraut wrote:
> Something that has been bothering me in PL/Python for a long time is the
> non-object-oriented way in which plans are prepared and executed:
> 
> plan = plpy.prepare(...)
> res = plpy.execute(plan, ...)
> 
> where plpy.execute() takes either a plan or a query string.
> 
> I think a better style would be
> 
> plan = plpy.prepare(...)
> res = plan.execute(...)
> 
> so that the "plan" is more like a statement handle that one finds in
> other APIs.
> 
> This ended up being very easy to implement, so I'm proposing to allow
> this new syntax as an alternative.
> 
> I came across this again as I was developing the background sessions API
> for PL/Python.  So I'm also wondering here which style people prefer so
> I can implement it there.

This patch applies cleanly at cccbdde.

Any Python folks out there who would like to take a crack at reviewing this?

-- 
-David
da...@pgmasters.net




[HACKERS] Re: [BUGS] Problem in using pgbench's --connect(-C) and --rate=rate(-R rate) options together.

2017-03-16 Thread David Steele
On 1/25/17 2:58 PM, Fabien COELHO wrote:
> 
> Repost from bugs.

This patch does not apply at cccbdde:

$ patch -p1 < ../other/pgbench-CR-bug-1.patch
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/pgbench.c
Hunk #1 FAILED at 1967.
Hunk #2 succeeded at 4180 with fuzz 2 (offset -164 lines).

Marked as "Waiting for Author".

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 16, 2017 at 5:01 PM, Andres Freund  wrote:
>> On 2017-03-16 16:59:29 -0400, Robert Haas wrote:
>>> I guess I assumed that we wouldn't have defined PG-specific types if
>>> we wanted to just use the OS-supplied ones.

>> I think, in this case, defining Size in the first place was a bad call
>> on behalf of the project.

The short answer to that is that "Size" predates the universal acceptance
of size_t.  If we were making these decisions today, or anytime since the
early 2000s, we'd surely have just gone with size_t.  But it wasn't a
realistic option in the 90s.

> Well, I don't think we want to end up with a mix of Size and size_t in
> related code.  That buys nobody anything.  I'm fine with replacing
> Size with size_t if they are always equivalent, but there's no sense
> in having a jumble of styles.

I'm not in a hurry to do "s/Size/size_t/g" because I'm afraid it'll create
a lot of merge pain for back-patching, while not actually buying anything
much concretely.  I think this falls under the same policy we use for many
other stylistic details, ie make new code look like the code right around
it.  But I'm fine with entirely-new files standardizing on size_t.

regards, tom lane




Re: [HACKERS] asynchronous execution

2017-03-16 Thread Corey Huinker
On Thu, Mar 16, 2017 at 4:17 PM, Tom Lane  wrote:

> Corey Huinker  writes:
> > I reworked the test such that all of the foreign tables inherit from the
> > same parent table, and if you query that you do get async execution. But
> > it doesn't work when just stringing together those foreign tables with
> > UNION ALLs.
>
> > I don't know how to proceed with this review if that was a goal of the
> > patch.
>
> Whether it was a goal or not, I'd say there is something either broken
> or incorrectly implemented if you don't see that.  The planner (and
> therefore also the executor) generally treats inheritance the same as
> simple UNION ALL.  If that's not the case here, I'd want to know why.
>
> regards, tom lane
>

Updated commitfest entry to "Returned With Feedback".


Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-16 Thread Tom Lane
Corey Huinker  writes:
> Ok, I've got some time now and I'm starting to dig into this. I'd like to
> restate what I *think* my feedback is, in case I missed or misunderstood
> something.
> ...
> 3. Change command scans to scan the whole boolean expression, not just
> OT_NORMAL.
> There's a couple ways to go about this. My gut reaction is to create a new
> scan type OT_BOOL_EXPR, which for the time being is the same as
> OT_WHOLE_LINE, but could one day be something different.

OT_WHOLE_LINE is not what you want because that results in verbatim
copying, without variable expansion or anything.  My vote would be to
repeatedly do OT_NORMAL until you get a NULL, thereby consuming as
many regular arguments as the backslash command has.  (After which,
if it wasn't exactly one argument, complain, for the moment.  But this
leaves the door open for something like "\if :foo = :bar".)  Note that
this implies that "\if some-expression \someothercommand" will be allowed,
but I think that's fine, as I see no reason to allow backslashes in
whatever if-expression syntax we invent later.  OT_WHOLE_LINE is a bit of
a bastard child and I'd just as soon not define it as being the lexing
behavior of any new commands.
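A rough sketch of that loop, assuming psql_scan_slash_option() keeps its
current signature (variable names and error wording are illustrative):

    char       *opt;
    char       *first_opt = NULL;
    int         num_opts = 0;

    while ((opt = psql_scan_slash_option(scan_state,
                                         OT_NORMAL, NULL, false)) != NULL)
    {
        /* keep the first argument; it will become the boolean expression */
        if (num_opts++ == 0)
            first_opt = pg_strdup(opt);
        free(opt);
    }
    if (num_opts != 1)
        psql_error("\\if requires exactly one argument\n");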

> 5. Allow contextually-correct invalid boolean expressions to map to false.

> Out-of-context \endif, \else, and \elif commands remain as errors to be
> ignored, invalid expressions in an \if or legally-placed \elif are just
> treated as false.

WFM.

regards, tom lane




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 5:01 PM, Andres Freund  wrote:
> On 2017-03-16 16:59:29 -0400, Robert Haas wrote:
>> On Thu, Mar 16, 2017 at 4:40 PM, Thomas Munro
>>  wrote:
>> > Noticing that the assembled hackers don't seem to agree on $SUBJECT in
>> > new patches, I decided to plot counts of lines matching Size and
>> > size_t over time.  After a very long run in the lead, size_t has
>> > recently been left in the dust by Size.
>>
>> I guess I assumed that we wouldn't have defined PG-specific types if
>> we wanted to just use the OS-supplied ones.
>
> I think, in this case, defining Size in the first place was a bad call
> on behalf of the project.  It gains us absolutely nothing, but makes it
> harder to read for people that don't know PG all that well.  I think we
> should slowly phase out Size usage, at least in new code.

Well, I don't think we want to end up with a mix of Size and size_t in
related code.  That buys nobody anything.  I'm fine with replacing
Size with size_t if they are always equivalent, but there's no sense
in having a jumble of styles.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: new set of psql patches for loading (saving) data from (to) text, binary files

2017-03-16 Thread Stephen Frost
Pavel,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> 2017-03-15 17:21 GMT+01:00 Stephen Frost :
> > I started looking through this to see if it might be ready to commit and
> > I don't believe it is.  Below are my comments about the first patch, I
> > didn't get to the point of looking at the others yet since this one had
> > issues.
> >
> > * Pavel Stehule (pavel.steh...@gmail.com) wrote:
> > > 2017-01-09 17:24 GMT+01:00 Jason O'Donnell :
> > > > gstore/gbstore:
> >
> > I don't see the point to 'gstore'- how is that usefully different from
> > just using '\g'?  Also, the comments around these are inconsistent, some
> > say they can only be used with a filename, others say it could be a
> > filename or a pipe+command.
> 
> \gstore ensures a dump of raw row data. It can be replaced by \g with some
> other settings, but if the query result is not unique, then the result can
> be messy. Such a replacement is not possible with \gbstore.

I don't understand what you mean by "the result can be messy."  We have
lots of options for controlling the output of the query and those can be
used with \g just fine.  This seems like what you're doing is inventing
something entirely new which is exactly the same as setting the right
options which already exist and that seems odd to me.

Is it any different from setting \a and \t and then calling \g?  If not,
then I don't see why it would be useful to add.
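For comparison, the existing route is just something like this in a psql
session (output file name illustrative):

    \pset tuples_only on
    \pset format unaligned
    SELECT datname FROM pg_database \g /tmp/dbnames.txt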

> More interesting is \gbstore, which uses the binary API - it can be used
> for bytea fields, or for XML fields with an implicitly correct encoding
> change. \gbstore cannot be replaced by \g.

Yes, having a way to get binary data out using psql and into a file is
interesting and I agree that we should have that capability.

Further, what I think we will definitely need is a way to get binary
data out using psql at the command-line too.  We have the -A and -t
switches which correspond to \a and \t, we should have something for
this too.  Perhaps what that really calls for is a '\b' and a '-B'
option to go with it which will flip psql into binary mode, similar to
the other Formatting options.  I realize it might seem a bit
counter-intuitive, but I can actually see use-cases for having binary
data spit out to $PAGER (when you have a $PAGER that handles it
cleanly, as less does, for example).

> > There's a whitespace-only hunk that shouldn't be included.
> >
> > I don't agree with the single-column/single-row restriction on these.  I
> > can certainly see a case where someone might want to, say, dump out a
> > bunch of binary integers into a file for later processing.
> >
> > The tab-completion for 'gstore' wasn't correct (you didn't include the
> > double-backslash).  The patch also has conflicts against current master
> > now.
> >
> > I guess my thinking about moving this forward would be to simplify it to
> > just '\gb' which will pull the data from the server side in binary
> > format and dump it out to the filename or command given.  If there's a
> > new patch with those changes, I'll try to find time to look at it.
> 
> ok I'll prepare patch

Great, thanks!

Stephen




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Andres Freund
On 2017-03-16 16:59:29 -0400, Robert Haas wrote:
> On Thu, Mar 16, 2017 at 4:40 PM, Thomas Munro
>  wrote:
> > Noticing that the assembled hackers don't seem to agree on $SUBJECT in
> > new patches, I decided to plot counts of lines matching Size and
> > size_t over time.  After a very long run in the lead, size_t has
> > recently been left in the dust by Size.
> 
> I guess I assumed that we wouldn't have defined PG-specific types if
> we wanted to just use the OS-supplied ones.

I think, in this case, defining Size in the first place was a bad call
on behalf of the project.  It gains us absolutely nothing, but makes it
harder to read for people that don't know PG all that well.  I think we
should slowly phase out Size usage, at least in new code.

Greetings,

Andres Freund




Re: [HACKERS] Size vs size_t

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 4:40 PM, Thomas Munro
 wrote:
> Noticing that the assembled hackers don't seem to agree on $SUBJECT in
> new patches, I decided to plot counts of lines matching Size and
> size_t over time.  After a very long run in the lead, size_t has
> recently been left in the dust by Size.

I guess I assumed that we wouldn't have defined PG-specific types if
we wanted to just use the OS-supplied ones.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-16 Thread Corey Huinker
On Mon, Mar 13, 2017 at 5:21 PM, Tom Lane  wrote:

> "Daniel Verite"  writes:
> > Tom Lane wrote:
> >> when we see \if is that we do nothing but absorb text
> >> until we see the matching \endif.  At that point we could bitch and
> throw
> >> everything away if, say, there's \elif after \else, or anything else you
> >> want to regard as a "compile time error".  Otherwise we start execution,
> >> and from there on it probably has to behave as we've been discussing.
> >> But this'd be pretty unfriendly from an interactive standpoint, and I'm
> >> not really convinced that it makes for significantly better error
> >> reporting.
>
> > On the whole, isn't that a reasonable model to follow for psql?
>
> One thing that occurs to me after more thought is that with such a model,
> we could not have different lexing rules for live vs not-live branches,
> since we would not have made those decisions before scanning the input.
> This seems problematic.  Even if you discount the question of whether
> variable expansion is allowed to change command-boundary decisions, we'd
> still not want backtick execution to happen everywhere in the block, ISTM.
>
> Maybe we could fix things so that backtick execution happens later, but
> it would be a pretty significant and invasive change to backslash command
> execution, I'm afraid.
>
> regards, tom lane
>

Ok, I've got some time now and I'm starting to dig into this. I'd like to
restate what I *think* my feedback is, in case I missed or misunderstood
something.

1. Convert perl tests to a single regular regression test.

2. Have MainLoop() pass the cond_stack to the lexer via
psql_scan_set_passthrough(scan_state, (void *) cond_stack);

3. Change command scans to scan the whole boolean expression, not just
OT_NORMAL.

There's a couple ways to go about this. My gut reaction is to create a new
scan type OT_BOOL_EXPR, which for the time being is the same as
OT_WHOLE_LINE, but could one day be something different.

4. Change variable expansion and backtick execution in false branches to
match new policy.

I've inferred that current preference would be for no expansion and no
execution.

5. Allow contextually-correct invalid boolean expressions to map to false.

Out-of-context \endif, \else, and \elif commands remain as errors to be
ignored, invalid expressions in an \if or legally-placed \elif are just
treated as false.

Did I miss anything?


Re: [HACKERS] GSOC - TOAST'ing in slices

2017-03-16 Thread Stephen Frost
George,

* George Papadrosou (gpapadro...@gmail.com) wrote:
> Stephen, you mentioned PostGIS, but the conversation seems to lean towards 
> JSONB. What are your thoughts?

Both are important.  I brought up PostGIS specifically because it's an
external project which could benefit from this work and to explain that
PostgreSQL can be extended to beyond the built-in data types.  Focusing
on JSONB to start seems alright but keep in mind that we'll want to have
a way to let PostGIS do whatever it is we do for JSONB in core.

> Also, if I am to include some ideas/approaches in the proposal, it seems I 
> should really focus on understanding how a specific data type is used, 
> queried and indexed, which is a lot of exploring for a newcomer in postgres 
> code.

This is true, but, really, the sooner the better. :)  While it's not a
small amount to go through, it's also really pretty clean code, in
general, so hopefully you won't have too much trouble.  I would
recommend jumping on irc.freenode.net to the #postgresql channel where
you can find a number of PostgreSQL hacker who are generally quite happy
to answer specific questions you may have as you go through the code.

> In the meantime, I am trying to find out how jsonb is indexed and queried.
> After I grasp the current situation I will be able to think about new
> approaches.

I would suggest you make sure that you first understand how TOAST works
generally and review the on-disk storage format before jumping in to try
and understand jsonb indexing and queries.  Would be good to also
understand how the PGLZ compression works.

Thanks!

Stephen




[HACKERS] Size vs size_t

2017-03-16 Thread Thomas Munro
Hi,

Noticing that the assembled hackers don't seem to agree on $SUBJECT in
new patches, I decided to plot counts of lines matching Size and
size_t over time.  After a very long run in the lead, size_t has
recently been left in the dust by Size.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] asynchronous execution

2017-03-16 Thread Tom Lane
Corey Huinker  writes:
> I reworked the test such that all of the foreign tables inherit from the
> same parent table, and if you query that you do get async execution. But it
> doesn't work when just stringing together those foreign tables with UNION
> ALLs.

> I don't know how to proceed with this review if that was a goal of the
> patch.

Whether it was a goal or not, I'd say there is something either broken
or incorrectly implemented if you don't see that.  The planner (and
therefore also the executor) generally treats inheritance the same as
simple UNION ALL.  If that's not the case here, I'd want to know why.

regards, tom lane




Re: [HACKERS] Parallel Append implementation

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 6:27 AM, Amit Khandekar  wrote:
> Attached is an updated patch v7, which does the above.

Some comments:

- You've added a GUC (which is good) but not documented it (which is
bad) or added it to postgresql.conf.sample (also bad).

- You've used a loop inside a spinlock-protected critical section,
which is against project policy.  Use an LWLock; define and document a
new builtin tranche ID.

- The comment for pa_finished claims that it is the number of workers
executing the subplan, but it's a bool, not a count; I think this
comment is just out of date.

- paths_insert_sorted_by_cost() is a hand-coded insertion sort.  Can't
we find a way to use qsort() for this instead of hand-coding a slower
algorithm?  I think we could just create an array of the right length,
stick each path into it from add_paths_to_append_rel, and then qsort()
the array based on cost.  Then the result can be
turned into a list.

- Maybe the new helper functions in nodeAppend.c could get names
starting with exec_append_, to match the style of
exec_append_initialize_next().

- There's a superfluous whitespace change in add_paths_to_append_rel.

- The substantive changes in add_paths_to_append_rel don't look right
either.  It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case.  I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.

- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra.  The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial.  If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this.  (If all the cheapest subpaths
are non-partial, it's still potentially useful.)  In other words,
don't skip consideration of parallel append just because you have a
partial path available for every child rel; it could still turn out to be
the cheapest option.

- I think the way cost_append() works is not right.  What you've got
assumes that you can just multiply the cost of a partial plan by the
parallel divisor to recover the total cost, which is not true because
we don't divide all elements of the plan cost by the parallel divisor
-- only the ones that seem like they should be divided.  Also, it
could be smarter about what happens with the costs of non-partial
paths. I suggest the following algorithm instead.

1. Add up all the costs of the partial paths.  Those contribute
directly to the final cost of the Append.  This ignores the fact that
the Append may escalate the parallel degree, but I think we should
just ignore that problem for now, because we have no real way of
knowing what the impact of that is going to be.

2. Next, estimate the cost of the non-partial paths.  To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans.  Add
this to the result from step 1 to get the total cost.
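A hedged sketch of step 2, reading "that length" as the planned number of
workers (all variable names here are invented for illustration, not the
patch's code):

    /*
     * Greedily assign each non-partial subpath to the worker that is
     * currently least loaded; the busiest worker's total then bounds
     * the elapsed cost of the non-partial portion of the Append.
     */
    Cost       *worker_cost = (Cost *) palloc0(sizeof(Cost) * num_workers);
    Cost        nonpartial_cost = 0;
    ListCell   *lc;
    int         i;

    foreach(lc, nonpartial_subpaths)
    {
        Path       *subpath = (Path *) lfirst(lc);
        int         min_idx = 0;

        /* find the array element (worker) with the smallest cost so far */
        for (i = 1; i < num_workers; i++)
            if (worker_cost[i] < worker_cost[min_idx])
                min_idx = i;
        worker_cost[min_idx] += subpath->total_cost;
    }
    for (i = 0; i < num_workers; i++)
        nonpartial_cost = Max(nonpartial_cost, worker_cost[i]);

    /* the sum of the partial paths' costs (step 1) is added on top */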

- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result.  Integer arithmetic FTW!
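That is, roughly (sketch only; fls() returns the position of the highest
set bit, so it acts as a cheap log2, and src/port supplies it on platforms
that lack it):

    num_workers = Min(fls(list_length(partial_subpaths) +
                          list_length(nonpartial_subpaths)),
                      max_parallel_workers_per_gather);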

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] temp_buffers vs temp vs local and explain

2017-03-16 Thread Joshua D. Drake

-hackers,

I was reviewing an explain plan today and with some help from Andrew G, 
I got a lot more information than I deserved. It did however bring up 
quite a usability issue that I think we should consider.


Let's review the following two lines:

Sort Method: external merge  Disk: 19352kB
   Buffers: shared hit=257714, temp read=8822 written=8808

Now the first line is pretty obvious. We spilled over work_mem and hit 
the disk for ~ 20MB of use.


The second line is not so clear.

Buffers: shared_buffers? We hit 257714 of those. That makes sense, but
what about temp? Temp refers to temp files, not temp_buffers or temp
tables. Temp buffers refer to a temp table (a la CREATE TEMP TABLE) but
are represented as "local" in an explain plan. Further, the temp values
are in blocks, not bytes.
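For example, with the default 8kB block size, temp read=8822 above works
out to 8822 x 8kB, roughly 69MB of temp file reads - easy to misjudge if
you assume the unit is bytes or kilobytes.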


Basically, it is a little convoluted.

I am not 100% sure what the answer here is, but it seems more consistency
might be a good start.


Also, it would be a huge boon for many (almost all) of our users if we 
could just do (something like) this:


EXPLAIN (ANALYZE,SUMMARY)

And it said:

Query 1

shared_buffers
  *
  *
work_mem
  * Total Used =
  * In Memory =
  * On Disk =
Rows
  * Estimated =
  * Actual =

etc...

I know that access to the details is needed, but for day-to-day operations
a huge portion of our users just want to know how much memory they need, or
whether they need a faster disk, etc...


Thanks,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.




Re: [HACKERS] Parallel Append implementation

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 8:48 AM, Ashutosh Bapat
 wrote:
> Why do we need the following code in both ExecAppendInitializeWorker() and
> ExecAppendInitializeDSM()? Both of those things happen before starting the
> actual execution, so one of those should suffice?
> +/* Choose the optimal subplan to be executed. */
> +(void) parallel_append_next(node);

ExecAppendInitializeWorker runs only in workers, but
ExecAppendInitializeDSM runs only in the leader.

> BTW, sa_finished seems to be a misnomer. The plan is not finished yet, but
> it wants no more workers. So, should it be renamed to sa_no_new_workers or
> something like that?

I think that's not going to improve clarity.  The comments can clarify
the exact semantics.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 2:55 PM, Andres Freund  wrote:
> I indeed think it's not safe, and it's going to get less and less safe
> on windows (or EXEC_BACKEND).  I don't think we can afford to disable
> ASLR in the long run (I indeed suspect that'll just be disallowed at some
> point), and that's the only thing making it safe-ish in combination with
> EXEC_BACKEND.

Ugh.

>> If it's not even safe there, then I guess we should remove it entirely
>> as a useless foot-gun.
>
> I indeed think that's the right consequence.  One question is what to
> replace it with exactly - are we guaranteed we can dynamically look up
> symbols by name in the main binary on every platform?

I don't know the answer to that question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 1:50 PM, Dilip Kumar  wrote:
> fixed

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_ls_waldir() & pg_ls_logdir()

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 6:09 AM, Dave Page  wrote:
> Hmm, good point. Google seems to be saying there isn't one. Patch
> updated as you suggest (and I've added back in a function declaration
> that got lost in the rebasing of the last version).

OK, I took another look at this:

- The documentation wasn't consistent with itself about the order in
which the three columns were mentioned.  I changed it to say name,
size, modification time both places and made the code also return the
columns in that order.  And I renamed the columns to name, size, and
modification, the last of which was chosen to match pg_stat_file().

- I added an error check for the stat() call.

- I moved the code to genfile.c where pg_ls_dir() already is; it seems
to fit within the charter of that file.

- I changed it to build a heap tuple directly instead of converting to
text and then back to datums.  Seems less error-prone that way, and
more consistent with what's done elsewhere in genfile.c.

- I made it use a static-allocated buffer instead of a palloc'd one,
just so it doesn't leak into the surrounding context.

- I removed the function prototype and instead declared the helper
function static.  If there's an intent to expose that function to
extensions, the prototype should be in a header, not the .c file.

- I adjusted the language in the documentation to be a bit more
similar to what we've done elsewhere.

With those changes, committed.
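For reference, the tuple-building pattern that replaces the text round-trip
looks roughly like this (a sketch only, with de as the dirent and attrib as
the stat result, not the committed code):

    Datum       values[3];
    bool        nulls[3];
    HeapTuple   tuple;

    MemSet(nulls, 0, sizeof(nulls));
    values[0] = CStringGetTextDatum(de->d_name);
    values[1] = Int64GetDatum((int64) attrib.st_size);
    values[2] = TimestampTzGetDatum(time_t_to_timestamptz(attrib.st_mtime));
    tuple = heap_form_tuple(tupdesc, values, nulls);

This keeps the datum types aligned with the declared result columns instead
of bouncing everything through text.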

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Monitoring roles patch

2017-03-16 Thread Denish Patel
Hi Dave,

The patch failed to apply...

patch -p1 < /home/vagrant/pg_monitor.diff
patching file contrib/pg_buffercache/Makefile
patching file contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql
patching file contrib/pg_buffercache/pg_buffercache.control
patching file contrib/pg_freespacemap/Makefile
patching file contrib/pg_freespacemap/pg_freespacemap--1.1--1.2.sql
patching file contrib/pg_freespacemap/pg_freespacemap.control
patching file contrib/pg_stat_statements/Makefile
patching file contrib/pg_stat_statements/pg_stat_statements--1.4--1.5.sql
patching file contrib/pg_stat_statements/pg_stat_statements.c
patching file contrib/pg_stat_statements/pg_stat_statements.control
patching file contrib/pg_visibility/Makefile
Hunk #1 succeeded at 4 with fuzz 1.
patching file contrib/pg_visibility/pg_visibility--1.1--1.2.sql
patching file contrib/pg_visibility/pg_visibility.control
patching file contrib/pgrowlocks/pgrowlocks.c
patching file contrib/pgstattuple/pgstattuple--1.4--1.5.sql
patching file doc/src/sgml/catalogs.sgml
Hunk #1 succeeded at 10027 (offset 11 lines).
patching file doc/src/sgml/func.sgml
Hunk #1 succeeded at 19364 (offset 311 lines).
Hunk #2 succeeded at 19648 (offset 311 lines).
patching file doc/src/sgml/pgbuffercache.sgml
patching file doc/src/sgml/pgfreespacemap.sgml
patching file doc/src/sgml/pgrowlocks.sgml
patching file doc/src/sgml/pgstatstatements.sgml
patching file doc/src/sgml/pgstattuple.sgml
patching file doc/src/sgml/pgvisibility.sgml
patching file doc/src/sgml/user-manag.sgml
patching file src/backend/catalog/system_views.sql
Hunk #1 FAILED at 1099.
1 out of 1 hunk FAILED -- saving rejects to file 
src/backend/catalog/system_views.sql.rej
patching file src/backend/replication/walreceiver.c
patching file src/backend/utils/adt/dbsize.c
Hunk #1 succeeded at 17 (offset -1 lines).
Hunk #2 succeeded at 89 (offset -1 lines).
Hunk #3 succeeded at 179 (offset -1 lines).
patching file src/backend/utils/adt/pgstatfuncs.c
patching file src/backend/utils/misc/guc.c
Hunk #2 succeeded at 6678 (offset 10 lines).
Hunk #3 succeeded at 6728 (offset 10 lines).
Hunk #4 succeeded at 8021 (offset 10 lines).
Hunk #5 succeeded at 8053 (offset 10 lines).
patching file src/include/catalog/pg_authid.h

Reject file contents...

cat src/backend/catalog/system_views.sql.rej
--- src/backend/catalog/system_views.sql
+++ src/backend/catalog/system_views.sql
@@ -1099,3 +1099,7 @@

 REVOKE EXECUTE ON FUNCTION pg_ls_logdir() FROM public;
 REVOKE EXECUTE ON FUNCTION pg_ls_waldir() FROM public;
+GRANT EXECUTE ON FUNCTION pg_ls_logdir() TO pg_monitor;
+GRANT EXECUTE ON FUNCTION pg_ls_waldir() TO pg_monitor;
+
+GRANT pg_read_all_gucs TO pg_monitor;

The new status of this patch is: Waiting on Author



Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-16 Thread Andrew Borodin
2017-03-16 23:55 GMT+05:00 Peter Geoghegan :
> On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin  wrote:
>> 2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
>> furthermore Theodor suggested that I should implement not only page
>> eviction, but also page merging and a tree-condensing algorithm.
>
> I think that it's very hard to make merging of pages that are not
> completely empty work, while also using the L&Y algorithm.

That's true. This is a distant plan...

Best regards, Andrey Borodin.




Re: [HACKERS] pgbench more operators & functions

2017-03-16 Thread Fabien COELHO


Hello David,

> This patch applies cleanly and compiles at cccbdde with some whitespace
> issues.


> $ patch -p1 < ../other/pgbench-more-ops-funcs-9.patch
> (Stripping trailing CRs from patch.)


My guess is that your mailer changed the eol-style of the file when saving 
it:


  sh> sha1sum pg-patches/pgbench-more-ops-funcs-9.patch
  608a601561f4cba982f0ce92df30d1868338342b

ISTM that the standard mime-type of *.patch and *.diff is really 
"text/x-diff", so my Ubuntu laptop is somehow right to put that in 
"/etc/mime.types", but this seems to have annoying consequences at least on 
Macs.


--
Fabien.




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-16 Thread Peter Geoghegan
On Thu, Mar 16, 2017 at 11:53 AM, Andrew Borodin  wrote:
> 2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
> furthermore Theodor suggested that I should implement not only page
> eviction, but also page merge and a tree condense algorithm.

I think that it's very hard to make merging of pages that are not
completely empty work, while also using the L&Y algorithm.


-- 
Peter Geoghegan




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-16 Thread Andres Freund
On 2017-03-16 09:27:59 -0400, Robert Haas wrote:
> On Thu, Mar 16, 2017 at 5:13 AM, Petr Jelinek
>  wrote:
> > Hmm now that you mention it, I remember discussing something similar
> > with you last year in Dallas in regards to parallel query. IIRC Windows
> > should not have this problem but other systems with EXEC_BACKEND do.
> > Don't remember the details though.
> 
> Generally, extension code can't use bgw_main safely, and must use
> bgw_library_name and bgw_function_name.  But bgw_main is supposedly
> safe for core code.

I indeed think it's not safe, and it's going to get less and less safe
on Windows (or EXEC_BACKEND).  I don't think we can afford to disable
ASLR in the long run (I indeed suspect that'll just be disallowed at some
point), and that's the only thing making it safe-ish in combination with
EXEC_BACKEND.


> If it's not even safe there, then I guess we should remove it entirely
> as a useless foot-gun.

I indeed think that's the right consequence.  One question is what to
replace it with exactly - are we guaranteed we can dynamically look up
symbols by name in the main binary on every platform?  Alternatively we
can just hardcode a bunch of bgw_function_name values that are matched
to specific functions if bgw_library_name is NULL - I suspect that'd be
the easiest / least worrisome portability-wise.
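
As an illustration only (the names in this table are mine, not a
committed design), the hardcoded-lookup alternative could look roughly
like this:

/*
 * Hypothetical sketch: resolve in-core bgworker entry points by name
 * when bgw_library_name is NULL.  Table contents are illustrative.
 */
typedef struct InternalBgwEntry
{
	const char *fn_name;
	bgworker_main_type fn_addr;
} InternalBgwEntry;

static const InternalBgwEntry InternalBgwFunctions[] = {
	{"ApplyLauncherMain", ApplyLauncherMain},
	{"ParallelWorkerMain", ParallelWorkerMain}
};

static bgworker_main_type
LookupInternalBgwFunction(const char *fn_name)
{
	int			i;

	for (i = 0; i < lengthof(InternalBgwFunctions); i++)
	{
		/* match the requested name against the fixed table */
		if (strcmp(InternalBgwFunctions[i].fn_name, fn_name) == 0)
			return InternalBgwFunctions[i].fn_addr;
	}
	elog(ERROR, "internal background worker function \"%s\" not found",
		 fn_name);
	return NULL;				/* keep compiler quiet */
}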

Greetings,

Andres Freund




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-16 Thread Andrew Borodin
2017-03-16 21:27 GMT+05:00 David Steele :
> This patch applies cleanly and compiles at cccbdde.
>
> Jeff, any thoughts on Andrew's responses?

Hi, David!

I've got some updates on this patch, since my understanding of the
B-tree was bothering me.
Currently I'm at PgConf.Russia, where I've contacted Theodor Sigaev,
and he answered my questions about GIN.
0. I think that the proposed patch is safe (deadlock-free, does not
introduce new livelocks, and all the resources are guarded properly).
1. There _are_ high keys in the posting trees; they are just called
rightmost keys, but in fact they are high keys in terms of the L&Y
algorithm.
2.  Thus, L&S fully concurrent vacuum is possible, indeed, and
furthermore Theodor suggested that I should implement not only page
eviction, but also page merge and a tree condense algorithm.
3. Eventually I'll do that, certainly, but currently I can't
predict the time it'll take. I think I'll start somewhere in the
summer, maybe right after GiST intrapage indexing.

As for now, I think that having this patch in PostgreSQL 10 is viable.

Best regards, Andrey Borodin.




Re: [HACKERS] [patch] reorder tablespaces in basebackup tar stream for backup_label

2017-03-16 Thread Michael Banck
Hi,

sorry, it took me a while to get back to this.

Am Freitag, den 03.03.2017, 15:44 +0900 schrieb Michael Paquier:
> On Wed, Feb 22, 2017 at 9:23 PM, Bernd Helmle  wrote:
> > The comment in the code says explicitly to add the base directory to
> > the end of the list; not sure whether there is a particular reason for that.
> >
> > I'd say this is an oversight in the implementation. I'm currently
> > working on a tool using the streaming protocol directly, and I've
> > understood it exactly that way: the default tablespace is the first
> > one in the stream.
> >
> > So +1 for the patch.
> 
> Commit 507069de has switched the main directory from the beginning to
> the end of the list, and the thread about this commit is here:
> https://www.postgresql.org/message-id/AANLkTikgmZRkBuQ%2B_hcwPBv7Cd7xW48Ev%3DUBHA-k4v0W%40mail.gmail.com
> 
> +   /* Add a node for the base directory at the beginning.  This way, the
> +* backup_label file is always the first file to be sent. */
> ti = palloc0(sizeof(tablespaceinfo));
> ti->size = opt->progress ? sendDir(".", 1, true, tablespaces,
> true) : -1;
> -   tablespaces = lappend(tablespaces, ti);
> +   tablespaces = lcons(ti, tablespaces);
> So, the main directory is located at the end on purpose. When using
> --wal-method=fetch the WAL segments are part of the main tarball, so
> if you send the main tarball first you would need to generate a second
> tarball with the WAL segments generated between the moment the main
> tarball finished and the end of the last tablespace, if you want a
> consistent backup. 

Ah, thanks for pointing that out, I've missed that in my testing.

> Your patch would work with the stream mode though.

Or, if not requesting the "WAL" option of the replication protocol's
BASE_BACKUP command.

I agree it doesn't make sense to start messing with fetch mode, but I
don't think we guarantee any ordering of tablespaces (to wit, Bernd was
pretty sure it was the other way around all the time), nor do I
personally think having the main tablespace be first for non-WAL/stream
and last for WAL/fetch would be confusing, though I understand this is
debatable.

So I've updated the patch to only switch the main tablespace to be first
in case WAL isn't included, please find it attached.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

From 2532c4a659eb32527c489d1a65caa080e181dbd0 Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Sun, 26 Feb 2017 17:59:38 +0100
Subject: [PATCH] Reorder tablespaces for non-WAL streaming basebackups.

The replication protocol documentation appears to express that the main
tablespace is the first to be sent; however, it is actually the last
one, so that the WAL files can be appended to it. This makes the
backup_label file (which gets prepended to the main tablespace)
inconveniently end up in the middle of the basebackup stream if other
tablespaces are present.

Change this so that the main tablespace is the first to be sent in case
no WAL files are requested, ensuring that backup_label is the first file
in the stream in this case.
---
 src/backend/replication/basebackup.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 09ecc15..ef3c115 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -230,10 +230,16 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		else
 			statrelpath = pgstat_stat_directory;
 
-		/* Add a node for the base directory at the end */
+		/* Add a node for the base directory, either at the end or (if
+		 * WAL is not included) at the beginning.  This means the
+		 * backup_label file is the first file to be sent in the
+		 * latter case. */
 		ti = palloc0(sizeof(tablespaceinfo));
 		ti->size = opt->progress ? sendDir(".", 1, true, tablespaces, true) : -1;
-		tablespaces = lappend(tablespaces, ti);
+		if (opt->includewal)
+			tablespaces = lappend(tablespaces, ti);
+		else
+			tablespaces = lcons(ti, tablespaces);
 
 		/* Send tablespace header */
 		SendBackupHeader(tablespaces);
-- 
2.1.4




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar
On Thu, Mar 16, 2017 at 8:26 PM, Emre Hasegeli  wrote:
>> Hopefully, this time I got it correct.  Since I am unable to reproduce
>> the issue so I will again need your help in verifying the fix.
>
> It is not crashing with the new patch.  Thank you.

Thanks for verifying.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-16 Thread Dilip Kumar
On Thu, Mar 16, 2017 at 8:42 PM, Robert Haas  wrote:
> Thanks for confirming.  Some review comments on v2:
>
> +if (istate->pagetable)
fixed
>
> Please compare explicitly to InvalidDsaPointer.
>
> +if (iterator->ptbase)
> +ptbase = iterator->ptbase->ptentry;
> +if (iterator->ptpages)
> +idxpages = iterator->ptpages->index;
> +if (iterator->ptchunks)
> +idxchunks = iterator->ptchunks->index;
>
> Similarly.
fixed

Also fixed at
+ if (ptbase)
+   pg_atomic_init_u32(&ptbase->refcount, 0);
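
For illustration, the explicit-comparison style being requested would
look something like this (a sketch reusing names from the quoted
fragments, not the exact committed code):

/* compare against InvalidDsaPointer explicitly instead of a bare truth test */
if (istate->pagetable != InvalidDsaPointer)
	ptbase = dsa_get_address(dsa, istate->pagetable);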

>
> Dilip, please also provide a proposed commit message describing what
> this is fixing.  Is it just the TBM_EMPTY case, or is there anything
> else?

Okay, I have added the commit message in the patch.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


fix_tbm_empty_v3.patch
Description: Binary data



Re: [HACKERS] PinBuffer() no longer makes use of strategy

2017-03-16 Thread David Steele
On 2/4/17 2:47 PM, Alexander Korotkov wrote:
> On Sat, Feb 4, 2017 at 4:33 AM, Andres Freund  wrote:
> 
> On 2017-02-03 19:13:45 -0600, Jim Nasby wrote:
> > No, I noticed it while reading code. Removing that does mean that if any
> > non-default strategy (in any backend) hits that buffer again then the buffer
> > will almost certainly migrate into the main buffer pool the next time one of
> > the rings hits that buffer
> 
> Well, as long as the buffer is used from the ring, BufferAlloc() ->
> BufferAlloc() will reset the usagecount when rechristening the
> buffer. So unless anything happens in between the buffer being remapped
> last and remapped next, it'll be reused. Right?
> 
> The only case where I can see the old logic mattering positively is for
> synchronized seqscans.  For pretty much everything else that logic seems
> worse, because it essentially prevents any buffers ever staying in s_b
> when only ringbuffer accesses are performed.
> 
> I'm tempted to put the old logic back, but more because this likely was
> unintentional, not because I think it's clearly better.
> 
> 
> +1
> Yes, it was an unintentional change.  So we should put the old logic back
> unless we have proof that this change makes it better.
> A patch is attached.  I tried to add some comments, but they are probably
> not enough.
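
For context, the accidentally-dropped behavior that this patch restores
amounts to the following (a simplified sketch, not the actual
compare-and-swap loop in PinBuffer):

/*
 * Only a default (non-strategy) access bumps the usage count, so pages
 * touched exclusively through a ring buffer are never promoted.
 */
if (strategy == NULL)
{
	if (BUF_STATE_GET_USAGECOUNT(buf_state) != BM_MAX_USAGE_COUNT)
		buf_state += BUF_USAGECOUNT_ONE;
}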

This patch looks pretty straightforward; it applies cleanly and
compiles at cccbdde.

It's not a straight revert, though, so still seems to need review.

Jim, do you know when you'll have a chance to look at that?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Push down more full joins in postgres_fdw

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 12:48 PM, Robert Haas  wrote:
> On Thu, Mar 16, 2017 at 11:46 AM, David Steele  wrote:
>> $ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
>> patching file contrib/postgres_fdw/deparse.c
>> Hunk #11 succeeded at 1371 (offset -3 lines).
>> Hunk #12 succeeded at 1419 (offset -3 lines).
>> Hunk #13 succeeded at 1486 (offset -3 lines).
>> Hunk #14 succeeded at 2186 (offset -3 lines).
>> Hunk #15 succeeded at 3082 (offset -3 lines).
>> patching file contrib/postgres_fdw/expected/postgres_fdw.out
>> patching file contrib/postgres_fdw/postgres_fdw.c
>> Hunk #1 succeeded at 669 (offset 1 line).
>> Hunk #2 succeeded at 1245 (offset -1 lines).
>> Hunk #3 succeeded at 2557 (offset -1 lines).
>> Hunk #4 succeeded at 4157 (offset 3 lines).
>> Hunk #5 succeeded at 4183 (offset 3 lines).
>> Hunk #6 succeeded at 4212 (offset 3 lines).
>> Hunk #7 succeeded at 4315 (offset 3 lines).
>> patching file contrib/postgres_fdw/postgres_fdw.h
>> patching file contrib/postgres_fdw/sql/postgres_fdw.sql
>>
>> Since these are just offsets I'll leave the patch as "Needs review" but
>> an updated patch would be appreciated.
>
> I don't think that's really needed.  Offsets don't hurt anything.
> Even fuzz is OK.  As long as the hunks are applying, I think it's
> fine.
>
> Incidentally, I'm reading through this one now.

And ... I don't see anything to complain about, so, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-16 Thread David Steele
On 2/21/17 9:54 AM, Bernd Helmle wrote:
> Am Dienstag, den 14.02.2017, 15:53 +0300 schrieb Alexander Korotkov:
>> +1
>> And you could try to use pg_wait_sampling
>>  to sampling of wait
>> events.
> 
> I've tried this with your example from your blog post[1] and got this:
> 
> (pgbench scale 1000)
> 
> pgbench -Mprepared -S -n -c 100 -j 100 -T 300 -P2 pgbench2
> 
> SELECT-only:
> 
> SELECT * FROM profile_log ;
>              ts             |  event_type   |     event     | count
> ----------------------------+---------------+---------------+-------
>  2017-02-21 15:21:52.45719  | LWLockNamed   | ProcArrayLock |     8
>  2017-02-21 15:22:11.19594  | LWLockTranche | lock_manager  |     1
>  2017-02-21 15:22:11.19594  | LWLockNamed   | ProcArrayLock |    24
>  2017-02-21 15:22:31.220803 | LWLockNamed   | ProcArrayLock |     1
>  2017-02-21 15:23:01.255969 | LWLockNamed   | ProcArrayLock |     1
>  2017-02-21 15:23:11.272254 | LWLockNamed   | ProcArrayLock |     2
>  2017-02-21 15:23:41.313069 | LWLockNamed   | ProcArrayLock |     1
>  2017-02-21 15:24:31.37512  | LWLockNamed   | ProcArrayLock |    19
>  2017-02-21 15:24:41.386974 | LWLockNamed   | ProcArrayLock |     1
>  2017-02-21 15:26:41.530399 | LWLockNamed   | ProcArrayLock |     1
> (10 rows)
> 
> Write-heavy pgbench runs have far more events logged; see the attached
> text file. Maybe this is of interest...
> 
> 
> [1] http://akorotkov.github.io/blog/2016/03/25/wait_monitoring_9_6/

This patch applies cleanly at cccbdde.  It doesn't break compilation on
amd64, but I can't test on a Power-based machine.

Alexander, have you had a chance to look at Bernd's results?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-16 Thread Jeff Janes
On Thu, Mar 9, 2017 at 4:59 AM, Michael Paquier 
wrote:

> On Thu, Mar 9, 2017 at 1:17 AM, Joe Conway  wrote:
> > On 03/07/2017 08:29 PM, Tom Lane wrote:
> >> Michael Paquier  writes:
> >>> here is a separate thread dedicated to the following extension for
> >>> CREATE/ALTER ROLE: PASSWORD ('value' USING 'method').
> >>
> >> The parentheses seem weird ... do we really need those?
> >
> > +1
>
> Seeing 3 opinions in favor of that, let's do so then. I have updated
> the patch to not use parentheses.
>

The regression tests only exercise the CREATE ROLE...USING version, not the
ALTER ROLE...USING version.

+and plain for an non-hashed password.  If the password
+string is already in MD5-hashed or SCRAM-hashed, then it is
+stored hashed as-is.

In the last line, I think "stored as-is" sounds better.

Other than that, it looks good to me.

Cheers,

Jeff


Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-16 Thread David Steele
On 2/21/17 4:58 AM, Mithun Cy wrote:
> Thanks, Amit
> 
> On Mon, Feb 20, 2017 at 9:51 PM, Amit Kapila  wrote:
>> How will high and lowmask calculations work in this new strategy?
>> Till now they have always worked on the doubling strategy, and I don't
>> see that you have changed anything related to that code.  Check the
>> places below.
> 
> It is important that the mask be (2^x) - 1 if we are to retain
> the same hash map function. So the mask variables will take the same
> values as before. The only place I think we need a change is
> _hash_metapinit(); unfortunately, I did not test the case where we
> build the hash index on already-existing tuples. I have now fixed
> that in the latest patch.
> 
> 
>> Till now, we have worked hard to avoid changing the page format in a
>> backward-incompatible way, so it will be better if we could find some
>> way of doing this without changing the meta page format in a
>> backward-incompatible way.
> 
> We are not adding or deleting any variables; we increase the size
> of hashm_spares, and hence the mapping functions should be adjusted.
> The problem is the block allocation: its management is based on
> the fact that all of the buckets (2^x in number) belonging to a
> particular split point are allocated at once and together. The
> hashm_spares array is used to record those allocations, and it is used
> further by the mapping functions to reach a particular block in the
> file. If we want to change the way we allocate the buckets, then
> hashm_spares will change, and hence the mapping function. So I do not
> think we can avoid the incompatibility issue.
> 
> One thing I can think of is increasing the hashm_version of the
> hash index; then for old indexes we can continue to use the doubling
> method and its mapping, and for new indexes we can use the new way as above.
> 
>> Have you considered storing some information in
>> shared memory based on which we can decide how much percentage of
>> buckets are allocated in current table half?  I think we might be able
>> to construct this information after crash recovery as well.
> 
> I think all of the above data has to be persistent. I am not able to
> understand what should or can be stored in shared buffers. Can you
> please correct me if I am wrong?
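
For readers following along, the mapping in question is essentially the
existing macro in src/include/access/hash.h (shown for reference and
approximate; the patch would have to revise it together with
hashm_spares):

/*
 * Bucket B lives at its bucket number, plus the overflow pages recorded
 * in hashm_spares for all earlier split points, plus one for the metapage.
 */
#define BUCKET_TO_BLKNO(metap,B) \
	((BlockNumber) ((B) + ((B) ? (metap)->hashm_spares[_hash_log2((B)+1)-1] : 0)) + 1)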

This patch does not apply at cccbdde:

$ patch -p1 < ../other/expand_hashbucket_efficiently_02
patching file src/backend/access/hash/hashovfl.c
Hunk #1 succeeded at 49 (offset 1 line).
Hunk #2 succeeded at 67 (offset 1 line).
patching file src/backend/access/hash/hashpage.c
Hunk #1 succeeded at 502 with fuzz 1 (offset 187 lines).
Hunk #2 succeeded at 518 with fuzz 2 (offset 168 lines).
Hunk #3 succeeded at 562 (offset 163 lines).
Hunk #4 succeeded at 744 (offset 124 lines).
Hunk #5 FAILED at 774.
Hunk #6 succeeded at 869 (offset 19 lines).
Hunk #7 succeeded at 1450 (offset 242 lines).
Hunk #8 succeeded at 1464 (offset 242 lines).
Hunk #9 succeeded at 1505 (offset 242 lines).
1 out of 9 hunks FAILED -- saving rejects to file
src/backend/access/hash/hashpage.c.rej
patching file src/backend/access/hash/hashutil.c
Hunk #1 succeeded at 150 (offset 1 line).
patching file src/include/access/hash.h
Hunk #2 succeeded at 180 (offset 12 lines).
Hunk #3 succeeded at 382 (offset 18 lines).

It does apply with fuzz on 2b32ac2, so it looks like c11453c and
subsequent commits are the cause.  They represent a fairly substantial
change to hash indexes by introducing WAL logging, so I think you should
reevaluate your patches to be sure they still function as expected.

Marked "Waiting on Author".

-- 
-David
da...@pgmasters.net




Re: [HACKERS] GSOC - TOAST'ing in slices

2017-03-16 Thread George Papadrosou
Hello all, 

thank you for your replies.  I agree with Alexander Korotkov that it is 
important to have a quality patch at the end of the summer. 

Stephen, you mentioned PostGIS, but the conversation seems to lean towards 
JSONB. What are your thoughts?

Also, if I am to include some ideas/approaches in the proposal, it seems I 
should really focus on understanding how a specific data type is used, queried, 
and indexed, which is a lot of exploring for a newcomer to the postgres code.

In the meantime, I am trying to find out how jsonb is indexed and queried. After I 
grasp the current situation I will be able to think about new approaches.

Regards,
George 

> On 15 Mar 2017, at 15:53, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou  wrote:
>>> The project’s idea is to implement different slicing approaches according to
>>> the value’s datatype. For example a text field could be split upon character
>>> boundaries while a JSON document would be split in a way that allows fast
>>> access to its keys or values.
> 
>> Hmm.  So if you had a long text field containing multibyte characters,
>> and you split it after, say, every 1024 characters rather than after
>> every N bytes, then you could do substr() without detoasting the whole
>> field.  On the other hand, my guess is that you'd waste a fair amount
>> of space in the TOAST table, because it's unlikely that the chunks
>> would be exactly the right size to fill every page of the table
>> completely.  On balance it seems like you'd be worse off, because
>> substr() probably isn't all that common an operation.
> 
> Keep in mind also that slicing on "interesting" boundaries rather than
> with the current procrustean-bed approach could save you at most one or
> two chunk fetches per access.  So the upside seems limited.  Moreover,
> how are you going to know whether a given toast item has been stored
> according to your newfangled approach?  I doubt we're going to accept
> forcing a dump/reload for this.
> 
> IMO, the real problem here is to be able to predict which chunk(s) to
> fetch at all, and I'd suggest focusing on that part of the problem rather
> than changes to physical storage.  It's hard to see how to do anything
> very smart for text (except in the single-byte-encoding case, which is
> already solved).  But the JSONB format was designed with some thought
> to this issue, so you might be able to get some traction there.
> 
>   regards, tom lane
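
To make the chunk-prediction point concrete: with the current fixed-size
chunking, a byte range maps directly to a chunk range, which is exactly
why the single-byte-encoding substr() case is already solved.  A
simplified sketch of the arithmetic (after the fashion of
toast_fetch_datum_slice()):

/* a byte slice identifies a contiguous run of chunk_seq values */
int32		startchunk = slice_offset / TOAST_MAX_CHUNK_SIZE;
int32		endchunk = (slice_offset + slice_length - 1) / TOAST_MAX_CHUNK_SIZE;

/* only chunks startchunk..endchunk need to be read from the TOAST table */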



Re: [HACKERS] asynchronous execution

2017-03-16 Thread Corey Huinker
On Mon, Mar 13, 2017 at 9:28 PM, Amit Langote  wrote:

> On 2017/03/14 10:08, Corey Huinker wrote:
> >> I don't think the plan itself will change as a result of applying this
> >> patch. You might however be able to observe some performance
> improvement.
> >
> > I could see no performance improvement, even with 16 separate queries
> > combined with UNION ALL. Query performance was always with +/- 10% of a
> 9.6
> > instance given the same script. I must be missing something.
>
> Hmm, maybe I'm missing something too.
>
> Anyway, here is an older message on this thread from Horiguchi-san where
> he shared some of the test cases that this patch improves performance for:
>
> https://www.postgresql.org/message-id/20161018.103051.
> 30820907.horiguchi.kyotaro%40lab.ntt.co.jp
>
> From that message:
>
> 
> I measured performance and had the following result.
>
> t0  - SELECT sum(a) FROM ;
> pl  - SELECT sum(a) FROM <4 local children>;
> pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
>
> The result is written as "time (std dev )"
>
> sync
>   t0: 3820.33 (  1.88)
>   pl: 1608.59 ( 12.06)
>  pf0: 7928.29 ( 46.58)
>  pf1: 8023.16 ( 26.43)
>
> async
>   t0: 3806.31 (  4.49)    0.4% faster (should be error)
>   pl: 1629.17 (  0.29)    1.3% slower
>  pf0: 6447.07 ( 25.19)   18.7% faster
>  pf1: 1876.80 ( 47.13)   76.6% faster
> 
>
> IIUC, pf0 and pf1 are the same test case (all 4 foreign tables target the
> same server) measured with different implementations of the patch.
>
> Thanks,
> Amit
>

I reworked the test such that all of the foreign tables inherit from the
same parent table, and if you query that, you do get async execution. But it
doesn't work when just stringing together those foreign tables with UNION
ALLs.

I don't know how to proceed with this review if that was a goal of the
patch.


[HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2017-03-16 Thread David Steele
On 2/14/17 4:03 PM, Tom Lane wrote:
> Jim Nasby  writes:
>> On 2/14/17 1:18 PM, Tom Lane wrote:
>>> One point that could use further review is whether the de-duplication
>>> algorithm is actually correct.  I'm only about 95% convinced by the
>>> argument I wrote in planunionor.c's header comment.
> 
>> I'll put some thought into it and see if I can find any holes. Are you 
>> only worried about the removal of "useless" rels or is there more?
> 
> Well, the key point is whether it's really OK to de-dup on the basis
> of only the CTIDs that are not eliminated in any UNION arm.  I was
> feeling fairly good about that until I thought of the full-join-to-
> left-join-to-no-join conversion issue mentioned in the comment.
> Now I'm wondering if there are other holes; or maybe I'm wrong about
> that one and it's not necessary to be afraid of full joins.

This patch applies cleanly (with offsets) and compiles at cccbdde.

Jim, have you had time to think about this?  Any insights?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Push down more full joins in postgres_fdw

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 11:46 AM, David Steele  wrote:
> $ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
> patching file contrib/postgres_fdw/deparse.c
> Hunk #11 succeeded at 1371 (offset -3 lines).
> Hunk #12 succeeded at 1419 (offset -3 lines).
> Hunk #13 succeeded at 1486 (offset -3 lines).
> Hunk #14 succeeded at 2186 (offset -3 lines).
> Hunk #15 succeeded at 3082 (offset -3 lines).
> patching file contrib/postgres_fdw/expected/postgres_fdw.out
> patching file contrib/postgres_fdw/postgres_fdw.c
> Hunk #1 succeeded at 669 (offset 1 line).
> Hunk #2 succeeded at 1245 (offset -1 lines).
> Hunk #3 succeeded at 2557 (offset -1 lines).
> Hunk #4 succeeded at 4157 (offset 3 lines).
> Hunk #5 succeeded at 4183 (offset 3 lines).
> Hunk #6 succeeded at 4212 (offset 3 lines).
> Hunk #7 succeeded at 4315 (offset 3 lines).
> patching file contrib/postgres_fdw/postgres_fdw.h
> patching file contrib/postgres_fdw/sql/postgres_fdw.sql
>
> Since these are just offsets I'll leave the patch as "Needs review" but
> an updated patch would be appreciated.

I don't think that's really needed.  Offsets don't hurt anything.
Even fuzz is OK.  As long as the hunks are applying, I think it's
fine.

Incidentally, I'm reading through this one now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential data loss of 2PC files

2017-03-16 Thread David Steele
On 2/13/17 12:10 AM, Michael Paquier wrote:
> On Tue, Jan 31, 2017 at 11:07 AM, Michael Paquier
>  wrote:
>> On Mon, Jan 30, 2017 at 10:52 PM, Heikki Linnakangas  wrote:
>>> If that can happen, don't we have the same problem in many other places?
>>> Like, all the SLRUs? They don't fsync the directory either.
>>
>> Right, pg_commit_ts and pg_clog enter in this category.
> 
> Implemented as attached.
> 
>>> Is unlink() guaranteed to be durable, without fsyncing the directory? If
>>> not, then we need to fsync() the directory even if there are no files in it
>>> at the moment, because some might've been removed earlier in the checkpoint
>>> cycle.
>>
>> Hm... I am not an expert in file systems. At least on ext4 I can see
>> that unlink() is atomic, but not durable. So if an unlink() is
>> followed by a power failure, the previously unlinked file could still
>> be there if the parent directory is not fsync'd.
> 
> So I have been doing more work on this patch, with the following things done:
> - Flush pg_clog, pg_commit_ts and pg_twophase at checkpoint phase to
> ensure their durability.
> - Create a durable_unlink() routine to give a way to perform a durable
> file removal.
> I am now counting 111 calls to unlink() in the backend code; looking
> at all of them, most look fine with a plain, non-durable unlink() as
> they work on temporary files (see timeline.c for example), with some
> exceptions:
> - In pg_stop_backup, the removal of the old backup_label and
> tablespace_map should be durable to avoid putting the system in a wrong
> state after power loss. Other calls of unlink() are followed by
> durable_rename, so they are fine if left as such.
> - Removal of old WAL segments should be durable as well. There is
> already an effort to rename them durably in the case of a recycled
> segment. In case of a power loss, a file that should have been
> removed could remain in pg_xlog.
> 
> Looking around, I have bumped as well on the following bug report for
> SQlite which is in the same category of things:
> http://sqlite.1065341.n5.nabble.com/Potential-bug-in-crash-recovery-code-unlink-and-friends-are-not-synchronous-td68885.html
> Scary to see that in this case durability can be a problem at
> transaction commit...
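
For reference, the durable_unlink() routine described above boils down
to "unlink, then fsync the parent directory"; a simplified sketch (error
handling elided, helper names as in src/backend/storage/file/fd.c):

static int
durable_unlink(const char *fname)
{
	char		parentpath[MAXPGPATH];

	if (unlink(fname) < 0)
		return -1;

	/*
	 * To guarantee that the removal of the file persists, fsync its
	 * parent directory as well.
	 */
	strlcpy(parentpath, fname, MAXPGPATH);
	get_parent_directory(parentpath);

	/* an empty path means the current directory */
	if (strlen(parentpath) == 0)
		strlcpy(parentpath, ".", MAXPGPATH);

	if (fsync_fname_ext(parentpath, true, false, ERROR) != 0)
		return -1;

	return 0;
}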

This patch applies cleanly and compiles at cccbdde.

Ashutosh, do you know when you'll have a chance to review?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-03-16 Thread Robert Haas
On Thu, Mar 16, 2017 at 12:39 PM, David Steele  wrote:
>> Anyway, I committed the patch posted here.  Or the important line out
>> of the two, anyway.  :-)
>
> It seems that this submission should be marked as "Committed" with
> Robert as the committer.  Am I missing something?

I think you are right.  Sorry that I missed that step.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-03-16 Thread David Steele
On 3/16/17 12:41 PM, Robert Haas wrote:
> On Thu, Mar 16, 2017 at 12:39 PM, David Steele  wrote:
>>> Anyway, I committed the patch posted here.  Or the important line out
>>> of the two, anyway.  :-)
>>
>> It seems that this submission should be marked as "Committed" with
>> Robert as the committer.  Am I missing something?
> 
> I think you are right.  Sorry that I missed that step.

Done.

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Time to up bgwriter_lru_maxpages?

2017-03-16 Thread David Steele
On 2/2/17 2:47 PM, Robert Haas wrote:
> On Wed, Feb 1, 2017 at 9:47 PM, Jim Nasby  wrote:
>> Before doing that the first thing to look at would be why the limit is
>> currently INT_MAX / 2 instead of INT_MAX.
> 
> Generally the rationale for GUCs with limits of that sort is that
> there is or might be code someplace that multiplies the value by 2 and
> expects the result not to overflow.
> 
> I expect that increasing the maximum value of shared_buffers beyond
> what can be stored by an integer could have a noticeable distributed
> performance cost for the entire system.  It might be a pretty small
> cost, but then again maybe not; for example, BufferDesc's buf_id
> member would have to get wider, and probably the freeNext member, too.
> Andres already did unspeakable things to make a BufferDesc fit into
> one cache line for performance reasons, so that wouldn't be great
> news.
> 
> Anyway, I committed the patch posted here.  Or the important line out
> of the two, anyway.  :-)
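
(To illustrate the overflow hazard with an invented example -- this is
not actual bgwriter code:)

/*
 * If a GUC may be as large as INT_MAX, any doubling overflows; capping
 * the GUC at INT_MAX / 2 keeps a computation like this well defined.
 */
int			doubled = some_guc_value * 2;	/* hypothetical */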

It seems that this submission should be marked as "Committed" with
Robert as the committer.  Am I missing something?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-03-16 Thread David Steele
On 2/5/17 11:04 AM, Andrew Borodin wrote:
> Hi, Jeff!
> 
> 2017-02-05 3:45 GMT+05:00 Jeff Davis :
>> On Sun, Jan 22, 2017 at 10:32 PM, Jeff Davis  wrote:
>>> On Sat, Jan 21, 2017 at 4:25 AM, Andrew Borodin  
>>> wrote:
>>> One idea I had that might be simpler is to use a two-stage page
>>> delete. The first stage would remove the link from the parent and mark
>>> the page deleted, but leave the right link intact and prevent
>>> recycling. The second stage would follow the chain of right links
>>> along each level, removing the right links to deleted pages and
>>> freeing the page to be recycled.
>>
>> Do you think this approach is viable as a simplification?
> 
> To consider this approach I need to ask several questions.
> 
> 1. Are we discussing simplification of the existing GIN vacuum, or the
> proposed GIN vacuum? Please note that they do not differ in the way a
> page is unlinked; the function ginDeletePage() is almost untouched.
> 
> 2. What do you mean by "stage"? In terms of the paper "A symmetric
> concurrent B-tree algorithm" by Lanin & Shasha, a stage is an
> uninterrupted period of holding locks on a nonempty page set.
> Here's the picture https://www.screencast.com/t/xUpGKgkkU from L&S.
> Both papers (L&Y and L&S) tend to avoid lock coupling, which is
> inevitable if you want to do the parent unlink first. To prevent insertion
> of records on a page, you have to mark it. If you are doing this in
> the stage when you unlink from the parent, you have to own both locks.
> 
> 3. What do you want to simplify? Currently we unlink a page from the
> parent and the left sibling in one stage, at the cost of acquiring the
> CleanUp lock (the way we acquire it is the difference between the
> current and patched versions).
> A 2-stage algorithm will not be simpler, yet it will be more concurrent.
> Please note that, given the absence of high fence keys in the GIN
> B-tree, we actually should implement the algorithm from figure 3A
> https://www.screencast.com/t/2cfGZtrzaz0z  (it would be incorrect only
> in the presence of high keys).

This patch applies cleanly and compiles at cccbdde.

Jeff, any thoughts on Andrew's responses?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] pgbench more operators & functions

2017-03-16 Thread David Steele
On 2/4/17 4:51 AM, Fabien COELHO wrote:
> 
> Hello,
> 
>> For my 2c, at least, while I'm definitely interested in this, it's not
>> nearly high enough on my plate with everything else going on to get any
>> attention in the next few weeks, at least.
>>
>> I do think that, perhaps, this patch may deserve a bit of a break, to
>> allow people to come back to it with a fresh perspective, so perhaps
>> moving it to the next commitfest would be a good idea, in a Needs Review
>> state.
> 
> So, let's try again for the next CF...
> 
> Here is a v9 which includes some more cleanup, hopefully in the expected
> direction, which is to make pgbench expressions behave like SQL
> expressions, and which I hope takes all other feedback into account as well.
> 
> 
> CONTEXT
> 
> Pgbench has been given an expression parser (878fdcb8) which allows the
> use of full expressions instead of doing one-at-a-time operations. This
> parser has been extended with functions (7e137f84) & a double type
> (86c43f4e). The first batch of functions was essentially a proof of
> concept about how to add new functions with various requirements.
> Pgbench's default "tpcb-like" test takes advantage of these additions to
> reduce the number of lines it needs.
> 
> 
> MOTIVATION
> 
> This patch aims at providing actually useful functions for benchmarking.
> The functions and operators provided here are the usual basic operations.
> They are not chosen randomly, but are simply taken from existing
> benchmarks:
> 
> In TPC-B 2.0.0 section 5.3.5 and TPC-C 5.11 section 2.5.1.2, the
> selection of accounts uses a test (if ...), logical conditions (AND, OR)
> and comparisons (<, =, >=, >).
> 
> In TPC-C 5.11 section 2.1.6, a bitwise or (|) is used to skew a
> distribution based on two uniform distributions.
> 
> In TPC-C 5.11 section 5.2.5.4, a log function is used to determine
> "think time", which can be truncated (i.e. "least" function, already in
> pgbench).
> 
> 
> CONTENTS
> 
> The attached patch provides a consistent set of functions and operators
> based on the above examples, with operator precedence taken from
> postgres SQL parser:
> 
> - "boolean" type support is added, because it has been requested that
> pgbench should be as close as SQL expressions as possible. This induced
> some renaming as some functions & struct fields where named "num"
> because they where expecting an int or a double, but a boolean is not
> really a numeral.
> 
> - SQL comparisons (= <> < > <= >=) plus pg SQL "!=", which result in a
> boolean.
> 
> - SQL logical operators (and or not) on booleans.
> 
> - SQL bitwise operators taken from pg: | & # << >> ~.
> 
> - the mod SQL function as a synonym for %.
> 
> - ln and exp SQL functions.
> 
> - SQL CASE/END conditional structure.
> 
> The patch also includes documentation and additional tap tests.
> A test script is also provided.
> 
> This version is strict about typing, mimicking postgres behavior. For
> instance, using an int as a boolean results in an error. It is easy to
> make it more tolerant of types, which was the previous behavior before
> it was suggested to follow SQL behavior.
> 
> Together with another submitted patch about retrieving query results,
> the added capabilities allow implementing strictly conforming TPC-B
> transactions.

This patch applies cleanly and compiles at cccbdde with some whitespace
issues.

$ patch -p1 < ../other/pgbench-more-ops-funcs-9.patch
(Stripping trailing CRs from patch.)
patching file doc/src/sgml/ref/pgbench.sgml
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/exprparse.y
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/exprscan.l
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/pgbench.c
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/pgbench.h
(Stripping trailing CRs from patch.)
patching file src/bin/pgbench/t/002_pgbench.pl

Any reviewers want to have a look?

-- 
-David
da...@pgmasters.net




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-03-16 Thread David Steele
On 2/1/17 3:59 PM, Pavel Stehule wrote:
> Hi
> 
> 2017-01-24 21:33 GMT+01:00 Pavel Stehule :
> 
> Perhaps that's as simple as renaming all the existing _ns_*
> functions to _block_ and then adding support for pragmas...
> 
> Since you're adding cursor_options to PLpgSQL_expr it should
> probably be removed as an option to exec_*.
> 
> I have to recheck it. Some cursor options come from dynamic
> cursor variables and are related to the dynamic query - not the query
> that creates the query string.  
> 
> hmm .. so the current state is better due to using options like
> CURSOR_OPT_PARALLEL_OK
> 
> if (expr->plan == NULL)
>     exec_prepare_plan(estate, expr,
>                       (parallelOK ? CURSOR_OPT_PARALLEL_OK : 0) |
>                       expr->cursor_options);
> 
> These options are not a permanent feature of the expression, and so I
> cannot remove the cursor_options argument from exec_*
> 
> I did minor cleanup - removed cursor_options from plpgsql_var
> 
> + basic doc

This patch still applies cleanly and compiles at cccbdde.

Any reviewers want to have a look?

-- 
-David
da...@pgmasters.net




[HACKERS] Re: Floating point comparison inconsistencies of the geometric types

2017-03-16 Thread David Steele
On 2/1/17 6:36 AM, Emre Hasegeli wrote:
>> Got it, but if other people don't agree then this is going nowhere.
> 
> Yes.  As I wrote, I don't particularly care about functions like "is
> point on line".  I can prepare a patch to fix as many problems as
> possible around those operators by preserving the current epsilon.
> 
> I thought we were arguing about *all* operators.  Having containment
> and placement operators consistent with each other is the primary
> thing I am trying to fix.  Is removing the epsilon from them
> acceptable?
> 
> We can also stay away from changing operators like "~=" to minimise
> the compatibility break, if we keep the epsilon in some places.  We can
> instead document this operator as "close enough", and introduce
> another symbol for a true "the same" operator.
> 
> That said, there are some places where it is hard to decide whether to
> apply the epsilon.  For example, we can keep the epsilon to check for
> two lines being parallel, but then should we return the intersection
> point, or not?  Those issues may become more clear when I start
> working on it, if preserving epsilon for those operators is the way to
> go forward.
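
For reference, "the epsilon" at issue here is the fuzzy-comparison layer
the geometric types use throughout, e.g. (from
src/include/utils/geo_decls.h):

#define EPSILON			1.0E-06

#define FPzero(A)		(fabs(A) <= EPSILON)
#define FPeq(A,B)		(fabs((A) - (B)) <= EPSILON)
#define FPlt(A,B)		((B) - (A) > EPSILON)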

The current patches do not apply cleanly at cccbdde:

$ git apply ../other/0001-float-header-v03.patch
error: patch failed: contrib/btree_gist/btree_ts.c:1
error: contrib/btree_gist/btree_ts.c: patch does not apply
error: patch failed: contrib/postgres_fdw/postgres_fdw.c:26
error: contrib/postgres_fdw/postgres_fdw.c: patch does not apply
error: patch failed: src/backend/access/gist/gistutil.c:14
error: src/backend/access/gist/gistutil.c: patch does not apply
error: patch failed: src/backend/utils/adt/float.c:339
error: src/backend/utils/adt/float.c: patch does not apply
error: patch failed: src/backend/utils/adt/geo_ops.c:14
error: src/backend/utils/adt/geo_ops.c: patch does not apply
error: patch failed: src/backend/utils/misc/guc.c:68
error: src/backend/utils/misc/guc.c: patch does not apply
error: patch failed: src/include/utils/builtins.h:334
error: src/include/utils/builtins.h: patch does not apply

I don't believe this patch should be in the "Needs review" state anyway.
There are clearly a number of issues that need work and agreement.

Given that this thread has been idle since the beginning of February and
no resolution is likely for v10, I'm marking this submission "Returned
with Feedback".

-- 
-David
da...@pgmasters.net




Re: [HACKERS] Microvacuum support for Hash Index

2017-03-16 Thread Ashutosh Sharma
>>> Sure, but we are not clearing it conditionally.  I am not sure how,
>>> after recovery, it will be cleared if it gets set during normal operation.
>>> Moreover, btree already clears a similar flag during replay (refer to
>>> btree_xlog_delete).
>>
>> You were right. In case data checksums are enabled or wal_log_hints is
>> set to true, 'LH_PAGE_HAS_DEAD_TUPLES' will get WAL-logged and therefore
>> needs to be cleared on the standby as well.
>>
>
> I was thinking about what bad can happen if we don't clear this flag
> during replay; the main thing that comes to mind is that after crash
> recovery, if the flag is set, inserts on that page might need to
> traverse all the tuples on that page once the page is full, even if
> there are no dead tuples on that page.  It can be cleared later when
> there are dead tuples on that page and we actually delete them, but I
> don't think that is a price worth paying for not clearing the flag
> during replay.

Yes, you are absolutely correct. If we do not clear this flag during
replay then there is a possibility of _hash_doinsert() unnecessarily
scanning a page with no space, assuming that the page has got some
dead tuples in it, which is not true.

>
>> Attached is the patch that
>> clears this flag on standby during replay.
>>
>
> Don't you think, we should also clear it during the replay of
> XLOG_HASH_DELETE?  We might want to log the clear of flag along with
> WAL record for XLOG_HASH_DELETE.
>

Yes, it should be cleared. I completely missed this part in a hurry.
Thanks for informing. I have taken care of it in the attached v2
patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com


0001-Reset-LH_PAGE_HAS_DEAD_TUPLES-flag-on-standby-during.patch
Description: binary/octet-stream



Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2017-03-16 Thread Robert Haas
On Thu, Feb 2, 2017 at 12:24 PM, David Fetter  wrote:
>> Also, somebody who wants a check like that isn't necessarily going
>> to want "no WHERE clause" training wheels.  So you're going to need
>> to think about facilities to enable or disable different checks.
>
> This is just the discussion I'd hoped for.  I'll draft up a patch in
> the next day or two, reflecting what's gone so far.

It looks like this was never produced, and it's been over a month.  A
patch that hasn't been updated in over a month and doesn't have
complete consensus doesn't seem like something we should still be
thinking about committing in the second half of March, so I'm going to
mark this Returned with Feedback.

On the substance of the issue, I think there's no problem with having
a module like this, and I think it's fine if it only handles the
WHERE-less case in the first version.  Somebody can add more later if
they want.  But naming the module in a generic way so that it lends
itself to such additions seems like a pretty good plan.
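
To sketch what such a module might look like (illustrative names, not
the actual submission): a post-parse-analysis hook can reject the
WHERE-less cases in a few lines, and further checks could later be added
alongside it.

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
require_where_check(ParseState *pstate, Query *query)
{
	/* reject UPDATE and DELETE statements that have no qualifications */
	if ((query->commandType == CMD_UPDATE ||
		 query->commandType == CMD_DELETE) &&
		query->jointree->quals == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_SYNTAX_ERROR),
				 errmsg("%s requires a WHERE clause",
						query->commandType == CMD_UPDATE ?
						"UPDATE" : "DELETE")));

	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query);
}

void
_PG_init(void)
{
	prev_post_parse_analyze_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = require_where_check;
}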

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Remove objname/objargs split for referring to objects

2017-03-16 Thread Alvaro Herrera
Michael Paquier wrote:

> What are you using as CFLAGS? As both typenames should normally be
> set, what about initializing those fields with NULL and adding an
> assertion like the attached?

Actually, my compiler was right -- this was an ancient bug I introduced
in 9.5 (commit a61fd533), and this new warning was my compiler being a
bit smarter now for some reason.  The problem is that we were trying to
extract a String value from a TypeName node, which obviously doesn't
work very well.

I pushed a real fix, not just a compiler-silencer, along with a few
lines in object_address.sql to make sure it works properly.  Maybe we
need a few more test cases for other parts of pg_get_object_address.

Pushed fix, backpatched to 9.5.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-03-16 Thread David Steele
On 3/16/17 11:53 AM, Jon Nelson wrote:
> 
> 
> On Thu, Mar 16, 2017 at 9:59 AM, David Steele  wrote:
> 
> On 1/9/17 11:33 PM, Jon Nelson wrote:
> >
> > On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby  wrote:
> >
> > On 1/5/17 12:55 PM, Jonathon Nelson wrote:
> >
> > Attached please find a patch for PostgreSQL 9.4 which changes the
> > maximum amount of data that the wal sender will send at any point in
> > time from the hard-coded value of 128KiB to a user-controllable
> > value up
> > to 16MiB. It has been primarily tested under 9.4 but there has
> > been some
> > testing with 9.5.
> >
> >
> > To make sure this doesn't get lost, please add it to
> > https://commitfest.postgresql.org. Please verify the patch will
> > apply against current HEAD and pass make check-world.
> >
> >
> > Attached please find a revision of the patch, changed in the following ways:
> >
> > 1. removed a call to debug2.
> > 2. applies cleanly against master (as of
> > 8c5722948e831c1862a39da2bb5d793a6f2aabab)
> > 3. one small indentation fix, one small verbiage fix.
> > 4. switched to calculating the upper bound using XLOG_SEG_SIZE rather
> > than hard-coding 16384.
> > 5. the git author is - obviously - different.
> >
> > make check-world passes.
> > I have added it to the commitfest.
> > I have verified with strace that up to 16MB sends are being used.
> > I have verified that the GUC properly grumps about values greater than
> > XLOG_SEG_SIZE / 1024 or smaller than 4.
> 
> This patch applies cleanly on cccbdde and compiles.  However,
> documentation in config.sgml is needed.
> 
> The concept is simple enough though there seems to be some argument
> about whether or not the patch is necessary.  In my experience 128K
> should be more than large enough for a chunk size, but I'll buy the
> argument that libpq is acting as a barrier in this case.
> I'm marking this patch "Waiting on Author" for required documentation.
> 
> 
> Thank you for testing and the comments.  I have some updates:
> 
> - I set up a network at home and - in some very quick testing - was
> unable to observe any obvious performance difference regardless of chunk
> size
> - Before I could get any real testing done, one of the machines I was
> using for testing died and won't even POST, which has put a damper on
> said testing (as you might imagine).
> - There is a small issue with the patch: a lower-bound of 4 is not
> appropriate; it should be XLOG_BLCKSZ / 1024 (I can submit an updated
> patch if that is appropriate)
> - I am, at this time, unable to replicate the earlier results however I
> can't rule them out, either.

My recommendation is that we mark this patch "Returned with Feedback" to
allow you time to test and refine the patch.  You can resubmit once it
is ready.

Thanks,
-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-03-16 Thread Robert Haas
On Mon, Jan 9, 2017 at 4:27 PM, Jonathon Nelson  wrote:
> [I have not done a rigid analysis, here, but...]
>
> I *think* libpq is the culprit here.
>
> walsender says "Hey, libpq - please send (up to) 128KB of data!" and doesn't
> "return" until it's "sent". Then it sends more.  Regardless of the
> underlying cause (Nagle, TCP congestion control algorithms, umpteen
> different combos of hardware and settings, etc..) in almost every test I saw
> improvement (usually quite a bit). This was most easily observable with high
> bandwidth-delay product links, but my time in the lab is somewhat limited.

This seems plausible to me.  If it takes X amount of time for the
upper layers to put Y amount of data into libpq's buffers, that
imposes some limit on overall throughput.
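
For illustration (with invented numbers): if filling libpq's buffer with
each 128KiB chunk costs the upper layers 1 ms, throughput can never
exceed roughly 128 MB/s no matter how fast the link is, while a 16 MiB
chunk would raise that ceiling by a factor of 128.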

I mean, is it not sufficient to know that the performance improvement
is happening?  If it's happening, there's an explanation for why it's
happening.

It would be good if somebody else could try to reproduce these results, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: recursive json_populate_record()

2017-03-16 Thread David Steele
On 2/1/17 12:53 AM, Michael Paquier wrote:
> On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
>> Nikita Glukhov  writes:
>>> On 25.01.2017 23:58, Tom Lane wrote:
 I think you need to take a second look at the code you're producing
 and realize that it's not so clean either.  This extract from
 populate_record_field, for example, is pretty horrid:
>>
>>> But what if we introduce some helper macros like this:
>>
>>> #define JsValueIsNull(jsv) \
>>>  ((jsv)->is_json ? !(jsv)->val.json.str \
>>>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>>
>>> #define JsValueIsString(jsv) \
>>>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>>>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)
>>
>> Yeah, I was wondering about that too.  I'm not sure that you can make
>> a reasonable set of helper macros that will fix this, but if you want
>> to try, go for it.
>>
>> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
>> to go back to the manual every darn time to convince myself whether
>> that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
>> the reader (... or the author) having memorized C's precedence rules
>> in quite that much detail.  Extra parens help.
> 
> Moved to CF 2017-03 as discussion is going on and more review is
> needed on the last set of patches.
> 
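
(Concretely, per standard C precedence -- an illustration, not code from
the patch:)

/* || binds tighter than ?:, so these are different expressions: */
result = a ? b : (c || d);		/* what "a?b:c||d" actually means */
result = (a ? b : c) || d;		/* what a reader might guess instead */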

The current patches do not apply cleanly at cccbdde:

$ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
error: patch failed: src/backend/utils/adt/jsonb_util.c:328
error: src/backend/utils/adt/jsonb_util.c: patch does not apply
error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
error: patch failed: src/include/utils/jsonb.h:218
error: src/include/utils/jsonb.h: patch does not apply

In addition, it appears a number of suggestions have been made by Tom
that warrant new patches.  Marked "Waiting on Author".

-- 
-David
da...@pgmasters.net




Re: [HACKERS] [PATCH] guc-ify the formerly hard-coded MAX_SEND_SIZE to max_wal_send

2017-03-16 Thread Jon Nelson
On Thu, Mar 16, 2017 at 9:59 AM, David Steele  wrote:

> On 1/9/17 11:33 PM, Jon Nelson wrote:
> >
> > On Sat, Jan 7, 2017 at 7:48 PM, Jim Nasby  wrote:
> >
> > On 1/5/17 12:55 PM, Jonathon Nelson wrote:
> >
> > Attached please find a patch for PostgreSQL 9.4 which changes the
> > maximum amount of data that the wal sender will send at any point in
> > time from the hard-coded value of 128KiB to a user-controllable value up
> > to 16MiB. It has been primarily tested under 9.4 but there has been some
> > testing with 9.5.
> >
> >
> > To make sure this doesn't get lost, please add it to
> > https://commitfest.postgresql.org. Please verify the patch will
> > apply against current HEAD and pass make check-world.
> >
> >
> > Attached please find a revision of the patch, changed in the following ways:
> >
> > 1. removed a call to debug2.
> > 2. applies cleanly against master (as of
> > 8c5722948e831c1862a39da2bb5d793a6f2aabab)
> > 3. one small indentation fix, one small verbiage fix.
> > 4. switched to calculating the upper bound using XLOG_SEG_SIZE rather
> > than hard-coding 16384.
> > 5. the git author is - obviously - different.
> >
> > make check-world passes.
> > I have added it to the commitfest.
> > I have verified with strace that up to 16MB sends are being used.
> > I have verified that the GUC properly grumps about values greater than
> > XLOG_SEG_SIZE / 1024 or smaller than 4.
>
> This patch applies cleanly on cccbdde and compiles.  However,
> documentation in config.sgml is needed.
>
> The concept is simple enough though there seems to be some argument
> about whether or not the patch is necessary.  In my experience 128K
> should be more than large enough for a chunk size, but I'll buy the
> argument that libpq is acting as a barrier in this case.
>
> I'm marking this patch "Waiting on Author" for required documentation.
>

Thank you for testing and the comments.  I have some updates:

- I set up a network at home and - in some very quick testing - was unable
to observe any obvious performance difference regardless of chunk size
- Before I could get any real testing done, one of the machines I was using
for testing died and won't even POST, which has put a damper on said
testing (as you might imagine).
- There is a small issue with the patch: a lower bound of 4 is not
appropriate; it should be XLOG_BLCKSZ / 1024 (I can submit an updated patch
if that is appropriate; a sketch of the corrected entry follows below)
- I am, at this time, unable to replicate the earlier results however I
can't rule them out, either.
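For concreteness, here is a rough sketch of what the guc.c entry might look
like with the corrected bounds.  The GUC name, variable, and description
wording are illustrative only, not the patch's final text:

    {
        {"max_wal_send", PGC_SIGHUP, REPLICATION_SENDING,
            gettext_noop("Sets the maximum amount of WAL the walsender "
                         "sends in one call, in kilobytes."),
            NULL,
            GUC_UNIT_KB
        },
        &max_wal_send,
        128,                    /* default: the historical hard-coded 128KiB */
        XLOG_BLCKSZ / 1024,     /* corrected lower bound */
        XLOG_SEG_SIZE / 1024,   /* upper bound: one WAL segment */
        NULL, NULL, NULL
    },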


--
Jon


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-16 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> I don't think there's a danger similar to f0c7b789a here, because the
>> "caller" (i.e. the node that needs the expression's result) expects
>> resvalue/null to be overwritten.

> Yeah, that's what I thought when I wrote the broken code in ExecEvalCase,
> too.  It was wrong.

Along the same line, I notice that you've got some expr step types
overwriting their own input, the various flavors of EEOP_BOOLTEST for
example.  Maybe that's all right but it doesn't really give me a warm
feeling, especially when other single-argument operations like
EEOP_BOOL_NOT_STEP are done differently.  Again, I think a clear
explanation of the design is essential to allow people to reason about
whether this sort of trickery is safe.
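For concreteness, the in-place pattern at issue looks roughly like the
sketch below (illustrative only, not the patch's actual code; the step name
follows the patch's EEOP_* style).  The step reads resvalue/resnull and then
overwrites those same slots with its result:

    case EEOP_BOOLTEST_IS_TRUE:
        if (*op->resnull)
        {
            /* NULL IS TRUE evaluates to false, and the result is not null */
            *op->resvalue = BoolGetDatum(false);
            *op->resnull = false;
        }
        /* otherwise resvalue already holds the correct boolean; keep it */
        break;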

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Push down more full joins in postgres_fdw

2017-03-16 Thread David Steele
On 1/30/17 6:30 AM, Etsuro Fujita wrote:
> On 2017/01/27 21:25, Etsuro Fujita wrote:
>> On 2017/01/27 20:04, Ashutosh Bapat wrote:
>>> I think we should pick up your patch on
>>> 27th December, update the comment per your mail on 5th Jan. I will
>>> review it once and list down the things left to committer's judgement.
> 
>> Sorry, I started thinking we went in the wrong direction.  I added
>> build_subquery_tlists to deparseSelectStmtForRel, which creates a tlist for
>> each subquery present in a given join tree prior to deparsing a whole
>> remote query.  But that's nothing but overhead: we need to create a
>> tlist for the top-level query because we use it as fdw_scan_tlist, but
>> not for subqueries, and we need to create retrieved_attrs for the
>> top-level query while deparsing the targetlist in
>> deparseExplicitTargetList, but not for subqueries.  So we need
>> some work to avoid such useless overhead.
> 
> I think we can avoid the useless overhead by adding a new function,
> deparseSubqueryTargetList, that deparses expressions in the given
> relation's reltarget, not the tlist, as a SELECT clause of the subquery
> representing the given relation.  That would also allow us to make the
> 1-to-1 relationship between the subquery output columns and their
> aliases more explicit, which was your original comment.  Please find
> attached the new version.  (The patch doesn't need the patch to avoid
> duplicate construction of the tlist, discussed upthread.)
> 
> Other changes:
> * I went back to make_outerrel_subquery and make_innerrel_subquery,
> which are flags to indicate whether to deparse the input relations as
> subqueries.  is_subquery_rel would work well for handling the cases of
> full joins with restrictions on the input relations, but I noticed that
> that wouldn't work well when extending to handle the cases where we
> deparse the input relations as subqueries to evaluate PHVs remotely.
> * Since appendSubqueryAlias in the previous patch is pretty small, I
> included the code into deparseRangeTblRef.
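To illustrate the alias point above, here is a rough sketch (not the
patch's actual code) of how deparseRangeTblRef might emit an explicit
column-alias list for a subquery; the "s%d"/"c%d" prefixes and the fpinfo
fields are assumptions for illustration:

    /* close the subquery and attach relation and column aliases */
    appendStringInfo(buf, ") s%d", fpinfo->relation_index);
    if (fpinfo->ncols > 0)
    {
        int     i;

        appendStringInfoChar(buf, '(');
        for (i = 1; i <= fpinfo->ncols; i++)
            appendStringInfo(buf, "%sc%d", (i > 1) ? ", " : "", i);
        appendStringInfoChar(buf, ')');
    }

This would deparse a joined relation as, say, "(SELECT ...) s4(c1, c2)",
making the 1-to-1 mapping from subquery output columns to aliases explicit.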

This patch does not apply cleanly at cccbdde:

$ patch -p1 < ../other/postgres-fdw-subquery-support-v15.patch
patching file contrib/postgres_fdw/deparse.c
Hunk #11 succeeded at 1371 (offset -3 lines).
Hunk #12 succeeded at 1419 (offset -3 lines).
Hunk #13 succeeded at 1486 (offset -3 lines).
Hunk #14 succeeded at 2186 (offset -3 lines).
Hunk #15 succeeded at 3082 (offset -3 lines).
patching file contrib/postgres_fdw/expected/postgres_fdw.out
patching file contrib/postgres_fdw/postgres_fdw.c
Hunk #1 succeeded at 669 (offset 1 line).
Hunk #2 succeeded at 1245 (offset -1 lines).
Hunk #3 succeeded at 2557 (offset -1 lines).
Hunk #4 succeeded at 4157 (offset 3 lines).
Hunk #5 succeeded at 4183 (offset 3 lines).
Hunk #6 succeeded at 4212 (offset 3 lines).
Hunk #7 succeeded at 4315 (offset 3 lines).
patching file contrib/postgres_fdw/postgres_fdw.h
patching file contrib/postgres_fdw/sql/postgres_fdw.sql

Since these are just offsets I'll leave the patch as "Needs review" but
an updated patch would be appreciated.

Thanks,
-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw bug in 9.6

2017-03-16 Thread David Steele
On 1/23/17 4:56 AM, Etsuro Fujita wrote:
> On 2017/01/20 14:24, Ashutosh Bapat wrote:
>> On Thu, Jan 19, 2017 at 10:38 PM, Robert Haas wrote:
>>> On Wed, Jan 18, 2017 at 10:26 PM, Ashutosh Bapat wrote:
>>> Yes, I think that's broadly the approach Tom was recommending.
>>>
>>>> I don't have a problem with that approach, but the latest patch has
>>>> the following problems.
>>>>
>>>> 2. There are many cases where the new function would return no local
>>>> path and hence postgres_fdw doesn't push down the join [1]. This will
>>>> be a performance regression compared to 9.6.
>>>
>>> Some, or many?  How many?
> 
>> AFAIU, the problem esp. with parameterized paths is this: If rel1 is
>> required to be parameterized by rel2 (because of lateral references?),
>> a nested loop join with rel2 as outer relation and rel1 as inner
>> relation is possible. postgres_fdw and hence this function, however
>> doesn't consider all the possible join combinations and thus when this
>> function is presented with rel1 as outer and rel2 as inner would
>> refuse to create a path. More generally, while creating local paths,
>> we try many combinations of participating relations, some of which do
>> not produce local paths and some of them do (AFAIK, it happens in case
>> of lateral references, but there might be other cases). postgres_fdw,
>> however considers only a single combination, which may or may not have
>> produced local path. In such a case, postgres_fdw would create a
>> foreign join path but won't get a local path and thus bail out.
> 
> I had second thoughts; one idea for building parameterized paths to
> avoid this issue is, as in postgresGetForeignPaths, to (1) identify
> which outer relations could supply additional safe-to-send-to-remote
> join clauses, and (2) build a parameterized path and its alternative
> local-join path for each such outer relation.  In #1, we could look at
> the join relation's ppilist to identify interesting outer relations.  In
> #2, the local-join path corresponding to each such outer relation could
> be built from the cheapest-total paths for the outer and inner
> relations, using CreateLocalJoinPath, so that the result path has the
> outer relation as its required_outer.
> 
>>> I'm a bit sketchy about this kind of thing, though:
>>>
>>> ! if (required_outer)
>>>   {
>>> ! bms_free(required_outer);
>>> ! return NULL;
>>>   }
>>>
>>> I don't understand what would stop that from making this fail to
>>> generate parameterized paths in some cases in which it would be legal
>>> to do so, and the comment is very brief.
> 
>> I am not so much worried about this though :).
>> GetExistingLocalJoinPath() also does not handle that case. The new
>> function is not making it worse in this case.
>> 731 /* Skip parameterised paths. */
>> 732 if (path->param_info != NULL)
>> 733 continue;
> 
> One idea to remove such extra checks is to pass the required_outer to
> CreateLocalJoinPath like the attached.  As described above, the caller
> would have that set before calling that function, so we wouldn't need to
> calculate that set in that function.
> 
> Other changes:
> 
> * Also modified CreateLocalJoinPath so that we pass outer/inner paths,
> not outer/inner rels, because it would be more flexible for the FDW to
> build the local-join path from paths it chose.
> * Fixed a bug that I missed the RIGHT JOIN case in CreateLocalJoinPath.
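Reading the above, the implied signature would be roughly the following
(hypothetical; the jointype and extra parameters are my assumptions, and
the actual patch may differ):

    extern Path *CreateLocalJoinPath(PlannerInfo *root,
                                     RelOptInfo *joinrel,
                                     Path *outer_path,
                                     Path *inner_path,
                                     JoinType jointype,
                                     JoinPathExtraData *extra,
                                     Relids required_outer);

With required_outer supplied by the caller, the function no longer needs to
skip parameterized paths or recompute the set itself.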

This patch does not apply cleanly at cccbdde:

$ patch -p1 < ../other/epqpath-for-foreignjoin-5.patch
patching file contrib/postgres_fdw/expected/postgres_fdw.out
Hunk #6 succeeded at 4134 (offset 81 lines).
Hunk #7 succeeded at 4275 (offset 81 lines).
patching file contrib/postgres_fdw/postgres_fdw.c
Hunk #1 succeeded at 4356 (offset 3 lines).
Hunk #2 succeeded at 4386 (offset 3 lines).
patching file contrib/postgres_fdw/sql/postgres_fdw.sql
patching file doc/src/sgml/fdwhandler.sgml
patching file src/backend/foreign/foreign.c
Hunk #2 FAILED at 696.
1 out of 2 hunks FAILED -- saving rejects to file
src/backend/foreign/foreign.c.rej
patching file src/backend/optimizer/path/joinpath.c
Hunk #1 FAILED at 25.
Hunk #2 succeeded at 109 (offset 27 lines).
Hunk #3 succeeded at 134 (offset 27 lines).
Hunk #4 succeeded at 197 (offset 27 lines).
Hunk #5 succeeded at 208 (offset 27 lines).
Hunk #6 succeeded at 225 (offset 27 lines).
Hunk #7 succeeded at 745 (offset 113 lines).
Hunk #8 FAILED at 894.
Hunk #9 succeeded at 1558 (offset 267 lines).
Hunk #10 succeeded at 1609 (offset 268 lines).
2 out of 10 hunks FAILED -- saving rejects to file
src/backend/optimizer/path/joinpath.c.rej
patching file src/include/foreign/fdwapi.h
Hunk #1 succeeded at 237 (offset 2 lines).
patching file src/include/nodes/relation.h
Hunk #1 succeeded at 914 (offset 10 lines).
Hunk #2 succeeded at 2057 (offset 47 lines).

Marked "Waiting on Author".

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
