Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Thomas Munro
On Wed, Feb 24, 2016 at 5:48 PM, Thomas Munro
 wrote:
> Here is a first pass at that. [...]

On Wed, Feb 24, 2016 at 1:23 AM, Robert Haas  wrote:
> file_fdw is parallel-safe, ...

And here is a patch to apply on top of the last one, to make file_fdw
return true.  But does it really work correctly under parallelism?

-- 
Thomas Munro
http://www.enterprisedb.com


file-fdw-parallel-safe.patch
Description: Binary data



Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-23 Thread Michael Paquier
On Wed, Feb 24, 2016 at 7:26 AM, Tomas Vondra wrote:
> 1) I'm not quite sure why the patch adds missing_ok to fsync_fname()? The
> only place where we use missing_ok=true is in rename_safe, where right at
> the beginning we do this:
>
> fsync_fname(newfile, false, true);
>
> I.e. we're fsyncing the rename target first, in case it exists. But that
> seems to be conflicting with the comments in xlog.c where we explicitly
> state that the target file should not exist. Why should it be OK to call
> rename_safe() when the target already exists? If this really is the right
> thing to do, it should be explained in the comment above rename_safe(),
> probably.

The point is to mimic rename(), which can manage the case where the
new entry exists, and to look for a consistent on-disk behavior. It is
true that the new argument of fsync_fname is actually not necessary,
we could just check for the existence of the new entry with stat(),
and perform an fsync if it exists.
I have added the following comment in rename_safe():
+   /*
+* First fsync the old entry and new entry, if this one exists, to ensure
+* that they are properly persistent on disk. Calling this routine with
+* an existing new target file is fine, rename() will first remove it
+* before performing its operation.
+*/
How does that look?
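
For readers following along, here is a minimal, self-contained POSIX sketch of the ordering being discussed (plain system calls under an invented name, not the actual patch; error handling simplified):

#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* fsync a file or directory by path; returns 0 on success, -1 on error */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fsync(fd) < 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Durable rename: flush the source, flush the target if it already exists,
 * rename, then flush the new name and its parent directory so the rename
 * itself survives a crash.
 */
int
rename_durable(const char *oldfile, const char *newfile)
{
    struct stat st;
    char        parent[1024];

    if (fsync_path(oldfile) < 0)
        return -1;
    if (stat(newfile, &st) == 0 && fsync_path(newfile) < 0)
        return -1;

    if (rename(oldfile, newfile) < 0)
        return -1;

    if (fsync_path(newfile) < 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy */
    strncpy(parent, newfile, sizeof(parent) - 1);
    parent[sizeof(parent) - 1] = '\0';
    return fsync_path(dirname(parent));
}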

> 2) If rename_safe/link_safe are meant as crash-safe replacements for
> rename/link, then perhaps we should use the same signatures, including the
> "const" pointer parameters. So while currently the signatures look like
> this:
>
> int rename_safe(char *oldfile, char *newfile);
> int link_safe(char *oldfile, char *newfile);
>
> it should probably look like this
>
> int rename_safe(const char *oldfile, const char *newfile);
> int link_safe(const char *oldfile, const char *newfile);
>
> I've noticed this in CheckPointReplicationOrigin() where the current code
> has to cast the parameters to (char*) to silence the compiler.

I recall considering that; the reason I did not do so is that
fsync_fname() does not do it either, because OpenTransientFile() is
using FileName, which is not defined as a constant. In the end we'd
finish with one or two casts anyway. Still, changing fsync_fname does
seem to make sense: looking at fsync_fname_ext, OpenTransientFile()
already uses an explicit cast for the file name.

> 3) Both rename_safe and link_safe do this at the very end:
>
> fsync_parent_path(newfile);
>
> That however assumes both the oldfile and newfile are placed in the same
> directory - otherwise we'd fsync only one of them. I don't think we have a
> place where we're renaming files between directories (or do we), so we're OK
> with respect to this. But it seems like a good idea to defend against this,
> or at least mention that in the comments.

No, we don't have a code path where a file is renamed between
different directories, which is why the code does it this way. The
comment is a good addition to have though, so I have added it. I guess
we could complicate this patch further to check whether the parent
directories of the new and old entries match, and fsync both of them
if not, but I'd rather keep things simple.

> 4) nitpicking: There are some unnecessary newlines added/removed in
> RemoveXlogFile, XLogArchiveForceDone.

Fixed. I had missed those three.
-- 
Michael
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index f6da673..67e0f73 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -418,19 +418,20 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	TLHistoryFilePath(path, newTLI);
 
 	/*
-	 * Prefer link() to rename() here just to be really sure that we don't
-	 * overwrite an existing file.  However, there shouldn't be one, so
-	 * rename() is an acceptable substitute except for the truly paranoid.
+	 * Prefer link_safe() to rename_safe() here just to be really sure that
+	 * we don't overwrite an existing file.  However, there shouldn't be one,
+	 * so rename_safe() is an acceptable substitute except for the truly
+	 * paranoid.
 	 */
 #if HAVE_WORKING_LINK
-	if (link(tmppath, path) < 0)
+	if (link_safe(tmppath, path) < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not link file \"%s\" to \"%s\": %m",
 		tmppath, path)));
 	unlink(tmppath);
 #else
-	if (rename(tmppath, path) < 0)
+	if (rename_safe(tmppath, path) < 0)
 		ereport(ERROR,
 (errcode_for_file_access(),
  errmsg("could not rename file \"%s\" to \"%s\": %m",
@@ -508,19 +509,20 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 	TLHistoryFilePath(path, tli);
 
 	/*
-	 * Prefer link() to rename() here just to be really sure that we don't
-	 * overwrite an existing logfile.  However, there shouldn't be one, so
-	 * rename() is an acceptable substitute except for the truly paranoid.
+	 * Prefer link_safe() to rename_safe() here just to be really sur

Re: [HACKERS] Allow to specify (auto-)vacuum cost limits relative to the database/cluster size?

2016-02-23 Thread Robert Haas
On Tue, Jan 12, 2016 at 6:12 PM, Andres Freund  wrote:
> right now the defaults for autovacuum cost limiting are so low that they
> regularly cause problems for our users. It's not exactly obvious that
> any installation above a couple gigabytes definitely needs to change
> autovacuum_vacuum_cost_delay &
> autovacuum_vacuum_cost_limit/vacuum_cost_limit. Especially
> anti-wraparound/full table vacuums basically take forever with the
> default settings.
>
> On the other hand we don't want a database of a couple hundred megabytes
> to be vacuumed as fast as possible and trash the poor tiny system. So we
> can't just massively increase the limits by default; although I believe
> some default adjustment would be appropriate anyway.
>
> I wonder if it makes sense to compute the delays / limits in relation to
> either cluster or relation size. If you have a 10 TB table, you
> obviously don't want to scan with a few megabytes a second, which the
> default settings will do for you. With that in mind we could just go for
> something like the autovacuum_*_scale_factor settings. But e.g. for
> partitioned workloads with hundreds of tables in the couple-gigabyte
> range that'd not work that well.
>
> Somehow computing the speed in relation to the cluster/database size is
> probably possible, but I wonder how we can do so without constantly
> re-computing something relatively expensive?
>
> Thoughts?

Thanks for bringing this up.  I fully agree we should try to do
something about this.  This comes up quite regularly in EnterpriseDB
support discussions, and I'm sure lots of other people have problems
with it too.  It seems to me that what we really want to do is try to
finish vacuuming the table before we again need to vacuum the table.
For the sake of simplicity, just consider the anti-wraparound case for
a second.  If it takes three days to vacuum the table and we consume
200 million XIDs in two days, we are pretty clearly not vacuuming fast
enough.

I think we should do something similar to what we do for checkpoints.
We estimate when the table will next need vacuuming based on the rate
of XID advancement and the rate at which dead tuples are being
created.  We can also estimate what percentage of the relation we've
vacuumed and derive some estimate of when we'll be done - perhaps
assuming only one index pass, for the sake of simplicity.  If we're
behind, we should vacuum faster to try to catch up.  We could even try
to include some fudge factor in the calculation - e.g. if the time
until the next vacuum is estimated to be 30 hours from the start of
the current vacuum, we try to make the current vacuum finish in no
more than 75% * 30 hours = 22.5 hours.
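
To make the arithmetic concrete, here is a toy sketch of the kind of pacing
calculation described above (all numbers and names invented for
illustration; this is not autovacuum code):

#include <stdio.h>

int
main(void)
{
    double hours_until_next_vacuum = 30.0;  /* estimated from XID/dead-tuple rates */
    double slack_factor = 0.75;             /* finish with 25% margin */
    double target_hours = slack_factor * hours_until_next_vacuum;  /* 22.5 */

    double fraction_done = 0.20;            /* estimated progress through the table */
    double elapsed_hours = 8.0;
    double projected_hours = elapsed_hours / fraction_done;        /* 40.0 */

    /* If we project to overshoot the target, speed up proportionally,
     * e.g. by reducing the cost delay or raising the cost limit. */
    double speedup_needed = projected_hours / target_hours;

    printf("target %.1f h, projected %.1f h, speedup needed %.2fx\n",
           target_hours, projected_hours, speedup_needed);
    return 0;
}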

I think this is better than your proposal to scale it just based on
the size of the relation, because it may be fine for the vacuum to run
slowly if we're creating very few dead tuples and consuming very few
XIDs.  IME, there's one very specific scenario where the wheels come
off, and that's when the table doesn't get fully vacuumed before it's
due to be vacuumed again.  Of course, anything we did here wouldn't be
perfect - it would all be based on estimates - but I bet we could make
things a lot better.  There's an even more global version of this
problem: each individual table may get vacuumed quickly enough to
finish before it is due to be vacuumed again, yet there are so many
large tables that overall we don't make enough progress.  It would be
nice to fix that, too, but even something simple that ignored that
more global problem would help a lot of people.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] significant semi join overestimates (with CTEs)

2016-02-23 Thread Robert Haas
On Mon, Feb 22, 2016 at 4:32 AM, Tomas Vondra
 wrote:
> Firstly, we could clamp the join cardinality estimate to the size of
> cartesian product, eliminating at least the most obvious over-estimates.
>
> Secondly, I think it'd be nice to somehow eliminate the sudden jumps in the
> estimates - I'm not quite sure why we need the three different formulas in
> eqjoinsel_semi, so maybe we can get rid of some of them?

I don't have any particularly intelligent commentary on the second
point right at the moment, but I emphatically agree with you on the
first one.  I've seen that problem a number of times and it seems
completely ludicrous to me.
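
For illustration, the first suggestion amounts to something like this (a
hypothetical helper, not the actual selectivity code):

/*
 * Minimal sketch of the proposed clamp: a join cannot return more rows
 * than the cartesian product of its inputs.
 */
static double
clamp_join_estimate(double est_rows, double outer_rows, double inner_rows)
{
    double cartesian = outer_rows * inner_rows;

    return (est_rows > cartesian) ? cartesian : est_rows;
}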

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-23 Thread Robert Haas
On Wed, Feb 24, 2016 at 9:17 AM, Tomas Vondra
 wrote:
>> Well, turns out there's a quite significant difference, actually. The
>> index sizes I get (quite stable after multiple runs):
>>
>> 9.5 : 2428 MB
>> 9.6 + alone cleanup : 730 MB
>> 9.6 + pending lock : 488 MB
>>
>> So that's quite a significant difference, I guess. The load duration for
>> each version looks like this:
>>
>> 9.5 : 1415 seconds
>> 9.6 + alone cleanup : 1310 seconds
>> 9.6 + pending lock  : 1380 seconds
>>
>> I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
>> reduction of index size.
>>
>> The size of the index on 9.5 after VACUUM FULL (so pretty much the
>> smallest index possible) is 440MB, which suggests the "pending lock"
>> patch does a quite good job.
>>
>> The gin_metapage_info at the end of one of the runs (pretty much all the
>> runs look exactly the same) looks like this:
>>
>>                     pending lock   alone cleanup      9.5
>>   ----------------------------------------------------------
>>   pending_head                2               2    310460
>>   pending_tail              338             345    310806
>>   tail_free_size            812             812       812
>>   n_pending_pages           330             339       347
>>   n_pending_tuples         1003            1037      1059
>>   n_total_pages               2               2         2
>>   n_entry_pages               1               1         1
>>   n_data_pages                0               0         0
>>   n_entries                   0               0         0
>>   version                     2               2         2
>>
>> So almost no difference, except for the pending_* attributes, and even
>> in that case the values are only different for the 9.5 branch. Not sure what
>> conclusion to draw from this - maybe it's necessary to collect the
>> function input while the load is running (but that'd be tricky to
>> process, I guess).
>
> Are we going to do anything about this? While the bug is present in 9.5 (and
> possibly other versions), fixing it before 9.6 gets out seems important
> because reproducing it there is rather trivial (while I've been unable to
> reproduce it on 9.5).

I'm not going to promise to commit anything here, because GIN is not
usually my area, but could you provide a link to the email that
contains the patch you think should be committed?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why vacuum_freeze_table_age etc. doc in "Statement Behavior" section?

2016-02-23 Thread Michael Paquier
On Wed, Feb 24, 2016 at 2:00 PM, Tatsuo Ishii  wrote:
> The explanations for vacuum_freeze_table_age etc. are in the section
> "Statement Behavior", which is a subsection of "Client Connection
> Defaults". To me vacuum_freeze_table_age etc. are totally unrelated
> to "Client Connection Defaults".
>
> I think "Resource Consumption" section is more appropriate for their
> place. There's already a section "Cost-based Vacuum Delay". Maybe we
> can add a new section for below under "Resource Consumption" something
> like "Managing Vacuum Freeze".
>
> vacuum_freeze_min_age
> vacuum_freeze_table_age
> vacuum_multixact_freeze_min_age
> vacuum_multixact_freeze_table_age

Those are parameters controlling the way the VACUUM command behaves,
which is why they are placed where they are now - but so are the
cost-based parameters. So +1 for a new section under "Resource
Consumption".
-- 
Michael




[HACKERS] Why vacuum_freeze_table_age etc. doc in "Statement Behavior" section?

2016-02-23 Thread Tatsuo Ishii
The explanations for vacuum_freeze_table_age etc. are in the section
"Statement Behavior", which is a subsection of "Client Connection
Defaults". To me vacuum_freeze_table_age etc. are totally unrelated
to "Client Connection Defaults".

I think "Resource Consumption" section is more appropriate for their
place. There's already a section "Cost-based Vacuum Delay". Maybe we
can add a new section for below under "Resource Consumption" something
like "Managing Vacuum Freeze".

vacuum_freeze_min_age
vacuum_freeze_table_age
vacuum_multixact_freeze_min_age
vacuum_multixact_freeze_table_age

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Thomas Munro
On Tue, Feb 23, 2016 at 6:45 PM, Robert Haas  wrote:
> On Tue, Feb 23, 2016 at 2:06 AM, Tom Lane  wrote:
>> Robert Haas  writes:
 Foreign tables are supposed to be categorically excluded from
 parallelism.  Not sure why that's not working in this instance.
>>
>> BTW, I wonder where you think that's supposed to be enforced, because
>> I sure can't find any such logic.
>>
>> I suppose that has_parallel_hazard() would be the logical place to
>> notice foreign tables, but it currently doesn't even visit RTEs,
>> much less contain any code to check if their tables are foreign.
>> Or did you have another place in mind to do that?
>
> RTEs are checked in set_rel_consider_parallel(), and I thought there
> was a check there related to foreign tables, but there isn't.  Oops.
> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
> blindly assume that foreign scans are not parallel-safe, but we can't
> blindly assume the opposite either.  Maybe we should assume that the
> foreign scan is parallel-safe only if one or more of the new methods
> introduced by the aforementioned commit are set, but actually that
> doesn't seem quite right.  That would tell us whether the scan itself
> can be parallelized, not whether it's safe to run serially but within
> a parallel worker.  I think maybe we need a new FDW API that gets
> called from set_rel_consider_parallel() with the root, rel, and rte as
> arguments and which can return a Boolean.  If the callback is not set,
> assume false.

Here is a first pass at that.  The patch adds
IsForeignScanParallelSafe to the FDW API.   postgres_fdw returns false
(unnecessary but useful to verify that the regression test breaks if
you change it to true), and others don't provide the function so fall
back to false.
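
Roughly, the default-to-false behaviour could be pictured like this (simplified
stub types for illustration only, not the patch itself):

#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-ins for planner structures; the real PlannerInfo,
 * RelOptInfo and RangeTblEntry are much larger.
 */
typedef struct PlannerInfoStub PlannerInfoStub;
typedef struct RelOptInfoStub RelOptInfoStub;
typedef struct RangeTblEntryStub RangeTblEntryStub;

typedef struct FdwRoutineStub
{
    /* Optional callback: NULL if the FDW does not provide it. */
    bool        (*IsForeignScanParallelSafe) (PlannerInfoStub *root,
                                              RelOptInfoStub *rel,
                                              RangeTblEntryStub *rte);
} FdwRoutineStub;

/* Assume "not parallel-safe" unless the FDW explicitly opts in. */
static bool
foreign_rel_parallel_safe(FdwRoutineStub *fdwroutine,
                          PlannerInfoStub *root,
                          RelOptInfoStub *rel,
                          RangeTblEntryStub *rte)
{
    if (fdwroutine == NULL || fdwroutine->IsForeignScanParallelSafe == NULL)
        return false;
    return fdwroutine->IsForeignScanParallelSafe(root, rel, rte);
}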

I suspect there may be opportunities to return true even if snapshots
and uncommitted reads aren't magically coordinated among workers.  For
example: users of MongoDB-type systems and text files have no
expectation of either snapshot semantics or transaction isolation in
the first place, so doing stuff in parallel won't be any less safe
than usual as far as visibility is concerned; postgres_fdw could in
theory export/import snapshots and allow parallelism in limited cases
if it can somehow prove there have been no uncommitted writes; and
non-MVCC/snapshot RDBMSs might be OK in lower isolation levels if you
haven't written anything or have explicitly opted in to uncommitted
reads (otherwise you'd risk invisible deadlock against the leader when
trying to read what it has written).

Please also find attached a tiny patch to respect TEMP_CONFIG for contribs.

-- 
Thomas Munro
http://www.enterprisedb.com


temp-config-for-contribs.patch
Description: Binary data


fdw-parallel-safe-api.patch
Description: Binary data



Re: [HACKERS] GIN data corruption bug(s) in 9.6devel

2016-02-23 Thread Tomas Vondra

Hi,

On 01/05/2016 10:38 AM, Tomas Vondra wrote:

Hi,


...


There shouldn't be a difference between the two approaches (although I
guess there could be if one left a larger pending list than the other,
as pending lists are very space inefficient), but since you included
9.5 in your test I thought it would be interesting to see how either
patched version under 9.6 compared to 9.5.


Well, turns out there's a quite significant difference, actually. The
index sizes I get (quite stable after multiple runs):

9.5 : 2428 MB
9.6 + alone cleanup : 730 MB
9.6 + pending lock : 488 MB

So that's quite a significant difference, I guess. The load duration for
each version looks like this:

9.5 : 1415 seconds
9.6 + alone cleanup : 1310 seconds
9.6 + pending lock  : 1380 seconds

I'd say I'm happy with sacrificing ~5% of time in exchange for ~35%
reduction of index size.

The size of the index on 9.5 after VACUUM FULL (so pretty much the
smallest index possible) is 440MB, which suggests the "pending lock"
patch does a quite good job.

The gin_metapage_info at the end of one of the runs (pretty much all the
runs look exactly the same) looks like this:

                    pending lock   alone cleanup      9.5
  ----------------------------------------------------------
  pending_head                2               2    310460
  pending_tail              338             345    310806
  tail_free_size            812             812       812
  n_pending_pages           330             339       347
  n_pending_tuples         1003            1037      1059
  n_total_pages               2               2         2
  n_entry_pages               1               1         1
  n_data_pages                0               0         0
  n_entries                   0               0         0
  version                     2               2         2

So almost no difference, except for the pending_* attributes, and even
in that case the values are only different for the 9.5 branch. Not sure what
conclusion to draw from this - maybe it's necessary to collect the
function input while the load is running (but that'd be tricky to
process, I guess).


Are we going to do anything about this? While the bug is present in 9.5
(and possibly other versions), fixing it before 9.6 gets out seems 
important because reproducing it there is rather trivial (while I've 
been unable to reproduce it on 9.5).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:
> On 23 February 2016 at 16:43, Bruce Momjian  wrote:
> 
> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
> 
> I think we need to be very careful to understand that "FDWs and Sharding" is
> one tentative proposal amongst others, not a statement of direction for the
 --

What other directions are proposed to add sharding to the existing
Postgres code?  If there are, I have not heard of them.  Or are they
only (regularly updated?) forks of Postgres?

> PostgreSQL project since there is not yet any universal agreement.

As I stated clearly, we are going in the FDW direction because improving
FDWs has uses beyond sharding, and once it is done we can see how well
it works for sharding.

> We know Postgres XC/XL works, and scales
> 
> 
> Agreed. 
> 
> In contrast, the FDW/sharding approach is as-yet unproven, and significantly
> without any detailed technical discussion of the exact approach and how it
> would work, even after more than 6 months since we first heard of it openly.
> Since we don't know how it will work, we have no idea how long it will take
> either, or even if it ever will.

Yep.

> I'd like to see discussion of the details in presentation/wiki form and an
> initial prototype, with measurements. Without these things we are still
> just at the speculation stage. Some alternate proposals are also at that
> stage.

Uh, what "alternate proposals"?

My point was that we know XC/XL works, but there is too much code change
for us, so maybe FDWs will make built-in sharding possible/easier.

> , but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
> 
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
> 
> 
> It's never been our policy to try to include major projects in single code
> drops. Any move of XL/XC code into PostgreSQL core would need to be done
> piece by piece across many releases. XL is definitely too big for the
> elephant to eat in one mouthful.

Is there any plan to move the XL/XC code into Postgres?  If so, I have
not heard of it.  I thought everyone agreed it was too much code change,
which is why it is a separate code tree.  Is that incorrect?

> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres. 
> 
> 
> And if the FDW approach doesn't work, that won't be part of PostgreSQL core
> either...

Uh, duh.  Yeah, that's what I said.  What is your point?  I said we
don't know if it will work, as you quoted below:

> I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
> 
> 
> This is exactly the wrong time to discuss this, since we are days away from
> the final deadline for PostgreSQL 9.6 and the community should be focusing
> on that for the next few months, not futures.

I posted this because of the discussion at the FOSDEM meeting, and to
address the questions you asked in that meeting.  I even told you last
week on IM that I was going to post this for that stated purpose.  I
didn't pick the time at random.

> What I notice is that when Greenplum announced it would publish as open source
> its modified version of Postgres, there was some scary noise made immediately
> about that concerning patents etc..

> Now, Postgres-XL 9.5 is recently announced and we see another scary sounding
> pronouncement about that *maybe* it won't be included in core. While the
> comments made are true, they do not solely apply to XC/XL, in fact the
> uncertainty applies to all approaches equally since notably we have
> approximately five proposals for future designs.
> 
> These comments, given their timing and nature could easily cause "Fear,
> Uncertainty and Doubt" in people seeing this. FUD is also the name of a sales
> technique designed to undermine proposals. I hope and presume it was not the
> intention and reason for discussing uncertainty now and earlier.

Oh, I absolutely did this as a way to undermine what _everyone_ else is
doing?  Is there another way to behave?

I find this insulting.  Others made the same remarks when I questioned
the patents, and e

Re: [HACKERS] [JDBC] JDBC behaviour

2016-02-23 Thread Craig Ringer
On 23 February 2016 at 22:46, Tom Lane  wrote:

> Craig Ringer  writes:
> > On 23 February 2016 at 21:34, Robert Haas  wrote:
> >> I believe Sridhar is imagining that someday "set autocommit to false"
> >> might be a command that the server would understand.
>
> > ... I guess. Yeah.
>
> We've been there, we've done that.  We're not doing it again.
>

Thanks for the pointer to the history.

I had zero enthusiasm for going that way anyway and was mostly trying to
figure out what Sridhar was talking about. It's useful to know it's already
been explored though.

I think we know where we need to go from here - updating that PgJDBC patch
to add a connection option, making sure it doesn't add round-trips, adding
tests and merging it. At this point it's up to Sridhar to start putting
time and development effort into it to push it forward if desired.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-02-23 Thread Tomas Vondra

Hi,

On 02/22/2016 08:04 PM, Corey Huinker wrote:

>
> Given that counterexample, I think we not only shouldn't back-patch such a
> change but should reject it altogether.

Ouch, good point. The overflows are a different problem that we had
better address though (still on my own TODO list)...


So I should remove the bounds checking from
generate_series(date,date,[int]) altogether?


I feel rather uneasy about simply removing the 'infinity' checks. Is 
there a way to differentiate those two cases, i.e. when the
generate_series is called in the target list vs. in the FROM part? If yes,
we could do the check only in the FROM part, which is the case that does 
not work (and consumes arbitrary amounts of memory).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] improving GROUP BY estimation

2016-02-23 Thread Tomas Vondra

Hi,

while working on using correlation statistics when estimating GROUP BY
(or generally whatever uses estimate_num_groups), I've noticed that we 
apply the selectivity in a rather naive way.


After estimating number of groups in the whole table - either by reading 
pg_stats.n_distinct for a single column, or multiplying (and doing some 
additional magic) for a group of columns, we do this:


   reldistinct *= rel->rows / rel->tuples;

which essentially applies the selectivity to the ndistinct estimate.

So when you have a table with 1000 distinct values in a column, and we 
read 10% of the table, we expect to get 100 distinct values.


But consider for example a table with 100M rows, where the column has
n_distinct=10000 and we're reading 1% of the table (i.e. 1M rows). The
current code will estimate the GROUP BY cardinality to be 100 (as 1% of
the n_distinct), but in reality we're more likely to see all 10000
distinct values.


Example:

CREATE TABLE t (a INT, b FLOAT);
INSERT INTO t SELECT (100000 * random())::int, random()
FROM generate_series(1,10000000) s(i);
ANALYZE t;

SELECT n_distinct FROM pg_stats WHERE attname = 'a';

 n_distinct
------------
      98882

-- 1% of the table
EXPLAIN ANALYZE SELECT a FROM t WHERE b < 0.01 GROUP BY a;

   QUERY PLAN
-
 HashAggregate  (cost=179514.33..179523.45 rows=912 width=4)
(actual time=916.224..955.136 rows=63492 loops=1)
   Group Key: a
   ->  Seq Scan on t  (cost=0.00..179053.25 rows=92216 width=4)
  (actual time=0.103..830.104 rows=100268 loops=1)
 Filter: (b < '0.01'::double precision)
 Rows Removed by Filter: 9899732
 Planning time: 1.237 ms
 Execution time: 982.600 ms
(7 rows)

-- 5% of the table
EXPLAIN ANALYZE SELECT a FROM t WHERE b < 0.05 GROUP BY a;

QUERY PLAN
-
 HashAggregate  (cost=180296.06..180345.22 rows=4916 width=4)
(actual time=1379.519..1440.005 rows=99348 loops=1)
   Group Key: a
   ->  Seq Scan on t  (cost=0.00..179053.25 rows=497123 width=4)
  (actual time=0.096..1000.494 rows=500320 loops=1)
 Filter: (b < '0.05'::double precision)
 Rows Removed by Filter: 9499680
 Planning time: 0.129 ms
 Execution time: 1480.986 ms
(7 rows)

This is getting especially bad when reading only a small fraction of a
huge table - it's easy to construct cases with arbitrarily large error.
And it's worth noting that the error is almost exclusively a massive
under-estimate, i.e. the "wrong" direction, as HashAggregate is still
vulnerable to OOM (again, the larger the table the worse).


So I think a more appropriate way to estimate this would be to find the 
expected number of distinct values when sampling with replacement, as
explained for example in [1].


Applied to the code, it means changing the line from

  reldistinct *= rel->rows / rel->tuples;

to

  reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows));
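
As a quick sanity check, here is a tiny standalone program comparing both
formulas using the numbers from the 1% example (n_distinct 98882, 10M tuples,
a planner-estimated 92216 matching rows); it roughly matches the 912 and
59969 group estimates visible in the plans. A throwaway illustration, not
planner code; compile with -lm:

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double reldistinct = 98882.0;
    double tuples = 10000000.0;
    double rows = 92216.0;

    /* current formula: scale ndistinct by selectivity */
    double naive = reldistinct * (rows / tuples);

    /* proposed formula: expected number of distinct values when drawing
     * "rows" samples with replacement from "reldistinct" values */
    double sampled = reldistinct *
        (1 - powl((reldistinct - 1) / reldistinct, rows));

    printf("current: %.0f  proposed: %.0f\n", naive, sampled);
    return 0;
}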

With this change, the estimates for the examples look like this:

   QUERY PLAN
-
 HashAggregate  (cost=179283.79..179883.48 rows=59969 width=4)
(actual time=897.071..934.764 rows=63492 loops=1)
   Group Key: a
   ->  Seq Scan on t  (cost=0.00..179053.25 rows=92216 width=4)
  (actual time=0.104..820.276 rows=100268 loops=1)
 Filter: (b < '0.01'::double precision)
 Rows Removed by Filter: 9899732
 Planning time: 1.261 ms
 Execution time: 962.812 ms
(7 rows)

and

QUERY PLAN
-
 Group  (cost=232886.15..235371.76 rows=98234 width=4)
(actual time=1519.545..2104.447 rows=99348 loops=1)
   Group Key: a
   ->  Sort  (cost=232886.15..234128.95 rows=497123 width=4)
 (actual time=1519.540..1838.575 rows=500320 loops=1)
 Sort Key: a
 Sort Method: external merge  Disk: 6824kB
 ->  Seq Scan on t  (cost=0.00..179053.25 rows=497123 width=4)
 (actual time=0.090..1044.099 rows=500320 loops=1)
   Filter: (b < '0.05'::double precision)
   Rows Removed by Filter: 9499680
 Planning time: 0.133 ms
 Execution time: 2147.340 ms
(10 rows)

So much better. Clearly, there are cases where this will over-estimate 
the cardinality - for example when the values are somehow correlated.


But I'd argue over-estimating is better because of the OOM issues in 
Hash Aggregate.


regards

[1] 
http://math.stackexchange.com/questions/72223/finding-expected-number-of-distinct-values-selected-from-a-set-of-integers


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Suppor

Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Simon Riggs
On 23 February 2016 at 16:43, Bruce Momjian  wrote:

> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
>

I think we need to be very careful to understand that "FDWs and Sharding"
is one tentative proposal amongst others, not a statement of direction for
the PostgreSQL project since there is not yet any universal agreement.

We know Postgres XC/XL works, and scales


Agreed.

In contrast, the FDW/sharding approach is as-yet unproven, and
significantly without any detailed technical discussion of the exact
approach and how it would work, even after more than 6 months since we
first heard of it openly. Since we don't know how it will work, we have no
idea how long it will take either, or even if it ever will.

I'd like to see discussion of the details in presentation/wiki form and an
initial prototype, with measurements. Without these things we are still
just at the speculation stage. Some alternate proposals are also at that
stage.


> , but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done
piece by piece across many releases. XL is definitely too big for the
elephant to eat in one mouthful.


> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.


And if the FDW approach doesn't work, that won't be part of PostgreSQL core
either...


> I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
>

This is exactly the wrong time to discuss this, since we are days away from
the final deadline for PostgreSQL 9.6 and the community should be focusing
on that for the next few months, not futures.

What I notice is that when Greenplum announced it would publish as open
source its modified version of Postgres, there was some scary noise made
immediately about that concerning patents etc..

Now, Postgres-XL 9.5 is recently announced and we see another scary
sounding pronouncement about that *maybe* it won't be included in core.
While the comments made are true, they do not solely apply to XC/XL, in
fact the uncertainty applies to all approaches equally since notably we
have approximately five proposals for future designs.

These comments, given their timing and nature could easily cause "Fear,
Uncertainty and Doubt" in people seeing this. FUD is also the name of a
sales technique designed to undermine proposals. I hope and presume it was
not the intention and reason for discussing uncertainty now and earlier.

I'm glad to see that the viability of the XC/XL approach is recognized. The
fact we have a working solution now is important for users, who don't want
to wait the 3-5 years while we work out and implement a longer term
strategy. Future upgrade support is certain, however.

What eventually gets into PostgreSQL core is as yet uncertain, as is the
timescale, but my hope is that we recognize that multiple use cases can be
supported rather than a single fixed architecture. It seems likely to me
that the PostgreSQL project will do what it does best - take multiple
comments and merge those into a combined system that is better than any of
the individual single proposals.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-02-23 Thread Tomas Vondra

Hi,

On 12/06/2015 11:48 PM, Tomas Vondra wrote:

   /*
* Frequently, there will be no partial indexes, so first check to
* make sure there's something useful to do here.
*/
   have_partial = false;
   foreach(lc, rel->indexlist)
   {
 IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

 /*
  * index rinfos are the same as baserestrictinfo for non-partial
  * indexes
  */
 index->indrinfos = rel->baserestrictinfo;

 if (index->indpred == NIL)
   continue;  /* ignore non-partial indexes */

 if (index->predOK)
   continue;  /* don't repeat work if already proven OK */

 have_partial = true;
 break;
   }


Attached is a v6 of the patch, which is actually the version submitted 
by Kyotaro-san on 2015/10/8 rebased to current master and with two 
additional changes.


Firstly, I've removed the "break" from the initial foreach loop in 
check_partial_indexes(). As explained in the previous message, I believe 
this was a bug in the patch.


Secondly, I've tried to improve the comments to explain a bit better 
what the code is doing.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


partial-index-only-scan-v6.patch
Description: binary/octet-stream



Re: [HACKERS] More stable query plans via more predictable column statistics

2016-02-23 Thread Tomas Vondra

Hi,

On 02/08/2016 03:01 PM, Shulgin, Oleksandr wrote:
>
...


I've incorporated this fix into the v2 of my patch, I think it is
related closely enough.  Also, added corresponding changes to
compute_distinct_stats(), which doesn't produce a histogram.


I think it'd be useful not to have all the changes in one lump, but 
structure this as a patch series with related changes in separate 
chunks. I doubt we'd like to mix the changes in a single commit, and it 
makes the reviews and reasoning easier. So those NULL-handling fixes 
should be in one patch, the MCV patches in another one.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Convert pltcl from strings to objects

2016-02-23 Thread Jim Nasby

On 2/23/16 6:04 AM, Victor Wagner wrote:

On Mon, 22 Feb 2016 17:57:36 -0600
Jim Nasby  wrote:


On 2/18/16 6:26 AM, Victor Wagner wrote:
(BTW, I also disagree about using a Tcl list for prodesc. That's an
object in a *Postgres* hash table; as such it needs to be managed by
Postgres, not Tcl. AFAIK the Postgres ResourceOwner stuff would be
the correct way to do that (but I'm not exactly an expert on that
area).


As far as I understand, ResourceOwner is a completely different garbage
collection scheme than reference counting. The ResourceOwner mechanism lets
you free everything associated with a transaction when the transaction is over.


Ahh, right.


Here we have another case. prodesc is a global thing, and it is shared
between different operations. The problem is that there is no particular
owner, and we have to wait until the last operation which deals with it
has finished. It looks like a perfect job for reference counting.
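
For illustration, the reference-counting idea described here boils down to
something like the following (a generic sketch with invented names, not the
actual pltcl code):

#include <stdlib.h>

typedef struct SharedDesc
{
    int         refcount;
    /* ... compiled procedure data would live here ... */
} SharedDesc;

static SharedDesc *
shared_desc_create(void)
{
    SharedDesc *desc = calloc(1, sizeof(SharedDesc));

    if (desc != NULL)
        desc->refcount = 1;     /* reference held by the hash table */
    return desc;
}

/* Take a reference before starting a call that uses the descriptor. */
static void
shared_desc_ref(SharedDesc *desc)
{
    desc->refcount++;
}

/* Drop a reference; the last user (or the hash table) frees it. */
static void
shared_desc_unref(SharedDesc *desc)
{
    if (--desc->refcount == 0)
        free(desc);
}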


I've just tried to wrap my head around what's going on with prodesc and 
failed... specifically, I don't understand this claim in the comment:


* Add the proc description block to the hashtable.  Note we do not
* attempt to free any previously existing prodesc block.  !!This is
* annoying, but necessary since there could be active calls using
* the old prodesc.!!

What else could be referencing it? I realize it's stored in 
pltcl_proc_htab, but AFAICT that's backend-local. So I don't understand 
what else could be referencing it.


I suspect this code predates conveniences that have been added to 
Postgres over the years and that the whole way this stuff is being 
cached should be overhauled to do whatever plpgsql does now...



Actually, it doesn't matter much, because it wastes some memory only on
a procedure change, and typically in heavily loaded production
environments server-side code doesn't change too frequently.


... but this is the reason it's maybe not a big priority. Though, 
prodesc does appear to be a few hundred bytes, and this is something 
that will affect *every* backend that has executed a procedure that then 
gets modified.




Moreover, errorcode and errorinfo information travels behind the scenes
and cannot break any code which doesn't try to explicitly access it.


Yeah, Karl will be looking into this as part of a separate patch.



I meant that there is no need to be too aggressive in the type
conversion. You never know whether the Tcl code will actually access some
tuple elements or not. Let's leave conversion to the time of actual use
of values.


Will also be looked at as part of a separate patch.


Please, send updated patch to the list in this thread, so it would
appear in the commitfest and I can mark your patch as "ready for
committer".


Done!
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/src/pl/tcl/pltcl.c b/src/pl/tcl/pltcl.c
index dce5d04..aceb498 100644
--- a/src/pl/tcl/pltcl.c
+++ b/src/pl/tcl/pltcl.c
@@ -47,11 +47,6 @@
((TCL_MAJOR_VERSION > maj) || \
 (TCL_MAJOR_VERSION == maj && TCL_MINOR_VERSION >= min))
 
-/* In Tcl >= 8.0, really not supposed to touch interp->result directly */
-#if !HAVE_TCL_VERSION(8,0)
-#define Tcl_GetStringResult(interp)  ((interp)->result)
-#endif
-
 /* define our text domain for translations */
 #undef TEXTDOMAIN
 #define TEXTDOMAIN PG_TEXTDOMAIN("pltcl")
@@ -212,33 +207,33 @@ static pltcl_proc_desc *compile_pltcl_function(Oid 
fn_oid, Oid tgreloid,
   bool pltrusted);
 
 static int pltcl_elog(ClientData cdata, Tcl_Interp *interp,
-  int argc, CONST84 char *argv[]);
+  int objc, Tcl_Obj * const objv[]);
 static int pltcl_quote(ClientData cdata, Tcl_Interp *interp,
-   int argc, CONST84 char *argv[]);
+   int objc, Tcl_Obj * const objv[]);
 static int pltcl_argisnull(ClientData cdata, Tcl_Interp *interp,
-   int argc, CONST84 char *argv[]);
+   int objc, Tcl_Obj * const objv[]);
 static int pltcl_returnnull(ClientData cdata, Tcl_Interp *interp,
-int argc, CONST84 char *argv[]);
+int objc, Tcl_Obj * const objv[]);
 
 static int pltcl_SPI_execute(ClientData cdata, Tcl_Interp *interp,
- int argc, CONST84 char *argv[]);
+ int objc, Tcl_Obj * const objv[]);
 static int pltcl_process_SPI_result(Tcl_Interp *interp,
 CONST84 char *arrayname,
-CONST84 char *loop_body,
+Tcl_Obj * loop_body,
 int spi_rc,
 SPITupleTable *tuptable,
   

Re: [HACKERS] Sanity checking for ./configure options?

2016-02-23 Thread David Fetter
On Tue, Feb 23, 2016 at 04:09:00PM -0600, Jim Nasby wrote:
> On 2/23/16 9:37 AM, Alvaro Herrera wrote:
> >Jim Nasby wrote:
> >>On 2/5/16 10:08 AM, David Fetter wrote:
> >>>On Wed, Feb 03, 2016 at 06:02:57PM -0600, Jim Nasby wrote:
> I just discovered that ./configure will happily accept '--with-pgport=' (I
> was actually doing =$PGPORT, and didn't realize $PGPORT was empty). What 
> you
> end up with is a compile error in guc.c, with no idea why it's broken. Any
> reason not to have configure or at least make puke if pgport isn't valid?
> >>>
> >>>That seems like a good idea.
> >>
> >>Patch attached. I've verified it with --with-pgport=, =0, =7 and =1. It
> >>catches what you'd expect it to.
> >
> >Does it work to specify port numbers below 1024?
> 
> Presumably not if you're trying to open a network port. But I just checked
> and if listen_addresses='' then you can use a low port number:
> 
> select name,quote_nullable(setting) from pg_settings where name in
> ('port','listen_addresses');
>name   | quote_nullable
> --+
>  listen_addresses | ''
>  port | '1'
> (2 rows)
> 
> Plus, the GUC check allows 1-1024, so I'm inclined to do the same in the
> config check. But I don't have a strong opinion about it.

I'm thinking that both the GUC check and the configure one should
restrict it to [1024..65535].
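
A standalone sketch of that range check, just to pin down the boundary
conditions (hypothetical helper, not the actual configure or guc.c code):

#include <stdio.h>
#include <stdlib.h>

static int
pgport_is_valid(const char *s)
{
    char       *end;
    long        port;

    if (s == NULL || *s == '\0')
        return 0;               /* empty value, e.g. --with-pgport= */
    port = strtol(s, &end, 10);
    if (*end != '\0')
        return 0;               /* trailing junk */
    return port >= 1024 && port <= 65535;
}

int
main(int argc, char **argv)
{
    const char *val = (argc > 1) ? argv[1] : "";

    if (!pgport_is_valid(val))
    {
        fprintf(stderr, "invalid port value: \"%s\"\n", val);
        return 1;
    }
    printf("port %s is acceptable\n", val);
    return 0;
}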

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-23 Thread Tomas Vondra

Hi,

On 02/05/2016 10:40 AM, Michael Paquier wrote:

On Thu, Feb 4, 2016 at 2:34 PM, Michael Paquier
 wrote:

On Thu, Feb 4, 2016 at 12:02 PM, Michael Paquier
 wrote:

On Tue, Feb 2, 2016 at 4:20 PM, Michael Paquier wrote:

...

So, attached is an updated patch that adds a new routine link_safe()
to ensure that a hard link is on-disk persistent. link() is called
twice in timeline.c and once in xlog.c, so those three code paths
are impacted. I noticed as well that my previous patch was sometimes
doing palloc calls in a critical section (oops), I fixed that on the
way.


I've finally got around to reviewing the v5 version of the patch. Sorry it
took so long (I blame FOSDEM, country-wide flu epidemic and my general 
laziness).


I do like most of the changes to the patch, thanks for improving it. A 
few comments though:


1) I'm not quite sure why the patch adds missing_ok to fsync_fname()? 
The only place where we use missing_ok=true is in rename_safe, where 
right at the beginning we do this:


fsync_fname(newfile, false, true);

I.e. we're fsyncing the rename target first, in case it exists. But that 
seems to be conflicting with the comments in xlog.c where we explicitly 
state that the target file should not exist. Why should it be OK to call 
rename_safe() when the target already exists? If this really is the 
right thing to do, it should be explained in the comment above 
rename_safe(), probably.


2) If rename_safe/link_safe are meant as crash-safe replacements for 
rename/link, then perhaps we should use the same signatures, including 
the "const" pointer parameters. So while currently the signatures look 
like this:


int rename_safe(char *oldfile, char *newfile);
int link_safe(char *oldfile, char *newfile);

it should probably look like this

int rename_safe(const char *oldfile, const char *newfile);
int link_safe(const char *oldfile, const char *newfile);

I've noticed this in CheckPointReplicationOrigin() where the current 
code has to cast the parameters to (char*) to silence the compiler.


3) Both rename_safe and link_safe do this at the very end:

fsync_parent_path(newfile);

That however assumes both the oldfile and newfile are placed in the same 
directory - otherwise we'd fsync only one of them. I don't think we have 
a place where we're renaming files between directories (or do we), so 
we're OK with respect to this. But it seems like a good idea to defend 
against this, or at least mention that in the comments.


4) nitpicking: There are some unnecessary newlines added/removed in 
RemoveXlogFile, XLogArchiveForceDone.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Sanity checking for ./configure options?

2016-02-23 Thread Jim Nasby

On 2/23/16 9:37 AM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 2/5/16 10:08 AM, David Fetter wrote:

On Wed, Feb 03, 2016 at 06:02:57PM -0600, Jim Nasby wrote:

I just discovered that ./configure will happily accept '--with-pgport=' (I
was actually doing =$PGPORT, and didn't realize $PGPORT was empty). What you
end up with is a compile error in guc.c, with no idea why it's broken. Any
reason not to have configure or at least make puke if pgport isn't valid?


That seems like a good idea.


Patch attached. I've verified it with --with-pgport=, =0, =7 and =1. It
catches what you'd expect it to.


Does it work to specify port numbers below 1024?


Presumably not if you're trying to open a network port. But I just 
checked and if listen_addresses='' then you can use a low port number:


select name,quote_nullable(setting) from pg_settings where name in 
('port','listen_addresses');

   name   | quote_nullable
--+
 listen_addresses | ''
 port | '1'
(2 rows)

Plus, the GUC check allows 1-1024, so I'm inclined to do the same in the 
config check. But I don't have a strong opinion about it.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] WIP: Failover Slots

2016-02-23 Thread Oleksii Kliukin

> On 23 Feb 2016, at 11:30, Craig Ringer  wrote:
> 
>  
> Updated patch 
> 
> ... attached
> 
> I've split it up a bit more too, so it's easier to tell what change is for 
> what and fixed the issues mentioned by Oleksii. I've also removed some 
> unrelated documentation changes.
> 
> Patch 0001, timeline switches for logical decoding, is unchanged since the 
> last post. 

Thank you, I read the user-interface part now, looks good to me.

I found the following issue when shutting down a master with a connected 
replica that uses a physical failover slot:

2016-02-23 20:33:42.546 CET,,,54998,,56ccb3f3.d6d6,3,,2016-02-23 20:33:07 
CET,,0,DEBUG,0,"performing replication slot checkpoint",""
2016-02-23 20:33:42.594 CET,,,55002,,56ccb3f3.d6da,4,,2016-02-23 20:33:07 
CET,,0,DEBUG,0,"archived transaction log file 
""00010003""",""
2016-02-23 20:33:42.601 CET,,,54998,,56ccb3f3.d6d6,4,,2016-02-23 20:33:07 
CET,,0,PANIC,XX000,"concurrent transaction log activity while database system 
is shutting down",""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,5,,2016-02-23 20:33:07 
CET,,0,LOG,0,"checkpointer process (PID 54998) was terminated by signal 6: 
Abort trap",""
2016-02-23 20:33:43.537 CET,,,54995,,56ccb3f3.d6d3,6,,2016-02-23 20:33:07 
CET,,0,LOG,0,"terminating any other active server processes",

Basically, the issue is that CreateCheckPoint calls CheckpointReplicationSlots, 
which currently produces WAL, and this violates the assumption at line 
xlog.c:8492

if (shutdown && checkPoint.redo != ProcLastRecPtr)
ereport(PANIC,
(errmsg("concurrent transaction log activity 
while database system is shutting down")));


There are a couple of incorrect comments

logical.c: 90
There's some things missing to allow this: I think it should be “There are some 
things missing to allow this:”

logical.c:93
"we need we would need”

slot.c:889
"and there's no latch to set, so poll” - clearly there is a latch used in the 
code below.

Also,  slot.c:301 emits an error message for an attempt to create a failover 
slot on the replica after acquiring and releasing the locks and getting the 
shared memory slot, even though all the data to check for this condition is 
available right at the beginning of the function. Shouldn’t it avoid the extra 
work if it’s not needed? 

> 
> 
> -- 
>  Craig Ringer   http://www.2ndQuadrant.com/ 
> 
>  PostgreSQL Development, 24x7 Support, Training & Services
> <0001-Allow-logical-slots-to-follow-timeline-switches.patch><0002-Allow-replication-slots-to-follow-failover.patch><0003-Retain-extra-WAL-for-failover-slots-in-base-backups.patch><0004-Add-the-UI-and-for-failover-slots.patch><0005-Document-failover-slots.patch><0006-Add-failover-to-pg_replication_slots.patch><0007-not-for-inclusion-Test-script-for-failover-slots.patch>


Kind regards,
--
Oleksii



Re: [HACKERS] proposal: schema PL session variables

2016-02-23 Thread Pavel Stehule
2016-02-12 22:41 GMT+01:00 Jim Nasby :

> On 2/12/16 2:58 PM, Pavel Stehule wrote:
>
>>
>> So I think adding something like this needs to at least address
>> *how* SQL level access would work, *when* it's eventually implemented.
>>
>>
>> I understand - and I agree.
>>
>> small note: private variables should not be accessed from arbitrary SQL,
>> because plain SQL has no directly related schema. They can be accessed only
>> from SQL embedded in some object with an attached schema - PL functions,
>> views, constraints, etc. So for this use case, the important information
>> is the info about the container. We have to hold info about the related
>> schema in the planner/executor context.
>>
>
> I think that's probably true, but this also shows why we need to consider
> different PLs too. As it stands right now, the only way to access a
> variable outside of plpgsql would be to call a plpgsql function, and
> currently there's no way to make a plpgsql function private. So for this to
> work, private variables probably need to be exposed directly through the pl
> handler.
>
> Again, I'm not saying this all has to be implemented up front, but there
> needs to be a plan for how it would work.
>
> I think it would be a good idea to start a wiki page on this topic to
> start collecting stuff together.


I wrote some basic info -
https://wiki.postgresql.org/wiki/CREATE_PRIVATE_VARIABLE

I changed my opinion on the initialization part. Private variables with a
non-null default should be initialized at session start. It is much more
practical and it can be used for triggering some ON CONNECT custom routines.

Regards

Pavel



>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


Re: [HACKERS] pg_export_snapshot on standby side

2016-02-23 Thread Bruce Momjian
On Thu, Aug  7, 2014 at 01:26:24PM +0900, Fujii Masao wrote:
> On Thu, Aug 7, 2014 at 2:17 AM, Bruce Momjian  wrote:
> >
> > Seems we still have not addressed this.
> 
> Thanks for the reminder :) I'm not sure if I have time to work on this
> for a while. If anyone is interested in this, please feel free to try it.

Added to TODO:

Allow pg_export_snapshot() to run on hot standby servers

This will allow parallel pg_dump on such servers.

pg_export_snapshot on standby side 

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




[HACKERS] unexpected result from to_tsvector

2016-02-23 Thread Artur Zakirov

Hello,

Here is a little patch. It fixes this issue 
http://www.postgresql.org/message-id/20160217080048.26357.49...@wrigleys.postgresql.org


Without the patch we get a wrong result for the second email 't...@123-reg.ro':

=> SELECT * FROM ts_debug('simple', 't...@vauban-reg.ro');
 alias |  description  |       token        | dictionaries | dictionary |       lexemes
-------+---------------+--------------------+--------------+------------+----------------------
 email | Email address | t...@vauban-reg.ro | {simple}     | simple     | {t...@vauban-reg.ro}
(1 row)

=> SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
   alias   |   description    | token  | dictionaries | dictionary | lexemes
-----------+------------------+--------+--------------+------------+----------
 asciiword | Word, all ASCII  | test   | {simple}     | simple     | {test}
 blank     | Space symbols    | @      | {}           |            |
 uint      | Unsigned integer | 123    | {simple}     | simple     | {123}
 blank     | Space symbols    | -      | {}           |            |
 host      | Host             | reg.ro | {simple}     | simple     | {reg.ro}
(5 rows)

After the patch we get the correct result for the second email:

=> SELECT * FROM ts_debug('simple', 't...@123-reg.ro');
 alias |  description  |      token      | dictionaries | dictionary |      lexemes
-------+---------------+-----------------+--------------+------------+-------------------
 email | Email address | t...@123-reg.ro | {simple}     | simple     | {t...@123-reg.ro}
(1 row)

This patch allows the parser to work correctly with the emails 't...@123-reg.ro',
'1...@123-reg.ro' and 'test@123_reg.ro'.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/src/backend/tsearch/wparser_def.c
--- b/src/backend/tsearch/wparser_def.c
***
*** 1121,1126  static const TParserStateActionItem actionTPS_InUnsignedInt[] = {
--- 1121,1129 
  	{p_iseqC, '.', A_PUSH, TPS_InUDecimalFirst, 0, NULL},
  	{p_iseqC, 'e', A_PUSH, TPS_InMantissaFirst, 0, NULL},
  	{p_iseqC, 'E', A_PUSH, TPS_InMantissaFirst, 0, NULL},
+ 	{p_iseqC, '-', A_PUSH, TPS_InHostFirstAN, 0, NULL},
+ 	{p_iseqC, '_', A_PUSH, TPS_InHostFirstAN, 0, NULL},
+ 	{p_iseqC, '@', A_PUSH, TPS_InEmail, 0, NULL},
  	{p_isasclet, 0, A_PUSH, TPS_InHost, 0, NULL},
  	{p_isalpha, 0, A_NEXT, TPS_InNumWord, 0, NULL},
  	{p_isspecial, 0, A_NEXT, TPS_InNumWord, 0, NULL},

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Bruce Momjian
On Tue, Feb 23, 2016 at 09:54:46AM -0700, David G. Johnston wrote:
> On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian  wrote:
> 
> 4. Cross-node read-write queries:
> 
> This will require a global snapshot manager and global snapshot manager.
> 
> 
> Probably meant "global transaction manager"

Oops, yes, it should be:

4. Cross-node read-write queries:

This will require a global snapshot manager and global transaction
manager.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread David G. Johnston
On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian  wrote:

> 4. Cross-node read-write queries:
>
> This will require a global snapshot manager and global snapshot manager.
>

Probably meant "global transaction manager"

​David J.​


[HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Bruce Momjian
There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going be implemented and
useful in stages.  It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
calability.html).  Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.
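
As a concrete sketch of what testing use case #1 looks like, and why it
depends on aggregate pushdown (server, table, and column names below are
made up):

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1 OPTIONS (user 'app');
    CREATE FOREIGN TABLE orders_shard1 (id bigint, amount numeric)
        SERVER shard1 OPTIONS (table_name 'orders');

    -- Use case #1: without aggregate pushdown every row of the remote table
    -- is shipped to the coordinator and aggregated locally; with pushdown
    -- the shard would return a single pre-aggregated row.
    SELECT count(*), sum(amount) FROM orders_shard1;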

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sanity checking for ./configure options?

2016-02-23 Thread Alvaro Herrera
Jim Nasby wrote:
> On 2/5/16 10:08 AM, David Fetter wrote:
> >On Wed, Feb 03, 2016 at 06:02:57PM -0600, Jim Nasby wrote:
> >>I just discovered that ./configure will happily accept '--with-pgport=' (I
> >>was actually doing =$PGPORT, and didn't realize $PGPORT was empty). What you
> >>end up with is a compile error in guc.c, with no idea why it's broken. Any
> >>reason not to have configure or at least make puke if pgport isn't valid?
> >
> >That seems like a good idea.
> 
> Patch attached. I've verified it with --with-pgport=, =0, =7 and =1. It
> catches what you'd expect it to.

Does it work to specify port numbers below 1024?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tab completion for CREATE USER MAPPING

2016-02-23 Thread Tom Lane
Fujii Masao  writes:
> Attached patch implements the tab-completion for CREATE USER MAPPING command.
> Comments?

I think it's really bad style to use "Query_for_list_of_users"
as the name of a query that does not in fact deliver a list of
user names, but something else with only one specialized use.
Something like "Query_for_list_of_users_plus_MAPPING" would be
less likely to trip up the unwary in future patches.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC behaviour

2016-02-23 Thread Tom Lane
Craig Ringer  writes:
> On 23 February 2016 at 21:34, Robert Haas  wrote:
>> I believe Sridhar is imagining that someday "set autocommit to false"
>> might be a command that the server would understand.

> ... I guess. Yeah.

We've been there, we've done that.  We're not doing it again.
Cf commits 26993b291, f85f43dfb, 525a48991, as well as a whole
bunch of thrashing in between the first two (grep the git logs
for "autocommit" to find most of it).  It's a bit harder to locate
relevant email threads, because searching for just "autocommit"
yields too many hits; but here's one long thread from when we were
starting to realize that it wasn't working very well:
http://www.postgresql.org/message-id/flat/3e54526a.121eb...@tpf.co.jp

In all, this was one of the more searing experiences contributing
to what's now received project wisdom that GUCs that change
fundamental semantics are a bad idea.

> Oracle's SQL*Plus has the concept of turning autocommit off, but I suspect
> that's client-side behaviour.

The conclusion we came to back in 2002-2003 was that client-side
autocommit was the only behavior we could sanely support.  I see
no reason to think that a fresh experiment in the same direction
would produce a different result.
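
(In practice, "client-side autocommit off" just means the driver opens the
transaction itself and leaves the commit to the application -- roughly, with
a made-up table:)

    BEGIN;                          -- issued implicitly by the driver
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;                         -- issued when the application commits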

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [JDBC] JDBC behaviour

2016-02-23 Thread Craig Ringer
On 23 February 2016 at 21:34, Robert Haas  wrote:

> On Sat, Feb 20, 2016 at 4:14 PM, Craig Ringer 
> wrote:
> >> currently PostgreSQL::"set autocommit to FALSE ( not supported )
> >
> > This also does not make any sense.
> >
> > PgJDBC does support turning autocommit off. So I don't know in what way
> it's
> > "not supported".
>
> I believe Sridhar is imagining that someday "set autocommit to false"
> might be a command that the server would understand.
>

... I guess. Yeah.

Oracle's SQL*Plus has the concept of turning autocommit off, but I suspect
that's client-side behaviour.

http://docs.oracle.com/cd/B19306_01/server.102/b14357/ch12040.htm

I can't really imagine how it'd make sense on the server side, given how
the protocol works etc. Nor is it necessary since the desired behaviour is
entirely controlled on the client side.

We could have a server mode that did silent, automatic savepoints and
rolled back to a savepoint automatically on ERROR. That wouldn't be the
same as autocommit, but appears to be what Sridhar actually needs. There's
even the remotest chance someone could come up with a patch that might be
acceptable, but I don't know of anyone who'd want to do it when it can be
done well enough client side.
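
For what it's worth, the client-side emulation is just a savepoint around
each statement, roughly like this (the table is a made-up example with an
integer column):

    BEGIN;
    INSERT INTO t VALUES (1);
    SAVEPOINT s;
    INSERT INTO t VALUES ('oops');   -- fails with a data type error
    ROLLBACK TO SAVEPOINT s;         -- undoes only the failed statement
    INSERT INTO t VALUES (2);
    COMMIT;                          -- rows 1 and 2 are committed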

I think Sridhar is confusing autocommit with other DBMSes behaviour of
automatically rolling back failed statements without affecting the rest of
the transaction. These are not the same thing.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Declarative partitioning

2016-02-23 Thread Robert Haas
On Thu, Feb 18, 2016 at 11:11 AM, Amit Langote
 wrote:
>> Personally, I would be more inclined to make this a CREATE TABLE statement, 
>> like
>>
>> CREATE TABLE partition_name PARTITION OF parent_name FOR VALUES ...
>> CREATE TABLE subpartition_name PARTITION OF partition_name FOR VALUES ...
>
> I guess the first of which would actually be:
>
> CREATE TABLE partition_name PARTITION OF parent_name FOR VALUES ...
> PARTITION BY ...
>
> Some might think that writing potentially the same PARTITION BY clause 100
> times for 100 level-1 partitions could be cumbersome. That is what
> SUBPARTITION BY notation may be good as a shorthand for.

I think that anybody who is doing that much partitioning is going to
use a tool to generate the DDL anyway, so it doesn't really matter.

>> I think if you've got SUBPARTITION as a keyword in the syntax
>> anywhere, you're doing it wrong.  The toplevel object shouldn't really
>> care whether its children are themselves partitioned or not.
>
> This is fine for the internals. SUBPARTITION BY is mere syntax sugar. It
> might as well be just cascaded PARTITION BYs. The point is to specify as
> many of those with CREATE TABLE toplevel as the number of levels of
> partitioning we want. That does however fix the number of levels in advance.

Which I think is not good.  If we say that a table can be partitioned,
and a table that is a partition can also be partitioned, we've got a
nice general system.  Fixing the number of partitioning levels for the
sake of a little syntactic sugar is, IMHO, getting our priorities
backwards.

> In the patch I have posted, here are some details of the tuple routing
> implementation for instance - the top-level parent knows only of its
> immediate partitions. Once a level-1 partition from that list is
> determined using a tuple's level-1 key, the tuple is passed down to choose
> one of its own partitions using the level-2 key. The key descriptor list
> is associated with the top-level parent alone and the recursive code knows
> to iterate the key list to apply nth key for nth level. The recursion
> happens in the partition module.

I haven't looked at the code, but that sounds like the right approach.

> Now there are also functions to let obtain, say *all* or only *leaf*
> partitions (OIDs) of a top-level parent but they are used for certain DDL
> scenarios. As far as DML is concerned, the level-at-a-time recursive
> approach as described above is used. Queries not yet because the plan is a
> flattened append of leaf partitions anyway.

OK.

> If such notation convenience at the expense of loss of generality is not
> worth it, I'm willing to go ahead and implement SUBPARTITION-less syntax.
>
> CREATE TABLE toplevel() PARTITION BY
> CREATE TABLE partition PARTITION OF toplevel FOR VALUES ... PARTITION BY
> CREATE TABLE subpartition PARTITION OF partition FOR VALUES
> ALTER TABLE partitioned ATTACH PARTITION name FOR VALUES ... USING TABLE
> ALTER TABLE partitioned DETACH PARTITION name [ USING newname ]
>
> While we are on the syntax story, how about FOR VALUES syntax for range
> partitions (sorry for piggybacking it here in this message). At least some
> people dislike LESS THAN notation. Corey Huinker says we should be using
> range type literals for that. It's not clear though that using range type
> literals directly is a way ahead. For one, range type API expects there to
> exist a range type with given element type. Whereas all we require for
> range partitioning proper is for column type to have a btree operator
> class. Should we require it to have an associated range type as well?
> Don't think that there exists an API to check that either. All in all,
> range types are good to implement things in applications but not so much
> within the backend (unless I'm missing something). I know reinventing the
> wheel is disliked as well but perhaps we could offer something like the
> following because Corey offered some examples which would help from the
> flexibility:
>
> START [ EXCL ] (startval) END [ INCL ] (endval)

I don't think using range type literals is gonna work.  There's no
guarantee that the necessary range types exist.  However, we could
possibly use a syntax inspired by the syntax for range types.  I'm a
little nervous about asking people to type commands with mismatching
braces:

CREATE TABLE partition_name PARTITION OF parent_name FOR VALUES [ 1, 10 );

...but maybe it's a good idea.  It certainly has the advantage of
being more compact than a lot of the other ways we might choose to write
that.  And I think LESS THAN sucks.  It's just clunky and awkward
syntax.

> Some people said something akin to interval partitioning would be good like:
>
> PARTITION BY RANGE ON (columns) INCREMENT BY (INTERVAL '1 month' ) START
> WITH value;
>
> But that could be just a UI addition to the design where each partition
> has [startval, endval) bounds. In any case, not a version 1 material I'd
> think.

Agreed; not right now.

> I see the trade-off. I ag

Re: [HACKERS] Writing new unit tests with PostgresNode

2016-02-23 Thread Craig Ringer
On 23 February 2016 at 21:39, Craig Ringer  wrote:


>
> Just finished doing that. Thanks for taking a look at the first patch so
> quickly. I hope this one's better.
>

CF entry created.

https://commitfest.postgresql.org/9/536/


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Writing new unit tests with PostgresNode

2016-02-23 Thread Craig Ringer
On 23 February 2016 at 09:52, Michael Paquier 
wrote:

> On Tue, Feb 23, 2016 at 12:29 AM, Alvaro Herrera
>  wrote:
> > Craig Ringer wrote:
> >
> >> > +=pod
> >> > +
> >> > +=head2 Set up a node
> >> > pod format... Do we really want that? Considering that those modules
> >> > are only aimed at being dedicated for in-core testing, I would say no.
> >>
> >> If it's plain comments you have to scan through massive piles of verbose
> >> Perl to find what you want. If it's pod you can just perldoc
> >> /path/to/module it and get a nice summary of the functions etc.
> >>
> >> If these are intended to become usable facilities for people to write
> tests
> >> with then I think it's important that the docs be reasonably accessible.
> >
> > Yes, I think adding POD here is a good idea.  I considered doing it
> > myself back when I was messing with PostgresNode ...
>
> OK, withdrawal from here. If there are patches to add that to the
> existing tests, I'll review them, and rebase what I have depending on
> what gets in first. Could a proper patch split be done please?
>

Just finished doing that. Thanks for taking a look at the first patch so
quickly. I hope this one's better.

FWIW I think you were right that using pod for the actual test wasn't the
best choice, I should've just used comments. I do think it's important for
the modules to have structured docs.

I've removed the example suite in favour of adding a SYNOPSIS section to
PostgresNode.pm and describing the rest in the README. It won't be
necessary once your replication tests go in, they'll be a perfectly
adequate example.

I also cut out the changes to the backup method; I'll send a pull to add to
your proposed replication patch instead.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 99d01a48aa9e4a131767e8565e6a678c54a7555a Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 23 Feb 2016 20:37:57 +0800
Subject: [PATCH 1/3] Document where to find PostgreSQL's tests

---
 src/test/README | 40 
 src/test/modules/README | 13 +
 2 files changed, 53 insertions(+)
 create mode 100644 src/test/README
 create mode 100644 src/test/modules/README

diff --git a/src/test/README b/src/test/README
new file mode 100644
index 000..7ed0706
--- /dev/null
+++ b/src/test/README
@@ -0,0 +1,40 @@
+PostgreSQL tests
+
+
+This directory contains a variety of test infrastructure as well as some of the
+tests in PostgreSQL. Not all tests are here - in particular, there are more in
+individual contrib/ modules and in src/bin.
+
+Not all these tests get run by "make check". Check src/test/Makefile to see
+which tests get run automatically.
+
+examples/
+  demo programs for libpq that double as regression tests via "make check"
+
+isolation/
+  tests for concurrent behaviours at the SQL level
+
+locale/
+  sanity checks for locale data, encodings, etc
+
+mb/
+  tests for multibyte encoding (UTF-8) support
+
+modules/
+  extensions used only or mainly for test purposes, generally not useful
+  or suitable for installing in production databases. Some of these have
+  their own tests, some are used by tests elsewhere.
+
+perl/
+  infrastructure for Perl-based Test::More tests. There are no actual tests
+  here, the code is used by other tests in src/bin/, contrib/ and in the
+  subdirectories of src/test.
+
+regress/
+  PostgreSQL's main regression test suite, pg_regress
+
+ssl/
+  Tests to exercise and verify SSL certificate handling
+
+thread/
+  A thread-safety-testing utility used by configure, see its README
diff --git a/src/test/modules/README b/src/test/modules/README
new file mode 100644
index 000..42c5d38
--- /dev/null
+++ b/src/test/modules/README
@@ -0,0 +1,13 @@
+Test extensions and libraries
+=
+
+src/test/modules contains PostgreSQL extensions that are primarily or entirely
+intended for testing PostgreSQL and/or to serve as example code. The extensions
+here aren't intended to be installed in a production server and aren't useful
+for "real work".
+
+Most extensions have their own pg_regress tests. Some are also used by tests
+elsewhere in the test tree.
+
+If you're adding new hooks or other functionality exposed as C-level API this
+is where to add the tests for it.
-- 
2.1.0

From 2e89c8514a0b380d43f3feb1beab0e5cbec3b283 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 23 Feb 2016 21:34:30 +0800
Subject: [PATCH 2/3] Add a README for the perl tests and some perldoc

---
 src/test/perl/PostgresNode.pm | 399 +++---
 src/test/perl/README  |  77 
 2 files changed, 447 insertions(+), 29 deletions(-)
 create mode 100644 src/test/perl/README

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2ab9aee..a995d28 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1,8 +1,62

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-02-23 Thread Robert Haas
On Sun, Feb 21, 2016 at 7:45 PM, Amit Kapila 
wrote:

> I mean, my basic feeling is that I would not accept a 2-3% regression in
>> the single client case to get a 10% speedup in the case where we have 128
>> clients.
>>
>
> I understand your point.  I think to verify whether it is run-to-run
> variation or an actual regression, I will re-run these tests on single
> client multiple times and post the result.
>

Perhaps you could also try it on a couple of different machines (e.g.
MacBook Pro and a couple of different large servers).


>
>   A lot of people will not have 128 clients; quite a few will have a
>> single session, or just a few.  Sometimes just making the code more complex
>> can hurt performance in subtle ways, e.g. by making it fit into the L1
>> instruction cache less well.  If the numbers you have here are accurate,
>> I'd vote to reject the patch.
>>
> One point to note is that this patch along with first patch which I
> posted in this thread to increase clog buffers can make significant
> reduction in contention on CLogControlLock.  OTOH, I think introducing
> regression at single-client is also not a sane thing to do, so lets
> first try to find if there is actually any regression and if it is, can
> we mitigate it by writing code with somewhat fewer instructions or
> in a slightly different way and then we can decide whether it is good
> to reject the patch or not.  Does that sound reasonable to you?
>

Yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] [JDBC] JDBC behaviour

2016-02-23 Thread Robert Haas
On Sat, Feb 20, 2016 at 4:14 PM, Craig Ringer  wrote:
>> currently PostgreSQL::"set autocommit to FALSE ( not supported )
>
> This also does not make any sense.
>
> PgJDBC does support turning autocommit off. So I don't know in what way it's
> "not supported".

I believe Sridhar is imagining that someday "set autocommit to false"
might be a command that the server would understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-23 Thread Robert Haas
On Fri, Feb 19, 2016 at 6:32 AM, Michael Paquier
 wrote:
>> To be honest, my heart still balances for the Extended() interface.
>> This reduces the risk of conflicts with back-patching with 9.5.
>
> Andres, others, what else can I do to make this thread move on? I can
> produce any version of this patch depending on committer's
> requirements.

FWIW, I don't expect to have time to review this in the level of
detail that would make me confident to commit it any time soon.  I've
said my piece on what I think the final patch should look like, and I
hope that argument was persuasive, but I don't have anything further
to add to what I already said.  I hope some other committer has some
cycles to look at this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution.

2016-02-23 Thread Robert Haas
On Mon, Feb 22, 2016 at 7:59 PM, Tom Lane  wrote:
>> No, you don't.  I've spent a good deal of time thinking about that problem.
>> [ much snipped ]
>> Unless I'm missing something, though, this is a fairly obscure
>> problem.  Early release of catalog locks is desirable, and locks on
>> scanned tables should be the same locks (or weaker) than already held
>> by the master.  Other cases are rare.  I think.  It would be good to
>> know if you think otherwise.
>
> After further thought, what I think about this is that it's safe so long
> as parallel workers are strictly read-only.  Given that, early lock
> release after user table access is okay for the same reasons it's okay
> after catalog accesses.  However, this is one of the big problems that
> we'd have to have a solution for before we ever consider allowing
> read-write parallelism.

Actually, I don't quite see what read-only vs. read-write queries has
to do with this particular issue.  We retain relation locks on target
relations until commit, regardless of whether those locks are
AccessShareLock, RowShareLock, or RowExclusiveLock.  As far as I
understand it, this isn't because anything would fail horribly if we
released those locks at end of query, but rather because we think that
releasing those locks early might result in unpleasant surprises for
client applications.  I'm actually not really convinced that's true: I
will grant that it might be surprising to run the same query twice in
the same transaction and get different tuple descriptors, but it might
also be surprising to get different rows, which READ COMMITTED allows
anyway.  And I've met a few users who were pretty surprised to find
out that they couldn't do DDL on table A and the blocking session
mentioned table A nowhere in the currently-executing query.
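
To make that surprise concrete (a sketch; the table names are made up):

    -- Session 1
    BEGIN;
    SELECT count(*) FROM a;      -- takes AccessShareLock on "a", which is
    SELECT count(*) FROM b;      -- retained until commit even though the
                                 -- current query no longer mentions "a"

    -- Session 2
    ALTER TABLE a ADD COLUMN extra int;   -- blocks until session 1 commits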

The main issues with allowing read-write parallelism that I know of
off-hand are:

* Updates or deletes might create new combo CIDs.  In order to handle
that, we'd need to store the combo CID mapping in some sort of
DSM-based data structure which could expand as new combo CIDs were
generated.

* Relation extension locks, and a few other types of heavyweight
locks, are used for mutual exclusion of operations that would need to
be serialized even among cooperating backends.  So group locking would
need to be enhanced to handle those cases differently, or some other
solution would need to be found.  (I've done some more detailed
analysis here about possible solutions most of which has been posted
to -hackers in various emails at one time or another; I'll refrain
from diving into all the details in this missive.)

But those are separate from the question of whether parallel workers
need to transfer any heavyweight locks they accumulate on non-scanned
tables back to the leader.

> So what distresses me about the current situation is that this is a large
> stumbling block that I don't see documented anywhere.  It'd be good if you
> transcribed some of this conversation into README.parallel.
>
> (BTW, I don't believe your statement that transferring locks back to the
> master would be deadlock-prone.  If the lock system treats locks held by
> a lock group as effectively all belonging to one entity, then granting a
> lock identical to one already held by another group member should never
> fail.  I concur that it might be expensive performance-wise, though it
> hardly seems like this would be a dominant factor compared to all the
> other costs of setting up and tearing down parallel workers.)

I don't mean that the heavyweight lock acquisition itself would fail;
I agree with your analysis on that.  I mean that you'd have to design
the protocol for the leader and the worker to communicate very
carefully in order for it not to get stuck.  Right now, the leader
initializes the DSM at startup before any workers are running with all
the data the workers will need, and after that data flows strictly
from workers to leader.  So the workers could send control messages
indicating heavyweight locks that they held to the leader, and that
would be fine.  Then the leader would need to read those messages and
do something with them, after which it would need to tell the workers
that they could now exit.  You'd need to make sure there was no
situation in which that handshake couldn't get stuck, for example
because the leader was waiting for a tuple from the worker while the
worker was waiting for a lock-acquisition-confirmation from the
leader.  That particular thing is probably not an issue but hopefully
it illustrates the sort of hazard I'm concerned about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] tab completion for CREATE USER MAPPING

2016-02-23 Thread Fujii Masao
Hi,

Attached patch implements the tab-completion for CREATE USER MAPPING command.
Comments?

Regards,

-- 
Fujii Masao
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
***
*** 656,661  static const SchemaQuery Query_for_list_of_matviews = {
--- 656,667 
  "   FROM pg_catalog.pg_roles "\
  "  WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"
  
+ #define Query_for_list_of_users \
+ " SELECT pg_catalog.quote_ident(rolname) "\
+ "   FROM pg_catalog.pg_roles "\
+ "  WHERE substring(pg_catalog.quote_ident(rolname),1,%d)='%s'"\
+ "  UNION ALL SELECT 'mapping'"
+ 
  #define Query_for_list_of_grant_roles \
  " SELECT pg_catalog.quote_ident(rolname) "\
  "   FROM pg_catalog.pg_roles "\
***
*** 920,926  static const pgsql_thing_t words_after_create[] = {
  	{"UNIQUE", NULL, NULL, THING_NO_DROP},		/* for CREATE UNIQUE INDEX ... */
  	{"UNLOGGED", NULL, NULL, THING_NO_DROP},	/* for CREATE UNLOGGED TABLE
   * ... */
! 	{"USER", Query_for_list_of_roles},
  	{"USER MAPPING FOR", NULL, NULL},
  	{"VIEW", NULL, &Query_for_list_of_views},
  	{NULL}		/* end of list */
--- 926,932 
  	{"UNIQUE", NULL, NULL, THING_NO_DROP},		/* for CREATE UNIQUE INDEX ... */
  	{"UNLOGGED", NULL, NULL, THING_NO_DROP},	/* for CREATE UNLOGGED TABLE
   * ... */
! 	{"USER", Query_for_list_of_users},
  	{"USER MAPPING FOR", NULL, NULL},
  	{"VIEW", NULL, &Query_for_list_of_views},
  	{NULL}		/* end of list */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing down sorted joins

2016-02-23 Thread Ashutosh Bapat
Rushabh pointed out that the declarations of the helper functions
get_useful_ecs_for_relation() and get_useful_pathkeys_for_relation() were
placed with the FDW routine declarations rather than with the helper function
declarations. Since those functions are related to this patch, the attached
patch moves those declarations to their right place.

On Wed, Feb 17, 2016 at 5:37 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi All,
> Now that we have join pushdown support in postgres_fdw, we can leverage
> the sort pushdown mechanism for base relations to work for pushed down
> joins as well. PFA patch for the same.
>
> The code to find useful set of pathkeys and then generate paths for each
> list of pathkeys is moved into a function which is called for base
> relations and join relations, while creating respective paths. The useful
> pathkeys are same as the base relation i.e. root->query_pathkeys and
> pathkeys useful for merge join as discussed in [1].
>
> I measured performance of pushing down sort for merge joins for query
> SELECT lt1.val, ft1.val, ft2.val FROM lt1 join (ft1 join ft2 on (ft1.val =
> ft2.val)) on (lt1.val = ft1.val) where ft1, ft2 are foreign tables, join
> between which gets pushed down to the foreign server and lt is the local
> table.
>
> Without the patch servers prefers local merge join between foreign tables
> followed by merge join with local table by getting the data sorted from the
> foreign server. But with the patch, it pushes down the foreign join and
> also gets the data sorted for local merge join. The times measured over 10
> runs of query with and without patch are
>
> With patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --------------+------------------+--------------+--------------
>    60310.0369 | 251.075471210925 |    59895.064 |    60746.496
>
> Without patch
>  avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
> --------------+------------------+--------------+--------------
>    86396.6001 |  254.30988131848 |    85906.606 |    86742.311
>
> With the patch the execution time of the query reduces by 30%.
>
> The scripts to setup and run query and outputs of running query with and
> without patch are attached.
>
>
> [1]
> http://www.postgresql.org/message-id/CAFjFpRfeKHiCmwJ72p4=zvuzrqsau9tbfyw7vwr-5ppvrcb...@mail.gmail.com
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_join_sort_pd_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw vs. force_parallel_mode on ppc

2016-02-23 Thread Robert Haas
On Tue, Feb 23, 2016 at 11:47 AM, Tom Lane  wrote:
> Even if there were, it would not fix this bug, because AFAICS the only
> thing that set_rel_consider_parallel is chartered to do is set the
> per-relation consider_parallel flag.  The failure that is happening in
> that regression test with force_parallel_mode turned on happens because
> standard_planner plasters a Gather node at the top of the plan, causing
> the whole plan including the FDW access to happen inside a parallel
> worker.  The only way to prevent that is to clear the
> wholePlanParallelSafe flag, which as far as I can tell (not that any of
> this is documented worth a damn) isn't something that
> set_rel_consider_parallel is supposed to do.

Hmm.  Well, if you tested it, or looked at the places where
wholePlanParallelSafe is cleared, you would find that it DOES fix the
bug.  create_plan() clears wholePlanParallelSafe if the plan is not
parallel-safe, and the plan won't be parallel-safe unless
consider_parallel was set for the underlying relation.  In case you'd
like to test it for yourself, here's the PoC patch I wrote:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index bcb668f..8a4179e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -527,6 +527,11 @@ set_rel_consider_parallel(PlannerInfo *root, RelOptInfo *rel,
return;
return;
}
+
+   /* Not for foreign tables. */
+   if (rte->relkind == RELKIND_FOREIGN_TABLE)
+   return;
+
break;

case RTE_SUBQUERY:

Adding that makes the postgres_fdw case pass.
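
(For anyone reproducing this, the failing scenario is essentially just
forcing parallel mode around any postgres_fdw query; the table name below is
illustrative:)

    SET force_parallel_mode = on;
    SELECT * FROM ft1;   -- ft1 being any postgres_fdw foreign table; without
                         -- the check above, a Gather may be placed on top of
                         -- the plan and the foreign scan then runs inside a
                         -- parallel worker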

> It looks to me like there is a good deal of fuzzy thinking here about the
> difference between locally parallelizable and globally parallelizable
> plans, ie Gather at the top vs Gather somewhere else.

If you have a specific complaint, I'm happy to try to improve things,
or you can.  I think however that it is also possible that you haven't
fully understood the code I've spent the last year or so developing
yet, possibly because I haven't documented it well enough, but
possibly also because you haven't spent much time looking at it yet.
I'm glad you are, by the way, because I'm sure there are a bunch of
things here that you can improve over what I was able to do,
especially on the planner side of things, and that would be great.
However, a bit of forbearance would be appreciated.

> I also note with
> dismay that turning force_parallel_mode on seems to pretty much disable
> any testing of local parallelism.

No, I don't think so.  It doesn't push a Gather node on top of a plan
that already contains a Gather, because such a plan isn't
parallel_safe.   Nor does it suppress generation of parallel paths
otherwise.

>> In view of 69d34408e5e7adcef8ef2f4e9c4f2919637e9a06, we shouldn't
>> blindly assume that foreign scans are not parallel-safe, but we can't
>> blindly assume the opposite either.  Maybe we should assume that the
>> foreign scan is parallel-safe only if one or more of the new methods
>> introduced by the aforementioned commit are set, but actually that
>> doesn't seem quite right.  That would tell us whether the scan itself
>> can be parallelized, not whether it's safe to run serially but within
>> a parallel worker.  I think maybe we need a new FDW API that gets
>> called from set_rel_consider_parallel() with the root, rel, and rte as
>> arguments and which can return a Boolean.  If the callback is not set,
>> assume false.
>
> Meh.  As things stand, postgres_fdw would have to aver that it can't ever
> be safely parallelized, which doesn't seem like a very satisfactory answer
> even if there are other FDWs that work differently (and which would those
> be?  None that use a socket-style connection to an external server.)

file_fdw is parallel-safe, and KaiGai posted a patch that makes it
parallel-aware, though that would have needed more work than I'm
willing to put in right now to make it committable.  So in other
words...

> The commit you mention above seems to me to highlight the dangers of
> accepting hook patches with no working use-case to back them up.
> AFAICT it's basically useless for typical FDWs because of this
> multiple-connection problem.

...I didn't ignore this principle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Convert pltcl from strings to objects

2016-02-23 Thread Victor Wagner
On Mon, 22 Feb 2016 17:57:36 -0600
Jim Nasby  wrote:

> On 2/18/16 6:26 AM, Victor Wagner wrote:

> Are you referring to this comment?
> 
> > /
> >  * Add the proc description block to the
> > hashtable.  Note we do not
> >  * attempt to free any previously existing prodesc
> > block.  This is
> >  * annoying, but necessary since there could be
> > active calls using
> >  * the old prodesc.
> >  /  
> 
> That is not changed by the patch, and I don't think it's in the scope
> of this patch. I agree it would be nice to get rid of that and the
> related malloc() call, as well as what perm_fmgr_info() is doing, but
> those are unrelated to this work.

If it has been there before, it is OK to leave it this way.
It was a rather wild guess that the reference counting which we now
have here could solve this old problem.
 
> (BTW, I also disagree about using a Tcl list for prodesc. That's an 
> object in a *Postgres* hash table; as such it needs to be managed by 
> Postgres, not Tcl. AFAIK the Postgres ResourceOwner stuff would be
> the correct way to do that (but I'm not exactly an expert on that
> area).

As far as I understand, ResourceOwner is a completely different garbage
collection scheme than reference counting. The ResourceOwner mechanism lets
you free everything associated with a transaction when the transaction is over.

Here we have another case. prodesc is a global thing, and it is shared
between different operations. The problem was that there is no particular
owner, and we have to wait until the last operation which deals with it
has finished. It looks like a perfect job for reference counting.

Actually, it doesn't matter much, because it wastes some memory only on
procedure change, and typically in heavily loaded production
environments the server-side code doesn't change too frequently.

> 
> > Function pltcl_elog have a big switch case to convert enum
> > logpriority, local to this function to PostgreSql log levels.
> >
> > It seems not a most readable solution, because this enumeration is
> > desined to be passed to Tcl_GetIndexFromObj, so it can 
[skip]
> 
> Done.

Glad that my suggestion was useful to you.

> 
> > It seems that you are losing some diagnostic information by
> > extracting just message field from ErrorData, which you do in
> > pltcl_elog and pltcl_subtrans_abort.
> >
> > Tcl has  mechanisms for passing around additional error information.
> > See Tcl_SetErrorCode and Tcl_AddErrorInfo functions
> >
> > pltcl_process_SPI_result might benefit from providing SPI result
> > code in machine readable way via Tcl_SetErrorCode too.  
> 
> Is there any backwards compatibility risk to these changes? Could
> having that new info break someone's existing code?

I don't think so. The ErrorCode and ErrorInfo mechanisms have been present
in Tcl for ages; as far as I remember, they predate the Tcl_Obj API.
Unfortunately I've lost my copy of the 2nd edition of "Practical Programming
on Tcl/Tk" and have only the 4th edition handy, so it's quite hard to dig out
the Tcl 7.6 docs.

Moreover, the errorcode and errorinfo information travels behind the scenes
and cannot break any code which doesn't try to explicitly access it.


> > In some cases you use Tcl_SetObjResult(interp,
> > Tcl_NewStringObj("error message",-1)) to report error with constant
> > message, where in other places Tcl_SetResult(interp,"error
> > message",TCL_STATIC) left in place.
> >
> > I see no harm in using old API with static messages, especially if
> > Tcl_AppendResult is used, but mixing two patterns above seems to be
> > a bit inconsistent.  
> 
> Switched everything to using the new API.
> 
> > As far as I can understand, using Tcl_StringObj to represent all
> > postgresql primitive (non-array) types and making no attempt to
> > convert tuple data into integer, boolean or double objects is
> > design decision.
> >
> > It really can spare few unnecessary type conversions.  
> 
> Yeah, that's on the list. But IMHO it's outside the scope of this
> patch.

I mean that there is no need to be too aggressive in the type
conversion. You never know whether the Tcl code will actually access some
tuple elements or not. Let's leave conversion to the time of actual use
of the values.
> 
> > Thanks for your effort.  
> 
> Thanks for the review!

Please send the updated patch to the list in this thread, so it will
appear in the commitfest and I can mark your patch as "ready for
committer".







-- 
   Victor Wagner 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Failover Slots

2016-02-23 Thread Craig Ringer
On 22 February 2016 at 23:39, Oleksii Kliukin  wrote:


> What it’s doing is calling pg_basebackup first to initialize the replica,
> and that actually failed with:
>
> pg_basebackup: unexpected termination of replication stream: ERROR:
>  requested WAL segment 0001 has already been removed
>
> The segment name definitely looks bogus to me.
>
> The actual command causing the failure was an attempt to clone the replica
> using pg_basebackup, turning on xlog streaming:
>
> pg_basebackup --pgdata data/postgres1 --xlog-method=stream
> --dbname="host=localhost port=5432  user=replicator”
>
> I checked the same command against the  git master without the patches
> applied and could not reproduce this problem there.
>

That's a bug. In testing whether we need to return a lower LSN for minimum
WAL for BASE_BACKUP it failed to properly test for InvalidXLogRecPtr. Good
catch.


> On the code level, I have no comments on 0001, it’s well documented and I
> have no questions about the approach, although I might be not too
> knowledgable to judge the specifics of the implementation.
>

The first patch is the most important IMO, and the one I think needs the
most thought since it's ... well, timelines aren't simple.


> slots.c:294
> elog(LOG, "persistency is %i", (int)slot->data.persistency);
>
> Should be changed to DEBUG?
>

That's an escapee log statement I thought I'd already rebased out. Well
spotted, fixed.


>
> slot.c:468
> Why did you drop “void" as a parameter type of ReplicationSlotDropAcquired?
>

That's an editing error on my part that I'll reverse. Since the prototype
declares (void) it doesn't matter, but it's a pointless change. Fixed.


> walsender.c: 1509 at PhysicalConfirmReceivedLocation
>
> I’ve noticed a comment stating that we don’t need to call
> ReplicationSlotSave(), but that pre-dated the WAL-logging of replication
> slot changes. Don’t we need to call it now, the same way it’s done for
> the logical slots in logical.c:at LogicalConfirmReceivedLocation?
>

No, it's safe here. All we must ensure is that a slot is advanced on the
replica when it's advanced on the master. For physical slots even that's a
weak requirement, we just have to stop them from falling *too* far behind
and causing too much xlog retention. For logical slots we should ensure we
advance the slot on the replica before any vacuum activity that might
remove catalog tuples still needed by that slot gets replayed. Basically
the difference is that logical slots keep track of the catalog xmin too, so
they have (slightly) stricter requirements.
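
(The distinction is visible directly in the catalog, for anyone following
along:)

    -- restart_lsn shows how much WAL a slot retains; logical slots
    -- additionally carry a catalog_xmin, which is what imposes the stricter
    -- requirement described above.
    SELECT slot_name, slot_type, restart_lsn, catalog_xmin
      FROM pg_replication_slots;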

This patch doesn't touch either of those functions except for
renaming ReplicationSlotsComputeRequiredLSN
to ReplicationSlotsUpdateRequiredLSN. Which, by the way, I really don't
like doing, but I couldn't figure out a name to give the function that
computes-and-returns the required LSN that wouldn't be even more confusing
in the face of having a ReplicationSlotsComputeRequiredLSN function as
well. Ideas welcome.

Updated patch

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-23 Thread Kyotaro HORIGUCHI
Hello,

At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao  wrote 
in 
> On Tue, Feb 16, 2016 at 4:19 PM, Masahiko Sawada  
> wrote:
> > On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
> >  wrote:
> >> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> >>> Surprizingly yes. The list is handled as an identifier list and
> >>> parsed by SplitIdentifierString thus it can accept double-quoted
> >>> names.
> >>
> >
> > Attached latest version patch which has only feature logic so far.
> > I'm writing document patch about this feature now, so this version
> > patch doesn't have document and regression test patch.
> 
> Thanks for updating the patch!
> 
> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
> the server crashed unexpectedly with the following error message.
> This is obviously a bug.
> 
> FATAL:  syntax error

I had a glance at the lexer part in the new patch.  It'd be
better to design the lexer from the beginning according to the
required behavior.

The documentation for the syntax says the following,

http://www.postgresql.org/docs/current/static/runtime-config-logging.html

> application_name (string)
> 
> The application_name can be any string of less than NAMEDATALEN
> characters (64 characters in a standard build).  Only
> printable ASCII characters may be used in the application_name
> value. Other characters will be replaced with question marks (?).

And according to what the functions mentioned so far do, an
application_name is, overall, treated as follows, I suppose.

- check_application_name() currently allows [\x20-\x7e], which
  differs from the definition of SQL identifiers.

- SplitIdentifierString() and the syncrep code

  - allow any byte except a double quote in the double-quoted
    representation. A double quote just after a delimiter can open a
    quoted representation.

  - A non-quoted name can contain any character, including double
    quotes, except ',' and white space.

  - The syncrep code does case-insensitive matching with the
    application_name.

So, to preserve or follow the current behavior except for the last
one, the following pattern definitions would do. The
lexer/grammar for the new format of s_s_names could be simpler
than what it is.

space               [ \n\r\f\t\v]          /* See the definition of isspace(3) */
whitespace          {space}+
dquote              \"
app_name_chars      [\x21-\x2b\x2d-\x7e]   /* excluding ' ', ',' */
app_name_indq_chars [\x20\x21\x23-\x7e]    /* excluding '"' */
app_name_dq_chars   ({app_name_indq_chars}|{dquote}{dquote})
delimiter           {whitespace}*,{whitespace}*
app_name            ({app_name_chars}+|{dquote}{app_name_dq_chars}+{dquote})
s_s_names           {app_name}({delimiter}{app_name})*

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers