Re: Query related to alter table ... attach partition

2018-01-22 Thread Amit Langote
On 2018/01/23 15:55, Ashutosh Sharma wrote:
>> However, we don't make it fail because the table has a constraint that
>> contradicts the partition constraint.  Attach succeeds in the absence of
>> any violating rows and the end result is that the table/partition has
>> contradictory constraints (the existing constraint and the partition
>> constraint) and that simply means no rows can be inserted into the
>> table/partition.
>>
> 
> That's right. But shouldn't a partition that doesn't fall in the
> partition range at all be rejected when the user tries to attach it? I feel
> we should at least try throwing a WARNING message for it. Thoughts?

I did have such thoughts back when writing the patch, but we decided
during the review that it wasn't worthwhile; see this email for example:

https://www.postgresql.org/message-id/CA%2BTgmoaQABrsLQK4ms_4NiyavyJGS-b6ZFkZBBNC%2B-P5DjJNFA%40mail.gmail.com

Or see this:

create table foo (a int check (a > 1 and a < 1));

The above command doesn't fail even if the check expression in it is
contradictory.  If we think that the attach command in question should
fail or issue WARNING, then the above command should too.

Thanks,
Amit




Re: Enhance pg_stat_wal_receiver view to display connected host

2018-01-22 Thread Haribabu Kommi
On Tue, Jan 16, 2018 at 5:56 PM, Haribabu Kommi 
wrote:

>
> On Tue, Jan 16, 2018 at 2:55 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>>
>> Note that I still find this API confusing, it seems to me that just
>> sorting out the confusion problems with PQhost and then use it would be
>> more simple.
>>
>
> OK, understood. Even if the confusion problems with PQhost that are
> discussed in [1] are solved, we still need two new APIs to
> display the proper remote server details.
>
> PQhostNoDefault - Similar to PQhost but doesn't return default host
> details.
>
> Always displaying the default value can be confusing when the user doesn't
> provide the host details, so to avoid that confusion, we need this function.
>
> PQhostaddr - Return hostaddr used in the connection.
>
> Without the PQhostaddr() function, for connections where the host is not
> specified, it is difficult to find out the remote server.
>
> With the above two new APIs we can display the remote server either as a
> single string or as individual columns.
>

As I didn't hear any objections, I changed the patch as per the above
description, adding two new libpq APIs and three additional columns
("remote_hostname", "remote_hostaddr" and "remote_port") to the
pg_stat_wal_receiver view.

I didn't explicitly check for CONNECTION_BAD, because the added libpq
functions must return a value even when the connection is not established.

Updated patch attached.

Regards,
Hari Babu
Fujitsu Australia


0002-pg_stat_wal_receiver-to-display-connected-host.patch
Description: Binary data


0001-Addition-of-two-new-libpq-API-s.patch
Description: Binary data


Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-22 Thread Haribabu Kommi
On Tue, Jan 23, 2018 at 8:56 AM, Tom Lane  wrote:

> I wrote:
> > Specifically, I see failures like this on machines where the prevailing
> > locale isn't C or US:
>
> > pg_restore: [archiver (db)] Error while PROCESSING TOC:
> > pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULE
> rtest_emp rtest_emp_del pgbf
> > pg_restore: [archiver (db)] could not execute query: ERROR:  invalid
> input syntax for type money: "$0.00"
> > LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar",
> '$0.00'::"...
> >  ^
> > Command was: CREATE RULE "rtest_emp_del" AS
> > ON DELETE TO "rtest_emp" DO  INSERT INTO "rtest_emplog" ("ename",
> "who", "action", "newsal", "oldsal")
> >   VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar",
> '$0.00'::"money", "old"."salary");
>
> Actually ... maybe what this is pointing out is a pre-existing deficiency
> in pg_dump, which is that it's failing to lock down lc_monetary during
> restore.  Arguably, we should teach _doSetFixedOutputState to set
> lc_monetary to whatever prevailed in the dump session, just as we do
> for client_encoding.  I seem now to recall some user complaints about
> unsafety of dump/restore for "money" values, which essentially is the
> problem we're seeing here.
>
> I think the reason we haven't done this already is fear of putting
> platform-dependent lc_monetary values into dump scripts.  That's
> certainly an issue, but it seems a minor one: at worst, you'd have
> to not use --single-transaction when restoring on a different platform,
> so you could ignore an error from the SET command.
>
> While this would fix the specific problem we're seeing in the buildfarm,
> I'm thinking we'd still need to do what I said in the previous message,
> and change pg_dump so that the restore session will absorb ALTER DATABASE
> settings before proceeding.  Otherwise, at minimum, we have a hazard of
> differing behaviors in serial and parallel restores.  It might be that
> lc_monetary is the only GUC that makes a real difference for this purpose,
> but I haven't got much faith in that proposition at the moment.
>

Yes, the restore session should also absorb all the ALTER DATABASE and ALTER
ROLE IN DATABASE settings to make sure that the target database is in the
same state as when the dump was started.
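
For reference, the lc_monetary sensitivity can be reproduced with a minimal
sketch like this (assuming a de_DE locale is installed; any lc_monetary
setting whose currency symbol isn't '$' fails the same way):

set lc_monetary = 'C';
select '$0.00'::money;            -- ok
set lc_monetary = 'de_DE.UTF-8';
select '$0.00'::money;
ERROR:  invalid input syntax for type money: "$0.00"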

currently "default_transaction_read_only" is the only GUC that affects the
absorbing the restore of a database.

As you said upthread, I feel we could split them into two TOC entries and
dump them as the last commands of each database. Would that cause any
problem with parallel restore?


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2018-01-22 Thread Kyotaro HORIGUCHI
Hello, I have returned to this.

I thoroughly checked the translator's behavior against the XPath
specifications and checked the documentation and regression
test. Almost everything is fine for me, and this would be my last
comment.

At Fri, 24 Nov 2017 18:32:43 +0100, Pavel Stehule  
wrote in 
> 2017-11-24 18:13 GMT+01:00 Pavel Stehule :
> 
> >
> >
> > 2017-11-24 17:53 GMT+01:00 Pavel Stehule :
> >
> >> Hi
> >>
> >> 2017-11-22 22:49 GMT+01:00 Thomas Munro :
> >>
> >>> On Thu, Nov 9, 2017 at 10:11 PM, Pavel Stehule 
> >>> wrote:
> >>> > Attached new version.
> >>>
> >>> Hi Pavel,
> >>>
> >>> FYI my patch testing robot says[1]:
> >>>
> >>>  xml  ... FAILED
> >>>
> >>> regression.diffs says:
> >>>
> >>> + SELECT x.* FROM t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> >>> '/rows/row' PASSING t1.doc COLUMNS data int PATH
> >>> 'child::a[1][attribute::hoge="haha"]') as x;
> >>> + data
> >>> + --
> >>> + (0 rows)
> >>> +
> >>>
> >>> Maybe you forgot to git-add the expected file?
> >>>
> >>> [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/305979133
> >>
> >>
> >> unfortunately xml.out has 3 versions and is possible so one version
> >> should be taken elsewhere than my comp.
> >>
> >> please can me send your result xml.out file?
> >>
> >
> > looks like this case is without xml support so I can fix on my comp.
> >
> >
> fixed regress test

(I wouldn't have found that..)

I have three comments on the behavior and one on documentation.

1. Lack of syntax handling.

["'" [^'] "'"] is also a string literal, but getXPathToken is
forgetting that and applying default namespace mistakenly to the
astring content.

2. Additional comment might be good.

It might be better to have an additional description of the default
namespace in the comment starting with "Namespace mappings are
passed as text[]" in xpath_internal().

3. Inconsistent behavior from named namespace.

| - function context, aliases are local).
| + function context, aliases are local). Default namespace has
| + empty name (empty string) and should be only one.

This works as described; on the other hand, the same namespace
prefix can be defined twice or more in the array and the last one
takes effect. I don't see a reason for treating the default
namespace case differently.


4. Comments on the documentation part.

# Even though I'm not well suited to commenting on wording...

| + Inside predicate literals and, or,
| + div and mod are used as keywords
| + (XPath operators) every time and default namespace are not applied there.

*I*'d like to have a comma between "predicate" and "literals",
and an 'a' before "predicate". Or 'Literals ... inside a
predicate' might be better?

'are used as keywords' might be better as 'are identified as
keywords'?

The default namespace is applied to tag names other than the listed
keywords even inside a predicate. So 'are not applied there'
might be better as 'are not applied to them', or 'are not
applied in that case'?

| + If you would to use these literals like tag names, then the default 
namespace
| + should not be used, and these literals should be explicitly
| + labeled.
| +

The default namespace is not applied *only to* such keywords inside a
predicate. Even if an XPath expression contains such a tag name, the
default namespace still works for other tags. Does the following
make sense?

+ Use a named namespace to qualify such tag names when they appear in an
+ XPath predicate.



===
After the above are addressed (or rejected), I don't have any
additional comments.


- This current patch applies safely (with small shifts) on the
  current master.

- The code looks fine for me.

- This patch translates the given XPath expression by prefixing
  unprefixed tag names with a special namespace prefix only in
  the case where default namespace is defined, so the existing
  behavior is not affected.

- The syntax already exists but just wasn't implemented, so I don't
  think any arguments are needed here.

- It undocumentedly inhibits the usage of the namespace prefix
  "pgdefnamespace.pgsqlxml.internal" but I believe no one can
  notice that.

- The default-ns translator (xpath_parser.c) seems to work
  correctly, with some harmless exceptions.

  (the XPath specification is here: https://www.w3.org/TR/1999/REC-xpath-19991116/)

  Related unused features (and not documented?):
context variables ($n notations), 
user-defined functions (or function names prefixed by a namespace prefix)

  Newly documented behavior:
the default namespace isn't applied to and/or/div/mod.

- Documentation looks sufficient.

- The regression test doesn't cover the whole XPath syntax, but that isn't
  really viable.  I am fine with the basic test cases added by the
  current patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Query related to alter table ... attach partition

2018-01-22 Thread Ashutosh Sharma
On Tue, Jan 23, 2018 at 11:49 AM, Amit Langote
 wrote:
> On 2018/01/23 14:35, Ashutosh Sharma wrote:
>> I have created a regular table with a CHECK constraint on the partition
>> key column that conflicts with the partition constraint, but I could
>> still attach the table to the partitioned table. Here is what I am
>> trying to do:
>>
>> postgres[76308]=# create table part_tab (b int, a int) partition by range 
>> (a);
>> CREATE TABLE
>>
>> postgres[76308]=# create table part1 (a int, b int CHECK (a >= 5));
>> CREATE TABLE
>>
>> postgres[76308]=# alter table part_tab attach partition part1 for
>> values from (0) to (5); -- the partition constraint applied here
>> conflicts with CHECK (a >= 5) applied on part1.
>> ALTER TABLE
>>
>> postgres[76308]=# \d+ part1;
>> Table "public.part1"
>> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
>> | Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description |
>> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
>> | a      | integer |           |          |         | plain   |              |             |
>> | b      | integer |           |          |         | plain   |              |             |
>> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
>> Partition of: part_tab FOR VALUES FROM (0) TO (5)
>> Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 5))
>> Check constraints:
>> "part1_a_check" CHECK (a >= 5)
>> Options: storage_engine=zheap
>>
>> As shown in the description of part1 (child table) above, Partition
>> constraint i.e.  (a >= 0) AND (a < 5) and the CHECK constraint  a >= 5
>> conflicts with each other but still alter table ... attach partition
>> succeeded. Isn't that a bug?
>
> Hmm, I don't think it is.  If you had inserted rows with a >= 5 into the
> table before attaching it as a partition, an error would be correctly
> reported about the rows that violate the partition constraint and the
> attach would fail.
>

Well, that means the attach would only fail when the table contains some
value that doesn't fall in the partition range.

> create table part_tab (b int, a int) partition by range (a);
> create table part1 (a int, b int CHECK (a >= 5));
> insert into part1 values (5);
> alter table part_tab attach partition part1 for values from (0) to (5);
> ERROR:  partition constraint is violated by some row
>
> However, we don't make it fail because the table has a constraint that
> contradicts the partition constraint.  Attach succeeds in the absence of
> any violating rows and the end result is that the table/partition has
> contradictory constraints (the existing constraint and the partition
> constraint) and that simply means no rows can be inserted into the
> table/partition.
>

That's right. But shouldn't a partition that doesn't fall in the
partition range at all be rejected when the user tries to attach it? I feel
we should at least try throwing a WARNING message for it. Thoughts?

> -- fail because of the existing constraint (insert through parent)
> insert into part_tab (a) values (4);
> ERROR:  new row for relation "part1" violates check constraint "part1_a_check"
>
> -- fail because of the partition constraint (insert through parent)
> insert into part_tab (a) values (5);
> ERROR:  no partition of relation "part_tab" found for row
>
> -- fail because of the existing constraint (insert directly)
> insert into part1 (a) values (4);
> ERROR:  new row for relation "part1" violates check constraint "part1_a_check"
>
> -- fail because of the partition constraint (insert directly)
> insert into part1 (a) values (5);
> ERROR:  new row for relation "part1" violates partition constraint
>
> But that's the user's mistake of failing to remove the existing constraint
> before attaching as partition for a different set of values.
>
> -- drop the existing constraint
> alter table part1 drop constraint part1_a_check;
>
> -- all fine
> insert into part_tab (a) values (4); -- (insert through parent)
> insert into part1 (a) values (4); -- (insert directly)
>
> Thanks,
> Amit
>

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-22 Thread David Rowley
Hi Amit,
On 19 January 2018 at 04:03, David Rowley  wrote:
> On 18 January 2018 at 23:56, Amit Langote  
> wrote:
>> So, I've been assuming that the planner changes in the run-time pruning
>> patch have to do with selecting clauses (restriction clauses not
>> containing Consts and/or join clauses) to be passed to the executor by
>> recording them in the Append node.  Will they be selected by the planner
>> calling into partition.c?
>
> I had thought so. I only have a rough idea in my head and that's that
> the PartitionPruneInfo struct that I wrote for the run-time pruning
> patch would have the clause List replaced with this new
> PartScanClauseInfo struct (which likely means it needs to go into
> primnodes.h), this struct would contain all the partition pruning
> clauses in a more structured form so that no matching of quals to the
> partition key would be required during execution. The idea is that
> we'd just need to call; remove_redundant_clauses,
> extract_bounding_datums and get_partitions_for_keys. I think this is
> the bare minimum of work that can be done in execution since we can't
> remove the redundant clauses until we know the values of the Params.
>
> Likely this means there will need to be 2 functions externally
> accessible for this in partition.c. I'm not sure exactly how best to
> do this. Maybe it's fine just to have allpaths.c call
> extract_partition_key_clauses to generate the PartScanClauseInfo then
> call some version of get_partitions_from_clauses which does do the
> extract_partition_key_clauses duties. This is made more complex by the
> code that handles adding the default partition bound to the quals. I'm
> not yet sure where that should live.
>
> I've also been thinking of having some sort of PartitionPruneContext
> struct that we can pass around the functions. Runtime pruning had to
> add structs which store the Param values to various functions which I
> didn't like. It would be good to just add those to the context and
> have them passed down without having to bloat the parameters in the
> functions. I might try and do that tomorrow too. This should make the
> footprint of the runtime pruning patch a bit smaller.

Attached is what I had in mind about how to do this. Only the planner
will need to call populate_partition_clause_info. The planner and
executor can call get_partitions_from_clauseinfo. I'll just need to
change the run-time prune patch to pass the PartitionClauseInfo into
the executor in the Append node.

I've also added the context struct that I mentioned above. It's
currently not carrying much weight, but the idea is that this context
will be passed around a bit more with the run-time pruning patch and
it will also carry the details about parameter values. I'll need to
modify a few signatures of functions like partkey_datum_from_expr to
pass the context there too. I didn't do that here because currently,
those functions have no use for the context with the fields that they
currently have.

I've also fixed a bug where when you built the commutator OpExpr in
what's now called extract_partition_key_clauses the inputcollid was
not being properly set. The changes I made there go much further than
just that, I've completely removed the OpExpr from the PartClause
struct as only two members were ever used. I thought it was better
just to add those to PartClause instead.

I also did some renaming of variables that always assumed that the
Expr being compared to the partition key was a constant. This is true
now, but with run-time pruning patch, it won't be, so I thought it was
better to do this here rather than having to rename them in the
run-time pruning patch.

One thing I don't yet understand about the patch is the use of
predicate_refuted_by() in get_partitions_from_or_clause_args(). I did
adjust the comment above that code, but I'm still not certain I fully
understand why that function has to be used instead of building the
clauses for the OR being processed along with the remaining clauses.
Was it that this was too hard to extract, so you ended up using
predicate_refuted_by()?

I've also removed the makeBoolExpr call that you were encapsulating
the or_clauses in. I didn't really see the need for this since you
just removed it again when looping over the or_clauses.

The only other changes are just streamlining code and comment changes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_partition_prune_v21_delta_drowley_v1.patch
Description: Binary data


Re: Query related to alter table ... attach partition

2018-01-22 Thread Amit Langote
On 2018/01/23 14:35, Ashutosh Sharma wrote:
> I have created a regular table with a CHECK constraint on the partition
> key column that conflicts with the partition constraint, but I could
> still attach the table to the partitioned table. Here is what I am
> trying to do:
> 
> postgres[76308]=# create table part_tab (b int, a int) partition by range (a);
> CREATE TABLE
> 
> postgres[76308]=# create table part1 (a int, b int CHECK (a >= 5));
> CREATE TABLE
> 
> postgres[76308]=# alter table part_tab attach partition part1 for
> values from (0) to (5); -- the partition constraint applied here
> conflicts with CHECK (a >= 5) applied on part1.
> ALTER TABLE
> 
> postgres[76308]=# \d+ part1;
> Table "public.part1"
> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
> | Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description |
> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
> | a      | integer |           |          |         | plain   |              |             |
> | b      | integer |           |          |         | plain   |              |             |
> +--------+---------+-----------+----------+---------+---------+--------------+-------------+
> Partition of: part_tab FOR VALUES FROM (0) TO (5)
> Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 5))
> Check constraints:
> "part1_a_check" CHECK (a >= 5)
> Options: storage_engine=zheap
> 
> As shown in the description of part1 (child table) above, Partition
> constraint i.e.  (a >= 0) AND (a < 5) and the CHECK constraint  a >= 5
> conflicts with each other but still alter table ... attach partition
> succeeded. Isn't that a bug?

Hmm, I don't think it is.  If you had inserted rows with a >= 5 into the
table before attaching it as a partition, an error would be correctly
reported about the rows that violate the partition constraint and the
attach would fail.

create table part_tab (b int, a int) partition by range (a);
create table part1 (a int, b int CHECK (a >= 5));
insert into part1 values (5);
alter table part_tab attach partition part1 for values from (0) to (5);
ERROR:  partition constraint is violated by some row

However, we don't make it fail because the table has a constraint that
contradicts the partition constraint.  Attach succeeds in the absence of
any violating rows and the end result is that the table/partition has
contradictory constraints (the existing constraint and the partition
constraint) and that simply means no rows can be inserted into the
table/partition.

-- fail because of the existing constraint (insert through parent)
insert into part_tab (a) values (4);
ERROR:  new row for relation "part1" violates check constraint "part1_a_check"

-- fail because of the partition constraint (insert through parent)
insert into part_tab (a) values (5);
ERROR:  no partition of relation "part_tab" found for row

-- fail because of the existing constraint (insert directly)
insert into part1 (a) values (4);
ERROR:  new row for relation "part1" violates check constraint "part1_a_check"

-- fail because of the partition constraint (insert directly)
insert into part1 (a) values (5);
ERROR:  new row for relation "part1" violates partition constraint

But that's the user's mistake of failing to remove the existing constraint
before attaching as partition for a different set of values.

-- drop the existing constraint
alter table part1 drop constraint part1_a_check;

-- all fine
insert into part_tab (a) values (4); -- (insert through parent)
insert into part1 (a) values (4); -- (insert directly)

Thanks,
Amit




Failed to request an autovacuum work-item in silence

2018-01-22 Thread Masahiko Sawada
Hi all,

While reading the code, I realized that requesting an autovacuum
work-item can fail silently if the work-item array is full, so
users cannot tell that the work-item will never be performed.
AutoVacuumRequestWork() seems to have behaved this way since the initial
implementation, but is there any reason for such behavior? It seems to
me that it can be a problem even now that there is only one kind of
work-item. A patch fixing it is attached.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


report_autovac_workitem_request_failure.patch
Description: Binary data


Query related to alter table ... attach partition

2018-01-22 Thread Ashutosh Sharma
Hi All,

I have created a regular table with a CHECK constraint on the partition
key column that conflicts with the partition constraint, but I could
still attach the table to the partitioned table. Here is what I am
trying to do:

postgres[76308]=# create table part_tab (b int, a int) partition by range (a);
CREATE TABLE

postgres[76308]=# create table part1 (a int, b int CHECK (a >= 5));
CREATE TABLE

postgres[76308]=# alter table part_tab attach partition part1 for
values from (0) to (5); -- the partition constraint applied here
conflicts with CHECK (a >= 5) applied on part1.
ALTER TABLE

postgres[76308]=# \d+ part1;
Table "public.part1"
+--------+---------+-----------+----------+---------+---------+--------------+-------------+
| Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description |
+--------+---------+-----------+----------+---------+---------+--------------+-------------+
| a      | integer |           |          |         | plain   |              |             |
| b      | integer |           |          |         | plain   |              |             |
+--------+---------+-----------+----------+---------+---------+--------------+-------------+
Partition of: part_tab FOR VALUES FROM (0) TO (5)
Partition constraint: ((a IS NOT NULL) AND (a >= 0) AND (a < 5))
Check constraints:
"part1_a_check" CHECK (a >= 5)
Options: storage_engine=zheap

As shown in the description of part1 (child table) above, Partition
constraint i.e.  (a >= 0) AND (a < 5) and the CHECK constraint  a >= 5
conflicts with each other but still alter table ... attach partition
succeeded. Isn't that a bug?

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com



Re: Rangejoin rebased

2018-01-22 Thread Jeff Davis
On Fri, Jan 19, 2018 at 2:07 AM, Simon Riggs  wrote:
> err... that isn't correct. An empty range matches nothing, so can be
> ignored in joins.
>
> So probably best to explain some more, please.

The semantics of R1 @> R2 are that it returns true if R1 is a non-NULL
range and R2 is empty.

It's set semantics, and all sets contain the empty set.
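
A quick SQL check shows the same thing:

select int4range(1, 10) @> 'empty'::int4range;   -- true
select 'empty'::int4range @> int4range(1, 10);   -- false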

But I understand @> is an important case so I am looking into it.

Regards,
Jeff Davis



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 12:48:45PM +0100, Daniel Gustafsson wrote:
> Gotcha.  I’ve added some tests to the patch.  The test for CREATE
> FUNCTION was omitted for now awaiting the outcome of the discussion
> around isStrict and isCachable.

That makes sense.

> Not sure how much they’re worth in "make check” though as it’s quite
> easy to add a new option checked with pg_strcasecmp which then isn’t
> tested. Still, it might aid review so definitely worth it.

Thanks for making those, this eases the review lookups. There are a
couple of code paths that remained untested:
- contrib/unaccent/
- contrib/dict_xsyn/
- contrib/dict_int/
- The combinations of toast reloptions is pretty particular as option
namespaces also go through pg_strcasecmp, so I think that those should
be tested as well (your patch includes a test for fillfactor, which is a
good base by the way).
- check_option for CREATE VIEW and ALTER VIEW SET.
- ALTER TEXT SEARCH CONFIGURATION for copy/parser options.
- CREATE TYPE AS RANGE with DefineRange().

There are as well two things I have spotted on the way:
1) fillRelOptions() can safely use strcmp.
2) indexam.sgml mentions using pg_strcasecmp() for consistency with the
core code when defining amproperty for an index AM. Well, with this
patch I think that consistency with the core code would involve
using strcmp instead; extension developers can of course still use
pg_strcasecmp.
Those are changed as well in the attached, which applies on top of your
v6. I have also added the tests I spotted were missing. If this
looks right to you, I am fine with switching this patch to ready for
committer, without considering the issues with isCachable and isStrict
in CREATE FUNCTION of course.
--
Michael
diff --git a/contrib/dict_int/expected/dict_int.out b/contrib/dict_int/expected/dict_int.out
index 3b766ec52a..20d172c730 100644
--- a/contrib/dict_int/expected/dict_int.out
+++ b/contrib/dict_int/expected/dict_int.out
@@ -300,3 +300,6 @@ select ts_lexize('intdict', '314532610153');
  {314532}
 (1 row)
 
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY intdict ("Maxlen" = 4);
+ERROR:  unrecognized intdict parameter: "Maxlen"
diff --git a/contrib/dict_int/sql/dict_int.sql b/contrib/dict_int/sql/dict_int.sql
index 8ffec6b770..05d2149101 100644
--- a/contrib/dict_int/sql/dict_int.sql
+++ b/contrib/dict_int/sql/dict_int.sql
@@ -51,3 +51,6 @@ select ts_lexize('intdict', '252281774');
 select ts_lexize('intdict', '313425');
 select ts_lexize('intdict', '641439323669');
 select ts_lexize('intdict', '314532610153');
+
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY intdict ("Maxlen" = 4);
diff --git a/contrib/dict_xsyn/expected/dict_xsyn.out b/contrib/dict_xsyn/expected/dict_xsyn.out
index 9b95e13559..4cc42956c7 100644
--- a/contrib/dict_xsyn/expected/dict_xsyn.out
+++ b/contrib/dict_xsyn/expected/dict_xsyn.out
@@ -140,3 +140,6 @@ SELECT ts_lexize('xsyn', 'grb');
  
 (1 row)
 
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY xsyn ("Matchorig" = false);
+ERROR:  unrecognized xsyn parameter: "Matchorig"
diff --git a/contrib/dict_xsyn/sql/dict_xsyn.sql b/contrib/dict_xsyn/sql/dict_xsyn.sql
index 49511061d0..ab1380c0a2 100644
--- a/contrib/dict_xsyn/sql/dict_xsyn.sql
+++ b/contrib/dict_xsyn/sql/dict_xsyn.sql
@@ -43,3 +43,6 @@ ALTER TEXT SEARCH DICTIONARY xsyn (RULES='xsyn_sample', KEEPORIG=false, MATCHORI
 SELECT ts_lexize('xsyn', 'supernova');
 SELECT ts_lexize('xsyn', 'sn');
 SELECT ts_lexize('xsyn', 'grb');
+
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY xsyn ("Matchorig" = false);
diff --git a/contrib/unaccent/expected/unaccent.out b/contrib/unaccent/expected/unaccent.out
index b93105e9c7..30c061fe51 100644
--- a/contrib/unaccent/expected/unaccent.out
+++ b/contrib/unaccent/expected/unaccent.out
@@ -61,3 +61,6 @@ SELECT ts_lexize('unaccent', '
  {åöéë}
 (1 row)
 
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY unaccent ("Rules" = 'my_rules');
+ERROR:  unrecognized Unaccent parameter: "Rules"
diff --git a/contrib/unaccent/sql/unaccent.sql b/contrib/unaccent/sql/unaccent.sql
index 310213994f..f18f934377 100644
--- a/contrib/unaccent/sql/unaccent.sql
+++ b/contrib/unaccent/sql/unaccent.sql
@@ -16,3 +16,6 @@ SELECT unaccent('unaccent', '
 SELECT ts_lexize('unaccent', 'foobar');
 SELECT ts_lexize('unaccent', '£ÌËÁ');
 SELECT ts_lexize('unaccent', '³öéë');
+
+-- invalid: non-lowercase quoted identifiers
+ALTER TEXT SEARCH DICTIONARY unaccent ("Rules" = 'my_rules');
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index a7f6c8dc6a..6a5d307c69 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -439,7 +439,7 @@ amproperty (Oid index_oid, int attno,
If the core code does not recognize the property name
then prop is AMPROP_UNKNOWN.
Access methods can define custom property names by
-   checking propname for a match (

Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-22 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> > On Monday, January 22, 2018, Stephen Frost  wrote:
> >> In the end, I feel like this patch has actually been through the ringer
> >> and back because it was brought up in the context of solving a problem
> >> that we'd all like to fix in a better way.  Had it been simply dropped
> >> on us as a "I'd like to not have comments in my pg_dump output and
> >> here's a patch to allow me to do that" I suspect it would have been
> >> committed without anywhere near this level of effort.
> 
> Indeed ... but it was *not* presented that way, and that's what makes
> this somewhat of a difficult call.  You and Robert argued that there were
> valid use-cases, but I feel like that was somewhat of an after-the-fact
> justification, rather than an actual organic feature request.

This was definitely the kind of follow-on work that I was anticipating
happening with the rework that was done to dump_contains in a9f0e8e5a2e.
I would think that we'd support enable/disable of all of the different
component types that pg_dump recognizes, and we're most of the way there
at this point.  The comment above the #define's in pg_dump.h even hints
at that- component types of an object which can be selected for dumping.

> "David G. Johnston"  writes:
> > +0; but recognizing our users are going to remain annoyed by the existing
> > problem and that telling them that this is the answer will not sit well.
> 
> FWIW, today's pg_dump refactoring got rid of one of the use-cases for
> this, namely the COMMENT ON DATABASE aspect.  We got rid of another aspect
> (creating/commenting on the public schema) some time ago, via a method
> that was admittedly a bit of a hack but got the job done.  What seems to
> be left is that given a default installation, pg_dump will emit a
> "COMMENT ON EXTENSION plpgsql" that can't be restored by an unprivileged
> user ... as well as a "CREATE EXTENSION IF NOT EXISTS plpgsql", which is
> an utter kluge.  Maybe we can think of a rule that will exclude plpgsql
> from dumps without causing too much havoc.

Having a generic --exclude-extension=plpgsql might be interesting..  I
can certainly recall wishing for such an option when working with
postgis.

> The most obvious hack is just to exclude PL objects with OIDs below 16384
> from the dump.  An issue with that is that it'd lose any nondefault
> changes to the ACL for plpgsql, which seems like a problem.  On the
> other hand, I think the rule we have in place for the public schema is
> preventing dumping local adjustments to that, and there have been no
> complaints.  (Maybe I'm wrong and the initacl mechanism handles that
> case?  If so, seems like we could get it to handle local changes to
> plpgsql's ACL as well.)

Both the public schema and plpgsql's ACLs should be handled properly
(with local changes preserved) through the pg_init_privs system.  I'm
not sure what you're referring to regarding a rule preventing dumping
local adjustments to the public schema, as far as I can recall we've
basically always supported that.

Losing non-default ACLs for plpgsql seems like a step backwards.

Frankly, the comment on plpgsql is probably one of the most worthless
comments we have anyway- all it does is re-state information you
already know: it's a procedural language called 'plpgsql'.  I'd
suggest we could probably survive without it. That is a long route to
"fixing" this, but at least we could tell people it's been fixed in
newer versions and there's a kludge available until then.

Thanks!

Stephen


signature.asc
Description: PGP signature


Remove utils/dsa.h from autovacuum.c

2018-01-22 Thread Masahiko Sawada
Hi,

Attached is a patch for $subject. The implementation of autovacuum
work-items was changed by commit
31ae1638ce35c23979f9bcbb92c6bb51744dbccb, but the inclusion of the dsa.h
header file remained.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


get_rid_of_dsa_h.patch
Description: Binary data


Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-22 Thread Amit Kapila
On Mon, Jan 22, 2018 at 7:16 PM, Robert Haas  wrote:
> On Fri, Jan 19, 2018 at 10:59 PM, Amit Kapila  wrote:
>> The patch doesn't apply cleanly on the head, but after rebasing it, I
>> have reviewed and tested it and it seems to be working fine.  Apart
>> from this specific issue, I think we should consider making
>> nworkers_launched reliable (at the very least for cases where it
>> matters).  You seem to be of opinion that can be a source of subtle
>> bugs which I don't deny but now I think we are starting to see some
>> use cases of such a mechanism as indicated by Peter G. in parallel
>> create index thread.  Even, if we find some way for parallel create
>> index to not rely on that assumption, I won't be surprised if some
>> future parallelism work would have such a requirement.
>
> Isn't making nworkers_launched reliable exactly what this patch does?
> It converts the rare cases in which nworkers_launched would have been
> unreliable into errors, precisely so that code like parallel CREATE
> INDEX doesn't need to worry about the case where it's inaccurate.
>

It does turn such cases into an error, but only at the end when someone
calls WaitForParallelWorkersToFinish.  If one tries to rely on it at
any other time, it won't work, which I think is the case for parallel
CREATE INDEX where Peter G. is trying to use it differently.  I am not
100% sure whether parallel CREATE INDEX has this symptom as I have not
tried it with that patch, but Peter and I are trying to figure that
out.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Amit Kapila
On Tue, Jan 23, 2018 at 8:43 AM, Peter Geoghegan  wrote:
> On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila  wrote:
>>> FWIW, I don't think that that's really much of a difference.
>>>
>>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>>> similar to how _bt_end_parallel() calls
>>> WaitForParallelWorkersToFinish() in the patch. The
>>> _bt_leader_heapscan() condition variable wait for workers that you
>>> refer to is quite a bit like how gather_readnext() behaves. It
>>> generally checks to make sure that all tuple queues are done.
>>> gather_readnext() can wait for developments using WaitLatch(), to make
>>> sure every tuple queue is visited, with all output reliably consumed.
>>>
>>
>> The difference lies in the fact that in gather_readnext, we use tuple
>> queue mechanism which has the capability to detect that the workers
>> are stopped/exited whereas _bt_leader_heapscan doesn't have any such
>> capability, so I think it will loop forever.
>
> _bt_leader_heapscan() can detect when workers exit early, at least in
> the vast majority of cases. It can do this simply by processing
> interrupts and automatically propagating any error -- nothing special
> about that. It can also detect when workers have finished
> successfully, because of course, that's the main reason for its
> existence. What remains, exactly?
>

Will it be able to detect a fork failure, or a worker exiting before
attaching to the error queue?  I think you can try it by forcing a fork
failure in do_start_bgworker and seeing how _bt_leader_heapscan
behaves.  I would have tried it and let you know the results,
but the latest patch doesn't seem to apply cleanly.

> I don't know that much about tuple queues, but from a quick read I
> guess you might be talking about shm_mq_receive() +
> shm_mq_wait_internal(). It's not obvious that that will work in all
> cases ("Note that if handle == NULL, and the process fails to attach,
> we'll potentially get stuck here forever"). Also, I don't see how this
> addresses the parallel_leader_participation issue I raised.
>

I am talking about shm_mq_receive->shm_mq_counterparty_gone.  In
shm_mq_counterparty_gone, it can detect if the worker is gone by using
GetBackgroundWorkerPid.





-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Small improvement to compactify_tuples

2018-01-22 Thread Stephen Frost
Greetings,

* Юрий Соколов (funny.fal...@gmail.com) wrote:
> On Wed, Nov 29, 2017 at 8:00 AM, Peter Geoghegan  wrote:
> > On Tue, Nov 28, 2017 at 2:41 PM, Andres Freund  wrote:
> >> Maybe it's a stupid question. But would we still want to have this after
> >> the change? These should be just specializations of the template version
> >> imo.
> 
> "generic" version operates on bytes, and it will be a bit hard to combine
> it with
> templated version. Not impossible, but it will look ugly.

If that's the case then does it really make sense to make this change..?

> In attach fixed qsort_template version.
> And version for compactify_tuples with bucket_sort and templated qsort.

While having the patch is handy, I'm not seeing any performance numbers
on this version, and I imagine others watching this thread are also
wondering about things like a test run that just uses the specialized
qsort_itemIds() without the bucketsort.

Are you planning to post some updated numbers and/or an updated test
case that hopefully shows best/worst case with this change?  Would be
good to get that on a couple of platforms too, if possible, since we've
seen that the original benchmarks weren't able to be consistently
repeated across different platforms.  Without someone doing that
leg-work, this doesn't seem like it'll be moving forward.

Marking as Waiting on Author.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-22 Thread Tom Lane
> On Monday, January 22, 2018, Stephen Frost  wrote:
>> In the end, I feel like this patch has actually been through the ringer
>> and back because it was brought up in the context of solving a problem
>> that we'd all like to fix in a better way.  Had it been simply dropped
>> on us as a "I'd like to not have comments in my pg_dump output and
>> here's a patch to allow me to do that" I suspect it would have been
>> committed without anywhere near this level of effort.

Indeed ... but it was *not* presented that way, and that's what makes
this somewhat of a difficult call.  You and Robert argued that there were
valid use-cases, but I feel like that was somewhat of an after-the-fact
justification, rather than an actual organic feature request.

"David G. Johnston"  writes:
> +0; but recognizing our users are going to remain annoyed by the existing
> problem and that telling them that this is the answer will not sit well.

FWIW, today's pg_dump refactoring got rid of one of the use-cases for
this, namely the COMMENT ON DATABASE aspect.  We got rid of another aspect
(creating/commenting on the public schema) some time ago, via a method
that was admittedly a bit of a hack but got the job done.  What seems to
be left is that given a default installation, pg_dump will emit a
"COMMENT ON EXTENSION plpgsql" that can't be restored by an unprivileged
user ... as well as a "CREATE EXTENSION IF NOT EXISTS plpgsql", which is
an utter kluge.  Maybe we can think of a rule that will exclude plpgsql
from dumps without causing too much havoc.
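
For reference, the two statements in question look like this in a dump of a
default installation:

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';

and it's the COMMENT that an unprivileged user cannot restore, since it
requires ownership of the extension.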

The most obvious hack is just to exclude PL objects with OIDs below 16384
from the dump.  An issue with that is that it'd lose any nondefault
changes to the ACL for plpgsql, which seems like a problem.  On the
other hand, I think the rule we have in place for the public schema is
preventing dumping local adjustments to that, and there have been no
complaints.  (Maybe I'm wrong and the initacl mechanism handles that
case?  If so, seems like we could get it to handle local changes to
plpgsql's ACL as well.)
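
For illustration, the languages that cutoff would catch can be listed with

select oid, lanname from pg_language where oid < 16384;

which in a default installation returns only the built-in languages plus
plpgsql.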

regards, tom lane



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Peter Geoghegan
On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila  wrote:
>> FWIW, I don't think that that's really much of a difference.
>>
>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>> similar to how _bt_end_parallel() calls
>> WaitForParallelWorkersToFinish() in the patch. The
>> _bt_leader_heapscan() condition variable wait for workers that you
>> refer to is quite a bit like how gather_readnext() behaves. It
>> generally checks to make sure that all tuple queues are done.
>> gather_readnext() can wait for developments using WaitLatch(), to make
>> sure every tuple queue is visited, with all output reliably consumed.
>>
>
> The difference lies in the fact that in gather_readnext, we use tuple
> queue mechanism which has the capability to detect that the workers
> are stopped/exited whereas _bt_leader_heapscan doesn't have any such
> capability, so I think it will loop forever.

_bt_leader_heapscan() can detect when workers exit early, at least in
the vast majority of cases. It can do this simply by processing
interrupts and automatically propagating any error -- nothing special
about that. It can also detect when workers have finished
successfully, because of course, that's the main reason for its
existence. What remains, exactly?

I don't know that much about tuple queues, but from a quick read I
guess you might be talking about shm_mq_receive() +
shm_mq_wait_internal(). It's not obvious that that will work in all
cases ("Note that if handle == NULL, and the process fails to attach,
we'll potentially get stuck here forever"). Also, I don't see how this
addresses the parallel_leader_participation issue I raised.

-- 
Peter Geoghegan



Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2018-01-22 Thread Stephen Frost
Greetings!

* chenhj (chjis...@163.com) wrote:
> Rebased and removed the  whitespace.

Thanks for working on this, I agree that it seems like a pretty cool
optimization for pg_rewind.

I've only read through the thread to try and understand what's going on
and the first thing that comes to mind is that you're changing
pg_rewind to not remove the WAL from before the divergence (split)
point, but I'm not sure why.  As noted, that WAL isn't needed for
anything (it's from before the split, after all), so why keep it?  Is
there something in this optimization that depends on the old WAL being
there and, if so, what and why?

That's also different from how pg_basebackup works, which I don't think
is good (seems like pg_rewind should operate in a pretty similar manner
to pg_basebackup).

Setting this back to Waiting for Author but hopefully you can reply soon
and clarify that, possibly adjusting the patch accordingly.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Amit Kapila
On Tue, Jan 23, 2018 at 1:45 AM, Peter Geoghegan  wrote:
> On Mon, Jan 22, 2018 at 3:52 AM, Amit Kapila  wrote:
>> The difference is that nodeGather.c doesn't have any logic like the
>> one you have in _bt_leader_heapscan where the patch waits for each
>> worker to increment nparticipantsdone.  For Gather node, we do such a
>> thing (wait for all workers to finish) by calling
>> WaitForParallelWorkersToFinish which will have the capability after
>> Robert's patch to detect if any worker is exited abnormally (fork
>> failure or failed before attaching to the error queue).
>
> FWIW, I don't think that that's really much of a difference.
>
> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
> similar to how _bt_end_parallel() calls
> WaitForParallelWorkersToFinish() in the patch. The
> _bt_leader_heapscan() condition variable wait for workers that you
> refer to is quite a bit like how gather_readnext() behaves. It
> generally checks to make sure that all tuple queues are done.
> gather_readnext() can wait for developments using WaitLatch(), to make
> sure every tuple queue is visited, with all output reliably consumed.
>

The difference lies in the fact that in gather_readnext, we use tuple
queue mechanism which has the capability to detect that the workers
are stopped/exited whereas _bt_leader_heapscan doesn't have any such
capability, so I think it will loop forever.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [PATCH] session_replication_role = replica with TRUNCATE

2018-01-22 Thread Peter Eisentraut
On 1/18/18 12:41, Tom Lane wrote:
> Peter Eisentraut  writes:
>> So I'm proposing the attached alternative patch, which creates
>> constraint triggers to be TRIGGER_FIRES_ALWAYS by default.
>> Thoughts?
> 
> Hm, the general idea seems attractive, but I'm not sure we want
> this behavioral change for user-created triggers.  Can we make it
> happen like that only for RI triggers specifically?  If not, there's
> at least some missing doco changes here.

I never quite understood the difference between a normal trigger and a
constraint trigger.  But perhaps this should be it.  If the purpose of a
constraint trigger is to enforce a constraint, then this should be the
default behavior, I think.  You could always manually ALTER TABLE things.
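
For example (table and trigger names here are only placeholders), the firing
behavior can already be adjusted per trigger:

ALTER TABLE orders ENABLE ALWAYS TRIGGER enforce_fk;
    -- fires regardless of session_replication_role
ALTER TABLE orders ENABLE REPLICA TRIGGER audit_trg;
    -- fires only when session_replication_role = replica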

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] session_replication_role = replica with TRUNCATE

2018-01-22 Thread Peter Eisentraut
On 1/19/18 05:31, Simon Riggs wrote:
>> I'm not aware of an explanation why it currently works the way it does,
>> other than that FKs happen to be implemented by triggers and triggers
>> happen to work that way.  But I think it's pretty bogus that logical
>> replication subscriptions can insert data that violates constraints.
>> It's also weird that you can violate deferred unique constraints but not
>> immediate ones (I think, not tested).
> 
> The explanation is two-fold.
> 
> 1. If we pass valid data from the publisher, it should still be valid
> data when it gets there.

That is not necessarily the case, because we allow independent writes on
the subscriber.  The current test suite contains an example showing that
you can create invalid data this way.

> If we cannot apply changes then the
> subscriber is broken and the subscription cannot continue, so
> executing the constraint trigger does not solve the problem.

But that is already the case for any other kind of constraint or type
violation.  Just that certain kinds of constraints are apparently
ignored for no clear reason.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2018-01-22 Thread Craig Ringer
On 23 January 2018 at 15:21, Stephen Frost  wrote:

> Alexey,
>
> * Alexey Kondratov (kondratov.alek...@gmail.com) wrote:
> > On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera 
> wrote:
> > > On a *very* quick look, please use an enum to return from NextCopyFrom
> > > rather than 'int'.  The chunks that change bool to int are very
> > > odd-looking.  This would move the comment that explains the value from
> > > copy.c to copy.h, obviously.  Also, you seem to be using non-ASCII
> dashes
> > > in the descriptions of those values; please don't.
> >
> > I will fix it, thank you.
> >
> > > Or maybe I misunderstood the patch completely.
> >
> > I hope so. Here is my thoughts how it all works, please correct me,
> > where I am wrong:
> >
> > 1) First, I have simply changed ereport level to WARNING for specific
> > validations (extra or missing columns, etc) if IGNORE_ERRORS option is
> > used. All these checks are inside NextCopyFrom. Thus, this patch
> > performs here pretty much the same as before, excepting that it is
> > possible to skip bad lines, and this part should be safe as well
> >
> > 2) About PG_TRY/CATCH. I use it to catch only one specific
> > function call inside NextCopyFrom--InputFunctionCall--which is
> > used just to parse a datatype from the input string. I have no idea how
> > WAL write or trigger errors could get here
> >
> > All of this is done before actually forming a tuple, putting it into
> > the heap, firing insert-related triggers, etc. I am not trying to
> > catch all errors during the row processing, only input data errors. So
> > why is it unsafe?
>
> The issue is that InputFunctionCall is actually calling out to other
> functions and they could be allocating memory and doing other things.
> What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
> is all the cleanup work that AbortTransaction() and friends do- ensuring
> there's a valid memory context (cleaning up the ones that might have
> been allocated by the input function), releasing locks, resetting user
> and security contexts, and a whole slew of other things.
>
> Is all of that always needed for an error thrown by an input function?
> Hard to say, really, since input functions can be provided by
> extensions.  You might get away with doing this for int4in(), but we
> also have to be thinking about extensions like PostGIS which introduce
> their own geometry and geography data types that are a great deal more
> complicated.
>
>
For example, an input function could conceivably take an LWLock, do some
work in the heapam, or whatever.

We don't have much in the way of rules about what input functions can or
cannot do, so you can't assume much about their behaviour and what must /
must not be cleaned up. Nor can you just reset the state in a heavy handed
manner like (say) plpgsql does.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Handling better supported channel binding types for SSL implementations

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 04:52:11PM +0100, Daniel Gustafsson wrote:
> Not really, but IIUC a bug in this code could lead to channel binding
> not being used for a connection even if the user wanted it (and may
> miss that it didn’t happen due to not looking at logs etc)?

Possible, if an implementation decides to send a NIL list as return
result of this API when it should not.

> Considering the limited set of values (currently) it should be quite
> simple to check for offending entries, if there is indeed a risk of
> “silently” not using channel binding.

I think I understand the point you are getting at, but a problem is that
the channel binding type the client wants to use is known by the server
in SCRAM authentication only after reading the client-first message,
which of course happens after the client has decided whether channel
binding is used or not. Now the server needs to first emit a list of
supported SASL mechanisms which are consistent with what the SSL
implementation can do when the mechanism is negotiated. A third
approach that I can think of is to have this additional API in betls
emit a list of mechanisms, but I am not sure that this is worth the
complication, as in the end what you want to know is whether at least one
channel binding type is supported or not.

We could also live with the existing set of betls APIs; it's just that the
Secure Transport implementation for macOS would need a hack in auth.c
to stop the -PLUS mechanisms from being sent. Putting this logic into each
be-secure-*.c file is cleaner in my opinion.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2018-01-22 Thread Stephen Frost
Alexey,

* Alexey Kondratov (kondratov.alek...@gmail.com) wrote:
> On Fri, Dec 1, 2017 at 1:58 AM, Alvaro Herrera  
> wrote:
> > On a *very* quick look, please use an enum to return from NextCopyFrom
> > rather than 'int'.  The chunks that change bool to int are very
> > odd-looking.  This would move the comment that explains the value from
> > copy.c to copy.h, obviously.  Also, you seem to be using non-ASCII dashes
> > in the descriptions of those values; please don't.
> 
> I will fix it, thank you.
> 
> > Or maybe I misunderstood the patch completely.
> 
> I hope so. Here is my thoughts how it all works, please correct me,
> where I am wrong:
> 
> 1) First, I have simply changed ereport level to WARNING for specific
> validations (extra or missing columns, etc) if IGNORE_ERRORS option is
> used. All these checks are inside NextCopyFrom. Thus, this patch
> performs here pretty much the same as before, excepting that it is
> possible to skip bad lines, and this part should be safe as well
> 
> 2) About PG_TRY/CATCH. I use it to catch only one specific
> function call inside NextCopyFrom--InputFunctionCall--which is
> used just to parse a datatype from the input string. I have no idea how
> WAL write or trigger errors could get here
> 
> All of this is done before actually forming a tuple, putting it into
> the heap, firing insert-related triggers, etc. I am not trying to
> catch all errors during the row processing, only input data errors. So
> why is it unsafe?

The issue is that InputFunctionCall is actually calling out to other
functions and they could be allocating memory and doing other things.
What you're missing by using PG_TRY() here without doing a PG_RE_THROW()
is all the cleanup work that AbortTransaction() and friends do- ensuring
there's a valid memory context (cleaning up the ones that might have
been allocated by the input function), releasing locks, resetting user
and security contexts, and a whole slew of other things.

Is all of that always needed for an error thrown by an input function?
Hard to say, really, since input functions can be provided by
extensions.  You might get away with doing this for int4in(), but we
also have to be thinking about extensions like PostGIS which introduce
their own geometry and geography data types that are a great deal more
complicated.

In the end, this isn't an acceptable approach to this problem.  The
approach outlined previously using sub-transactions which attempted
insert some number of records and then, on failure, restarted and
inserted up until the failed record and then skipped it, starting over
again with the next batch, seemed pretty reasonable to me.  If you have
time to work on that, that'd be great, otherwise we'll probably need to
mark this as returned with feedback.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] A design for amcheck heapam verification

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 03:22:05PM -0800, Peter Geoghegan wrote:
> Michael said he'd do more review. I generally feel this is close, though.

Yep. I have provided the feedback I wanted for 0001 (no API change in
the bloom facility by the way :( ), but I still wanted to look at 0002
in depths.
--
Michael


signature.asc
Description: PGP signature


[HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-22 Thread David G. Johnston
On Monday, January 22, 2018, Stephen Frost  wrote:

>
> In the end, I feel like this patch has actually been through the wringer
> and back because it was brought up in the context of solving a problem
> that we'd all like to fix in a better way.  Had it been simply dropped
> on us as a "I'd like to not have comments in my pg_dump output and
> here's a patch to allow me to do that" I suspect it would have been
> committed without anywhere near this level of effort.
>

+0; but recognizing our users are going to remain annoyed by the existing
problem and that telling them that this is the answer will not sit well.

David J.


Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 04:23:55PM -0500, Tom Lane wrote:
> Robert Haas  writes:
>> I agree with Noah.  It's true that having unfixed bugs isn't
>> particularly good, but it doesn't justify activating a CommitFest
>> entry under someone else's name.
> 
> Agreed.  The CF app is not a bug tracker.

OK, thanks for the feedback. Based on that I am deleting this entry from
the CF. And my apologies for the noise.

>> If nobody is willing to put in the effort to keep AIX supported under
>> XLC, then we should just update the documentation to say that it isn't
>> supported.  Our support for that platform is pretty marginal anyway if
>> we're only supporting it up through 6.1; that was released in 2007 and
>> went out of support in Q2 of last year.
> 
> Huh?  We have somebody just upthread saying that they're working out
> issues on newer AIX.  We should at least give them time to finish
> that research before making any decisions.

Yes, Tony stated so, but he has not published any patches
yet. Regarding the potential documentation impact, let me suggest
simply stating that XLC up to 12.1 is supported; per the
buildfarm this is rather stable and does not face the same issues as
what we are seeing with XLC 13.1. Is there any reason to wait before
saying so in the docs? If support for XLC 13.1
materializes in the future, we can update the docs to extend the
supported range later on.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2018-01-22 Thread Amit Langote
On 2018/01/23 8:57, Thomas Munro wrote:
> On Tue, Jan 23, 2018 at 12:41 PM, Thomas Munro
>  wrote:
>> On Mon, Jan 15, 2018 at 2:32 PM, Stephen Frost  wrote:
>>> If someone else would like to review it, that'd be great, otherwise I'll
>>> probably get it committed soon.
>>
>> FYI the v2 doesn't build:
>>
>> ref/alter_table.sgml:135: parser error : Opening and ending tag
>> mismatch: refentry line 6 and synopsis
>> 
> 
> Here's an update to move the new stuff to the correct side of the
> closing "".  Builds for me, and the typesetting looks OK.
> I'm not sure why the preexisting bogo-productions have inconsistent
> indentation levels (looking at table_constraint_using_index) but
> that's not this patch's fault.

Thanks for fixing that.

I noticed that partition_bound_spec in the patch is missing hash partition
bound syntax that was added after the original patch was written.  Fixed
in the attached.

Thanks,
Amit
diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 286c7a8589..5cc0519c8c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -85,6 +85,20 @@ ALTER TABLE [ IF EXISTS ] name
 OWNER TO { new_owner | 
CURRENT_USER | SESSION_USER }
 REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }
 
+and column_constraint 
is:
+
+[ CONSTRAINT constraint_name ]
+{ NOT NULL |
+  NULL |
+  CHECK ( expression ) [ NO 
INHERIT ] |
+  DEFAULT default_expr |
+  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY [ ( 
sequence_options ) ] |
+  UNIQUE index_parameters |
+  PRIMARY KEY index_parameters |
+  REFERENCES reftable [ ( 
refcolumn ) ] [ MATCH FULL | MATCH 
PARTIAL | MATCH SIMPLE ]
+[ ON DELETE action ] [ ON 
UPDATE action ] }
+[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
+
 and table_constraint 
is:
 
 [ CONSTRAINT constraint_name ]
@@ -96,11 +110,27 @@ ALTER TABLE [ IF EXISTS ] name
 [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
 [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
 
+index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
+
+[ WITH ( storage_parameter [= 
value] [, ... ] ) ]
+[ USING INDEX TABLESPACE tablespace_name ]
+
+exclude_element in an 
EXCLUDE constraint is:
+
+{ column_name | ( expression ) } [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST 
} ]
+
 and table_constraint_using_index is:
 
 [ CONSTRAINT constraint_name ]
 { UNIQUE | PRIMARY KEY } USING INDEX index_name
 [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE 
]
+
+and partition_bound_spec 
is:
+
+IN ( { numeric_literal | 
string_literal | NULL } [, ...] ) |
+FROM ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
+  TO ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] ) |
+WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
  
 


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2018-01-22 Thread Stephen Frost
Robert, all,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Aug 7, 2017 at 11:14 AM, Fabrízio de Royes Mello
>  wrote:
> > I also test against all current supported versions (9.2 ... 9.6) and didn't
> > find any issue.
> >
> > Changed status to "ready for commiter".
> 
> On a very fast read this patch looks OK to me, but I'm a bit concerned
> about whether we have consensus for it.  By my count the vote is 6-3
> in favor of proceeding.
> 
> +1: Robins Tharakan, Stephen Frost, David Fetter, Fabrizio Mello,
> Michael Paquier, Robert Haas
> -1: David G. Johnston, Tom Lane, Simon Riggs
> 
> I guess that's probably sufficient to go forward, but does anyone wish
> to dispute my characterization of their vote or the vote tally in
> general?  Would anyone else like to cast a vote?

This patch continues to drag on, so I'll try to spur it a bit.

David G. Johnston, you are listed as a '-1' on this but your last comment
on this thread was:

cakfquwzuyhh+kuzp76cdb4r8z616zwewcheedq6veqgdgp9...@mail.gmail.com

Where you commented:

> It was argued, successfully IMO, to have this capability independent of
> any particular problem to be solved.  In that context I haven't given the
> proposed implementation much thought.

> I do agree that this is a very unappealing solution for the stated problem
> and am hoping someone will either have an ingenious idea to solve it the
> right way or is willing to hold their nose and put in a hack.

Which didn't strike me as really being against this patch but rather a
complaint that making use of this feature to "fix" the non-superuser
Single Transaction restore was a hack, which is somewhat independent.
We have lots of features that people use in various very hacky ways, but
that doesn't mean those are bad features for us to have.

Simon, you are also a '-1' on this but your last comment also makes me
think you're unhappy with this feature being used in a hacky way but
perhaps aren't against the feature itself:

> But back to the main point which is that --no-comments discards ALL
> comments simply to exclude one pointless and annoying comment. That
> runs counter to our stance that we do not allow silent data loss. I
> want to solve the problem too. I accept that not everyone uses
> comments, but if they do, spilling them all on the floor is a user
> visible slip up that we should not be encouraging. Sorry y'all.

I really can't see accepting this as a data loss issue when this feature
is used only when requested by a user (it would *not* be enabled by
default; I'd be against that too).  We have other options that would also
be considered data loss from this viewpoint, but I can't imagine we would be
willing to remove them (eg: --no-privileges, or perhaps more clearly
--no-blobs, which was only recently added).

Tom, you're also listed as a '-1' on this but your last comment on this
thread was 16858.1496369...@sss.pgh.pa.us where you asked:

> I dunno.  What's the actual use-case, other than as a bad workaround
> to a problem we should fix a different way?

which was answered by both Robert and me; Robert's reply being mainly
that it's likely to prove useful in the field even if we don't have a
specific use-case today, while I outlined at least a couple of feasible
use-cases (minimization, obfuscation).

For my 2c, I don't know that a large use-case is needed for this
particular feature to begin with.

In the end, I feel like this patch has actually been through the wringer
and back because it was brought up in the context of solving a problem
that we'd all like to fix in a better way.  Had it been simply dropped
on us as a "I'd like to not have comments in my pg_dump output and
here's a patch to allow me to do that" I suspect it would have been
committed without anywhere near this level of effort.

I'll just reiterate Robert's ask for people to clarify their positions.

If we're not moving forward, then the next step is to mark the patch as
Rejected.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Peter Geoghegan
On Thu, Jan 18, 2018 at 9:22 AM, Robert Haas  wrote:
> But, hang on a minute.  Why do the workers need to wait for the leader
> anyway?  Can't they just exit once they're done sorting?  I think the
> original reason why this ended up in the patch is that we needed the
> leader to assume ownership of the tapes to avoid having the tapes get
> blown away when the worker exits.  But, IIUC, with sharedfileset.c,
> that problem no longer exists.  The tapes are jointly owned by all of
> the cooperating backends and the last one to detach from it will
> remove them.  So, if the worker sorts, advertises that it's done in
> shared memory, and exits, then nothing should get removed and the
> leader can adopt the tapes whenever it gets around to it.

BTW, I want to point out that using the shared fileset infrastructure
is only a very small impediment to adding randomAccess support. If we
really wanted to support randomAccess for the leader's tapeset, while
recycling blocks from worker BufFiles, it looks like all we'd have to
do is change PathNameOpenTemporaryFile() to open files O_RDWR, rather
than O_RDONLY (shared fileset BufFiles that are opened after export
always have O_RDONLY segments -- we'd also have to change some
assertions, as well as some comments). Overall, this approach looks
straightforward, and isn't something that I can find an issue with
after an hour or so of manual testing.

Now, I'm not actually suggesting we go that way. As you know,
randomAccess isn't used by CREATE INDEX, and randomAccess may never be
needed for any parallel sort operation. More importantly, Thomas made
PathNameOpenTemporaryFile() use O_RDONLY for a reason, and I don't
want to trade one special case (randomAccess disallowed for parallel
tuplesort leader tapeset) in exchange for another one (the logtape.c
calls to BufFileOpenShared() ask for read-write BufFiles, not
read-only BufFiles).

I'm pointing this out because this is something that should increase
confidence in the changes I've proposed to logtape.c. The fact that
randomAccess support *would* be straightforward is a sign that I
haven't accidentally introduced some other assumption, or special
case.

-- 
Peter Geoghegan



Re: Add %r substitution for psql prompts to show recovery status

2018-01-22 Thread Ian Barwick

On 01/23/2018 03:19 AM, Peter Eisentraut wrote:

On 1/9/18 15:36, Peter Eisentraut wrote:

On 12/7/17 19:54, Tatsuo Ishii wrote:

On Wed, Dec 6, 2017 at 9:19 PM, Ian Barwick  wrote:

Note this substitution sends a "pg_is_in_recovery()" query to the server
each time it's encountered; unless there's something I'm overlooking I
think that's the only reliable way to determine current recovery status.


That seems kinda painful.

And what happens in an aborted transaction?


Yeah. I think we need some help from the backend for this. For example, a
parameter status message can be used here.  If PostgreSQL moves to the
recovery state or vice versa, PostgreSQL sends a parameter status
message to the frontend.


I agree a backend status message is the right way to do this.

We could perhaps report transaction_read_only, if we don't want to add a
new one.


I'm setting this CF item to returned with feedback, since it would
apparently be a much bigger change than the initial patch.


Yup, agree :)

Thanks for the feedback!


Regards

Ian Barwick


--
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2018-01-22 Thread Thomas Munro
On Tue, Jan 23, 2018 at 12:41 PM, Thomas Munro
 wrote:
> On Mon, Jan 15, 2018 at 2:32 PM, Stephen Frost  wrote:
>> If someone else would like to review it, that'd be great, otherwise I'll
>> probably get it committed soon.
>
> FYI the v2 doesn't build:
>
> ref/alter_table.sgml:135: parser error : Opening and ending tag
> mismatch: refentry line 6 and synopsis
> 

Here's an update to move the new stuff to the correct side of the
closing "".  Builds for me, and the typesetting looks OK.
I'm not sure why the preexisting bogo-productions have inconsistent
indentation levels (looking at table_constraint_using_index) but
that's not this patch's fault.

-- 
Thomas Munro
http://www.enterprisedb.com


adding_column_constraint_description_in_ALTER_TABLE_synopsis_v3.patch
Description: Binary data


Re: [HACKERS] [PATCH] Call RelationDropStorage() for broader range of object drops.

2018-01-22 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> Unless somebody can prove convincingly that this argument is wrong and
> that there are no other possible problems either, and memorialize that
> argument in the form of a detailed comment, I think we should reject
> this patch.  
> http://postgr.es/m/ca+tgmoa7tja6-mvjwdcb_bouwfkx9apnu+ok9m94tktztym...@mail.gmail.com
> from earlier this morning is good evidence for the proposition that we
> must be careful to document the reasons for the changes we make, even
> seemingly minor ones, if we don't want developers to be guessing in
> ten years whether those changes were actually safe and correct.

Based on Robert's feedback (which, given his comments, I agree with),
I'm going to mark this as rejected.  The approach for dealing with this
seems to be what Alvaro was getting at above where we should have a way
for an extension to get control to handle cleaning things up during
DROP EXTENSION, or call the appropriate event triggers before we drop
the functions, or, well, something else, but certainly not by just
making this change.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Adding column_constraint description in ALTER TABLE synopsis

2018-01-22 Thread Thomas Munro
On Mon, Jan 15, 2018 at 2:32 PM, Stephen Frost  wrote:
> If someone else would like to review it, that'd be great, otherwise I'll
> probably get it committed soon.

FYI the v2 doesn't build:

ref/alter_table.sgml:135: parser error : Opening and ending tag
mismatch: refentry line 6 and synopsis

   ^
ref/alter_table.sgml:1534: parser error : chunk is not well balanced

^
ref/alter_table.sgml:1534: parser error : chunk is not well balanced

^

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]

2018-01-22 Thread Stephen Frost
Greetings,

* Marko Tiikkaja (ma...@joh.to) wrote:
> On Thu, Nov 30, 2017 at 6:39 AM, Peter Geoghegan  wrote:
> > On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier
> >  wrote:
> > > The patch does not currently apply. I am noticing as well that Peter
> > > Geoghegan has registered himself as a reviewer a couple of hours back,
> > > so moved to next CF with waiting on author as status.
> >
> > Marko unregistered me, so I promptly reregistered. You'll have to ask him
> > why.
> 
> Oops.  I didn't mean to do what I did, and I apologize.  I had my months
> mixed up and I thought this commit fest had ended without you having looked
> at the patch, so I assumed you would register in the next CF if you were
> still interested, or someone else could pick it up.
> 
> In any case, *I* don't have any interest in pursuing this patch at this
> point.  I think this is a really good feature, and as far as I know the
> patch is technically solid, or at least was when it still applied.  Can
> someone else take it from here?

Given that there hasn't been any further progress on this, I'm going to
go ahead and mark it as returned with feedback.  It'll be in the
archives and in the CF app if someone decides to pick it up but there
doesn't seem much point leaving it as Waiting for Author in this CF when
the author doesn't have further time/interest in it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] A design for amcheck heapam verification

2018-01-22 Thread Peter Geoghegan
Hi,

On Fri, Jan 12, 2018 at 1:41 AM, Andrey Borodin  wrote:
> I've looked into the code and here's my review.
>
> The new functionality works and is useful right now. I believe it should be 
> shipped in the Postgres box (currently, I install it with packet managers).
> The code is nice and well documented.

Thanks.

> I'd be happy if I could pass an argument like ErrorLevel, which would help me
> to estimate the scale of the corruption. Or any other way to list more than one
> error. But, clearly, this functionality is not necessary for this patch.

My previous remarks apply here, too. I don't know how to rate severity
among error messages.

> Also, I couldn't find the check that ensures that there is a heap tuple
> for every B-tree leaf tuple. Is it there?

No, there isn't, because that's inherently race-prone. This is not so
bad. If you don't end up discovering a problem the other way around
(going from the heap to the index/Bloom filter), then the only way
that an index tuple pointing to the wrong place can go undetected is
because there is a duplicate heap TID in the index, or the heap TID
doesn't exist in any form.

I actually prototyped a patch that makes bt_index_parent_check()
detect duplicate heap TIDs in an index, by collecting heap TIDs from
the index, sorting them, and scanning the sorted array. That seems
like material for another patch, though.
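
The approach is essentially "collect, sort, scan for equal neighbours"; a
rough sketch (not the actual prototype) would be:

static int
tid_comparator(const void *a, const void *b)
{
    return ItemPointerCompare((ItemPointer) a, (ItemPointer) b);
}

static void
report_duplicate_tids(ItemPointer tids, int ntids)
{
    int         i;

    qsort(tids, ntids, sizeof(ItemPointerData), tid_comparator);

    for (i = 1; i < ntids; i++)
    {
        if (ItemPointerCompare(&tids[i - 1], &tids[i]) == 0)
            elog(ERROR, "duplicate heap TID found in index");
    }
}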

> I've found a few nitpicks in the bloom filter:
> 1. The choice of sdbmhash does not seem like the best option from my point of view.

I don't mind changing the second hash function, but I want to
emphasize that this shouldn't be expected to make anything more than a
very small difference. The bloom filter probes are slow because the
memory accesses are random.

There are many hash functions that would be suitable here, and not too
many reasons to prefer one over the other. My choice was based on a
desire for simplicity, and for something that seemed to be in
widespread usage for a long time.

> +   bitset_bytes = Max(1024L * 1024L, bitset_bytes);
> bloom_work_mem was supposed to be the limit? Why we do not honor this limit?

The minimum maintenance_work_mem setting is 1MB anyway.

> 3.
> +   filter = palloc0(offsetof(bloom_filter, bitset) +
> +sizeof(unsigned char) * 
> bitset_bytes);
> sizeof(unsigned char) == 1 by C standard

I know. That's just what I prefer to do as a matter of style.

> 4. The function my_bloom_power() returns a bit number, and its result is
> then raised back via INT64CONST(1) << bloom_power. I'd stick with removing bits in a loop by
> while(target_bitset_bits & (target_bitset_bits-1)) {
> target_bitset_bits&=target_bitset_bits-1; } or something like that. Or, if we
> use code like sdbm hash, fall back to bithacks :)
> https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2

my_bloom_power() is only called once per Bloom filter. So again, I
think that this is not very important. Thomas Munro talked about using
popcount() in another function, which is from the book Hacker's
Delight. While I'm sure that these techniques have their use, they
just don't seem to make sense when the alternative is simpler code
that is only going to be executed at most once per verification
operation anyway.

> for (i = 0; i < filter->k_hash_funcs; i++)
> {
> /* Accumulate hash value for caller */
> hashes[i] = (hasha + i * hashb + i) % filter->bitset_bits;
> }
> }
> It produces almost the same result (hashes 1..k-1 are +1'd), but uses a lot fewer
> % operations. Potential overflow is governed by the uint64 type.

I prefer to be conservative, and to stick to what is described by
Dillinger & Manolios as much as possible.

> That's all I've found. I do not know whether the patch has had all the necessary
> reviewer attention. Please feel free to change the status if you think that the
> patch is ready. From my point of view, the patch is in good shape.

Michael said he'd do more review. I generally feel this is close, though.

Thanks for the review
-- 
Peter Geoghegan



Re: [HACKERS] make async slave to wait for lsn to be replayed

2018-01-22 Thread Stephen Frost
Greetings Ivan,

* Ivan Kartyshov (i.kartys...@postgrespro.ru) wrote:
> Ants Aasma писал 2017-10-26 17:29:
> >On Mon, Oct 23, 2017 at 12:29 PM, Ivan Kartyshov
> > wrote:
> >>Ants Aasma писал 2017-09-26 13:00:
> >>>
> >>>Exposing this interface as WAITLSN will encode that visibility order
> >>>matches LSN order. This removes any chance of fixing for example
> >>>visibility order of async/vs/sync transactions. Just renaming it so
> >>>the token is an opaque commit visibility token that just happens to be
> >>>a LSN would still allow for progress in transaction management. For
> >>>example, making PostgreSQL distributed will likely want timestamp
> >>>and/or vector clock based visibility rules.
> >>
> >>
> >>I'm sorry I did not understand exactly what you meant.
> >>Please explain this in more detail.
> >
> >Currently transactions on the master become visible when xid is
> >removed from proc array. This depends on the order of acquiring
> >ProcArrayLock, which can happen in a different order from inserting
> >the commit record to wal. Whereas on the standby the transactions will
> >become visible in the same order that commit records appear in wal.
> >The difference can be quite large when transactions are using
> >different values for synchronous_commit. Fixing this is not easy, but
> >we may want to do it someday. IIRC CSN threads contained more
> >discussion on this topic. If we do fix it, it seems likely that what
> >we need to wait on is not LSN, but some other kind of value. If the UI
> >were something like "WAITVISIBILITY token", then we can later change
> >the token to something other than LSN.
> >
> >Regards,
> >Ants Aasma
> 
> It sounds reasonable. I can offer the following version.
> 
> WAIT  LSN  lsn_number;
> WAIT  LSN  lsn_number  TIMEOUT delay;
> WAIT  LSN  lsn_number  INFINITE;
> WAIT  LSN  lsn_number  NOWAIT;
> 
> 
> WAIT  [token]  wait_value [option];
> 
> token - [LSN, TIME | TIMESTAMP | CSN | XID]
> option - [TIMEOUT | INFINITE | NOWAIT]
> 
> Ready to listen to your suggestions.

There were a few different suggestions made, but somehow this thread
ended up in Needs Review again while still having LSNs.  I've changed it
back to Waiting for Author since it seems pretty unlikely that using LSN
is going to be acceptable based on the feedback.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Subscription code improvements

2018-01-22 Thread Stephen Frost
Greetings,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Fri, Aug 18, 2017 at 2:09 PM, Masahiko Sawada  
> wrote:
> > On Thu, Aug 17, 2017 at 8:17 PM, Masahiko Sawada  
> > wrote:
> >> On Thu, Aug 17, 2017 at 9:10 AM, Peter Eisentraut
> >>  wrote:
> >>> On 8/8/17 05:58, Masahiko Sawada wrote:
>  Are you planning to work on remaining patches 0005 and 0006 that
>  improve the subscription codes in PG11 cycle? If not, I will take over
>  them and work on the next CF.
> >>>
> >>> Based on your assessment, the remaining patches were not required bug
> >>> fixes.  So I think preparing them for the next commit fest would be great.
> >>>
> >>
> >> Thank you for the comment.
> >>
> >> After more thought, since 0001 and 0003 patches on the first mail also
> >> improve the subscription code and are worth considering, I
> >> picked up a total of 4 patches and updated them. I'm planning to work on
> >> these patches in the next CF.
> >>
> >
> > Added this item to the next commit fest.
> 
> The patch set fails to apply. Please provide a rebased version. I am
> moving this entry to next CF with waiting on author as status.

Masahiko Sawada, this patch is still in Waiting on Author and hasn't
progressed in a very long time.  Is there any chance you'll be able to
provide an updated patch soon for review?  Or should this patch be
closed out?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: unique indexes on partitioned tables

2018-01-22 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Version 4 of this patch, rebased on today's master.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1a02e7f359c94e5db0bb069666b950775cd3e2af Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 6 Nov 2017 17:04:55 +0100
Subject: [PATCH v4] allow indexes on partitioned tables to be unique

---
 doc/src/sgml/ref/alter_table.sgml |   9 +-
 doc/src/sgml/ref/create_table.sgml|  16 +-
 src/backend/bootstrap/bootparse.y |   2 +
 src/backend/catalog/index.c   |  45 -
 src/backend/catalog/pg_constraint.c   |  76 
 src/backend/catalog/toasting.c|   4 +-
 src/backend/commands/indexcmds.c  | 125 +++--
 src/backend/commands/tablecmds.c  |  62 ++-
 src/backend/parser/analyze.c  |   7 +
 src/backend/parser/parse_utilcmd.c|  31 +---
 src/backend/tcop/utility.c|   1 +
 src/include/catalog/index.h   |   5 +-
 src/include/catalog/pg_constraint_fn.h|   4 +-
 src/include/commands/defrem.h |   1 +
 src/include/parser/parse_utilcmd.h|   3 +-
 src/test/regress/expected/alter_table.out |   8 -
 src/test/regress/expected/create_table.out|  12 --
 src/test/regress/expected/indexing.out| 254 +-
 src/test/regress/expected/insert_conflict.out |   2 +-
 src/test/regress/sql/alter_table.sql  |   2 -
 src/test/regress/sql/create_table.sql |   8 -
 src/test/regress/sql/indexing.sql | 151 ++-
 22 files changed, 740 insertions(+), 88 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 286c7a8589..c00fd09fe1 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -804,8 +804,9 @@ ALTER TABLE [ IF EXISTS ] name
   This form attaches an existing table (which might itself be partitioned)
   as a partition of the target table. The table can be attached
   as a partition for specific values using FOR VALUES
-   or as a default partition by using DEFAULT
-  .  For each index in the target table, a corresponding
+   or as a default partition by using
+  DEFAULT.
+  For each index in the target table, a corresponding
   one will be created in the attached table; or, if an equivalent
   index already exists, will be attached to the target table's index,
   as if ALTER INDEX ATTACH PARTITION had been executed.
@@ -820,8 +821,10 @@ ALTER TABLE [ IF EXISTS ] name
   as the target table and no more; moreover, the column types must also
   match.  Also, it must have all the NOT NULL and
   CHECK constraints of the target table.  Currently
-  UNIQUE, PRIMARY KEY, and
   FOREIGN KEY constraints are not considered.
+  UNIQUE and PRIMARY KEY constraints
+  from the parent table will be created in the partition, if they don't
+  already exist.
   If any of the CHECK constraints of the table being
   attached is marked NO INHERIT, the command will fail;
   such a constraint must be recreated without the NO 
INHERIT
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..4c56df8960 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -546,8 +546,8 @@ WITH ( MODULUS numeric_literal, REM
  
 
  
-  Partitioned tables do not support UNIQUE,
-  PRIMARY KEY, EXCLUDE, or
+  Partitioned tables do not support
+  EXCLUDE, or
   FOREIGN KEY constraints; however, you can define
   these constraints on individual partitions.
  
@@ -786,6 +786,11 @@ WITH ( MODULUS numeric_literal, REM
   primary key constraint defined for the table.  (Otherwise it
   would just be the same constraint listed twice.)
  
+
+ 
+  When used on partitioned tables, UNIQUE constraints
+  must include all the columns of the partition key.
+ 
 

 
@@ -814,6 +819,13 @@ WITH ( MODULUS numeric_literal, REM
   about the design of the schema, since a primary key implies that other
   tables can rely on this set of columns as a unique identifier for rows.
  
+
+ 
+  PRIMARY KEY constraints share the restrictions that
+  UNIQUE constraints have when placed on partitioned
+  tables.
+ 
+
 

 
diff --git a/src/backend/bootstrap/bootparse.y 
b/src/backend/bootstrap/bootparse.y
index dfd53fa054..9e81f9514d 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -322,6 +322,7 @@ Boot_DeclareIndexStmt:
stmt,
$4,
 

Re: unique indexes on partitioned tables

2018-01-22 Thread Alvaro Herrera
Version 4 of this patch, rebased on today's master.

The main change is in dependency handling for the constraints: you now
can't drop a constraint from a partition, if it's attached to a
constraint in the parent (you can't drop indexes from under the
constraints either, but that was true in previous versions too).  Also
some error message rewording.  I added a bunch of additional tests.

I implemented the dependencies using pg_depend entries.  However,
pg_constraint has the notion of "coninhcount" and "conislocal", so I
update those values for the partition's pg_constraint row, because the
error messages are nicer that way.  We could remove those lines from the
patch and the mechanics should be pretty much identical.

I'll review the doc additions, per Simon upthread.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pg_serial early wraparound

2018-01-22 Thread Stephen Frost
Greetings,

* Thomas Munro (thomas.mu...@enterprisedb.com) wrote:
> On Wed, Jun 28, 2017 at 1:11 PM, Thomas Munro
>  wrote:
> > On Sat, Mar 25, 2017 at 7:27 AM, Thomas Munro
> >  wrote:
> >> On Sat, Mar 25, 2017 at 3:11 AM, Anastasia Lubennikova
> >>  wrote:
> >>> You claim that SLRUs now support five digit segment name, while in slru.h
> >>> at current master I see the following:
> >>>
> >>>  * Note: slru.c currently assumes that segment file names will be four hex
> >>>  * digits.  This sets a lower bound on the segment size (64K transactions
> >>>  * for 32-bit TransactionIds).
> >>>  */
> >
> > I've now complained about that comment in a separate thread.
> >
> >> It's not urgent, it's just cleanup work, so I've now moved it to the
> >> next commitfest.  I will try to figure out a new way to demonstrate
> >> that it works correctly without having to ask a review[er] to disable
> >> any assertions.  Thanks again.
> 
> Rebased again, now with a commit message.  That assertion has since
> been removed (commit ec99dd5a) so the attached test script can once
> again be used to see the contents of pg_serial as the xid goes all the
> way around, if you build with TEST_OLDSERXID defined so that
> predicate.c forces information about xids out to pg_serial.

I've taken a look through this and it seems pretty reasonable.  Would be
great to have someone actually try to duplicate the testing that Thomas
did (though I have little doubt that it works as described) and get it
to Ready-For-Committer state.

Anastasia, thanks for the previous review, any chance you could try
again with the latest patch (against the current state of git)?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: jsonpath

2018-01-22 Thread Andrew Dunstan


On 01/17/2018 04:01 PM, Andrew Dunstan wrote:
>
> On 01/15/2018 07:24 PM, Andrew Dunstan wrote:
>> On 01/10/2018 05:42 PM, Nikita Glukhov wrote:
>>> Attached new 8th version of jsonpath related patches. Complete
>>> documentation is still missing.
>>>
>>> The first 4 small patches are necessary datetime handling in jsonpath:
>>> 1. simple refactoring, extracted function that will be used later in
>>> jsonpath
>>> 2. throw an error when the input or format string contains trailing
>>> elements
>>> 3. avoid unnecessary cstring to text conversions
>>> 4. add function for automatic datetime type recognition by the
>>> presence of formatting components
>>>
>>> Should they be posted in a separate thread?
>>>
>> The first of these refactors the json/jsonb timestamp formatting into a
>> single function, removing a lot of code duplication. The involves
>> exposing time2tm() and timetz2tm(). I don't think that's a tragedy, so
>> unless there is any objection I propose to commit it shortly.
>>
>> The next three expose a bit more of the date/time API. I'm still
>> reviewing those.
>>
>
> I have committed the first of these patches.
>
> I have reviewed the next three, and I think they are generally good.
> There is no real point in committing them ahead of the jsonpath patch
> since there would be no point in having them at all but for that patch.
>
> Note that these do export the following hitherto internal bits of the
> datetime functionality:
>
> tm2time
> tm2timetz
> AdjustTimeForTypmod
> AdjustTimestampForTypmod
>
> Moving on to review the main jsonpath patch.
>




OK, I understand a good deal more than I did about how the jsonpath
code works, but the commenting is abysmal.

I got quite nervous about adding a new datetime variant to JsonbValue.
However, my understanding from the code is that this will only ever be
used in an intermediate jsonpath processing result, and it won't be used
in a stored or built jsonb object. Is that right? If it is, we need to
say so, and moreover we need to warn coders in the header file about the
restricted use of this variant.  I'm not sure we can enforce it with an
Assert, but if we can we should. I'm not 100% sure that doing it this
way, i.e. by extending and reusing JsonbValue, is desirable; I'd like to
know some of the thinking behind the design.

The encoding of a jsonpath value into a binary string is quite nifty,
but it needs to be documented. Once I understood it better some of my
fears about parser overhead were alleviated.

The use of a function pointer inside JsonPathVariable seems unnecessary
and baroque. It would be much more straightforward to set an isnull flag
in the struct and process the value in an "if" statement accordingly,
avoiding the function pointer altogether.

I am going to be travelling for a few days, then back online for a day
or two, then gone for a week. I hope we can make progress on these
features, but this needs lots more eyeballs, reviewing code as well as
testing, and lots more responsiveness. The whole suite is enormous.

cheers

andrew




-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PATCH] Logical decoding of TRUNCATE

2018-01-22 Thread Petr Jelinek
On 22/01/18 19:45, Petr Jelinek wrote:
> On 19/01/18 12:37, Marco Nenciarini wrote:
>> Hi All,
> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts
> +  * already closes the relations. Setting localrel to NULL in the 
> map entry
> +  * is still needed.
> +  */
> + rel->localrel = NULL;

 This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track
 which relations it opened and only close those and the rest should be
 closed by caller? That should also remove the other ugly part which is
 that the ExecuteTruncateGuts modifies the input list. What if caller
 wanted to use those relations it sent as parameter later?
>>>
>>> Agreed
>>>
>>
>> Attached a new version of the patch addressing these issues.
>>
> 
> Besides the small thing I wrote for the 0001 in the other thread I am
> pretty much happy with this now.
> 

Actually on second look, I don't like the new boolean parameter much.
I'd rather we didn't touch the input list and always close only
relations opened inside the ExecuteTruncateGuts().

It may mean more list(s) but the current interface is still not clean.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2018-01-22 Thread Stephen Frost
Pavel,

* Pavel Stehule (pavel.steh...@gmail.com) wrote:
> here is a GUC based patch for plancache controlling. Looks so this code is
> working.

This really could use a new thread, imv.  This thread is a year old and
about a completely different feature than what you've implemented here.

> It is hard to create regress tests. Any ideas?

Honestly, my idea for this would be to add another option to EXPLAIN
which would make it spit out generic and custom plans, or something of
that sort.
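
In the meantime, I'd expect a test to just drive the GUC directly,
something along these lines (the value spellings below are guesses on my
part, not taken from your patch):

PREPARE q(int) AS SELECT * FROM pg_class WHERE relnatts = $1;

SET plancache_mode = 'force_custom_plan';
EXPLAIN (COSTS OFF) EXECUTE q(3);   -- parameter value folded into the plan

SET plancache_mode = 'force_generic_plan';
EXPLAIN (COSTS OFF) EXECUTE q(3);   -- plan keeps $1 as a parameter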

Please review your patches before sending them and don't send in patches
which have random unrelated whitespace changes.

> diff --git a/src/backend/utils/cache/plancache.c 
> b/src/backend/utils/cache/plancache.c
> index ad8a82f1e3..cc99cf6dcc 100644
> --- a/src/backend/utils/cache/plancache.c
> +++ b/src/backend/utils/cache/plancache.c
> @@ -1031,6 +1033,12 @@ choose_custom_plan(CachedPlanSource *plansource, 
> ParamListInfo boundParams)
>   if (IsTransactionStmtPlan(plansource))
>   return false;
>  
> + /* See if settings wants to force the decision */
> + if (plancache_mode & PLANCACHE_FORCE_GENERIC_PLAN)
> + return false;
> + if (plancache_mode & PLANCACHE_FORCE_CUSTOM_PLAN)
> + return true;

Not a big deal, but I probably would have just expanded the conditional
checks against cursor_options (just below) rather than adding
independent if() statements.

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index ae22185fbd..4ce275e39d 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -3916,6 +3923,16 @@ static struct config_enum ConfigureNamesEnum[] =
>   NULL, NULL, NULL
>   },
>  
> + {
> + {"plancache_mode", PGC_USERSET, QUERY_TUNING_OTHER,
> + gettext_noop("Forces use of custom or generic plans."),
> + gettext_noop("It can control query plan cache.")
> + },
> + &plancache_mode,
> + PLANCACHE_DEFAULT, plancache_mode_options,
> + NULL, NULL, NULL
> + },

That long description is shorter than the short description.  How about:

short: Controls the planner's selection of custom or generic plan.
long: Prepared queries have custom and generic plans and the planner
will attempt to choose which is better; this can be set to override
the default behavior.

(either that, or just ditch the long description)

> diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
> index 87fab19f3c..962895cc1a 100644
> --- a/src/include/utils/plancache.h
> +++ b/src/include/utils/plancache.h
> @@ -143,7 +143,6 @@ typedef struct CachedPlan
>   MemoryContext context;  /* context containing this CachedPlan */
>  } CachedPlan;
>  
> -
>  extern void InitPlanCache(void);
>  extern void ResetPlanCache(void);
>  

Random unrelated whitespace change...

This is also missing documentation updates.

Setting to Waiting for Author, but with those changes I would think we
could mark it ready-for-committer, it seems pretty straight-forward to
me and there seems to be general agreement (now) that it's worthwhile to
have.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] A design for amcheck heapam verification

2018-01-22 Thread Peter Geoghegan
On Thu, Jan 11, 2018 at 2:14 AM, Andrey Borodin  wrote:
> I like heapam verification functionality and use it right now. So, I'm 
> planning to provide review for this patch, probably, this week.

Great!

> Seems like the new check works four orders of magnitude faster than
> bt_index_parent_check() and still finds my specific error that 
> bt_index_check() missed.
> From this output I see that there is corruption, but cannot understand:
> 1. What is the scale of corruption
> 2. Are these corruptions related or not

I don't know the answer to either question, and I don't think that
anyone else could provide much more certainty than that, at least when
it comes to the general case. I think it's important to remember why
that is.

When amcheck raises an error, that really should be a rare,
exceptional event. When I ran amcheck on Heroku's platform, that was
what we found - it tended to be some specific software bug in all
cases (turns out that Amazon's EBS is very reliable in the last few
years, at least when it comes to avoiding silent data corruption). In
general, the nature of those problems was very difficult to predict.

The PostgreSQL project strives to provide a database system that never
loses data, and I think that we generally do very well there. It's
probably also true that (for example) Yandex have some very good DBAs,
that take every reasonable step to prevent data loss (validating
hardware, providing substantial redundancy at the storage level, and
so on). We trust the system, and you trust your own operational
procedures, and for the most part everything runs well, because you
(almost) think of everything.

I think that running amcheck at scale is interesting because its very
general approach to validation gives us an opportunity to learn *what
we were wrong about*. Sometimes the reasons will be simple, and some
times they'll be complicated, but they'll always be something that we
tried to account for in some way, and just didn't think of, despite
our best efforts. I know that torn pages can happen, which is a kind
of corruption -- that's why crash recovery replays FPIs. If I knew
what problems amcheck might find, then I probably would have already
found a way to prevent them from happening in the first place - there
are limits to what we can predict. (Google "Ludic fallacy" for more
information on this general idea.)

I try to be humble about these things. Very complicated systems can
have very complicated problems that stay around for a long time
without being discovered. Just ask Intel. While it might be true that
some people will use amcheck as the first line of defense, I think
that it makes much more sense as the last line of defense. So, to
repeat myself -- I just don't know.

> I think an interface to list all or top N error could be useful.

I think that it might be useful if you could specify a limit on how
many errors you'll accept before giving up. I think that it's likely
less useful than you think, though. Once amcheck detects even a single
problem, all bets are off. Or at least any prediction that I might try
to give you now isn't worth much. Theoretically, amcheck should
*never* find any problem, which is actually what happens in the vast
majority of real world cases. When it does find a problem, there
should be some new lesson to be learned. If there isn't some new
insight, then somebody somewhere is doing a bad job.

-- 
Peter Geoghegan



Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-22 Thread Tom Lane
I wrote:
> Specifically, I see failures like this on machines where the prevailing
> locale isn't C or US:

> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULE 
> rtest_emp rtest_emp_del pgbf
> pg_restore: [archiver (db)] could not execute query: ERROR:  invalid input 
> syntax for type money: "$0.00"
> LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"...
>  ^
> Command was: CREATE RULE "rtest_emp_del" AS
> ON DELETE TO "rtest_emp" DO  INSERT INTO "rtest_emplog" ("ename", "who", 
> "action", "newsal", "oldsal")
>   VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"money", 
> "old"."salary");

Actually ... maybe what this is pointing out is a pre-existing deficiency
in pg_dump, which is that it's failing to lock down lc_monetary during
restore.  Arguably, we should teach _doSetFixedOutputState to set
lc_monetary to whatever prevailed in the dump session, just as we do
for client_encoding.  I seem now to recall some user complaints about
unsafety of dump/restore for "money" values, which essentially is the
problem we're seeing here.

I think the reason we haven't done this already is fear of putting
platform-dependent lc_monetary values into dump scripts.  That's
certainly an issue, but it seems a minor one: at worst, you'd have
to not use --single-transaction when restoring on a different platform,
so you could ignore an error from the SET command.
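
Concretely, the fixed restore-time settings would then pin the locale
down much as we already do for the encoding, i.e. something like this
(the locale value is only an example of what might have prevailed at
dump time):

SET client_encoding = 'UTF8';
SET lc_monetary = 'en_US.UTF-8';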

While this would fix the specific problem we're seeing in the buildfarm,
I'm thinking we'd still need to do what I said in the previous message,
and change pg_dump so that the restore session will absorb ALTER DATABASE
settings before proceeding.  Otherwise, at minimum, we have a hazard of
differing behaviors in serial and parallel restores.  It might be that
lc_monetary is the only GUC that makes a real difference for this purpose,
but I haven't got much faith in that proposition at the moment.

regards, tom lane



Re: [HACKERS] Proposal: generic WAL compression

2018-01-22 Thread Stephen Frost
Oleg,

I'm not really sure why this is still in Needs Review as a review was
posted and I don't see any follow-up.  I've changed this to be Waiting
for Author.

* Antonin Houska (a...@cybertec.at) wrote:
> Oleg Ivanov  wrote:
> 
> > In order to overcome that issue, I would like to propose the patch, which
> > provides the possibility to use another approach to WAL record
> > construction.
> 
> After having spent several hours reviewing this patch I dare to send the
> following comments:

Thanks for the review Antonin!

> * writeToPtr() and readFromPtr() are applied to the existing code. I think
>   this is a reason for a separate diff, to be applied before the actual patch.

I'm not really a fan of using these, for my 2c.  I'm not sure how others
feel, but having these macros which change one of the arguments to the
macro (by advancing the pointer) doesn't result in a net improvement to
readability for me.

The other review comments also seem pretty reasonable to me.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] log_destination=file

2018-01-22 Thread Tomas Vondra


On 01/22/2018 08:52 PM, Robert Haas wrote:
> On Sat, Jan 20, 2018 at 7:51 AM, Magnus Hagander  wrote:
>> Finally found myself back at this one, because I still think this is a
>> problem we definitely need to adress (whether with this file or not).
>>
>> The funneling into a single process is definitely an issue.
>>
>> But we don't really solve that problem today wit logging to stderr, do we?
>> Because somebody has to pick up the log as it came from stderr. Yes, you get
>> more overhead when sending the log to devnull, but that isn't really a
>> realistic scenario. The question is what to do when you actually want to
>> collect that much logging that quickly.
> 
> I think it depends on where the bottleneck is. If you're limited by 
> the speed at which a single process can write, shutting the logging 
> collector off and letting everyone write fixes it, because now you
> can bring the CPU cycles of many processes to bear rather than just
> one. If you're limited by the rate at which you can lay the file down
> on disk, then turning off the logging collector doesn't help, but I
> don't think that's the main problem. Now, of course, if you're
> writing the file to disk faster than a single process could do all
> those writes, then you're probably also going to need multiple
> processes to keep up with reading it, parsing it, etc. But that's not
> a problem for PostgreSQL core unless we decide to start shipping an
> in-core log analyzer.
> 

Sorry for the naive question, but which of these bottlenecks are we
actually hitting? I don't recall dealing with an actual production
system where the log collector would be an issue, so I don't have a very
good idea where the actual bottleneck is in this case.

I find it hard to believe the collector would be limited by I/O when
writing the data to disk (buffered sequential writes and all that).

So I guess it's more about the process of collecting the data from all
the processes through pipe, with the PIPE_MAX_PAYLOAD dance, etc.

I plan to do some experiments with log_statement=all, but my guess is
that the main issue is in transferring individual messages (which may be
further split into multiple chunks). What if we instead send the log
messages in larger batches? Of course, that would require an alternative
transport mechanism (say, through shared memory) and the collector would
have to merge the batches to maintain ordering. And then there's the issue
that the collector is a pretty critical component.

FWIW I'm pretty sure we're not the only project facing the need to
collect a high volume of logs from many processes, so how do the other
projects handle this issue?

>>
>> If each backend could actually log to *its own file*, then things would get
>> sped up. But we can't do that today. Unless you use the hooks and build it
>> yourself.
> 
> That seems like a useful thing to support in core.
> 

Yeah, I agree with that.

>> Per the thread referenced, using the hooks to handle the
>> very-high-rate-logging case seems to be the conclusion. But is that still
>> the conclusion, or do we feel we need to also have a native solution?
>>
>> And if the conclusion is that hooks is the way to go for that, then is the
>> slowdown of this patch actually a relevant problem to it?
> 
> I think that if we commit what you've proposed, we're making it harder
> for people who have a high volume of logging but are not currently
> using hooks.  I think we should try really hard to avoid the situation
> where our suggested workaround for a server change is "go write some C
> code and maybe you can get back to the performance you had with
> release N-1".  That's just not friendly.
> 
> I wonder if it would be feasible to set things up so that the logging
> collector was always started, but whether or not backends used it or
> wrote directly to their original stderr was configurable (e.g. dup
> stderr elsewhere, then dup whichever output source is currently
> selected onto stderr, then dup the other one if the config is changed
> later).
> 

I think the hook system is a really powerful tool, but it seems a bit
awkward to force people to use it to improve performance like this ...
That seems like something the core should do out of the box.
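
FWIW, the dup-based switching Robert describes above could look roughly
like this (the variable names are invented, this is not from any posted
patch):

#include <stdbool.h>
#include <unistd.h>

static int  saved_stderr = -1;

static void
route_stderr_to_collector(bool enable, int collector_pipe_write_fd)
{
    if (saved_stderr == -1)
        saved_stderr = dup(STDERR_FILENO);              /* remember the original stderr */

    if (enable)
        dup2(collector_pipe_write_fd, STDERR_FILENO);   /* into the syslogger pipe */
    else
        dup2(saved_stderr, STDERR_FILENO);              /* back to the original target */
}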

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-22 Thread Tom Lane
Robert Haas  writes:
> I agree with Noah.  It's true that having unfixed bugs isn't
> particularly good, but it doesn't justify activating a CommitFest
> entry under someone else's name.

Agreed.  The CF app is not a bug tracker.

> If nobody is willing to put in the effort to keep AIX supported under
> XLC, then we should just update the documentation to say that it isn't
> supported.  Our support for that platform is pretty marginal anyway if
> we're only supporting it up through 6.1; that was released in 2007 and
> went out of support in Q2 of last year.

Huh?  We have somebody just upthread saying that they're working out
issues on newer AIX.  We should at least give them time to finish
that research before making any decisions.

regards, tom lane



Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2018-01-22 Thread Stephen Frost
Greetings,

* Sokolov Yura (funny.fal...@postgrespro.ru) wrote:
> diff --git a/src/backend/utils/time/snapmgr.c 
> b/src/backend/utils/time/snapmgr.c
> index 08a08c8e8f..7c3fe7563e 100644
> --- a/src/backend/utils/time/snapmgr.c
> +++ b/src/backend/utils/time/snapmgr.c
> @@ -662,13 +662,16 @@ CopySnapshot(Snapshot snapshot)
>   Snapshotnewsnap;
>   Sizesubxipoff;
>   Sizesize;
> + int xcnt, subxcnt;
> + uint8   xhlog, subxhlog;
>  
>   Assert(snapshot != InvalidSnapshot);
>  
> + xcnt = ExtendXipSizeForHash(snapshot->xcnt, &xhlog);
> + subxcnt = ExtendXipSizeForHash(snapshot->subxcnt, &subxhlog);
>   /* We allocate any XID arrays needed in the same palloc block. */
> - size = subxipoff = sizeof(SnapshotData) +
> - snapshot->xcnt * sizeof(TransactionId);
> - if (snapshot->subxcnt > 0)
> + size = subxipoff = sizeof(SnapshotData) + xcnt * sizeof(TransactionId);
> + if (subxcnt > 0)
>   size += snapshot->subxcnt * sizeof(TransactionId);

Here you've changed to use xcnt instead of snapshot->xcnt for
calculating size, but when it comes to adding in the subxcnt, you
calculate a subxcnt but still use snapshot->subxcnt...?  That doesn't
seem like what you intended to do here.
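
Presumably the intent was something more like (just my guess at the
intended fix):

	if (subxcnt > 0)
		size += subxcnt * sizeof(TransactionId);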

> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> index f9da9e17f5..a5e7d427b4 100644
> --- a/src/backend/utils/time/tqual.c
> +++ b/src/backend/utils/time/tqual.c
> @@ -1451,6 +1451,69 @@ HeapTupleIsSurelyDead(HeapTuple htup, TransactionId 
> OldestXmin)
>  }
>  
>  /*
> + * XidInXip searches xid in xip array.
> + *
> + * If xcnt is smaller than SnapshotMinHash (60), or *xhlog is zero, then 
> simple
> + * linear scan of xip array used. Looks like there is no benifit in hash 
> table
> + * for small xip array.

I wouldn't write '60' in there; anyone who is interested can go look
up whatever it ends up being set to.

I tend to agree with Robert that it'd be nice if simplehash could be
used here instead.  The insertion is definitely more expensive but
that's specifically to try and improve lookup performance, so it might
end up not being so bad.  I do understand that it would end up using
more memory, so that's a fair concern.
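
For reference, instantiating simplehash for the xip lookup would look
roughly like the following (parameter names per lib/simplehash.h as I
recall them; treat the details as illustrative only):

typedef struct xip_entry
{
    TransactionId xid;
    char        status;     /* entry status, required by simplehash */
} xip_entry;

#define SH_PREFIX        xiphash
#define SH_ELEMENT_TYPE  xip_entry
#define SH_KEY_TYPE      TransactionId
#define SH_KEY           xid
#define SH_HASH_KEY(tb, key)  DatumGetUInt32(hash_uint32((uint32) (key)))
#define SH_EQUAL(tb, a, b)    ((a) == (b))
#define SH_SCOPE         static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"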

I do think this could use more comments and perhaps some
Assert()s thrown in (and it looks like you're removing one..?).

I haven't got a whole lot else to say on this patch; it would be good if
someone could spend some more time reviewing it more carefully and
testing it to see what kind of performance they get.  The improvements
that you've posted definitely look nice, especially with the lwlock
performance work.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Security leak with trigger functions?

2018-01-22 Thread Chapman Flack
On 12/14/2006 01:17 PM, Peter Eisentraut wrote:
> Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> By extrapolation of the SQL standard, I'd say we'd need to check
>>> the EXECUTE privilege of the function at run time.
>>
>> Certainly EXECUTE privilege is what to check, but whose privilege?
> 
> ...
> ("The authorization identifier of the owner of the schema that includes 
> the trigger descriptor of TR is pushed onto the authorization stack.")
> 
> PostgreSQL only allows a trigger action of "call this function", so in 
> the SQL standard context that would mean we'd need to check the EXECUTE 
> privilege of the owner of the trigger.  The trick is figuring out who 
> the owner is.  If it's the owner of the table, then TRIGGER privilege 
> is effectively total control over the owner of the table.  If it's 
> whoever created the trigger, it might be useful, but I don't see how 
> that is compatible with the intent of the SQL standard.

Hmm, it's been not quite a dozen years, have there been later threads
that followed up on this discussion?

Is it still accurate to describe the status quo as:

- Triggers can be created only by a role with TRIGGER on the
  relation and EXECUTE on the function, checked at trigger creation
  time.

- Once created, triggers are executed with no local userid change (so,
  just as whatever role issued whatever DML fired the trigger), and
  without a check for EXECUTE on the function at that time, except:

- Internal trigger functions implementing RI checks will execute their
  queries as "the owner of whichever table the query touches (the
  referenced table for verification of FK inserts, the referencing
  table when cascading a PK change)."
  https://www.postgresql.org/message-id/5761.1509733215%40sss.pgh.pa.us

It seems to me that the 2006 conversation was complicated by the
fact that PostgreSQL triggers don't have owners of their own, and
don't have schema-qualified names, as triggers in the SQL standard do.
So the conversation was going in the direction of running as the owner
of the relation, but for the obvious consequence that it would turn
TRIGGER permission into an equivalent of the relation owner's identity.

If the idea of changing the current behavior at all could be thinkable,
could one think about changing it more along SQL standard lines, by
letting a trigger itself have a schema, and executing as the owner
of the schema?

This seems to me not too different from just letting a trigger have
an owner, and executing as that, which could seem less roundabout.
OTOH, going through the schema has the advantage of following the
standard, and may allow a little more flexibility in setting up
delegated authority; a schema with a certain owner can be created,
CREATE(*) on that schema can be selectively granted, enabling grantees
to create trigger declarations in that schema, applying to any
relations they have TRIGGER permission for. Either way, TRIGGER
permission is left with something useful to mean, unlike the
execute-as-relation-owner case, where the meaning of TRIGGER sort of
evaporates.

(*) a different permission might be better here, as many existing
schemas probably grant CREATE to roles today without intending that
they be able to create triggers that execute as the schema owner.
TRIGGER might work, having no existing meaning on a schema.

I think the execute-as-relation-owner idea might not easily fit
situations where different roles with cross-cutting concerns all have
their own reasons to put triggers on a given relation, but should not
otherwise have the same permissions or run with the same id. I think
such situations could easily be accommodated in a model where triggers
have schemas and execute as the schema owner.

I could imagine a graceful migration plan:

1. Give pg_trigger a tgnamespace column, but allow it to be null
   or InvalidOid, meaning a trigger that should have the current
   behavior, executed with no local userid change. Conversely, a
   trigger with a valid tgnamespace gets executed as the owner
   of that schema.

2. pg_upgrade preserves existing triggers as triggers with no
   namespace. The grammar is altered to allow a schema when creating
   a trigger (and to allow ALTER TRIGGER SET SCHEMA).

3. Permission checks for schemaless triggers stay as they are now:
   EXECUTE on the function checked for the trigger's creator at
   creation time, but not at run time. For triggers in schemas, the
   schema owner is who must have EXECUTE on the function.

4. It could still be possible to avoid runtime permission checks,
   the existence of EXECUTE for the right role being checked at
   trigger creation time, if REVOKE EXECUTE on a trigger function
   includes a check through pg_depend for triggers whose execute
   right would be lost. Those would be (a) any triggers with schemas
   owned by a role having EXECUTE revoked, and (b) any schema-less
   triggers at all (as there's no telling whose permission might
   have been checked at the time the

Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-22 Thread Tom Lane
I wrote:
> Commit 4bd371f6f's hack to emit "SET default_transaction_read_only = off"
> is gone: we now dodge that problem by the expedient of not issuing ALTER
> DATABASE SET commands until after reconnecting to the target database.
> Therefore, such settings won't apply during the restore session.

The buildfarm just pointed out an issue in that: the delayed SET commands
might affect the interpretation of data during the restore session.
Specifically, I see failures like this on machines where the prevailing
locale isn't C or US:

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 4871; 2618 34337 RULE 
rtest_emp rtest_emp_del pgbf
pg_restore: [archiver (db)] could not execute query: ERROR:  invalid input 
syntax for type money: "$0.00"
LINE 3: ... ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"...
 ^
Command was: CREATE RULE "rtest_emp_del" AS
ON DELETE TO "rtest_emp" DO  INSERT INTO "rtest_emplog" ("ename", "who", 
"action", "newsal", "oldsal")
  VALUES ("old"."ename", CURRENT_USER, 'fired'::"bpchar", '$0.00'::"money", 
"old"."salary");

The dump dumped the money literal with '$' because we set up the
regression database with

ALTER DATABASE regression SET lc_monetary TO 'C';

but at the time we process the CREATE RULE during the restore, we're
still running with whatever the environmental default is.

Not very sure about the best solution here.  Clearly, we'd better absorb
these settings to some extent, but how?

I thought for a bit about having pg_dump emit immediate SET operations
along with the ALTER versions, for any GUCs we deem important for
restore purposes.  This would probably be about a 99% solution, but
we could only do it for ALTER DATABASE SET not for ALTER ROLE IN DATABASE
SET, because we couldn't be very sure which of the latter should apply.
So it would have some edge-case failure conditions.
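
(To illustrate the idea: in dumpDatabase() that would amount to emitting
something like the following for each such GUC -- a rough sketch only, with
made-up variable names and a made-up guc_matters_for_restore() helper, and
glossing over quoting of the value:)

	/* Rough sketch; names are made up, value quoting glossed over. */
	if (guc_matters_for_restore(gucname))
		appendPQExpBuffer(outbuf, "SET %s = %s;\n", gucname, gucvalue);
	appendPQExpBuffer(outbuf, "ALTER DATABASE %s SET %s = %s;\n",
					  fmtId(datname), gucname, gucvalue);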

Or we could reconnect after applying the ALTERs, ensuring that we have
the expected environment.  But then we're back into the problem that
commit 4bd371f6f hoped to solve, namely that "ALTER DATABASE SET
default_transaction_read_only = on" breaks pg_dumpall (and hence
pg_upgrade).

I then thought about splitting the ALTERs into two separate TOC entries,
one for "desirable" GUCs (which we'd apply by reconnecting after emitting
its contents), and one for "undesirables", which we would not reconnect
after.  That seemed a bit messy because of the need to maintain a
blacklist going forward, and the possibility that we couldn't agree on
what to blacklist.

Then it occurred to me that none of this works anyway for parallel
pg_restore --- including parallel pg_upgrade --- because the workers
are going to see all the ALTERs in place.  And that means that commit
4bd371f6f's hack has been broken since parallel upgrade went in, in 9.3.

So at this point I'm feeling that letting pg_dumpall work around
default_transaction_read_only = on is just too much of a hack, and we
should forget about that consideration entirely.  If we do, fixing the
current buildfarm failure is about a one-line patch: just reconnect
after processing DATABASE PROPERTIES.

If someone were to hold a gun to my head and say "make this work", what
I'd probably do is set up the desirable vs undesirable split and then
arrange for the "undesirable" GUCs to be postponed to the end of the
restore, after the parallel portion of the run is complete.  But man,
that's a lot of ugliness.

I think the only reason that 4bd371f6f got in at all was that it was just
a two-line kluge, and we were willing to accept that amount of ugliness
to handle a corner case more nicely.  The cost of solving it after the
pg_dump/dumpall refactoring will be a lot higher, and I for one don't
think it's worth it.

Comments?

regards, tom lane



Re: [HACKERS] Deadlock in XLogInsert at AIX

2018-01-22 Thread Robert Haas
On Sat, Jan 20, 2018 at 6:16 PM, Michael Paquier
 wrote:
>> The most recent patch version is Returned with Feedback.  As a matter of
>> procedure, I discourage creating commitfest entries as a tool to solicit new
>> patch versions.  If I were the author of a RwF patch, I would dislike finding
>> a commitfest entry that I did not create with myself listed as author.
>
> Per my understanding, this is a bug, and we don't want to lose track of
> them.

I agree with Noah.  It's true that having unfixed bugs isn't
particularly good, but it doesn't justify activating a CommitFest
entry under someone else's name.  If they don't decide to work further
on the problem, what are we going to do?  Keep the entry open forever,
nagging them continually until the end of time?  That won't net us
many new contributors.

If nobody is willing to put in the effort to keep AIX supported under
XLC, then we should just update the documentation to say that it isn't
supported.  Our support for that platform is pretty marginal anyway if
we're only supporting it up through 6.1; that was released in 2007 and
went out of support in Q2 of last year.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Custom compression methods

2018-01-22 Thread Ildar Musin
Hello Ildus,


15/01/2018 00:49, Ildus Kurbangaliev пишет:
> Attached a new version of the patch. Main changes:
>
> * compression as an access method
> * pglz as default compression access method.
> * PRESERVE syntax for tables rewrite control.
> * pg_upgrade fixes
> * support partitioned tables.
> * more tests.
>
You need to rebase onto the latest master; there are some conflicts. I've
applied it to a three-day-old master to try it.

As I can see the documentation is not yet complete. For example, there
is no section for ALTER COLUMN ... SET COMPRESSION in ddl.sgml; and
section "Compression Access Method Functions" in compression-am.sgml
hasn't been finished.

I've implemented an extension [1] to understand the way a developer would
work with the new infrastructure, and to me it seems clear. (Except
that it took me some effort to wrap my mind around the varlena macros, but
that is probably a different topic.)

I noticed that you haven't covered 'cmdrop' in the regression tests, and I
saw the previous discussion about it. Have you considered using event
triggers to handle the drop of column compression instead of the 'cmdrop'
function? This way you would kill two birds with one stone: it still
provides sufficient infrastructure to catch those events (and it is
something postgres already has for different kinds of ddl commands), and
it would be easier to test.

Thanks!

[1] https://github.com/zilder/pg_lz4

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 




Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Peter Geoghegan
On Mon, Jan 22, 2018 at 3:52 AM, Amit Kapila  wrote:
> The difference is that nodeGather.c doesn't have any logic like the
> one you have in _bt_leader_heapscan where the patch waits for each
> worker to increment nparticipantsdone.  For Gather node, we do such a
> thing (wait for all workers to finish) by calling
> WaitForParallelWorkersToFinish which will have the capability after
> Robert's patch to detect if any worker is exited abnormally (fork
> failure or failed before attaching to the error queue).

FWIW, I don't think that that's really much of a difference.

ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
similar to how _bt_end_parallel() calls
WaitForParallelWorkersToFinish() in the patch. The
_bt_leader_heapscan() condition variable wait for workers that you
refer to is quite a bit like how gather_readnext() behaves. It
generally checks to make sure that all tuple queues are done.
gather_readnext() can wait for developments using WaitLatch(), to make
sure every tuple queue is visited, with all output reliably consumed.

This doesn't look all that similar to  _bt_leader_heapscan(), I
suppose, but I think that that's only because it's normal for all
output to become available all at once for nbtsort.c workers. The
startup cost is close to or actually the same as the total cost, as it
*always* is for sort nodes.

-- 
Peter Geoghegan



Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs

2018-01-22 Thread Stephen Frost
Chris,

* Chris Travers (chris.trav...@adjust.com) wrote:
> Attached is the patch as submitted for commitfest.

This has been stuck in Waiting for Author since before the commitfest
started.  I'll try to kick-start it but it seems like it's stuck because
there's a fundamental disagreement about if we should be using include
or exclude for pg_rewind (and, possibly, pg_basebackup, et al).

> Please note, I am not adverse to adding an additional --Include-path
> directive if that avoids backwards-compatibility problems.  However the
> patch is complex enough I would really prefer review on the rest of it to
> start first.  This doesn't strike me as perfectly trivial and I think it is
> easier to review pieces separately.  I have already started on the
> --Include-path directive and could probably get it attached to a later
> version of the patch very soon.
> 
> I would also like to address a couple of important points here:
> 
> 1.  I think default restrictions plus additional paths is the best, safest
> way forward.  Excluding shell-globs doesn't solve the "I need this
> particular config file" very well particularly if we want to support this
> outside of an internal environment.  Shell globs also tend to be overly
> inclusive and so if you exclude based on them, you run into a chance that
> your rewind is corrupt for being overly exclusive.

There's been a number of strong points made as to why pg_basebackup uses
an exclude list but you seem to keep coming back to a concern around
shell globs when considering exclusion lists.

Having an exclude list doesn't mean that shell globs have to be used
in the exclude list and, indeed, there are no such globs used in the
exclude list for pg_basebackup today- every single file or directory
excluded by pg_basebackup is an explicit file or directory (compared
using a full strcmp()).
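
To be concrete, the kind of check involved is simply along these lines (an
illustrative sketch only -- not the actual basebackup.c list or code):

	#include <stdbool.h>
	#include <string.h>

	/* Illustrative only; the real list lives in basebackup.c. */
	static const char *const exclude_files[] = {
		"postmaster.pid",
		"postmaster.opts",
		NULL
	};

	static bool
	is_excluded_file(const char *filename)
	{
		int			i;

		for (i = 0; exclude_files[i] != NULL; i++)
		{
			if (strcmp(filename, exclude_files[i]) == 0)
				return true;	/* exact match only, no glob expansion */
		}
		return false;
	}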

Further, the specific concerns raised are about things which are very
clearly able to be dealt with using specific paths
(postgresql.auto.conf, pg_log) and which an administrator is likely to
be familiar with (such as pg_log, in particular).

Personally, I'm a big advocate of *not* having PG's logs in the data
directory and part of it is because, really, the data directory is PG's
purview while logs are for the administrator.  I wasn't ever a fan of
postgresql.auto.conf and I'm not surprised that we're having this
discussion about if it should be pulled over by pg_rewind or not.

> 3.  I think it would be a mistake to tie backup solutions in non-replicated
> environments to replication use cases, and vice versa.  Things like
> replication slots (used for streaming backups) have different
> considerations in different environments.  Rather than build the same
> infrastructure first, I think it is better to support different use cases
> well and then build common infrastructure to support the different cases.
> I am not against building common infrastructure for pg_rewind and
> pg_basebackup.  I am very much against having the core guarantees being the
> exact same.

Backup solutions are already involved in understanding of replicated
environments as they can be used to back up from replicas rather than
the primary (or perhaps using both in some cases), so I'm not really
sure why you're arguing that backups are somehow different between
non-replicated and replicated environments.

As it relates to the question of whether the core guarantees of
pg_basebackup and pg_rewind should be the same or not, I've not seen much
argument for why they'd be different.  The intent of pg_rewind is to do
what pg_basebackup would do, but in a more efficient manner, as
discussed in the documentation for pg_rewind.

I would think the next step here, as Michael suggested very early on in
this thread, would be to bring the exclude list and perhaps logic for
pg_basebackup into the common code and have pg_rewind leverage that
instead of having its own code that largely does the same and then
adding an option to exclude additional items to that.  There's no sense
having pg_rewind operate on files that are going to end up getting wiped
out when recovery starts anyway.  Perhaps there's a use-case for
overriding the exclude list with a 'include' option too, but I'm not
convinced there is.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Tests for reloptions

2018-01-22 Thread Nikolay Shaplov
In a message from 29 October 2017 17:08:29, Nikolay Shaplov wrote:
 
> While merging this commit into my branch, I found two issues that I think
> need fixing. Hope this does not require creating a new commit request...

To draw more attention to this issue, I've filed it as a new thread
https://www.postgresql.org/message-id/3951112.MkJ1MAZpIQ%40x200m
with an entry for the next commitfest.
Hope this will help get the missing tab fixed.


-- 
Do code for fun.



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-22 Thread Petr Jelinek
On 22/01/18 20:21, Robert Haas wrote:
> On Mon, Jan 22, 2018 at 10:40 AM, Petr Jelinek
>  wrote:
>> I think this is the only real problem from your list for logical
>> decoding catalog snapshots. But it's indeed quite a bad one. Is there
>> something preventing us to remove the assumption that the CTID of T1 is
>> garbage nobody cares about? I guess turning off HOT for catalogs is not
>> an option :)
> 
> It doesn't seem like a great idea.  For a lot of workloads it wouldn't
> hurt, but for some it might cause a lot of catalog bloat.  Also, Tom
> would probably hate it with a fiery passion, since as I understand it
> he argued passionately for HOT to work for both catalog and
> non-catalog tables back when it was first implemented.
>

I wasn't really serious about that proposal.

> 
>> General problem is that we have couple of assumptions
>> (HeapTupleSatisfiesVacuum being one, what you wrote is another) about
>> tuples from aborted transactions not being read by anybody. But if we
>> want to add decoding of 2PC or transaction streaming that's no longer
>> true so I think we should try to remove that assumption (even if we do
>> it only for catalogs since that's what we care about).
> 
> I'm extremely skeptical about this idea.  I think that assumption is
> fairly likely to be embedded in subtle ways in several more places
> that we haven't thought about yet.  Unless we're very, very careful
> about doing something like this, we could end up with a system that
> mostly seems to work but is actually unreliable in ways that can't be
> fixed without non-back-patchable changes.

I am worried about it too. I don't feel like I can judge whether we just need
to be very careful or if it's too big a risk, so I'll defer to you here.
>> The other option would be to make sure 2PC decoding/tx streaming does
>> not read aborted transaction but that would mean locking the transaction
>> every time we give control to output plugin. Given that output plugin
>> may do network write, this would really mean locking the transaction for
>> an unbounded period of time. That does not strike me as something we
>> want to do, decoding should not interact with frontend transaction
>> management, definitely not this badly.
> 
> Yeah, I don't think it's acceptable to postpone abort indefinitely.
> 

Hmm, maybe a less bad and potentially acceptable version of this would be
to make TransactionIdIsInProgress() treat a transaction as running while it's
being decoded; that might solve most of the issues. There is still a
potential interlocking issue with UPDATEs, but only ones that change the
same tuples as the aborted tx did.

What I mean concretely is that decoding would check whether the tx has
aborted; if not, it would record somewhere in shmem that it is working on
that tx, and once the plugin is done with the current call it would reset
the shmem info to an invalid tx. TransactionIdIsInProgress() would then
check this shmem info before checking for an abort. If the first check
showed that the transaction had aborted, decoding would not bother decoding
it further.
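
Very roughly something like this (a sketch only, with invented names; in
reality the flag would live in shared memory with proper locking, which I am
hand-waving away here):

	/*
	 * Sketch only: invented names; a real version would use a slot in
	 * shared memory rather than a static variable, with proper locking.
	 */
	static volatile TransactionId CurrentlyDecodedXid = InvalidTransactionId;

	static void
	decode_change_with_plugin(TransactionId xid)
	{
		CurrentlyDecodedXid = xid;	/* advertise before handing control away */
		pg_write_barrier();
		/* ... invoke the output plugin callback here ... */
		CurrentlyDecodedXid = InvalidTransactionId;
	}

	/* checked from TransactionIdIsInProgress() before the abort check */
	static bool
	XidIsBeingDecoded(TransactionId xid)
	{
		return TransactionIdEquals(xid, CurrentlyDecodedXid);
	}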

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



[PATCH] Minor fixes for reloptions tests

2018-01-22 Thread Nikolay Shaplov

I have some small fixes for the already committed reloptions test patch
(https://www.postgresql.org/message-id/2615372.orqtEn8VGB@x200m)

The first one adds a missing tab character where there should be one.

The second is about a comment: "The OIDS option is not stored" is not quite correct.
OIDs are stored, just not as a reloption. I've added a note about that there...


--
Do code for fun.diff --git a/src/test/regress/expected/reloptions.out b/src/test/regress/expected/reloptions.out
index c4107d5..df3c99d 100644
--- a/src/test/regress/expected/reloptions.out
+++ b/src/test/regress/expected/reloptions.out
@@ -77,7 +77,7 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
 ALTER TABLE reloptions_test RESET (autovacuum_enabled,
 	autovacuum_analyze_scale_factor);
 SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass AND
-reloptions IS NULL;
+   reloptions IS NULL;
  reloptions
 

@@ -86,7 +86,7 @@ reloptions IS NULL;
 -- RESET fails if a value is specified
 ALTER TABLE reloptions_test RESET (fillfactor);
 ERROR:  RESET must not include values for parameters
--- The OIDS option is not stored
+-- The OIDS option is not stored as reloption
 DROP TABLE reloptions_test;
 CREATE TABLE reloptions_test(i INT) WITH (fillfactor , oids=true);
 SELECT reloptions, relhasoids FROM pg_class WHERE oid = 'reloptions_test'::regclass;
diff --git a/src/test/regress/sql/reloptions.sql b/src/test/regress/sql/reloptions.sql
index c9119fd..37fbf41 100644
--- a/src/test/regress/sql/reloptions.sql
+++ b/src/test/regress/sql/reloptions.sql
@@ -47,12 +47,12 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass;
 ALTER TABLE reloptions_test RESET (autovacuum_enabled,
 	autovacuum_analyze_scale_factor);
 SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass AND
-reloptions IS NULL;
+   reloptions IS NULL;

 -- RESET fails if a value is specified
 ALTER TABLE reloptions_test RESET (fillfactor);

--- The OIDS option is not stored
+-- The OIDS option is not stored as reloption
 DROP TABLE reloptions_test;
 CREATE TABLE reloptions_test(i INT) WITH (fillfactor , oids=true);
 SELECT reloptions, relhasoids FROM pg_class WHERE oid = 'reloptions_test'::regclass;




Re: [HACKERS] log_destination=file

2018-01-22 Thread Robert Haas
On Sat, Jan 20, 2018 at 7:51 AM, Magnus Hagander  wrote:
> Finally found myself back at this one, because I still think this is a
> problem we definitely need to adress (whether with this file or not).
>
> The funneling into a single process is definitely an issue.
>
> But we don't really solve that problem today with logging to stderr, do we?
> Because somebody has to pick up the log as it came from stderr. Yes, you get
> more overhead when sending the log to devnull, but that isn't really a
> realistic scenario. The question is what to do when you actually want to
> collect that much logging that quickly.

I think it depends on where the bottleneck is.  If you're limited by
the speed at which a single process can write, shutting the logging
collector off and letting everyone write fixes it, because now you can
bring the CPU cycles of many processes to bear rather than just one.
If you're limited by the rate at which you can lay the file down on
disk, then turning off the logging collector doesn't help, but I don't
think that's the main problem.  Now, of course, if you're writing the
file to disk faster than a single process could do all those writes,
then you're probably also going to need multiple processes to keep up
with reading it, parsing it, etc.  But that's not a problem for
PostgreSQL core unless we decide to start shipping an in-core log
analyzer.

> If each backend could actually log to *its own file*, then things would get
> sped up. But we can't do that today. Unless you use the hooks and build it
> yourself.

That seems like a useful thing to support in core.

> Per the thread referenced, using the hooks to handle the
> very-high-rate-logging case seems to be the conclusion. But is that still
> the conclusion, or do we feel we need to also have a native solution?
>
> And if the conclusion is that hooks is the way to go for that, then is the
> slowdown of this patch actually a relevant problem to it?

I think that if we commit what you've proposed, we're making it harder
for people who have a high volume of logging but are not currently
using hooks.  I think we should try really hard to avoid the situation
where our suggested workaround for a server change is "go write some C
code and maybe you can get back to the performance you had with
release N-1".  That's just not friendly.

I wonder if it would be feasible to set things up so that the logging
collector was always started, but whether or not backends used it or
wrote directly to their original stderr was configurable (e.g. dup
stderr elsewhere, then dup whichever output source is currently
selected onto stderr, then dup the other one if the config is changed
later).
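
(Just to illustrate the fd shuffling I have in mind -- a rough sketch only,
with error handling omitted:)

	#include <unistd.h>

	/* Rough sketch of switching what "stderr" points at. */
	static int	original_stderr = -1;

	static void
	select_stderr_sink(int sink_fd)
	{
		if (original_stderr == -1)
			original_stderr = dup(STDERR_FILENO);	/* keep the original around */
		dup2(sink_fd, STDERR_FILENO);	/* route stderr to the chosen sink */
	}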

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-22 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 22, 2018 at 2:09 PM, Tom Lane  wrote:
>> pg_dumpall with --clean will now drop and recreate the "postgres" and
>> "template1" databases in the target cluster, allowing their locale and
>> encoding settings to be changed if necessary, and providing a cleaner
>> way to set nondefault tablespaces for them than we had before.  This
>> means that such a script must now always be started in the "postgres"
>> database; the order of drops and reconnects will not work otherwise.
>> Without --clean, the script will not adjust any database-level properties
>> of those two databases (including their comments, ACLs, and security
>> labels, which it formerly would try to set).

> Unless we insert some guard that causes a hard error or at least warns
> the user if they do the wrong thing, this seems extremely likely to
> have people (1) try to dump and restore using pg_dumpall and (2)
> complain when it doesn't work.  Actually, it'll work fine if your OS
> username happens to be "postgres", but otherwise not so much.

Not entirely following your concern.  The initial DB's name is "postgres"
regardless of the bootstrap superuser's name.  If you're thinking of the
case where you do "initdb -U joe" and then try to "psql 

Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2018-01-22 Thread Tom Lane
Jing Wang  writes:
> [ support_CURRENT_DATABASE_keyword_v4.6.patch ]

Not surprisingly, this patch no longer applies in the wake of commit
b3f840120.  Rather than rebasing the pg_dump portions, I would suggest
you just drop them.  It is no longer necessary for pg_dump to worry about
this, because it won't emit COMMENT ON DATABASE or SECURITY LABEL FOR
DATABASE except in cases where it just created the target database
and so knows the DB name for sure.  So, using COMMENT ON DATABASE
CURRENT_DATABASE would just create an unnecessary version dependency
in the output script.  (BTW, using the source database's version to decide
what to emit is not per pg_dump policy.  Also, if you did still want to
modify pg_dump to use CURRENT_DATABASE, you'd have to extend the patch
to handle ACLs and some other ALTER DATABASE commands.)

I'm not really sure how much of the use-case for this feature rested
on the pg_dump issue.  It may still be worthwhile for other use cases,
or maybe we should just drop it.

I notice some other patch application failures in dbcommands.c,
objectaddress.c, and user.c, so there's more work to do besides
this ...

regards, tom lane



Re: pgsql: Move handling of database properties from pg_dumpall into pg_dum

2018-01-22 Thread Robert Haas
On Mon, Jan 22, 2018 at 2:09 PM, Tom Lane  wrote:
> pg_dumpall with --clean will now drop and recreate the "postgres" and
> "template1" databases in the target cluster, allowing their locale and
> encoding settings to be changed if necessary, and providing a cleaner
> way to set nondefault tablespaces for them than we had before.  This
> means that such a script must now always be started in the "postgres"
> database; the order of drops and reconnects will not work otherwise.
> Without --clean, the script will not adjust any database-level properties
> of those two databases (including their comments, ACLs, and security
> labels, which it formerly would try to set).

Unless we insert some guard that causes a hard error or at least warns
the user if they do the wrong thing, this seems extremely likely to
have people (1) try to dump and restore using pg_dumpall and (2)
complain when it doesn't work.  Actually, it'll work fine if your OS
username happens to be "postgres", but otherwise not so much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-22 Thread Robert Haas
On Mon, Jan 22, 2018 at 10:40 AM, Petr Jelinek
 wrote:
> I think this is the only real problem from your list for logical
> decoding catalog snapshots. But it's indeed quite a bad one. Is there
> something preventing us to remove the assumption that the CTID of T1 is
> garbage nobody cares about? I guess turning off HOT for catalogs is not
> an option :)

It doesn't seem like a great idea.  For a lot of workloads it wouldn't
hurt, but for some it might cause a lot of catalog bloat.  Also, Tom
would probably hate it with a fiery passion, since as I understand it
he argued passionately for HOT to work for both catalog and
non-catalog tables back when it was first implemented.

CTID chain integrity is important for non-HOT updates too, at least if
any SELECT .. FOR UPDATE/SHARE or UPDATE operations are taking place.
Now, I suppose we're not going to do any of that during logical
decoding, but I guess I can't say I'm absolutely positive that there
are no other problem cases.  If there aren't, you can imagine
disallowing HOT updates only on catalogs and only when xmax is newer
than the newest XID we'll never need to decode, or more narrowly
still, only when it's in a list of XIDs currently being decoded.

Independently of CTID, I also wonder if there's a risk of us trying to
read from a multixact that's been truncated away.  I haven't checked
the multixact code in detail, but I bet there's no provision to keep
around multixacts that are only interesting to aborted transactions.
Conversely, what keeps us from freezing the xmin of a tuple that is
invisible only to some aborted transaction, but visible to all
committed transactions?  Or just marking the page all-visible?

> General problem is that we have couple of assumptions
> (HeapTupleSatisfiesVacuum being one, what you wrote is another) about
> tuples from aborted transactions not being read by anybody. But if we
> want to add decoding of 2PC or transaction streaming that's no longer
> true so I think we should try to remove that assumption (even if we do
> it only for catalogs since that's what we care about).

I'm extremely skeptical about this idea.  I think that assumption is
fairly likely to be embedded in subtle ways in several more places
that we haven't thought about yet.  Unless we're very, very careful
about doing something like this, we could end up with a system that
mostly seems to work but is actually unreliable in ways that can't be
fixed without non-back-patchable changes.

> The other option would be to make sure 2PC decoding/tx streaming does
> not read aborted transaction but that would mean locking the transaction
> every time we give control to output plugin. Given that output plugin
> may do network write, this would really mean locking the transaction for
> an unbounded period of time. That does not strike me as something we
> want to do, decoding should not interact with frontend transaction
> management, definitely not this badly.

Yeah, I don't think it's acceptable to postpone abort indefinitely.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-22 Thread Petr Jelinek
On 19/01/18 12:37, Marco Nenciarini wrote:
> Hi All,
> 
> On 18/01/18 17:48, Simon Riggs wrote:
>> On 17 January 2018 at 17:07, Petr Jelinek  
>> wrote:
>>
>>> Things I am less convinced about:
>>>
>>> The patch will cascade truncation on downstream if cascade was specified
>>> on the upstream, that can potentially be dangerous and we either should
>>> not do it and only truncate the tables which were truncated upstream
>>> (but without restricting because of FKs), leaving the data inconsistent
>>> on downstream (like we do already with DELETE or UPDATE). Or maybe make
>>> it into either subscription or publication option so that user can chose
>>> the behaviour here as I am sure some people will want it to cascade (but
>>> the default should still IMHO be to not cascade as that's safer).
>>
>> I agree the default should be to NOT cascade.
>>
>> If someone wants cascading as a publication option, that can be added later.
>>
> 
> I agree that not replicating the CASCADE option is the best option
> according to POLA principle.
> 
 + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts
 +  * already closes the relations. Setting localrel to NULL in the map 
 entry
 +  * is still needed.
 +  */
 + rel->localrel = NULL;
>>>
>>> This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track
>>> which relations it opened and only close those and the rest should be
>>> closed by caller? That should also remove the other ugly part which is
>>> that the ExecuteTruncateGuts modifies the input list. What if caller
>>> wanted to use those relations it sent as parameter later?
>>
>> Agreed
>>
> 
> Attached a new version of the patch addressing these issues.
> 

Besides the small thing I wrote for the 0001 in the other thread I am
pretty much happy with this now.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] session_replication_role = replica with TRUNCATE

2018-01-22 Thread Petr Jelinek
On 19/01/18 12:41, Marco Nenciarini wrote:
> Hi Peter,
> 
> On 18/01/18 17:30, Peter Eisentraut wrote:
>> On 1/17/18 11:33, Petr Jelinek wrote:
 P.S: I'm struggling to understand why we have two possible values of
 session_replication_role settings that behave identically (origin and
 local). I'm unable to see any difference according to the code or the
 documentation, so I'm wondering if we should document that they are the
 same.

>>> It's for use by 3rd party tools (afaik both londiste and slony use it)
>>> to differentiate between replicated and not replicated changes.
>>
>> I have committed some documentation updates and tests to cover this a
>> bit better.
>>
> 
> Thanks, the documentation is a lot clearer now.
> 
> This superseded the documentation change that was in the patch, so I've
> removed it from the v3 version.
> 

Now that we have tests for this, I think it would be good idea to expand
them to cover the new behavior of TRUNCATE in this patch.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: PATCH: psql tab completion for SELECT

2018-01-22 Thread Vik Fearing
On 01/12/2018 06:59 AM, Edmund Horner wrote:
> Hi, here's a new patch.
> 
> This one makes some changes to the criteria for which functions to
> include; namely filtering out trigger functions and those that take or
> return values of type "internal"; and including aggregate and window
> functions.  Some of the other checks could be removed as they were
> covered by the "internal" check.

This looks better.  One minor quibble I have is your use of != (which
PostgreSQL accepts) instead of <> (the SQL standard).  I'll let the
committer worry about that.

> It also uses the server version to determine which query to run.  I
> have not written a custom query for every version from 7.1!  I've
> split up the server history into:
> 
> pre-7.3 - does not even have pg_function_is_visible.  Not supported.
> pre-9.0 - does not have several support functions in types, languages,
> etc.  We don't bother filtering using columns in other tables.
> pre-9.6 - does not have various aggregate support functions.
> 9.6 or later - the full query

I'm not sure how overboard we need to go with this going backwards so
what you did is probably fine.

> I was able to test against 9.2, 9.6, and 10 servers, but compiling and
> testing the older releases was a bit much for a Friday evening.  I'm
> not sure there's much value in support for old servers, as long as we
> can make completion queries fail a bit more gracefully.

pg_dump stops at 8.0, we can surely do the same.

Now for some other points:

Your use of Matches1 is wrong; you should use TailMatches1 instead.
SELECT is a fully reserved keyword, so just checking if it's the
previous token is sufficient.  By using Matches1, you miss cases like
this: SELECT (SELECT 

It also occurred to me that SELECT isn't actually a complete command (or
whatever you want to call it); it's an abbreviation for SELECT ALL.  So
you should check for SELECT, SELECT ALL, and SELECT DISTINCT. (The FROM
tab completion is missing this trick, but that's a different patch.)
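
In other words, something along these lines (just a sketch; the query macro
name here is made up):

	/* Sketch only -- Query_for_list_of_selectable_functions is a made-up name. */
	else if (TailMatches1("SELECT") ||
			 TailMatches2("SELECT", "ALL") ||
			 TailMatches2("SELECT", "DISTINCT"))
		COMPLETE_WITH_QUERY(Query_for_list_of_selectable_functions);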
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: Add %r substitution for psql prompts to show recovery status

2018-01-22 Thread Peter Eisentraut
On 1/9/18 15:36, Peter Eisentraut wrote:
> On 12/7/17 19:54, Tatsuo Ishii wrote:
>>> On Wed, Dec 6, 2017 at 9:19 PM, Ian Barwick  
>>> wrote:
 Note this substitution sends a "pg_is_in_recovery()" query to the server
 each time it's encountered; unless there's something I'm overlooking I
 think that's the only reliable way to determine current recovery status.
>>>
>>> That seems kinda painful.
>>>
>>> And what happens in an aborted transaction?
>>
>> Yeah. I think we need some help from the backend for this. For example, a
>> parameter status message can be used here.  If PostgreSQL moves to the
>> recovery state or vice versa, PostgreSQL sends the parameter status
>> message to frontend.
> 
> I agree a backend status message is the right way to do this.
> 
> We could perhaps report transaction_read_only, if we don't want to add a
> new one.

I'm setting this CF item to returned with feedback, since it would
apparently be a much bigger change than the initial patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-22 Thread Tomas Vondra


On 01/22/2018 05:05 PM, Konstantin Knizhnik wrote:
> 
> 
> On 19.01.2018 20:28, Tomas Vondra wrote:
>>>
>>> With pgbouncer you will never be able to use prepared statements which
>>> slows down simple queries almost twice (unless my patch with
>>> autoprepared statements is committed).
>>>
>> I don't see why that wouldn't be possible? Perhaps not for prepared
>> statements with simple protocol, but I'm pretty sure it's doable for
>> extended protocol (which seems like a reasonable limitation).
>>
>> That being said, I think it's a mistake to turn this thread into a
>> pgbouncer vs. the world battle. I could name things that are possible
>> only with standalone connection pool - e.g. pausing connections and
>> restarting the database without interrupting the clients.
>>
>> But that does not mean built-in connection pool is not useful.
>>
>>
>> regards
>>
> 
> Sorry, I do not understand how extended protocol can help to handle 
> prepared statements without shared prepared statement cache or
> built-in connection pooling.
>

The extended protocol makes it easy for pgbouncer (or any other proxy)
to identify prepared statements, so that it can track (a) which prepared
statements a client defined, and (b) what prepared statements are
defined on a connection. And then do something when a client gets
assigned a connection missing some of those.

I do not claim doing this would be trivial, but I don't see why that
would be impossible.
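
Just to illustrate the kind of bookkeeping I mean (a sketch with invented
names; all the real complexity is in the extended-protocol message handling,
which is omitted here):

	#include <stdbool.h>
	#include <libpq-fe.h>

	/* Sketch only: invented names, protocol parsing and error handling omitted. */
	typedef struct ProxyStatement
	{
		const char *name;		/* statement name seen in the Parse message */
		const char *query;		/* query string seen in the Parse message */
	} ProxyStatement;

	/* invented helper; implementation omitted */
	static bool connection_has_statement(PGconn *conn, const char *name);

	/*
	 * When a client is handed a pooled server connection, re-create any
	 * prepared statements the client defined that this connection lacks.
	 */
	static void
	sync_prepared_statements(const ProxyStatement *client_stmts, int nstmts,
							 PGconn *conn)
	{
		int			i;

		for (i = 0; i < nstmts; i++)
		{
			if (!connection_has_statement(conn, client_stmts[i].name))
			{
				PGresult   *res = PQprepare(conn, client_stmts[i].name,
											client_stmts[i].query, 0, NULL);

				PQclear(res);	/* error handling omitted in this sketch */
			}
		}
	}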

Of course, the built-in pool can handle this in different ways, as it
has access to the internal caches.

> The problem is that in Postgres most of the caches, including the catalog
> cache, relation cache, and prepared statement cache, are private to a backend.

True. I wouldn't say it's a "problem" but it's certainly a challenge for
certain features.

> There is certainly one big advantage of such approach: no need to
> synchronize access to the cache. But it seems to be the only advantage.
> And there are a lot of drawbacks:
> inefficient use of memory, complex invalidation mechanism, not
> compatible with connection pooling...
> 

Perhaps. I personally see the minimal synchronization as a quite
valuable feature.

> So there are three possible ways (may be more, but I know only three):
> 1. Implement built-in connection pooling which will be aware of proper
> use of local caches. This is what I have implemented with the proposed
> approach.
> 2. Implicit autoprepare. Clients will not be able to use standard
> Postgres prepare mechanism, but executor will try to generate generic
> plan for ordinary queries. My implementation of this approach is at
> commit fest.
> 3. Global caches. It seems to be the best solution but the most
> difficult to implement.
> 

Perhaps.

> Actually I think that the discussion about the value of built-in
> connection pooling is very important.

I agree, and I wasn't speaking against built-in connection pooling.

> Yes, external connection pooling is more flexible. It allows pooling to
> be performed either at the client side or at the server side (or even a
> combination of the two approaches).
> Also, external connection pooling for PostgreSQL is not limited to
> pgbouncer/pgpool.
> There are many frameworks maintaining their own connection pool, for
> example J2EE, jboss, hibernate,...
> I have a feeling that about 70% of enterprise systems working with
> databases are written in Java and do connection pooling in their
> own way.

True, but that does not really mean we don't need "our" connection
pooling (built-in or not). The connection pools are usually built into
the application servers, so each application server has its own
independent pool. With larger deployments (a couple of application
servers) that quickly causes problems with max_connections.

> So may be embedded connection pooling is not needed for such
> applications...
>
> But what I have heard from many people is that Postgres' poor
> connection pooling is one of the main drawbacks of Postgres,
> complicating its usage in enterprise environments.
> 

Maybe. I'm sure there's room for improvement.

That being said, when enterprise developers tell me PostgreSQL is
missing some feature, 99% of the time it turns out they're doing
something quite stupid.

> In any case please find updated patch with some code cleanup and
> more comments added.
> 

OK, will look.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Remove PARTIAL_LINKING?

2018-01-22 Thread Peter Eisentraut
On 1/21/18 15:43, Andres Freund wrote:
> We've still some support for building the backend with PARTIAL_LINKING /
> SUBSYS.o instead of the current objfiles.txt approach.
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9956ddc19164b02dc1925fb389a1af77472eba5e
> 
> Any objections to removing that? Seems that's largely just removing a
> bit of logic from common.mk.

Yeah, I think it's OK to remove that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-22 Thread Konstantin Knizhnik



On 19.01.2018 20:28, Tomas Vondra wrote:


With pgbouncer you will never be able to use prepared statements which
slows down simple queries almost twice (unless my patch with
autoprepared statements is committed).


I don't see why that wouldn't be possible? Perhaps not for prepared
statements with simple protocol, but I'm pretty sure it's doable for
extended protocol (which seems like a reasonable limitation).

That being said, I think it's a mistake to turn this thread into a
pgbouncer vs. the world battle. I could name things that are possible
only with standalone connection pool - e.g. pausing connections and
restarting the database without interrupting the clients.

But that does not mean built-in connection pool is not useful.


regards



Sorry, I do not understand how extended protocol can help to handle 
prepared statements without shared prepared statement cache or built-in 
connection pooling.
The problem is that in Postgres most of the caches, including the catalog 
cache, relation cache, and prepared statement cache, are private to a backend.
There is certainly one big advantage of such an approach: no need to 
synchronize access to the cache. But it seems to be the only advantage, 
and there are a lot of drawbacks:
inefficient use of memory, a complex invalidation mechanism, 
incompatibility with connection pooling...


So there are three possible ways (there may be more, but I know only three):
1. Implement built-in connection pooling which will be aware of proper 
use of the local caches. This is what I have implemented with the proposed 
approach.
2. Implicit autoprepare. Clients will not be able to use the standard 
Postgres prepare mechanism, but the executor will try to generate a generic 
plan for ordinary queries. My implementation of this approach is in the 
commitfest.
3. Global caches. This seems to be the best solution, but the most 
difficult to implement.


Actually I think that the discussion about the value of built-in 
connection pooling is very important.
Yes, external connection pooling is more flexible. It allows pooling to be 
performed either at the client side or at the server side (or even a 
combination of the two approaches).
Also, external connection pooling for PostgreSQL is not limited to 
pgbouncer/pgpool.
There are many frameworks maintaining their own connection pool, for 
example J2EE, jboss, hibernate,...
I have a feeling that about 70% of enterprise systems working with 
databases are written in Java and do connection pooling in their own way.

So maybe embedded connection pooling is not needed for such applications...
But what I have heard from many people is that Postgres' poor connection 
pooling is one of the main drawbacks of Postgres, complicating its usage 
in enterprise environments.


In any case, please find an updated patch with some code cleanup and more 
comments added.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index b945b15..8e8a737 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -813,3 +813,32 @@ build_regtype_array(Oid *param_types, int num_params)
 	result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');
 	return PointerGetDatum(result);
 }
+
+/*
+ * Drop all statements prepared in the specified session.
+ */
+void
+DropSessionPreparedStatements(char const* sessionId)
+{
+	HASH_SEQ_STATUS seq;
+	PreparedStatement *entry;
+	size_t idLen = strlen(sessionId);
+
+	/* nothing cached */
+	if (!prepared_queries)
+		return;
+
+	/* walk over cache */
+	hash_seq_init(&seq, prepared_queries);
+	while ((entry = hash_seq_search(&seq)) != NULL)
+	{
+		if (strncmp(entry->stmt_name, sessionId, idLen) == 0 && entry->stmt_name[idLen] == '.')
+		{
+			/* Release the plancache entry */
+			DropCachedPlan(entry->plansource);
+
+			/* Now we can remove the hash table entry */
+			hash_search(prepared_queries, entry->stmt_name, HASH_REMOVE, NULL);
+		}
+	}
+}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index a4f6d4d..7f40edb 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1029,6 +1029,17 @@ pq_peekbyte(void)
 }
 
 /* 
+ *		pq_available_bytes	- get number of buffered bytes available for reading.
+ *
+ * 
+ */
+int
+pq_available_bytes(void)
+{
+	return PqRecvLength - PqRecvPointer;
+}
+
+/* 
  *		pq_getbyte_if_available - get a single byte from connection,
  *			if available
  *
diff --git a/src/backend/port/Makefile b/src/backend/port/Makefile
index aba1e92..56ec998 100644
--- a/src/backend/port/Makefile
+++ b/src/backend/port/Makefile
@@ -21,7 +21,7 @@ subdir = src/backend/port
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = atomics.o dynloader.o pg_sema.o pg_shmem.o $(TAS)
+OBJS = atomics.o dynloader.o pg_sema.o pg_shmem.o send_sock.o $(TAS)
 
 ifeq ($(P

Re: Handling better supported channel binding types for SSL implementations

2018-01-22 Thread Daniel Gustafsson
> On 22 Jan 2018, at 14:05, Michael Paquier  wrote:
> 
> On Mon, Jan 22, 2018 at 11:07:55AM +0100, Daniel Gustafsson wrote:
>> An extensible API makes more sense than on/off (or one on/off call per
>> binding).  Perhaps a way to validate the contents of the list is
>> required though?  Or an assertion on the contents to catch errors
>> during testing.
> 
> Do you have something specific in mind?

Not really, but IIUC a bug in this code could lead to channel binding not being
used for a connection even if the user wanted it (and they may miss that it didn’t
happen if they don't look at the logs, etc.)?  Considering the limited set of values
(currently) it should be quite simple to check for offending entries, if there
is indeed a risk of “silently” not using channel binding.

cheers ./daniel


Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-22 Thread Petr Jelinek
Hi,

On 20/01/18 00:52, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 5:19 PM, Tomas Vondra
>  wrote:
>> Regarding the HOT issue - I have to admit I don't quite see why A2
>> wouldn't be reachable through the index, but that's likely due to my
>> limited knowledge of the HOT internals.
> 
> The index entries only point to the root tuple in the HOT chain.  Any
> subsequent entries can only be reached by following the CTID pointers
> (that's why they are called "Heap Only Tuples").  After T1 aborts,
> we're still OK because the CTID link isn't immediately cleared.  But
> after T2 updates the tuple, it makes A1's CTID link point to A3,
> leaving no remaining link to A2.
> 
> Although in most respects PostgreSQL treats commits and aborts
> surprisingly symmetrically, CTID links are an exception.  When T2
> comes to A1, it sees that A1's xmax is T1 and checks the status of T1.
> If T1 is still in progress, it waits.  If T2 has committed, it must
> either abort with a serialization error or update A2 instead under
> EvalPlanQual semantics, depending on the isolation level.  If T2 has
> aborted, it assumes that the CTID field of T1 is garbage nobody cares
> about, adds A3 to the page, and makes A1 point to A3 instead of A2.
> No record of the A1->A2 link is kept anywhere *precisely because* A2
> can no longer be visible to anyone.
> 

I think this is the only real problem from your list for logical
decoding catalog snapshots. But it's indeed quite a bad one. Is there
something preventing us to remove the assumption that the CTID of T1 is
garbage nobody cares about? I guess turning off HOT for catalogs is not
an option :)

The general problem is that we have a couple of assumptions
(HeapTupleSatisfiesVacuum being one, what you wrote is another) about
tuples from aborted transactions not being read by anybody. But if we
want to add decoding of 2PC or transaction streaming, that's no longer
true, so I think we should try to remove that assumption (even if we do
it only for catalogs since that's what we care about).

The other option would be to make sure 2PC decoding/tx streaming does
not read aborted transactions, but that would mean locking the transaction
every time we give control to the output plugin. Given that the output plugin
may do network writes, this would really mean locking the transaction for
an unbounded period of time. That does not strike me as something we
want to do; decoding should not interact with frontend transaction
management, definitely not this badly.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Vacuum: allow usage of more than 1GB of work mem

2018-01-22 Thread Aleksander Alekseev
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

I can confirm that these patches don't break anything; the code is well
documented, has some tests and doesn't do anything obviously wrong. However
I would recommend that someone who is more familiar with the VACUUM mechanism
than I am recheck these patches.

The new status of this patch is: Ready for Committer


Re: [HACKERS] UPDATE of partition key

2018-01-22 Thread Tom Lane
Robert Haas  writes:
> Tom, do you want to double-check that this fixes it for you?

I can confirm that a valgrind run succeeded for me with the patch
in place.

regards, tom lane



Re: [HACKERS] Supporting huge pages on Windows

2018-01-22 Thread Andrew Dunstan


On 01/22/2018 04:16 AM, Magnus Hagander wrote:
>
>
>
>
> I'll be quite happy to retire the XP machine running brolga, currawong
> and frogmouth, if that's the consensus. XP is now long out of support.
> OTOH I have personal experience of it running in many potentially
> critical situations, still (hospitals, for example). 
>
>
> But do they really run PostgreSQL 11 (or 10..) on that? In my
> experience they usually run an old business application on it only.
> That is a problem in itself of course, but that is not our problem in
> this case :)
>
>  
>
> I can, if people
> want, keep the machine running just building the back branches.
>
>
> That's what I suggest we do. Removing the builds of back branches
> would be the equivalent of de-supporting it on a still supported
> branch, and I don't like that idea. But removing the master branch
> means we desupport in 11, which I think is the right thing to do.


OK, I have left the machine running but these three animals will no
longer build 11, only the back branches.

>
>
>  
>
> I should probably look at setting up a modern 32-bit replacement (say
> Windows 10 Pro-32).
>
>
> Unless we want to desupport 32-bit Windows completely. But unless we
> have an actual reason to do so, I think we shouldn't. So yeah if you
> can get a box like that up and running, that'd be much welcome.


It's probably going to have to wait a couple of months, and at least a
couple of weeks.

It's worth noting that the last Windows Server edition that supported
342bit architectures was WS2008. That's quite old now. I wonder how long
they will continue to support it in the consumer-grade Windows versions.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Transaction control in procedures

2018-01-22 Thread Peter Eisentraut
On 1/16/18 15:24, Andrew Dunstan wrote:
> Looks good. Marking ready for committer.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: GSoC 2018 Project Ideas & Mentors - Last Call

2018-01-22 Thread Robert Haas
On Sun, Jan 21, 2018 at 9:02 AM, Amit Kapila  wrote:
> How about adding a project to support Unique capability for the Hash
> Index?

Hmm, that seems pretty hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-22 Thread Robert Haas
On Fri, Jan 19, 2018 at 10:59 PM, Amit Kapila  wrote:
> The patch doesn't apply cleanly on the head, but after rebasing it, I
> have reviewed and tested it and it seems to be working fine.  Apart
> from this specific issue, I think we should consider making
> nworkers_launched reliable (at the very least for cases where it
> matters).  You seem to be of opinion that can be a source of subtle
> bugs which I don't deny but now I think we are starting to see some
> use cases of such a mechanism as indicated by Peter G. in parallel
> create index thread.  Even, if we find some way for parallel create
> index to not rely on that assumption, I won't be surprised if some
> future parallelism work would have such a requirement.

Isn't making nworkers_launched reliable exactly what this patch does?
It converts the rare cases in which nworkers_launched would have been
unreliable into errors, precisely so that code like parallel CREATE
INDEX doesn't need to worry about the case where it's inaccurate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Handling better supported channel binding types for SSL implementations

2018-01-22 Thread Michael Paquier
On Mon, Jan 22, 2018 at 11:07:55AM +0100, Daniel Gustafsson wrote:
> An extensible API makes more sense than on/off (or one on/off call per
> binding).  Perhaps a way to validate the contents of the list is
> required though?  Or an assertion on the contents to catch errors
> during testing.

Do you have something specific in mind?

> Nitpicking: In src/backend/libpq/auth.c:CheckSCRAMAuth(), this comment
> reads a bit strange:
> 
> +  * Get the list of channel binding types supported by this SSL
> +  * implementation to determine if server should publish -PLUS
> +  * mechanisms or not.
> 
> Since auth.c isn’t tied to any SSL implementation, shouldn’t it be
> “supported by the configured SSL implementation” or something along
> those lines?

Yes, your words sound better.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] UPDATE of partition key

2018-01-22 Thread Robert Haas
On Mon, Jan 22, 2018 at 2:44 AM, Amit Khandekar  wrote:
> Yes, right, that's what is happening. It is not happening on an Assert
> though (there is no assert in that function). It is happening when we
> try to access the array here :
>
> if (proute->subplan_partition_offsets &&
> proute->subplan_partition_offsets[subplan_index] == i)
>
> Attached is a fix, where I have introduced another field
> PartitionTupleRouting.num_subplan_partition_offsets, so that above,
> we can add another condition (subplan_index <
> proute->num_subplan_partition_offsets) in order to stop accessing the
> array once we are done with all the offset array elements.
>
> Ran the update.sql test with valgrind enabled on my laptop, and the
> valgrind output now does not show errors.

Tom, do you want to double-check that this fixes it for you?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2018-01-22 Thread Robert Haas
On Sun, Jan 21, 2018 at 5:41 PM, Craig Ringer  wrote:
> If we'd done server_version_num in 9.5, for example, less stuff would've
> broken with pg10.

Yeah, and if Tom hadn't forced it to be reverted from *8.2*, then
every version anyone still cares about would now have support for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PATCH: enabling parallel execution for cursors explicitly (experimental)

2018-01-22 Thread Amit Kapila
On Thu, Jan 18, 2018 at 12:59 AM, Robert Haas  wrote:
> On Wed, Jan 17, 2018 at 8:53 AM, Simon Riggs  wrote:
>
> Or we could go the other way and try to keep the workers running.  I
> don't really like that because it ties down those workers for
> potentially a long period of time, but that might be acceptable for
> some users.  The main implementation problem is that somehow we'd need
> to propagate to them an updated version of any state that has changed
> while the query was suspended, such as new combo CIDs that have been
> created in the meantime.  dshash seems like a useful tool toward such
> a goal, but there are details to work out, and there are similar
> problems with everything else that is copied from leader to workers.
> We could possibly prevent these problems from arising by placing
> draconian restrictions on what a backend is allowed to do while a
> parallel cursor is open, such as in your follow-on proposal to lock
> out everything except FETCH.  I'm not really that excited about such a
> thing because it's extremely limiting and still doesn't solve all the
> problems: in particular, after BEGIN ... DECLARE CURSOR PARALLEL ...
> FETCH ... FETCH ... syntax error, there is going to be trouble around
> the state of group locking.  It will be very bad if the workers think
> the transaction is still alive and the leader thinks it is in a new
> transaction and they're all sharing locks.
>

On error, workers should be terminated.  What kind of problem do you
have in mind?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-22 Thread Amit Kapila
On Mon, Jan 22, 2018 at 10:36 AM, Peter Geoghegan  wrote:
> On Sun, Jan 21, 2018 at 6:34 PM, Amit Kapila  wrote:
>>> Why is this okay for Gather nodes, though? nodeGather.c looks at
>>> pcxt->nworkers_launched during initialization, and appears to at least
>>> trust it to indicate that more than zero actually-launched workers
>>> will also show up when "nworkers_launched > 0". This trust seems critical
>>> when parallel_leader_participation is off, because "node->nreaders ==
>>> 0" overrides the parallel_leader_participation GUC's setting (note
>>> that node->nreaders comes directly from pcxt->nworkers_launched). If
>>> zero workers show up, and parallel_leader_participation is off, but
>>> pcxt->nworkers_launched/node->nreaders is non-zero, won't the Gather
>>> never make forward progress?
>>
>> Ideally, that situation should be detected and we should throw an
>> error, but that doesn't happen today.  However, it will be handled
>> with Robert's patch on the other thread for CF entry [1].
>
> I knew that, but I was confused by your sketch of the
> WaitForParallelWorkerToAttach() API [1]. Specifically, your suggestion
> that the problem was unique to nbtsort.c, or was at least something
> that nbtsort.c had to take a special interest in. It now appears more
> like a general problem with a general solution, and likely one that
> won't need *any* changes to code in places like nodeGather.c (or
> nbtsort.c, in the case of my patch).
>
> I guess that you meant that parallel CREATE INDEX is the first thing
> to care about the *precise* number of nworkers_launched -- that is
> kind of a new thing. That doesn't seem like it makes any practical
> difference to us, though. I don't see why nbtsort.c should take a
> special interest in this problem, for example by calling
> WaitForParallelWorkerToAttach() itself. I may have missed something,
> but right now ISTM that it would be risky to make the API anything
> other than what both nodeGather.c and nbtsort.c already expect (that
> they'll either have nworkers_launched workers show up, or be able to
> propagate an error).
>

The difference is that nodeGather.c doesn't have any logic like the
one you have in _bt_leader_heapscan where the patch waits for each
worker to increment nparticipantsdone.  For Gather node, we do such a
thing (wait for all workers to finish) by calling
WaitForParallelWorkersToFinish which will have the capability after
Robert's patch to detect if any worker is exited abnormally (fork
failure or failed before attaching to the error queue).


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2018-01-22 Thread Daniel Gustafsson
> On 15 Jan 2018, at 02:33, Michael Paquier  wrote:
> On Fri, Jan 12, 2018 at 11:35:48PM +0100, Daniel Gustafsson wrote:
>> On 11 Jan 2018, at 09:01, Michael Paquier  wrote:

 One open question from this excercise is how to write a good test for
 this.  It can either be made part of the already existing test queries
 or a separate suite.  I’m leaning on the latter simply because the
 case-flipping existing tests seems like something that can be cleaned
 up years from now accidentally because it looks odd.
>>> 
>>> Adding them into src/test/regress/ sounds like a good plan to me.
>> 
>> If there is interest in this patch now that the list exists and aids review, 
>> I
>> can turn the list into a proper test that makes a little more sense than the
>> current list which is rather aimed at helping reviewers.
> 
> Sorry if my words were hard to catch here. What I mean here is to add in
> each command's test file the set of commands which check the
> compatibility. There is no need to test all the options in my opinion,
> as just testing one option is enough to show the purpose. So for example
> to cover the grounds of DefineAggregate(), you could add a set of
> commands in create_aggregate.sql. For DefineCollation(), those can go in
> collate.sql, etc.

Gotcha.  I’ve added some tests to the patch.  The test for CREATE FUNCTION was
omitted for now awaiting the outcome of the discussion around isStrict
isCachable.

Not sure how much they’re worth in "make check” though as it’s quite easy to
add a new option checked with pg_strcasecmp which then isn’t tested.  Still, it
might aid review so definitely worth it.

cheers ./daniel



defname_strcmp-v6.patch
Description: Binary data


Re: [HACKERS] PoC: custom signal handler for extensions

2018-01-22 Thread Maksim Milyutin

Hello!


On 11.01.2018 18:53, Arthur Zakirov wrote:

The relationship between custom signals and
assigned handlers (function addresses) is replicated from postmaster to
child processes including working backends.

I think this happens only if a custom signal is registered while
shared_preload_libraries are being loaded.
But from RegisterCustomProcSignalHandler()'s implementation I see that you can 
register the signal anytime. Am I wrong?

If so then custom signal handlers may not work as expected.


Yeah, thanks. Added checks on 
process_shared_preload_libraries_in_progress flag.



What is the purpose of the AssignCustomProcSignalHandler() function? This function
can erase a handler set by another extension.
For example, extension 1 sets a handler passing reason PROCSIG_CUSTOM_1, and
extension 2 then sets another handler passing reason PROCSIG_CUSTOM_1 too. Here the
handler of extension 2 will be purged.


+
+   Assert(reason >= PROCSIG_CUSTOM_1 && reason <= PROCSIG_CUSTOM_N);
+

I think it is not good to use asserts within AssignCustomProcSignalHandler()
and GetCustomProcSignalHandler(), because these functions can be executed by an
external extension, which may pass a reason outside this boundary. It would be
better if the reason were checked at runtime.


I was guided by the consideration that assertions define preconditions
on a function's input parameter (in our case, procsignal). Meeting
these preconditions has to be ensured by the function callers. Since these
checks are not expensive, it will be fine to replace them with ereport calls.


Updated patch is attached in [1].

1. 
https://www.postgresql.org/message-id/37a48ac6-aa14-4e36-5f08-cf8581fe1382%40gmail.com


--
Regards,
Maksim Milyutin



Re: [HACKERS] PoC: custom signal handler for extensions

2018-01-22 Thread Maksim Milyutin

On 12.01.2018 18:51, Teodor Sigaev wrote:

Looking ahead, this mechanism provides a low-level instrument for
defining remote procedure calls on the extension side. A simple RPC that
determines the effective userid on a remote backend (the remote_effective_user
function) is attached as an example.


Suppose, it's useful patch. Some notes:

1) AssignCustomProcSignalHandler is an unneeded API function, easily
replaced by the
GetCustomProcSignalHandler/ReleaseCustomProcSignal/RegisterCustomProcSignalHandler
functions; on the other hand, reassigning a custom signal handler doesn't
look like a widely used feature.


Agreed. AssignCustomProcSignalHandler is unnecessary. Also I have
removed GetCustomProcSignalHandler, as I don't see any application of this
function.


2) The naming of RegisterCustomProcSignalHandler and ReleaseCustomProcSignal
is not self-consistent. Other parts of the code suggest the pair
Register/Unregister or Acquire/Release. It seems the Register/Unregister
pair is preferred here.


Thanks for notice. Fixed.

3) Agree that Assert(reason >= PROCSIG_CUSTOM_1 && reason <=
PROCSIG_CUSTOM_N) should be replaced with ereport. By the way,
ReleaseCustomProcSignal() is the only place where there isn't a range
check of the reason arg


Oops, I missed this assert check.
I considered the assert checks in user-available routines accepting a
procsignal to be a precondition for the routines' clients, i.e. violating
this check leads to undefined behavior. These checks are not
expensive themselves, therefore it makes sense to replace them with runtime
checks via ereport calls.


4) CustomSignalInterrupt() - playing with errno and SetLatch() seems
unneeded here - the custom signal's handler isn't called there, and SetLatch
will be called several lines below


I have noticed that even the simple HandleNotifyInterrupt() and
HandleParallelMessageInterrupt() routines do at least
SetLatch(MyLatch). I think it makes sense to leave this call in our
case. Perhaps I'm wrong; I don't entirely understand the intention of
SetLatch() in those cases.


5) Using a special memory context for the handler call looks questionable;
I think that complicated handlers can manage memory themselves (and will
do so if they need to store something between calls). So I prefer to
remove the context.


Fixed. But in this case we have to state explicitly in the documentation
that the memory context the signal handler runs in is not defined, and
that memory management is left to the extension developer.


6) As far as I can see, all possible (not only custom) signal handlers could
perfectly well use this API, or at least be stored in the CustomHandlers
array, which could be renamed to InterruptHandlers, for example. Next,
the custom handler type could be made closer to the built-in handlers by
extending its signature to
void (*ProcSignalHandler_type) (ProcSignalReason reason) like the others. That
will allow a single handler to support several reasons.


OK, fixed.

7) Since the API allows using different handlers in different
processes for the same reason, that could be a source of confusion. I
suggest restricting Register/Unregister calls to
shared_preload_libraries, i.e. only during startup.


Yeah, added checks on process_shared_preload_libraries_in_progress flag. 
Thanks for notice.


8) I'd like to see an example of usage this API somewhere in contrib 
in exsting modules. Ideas?


Besides its usage in the pg_query_state extension [1], which provides a
way to extract the query state from a running backend, I see the possibility
with this patch of implementing a statistical sampling-based profiler for
plpgsql functions on the extension side. If we can get the call stack of
plpgsql functions from an interruptible backend, then there is no obstacle
to capturing stack frames at some interval over a fixed
period of time defined by the arguments of the extension's function.
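
For illustration, a minimal sketch of how an extension might use such an API
(only the handler signature void (*)(ProcSignalReason) and the
register-during-shared_preload_libraries rule come from this thread; the header
locations and the assumption that RegisterCustomProcSignalHandler() hands back
the allocated reason are guesses, not the patch's actual code):

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"
#include "storage/procsignal.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static ProcSignalReason my_reason;

/* Runs in the signalled backend once it notices the custom signal. */
static void
my_custom_signal_handler(ProcSignalReason reason)
{
	elog(LOG, "custom signal %d handled by pid %d", (int) reason, MyProcPid);
}

void
_PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		ereport(ERROR,
				(errmsg("this module must be loaded via shared_preload_libraries")));

	/* Assumed to return the PROCSIG_CUSTOM_n slot it allocated. */
	my_reason = RegisterCustomProcSignalHandler(my_custom_signal_handler);
}

/*
 * Any backend could then signal another one with the existing facility:
 *		SendProcSignal(target_pid, my_reason, target_backend_id);
 */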



I have attached a new version of the patch and an updated version of the
remote_effective_user function implementation that demonstrates the
usage of the custom signals API.



1. https://github.com/postgrespro/pg_query_state

--
Regards,
Maksim Milyutin

diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index b0dd7d1..ae7d007 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -60,12 +60,20 @@ typedef struct
  */
 #define NumProcSignalSlots	(MaxBackends + NUM_AUXPROCTYPES)
 
+#define IsCustomProcSignalReason(reason) \
+	((reason) >= PROCSIG_CUSTOM_1 && (reason) <= PROCSIG_CUSTOM_N)
+
+static bool CustomSignalPendings[NUM_CUSTOM_PROCSIGNALS];
+static ProcSignalHandler_type CustomInterruptHandlers[NUM_CUSTOM_PROCSIGNALS];
+
 static ProcSignalSlot *ProcSignalSlots = NULL;
 static volatile ProcSignalSlot *MyProcSignalSlot = NULL;
 
 static bool CheckProcSignal(ProcSignalReason reason);
 static void CleanupProcSignalState(int status, Datum arg);
 
+static void CheckAndSetCustomSignalInterrupts(void);
+
 /*
  * ProcSignalShmemSize
  *		Compute space needed for procsignal's shared memory
@@ -166,6 +17

Re: Handling better supported channel binding types for SSL implementations

2018-01-22 Thread Daniel Gustafsson
> On 22 Jan 2018, at 08:29, Michael Paquier  wrote:

> Attached is a patch which is an attempt to make this choice cleaner for
> new SSL implementations. As we are (rightly!) calling the APIs to fetch
> the channel binding data only when necessary, I think that we need an
> extra API for SSL implementations to let the server decide if channel
> binding mechanisms should be published or not. There could be multiple
> possibilities, like making this API return only a boolean flag. However
> I have made a more extensible choice by having each SSL implementation
> build a list of supported channel bindings.

An extensible API makes more sense than on/off (or one on/off call per
binding).  Perhaps a way to validate the contents of the list is required
though?  Or an assertion on the contents to catch errors during testing.

Nitpicking: In src/backend/libpq/auth.c:CheckSCRAMAuth(), this comment reads a
bit strange:

+* Get the list of channel binding types supported by this SSL
+* implementation to determine if server should publish -PLUS
+* mechanisms or not.

Since auth.c isn’t tied to any SSL implementation, shouldn’t it be “supported
by the configured SSL implementation” or something along those lines?

cheers ./daniel


Re: stricter MCV tests for uniform distributions (was Re: MCV lists for highly skewed distributions)

2018-01-22 Thread Dean Rasheed
On 22 January 2018 at 08:07, John Naylor  wrote:
> On 1/21/18, Dean Rasheed  wrote:
>> It occurs to me that maybe a better test to exclude a value from the
>> MCV list would be to demand that its relative standard error not be
>> too high. Such a test, in addition to the existing tests, might be
>> sufficient to solve the opposite problem of too many values in the MCV
>> list, because the real problem there is including a value after having
>> seen relatively few occurrences of it in the sample, and thus having a
>> wildly inaccurate estimate for it. Setting a bound on the relative
>> standard error would mean that we could have a reasonable degree of
>> confidence in estimates produced from the sample.
>
> If you don't mind, what would the math look like for that?

Using the same syntax as before:

N = Total rows in table (population size)
n = Number of rows sampled
x = Number of times a particular value appears in the sample
p = x/n = Frequency of the value in the sample

So that the standard error of the proportion is

SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1))

Then the relative standard error (which is usually expressed as a percentage) is

RSE = 100 * SE / p

So if we were to demand that the relative standard error was less
than, say, 10%, then the constraint would just be

SE < 0.1 * p
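
To make the arithmetic concrete, here is a quick self-contained sketch that
plugs in the who=8 row from the sample output further below (x = 46
occurrences in an n = 30,000-row sample taken from an N = 10,000,000-row
table):

#include <stdio.h>
#include <math.h>

int
main(void)
{
	double		N = 10000000.0;	/* rows in the table (population size) */
	double		n = 30000.0;	/* rows sampled */
	double		x = 46.0;		/* occurrences of the value in the sample */
	double		p = x / n;
	double		se = sqrt(p * (1.0 - p) / n) * sqrt((N - n) / (N - 1.0));

	/* keep only if x >= 10 and the relative standard error is below 10% */
	printf("p = %g, SE = %g, RSE = %.1f%% => %s\n",
		   p, se, 100.0 * se / p,
		   (x >= 10 && se < 0.1 * p) ? "KEEP" : "DISCARD");
	return 0;
}

That reproduces the SE shown for that row (about 0.000226) and an RSE of
roughly 15%, so the value would be discarded under a 10% bound.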

Note:

* This formula is not valid if x is very small (see what I wrote about
being able to approximate the distribution of p with a normal
distribution). So we should also enforce the "rule of thumb" x >= 10.

* The frequency p that we're storing is the count divided by the
overall sample size. So the values for N and n above should not be the
counts that exclude NULLs. As far as this logic is concerned, NULLs
(and too-wide values) are just values not equal to the value under
consideration. Thus it appears, from just a quick glance at your
patch, that you're using the wrong values for N and n.

* The RSE is a monotonically decreasing function of p in the range
[0,1], so an upper bound on the RSE equates to a lower bound on the
number of occurrences of the value.


This last point gives me a different idea. Instead of applying this
test *in addition to* the existing tests, how about applying it
*instead of* the existing tests. That is, we keep all MCVs that appear
sufficiently often that we can be reasonably confident in the
estimates produced from their sample frequencies, regardless of
whether or not they are more common than average (the average being
fairly meaningless for a highly skewed distribution anyway).

This is still keeping the most common values, but now we'd be saying
that we keep any value that appears sufficiently often in the sample
that we can extrapolate its sample frequency to produce a reasonably
accurate estimate of the population frequency, and discard values for
which the estimate is likely to be inaccurate.

I have not tested this idea at all, but it seems like it might be
worth considering. It has the nice property that the test depends
entirely on how often the value appears, rather than on other
previously computed statistics, such as Ndistinct.

Doing a quick test in pure SQL, using the highly skewed distribution
Jeff Janes gave in the other thread, with the default sample size of
30,000:

with samples as (
  select floor(-log(random())/log(2))::int as who
  from generate_series(1,30000)
), freqs as (
  select who, count(*) as x, count(*)/30000::float8 as p
  from samples group by who
), stats as (
  select *, sqrt(p*(1-p)/30000) *
            sqrt((10000000-30000)::float8/(10000000-1)) as se
  from freqs
)
select *, (10000000*p)::int||'+/-'||(2*se*10000000)::int as "95% interval",
       case when x >= 10 and se < 0.1*p then 'KEEP' else 'DISCARD' end
from stats order by p desc limit 100;

it pretty consistently keeps the 8 most common values:

 who |   x   |     p      |          se          |  95% interval   |  case
-----+-------+------------+----------------------+-----------------+---------
   0 | 15017 | 0.5005667  | 0.00288241625942075  | 5005667+/-57648 | KEEP
   1 |  7607 | 0.2535667  | 0.00250800597590887  | 2535667+/-50160 | KEEP
   2 |  3713 | 0.1237667  | 0.0018984483         | 1237667+/-37969 | KEEP
   3 |  1855 | 0.0618     | 0.0013884757600711   | 618333+/-27770  | KEEP
   4 |   914 | 0.03046667 | 0.000990788179299791 | 304667+/-19816  | KEEP
   5 |   448 | 0.0149     | 0.000699194759916533 | 149333+/-13984  | KEEP
   6 |   229 | 0.00763    | 0.000501741670388358 | 76333+/-10035   | KEEP
   7 |   108 | 0.0036     | 0.000345267009604061 | 36000+/-6905    | KEEP
   8 |    46 | 0.00153    | 0.000225565173744715 | 15333+/-4511    | DISCARD
   9 |    34 | 0.00113    | 0.000193963300230354 | 11333+/-3879    | DISCARD
  10 |    17 | 0.000567   | 0.000137191663419411 | 5667+/-2744     | DISCARD
  11 |    11 | 0.000367   | 0.000110367969704201 | 3667+/-2207     | DISCARD

Re: pgbench - add \if support

2018-01-22 Thread Pavel Stehule
2018-01-22 10:45 GMT+01:00 Fabien COELHO :

>
> few scripting features doesn't mean scripting language. \if in psql is nice
>> feature that reduce duplicate code, unreadable code, and helps with
>> deployment and test scripts. pgbench and psql should to have similar
>> environment - and I am thinking so \if should be there.
>>
>> Using Lua is not bad idea - in psql too - I though about it much, but in
>> this case is better to start from zero.
>>
>
> Yep. Having another versatile (interactive) client would not be a bad
> thing. I'm still wondering how to conciliate any scripting language with
> "bare SQL". The backslash-oriented syntax already used for psql & pgbench
> seems the only available option. Otherwise ISTM that it is back to a
> standard library oriented client access with import, connect, exec...
> whatever set of function already provided by standard libraries (psycopg
> for python, ...).
>

The implementation of some parts in C is frustrating - mainly tab completion.
There is no possibility to create your own backslash commands - or to enhance
the built-in commands. It is not possible to customize the output.

So some hypothetical client could be implemented as a core C module -
for fast processing of tabular data - and everything else could be implemented
in Lua. I can imagine this client supporting input forms, a bar menu, and some
simple reports. It could be more like a FoxPro client than a command-line-only
client. In a few years we can use ncurses everywhere, and then there is the
possibility of writing a rich TUI client.

Regards

Pavel


> --
> Fabien.
>


Re: pgbench - add \if support

2018-01-22 Thread Fabien COELHO



few scripting features doesn't mean scripting language. \if in psql is nice
feature that reduce duplicate code, unreadable code, and helps with
deployment and test scripts. pgbench and psql should to have similar
environment - and I am thinking so \if should be there.

Using Lua is not bad idea - in psql too - I though about it much, but in
this case is better to start from zero.


Yep. Having another versatile (interactive) client would not be a bad 
thing. I'm still wondering how to conciliate any scripting language with 
"bare SQL". The backslash-oriented syntax already used for psql & pgbench 
seems the only available option. Otherwise ISTM that it is back to a 
standard library oriented client access with import, connect, exec... 
whatever set of function already provided by standard libraries (psycopg 
for python, ...).


--
Fabien.



Re: [HACKERS] Supporting huge pages on Windows

2018-01-22 Thread Magnus Hagander
On Mon, Jan 22, 2018 at 4:24 AM, Thomas Munro  wrote:

> On Mon, Jan 22, 2018 at 3:45 AM, Magnus Hagander 
> wrote:
> > I got myself a working build env now so I can at least verify it builds,
> > which it does.
> >
> > With that, I'm pushing this. Let's see what the buildfarm thinks of it.
> And
> > if others end up complaining about the platform drop, but I doubt that.
> >
> > Once again, apologies for the delay, and thanks for the patch!
>
> +To start the database server on the command prompt as a
> standalone process,
> +not as a Windows service, the command prompt must be run as
> an administrator
> +User Access Control (UAC) must be disabled. When the UAC is
> enabled, the normal
> +command prompt revokes the user right Lock Pages in Memory
> when started.
>
> Is that first sentence  missing the word "and" after "administrator"?
>
>
Nope.

But it is missing the word "or".

Fixed, thanks for the spot!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Supporting huge pages on Windows

2018-01-22 Thread Magnus Hagander
On Mon, Jan 22, 2018 at 12:13 AM, Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
>
> On 01/21/2018 01:02 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2018-01-21 13:42:13 +0200, Magnus Hagander wrote:
> >> To add some more notes on this. Again, the API appears in Vista/2003.
> >> Windows Vista went EOL (out of extended support even) in April 2017,
> >> Windows 2003 did so in July 2015. Those are the versions that it's *in*
> --
> >> obviously the versions without it are even older.
> >>
> >> Our binaries only support Windows 2008 and up (at least the ones by EDB,
> >> which are the ones that have a supported-version matrix documented on
> our
> >> site).
> >>
> >> We have traditionally supported older versions of Windows as long as
> people
> >> build from source. But perhaps I'm way overreading that and we should
> just
> >> bite the bullet, commit this patch, and declare those platforms as
> >> completely dead by PostgreSQL 11?
> > Yea, I think it's beyond time that we declare some old windows versions
> > dead. There's enough weird behaviour in supported versions of windows
> > (especially its socket API) that I really don't want to support more
> > than necessary. And supporting versions that've been out of date for a
> > while seems more than unnecessary.
> >
>
>
> I'll be quite happy to retire the XP machine running brolga, currawong
> and frogmouth, if that's the consensus. XP is now long out of support.
> OTOH I have personal experience of it running in many potentially
> critical situations, still (hospitals, for example).


But do they really run PostgreSQL 11 (or 10..) on that? In my experience
they usually run an old business application on it only. That is a problem
in itself of course, but that is not our problem in this case :)



> I can, if people
> want, keep the machine running just building the back branches.
>

That's what I suggest we do. Removing the builds of back branches would be
the equivalent of de-supporting it on a still supported branch, and I don't
like that idea. But removing the master branch means we desupport in 11,
which I think is the right thing to do.




> I should probably look at setting up a modern 32-bit replacement (say
> Windows 10 Pro-32).
>

Unless we want to desupport 32-bit Windows completely. But unless we have
an actual reason to do so, I think we shouldn't. So yeah if you can get a
box like that up and running, that'd be much welcome.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: PATCH: Exclude unlogged tables from base backups

2018-01-22 Thread Masahiko Sawada
On Thu, Dec 14, 2017 at 11:58 PM, Robert Haas  wrote:
> On Tue, Dec 12, 2017 at 8:48 PM, Stephen Frost  wrote:
>> If the persistence is changed then the table will be written into the
>> WAL, no?  All of the WAL generated during a backup (which is what we're
>> talking about here) has to be replayed after the restore is done and is
>> before the database is considered consistent, so none of this matters,
>> as far as I can see, because the drop table or alter table logged or
>> anything else will be in the WAL that ends up getting replayed.
>
> I can't see a hole in this argument.  If we copy the init fork and
> skip copying the main fork, then either we skipped copying the right
> file, or the file we skipped copying will be recreated with the
> correct contents during WAL replay anyway.
>
> We could have a problem if wal_level=minimal, because then the new
> file might not have been WAL-logged; but taking an online backup with
> wal_level=minimal isn't supported precisely because we won't have WAL
> replay to fix things up.
>
> We would also have a problem if the missing file caused something in
> recovery to croak on the grounds that the file was expected to be
> there, but I don't think anything works that way; I think we just
> assume missing files are an expected failure mode and silently do
> nothing if asked to remove them.
>

I also couldn't see a problem in this approach.

Here is the first review comments.

+   unloggedDelim = strrchr(path, '/');

I don't think this works correctly on Windows. How about using
last_dir_separator() instead?


+ * Find all unlogged relations in the specified directory and return
their OIDs.

What ResetUnloggedRelationsHash() actually returns is a hash
table, so this function comment doesn't seem appropriate.


+   /* Part of path that contains the parent directory. */
+   int parentPathLen = unloggedDelim - path;
+
+   /*
+* Build the unlogged relation hash if the parent path is either
+* $PGDATA/base or a tablespace version path.
+*/
+   if (strncmp(path, "./base", parentPathLen) == 0 ||
+   (parentPathLen >=
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) &&
+strncmp(unloggedDelim -
(sizeof(TABLESPACE_VERSION_DIRECTORY) - 1),
+TABLESPACE_VERSION_DIRECTORY,
+
sizeof(TABLESPACE_VERSION_DIRECTORY) - 1) == 0))
+   unloggedHash = ResetUnloggedRelationsHash(path);
+   }

How about using get_parent_directory() to get the parent directory name?
Also, I think it's better to destroy the unloggedHash after use.
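
A rough sketch of that suggestion, using the portable helpers from
src/port/path.c (the function name and exact checks below are only
illustrative, not the patch's code):

#include "postgres.h"

#include "common/relpath.h"		/* TABLESPACE_VERSION_DIRECTORY */

/* Would the given basebackup path require building the unlogged hash? */
static bool
path_needs_unlogged_hash(const char *path)
{
	char		parent[MAXPGPATH];
	char	   *sep;

	strlcpy(parent, path, sizeof(parent));
	get_parent_directory(parent);	/* truncates "parent" in place */

	/* Only under $PGDATA/base or a tablespace version directory */
	sep = last_dir_separator(parent);
	return strcmp(parent, "./base") == 0 ||
		strcmp(sep ? sep + 1 : parent, TABLESPACE_VERSION_DIRECTORY) == 0;
}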


+   /* Exclude all forks for unlogged tables except the
init fork. */
+   if (unloggedHash && ResetUnloggedRelationsMatch(
+   unloggedHash, de->d_name) == unloggedOther)
+   {
+   elog(DEBUG2, "unlogged relation file \"%s\"
excluded from backup",
+de->d_name);
+   continue;
+   }

I think it's better to log this debug message at DEBUG2 level for
consistency with other messages.


+   ok(!-f "$tempdir/tbackup/tblspc1/$tblspc1UnloggedBackupPath",
+   'unlogged imain fork not in tablespace backup');

s/imain/main/


If a new unlogged relation is created after the unloggedHash has been
constructed but before the files are sent, we cannot exclude that
relation. It would not be a problem if taking the backup doesn't take
long, because the new unlogged relation is unlikely to become very
large. However, if taking a backup takes a long time, we could include a
large main fork in the backup.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



stricter MCV tests for uniform distributions (was Re: MCV lists for highly skewed distributions)

2018-01-22 Thread John Naylor
(Starting a new thread so as not to distract review)

On 1/21/18, Dean Rasheed  wrote:
> On 21 January 2018 at 07:26, John Naylor  wrote:
>> I spent a few hours hacking on this, and it turns out calculating the
>> right number of MCVs taking into account both uniform and highly
>> non-uniform distributions is too delicate a problem for me to solve
>> right now. The logic suggested by Dean Rasheed in [1] always produces
>> no MCVs for a perfectly uniform distribution (which is good), but very
>> often also for other distributions, which is not good. My efforts to
>> tweak that didn't work, so I didn't get as far as adapting it for the
>> problem Jeff is trying to solve.
>
> Hmm, Tom suggested that the test based on the average frequency over
> all values might be too strict because the estimated number of
> distinct values is often too low, so that might explain what you're
> seeing.

In my test tables, I've noticed that our Ndistinct estimator is most
inaccurate for geometric distributions, so that's certainly possible,
but confusingly, it occasionally gave an empty MCV list along with a
histogram with a boundary duplicated 5 times, which I thought I was
guarding against. I'm thinking my implementation of your logic is
flawed somehow. In case you're curious I've attached my rough
(compiler warnings and all) test patch.

> It occurs to me that maybe a better test to exclude a value from the
> MCV list would be to demand that its relative standard error not be
> too high. Such a test, in addition to the existing tests, might be
> sufficient to solve the opposite problem of too many values in the MCV
> list, because the real problem there is including a value after having
> seen relatively few occurrences of it in the sample, and thus having a
> wildly inaccurate estimate for it. Setting a bound on the relative
> standard error would mean that we could have a reasonable degree of
> confidence in estimates produced from the sample.

If you don't mind, what would the math look like for that?

-John Naylor
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 5f21fcb..da21333 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -2318,6 +2318,8 @@ compute_scalar_stats(VacAttrStatsP stats,
 	int			num_mcv = stats->attr->attstattarget;
 	int			num_bins = stats->attr->attstattarget;
 	StdAnalyzeData *mystats = (StdAnalyzeData *) stats->extra_data;
+	double		N,
+n;
 
 	values = (ScalarItem *) palloc(samplerows * sizeof(ScalarItem));
 	tupnoLink = (int *) palloc(samplerows * sizeof(int));
@@ -2525,10 +2527,10 @@ compute_scalar_stats(VacAttrStatsP stats,
 			 */
 			int			f1 = ndistinct - nmultiple + toowide_cnt;
 			int			d = f1 + nmultiple;
-			double		n = samplerows - null_cnt;
-			double		N = totalrows * (1.0 - stats->stanullfrac);
 			double		stadistinct;
 
+			n = samplerows - null_cnt;
+			N = totalrows * (1.0 - stats->stanullfrac);
 			/* N == 0 shouldn't happen, but just in case ... */
 			if (N > 0)
 stadistinct = (n * d) / ((n - f1) + f1 * n / N);
@@ -2558,9 +2560,44 @@ compute_scalar_stats(VacAttrStatsP stats,
 		 * we are able to generate a complete MCV list (all the values in the
 		 * sample will fit, and we think these are all the ones in the table),
 		 * then do so.  Otherwise, store only those values that are
-		 * significantly more common than the (estimated) average. We set the
-		 * threshold rather arbitrarily at 25% more than average, with at
-		 * least 2 instances in the sample.  Also, we won't suppress values
+		 * significantly more common than the (estimated) average.
+		 *
+		 * Note: For this POC patch, the implementation and comments
+		 * were copied from an email from Dean Rasheed, which contains further references:
+		 * https://www.postgresql.org/message-id/CAEZATCVu9zK0N%3Dnd9ufavabbM8YZiyWYJca0oiE8F31GAY%2B_XA%40mail.gmail.com
+		 *
+		 * We calculate the threshold from the table and sample sizes.
+
+		 * The initial rule of thumb is that the value should occur at
+		 * least 10 times in the sample.
+		 *
+		 * Suppose that N is the population size (total number of rows in the
+		 * table), and n is the sample size, and that some particular candidate
+		 * value appears x times in the sample. Then the "sample proportion" is
+		 * given by p = x/n.
+		 *
+		 * It is reasonable to treat p as having a normal distribution, which
+		 * then allows the margin of error to be analysed using standard
+		 * techniques. We calculate the standard error of the sample proportion:
+		 *
+		 * SE = sqrt(p*(1-p)/n) * sqrt((N-n)/(N-1))
+		 *
+		 * The second term is a finite population correction. There is a 95%
+		 * probability that the total population proportion lies in the range
+		 *
+		 * [ pmin = p-2*SE, pmax = p+2*SE ]
+		 *
+		 * If there are Nd distinct values in the table, so that the average
+		 * frequency of occurrence of any particular value is 1/Nd, then the test
+		 *
+		 * pmin > 1/Nd
+		 *
+		 * would imply that