Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables

2017-07-23 Thread Amit Langote
Hi Amit,

On 2017/07/24 14:09, Amit Khandekar wrote:
>>> On 2017/07/10 14:15, Etsuro Fujita wrote:
>>> Another thing I noticed is the error handling in ExecWithCheckOptions; it
>>> doesn't take any special care for partitioned tables, so the column description in
>>> the error message for WCO_VIEW_CHECK is built using the partition table's
>>> rowtype, not the root table's.  But I think it'd be better to build that
>>> using the root table's rowtype, like ExecConstraints, because that would
>>> make the column description easier to understand since the parent view(s)
>>> (from which the WithCheckOptions evaluated there were created) would have been
>>> defined on the root table.  This seems independent from the above issue,
>>> so I created a separate patch, which I'm attaching. What do you think
>>> about that?
> 
> + if (map != NULL)
> + {
> + tuple = do_convert_tuple(tuple, map);
> + ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> + }
> 
> Above, the tuple descriptor also needs to be set, since the parent and
> child partitions can have different column ordering.
> 
> Due to this, the following testcase crashes:
> 
> CREATE TABLE range_parted (a text,b int, c int) partition by range (b);
> CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c >
> 120) WITH CHECK OPTION;
> create table part_a_1_a_10(b int, c int, a text);
> alter table range_parted attach partition part_a_1_a_10 for values
> from (1) to (10);
> insert into upview values ('a', 2, 100);

Good catch.  Thanks for creating the patch.

There are other places with similar code viz. those in ExecConstraints()
that would need fixing.  Without those fixes, the following causes a crash
(building on your example):

alter table range_parted add constraint check_c check (c > 120);
insert into range_parted values ('a', 2, 100);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Attached is the updated version of your patch.  A test is also added in
insert.sql along the lines of the above example.

Will add this to the PG 10 open items list.

Thanks,
Amit
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b22de78516..040e9a916a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1956,6 +1956,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 				if (map != NULL)
 				{
 					tuple = do_convert_tuple(tuple, map);
+					ExecSetSlotDescriptor(slot, tupdesc);
 					ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 				}
 			}
@@ -2003,6 +2004,7 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 			if (map != NULL)
 			{
 				tuple = do_convert_tuple(tuple, map);
+				ExecSetSlotDescriptor(slot, tupdesc);
 				ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 			}
 		}
@@ -2112,6 +2114,7 @@ ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo,
 					if (map != NULL)
 					{
 						tuple = do_convert_tuple(tuple, map);
+						ExecSetSlotDescriptor(slot, tupdesc);
 						ExecStoreTuple(tuple, slot, InvalidBuffer, false);
 					}
 				}
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index c608ce377f..0eaa47fb2b 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -410,6 +410,14 @@ with ins (a, b, c) as
  mlparted4  | 1 |  30 |  39
 (5 rows)
 
+alter table mlparted add c text;
+create table mlparted5 (c text, a int not null, b int not null);
+alter table mlparted attach partition mlparted5 for values from (1, 40) to (1, 50);
+alter table mlparted add constraint check_b check (a = 1 and b < 45);
+insert into mlparted values (1, 45, 'bah');
+ERROR:  new row for relation "mlparted5" violates check constraint "check_b"
+DETAIL:  Failing row contains (1, 45, bah).
+drop table mlparted5;
 -- check that message shown after failure to find a partition shows the
 -- appropriate key description (or none) in various situations
 create table key_desc (a int, b int) partition by list ((a+0));
diff --git a/src/test/regress/expected/updatable_views.out b/src/test/regress/expected/upd

Re: [HACKERS] pl/perl extension fails on Windows

2017-07-23 Thread Noah Misch
On Wed, Jul 19, 2017 at 05:01:31PM -0400, Tom Lane wrote:
> Ashutosh Sharma  writes:
> > Here are the list of macros and variables from 'intrpvar.h' file that
> > are just defined in perl module but not in plperl on Windows,
> 
> > #ifdef PERL_USES_PL_PIDSTATUS
> > PERLVAR(I, pidstatus,   HV *)   /* pid-to-status mappings for waitpid */
> > #endif
> 
> > #ifdef PERL_SAWAMPERSAND
> > PERLVAR(I, sawampersand, U8)/* must save all match strings */
> > #endif
> 
> I am really suspicious that this means your libperl was built in an unsafe
> fashion, that is, by injecting configuration choices as random -D switches
> in the build process rather than making sure the choices were recorded in
> perl's config.h.  As an example, looking at the perl 5.24.1 headers on
> a Fedora box, it looks to me like PERL_SAWAMPERSAND could only get defined
> if PERL_COPY_ON_WRITE were not defined, and the only way that that can
> happen is if PERL_NO_COW is defined, and there are no references to the
> latter anyplace except in this particular #if defined test in perl.h.
> 
> Where did your perl installation come from, anyway?  Are you sure the .h
> files match up with the executables?

I see corresponding symptoms with the following Perl distributions:

strawberry-perl-5.26.0.1-64bit.msi:
  src/pl/plperl/Util.c: loadable library and perl binaries are mismatched (got handshake key 11800080, needed 11c00080)
ActivePerl-5.24.1.2402-MSWin32-x64-401627.exe:
  src/pl/plperl/Util.c: loadable library and perl binaries are mismatched (got handshake key 11500080, needed 11900080)

So, this affects each of the two prominent families of Perl Windows binaries.
Notes for anyone trying to reproduce:

- Both of those Perl distributions require the hacks described here:
  https://www.postgresql.org/message-id/flat/CABcP5fjEjgOsh097cWnQrsK9yCswo4DZxp-V47DKCH-MxY9Gig%40mail.gmail.com
- Add PERL_USE_UNSAFE_INC=1 to the environment until we update things to cope
  with this Perl 5.26 change:
  http://search.cpan.org/~xsawyerx/perl-5.26.0/pod/perldelta.pod#Removal_of_the_current_directory_(%22.%22)_from_@INC


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] segfault in HEAD when too many nested functions call

2017-07-23 Thread Noah Misch
On Tue, Jul 18, 2017 at 01:04:10PM -0700, Andres Freund wrote:
> Ok, I'll flesh out the patch till Thursday.  But I do think we're going
> to have to do something about the back branches, too.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] UPDATE of partition key

2017-07-23 Thread Amit Khandekar
On 13 July 2017 at 22:39, Amit Khandekar  wrote:
> Attached is a WIP patch (make_resultrels_ordered.patch) that generates
> the result rels in canonical order. This patch is kept separate from
> the update-partition-key patch, and can be applied on master branch.

Attached update-partition-key_v13.patch now contains this
make_resultrels_ordered.patch changes.

So now that the per-subplan result rels and the leaf partition oids that
are generated for tuple routing are both known to be in the same
(canonical) order, ExecSetupPartitionTupleRouting() can look up the
per-subplan result rels without a hash table.  Instead of the hash table,
we iterate over the leaf partition oids while keeping a cursor over the
per-subplan resultrels, advancing the cursor whenever the resultrel at
that position is found in the leaf partitions list.  Since the two lists
are in the same order, we never have to rescan the portion of the lists
that has already been scanned.
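As an illustration only (toy Python over plain integers standing in for
partition OIDs, not the actual executor code), the single-pass matching
described above can be sketched as:

```python
def match_subplan_resultrels(leaf_part_oids, subplan_rel_oids):
    """Map each leaf partition OID to the index of its per-subplan
    result rel, or None if that partition has no subplan.

    Both lists must be in the same (canonical) order, so a single
    forward-moving cursor over subplan_rel_oids suffices -- no hash
    table, and no rescanning of already-visited entries."""
    mapping = []
    pos = 0
    for oid in leaf_part_oids:
        if pos < len(subplan_rel_oids) and subplan_rel_oids[pos] == oid:
            mapping.append(pos)
            pos += 1              # advance only on a match
        else:
            mapping.append(None)  # partition not among the subplans
    return mapping
```

Because each list is consumed at most once, the whole matching is linear in
the number of leaf partitions, which is the point of requiring canonical
order.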

I considered whether the issue behind this recent commit might be
relevant for update tuple-routing as well:
commit f81a91db4d1c2032632aa5df9fc14be24f5fe5ec
Author: Robert Haas 
Date:   Mon Jul 17 21:29:45 2017 -0400
Use a real RT index when setting up partition tuple routing.

Since we know that using a dummy RT index of 1 for tuple routing result
rels is not correct, I am checking another possibility: in the latest
patch, the tuple routing partitions are a mix of (a) existing update
result rels and (b) new partition result rels.  The (b) result rels have
the RT index of nominalRelation, but the existing (a) result rels each
have their own RT index.  I suspect this might surface an issue similar
to the one fixed by the above commit, e.g. with a WITH query whose UPDATE
subqueries do tuple routing.  Will check that.

This patch also has Robert's changes in the planner to decide whether
to do update tuple routing.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


update-partition-key_v13.patch
Description: Binary data



Re: [HACKERS] Oddity in error handling of constraint violation in ExecConstraints for partitioned tables

2017-07-23 Thread Amit Khandekar
>> On 2017/07/10 14:15, Etsuro Fujita wrote:
>> Another thing I noticed is the error handling in ExecWithCheckOptions; it
>> doesn't take any special care for partitioned tables, so the column description in
>> the error message for WCO_VIEW_CHECK is built using the partition table's
>> rowtype, not the root table's.  But I think it'd be better to build that
>> using the root table's rowtype, like ExecConstraints, because that would
>> make the column description easier to understand since the parent view(s)
>> (from which the WithCheckOptions evaluated there were created) would have been
>> defined on the root table.  This seems independent from the above issue,
>> so I created a separate patch, which I'm attaching. What do you think
>> about that?

+ if (map != NULL)
+ {
+ tuple = do_convert_tuple(tuple, map);
+ ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+ }

Above, the tuple descriptor also needs to be set, since the parent and
child partitions can have different column ordering.

Due to this, the following testcase crashes:

CREATE TABLE range_parted (a text,b int, c int) partition by range (b);
CREATE VIEW upview AS SELECT * FROM range_parted WHERE (select c >
120) WITH CHECK OPTION;
create table part_a_1_a_10(b int, c int, a text);
alter table range_parted attach partition part_a_1_a_10 for values
from (1) to (10);
insert into upview values ('a', 2, 100);
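For illustration, the column-order mismatch behind that crash can be
modeled with a toy attribute map (hypothetical helper names in Python; the
server code does this with convert_tuples_by_name() and do_convert_tuple()
over TupleDesc structures):

```python
def build_attr_map(src_cols, dst_cols):
    """For each destination column, find its position in the source
    column list (models a TupleConversionMap's attribute map)."""
    return [src_cols.index(c) for c in dst_cols]

def convert_tuple(src_tuple, attr_map):
    """Rearrange a tuple from the source column order into the
    destination column order."""
    return tuple(src_tuple[i] for i in attr_map)

# Child partition part_a_1_a_10 stores (b, c, a); the root table
# range_parted declares (a, b, c).  Converting a child tuple back to
# the root's order for the error message also requires the slot's
# descriptor to be switched to the root's -- otherwise the converted
# values are read with the wrong layout, which is the bug being fixed.
attr_map = build_attr_map(['b', 'c', 'a'], ['a', 'b', 'c'])
root_tuple = convert_tuple((2, 100, 'a'), attr_map)
```

The converted tuple is only meaningful if whoever reads it (here, the
error-reporting code) also sees the matching descriptor, hence the
ExecSetSlotDescriptor() call in the patch.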

Attached is a patch that sets the tuple descriptor.
Also, in the updatable_views.sql test, the patch adds a varchar column
to one of the partitioned tables used there, so as to cover this
scenario.  Without setting the tuple descriptor, the patched
updatable_views.sql output shows a junk value in one of the columns of
the row in the error message emitted for the WithCheckOption violation.

Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


set_slot_descriptor.patch
Description: Binary data



Re: [HACKERS] PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING

2017-07-23 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 10:39 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Fri, Jul 21, 2017 at 10:55 AM, Kyotaro HORIGUCHI wrote:
>>> The attached patch differs only in this point.
>
>> +1. The patch looks good to me.
>
> Pushed with a couple additional changes: we'd all missed that the header
> comment for GetConnection was obsoleted by this change, and the arguments
> for GetSysCacheHashValue really need to be coerced to Datum.  (I think
> OID to Datum is the same as what the compiler would do anyway, but best
> not to assume that.)

Thanks and sorry for not noticing the prologue.

>
> Back-patching was more exciting than I could wish.  It seems that
> before 9.6, we did not have struct UserMapping storing the OID of the
> pg_user_mapping row it had been made from.  I changed GetConnection to
> re-look-up that row and get the OID.  But that's ugly, and there's a
> race condition: if user mappings are being added or deleted meanwhile,
> we might locate a per-user mapping when we're really using a PUBLIC
> mapping or vice versa, causing the ConnCacheEntry to be labeled with
> the wrong hash value so that it might not get invalidated properly later.
> Still, it's significantly better than it was, and that corner case seems
> unlikely to get hit in practice --- for one thing, you'd have to then
> revert the mapping addition/deletion before the ConnCacheEntry would be
> found and used again.  I don't want to take the risk of modifying struct
> UserMapping in stable branches, so it's hard to see a way to make that
> completely bulletproof before 9.6.

+1.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-23 Thread Masahiko Sawada
On Sat, Jul 22, 2017 at 8:04 AM, Stephen Frost  wrote:
> Masahiko, all,
>
> * Masahiko Sawada (sawada.m...@gmail.com) wrote:
>> On Tue, Jul 18, 2017 at 1:28 PM, Stephen Frost  wrote:
>> > Masahiko, Michael,
>> >
>> > * Masahiko Sawada (sawada.m...@gmail.com) wrote:
>> >> > This is beginning to shape.
>> >>
>> >> Sorry, I missed a lot of typos in the last patch. All of your comments
>> >> are incorporated into the attached latest patch, and I've checked for
>> >> other typos. Please review it.
>> >
>> > I've taken an initial look through the patch and it looks pretty
>> > reasonable.  I need to go over it in more detail and work through
>> > testing it myself next.
>> >
>> > I expect to commit this (with perhaps some minor tweaks primairly around
>> > punctuation/wording), barring any issues found, on Wednesday or Thursday
>> > of this week.
>>
>> I understood. Thank you for looking at this!
>
> I started discussing this with David off-list and he'd like a chance to
> review it in a bit more detail (he's just returning from being gone for
> a few weeks).  That review will be posted to this thread on Monday, and
> I'll wait until then to move forward with the patch.
>
> Next update will be before Tuesday, July 25th, COB.
>

Thank you! I'll start revising the patch as soon as I get the review comments.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables

2017-07-23 Thread Etsuro Fujita

On 2017/07/21 19:16, Etsuro Fujita wrote:

On 2017/07/20 11:21, Etsuro Fujita wrote:

On 2017/07/19 23:36, Tom Lane wrote:

Please put the responsibility of doing const-expression simplification
in these cases somewhere closer to where the problem is being created.


Would it be reasonable to make it the FDW's responsibility to do that 
const-simplification if necessary?
There seem to be no objections, so I removed the const-expression 
simplification from the patch and added a note about this to the docs for 
AddForeignUpdateTargets.


Attached is an updated version of the patch.


I cleaned up the patch a bit.  PFA a new version of the patch.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 6962,6967  update bar set f2 = f2 + 100 returning *;
--- 6962,7026 
7 | 277
  (6 rows)
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+                          QUERY PLAN
+--------------------------------------------------------------
+  Update on public.bar
+Update on public.bar
+Foreign Update on public.bar2
+  Remote SQL: UPDATE public.loct2 SET f2 = $2 WHERE ctid = $1 RETURNING f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.f1, (bar.f2 + 100), bar.ctid
+->  Foreign Scan on public.bar2
+  Output: bar2.f1, (bar2.f2 + 100), bar2.f3, bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 FOR UPDATE
+ (9 rows)
+ 
+ update bar set f2 = f2 + 100;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (3,333,33),NEW: (3,433,33)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (4,344,44),NEW: (4,444,44)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW UPDATE ON bar2
+ NOTICE:  OLD: (7,277,77),NEW: (7,377,77)
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+                          QUERY PLAN
+-------------------------------------------------------------
+  Delete on public.bar
+Delete on public.bar
+Foreign Delete on public.bar2
+  Remote SQL: DELETE FROM public.loct2 WHERE ctid = $1 RETURNING f1, f2, f3
+->  Seq Scan on public.bar
+  Output: bar.ctid
+  Filter: (bar.f2 < 400)
+->  Foreign Scan on public.bar2
+  Output: bar2.ctid, bar2.*
+  Remote SQL: SELECT f1, f2, f3, ctid FROM public.loct2 WHERE ((f2 < 400)) FOR UPDATE
+ (10 rows)
+ 
+ delete from bar where f2 < 400;
+ NOTICE:  trig_row_before(23, skidoo) BEFORE ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON bar2
+ NOTICE:  OLD: (7,377,77)
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  NOTICE:  drop cascades to foreign table foo2
  drop table bar cascade;
*** a/contrib/postgres_fdw/sql/postgres_fdw.sql
--- b/contrib/postgres_fdw/sql/postgres_fdw.sql
***
*** 1632,1637  explain (verbose, costs off)
--- 1632,1657 
  update bar set f2 = f2 + 100 returning *;
  update bar set f2 = f2 + 100 returning *;
  
+ -- Test that UPDATE/DELETE with inherited target works with row-level triggers
+ CREATE TRIGGER trig_row_before
+ BEFORE UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ CREATE TRIGGER trig_row_after
+ AFTER UPDATE OR DELETE ON bar2
+ FOR EACH ROW EXECUTE PROCEDURE trigger_data(23,'skidoo');
+ 
+ explain (verbose, costs off)
+ update bar set f2 = f2 + 100;
+ update bar set f2 = f2 + 100;
+ 
+ explain (verbose, costs off)
+ delete from bar where f2 < 400;
+ delete from bar where f2 < 400;
+ 
+ -- cleanup
+ DROP TRIGGER trig_row_before ON bar2;
+ DROP TRIGGER trig_row_after ON bar2;
  drop table foo cascade;
  drop table bar cascade;
  drop table loct1;
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***
*** 428,438  AddForeignUpdateTargets (Query *parsetree,
   Avoid using names matching ctidN,
   wholerow, or
   wholerowN, as the core system can
!  generate junk columns of these names.
  
  
 

Re: [HACKERS] [WIP] [B-Tree] Keep indexes sorted by heap physical location

2017-07-23 Thread Claudio Freire
On Fri, Jul 21, 2017 at 10:31 PM, Peter Geoghegan  wrote:
> On Wed, Aug 17, 2016 at 7:54 PM, Claudio Freire  
> wrote:
>> The attached patch tries to maintain the initial status of B-Tree
>> indexes, which are created with equal-key runs in physical order,
>> during the whole life of the B-Tree, and make key-tid pairs
>> efficiently searchable in the process.
>
> I don't see an entry for this in the next CF. Do you have a plan for it?
>
> BTW, I did post some remarks on this patch on another thread recently
> [1]. Not sure if any of what I said there is news to you at this
> point.
>
> [1] postgr.es/m/CAH2-Wzn=6Lc3OVA88x=E6SKG72ojNUE6ut6RZAqNnQx-YLcw=q...@mail.gmail.com
> --
> Peter Geoghegan

I plan to restart work on it soonishly, but ATM most of my time is
dedicated to the vacuum patch, which is almost done.




Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c

2017-07-23 Thread Thomas Munro
On Sun, Jul 23, 2017 at 8:32 AM, Tom Lane  wrote:
> Meanwhile, it's still pretty unclear what happened yesterday on
> culicidae.

That failure is indeed baffling.  The only code that inserts
(HASH_ENTER[_NULL]) into PredicateLockTargetHash is:

1.  CreatePredicateLock().  It would be a bug if that ever tried to
insert a { 0, 0, 0, 0 } tag, and in any case it holds
SerializablePredicateLockListLock in LW_SHARED.

2.  TransferPredicateLocksToNewTarget(), which removes and restores
the scratch entry and also explicitly inserts a transferred entry.  It
asserts that it holds SerializablePredicateLockListLock and is called
only by PredicateLockPageSplit() which acquires it in LW_EXCLUSIVE.

3.  DropAllPredicateLocksFromTable(), which removes and restores the
scratch entry and also explicitly inserts a transferred entry.
Acquires SerializablePredicateLockListLock in LW_EXCLUSIVE.

I wondered if DropAllPredicateLocksFromTable() had itself inserted a
tag that accidentally looks like the scratch tag in between removing
and restoring, perhaps because the relation passed in had a bogus 0 DB
OID etc, but it constructs a tag with
SET_PREDICATELOCKTARGETTAG_RELATION(heaptargettag, dbId, heapId) which
sets locktag_field3 to InvalidBlockNumber == -1, not 0 so that can't
explain it.

I wondered if a concurrent PredicateLockPageSplit() called
TransferPredicateLocksToNewTarget() using a newtargettag built from a
Relation that somehow had a bogus relation with DB OID 0, rel OID 0
and newblkno 0, but that doesn't help because
SerializablePredicateLockListLock is acquired at LW_EXCLUSIVE so it
can't run concurrently.

It looks a bit like something at a lower level needs to be broken (GCC
6.3 released 6 months ago, maybe interacts badly with some clever
memory model-dependent code of ours?), or something must be trashing
memory.

Here's the set of tests that ran concurrently with select_into, whose
backtrace we see ("DROP SCHEMA selinto_schema CASCADE;"):

parallel group (20 tests):  select_distinct_on delete select_having
random btree_index select_distinct namespace update case hash_index
select_implicit subselect select_into arrays prepared_xacts
transactions portals aggregates join union

Of those I see that prepared_xacts, portals and transactions
explicitly use SERIALIZABLE (which may or may not be important).  I
wonder if the thing to do here is to run selinto (or maybe just its
setup and tear-down, "DROP SCHEMA ...") concurrently with those others
in tight loops and burn some CPU.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] autovacuum can't keep up, bloat just continues to rise

2017-07-23 Thread Joshua D. Drake

Hello,

I changed the test to run for 6 hours at a time regardless of the number of 
transactions. I also changed the du command to look only at the database 
(previously WAL logs were included). This is the clearest indication of 
the problem I have been able to produce.


Again, this is with 128 clients and 500 warehouses. The first test is a 
clean test, everything dropped, vacuumed etc... Each subsequent test is 
just starting the test again to have breakpoints.



-+---
 autovacuum  | on
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_analyze_threshold| 50
 autovacuum_freeze_max_age   | 2
 autovacuum_max_workers  | 12
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime  | 10
 autovacuum_vacuum_cost_delay| 0
 autovacuum_vacuum_cost_limit| 5000
 autovacuum_vacuum_scale_factor  | 0.1
 autovacuum_vacuum_threshold | 50
 autovacuum_work_mem | -1
 log_autovacuum_min_duration | -1
 max_wal_size| 640
 checkpoint_timeout  | 86400
 checkpoint_completion_target| 0.5

Starting base metric
50G /srv/main/base

Test 1:
90G /srv/main/base
TPS: 838

Test 2:
121G/srv/main/base
TPS: 725

Test 3:
146G/srv/main/base
TPS: 642

Test 4:
171G/srv/main/base
TPS: 549

Test 5:
189G/srv/main/base
TPS: 489

Test 6:
208G/srv/main/base
TPS: 454

As you can see even with aggressive vacuuming, over a period of 36 hours 
life gets increasingly miserable.


The largest table is:

postgres=# select pg_size_pretty(pg_total_relation_size('bmsql_order_line'));
 pg_size_pretty
----------------
 148 GB
(1 row)

postgres=# \d bmsql_order_line
 Table "public.bmsql_order_line"
 Column |Type | Modifiers
+-+---
 ol_w_id| integer | not null
 ol_d_id| integer | not null
 ol_o_id| integer | not null
 ol_number  | integer | not null
 ol_i_id| integer | not null
 ol_delivery_d  | timestamp without time zone |
 ol_amount  | numeric(6,2)|
 ol_supply_w_id | integer |
 ol_quantity| integer |
 ol_dist_info   | character(24)   |
Indexes:
    "bmsql_order_line_pkey" PRIMARY KEY, btree (ol_w_id, ol_d_id, ol_o_id, ol_number)
Foreign-key constraints:
    "ol_order_fkey" FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id) REFERENCES bmsql_oorder(o_w_id, o_d_id, o_id)
    "ol_stock_fkey" FOREIGN KEY (ol_supply_w_id, ol_i_id) REFERENCES bmsql_stock(s_w_id, s_i_id)


With the PK being

postgres=# select pg_size_pretty(pg_relation_size('bmsql_order_line_pkey'));
 pg_size_pretty
----------------
 48 GB
(1 row)

I tried to see how much data we are dealing with here:

postgres=# select count(*) from bmsql_order_line;
   count
-----------
 910324839
(1 row)

Time: 503965.767 ms

And just to show that we were pushing to get these numbers:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   2.380.002.201.980.00   93.44

Device:tpsMB_read/sMB_wrtn/sMB_readMB_wrtn
sdb2027.40   239.99 0.05   1199  0
sda   0.80 0.00 0.01  0  0



So we have 910M rows, and it took 8.39941667 minutes to count them at 
240MB/s.
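A quick sanity check of those figures (nothing here beyond arithmetic on
the numbers quoted above):

```python
rows = 910_324_839                      # count(*) result
elapsed_s = 503_965.767 / 1000          # Time reported by psql, in seconds
read_mb_per_s = 240                     # sustained sdb read throughput

elapsed_min = elapsed_s / 60                      # ~8.4 minutes, as stated
data_read_gb = read_mb_per_s * elapsed_s / 1024   # ~118 GB streamed off disk
rows_per_s = rows / elapsed_s                     # ~1.8M rows/s counted
```

The ~118 GB read during the count is consistent with scanning most of the
148 GB total relation size (which includes the 48 GB primary key index).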


I know this is a lot of data and as I said previously, happy to let 
anyone look at it. However, we clearly have something deeper to look into.


Thanks in advance,

JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
* Unless otherwise stated, opinions are my own.   *




Re: [HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item

2017-07-23 Thread Tom Lane
I wrote:
> Dima Pavlov  writes:
>> That's my first patch so I will be grateful for constructive criticism.

> I think it would be better to do this in the planner, see specifically
> eval_const_expressions.

BTW, currently eval_const_expressions doesn't know anything about
ScalarArrayOpExpr, but there's a patch pending to improve that:

https://commitfest.postgresql.org/14/1136/

You should probably build your revised patch as a follow-on to that
one, else we're going to have merge conflicts.

regards, tom lane




Re: [HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item

2017-07-23 Thread Tom Lane
Dima Pavlov  writes:
> The problem was discussed on stackoverflow:
> https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

Usually, we're not very happy with submissions that reference external
pages for justification; we like to have all the relevant material in our
mail archives.

> That's my first patch so I will be grateful for constructive criticism.

I think it would be better to do this in the planner, see specifically
eval_const_expressions.  That would allow the transformation to succeed
in more cases, like where the array is coming from a parameter rather
than an ARRAY[] construct.  That is, you should be able to do this not
only for ARRAY[x] but for any single-element constant array.
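As a toy model of that suggestion (illustrative Python over a tuple-based
expression tree; the real transformation would live in
eval_const_expressions operating on ScalarArrayOpExpr nodes, whose API is
not reproduced here):

```python
def simplify_scalar_array_op(op, lhs, const_array):
    """Fold `lhs op ANY(const_array)` when the array is a known constant
    with one element (or none).  This mirrors the idea that ARRAY[x] --
    or any single-element constant array, e.g. one arriving via a
    parameter -- can become a plain indexable comparison."""
    if len(const_array) == 0:
        # `op ANY('{}')` can never match any row
        return ('const', False)
    if len(const_array) == 1:
        # single element: rewrite to an ordinary operator expression,
        # which the planner can match against a btree index
        return ('opexpr', op, lhs, const_array[0])
    # multiple elements: keep the ScalarArrayOpExpr form
    return ('scalar_array_op', op, lhs, const_array)
```

Doing this at plan time, as suggested, means the rewrite fires whenever the
array is constant-foldable, not just for a literal ARRAY[] constructor.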

regards, tom lane




Re: [HACKERS] Testlib.pm vs msys

2017-07-23 Thread Tom Lane
Andrew Dunstan  writes:
> It turns out I was wrong about the problem jacana has been having with
> the pg_ctl tests hanging. The problem was not the use of select as a
> timeout mechanism, although I think the change to using
> Time::HiRes::usleep() is correct and shouldn't be reverted.

> The problem is command_like's use of redirection to strings. Why this
> should be a problem for this particular use is a matter of speculation.
> I suspect it's to do with the fact that in this instance pg_ctl is
> leaving behind some child processes (i.e. postmaster and children) after
> it exits, and so on this particular path IPC::Run isn't detecting the
> exit properly. The workaround I have found to work is to redirect
> command_like's output instead to a couple of files and then slurp in
> those files and delete them. A bit hacky, I know, so I'm open to other
> suggestions.

Yeah, I'd been eyeing that behavior of IPC::Run a month or so back,
though from the opposite direction.  If you are reading either stdout
or stderr of the executed command into Perl, then it detects command
completion by waiting till it gets EOF on those stream(s).  If you
are reading neither, then it goes into this wonky backoff behavior
where it sleeps a bit and then checks waitpid(WNOHANG), with the
value of "a bit" continually increasing until it reaches a fairly
large value, half a second or a second (I forget).  So you have
potentially some sizable fraction of a second that's just wasted after
command termination.  I'd been able to make a small but noticeable
improvement in the runtime of some of our TAP test suites by forcing
the first behavior, ie reading stdout even if we were going to throw
it away.
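Roughly, the backoff loop Tom describes behaves like this sketch (the delay
constants are illustrative, not IPC::Run's actual values):

```python
def wasted_after_exit(exit_at_s, start_s=0.001, cap_s=0.5):
    """Model a waitpid(WNOHANG) polling loop whose sleep interval
    doubles on each iteration up to cap_s.  Returns how long past the
    child's exit the loop keeps sleeping before the next poll notices
    the exit -- up to nearly cap_s in the worst case, which is the
    wasted fraction of a second described above."""
    t, d = 0.0, start_s
    while True:
        t += d                      # sleep, then poll
        if t >= exit_at_s:          # the poll after exit sees it gone
            return t - exit_at_s
        d = min(d * 2, cap_s)
```

Reading stdout/stderr instead gives EOF-driven completion detection, which
is why forcing that path shaved noticeable time off the TAP suites.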

So I'm not really that excited about going in the other direction ;-).
It shouldn't matter much time-wise for short-lived commands, but it's
disturbing if the EOF technique fails entirely for some cases.

I looked at jacana's two recent pg_ctlCheck failures, and they both
seem to have failed on this:

command_like([ 'pg_ctl', 'start', '-D', "$tempdir/data",
'-l', "$TestLib::log_path/001_start_stop_server.log" ],
qr/done.*server started/s, 'pg_ctl start');

That is redirecting the postmaster's stdout/stderr into a file,
for sure, so the child processes shouldn't impact EOF detection AFAICS.
It's also hard to explain this way why it only fails some of the time.

I think we need to look at what the recent changes were in this area
and try to form a better theory of why it's started to fail here.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Improve performance for index search ANY(ARRAY[]) condition with single item

2017-07-23 Thread Dima Pavlov
Hello,

The problems I tried to solve here:
1. Improve performance of index searches for an ANY(ARRAY[...]) condition
with a single item.
2. I have seen lots of user code like:
   if len(array) == 1: sql += '{}'.format(array[0])
   else: sql += 'ANY(ARRAY[{}])'.format(array)
With this fixed there would be fewer lines of code, and it would be clearer.
3. It is confusing that "IN" works well with a single item while
"ANY(ARRAY[])" doesn't, without any real reason.
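The client-side pattern in point 2 looks roughly like this (a hypothetical Python sketch with a made-up helper name; real code should use bound parameters rather than string interpolation):

```python
def build_filter(column, values):
    """Build a WHERE fragment for matching `values`, with the
    special-casing users currently write to dodge the slow
    single-item ANY(ARRAY[...]) plan.  Illustration only: never
    interpolate untrusted values into SQL like this."""
    if len(values) == 1:
        # Planner handles plain equality well, so special-case it.
        return "{} = {}".format(column, values[0])
    # Multi-item case: ANY over an array literal.
    return "{} = ANY(ARRAY[{}])".format(column, ", ".join(str(v) for v in values))

print(build_filter("a", [50]))        # a = 50
print(build_filter("a", [50, 60]))    # a = ANY(ARRAY[50, 60])
```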


The problem was discussed on stackoverflow:
https://stackoverflow.com/questions/45061966/index-usage-with-single-item-anyarray

That's my first patch, so I will be grateful for constructive criticism.
---

CREATE TABLE public.t (id serial, a integer, b integer);

INSERT INTO t(a, b)
SELECT round(random()*1000), round(random()*1000)
FROM generate_series(1, 100);

CREATE INDEX "i_1" ON public.t USING btree (a, b);
CREATE INDEX "i_2" ON public.t USING btree (b);

---

If "a = 50" in the first query, everything is ok, appropriate index "i_1"
is used:

SELECT * FROM t WHERE a = 50 ORDER BY b LIMIT 1

"Limit  (cost=0.42..4.03 rows=1 width=12) (actual time=0.085..0.085 rows=1
loops=1)"
"  Buffers: shared hit=1 read=3"
"  ->  Index Scan using i_1 on t  (cost=0.42..4683.12 rows=1300 width=12)
(actual time=0.084..0.084 rows=1 loops=1)"
"Index Cond: (a = 50)"
"Buffers: shared hit=1 read=3"
"Planning time: 0.637 ms"
"Execution time: 0.114 ms"

---

With "a IN (50)" result is the same:

SELECT * FROM t WHERE a IN (50) ORDER BY b LIMIT 1

"Limit  (cost=0.42..4.03 rows=1 width=12) (actual time=0.058..0.058 rows=1
loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Scan using i_1 on t  (cost=0.42..4683.12 rows=1300 width=12)
(actual time=0.056..0.056 rows=1 loops=1)"
"Index Cond: (a = 50)"
"Buffers: shared hit=4"
"Planning time: 0.287 ms"
"Execution time: 0.105 ms"

---

The problem appears when I try to use "a = ANY(ARRAY[50])". The wrong index
"i_2" is used instead of "i_1", and execution time becomes about 25x longer:

SELECT * FROM t WHERE a = ANY(ARRAY[50]) ORDER BY b LIMIT 1

"Limit  (cost=0.42..38.00 rows=1 width=12) (actual time=2.591..2.591 rows=1
loops=1)"
"  Buffers: shared hit=491 read=4"
"  ->  Index Scan using i_2 on t  (cost=0.42..48853.65 rows=1300 width=12)
(actual time=2.588..2.588 rows=1 loops=1)"
"Filter: (a = ANY ('{50}'::integer[]))"
"Rows Removed by Filter: 520"
"Buffers: shared hit=491 read=4"
"Planning time: 0.251 ms"
"Execution time: 2.627 ms"


improve-single-item-array.patch
Description: Binary data



[HACKERS] [GSOC][weekly report 7] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-23 Thread Mengxing Liu
In the last week, I focused on:
1) Creating an independent skip list data structure and related interfaces.
For now it has only two levels, so that I didn't have to modify too much
existing code, but it can easily be extended to any number of levels if
necessary. Unfortunately, its performance is not good: in some cases it is
only half that of the original version. This reminded me that even though
conflict tracking doesn't consume much CPU time, it is on the critical path
and is wrapped in a lock acquire/release pair. Slower conflict tracking
means more lock contention, which makes performance drop quickly.
2) Using some tricks to improve its performance.
For example, I found that if the conflict list has fewer than 10 entries,
the original linked list is faster than the skip list. So I used them in a
hybrid way: if there are more than 10 conflicts in total, use the skip
list; otherwise use the linked list.
Now the performance is approximately equal to the original version across
different benchmarks, but I haven't found a case in which the new version
is much faster.
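The hybrid idea can be sketched in miniature (a hedged Python illustration with an arbitrary threshold and a sorted list standing in for the skip list; the real patch works on PostgreSQL's in-memory conflict lists in C):

```python
import bisect

THRESHOLD = 10  # below this, a plain list wins; value is illustrative

class HybridConflictSet:
    """Keep small sets as an unsorted list (cheap append and scan);
    switch to a sorted list with binary search once the set grows.
    A stand-in for the linked-list/skip-list hybrid described above."""
    def __init__(self):
        self.items = []
        self.sorted = False

    def add(self, x):
        if not self.sorted and len(self.items) >= THRESHOLD:
            self.items.sort()              # one-time promotion
            self.sorted = True
        if self.sorted:
            bisect.insort(self.items, x)   # O(log n) search + ordered insert
        else:
            self.items.append(x)           # O(1) while the set is small

    def __contains__(self, x):
        if self.sorted:
            i = bisect.bisect_left(self.items, x)
            return i < len(self.items) and self.items[i] == x
        return x in self.items             # linear scan, fine below THRESHOLD
```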


The patch is attached. 


So far, I have tried:
1) using a hash table for conflict tracking;
2) reducing the global lock contention;
3) using a skip list for conflict tracking.
None of them improved the performance, so I'm a little confused now about
what to do next.
Could you please give me any suggestions?


--

Sincerely


Mengxing Liu

skip-list-for-conflict-tracking.patch
Description: Binary data



[HACKERS] Testlib.pm vs msys

2017-07-23 Thread Andrew Dunstan

It turns out I was wrong about the problem jacana has been having with
the pg_ctl tests hanging. The problem was not the use of select as a
timeout mechanism, although I think the change to using
Time::HiRes::usleep() is correct and shouldn't be reverted.

The problem is command_like's use of redirection to strings. Why this
should be a problem for this particular use is a matter of speculation.
I suspect it's to do with the fact that in this instance pg_ctl is
leaving behind some child processes (i.e. postmaster and children) after
it exits, and so on this particular path IPC::Run isn't detecting the
exit properly. The workaround I have found to work is to redirect
command_like's output instead to a couple of files and then slurp in
those files and delete them. A bit hacky, I know, so I'm open to other
suggestions.


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


