Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-06-26 Thread Amit Langote
On 2018/06/26 18:02, Ashutosh Bapat wrote:
> On Tue, Jun 26, 2018 at 2:27 PM, Amit Langote
>  wrote:
>> Hi Ashutosh,
>>
>> On 2018/05/14 20:14, Ashutosh Bapat wrote:
>>> 0001-Hash-partition-bound-equality-refactoring.patch
>>> 0002-Targetlist-of-a-child-join-is-produced-by-translatin.patch
>>> 0003-Partition-wise-join-for-1-1-1-0-0-1-partition-matchi.patch
>>> 0004-Add-a-debug-message-to-notify-whether-partition-wise.patch
>>> 0005-Tests-for-0-1-1-1-and-1-0-partition-matching.patch
>>> 0006-Extra-extensive-tests-for-advanced-partition-matchin.patch
>>
>> I noticed after *cleanly* applying 0001-0004 to today's HEAD that while
>> 0005's tests all pass, there are many failures in 0006's tests.  Maybe you
>> need to adjust something in one of the patches or adjust the test outputs.
> 
> If the failures are because of plan changes, that's expected. If they
> are because of crashes or changed output, those need to be fixed. I
> have kept that patch to notice any crashes or output changes, in which
> case I pull that test into the 0005 test set after fixing the code. Once
> we are near commit, I will remove that patch from the patchset.

Ah, okay.  I thought of reporting this because I felt the errors may have
to do with changes to the related code in HEAD between May 14, when you
last posted the patches, and today, which you may need to account for in
your patches.  For instance, there are many diffs like this:

***************
*** 90,132 ****
  -- left outer join, with whole-row reference
  EXPLAIN (COSTS OFF)
  SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
!                  QUERY PLAN
! ----------------------------------------------
   Sort
     Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
!    ->  Result
!          ->  Append
!                ->  Hash Right Join
!                      Hash Cond: (t2.b = t1.a)
!                      ->  Seq Scan on prt2_p0 t2
!                      ->  Hash
!                            ->  Seq Scan on prt1_p0 t1


--- 90,131 ----
  -- left outer join, with whole-row reference
  EXPLAIN (COSTS OFF)
  SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b =
0 ORDER BY t1.a, t1.b, t1.c, t2.a, t2.b, t2.c;
!               QUERY PLAN
! --------------------------------------
   Sort
     Sort Key: t1.a, t1.c, t2.a, t2.b, t2.c
!    ->  Append
!          ->  Hash Right Join
!                Hash Cond: (t2.b = t1.a)
!                ->  Seq Scan on prt2_p0 t2
!                ->  Hash
!                      ->  Seq Scan on prt1_p0 t1
!                            Filter: (b = 0)

Looks like the Result node on top of Append is no longer there after
applying your patch.

Thanks,
Amit




Re: Portability concerns over pq_sendbyte?

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-14 13:25:30 -0700, Andres Freund wrote:
> On 2018-06-14 16:17:28 -0400, Tom Lane wrote:
> > Alvaro Herrera  writes:
> > > How about not renaming the functions, but just change argument types?
> 
> Yea, I'm in favor of this. I don't think the 'u' in there would benefit
> us, and the cast from signed to unsigned is well defined, so it's safe
> to call the functions with signed input.

Nobody argued against, thus I've pushed a patch doing so.

Looking at surrounding code I found a few more oddities, but of older
vintage:
- pq_sendfloat4 uses a uint32 in the union, but pq_sendfloat8 uses an int64.
- same with pq_getmsgfloat[48]
- pq_getmsgint64 returns an int64; it should probably also be uint64

Given they're practically harmless, I'm inclined to fix them only in
master?
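
For reference, the float4/float8 asymmetry, paraphrased (not the exact
pqformat.c code):

void
pq_sendfloat4(StringInfo buf, float4 f)
{
    union
    {
        float4  f;
        uint32  i;      /* unsigned here... */
    } swap;

    swap.f = f;
    pq_sendint32(buf, swap.i);
}

void
pq_sendfloat8(StringInfo buf, float8 f)
{
    union
    {
        float8  f;
        int64   i;      /* ...but signed here; uint64 would be consistent */
    } swap;

    swap.f = f;
    pq_sendint64(buf, swap.i);
}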

Greetings,

Andres Freund



Re: postgresql_fdw doesn't handle defaults correctly

2018-06-26 Thread Amit Langote
On 2018/06/27 15:33, Pavel Stehule wrote:
>>> Unfortunately, when I use identity column
>>>
>>> create table foo(a int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, b
>> date
>>> default current_date, c int);
>>>
>>> then import doesn't fail, but still it doesn't work
>>
>> It seems that, unlike DEFAULT, the information about IDENTITY is not
>> stored in pg_attrdef catalog.  It's rather stored in
>> pg_attribute.attidentity.  Looking at postgres_fdw's IMPORT FOREIGN SCHEMA
>> implementation, while it fetches the DEFAULT expression from pg_attrdef
>> when asked, it seems that it does not fetch the value of attidentity.
>>
>> Not sure if we should consider that a bug or simply an unsupported case
>> like a DEFAULT referring to a sequence.  In any case, if it's an
>> unsupported case, we should perhaps error out in a more user-friendly
>> manner.
>>
> 
> I don't understand why it is necessary to replace missing values with NULLs.
> 
> I didn't expect that insert into foo(c) values(10)
> 
> would be translated to
> 
> insert into foo(a,b,c) values(NULL, NULL, 10)

That's what we do if there is no default value to fill in when the INSERT
command doesn't specify one.  In this case, even though the table on the
remote side may be defined with a column as IDENTITY, the IMPORT FOREIGN
SCHEMA command does not fetch that information and creates the foreign
table locally without any default value.  So, that's a missing piece of
functionality in postgres_fdw's implementation of the command.

As a workaround for that missing functionality, one can always create the
foreign table by hand and specify DEFAULT and IDENTITY explicitly as
necessary.

Thanks,
Amit




Re: Constraint documentation

2018-06-26 Thread Peter Eisentraut
On 6/26/18 09:49, Lætitia Avrot wrote:
> +   
> +
> + Check constraints are not designed to enforce business rules across 
> tables.
> + Avoid using check constraints with a function accessing other tables and
> + use  instead. Although PostgreSQL won't 
> prevent you
> + from doing so, beware that dumps generated by 
> pg_dump
> + or pg_dumpall may be hard
> + to restore because of such checks, as the underlying dependencies are 
> not
> + taken into account.
> +
> +   

In a way, I think this is attacking the wrong problem.  It is saying
that you should use triggers instead of check constraints in certain
circumstances.  But triggers are also used as an implementation detail
of constraints.  While it currently doesn't exist, a deferrable check
constraint would probably be implemented as a trigger.  It's not the
triggerness that fixes this problem.  The problem is more generally that
if a function uses a table, then pg_dump can't know about the ordering.
It happens to work for triggers because triggers are dumped after all
tables, as a performance optimization, and we could very well dump check
constraints differently as well.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-06-26 Thread Andrey V. Lepikhov



On 23.06.2018 00:43, Peter Geoghegan wrote:

On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
 wrote:

According to your feedback, I have developed a second version of the patch.
In this version:
1. The high-level functions index_beginscan() and index_rescan() are not
used. Tree descent is made by _bt_search(); _bt_binsrch() is used for
positioning on the page.
2. A TID list was introduced in the amtargetdelete() interface. Now only
one tree descent is needed to delete all TIDs from the list with an equal
scan key value (the logical duplicates deletion problem).
3. Only one WAL record for index tuple deletion per leaf page per
amtargetdelete() call.


Cool.

What is this "race" code about?


+   buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 
ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
+   LockBuffer(buffer, BUFFER_LOCK_SHARE);
+
+   page = (Page) BufferGetPage(buffer);
+   offnum = ItemPointerGetOffsetNumber(tid);
+   lp = PageGetItemId(page, offnum);
+
+   /*
+* VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
+*/
+   if (!ItemIdIsUsed(lp))
+   return NULL;


It looks wrong -- why should another process have set the item as
unused? And if we assume that that's possible at all, what's to stop a
third process from actually reusing the item pointer before we arrive
(at get_tuple_by_tid()), leaving us to find a tuple that is totally
unrelated to the original tuple to be deleted?

(Also, you're not releasing the buffer lock before you return.)


4. VACUUM can sort the TID list in advance for a quicker search of duplicates.


This version of the patch prevents my own "force unique keys" patch
from working, since you're not using my proposed new
_bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
them a heap TID). It is essential that your patch be able to quickly
reach any tuple that it needs to kill. Otherwise, the worst case
performance is totally unacceptable; it will never be okay to go
through 10%+ of the index to kill a single tuple hundreds or even
thousands of times per VACUUM. It seems to me that doing this
tid_list_search() binary search is pointless -- you should instead be
relying on logical duplicates using their heap TID as a tie-breaker.
Rather than doing a binary search within tid_list_search(), you should
instead match the presorted heap TIDs at the leaf level against the
sorted in-memory TID list. You know, a bit like a merge join.
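
For illustration, a minimal sketch of that merge-style matching
(hypothetical names, not from either patch):

static void
merge_match_tids(ItemPointer leaf_tids, int nleaf,
                 ItemPointer dead_tids, int ndead,
                 void (*kill_item) (int leaf_index))
{
    int     i = 0;
    int     j = 0;

    /* Both inputs are sorted by heap TID: one linear pass finds all matches. */
    while (i < nleaf && j < ndead)
    {
        int     cmp = ItemPointerCompare(&leaf_tids[i], &dead_tids[j]);

        if (cmp == 0)
        {
            kill_item(i);       /* mark this leaf item for deletion */
            i++;
            j++;
        }
        else if (cmp < 0)
            i++;                /* leaf TID not in the kill list */
        else
            j++;                /* dead TID not on this page */
    }
}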


I agree with you. The binary search was developed while awaiting your patch.


I suggest that you go even further than this: I think that you should
just start distributing my patch as part of your patch series. You can
change my code if you need to. 


Good. I am ready to start distributing your patch. At the beginning of 
this work I planned to make a patch for physical TID ordering in the 
btree index. Your patch will make it much easier.


I also suggest using "git format-patch"

with simple, short commit messages to produce patches. This makes it a
lot easier to track the version of the patch, changes over time, etc.


Ok


I understand why you'd hesitate to take ownership of my code (it has
big problems!), but the reality is that all the problems that my patch
has are also problems for your patch. One patch cannot get committed
without the other, so they are already the same project. As a bonus,
my patch will probably improve the best case performance for your
patch, since multi-deletions will now have much better locality of
access.


I still believe that the patch for physical TID ordering in btree:
1) has its own value, not only for target deletion,
2) will require only a few local changes in my code,
and these patches can be developed independently.

I have prepared a third version of the patches. Summary:
1. Mask DEAD tuples on a page during consistency checking (see the
comments for the mask_dead_tuples() function).

2. Still not using physical TID ordering.
3. The index cleanup after each quick_vacuum_index() call was excluded.

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company
From b3f8ddf54c5bd68b96db8cda4f5f382b86a4568e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 27 Jun 2018 09:57:50 +0500
Subject: [PATCH 1/2] Retail IndexTuple Deletion patch v.3

---
 contrib/bloom/blutils.c  |   1 +
 src/backend/access/brin/brin.c   |   1 +
 src/backend/access/gin/ginutil.c |   1 +
 src/backend/access/gist/gist.c   |   1 +
 src/backend/access/hash/hash.c   |   1 +
 src/backend/access/index/indexam.c   |  15 
 src/backend/access/nbtree/nbtree.c   | 170 +++
 src/backend/access/spgist/spgutils.c |   1 +
 src/include/access/amapi.h   |   6 ++
 src/include/access/genam.h   |  19 
 src/include/access/nbtree.h  |   4 +
 11 files changed, 220 insertions(+)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 6b2b9e3..96f1d47 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -126,6 +

Re: postgresql_fdw doesn't handle defaults correctly

2018-06-26 Thread Pavel Stehule
2018-06-27 8:28 GMT+02:00 Amit Langote :

> On 2018/06/27 2:47, Pavel Stehule wrote:
> > 2018-06-25 4:30 GMT+02:00 Amit Langote :
> >> It seems you missed using OPTIONS (import_default 'true') here.
> >>
> >> create schema foo;
> >> create table foo.foo (a serial primary key, b date default current_date
> >> not null, c int);
> >>
> >> import foreign schema foo from server loopback into public options
> >> (import_default 'true');
> >>
> >> insert into public.foo (c) values (1);
> >> select * from public.foo;
> >>  a |     b      | c
> >> ---+------------+---
> >>  1 | 2018-06-25 | 1
> >> (1 row)
> >>
> >> insert into foo.foo (c) values (2);
> >
> > This insert doesn't use the foreign table, so it is a different case.
>
> The first one (insert into public.foo ...) does, but...
>
> > select * from public.foo;
> >>  a |     b      | c
> >> ---+------------+---
> >>  1 | 2018-06-25 | 1
> >>  2 | 2018-06-25 | 2
> >> (2 rows)
> >>
> > It looks more different than I expected.
> >
> > create database t1;
> > \c t1
> > create table foo(a serial primary key, b date default current_date, c
> int);
> > insert into foo(c) values(10),(20);
> > select * from foo;
> >
> > t1=# select * from foo;
> > +---+------------+----+
> > | a |     b      | c  |
> > +---+------------+----+
> > | 1 | 2018-06-26 | 10 |
> > | 2 | 2018-06-26 | 20 |
> > +---+------------+----+
> > (2 rows)
> >
> > \c postgres
> > create server t1 foreign data wrapper postgres_fdw options (dbname 't1');
> > create user mapping for pavel server t1;
> >
> > postgres=# import foreign schema public from server t1 into public
> options
> > (import_default 'true');
> > ERROR:  relation "public.foo_a_seq" does not exist
> > CONTEXT:  importing foreign table "foo"
> >
> > So it fails as probably expected - we don't support foreign sequences, so
> > we cannot import a schema containing a table with a sequence with the
> > option import_default = 'true';
> >
> > Looks like an unsupported case - it is not possible to insert into a
> > table with a serial column;
>
> Hmm, yes.  In the example in my previous reply, I used the same database,
> so foo_a_seq would exist when importing foo.  I now tried with the foreign
> server pointing to a different database, and can see the problem.
>
> So, that's indeed an unsupported case.
>
> > Unfortunately, when I use identity column
> >
> > create table foo(a int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, b
> date
> > default current_date, c int);
> >
> > then import doesn't fail, but still it doesn't work
>
> It seems that, unlike DEFAULT, the information about IDENTITY is not
> stored in pg_attrdef catalog.  It's rather stored in
> pg_attribute.attidentity.  Looking at postgres_fdw's IMPORT FOREIGN SCHEMA
> implementation, while it fetches the DEFAULT expression from pg_attrdef
> when asked, it seems that it does not fetch the value of attidentity.
>
> Not sure if we should consider that a bug or simply an unsupported case
> like a DEFAULT referring to a sequence.  In any case, if it's an
> unsupported case, we should perhaps error out in a more user-friendly
> manner.
>

I don't understand why it is necessary to replace missing values with NULLs.

I didn't expect that insert into foo(c) values(10)

would be translated to

insert into foo(a,b,c) values(NULL, NULL, 10)

Why? In a situation where the target is an SQL database, it is
counterproductive.

Regards

Pavel


> Thanks,
> Amit
>
>


libpq example doesn't work

2018-06-26 Thread Ideriha, Takeshi
Hi, 

When I tried to use libpq, I found that the libpq example[1] does not work.
That's because SELECT pg_catalog.set_config() never returns
PGRES_COMMAND_OK, as the example expects, but actually returns
PGRES_TUPLES_OK.
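
For illustration, the corrected check looks roughly like this (a sketch in
the example's style; conn and exit_nicely() are as in the existing example):

   res = PQexec(conn, "SELECT pg_catalog.set_config('search_path', '', false)");
   if (PQresultStatus(res) != PGRES_TUPLES_OK)
   {
       fprintf(stderr, "SET failed: %s", PQerrorMessage(conn));
       PQclear(res);
       exit_nicely(conn);
   }
   PQclear(res);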

Patch attached. 
I changed both src/test/example and document.

[1] https://www.postgresql.org/docs/devel/static/libpq-example.html 

(By the way, I'm wondering why these examples are placed under src/test.
 They are just examples and not regression tests.)

Best regards,
Takeshi Ideriha



0001-libpq-example-fix.patch
Description: 0001-libpq-example-fix.patch


Re: postgresql_fdw doesn't handle defaults correctly

2018-06-26 Thread Amit Langote
On 2018/06/27 2:47, Pavel Stehule wrote:
> 2018-06-25 4:30 GMT+02:00 Amit Langote :
>> It seems you missed using OPTIONS (import_default 'true') here.
>>
>> create schema foo;
>> create table foo.foo (a serial primary key, b date default current_date
>> not null, c int);
>>
>> import foreign schema foo from server loopback into public options
>> (import_default 'true');
>>
>> insert into public.foo (c) values (1);
>> select * from public.foo;
>>  a |     b      | c
>> ---+------------+---
>>  1 | 2018-06-25 | 1
>> (1 row)
>>
>> insert into foo.foo (c) values (2);
> 
> This insert doesn't use the foreign table, so it is a different case.

The first one (insert into public.foo ...) does, but...

> select * from public.foo;
>>  a |     b      | c
>> ---+------------+---
>>  1 | 2018-06-25 | 1
>>  2 | 2018-06-25 | 2
>> (2 rows)
>>
> It looks more different than I expected.
> 
> create database t1;
> \c t1
> create table foo(a serial primary key, b date default current_date, c int);
> insert into foo(c) values(10),(20);
> select * from foo;
> 
> t1=# select * from foo;
> +---+------------+----+
> | a |     b      | c  |
> +---+------------+----+
> | 1 | 2018-06-26 | 10 |
> | 2 | 2018-06-26 | 20 |
> +---+------------+----+
> (2 rows)
> 
> \c postgres
> create server t1 foreign data wrapper postgres_fdw options (dbname 't1');
> create user mapping for pavel server t1;
> 
> postgres=# import foreign schema public from server t1 into public options
> (import_default 'true');
> ERROR:  relation "public.foo_a_seq" does not exist
> CONTEXT:  importing foreign table "foo"
> 
> So it fails as probably expected - we don't support foreign sequences, so
> we cannot import a schema containing a table with a sequence with the
> option import_default = 'true';
> 
> Looks like an unsupported case - it is not possible to insert into a
> table with a serial column;

Hmm, yes.  In the example in my previous reply, I used the same database,
so foo_a_seq would exist when importing foo.  I now tried with the foreign
server pointing to a different database, and can see the problem.

So, that's indeed an unsupported case.

> Unfortunately, when I use identity column
> 
> create table foo(a int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, b date
> default current_date, c int);
> 
> then import doesn't fail, but still it doesn't work

It seems that, unlike DEFAULT, the information about IDENTITY is not
stored in pg_attrdef catalog.  It's rather stored in
pg_attribute.attidentity.  Looking at postgres_fdw's IMPORT FOREIGN SCHEMA
implementation, while it fetches the DEFAULT expression from pg_attrdef
when asked, it seems that it does not fetch the value of attidentity.

Not sure if we should consider that a bug or simply an unsupported case
like a DEFAULT referring to a sequence.  In any case, if it's an
unsupported case, we should perhaps error out in a more user-friendly manner.

Thanks,
Amit




Re: ssl_library parameter

2018-06-26 Thread Peter Eisentraut
On 6/26/18 17:48, Tom Lane wrote:
> (1) I'm not really clear why we need this.  GUC variables aren't free.
> 
> (2) Are there security issues with exposing this info to everybody?

This functionality was requested in the threads about GnuTLS and other
SSL implementations so that users/admins can determine which SSL
settings are applicable.

I'm not sure about the security impact.  We do expose all the other
ssl_* settings to ordinary users, so if users want to see whether the
server is misconfigured or something like that, they can already do
that.  I think in the context of an SSL connection, the server is not
supposed to be the adversary of the client, so if the server can provide
more information about what it's doing to protect the client's
information, so much the better.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Typo in llvm_function_reference

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-27 10:46:35 +0530, Rushabh Lathia wrote:
> There are multiple return statements in llvm_function_reference,
> and the second one definitely looks like a typo. Here is a patch to
> fix it.

> diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
> index daae964..5d0cdab 100644
> --- a/src/backend/jit/llvm/llvmjit.c
> +++ b/src/backend/jit/llvm/llvmjit.c
> @@ -394,7 +394,6 @@ llvm_function_reference(LLVMJitContext *context,
>   LLVMSetGlobalConstant(v_fn, true);
>  
>   return LLVMBuildLoad(builder, v_fn, "");
> - return v_fn;
>   }
>  
>   /* check if function already has been added */

Thanks! Pushed.

Greetings,

Andres Freund



commitfest app moving patch error

2018-06-26 Thread Peter Eisentraut
I tried to move a patch from 2018-07 to 2018-09 and got this error:

"Cannot move patch to the same commitfest, and multiple future
commitfests exist!"

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GnuTLS support

2018-06-26 Thread Peter Eisentraut
On 3/8/18 20:13, Peter Eisentraut wrote:
> In the thread about Secure Transport we agreed to move the consideration
> of new SSL libraries to PG12.
> 
> Here is my current patch, after all the refactorings.
> 
> The status is that it works fine and could be used.
> 
> There are two failures in the SSL tests that I cannot explain.  The
> tests are for some rather obscure configurations, so the changed
> behaviors are not obviously wrong, perhaps legitimate implementation
> differences.  But someone wrote those tests with a purpose (probably),
> so we should have some kind of explanation for the regressions.
> 
> Other non-critical, nice-to-have issues:
> 
> - Do something about sslinfo, perhaps fold into pg_stat_ssl view.
> - Do something about pgcrypto.
> - Add tests for load_dh_file().
> - Implement channel binding tls-server-end-point.

Also, ...

- Add ssl_passphrase_command support.

I'm moving this patch forward to CF 2018-09, since it's not going to be
ready for -07, and we're still whacking around some channel binding
details, which would potentially interfere with this patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Server crashed with TRAP: FailedAssertion("!(parallel_workers > 0)" when partitionwise_aggregate true.

2018-06-26 Thread Jeevan Chalke
On Fri, Jun 22, 2018 at 6:58 PM, Robert Haas  wrote:

> On Fri, Jun 22, 2018 at 2:26 AM, Rajkumar Raghuwanshi
>  wrote:
> > I have applied patch and checked reported issue. Patch applied cleanly
> and
> > issues not reproducible any more.
>
> Committed, with a few changes:
>

Thanks Robert.


>
> - Renamed found_partially_grouped_child to partial_grouping_valid.
> The old name seemed to me to be a poor choice, because it sounds from
> the name like it gets set to true whenever we've found at least one
> partially grouped child, whereas really it gets set to false whenever
> we've failed to find at least one partially grouped child.  The new
> name is also more like the names in add_paths_to_append_rel.
>
> - Modified the wording of the comment.
>
> - Omitted the new assertion in add_paths_to_append_rel.  I doubt
> whether that's correct.  I don't see any obvious reason why
> live_childrels can't be NIL there, and further down I see this:
>
> /*
>  * If we found unparameterized paths for all children, build
> an unordered,
>  * unparameterized Append path for the rel.  (Note: this is
> correct even
>  * if we have zero or one live subpath due to constraint
> exclusion.)
>  */
>
> If it's not possible to have no live_childrels, then that comment is
> highly suspect.
>
>
OK, do these comments also hold true for partial_subpaths?

If we have NIL live_childrels, then the Assert() inside the "if
(partial_subpaths_valid)" block will be hit, which I think is wrong.

We either need to remove these Asserts altogether, or we should enter this
block only when we have non-NIL partial_subpaths. Also, if we don't have
any partial_subpaths, then the variable partial_subpaths_valid should be
false. Currently it is set to true by default, due to which we end up
inside that if block and then hit an Assert.
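
For concreteness, one possible shape of that fix in
add_paths_to_append_rel() (a fragment only, as a sketch, not tested):

    /*
     * Hypothetical guard: with no live children there is nothing to
     * Append, so don't pretend we have valid partial paths and fall
     * into the Assert inside the "if (partial_subpaths_valid)" block.
     */
    if (live_childrels == NIL)
        partial_subpaths_valid = false;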


> Also, even if this assertion *is* correct, I think it needs a better
> comment explaining why it's correct, because there isn't anything
> obvious in set_append_rel_pathlist that keeps IS_DUMMY_REL() from
> being true for every child.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: partitioning - changing a slot's descriptor is expensive

2018-06-26 Thread Amit Langote
On 2018/06/27 14:55, Andres Freund wrote:
> On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
>> There is however similar code that runs in non-error paths, such as in
>> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
>> the root parent have differing attribute numbers.  So, one might say that
>> that code's there to handle the special cases which may not arise much in
>> practice, but it may still be a good idea to make improvements if we can
>> do so.
> 
> Yea, I was referring to all do_convert_tuple() callers, and some of them
> are more hot-path than the specific one above. E.g. the
> ConvertPartitionTupleSlot() call in CopyFrom().

Just in case you haven't noticed, ConvertPartitionTupleSlot(), even if
it's called unconditionally from CopyFrom etc., calls do_convert_tuple
only if necessary.  Note the return statement at the top of its body.

HeapTuple
ConvertPartitionTupleSlot(TupleConversionMap *map,
  HeapTuple tuple,
  TupleTableSlot *new_slot,
  TupleTableSlot **p_my_slot)
{
if (!map)
return tuple;

map is non-NULL only if a partition has different attribute numbers than
the root parent.

Thanks,
Amit




Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Ashutosh Bapat
On Wed, Jun 27, 2018 at 11:24 AM, Amit Kapila  wrote:
>>
>> I don't understand what you mean by consistent. Do you mean to say
>> that the current usage "Store the slot into tuple ..." is correct?
>>
>
Oh no, I was talking about replacing it with the below comment, which is
used at other places in the code.
> /*
> * get the heap tuple out of the tuple table slot, making sure we have a
> * writable copy
> */

Probably yes. But I would just correct the usage and grammar.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: unexpected relkind: 73 ERROR with partition table index

2018-06-26 Thread Rajkumar Raghuwanshi
Thanks for the fix and commit. It is working fine now.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Jun 26, 2018 at 8:39 PM, Alvaro Herrera 
wrote:

> On 2018-Jun-27, David Rowley wrote:
>
> > On 27 June 2018 at 00:18, Rajkumar Raghuwanshi
> >  wrote:
> > > postgres=> ALTER INDEX part_idx RENAME TO part_idx_renamed;
> > > ERROR:  unexpected relkind: 73
> >
> > Seems to be caused by the auth failure code path in
> > RangeVarCallbackForAlterRelation().
>
> Ah, yeah, thanks.  No surprise this was missed, since I didn't add
> tests for ALTER INDEX RENAME and Peter concurrently refactored the
> handling code.  I propose we add a few more test lines, as attached.
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: partitioning - changing a slot's descriptor is expensive

2018-06-26 Thread Andres Freund
On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
> Hi Andres,
> 
> On 2018/06/27 14:09, Andres Freund wrote:
> > Hi,
> > 
> > (sorry if I CCed the wrong folks, the code has changed a fair bit)
> > 
> > I noticed that several places in the partitioning code look like:
> > 
> > /*
> >  * If the tuple has been routed, it's been converted to the
> >  * partition's rowtype, which might differ from the root
> >  * table's.  We must convert it back to the root table's
> >  * rowtype so that val_desc shown error message matches the
> >  * input tuple.
> >  */
> > if (resultRelInfo->ri_PartitionRoot)
> > {
> > TableTuple tuple = ExecFetchSlotTuple(slot);
> > TupleConversionMap *map;
> > 
> > rel = resultRelInfo->ri_PartitionRoot;
> > tupdesc = RelationGetDescr(rel);
> > /* a reverse map */
> > map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> >  gettext_noop("could not convert row 
> > type"));
> > if (map != NULL)
> > {
> > tuple = do_convert_tuple(tuple, map);
> > ExecSetSlotDescriptor(slot, tupdesc);
> > ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> > }
> > }
> 
> This particular code block (and a couple of others like it) is executed
> only if a tuple fails the partition constraint check, so maybe not a very
> frequently executed piece of code.
> 
> There is however similar code that runs in non-error paths, such as in
> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
> the root parent have differing attribute numbers.  So, one might say that
> that code's there to handle the special cases which may not arise much in
> practice, but it may still be a good idea to make improvements if we can
> do so.

Yea, I was referring to all do_convert_tuple() callers, and some of them
are more hot-path than the specific one above. E.g. the
ConvertPartitionTupleSlot() call in CopyFrom().


> > Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
> > to reallocate the values/isnull arrays (and potentially do desc pinning
> > etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> > be a virtual slot. Calling heap_deform_tuple(), as done in
> > do_convert_tuple(), might be superfluous work, because the input tuple
> > might already be entirely deformed in the input slot.
> > 
> > I think it'd be good to rewrite the code so there's an input and an
> > output slot that each will keep their slot descriptors set.
> > 
> > Besides performance as the code stands, this'll also be important for
> > pluggable storage (as we'd otherwise get a *lot* of back/forth
> > conversions between tuple formats).
> > 
> > Anybody interesting in writing a patch that redoes this?
> 
> Thanks for the pointers above, I will see if I can produce a patch and
> will also be interested in hearing what others may have to say.

Cool.

Greetings,

Andres Freund



Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Amit Kapila
On Wed, Jun 27, 2018 at 10:09 AM, Ashutosh Bapat
 wrote:
> On Tue, Jun 26, 2018 at 8:43 PM, Amit Kapila  wrote:
>> On Tue, Jun 26, 2018 at 7:02 PM, Ashutosh Bapat
>>  wrote:
>>> On Tue, Jun 26, 2018 at 6:18 PM, Amit Kapila  
>>> wrote:
 On Tue, Jun 26, 2018 at 4:33 PM, Ashutosh Bapat
  wrote:
> Looks like we need similar adjustment in ExecSimpleRelationUpdate() as
> well. Updated the patch.
>

 - /* Store the slot into tuple that we can write. */
 + /* Materialize slot into a tuple that we can inspect. */
   tuple = ExecMaterializeSlot(slot);

 I think it is better to keep the last word of the sentence as "write"
 instead of "inspect" as was in the original sentence.
>>>
>>> A copy-pasto while correcting a typo :)
>>>
 It makes more
 sense as we are materializing the tuple to write it.  Similarly, in
 the other change in the patch can use "write".
>>>
>>> Are you suggesting that we should use "write" in the modified comment
>>> instead of "inspect" in original comment.
>>>
>>
>> Yes.
>>
>> Another variant used in few other similar places is:
>>
>> /*
>> * get the heap tuple out of the tuple table slot, making sure we have a
>> * writable copy
>> */
>>
>>
>>> Ok, I have now corrected grammar as well. Here's updated patch.
>>>
>>
>> Your new comment is also correct.  I like the comment already used in
>> code as that makes the code consistent, what do you think?
>>
>
> I don't understand what you mean by consistent. Do you mean to say
> that the current usage "Store the slot into tuple ..." is correct?
>

Oh no, I was talking about replacing it with the below comment, which is
used at other places in the code.
/*
* get the heap tuple out of the tuple table slot, making sure we have a
* writable copy
*/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: partitioning - changing a slot's descriptor is expensive

2018-06-26 Thread Amit Langote
Hi Andres,

On 2018/06/27 14:09, Andres Freund wrote:
> Hi,
> 
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
> 
> I noticed that several places in the partitioning code look like:
> 
> /*
>  * If the tuple has been routed, it's been converted to the
>  * partition's rowtype, which might differ from the root
>  * table's.  We must convert it back to the root table's
>  * rowtype so that val_desc shown error message matches the
>  * input tuple.
>  */
> if (resultRelInfo->ri_PartitionRoot)
> {
> TableTuple tuple = ExecFetchSlotTuple(slot);
> TupleConversionMap *map;
> 
> rel = resultRelInfo->ri_PartitionRoot;
> tupdesc = RelationGetDescr(rel);
> /* a reverse map */
> map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>  gettext_noop("could not convert row 
> type"));
> if (map != NULL)
> {
> tuple = do_convert_tuple(tuple, map);
> ExecSetSlotDescriptor(slot, tupdesc);
> ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> }
> }

This particular code block (and a couple of others like it) is executed
only if a tuple fails the partition constraint check, so maybe not a very
frequently executed piece of code.

There is however similar code that runs in non-error paths, such as in
ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
the root parent have differing attribute numbers.  So, one might say that
that code's there to handle the special cases which may not arise much in
practice, but it may still be a good idea to make improvements if we can
do so.

> Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
> 
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.
> 
> Besides performance as the code stands, this'll also be important for
> pluggable storage (as we'd otherwise get a *lot* of back/forth
> conversions between tuple formats).
> 
> Anybody interesting in writing a patch that redoes this?

Thanks for the pointers above, I will see if I can produce a patch and
will also be interested in hearing what others may have to say.

Thanks,
Amit




Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-06-26 Thread Thomas Munro
On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
 wrote:
> On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier  wrote:
>> Thomas, trying to understand here...  Why this place for the signal
>> initialization?  Wouldn't InitPostmasterChild() be a more logical place
>> as we'd want to have this logic caught by all other processes?
>
> Yeah, you're right.  Here's one like that.

Amazingly, due to the way the project schedules fell and thanks to the
help of a couple of very supportive FreeBSD committers and reviewers,
the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
release today, beating the Commitfest by several days.
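
For anyone who hasn't tried it, a sketch of how a child process could opt
in (illustrative signal choice and error handling, not the patch's actual
code):

#include <sys/procctl.h>
#include <signal.h>
#include <unistd.h>

static void
request_parent_death_signal(void)
{
    int     signum = SIGUSR1;   /* illustrative choice */

    /* Ask the kernel to deliver signum to us when our parent exits. */
    if (procctl(P_PID, getpid(), PROC_PDEATHSIG_CTL, &signum) != 0)
        perror("procctl(PROC_PDEATHSIG_CTL)");
}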

My question is whether this patch set is enough, or people (Andres?) want
to go further and actually kill the postmaster death pipe completely.
My kqueue patch would almost completely kill it on BSDs as it happens,
but for Linux there are a number of races and complications to plug to
do that IIUC.  I don't immediately see what there is to gain by doing
that work (assuming we reuse WaitEventSet objects in enough places),
and we'll always need to maintain the pipe code for other OSes anyway.
So I'm -0.5 on doing that, even though the technical puzzle is
appealing...

[1] 
https://www.freebsd.org/cgi/man.cgi?query=procctl&apropos=0&sektion=2&manpath=FreeBSD+11.2-RELEASE&arch=default&format=html

-- 
Thomas Munro
http://www.enterprisedb.com



Typo in llvm_function_reference

2018-06-26 Thread Rushabh Lathia
Hi,

There are multiple return statements in llvm_function_reference,
and the second one definitely looks like a typo. Here is a patch to
fix it.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index daae964..5d0cdab 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -394,7 +394,6 @@ llvm_function_reference(LLVMJitContext *context,
 		LLVMSetGlobalConstant(v_fn, true);
 
 		return LLVMBuildLoad(builder, v_fn, "");
-		return v_fn;
 	}
 
 	/* check if function already has been added */


partitioning - changing a slot's descriptor is expensive

2018-06-26 Thread Andres Freund
Hi,

(sorry if I CCed the wrong folks, the code has changed a fair bit)

I noticed that several places in the partitioning code look like:

/*
 * If the tuple has been routed, it's been converted to the
 * partition's rowtype, which might differ from the root
 * table's.  We must convert it back to the root table's
 * rowtype so that val_desc shown error message matches the
 * input tuple.
 */
if (resultRelInfo->ri_PartitionRoot)
{
TableTuple tuple = ExecFetchSlotTuple(slot);
TupleConversionMap *map;

rel = resultRelInfo->ri_PartitionRoot;
tupdesc = RelationGetDescr(rel);
/* a reverse map */
map = convert_tuples_by_name(orig_tupdesc, tupdesc,
 gettext_noop("could not convert row 
type"));
if (map != NULL)
{
tuple = do_convert_tuple(tuple, map);
ExecSetSlotDescriptor(slot, tupdesc);
ExecStoreTuple(tuple, slot, InvalidBuffer, false);
}
}


Unfortunately calling ExecSetSlotDescriptor() is far from cheap, it has
to reallocate the values/isnull arrays (and potentially do desc pinning
etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
be a virtual slot. Calling heap_deform_tuple(), as done in
do_convert_tuple(), might be superfluous work, because the input tuple
might already be entirely deformed in the input slot.

I think it'd be good to rewrite the code so there's an input and an
output slot that each will keep their slot descriptors set.
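
A rough sketch of that shape using the current slot APIs (hypothetical
helper and variable names, not a worked-out design):

/*
 * Rough sketch: at executor init, make one dedicated slot per row type
 * involved, e.g.:
 *
 *     root_slot = MakeSingleTupleTableSlot(RelationGetDescr(rootrel));
 *
 * and then convert between two such slots without ever changing either
 * slot's descriptor:
 */
static TupleTableSlot *
convert_tuple_between_slots(TupleConversionMap *map,
                            TupleTableSlot *in, TupleTableSlot *out)
{
    HeapTuple   tuple = do_convert_tuple(ExecFetchSlotTuple(in), map);

    ExecStoreTuple(tuple, out, InvalidBuffer, true);
    return out;
}

(A further step would be to deform straight from the input slot into the
output slot's values/isnull arrays, avoiding do_convert_tuple()'s
re-deforming entirely.)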

Besides performance as the code stands, this'll also be important for
pluggable storage (as we'd otherwise get a *lot* of back/forth
conversions between tuple formats).

Anybody interesting in writing a patch that redoes this?

Greetings,

Andres Freund



Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Ashutosh Bapat
On Tue, Jun 26, 2018 at 8:43 PM, Amit Kapila  wrote:
> On Tue, Jun 26, 2018 at 7:02 PM, Ashutosh Bapat
>  wrote:
>> On Tue, Jun 26, 2018 at 6:18 PM, Amit Kapila  wrote:
>>> On Tue, Jun 26, 2018 at 4:33 PM, Ashutosh Bapat
>>>  wrote:
 Looks like we need similar adjustment in ExecSimpleRelationUpdate() as
 well. Updated the patch.

>>>
>>> - /* Store the slot into tuple that we can write. */
>>> + /* Materialize slot into a tuple that we can inspect. */
>>>   tuple = ExecMaterializeSlot(slot);
>>>
>>> I think it is better to keep the last word of the sentence as "write"
>>> instead of "inspect" as was in the original sentence.
>>
>> A copy-pasto while correcting a typo :)
>>
>>> It makes more
>>> sense as we are materializing the tuple to write it.  Similarly, in
>>> the other change in the patch can use "write".
>>
>> Are you suggesting that we should use "write" in the modified comment
>> instead of "inspect" in original comment.
>>
>
> Yes.
>
> Another variant used in few other similar places is:
>
> /*
> * get the heap tuple out of the tuple table slot, making sure we have a
> * writable copy
> */
>
>
>> Ok, I have now corrected grammar as well. Here's updated patch.
>>
>
> Your new comment is also correct.  I like the comment already used in
> code as that makes the code consistent, what do you think?
>

I don't understand what you mean by consistent. Do you mean to say
that the current usage "Store the slot into tuple ..." is correct?

I don't think "so that we can write" is the right usage either. I think
there should be a preposition after "write", something like "write to",
to indicate that we will write to the tuple. I find "scribble upon"
better than "write to". But I am not wedded to any of those. However,
I do want to change the "store the slot into tuple" usage, which is
wrong, as explained earlier as well as in the commit message.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Thomas Munro
On Wed, Jun 27, 2018 at 4:17 PM, Andres Freund  wrote:
> On 2018-06-27 15:56:58 +1200, Thomas Munro wrote:
>> Without range-scannable buffer mapping (Andres's radix tree thing),
>> that bet doesn't work out too well when you do it more than once.
>> Hmm... we could just... not do that?
>
> That'd probably hurt noticably too...

Allow the optimisation only once per transaction?

>> (Has anyone ever looked into a lazier approach to dropping buffers?)
>
> What precisely are you thinking of? We kinda now do something lazy-ish
> at EOXact...

I mean at the buffer level.  If we did nothing at all, the problem
would be dirty buffers that you'd eventually try to write out to
non-existent files.  What if... you just remembered recently dropped
relations, and then whenever writing dirty buffers (either because of
stealing or checkpointing) you could check if they belong to recently
dropped relations and just mark them clean?  To garbage collect the
recently dropped list, you could somehow make use of the knowledge
that checkpoints and full clock hand cycles must drop all such
buffers.  I'm not proposing anything, just musing and wondering if
anyone has looked into this sort of thing...

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-27 15:56:58 +1200, Thomas Munro wrote:
> That's a different code path that eats a lot of CPU on the *primary*, because:

While that's obviously not great, I think it's far less dangerous than
the standby having worse complexity than the primary. There's an obvious
backpressure if commands on the primary take a while, but there's
normally none for commands with high complexity on the standby...


> /*
>  * Normally, we need a transaction-safe truncation here.  However, if
>  * the table was either created in the current (sub)transaction or has
>  * a new relfilenode in the current (sub)transaction, then we can just
>  * truncate it in-place, because a rollback would cause the whole
>  * table or the current physical file to be thrown away anyway.
>  */
> if (rel->rd_createSubid == mySubid ||
>     rel->rd_newRelfilenodeSubid == mySubid)
> {
>     /* Immediate, non-rollbackable truncation is OK */
>     heap_truncate_one_rel(rel);
> }
> else
> 
> Without range-scannable buffer mapping (Andres's radix tree thing),
> that bet doesn't work out too well when you do it more than once.
> Hmm... we could just... not do that?

That'd probably hurt noticably too...


> (Has anyone ever looked into a lazier approach to dropping buffers?)

What precisely are you thinking of? We kinda now do something lazy-ish
at EOXact...

Greetings,

Andres Freund



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-06-26 Thread Andrey V. Lepikhov




On 26.06.2018 15:31, Masahiko Sawada wrote:

On Fri, Jun 22, 2018 at 8:24 PM, Andrey V. Lepikhov
 wrote:

Hi,
According to your feedback, I have developed a second version of the patch.
In this version:
1. The high-level functions index_beginscan() and index_rescan() are not
used. Tree descent is made by _bt_search(); _bt_binsrch() is used for
positioning on the page.
2. A TID list was introduced in the amtargetdelete() interface. Now only
one tree descent is needed to delete all TIDs from the list with an equal
scan key value (the logical duplicates deletion problem).
3. Only one WAL record for index tuple deletion per leaf page per
amtargetdelete() call.
4. VACUUM can sort the TID list in advance for a quicker search of
duplicates.

Background worker will come later.




Thank you for updating the patches! Here are some comments on the latest patch.

+static void
+quick_vacuum_index(Relation irel, Relation hrel,
+  IndexBulkDeleteResult **overall_stats,
+  LVRelStats *vacrelstats)
+{
(snip)
+   /*
+* Collect statistical info
+*/
+   lazy_cleanup_index(irel, *overall_stats, vacrelstats);
+}

I think that we should not call lazy_cleanup_index at the end of
quick_vacuum_index because we call it multiple times during a lazy
vacuum and index statistics can be changed during vacuum. We already
call lazy_cleanup_index at the end of lazy_scan_heap.


Ok


bttargetdelete doesn't delete btree pages even if pages become empty.
I think we should do that. Otherwise empty pages will never be recycled. But
please note that if we delete btree pages during bttargetdelete,
recyclable pages might not be recycled. That is, if we choose the
target deletion method every time then the deleted-but-not-recycled
pages could never be touched, unless reaching
vacuum_cleanup_index_scale_factor. So I think we need to either run
bulk-deletion method or do cleanup index before btpo.xact wraparound.

+   ivinfo.indexRelation = irel;
+   ivinfo.heapRelation = hrel;
+   qsort((void *)vacrelstats->dead_tuples,
vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
tid_comparator);

Ok. I think the caller of bttargetdelete() must decide when to do index 
cleanup.



I think the sorting of vacrelstats->dead_tuples is not necessary because
garbage TIDs are stored in sorted order.

Sorting was introduced because I have in mind a background worker and more 
flexible cleaning strategies, not only a full tuple-by-tuple relation and 
block scan.
The caller of bttargetdelete() can set info->isSorted to prevent the 
sorting operation.



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Thomas Munro
On Wed, Jun 27, 2018 at 1:46 PM, Andres Freund  wrote:
> On 2018-06-27 13:44:03 +1200, Thomas Munro wrote:
>> On further reflection, on the basis that it's the most conservative
>> change, +1 for Fujii-san's close-in-reverse-order idea.  We should
>> reconsider that data structure for 12; there doesn't seem to be a
>> good reason to carry all those comments warning about performance when
>> the O(1) version is shorter than the comments.
>
> Agreed on this.  I like the dlist version more than the earlier one, so
> let's fix it up, for v12+.  But regardless I'd argue that we consider
> disabling that infrastructure while in recovery - it's just unnecessary.

I tested this patch using the functions that Michael provided + an
extra variant truncate_tables(), using the -v2 patch + the adjustment
to close in reverse order.  I set shared_buffers to 4GB and
max_locks_per_transaction to 3.  I tested three CREATE/DROP code
paths, and also two different TRUNCATE paths:

SELECT create_tables(1);
SELECT drop_tables(1);

BEGIN;
SELECT create_tables(1);
ROLLBACK;

SELECT create_tables(1);
BEGIN;
SELECT drop_tables(1);
PREPARE TRANSACTION 'x';
COMMIT PREPARED 'x';

SELECT create_tables(1);
SELECT truncate_tables(1);

In all these cases my startup process would eat 100% CPU for ~20
seconds; with the patch this dropped to ~1-2 seconds (I didn't
measure precisely, I just watched htop).  I also tried
back-patching to 9.3 and the results were the same.

Based on this and positive feedback from other reviewers, I will mark
this "Ready for Committer" (meaning v2 + close-in-reverse-order).  I
did have one cosmetic remark a few messages up (redundant variable
initialisation).

By the way, as noted by others already, there is still a way to reach
a horrible case with or without the patch:

BEGIN;
SELECT create_tables(1);
SELECT truncate_tables(1);

That's a different code path that eats a lot of CPU on the *primary*, because:

/*
 * Normally, we need a transaction-safe truncation here.  However, if
 * the table was either created in the current (sub)transaction or has
 * a new relfilenode in the current (sub)transaction, then we can just
 * truncate it in-place, because a rollback would cause the whole
 * table or the current physical file to be thrown away anyway.
 */
if (rel->rd_createSubid == mySubid ||
    rel->rd_newRelfilenodeSubid == mySubid)
{
    /* Immediate, non-rollbackable truncation is OK */
    heap_truncate_one_rel(rel);
}
else

Without range-scannable buffer mapping (Andres's radix tree thing),
that bet doesn't work out too well when you do it more than once.
Hmm... we could just... not do that?

(Has anyone ever looked into a lazier approach to dropping buffers?)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Is PG built on any C compilers where int division floors?

2018-06-26 Thread Tom Lane
Chapman Flack  writes:
> Do PG's operators just do what the underlying C compiler generates?

Yes.  This is generally true for both the int and float operators.

regards, tom lane



Re: Is PG built on any C compilers where int division floors?

2018-06-26 Thread Chapman Flack
On 06/24/18 23:38, Tom Lane wrote:
> Chapman Flack  writes:
>> C99 finally pinned down what / does on signed ints, truncating toward zero.
>> Before that, it could truncate toward zero, or floor toward -inf.
>> Is PostgreSQL built on any compilers/platforms that have the floor
>> behavior?
> 
> I'm not sure if we still have any buildfarm animals that act that way[1],
> but the project policy is that we target C90 not C99.  So wiring in any


On a related note, does PG itself specify the behavior of its own signed int
/ and % (and div() and mod() functions)? Should it? I've been looking at

https://www.postgresql.org/docs/11/static/functions-math.html

and that doesn't specify. (There's a column of examples for the operators,
where the operands are positive. Changing nothing else, it might be more
revealing to toss in a negative-operand example: do we have div(-9,4) = -2
with mod(-9,4) = -1, or div(-9,4) = -3 with mod(-9,4) = +3 ?)

I don't see it specified in the standard either (2006 draft) - doesn't even
pay the issue enough notice to say it's implementation-defined.

Do PG's operators just do what the underlying C compiler generates?

Also, the PG manual doesn't seem to say whether >> is arithmetic or logical.
(I just tried it here and it was arithmetic; I assume it's consistent, but
would it be worth saying?)
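
For what it's worth, a quick C demonstration of both questions (division
and remainder are pinned down by C99; right shift of a negative value is
implementation-defined):

#include <stdio.h>

int
main(void)
{
    /* C99 6.5.5: division truncates toward zero... */
    printf("%d %d\n", -9 / 4, -9 % 4);  /* -2 and -1 */
    /* ...a floor-toward-minus-infinity platform would give -3 and +3 */

    /* arithmetic shift gives -4; a logical shift would give a large positive */
    printf("%d\n", -8 >> 1);
    return 0;
}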

-Chap



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Andres Freund
On 2018-06-27 13:44:03 +1200, Thomas Munro wrote:
> On further reflection, on the basis that it's the most conservative
> change, +1 for Fujii-san's close-in-reverse-order idea.  We should
> reconsider that data structure for 12; there doesn't seem to be a
> good reason to carry all those comments warning about performance when
> the O(1) version is shorter than the comments.

Agreed on this.  I like the dlist version more than the earlier one, so
let's fix it up, for v12+.  But regardless I'd argue that we consider
disabling that infrastructure while in recovery - it's just unnecessary.

Greetings,

Andres Freund



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Thomas Munro
On Wed, Jun 27, 2018 at 1:13 PM, Thomas Munro
 wrote:
> On Wed, Jun 27, 2018 at 12:16 PM, Thomas Munro
>  wrote:
>> I think we should take the hint in the comments and make it O(1)
>> anyway.  See attached draft patch.
>
> Alternatively, here is a shorter and sweeter dlist version (I did the
> open-coded one thinking of theoretical back-patchability).

... though, on second thoughts, the dlist version steam-rolls over the
possibility that it might not be in the list (mentioned in the
comments, though it's not immediately clear how that would happen).

On further reflection, on the basis that it's the most conservative
change, +1 for Fujii-san's close-in-reverse-order idea.  We should
reconsider that data structure for 12; there doesn't seem to be a
good reason to carry all those comments warning about performance when
the O(1) version is shorter than the comments.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Speedup of relation deletes during recovery

2018-06-26 Thread Thomas Munro
On Wed, Jun 27, 2018 at 12:16 PM, Thomas Munro
 wrote:
> I think we should take the hint in the comments and make it O(1)
> anyway.  See attached draft patch.

Alternatively, here is a shorter and sweeter dlist version (I did the
open-coded one thinking of theoretical back-patchability).

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-a-dlist-for-unowned-relations.patch
Description: Binary data


Re: Speedup of relation deletes during recovery

2018-06-26 Thread Thomas Munro
On Fri, Jun 22, 2018 at 5:41 AM, Andres Freund  wrote:
> On 2018-06-21 14:40:58 +0900, Michael Paquier wrote:
>> On Wed, Jun 20, 2018 at 08:43:11PM -0700, Andres Freund wrote:
>> > On 2018-06-18 11:13:47 -0700, Andres Freund wrote:
>> >> We could do that - but add_to_unowned_list() is actually a bottleneck in
>> >> other places during recovery too. We pretty much never (outside of
>> >> dropping relations / databases) close opened relations during recovery -
>> >> which is obviously problematic since nearly all of the entries are
>> >> unowned.  I've come to the conclusion that we should have a global
>> >> variable that just disables adding anything to the global lists.
>> >
>> > On second thought: I think we should your approach in the back branches,
>> > and something like I'm suggesting in master once open.

I think we should take the hint in the comments and make it O(1)
anyway.  See attached draft patch.
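
The gist is just to give the unowned list O(1) insert and remove
(a sketch, assuming SMgrRelationData grows a dlist_node member, here
called "node"):

#include "lib/ilist.h"

static dlist_head unowned_relns = DLIST_STATIC_INIT(unowned_relns);

static void
add_to_unowned_list(SMgrRelation reln)
{
    dlist_push_head(&unowned_relns, &reln->node);
}

static void
remove_from_unowned_list(SMgrRelation reln)
{
    dlist_delete(&reln->node);  /* O(1): no walk of the whole list */
}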

+   SMgrRelation *srels = NULL;

There seems to be no reason to initialise the variable to NULL in the
three places where you do that.  It is always assigned a value before
use.

There is probably a way to share a bit more code among those three
places, but that sort of refactoring should be for later.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-a-doubly-linked-list-for-unowned-relations.patch
Description: Binary data


Re: Name of main process differs between servers (postmaster vs postgres)

2018-06-26 Thread David G. Johnston
On Tue, Jun 26, 2018 at 2:51 PM, Jonathan Lemig  wrote:

> From what I can tell, things are running fine on both servers, I guess I
> just don't like there is a difference/inconsistency.  Can anyone please
> explain this?
>
>
Best guess: one of the servers had upgrades from previous versions applied
to it while the other started with a newer release.  The upgraded server
didn't mess with a pre-existing launch script that was using postmaster,
while the new one made use of the likely newer script that calls postgres.

Dave


Re: Name of main process differs between servers (postmaster vs postgres)

2018-06-26 Thread Justin Pryzby
On Tue, Jun 26, 2018 at 04:51:32PM -0500, Jonathan Lemig wrote:
> Hi,
> 
> I noticed on two of my postgres servers, one has "postmaster" for the main
> process, and the other has "postgres".   My question is - why is this?  For
> example:

On my centos6 servers:

1518140 lrwxrwxrwx   1 root root    8 May 10 21:32 
/usr/pgsql-10/bin/postmaster -> postgres

Do you have something else ?

Is there a symlink for pg_ctl or postmaster or postgres in /s?bin, /usr/s?bin,
or /usr/local/s?bin ?

Was one started by initscript and one started by something else (manually
running pg_ctl or similar) ?

Are the initscripts the same?

Did one of them start at boot and one restarted since then ?

You might get a hint by comparing /proc/909/environ with /proc/4804/environ
(the relative pids suggest that 4804 was perhaps re/started significantly
after boot).  And/or /proc/pid/comm, exe, and stat.

Did one of them crash since it was started (or killed by OOM) ?

Justin

PS: I think the list residents would, for the future, ask that admin/DBA-
oriented questions be kept on pgsql-general, and development/bugs on -hackers.



Re: Query Rewrite for Materialized Views (Postgres Extension)

2018-06-26 Thread Dent John
Hi Nico,

By the way, I do agree with your point about MERGE — if we can factor MV 
updates in that fashion, it will certainly save.

I didn’t reply immediately because your point caught me off guard:

> […]  If you look at my
> sketch for how to do it, you'll notice that many of the sorts of queries
> that one would choose to materialize... are not really amenable to this
> treatment […]

I’d rather presumed that there would be many examples of DML on base relations 
that could trigger a direct (incremental) update to the MV. I had presumed it, 
but not actually done any research.

So I did a bit of a trawl. There’s actually quite a lot of academic research 
out there, including [1] and [2]. [2] references a bunch of methods for 
incremental MV refresh, and ties them into a graph query context. I’m not sure 
if the graph query context itself is relevant for Postgres, but it’s certainly 
interesting and perhaps suggests that incremental refresh of at least some 
RECURSIVE MVs may not be entirely impossible. I also found [3], which is /very/ 
dated, but it strongly supports that MVs are a performant path to executing 
certain types of query.

So I definitely agree with you in the general case, but it seems there is scope 
to provide an incremental MV refresh capability that is broadly useful.

Almost certainly, any initial implementation would quickly fall back to a “full 
refresh”. But as the refresh planner’s capability develops, I wonder if it could 
not embody an intelligent strategy that might even recognise common recursive 
patterns such as the transitive closure you mention, and refresh on an 
incremental basis — that would be really quite a cool capability to have.

denty.

[1] http://www.dbnet.ece.ntua.gr/pubs/uploads/TR-1998-14.ps 

[2] https://arxiv.org/pdf/1806.07344.pdf 
[3] https://www.cs.indiana.edu/pub/techreports/TR280.pdf 



Name of main process differs between servers (postmaster vs postgres)

2018-06-26 Thread Jonathan Lemig
Hi,

I noticed on two of my postgres servers, one has "postmaster" for the main
process, and the other has "postgres".   My question is - why is this?  For
example:

Server1:
postgres   909     1  0 May08 ?        00:03:55 /usr/pgsql-9.6/bin/postmaster -D /var/lib/pgsql/9.6/data/


Server2:
postgres  4804     1  0 May01 ?        00:05:08 /usr/pgsql-9.6/bin/postgres -D /var/lib/pgsql/9.6/data



Both servers are running CentOS Linux release 7.4.1708 for the OS
and PostgreSQL 9.6.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-16), 64-bit.

The only differences I've been able to find so far are:

1) When doing a ps, server1 shows a trailing slash for the $PGDATA
directory whereas server2 doesn't have the trailing slash. I echoed $PGDATA
on both servers and they're both the same (no trailing slash).

2)  Server2 has an additional package installed
-- postgresql96-contrib-9.6.8-1PGDG.rhel7.x86_64.  Aside from that, both
servers have the same packages installed:

postgresql96-9.6.8-1PGDG.rhel7.x86_64
postgresql96-server-9.6.8-1PGDG.rhel7.x86_64
postgresql96-libs-9.6.8-1PGDG.rhel7.x86_64
postgresql96-pglogical-2.2.0-1.el7.x86_64

Looking at the documentation, it seems that postmaster is a deprecated
alias for the postgres process so while they differ in name, they're the
same in spirit?

From what I can tell, things are running fine on both servers, I guess I
just don't like there is a difference/inconsistency.  Can anyone please
explain this?

Thanks!

Jon


Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Andres Freund
On 2018-06-26 21:55:07 +0100, Andrew Gierth wrote:
> > "Dmitry" == Dmitry Dolgov <9erthali...@gmail.com> writes:
> 
>  Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
>  Dmitry> problem, one of the null fields is screwed up, will try to
>  Dmitry> figure out why is that.
> 
> The handling of nulls in grouping set results is a bit icky, see
> prepare_projection_slot in nodeAgg.c. The comment there lists a number
> of assumptions which may or may not hold true under JIT which might give
> a starting point to look for problems. (Unfortunately I'm not currently
> in a position to test on a JIT build)

I probably just screwed up a bit of code generation. I can't see any of
the more fundamental assumptions being changed by the way JITing is
done.

Greetings,

Andres Freund



Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Andrew Gierth
> "Dmitry" == Dmitry Dolgov <9erthali...@gmail.com> writes:

 Dmitry> Yep, my bad, forgot to turn it on. Now I see what's the
 Dmitry> problem, one of the null fields is screwed up, will try to
 Dmitry> figure out why is that.

The handling of nulls in grouping set results is a bit icky, see
prepare_projection_slot in nodeAgg.c. The comment there lists a number
of assumptions which may or may not hold true under JIT which might give
a starting point to look for problems. (Unfortunately I'm not currently
in a position to test on a JIT build)

(This wasn't my first choice of approach; originally I had a GroupedVar
node that evaluated either as the Var would or to NULL instead, and put
that into the projection. However there was a lot of resistance to this
approach at the time, and the new node idea was abandoned in favour of
poking directly into the slot.)

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] [PATCH] WIP Add ALWAYS DEFERRED option for constraints

2018-06-26 Thread Robbie Harwood
Nico Williams  writes:

> [Re-send; first attempt appears to have hit /dev/null somewhere.  My
> apologies if you get two copies.]
>
> I've finally gotten around to rebasing this patch and making the change
> that was requested, which was: merge the now-would-be-three deferral-
> related bool columns in various pg_catalog tables into one char column.
>
> Instead of (deferrable, initdeferred, alwaysdeferred), now there is just
> (deferral).

This design seems correct to me.  I have a couple questions inline, and
some nits to go with them.

> All tests (make check) pass.

Thank you for adding tests!

> diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
> index 3ed9021..e82e39b 100644
> --- a/doc/src/sgml/catalogs.sgml
> +++ b/doc/src/sgml/catalogs.sgml
> @@ -2239,17 +2239,15 @@ SCRAM-SHA-256$ count>:&l
>   
>  
>   
> -  condeferrable
> -  bool
> -  
> -  Is the constraint deferrable?
> - 
> -
> - 
> -  condeferred
> -  bool
> +  condeferral
> +  char
>
> -  Is the constraint deferred by default?
> +  Constraint deferral option:
> +a = always deferred,
> +d = deferrable,
> +d = deferrable initially deferred,

From the rest of the code, I think this is supposed to be 'i', not 'd'.

> +n = not deferrable
> +  
>   
>  
>   

> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
> index 8b276bc..795a7a9 100644
> --- a/src/backend/catalog/index.c
> +++ b/src/backend/catalog/index.c
> @@ -1070,6 +1070,7 @@ index_create(Relation heapRelation,
>  
>   recordDependencyOn(&myself, &referenced, 
> deptype);
>   }
> + Assert((flags & INDEX_CREATE_ADD_CONSTRAINT) == 0);

What does this ensure, and why is it in this part of the code?  We're in
the `else` branch here - isn't this always the case (i.e., Assert can
never fire), since `flags` isn't manipulated in this block?

>   }
>  
>   /* Store dependency on parent index, if any */

> diff --git a/src/backend/catalog/information_schema.sql 
> b/src/backend/catalog/information_schema.sql
> index f4e69f4..bde6199 100644
> --- a/src/backend/catalog/information_schema.sql
> +++ b/src/backend/catalog/information_schema.sql
> @@ -891,10 +891,14 @@ CREATE VIEW domain_constraints AS
> CAST(current_database() AS sql_identifier) AS domain_catalog,
> CAST(n.nspname AS sql_identifier) AS domain_schema,
> CAST(t.typname AS sql_identifier) AS domain_name,
> -   CAST(CASE WHEN condeferrable THEN 'YES' ELSE 'NO' END
> +   CAST(CASE WHEN condeferral = 'n' THEN 'NO' ELSE 'YES' END
>   AS yes_or_no) AS is_deferrable,
> -   CAST(CASE WHEN condeferred THEN 'YES' ELSE 'NO' END
> +   CAST(CASE WHEN condeferral = 'i' OR condeferral = 'a' THEN 'YES' 
> ELSE 'NO' END
>   AS yes_or_no) AS initially_deferred
> +/*
> + * XXX Can we add is_always_deferred here?  Are there
> + * standards considerations?
> + */

I'm not familiar enough to speak to this.  Hopefully someone else can.
Absent other input, I think we should add is_always_deferred.  (And if
we were building this today, probably just expose a single character
instead of three booleans.)

>  FROM pg_namespace rs, pg_namespace n, pg_constraint con, pg_type t
>  WHERE rs.oid = con.connamespace
>AND n.oid = t.typnamespace

> diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
> index 57519fe..41dc6a4 100644
> --- a/src/backend/commands/trigger.c
> +++ b/src/backend/commands/trigger.c
> @@ -3632,6 +3627,7 @@ typedef struct AfterTriggerSharedData
>   TriggerEvent ats_event; /* event type indicator, see trigger.h 
> */
>   Oid ats_tgoid;  /* the trigger's ID */
>   Oid ats_relid;  /* the relation it's on 
> */
> + boolats_alwaysdeferred; /* whether this can be 
> deferred */

"Whether this must be deferred"?  Also, I'm not sure what this is for -
it doesn't seem to be used anywhere.

>   CommandId   ats_firing_id;  /* ID for firing cycle */
>   struct AfterTriggersTableData *ats_table;   /* transition table 
> access */
>  } AfterTriggerSharedData;

> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
> index 90dfac2..dab721a 100644
> --- a/src/backend/parser/gram.y
> +++ b/src/backend/parser/gram.y
> @@ -184,8 +185,8 @@ static void SplitColQualList(List *qualList,
>List **constraintList, 
> CollateClause **collClause,
>core_yyscan_t 
> yyscanner);
>  static void processCASbits(int cas_bits, int location, const char 
> *constrType,
> -bool *de

Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-06-26 Thread Alvaro Herrera
Pushed this to 9.4 - master after some more tinkering.

It occurred to me that it might be better to have
ReorderBufferSetBaseSnapshot do the IncrRefCount instead of expecting
caller to do it.  But I wouldn't backpatch that change, so I refrained.

Thanks for the patch.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-06-26 Thread Jerry Sievers
Hackers, felt like reporting this relevant tidbit in case of interest...

My site is among the few who've hit this bug.

We recently observed a case of pg_database having joined pg_authid and
pg_auth_members in the bad-xmin, unable-to-vacuum shared-catalog group,
however...

pg_database then started working again without any intervention on our
part so apparently due to some relevant system activity.

We did later do a restart to correct the problem for those other two
catalogs, but will try the global init file hack mentioned below later
if/when we face this anew before running a fixed Pg version.

Many thanks!


Matheus de Oliveira  writes:

> Hello again...
>
> On Tue, Jun 19, 2018 at 12:53 PM, Andres Freund 
> wrote:
>
> ...
>
> After that, yes, deleting the
> global/pg_internal.init file is the way to go, and I can't think
> of a
> case where it's problematic, even without stopping the server.
>

>
>
> Just to let you know. I applied the workaround in the affected
> server and it seemed to work; so far no error.
>
> Thank you a lot for all the information and help.
>
> Best regards,
> --
> Matheus de Oliveira
>
>
>
>

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consult...@comcast.net
p: 312.241.7800



Re: [bug fix] ECPG: freeing memory for pgtypes crashes on Windows

2018-06-26 Thread MauMau
From: Thomas Munro
> Ok, back-patched.

Thank you very much!

> It seems like the other patch[1] might need the same treatment,
> right?

I believe so, because that patch's bug has the same cause.


Regards
MauMau




Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Dmitry Dolgov
> On 26 June 2018 at 22:11, Andres Freund  wrote:
> On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:
>> > On 26 June 2018 at 20:23, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
>> >> I found the below query which returns the wrong output
>> >> when jit_above_cost= 0 is set.
>> >>
>> >> Steps to reproduce:
>> >>
>> >> CREATE TABLE emp (
>> >> epno   NUMERIC(4),
>> >> ename   VARCHAR(10),
>> >> job VARCHAR(9),
>> >> mgr NUMERIC(4),
>> >> hiredateDATE,
>> >> sal NUMERIC(7,2),
>> >> commNUMERIC(7,2),
>> >> deptno  NUMERIC(2)
>> >> );
>> >>
>> >> INSERT INTO emp VALUES 
>> >> (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
>> >> INSERT INTO emp VALUES 
>> >> (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
>> >>
>> >> set jit_above_cost= 0;
>> >>
>> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>> >>
>> >> without the ROLLUP, I don't see any problem with results.
>> >
>> > Interesting.  I've opened an open item referencing this.
>>
>> Hi,
>>
>> Just out of curiosity, what exactly is wrong with the output of this query? I
>> see the same results with jit_above_cost = 0 and with the default value:
>>
>> =# show jit_above_cost;
>>  jit_above_cost
>> ----------------
>>  100000
>> (1 row)
>>
>> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>  max
>> ------
>>  7369
>>  7499
>>  7499
>> (3 rows)
>>
>> =# set jit_above_cost = 0;
>> SET
>> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>  max
>> ------
>>  7369
>>  7499
>>  7499
>> (3 rows)
>>
>> And as far as I understand it's totally correct, since we do rollup by just 
>> two
>> values and have one more row as a total (with NULLs):
>>
>> =# select max(epno), deptno, epno
>>from emp group by rollup((deptno,epno)) order by 1 asc;
>>
>>  max  | deptno | epno
>> ------+--------+------
>>  7369 |     20 | 7369
>>  7499 |   NULL | NULL
>>  7499 |     30 | 7499
>> (3 rows)
>
> I've not reproduced the problem yet (I'm deep in a review / edit of
> another patchset). Could it be that you've not compiled with JIT
> support and thus don't see the problem Rushab was complaining about?
> SELECT pg_jit_available();

Yep, my bad, forgot to turn it on. Now I see what's the problem, one of the
null fields is screwed up, will try to figure out why is that.



Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Andres Freund
On 2018-06-26 22:09:10 +0200, Dmitry Dolgov wrote:
> > On 26 June 2018 at 20:23, Andres Freund  wrote:
> > Hi,
> >
> > On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
> >> I found the below query which returns the wrong output
> >> when jit_above_cost= 0 is set.
> >>
> >> Steps to reproduce:
> >>
> >> CREATE TABLE emp (
> >> epno   NUMERIC(4),
> >> ename   VARCHAR(10),
> >> job VARCHAR(9),
> >> mgr NUMERIC(4),
> >> hiredateDATE,
> >> sal NUMERIC(7,2),
> >> commNUMERIC(7,2),
> >> deptno  NUMERIC(2)
> >> );
> >>
> >> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
> >> INSERT INTO emp VALUES 
> >> (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
> >>
> >> set jit_above_cost= 0;
> >>
> >> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
> >>
> >> without the ROLLUP, I don't see any problem with results.
> >
> > Interesting.  I've opened an open item referencing this.
> 
> Hi,
> 
> Just out of curiosity, what exactly is wrong with the output of this query? I
> see the same results with jit_above_cost = 0 and with the default value:
> 
> =# show jit_above_cost;
>  jit_above_cost
> ----------------
>  100000
> (1 row)
> 
> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>  max
> ------
>  7369
>  7499
>  7499
> (3 rows)
> 
> =# set jit_above_cost = 0;
> SET
> =# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>  max
> ------
>  7369
>  7499
>  7499
> (3 rows)
> 
> And as far as I understand it's totally correct, since we do rollup by just 
> two
> values and have one more row as a total (with NULLs):
> 
> =# select max(epno), deptno, epno
>from emp group by rollup((deptno,epno)) order by 1 asc;
> 
>  max  | deptno | epno
> ------+--------+------
>  7369 |     20 | 7369
>  7499 |   NULL | NULL
>  7499 |     30 | 7499
> (3 rows)

I've not reproduced the problem yet (I'm deep in a review / edit of
another patchset). Could it be that you've not compiled with JIT
support and thus don't see the problem Rushab was complaining about?
SELECT pg_jit_available();

Greetings,

Andres Freund



Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Dmitry Dolgov
> On 26 June 2018 at 20:23, Andres Freund  wrote:
> Hi,
>
> On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
>> I found the below query which returns the wrong output
>> when jit_above_cost= 0 is set.
>>
>> Steps to reproduce:
>>
>> CREATE TABLE emp (
>> epno   NUMERIC(4),
>> ename   VARCHAR(10),
>> job VARCHAR(9),
>> mgr NUMERIC(4),
>> hiredateDATE,
>> sal NUMERIC(7,2),
>> commNUMERIC(7,2),
>> deptno  NUMERIC(2)
>> );
>>
>> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
>> INSERT INTO emp VALUES 
>> (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
>>
>> set jit_above_cost= 0;
>>
>> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
>>
>> without the ROLLUP, I don't see any problem with results.
>
> Interesting.  I've opened an open item referencing this.

Hi,

Just out of curiosity, what exactly is wrong with the output of this query? I
see the same results with jit_above_cost = 0 and with the default value:

=# show jit_above_cost;
 jit_above_cost
----------------
 100000
(1 row)

=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
 max
------
 7369
 7499
 7499
(3 rows)

=# set jit_above_cost = 0;
SET
=# select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
 max
------
 7369
 7499
 7499
(3 rows)

And as far as I understand it's totally correct, since we do rollup by just two
values and have one more row as a total (with NULLs):

=# select max(epno), deptno, epno
   from emp group by rollup((deptno,epno)) order by 1 asc;

 max  | deptno | epno
------+--------+------
 7369 |     20 | 7369
 7499 |   NULL | NULL
 7499 |     30 | 7499
(3 rows)
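(As a side note for anyone puzzling over why there is exactly one extra
all-NULL row: rolling up a single composite grouping is just shorthand for
two grouping sets, which the following equivalent query makes explicit.)

-- rollup((deptno, epno)) expands to GROUPING SETS ((deptno, epno), ()):
-- the detail rows plus one grand-total row with NULL group keys
SELECT max(epno), deptno, epno
FROM emp
GROUP BY GROUPING SETS ((deptno, epno), ())
ORDER BY 1 ASC;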



Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-06-26 Thread Arseny Sher


Alvaro Herrera  writes:

> I'm struggling with this assert.  I find that test_decoding passes tests
> with this assertion in branch master, but fails in 9.4 - 10.  Maybe I'm
> running the tests wrong (no assertions in master?) but I don't see it.
> It *should* fail ...

Your v3 patch fails for me on freshest master (4d54543efa) in exactly
this assert; it looks like there is something wrong in your setup.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Constraint documentation

2018-06-26 Thread Brad DeJong
On Tue, Jun 26, 2018 at 4:24 AM, Vik Fearing 
wrote:

> Looks good to me.


I'll second that. Looks good to me too.


Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread David G. Johnston
On Tue, Jun 26, 2018 at 10:06 AM, Tom Lane  wrote:

> Pavel Stehule  writes:
> > 2018-06-26 18:22 GMT+02:00 David G. Johnston :
> >>> So I am not sure the proposed change is practical, because views and
> >>> tables share the same namespace and the current behavior makes sense too.
>
> >> I'm doubtful that there is any meaningful technical/practical challenge
> >> involved here.
>
> Certainly we *could* change it, but it's not at all clear that it's a good
> idea.  The current behavior seemed sensible when it was implemented, and
> it has stood for quite some years now.


There are numerous examples of bugs that go unreported until well after a
release is out of support; I'm leaning toward figuring this falls into that
category.  A code comment, commit, or email where this was discussed
explicitly would help shift my view here.


>   Now, we have one person
> complaining that it wasn't what he expected.  If we change the behavior on
> the strength of that, will we change it back on the first complaint from
> someone else?


I'm not saying change it because someone was confused but because the
wording of the feature leads me to believe that the current implementation
is wrong and...

  What about the possibility that the change breaks people's
> applications?
>

...it is broken in such a way that I cannot foresee someone relying upon
the existing behavior intentionally.

I'm willing to admit that this is an edge case and fixing it only in 12 is
acceptable.  I'm unable to prove a negative here but likewise no one has
shown the negative to be false either.


> I think there might be grounds for clarifying the documentation, but
> I'm quite hesitant to change the behavior without someone making a case
> a whole lot stronger than this.
>

OK.


> Also worth noting is that similar issues arise elsewhere, eg we now
> have "procedures" vs "functions" in a single namespace.  Let's not have
> DROP TABLE acting in a way that's inconsistent with other parts of the
> system.
>

Maybe. VIEWS and TABLES are in many ways interchangeable, whereas other
objects sharing a namespace have distinct uses that are different from each
other. For those the choice here doesn't even matter (though if this is how
procedures/functions work I wouldn't be surprised if we got this kind of
complaint for those sooner rather than later) as no one would be running
code where they want to drop a table and create a view of the same name (or
vice-versa I suppose).  I'm sure it doesn't happen that often, hence the
lack of reports, but it is also a transformation that I wouldn't consider
unreasonable (client code doesn't want to change the logically named
relation even though the DBA wants to change the physical implementation).

David J.


Re: effect of JIT tuple deform?

2018-06-26 Thread Andres Freund
On 2018-06-26 21:25:54 +0200, Pavel Stehule wrote:
> Hi
> 
> I played with it and maybe I got an interesting result.
> 
> When I played with a really big table, then I got IO waits and the effect of
> jit_tuple_deforming is near zero (as expected).
> 
> When I played with a smaller table that fits in RAM, then I can see a positive
> effect of JIT_TD, but only when optimization is active. When optimization was
> false, then JIT_TD caused a slowdown.
> 
> Shared buffers 1GB
> table wt .. 100columns, 2M rows, cca 823MB
> set max_parallel_workers to 0;
> 
> query:
> 
> select sum(c99) from wt;
> 
> default (jit active) ... 1853ms
> set jit_tuple_deforming to off .. 1397ms
> set jit off .. 1400ms
> 
> == enforced inline ==
> jit_tuple_deforming on .. 1610ms
> jit_tuple_deforming off .. 1420ms
> 
> == enforced optimize ==
> jit_tuple_deforming on .. 842ms
> jit_tuple_deforming off .. 1420ms
> 
> 
> I played with the settings and I got the best speed with:
> 
> jit_inline_above_cost ..some pretty high number
> jit_optimize_above_cost 0
> jit_tuple_deforming on
> 
> 
> So I am able to see the effect of jit_tuple_deforming, and very well, but only
> if optimization is active. When optimization is not active, then
> jit_tuple_deforming causes a slowdown.
> 
> So maybe the use of jit_tuple_deforming could be conditioned on
> JIT optimization?

No, it's definitely useful outside of that. I don't think we should
optimize for one individual query.

Greetings,

Andres Freund



Re: effect of JIT tuple deform?

2018-06-26 Thread Pavel Stehule
Hi

I played with it and maybe I got an interesting result.

When I played with a really big table, then I got IO waits and the effect of
jit_tuple_deforming is near zero (as expected).

When I played with a smaller table that fits in RAM, then I can see a positive
effect of JIT_TD, but only when optimization is active. When optimization was
false, then JIT_TD caused a slowdown.

Shared buffers 1GB
table wt .. 100columns, 2M rows, cca 823MB
set max_parallel_workers to 0;

query:

select sum(c99) from wt;

default (jit active) ... 1853ms
set jit_tuple_deforming to off .. 1397ms
set jit off .. 1400ms

== enforced inline ==
jit_tuple_deforming on .. 1610ms
jit_tuple_deforming off .. 1420ms

== enforced optimize ==
jit_tuple_deforming on .. 842ms
jit_tuple_deforming off .. 1420ms


I played with the settings and I got the best speed with:

jit_inline_above_cost ..some pretty high number
jit_optimize_above_cost 0
jit_tuple_deforming on
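
Spelled out as session settings, a sketch of that best-performing
configuration (the inline threshold value is arbitrary; anything high enough
to suppress inlining will do):

SET max_parallel_workers = 0;
SET jit = on;
SET jit_above_cost = 0;                  -- force JIT for this test
SET jit_optimize_above_cost = 0;         -- always run the optimizer
SET jit_inline_above_cost = 1000000000;  -- effectively never inline
SET jit_tuple_deforming = on;

SELECT sum(c99) FROM wt;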


So I am able to see the effect of jit_tuple_deforming, and very well, but only
if optimization is active. When optimization is not active, then
jit_tuple_deforming causes a slowdown.

So maybe the use of jit_tuple_deforming could be conditioned on
JIT optimization?

Regards

Pavel


Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-06-26 Thread Alvaro Herrera
On 2018-Jun-26, Arseny Sher wrote:

> Alvaro Herrera  writes:

> > * I'm a bit unsure about the comment atop ReorderBufferSetBaseSnapshot.
> >   Obviously, the bit within the #if 0/#endif I'm going to remove before
> >   push.
> 
> It looks like you've started editing that bit and didn't finish.

Yeah, I left those lines in place as a reminder that I need to finish
editing, wondering if there's any nuance I'm missing that I need to add
to the final version.

> >   I don't understand why it says "Needs to be called before any
> >   changes are added with ReorderBufferQueueChange"; but if you edit that
> >   function and add an assert that the base snapshot is set, it crashes
> >   pretty quickly in the test_decoding tests.  (The supposedly bogus
> >   comment was there before your patch -- I'm not saying your comment
> >   addition is faulty.)
> 
> That's because REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID changes are
> queued whenever we have read that xact has modified the catalog,
> regardless of base snapshot existence. Even if there are no changes yet,
> we will need it for correct visibility of catalog, so I am inclined to
> remove the assert and comment or rephrase the latter with 'any
> *data-carrying* changes'.

I'm struggling with this assert.  I find that test_decoding passes tests
with this assertion in branch master, but fails in 9.4 - 10.  Maybe I'm
running the tests wrong (no assertions in master?) but I don't see it.
It *should* fail ...

> > * I also noticed that we're doing subtxn cleanup one by one in both
> >   ReorderBufferAssignChild and ReorderBufferCommitChild, which means the
> >   top-level txn is sought in the hash table over and over, which seems a
> >   bit silly.  Not this patch's problem to fix ...
> 
> There is already one-entry cache in ReorderBufferTXNByXid.

That's useless, because we use two xids: first the parent; then the
subxact.  Looking up the subxact evicts the parent from the one-entry
cache, so when we loop to process next subxact, the parent is no longer
cached :-)

> We could add 'don't cache me' flag to it and use it with subxacts, or
> simply pull top-level reorderbuffer out of loops.

Yeah, but that's an API change.

> I'm fine with the rest of your edits. One more little comment:
> 
> @@ -543,9 +546,10 @@ ReorderBufferQueueChange(ReorderBuffer *rb, 
> TransactionId xid, XLogRecPtr lsn,
>   ReorderBufferTXN *txn;
> 
>   txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
> + Assert(txn->base_snapshot != NULL || txn->is_known_as_subxact);
> 
>   change->lsn = lsn;
> - Assert(InvalidXLogRecPtr != lsn);
> + Assert(!XLogRecPtrIsInvalid(lsn));
>   dlist_push_tail(&txn->changes, &change->node);
>   txn->nentries++;
>   txn->nentries_mem++;
> 
> Since we do that, probably we should replace all lsn validity checks
> with XLogRecPtrIsInvalid?

I reverted this back to how it was originally.  We can change it as a
style item later in all places where it appears (branch master only),
but modifying only one in a backpatched commit seems poor practice.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: wrong query result with jit_above_cost= 0

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-26 23:50:32 +0530, Rushabh Lathia wrote:
> I found the below query which returns the wrong output
> when jit_above_cost= 0 is set.
> 
> Steps to reproduce:
> 
> CREATE TABLE emp (
> epno   NUMERIC(4),
> ename   VARCHAR(10),
> job VARCHAR(9),
> mgr NUMERIC(4),
> hiredateDATE,
> sal NUMERIC(7,2),
> commNUMERIC(7,2),
> deptno  NUMERIC(2)
> );
> 
> INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
> INSERT INTO emp VALUES
> (7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);
> 
> set jit_above_cost= 0;
> 
> select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;
> 
> without the ROLLUP, I don't see any problem with results.

Interesting.  I've opened an open item referencing this.

Greetings,

Andres Freund



wrong query result with jit_above_cost= 0

2018-06-26 Thread Rushabh Lathia
Hi,

I found the below query which returns the wrong output
when jit_above_cost= 0 is set.

Steps to reproduce:

CREATE TABLE emp (
epno   NUMERIC(4),
ename   VARCHAR(10),
job VARCHAR(9),
mgr NUMERIC(4),
hiredateDATE,
sal NUMERIC(7,2),
commNUMERIC(7,2),
deptno  NUMERIC(2)
);

INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,'17-DEC-80',800,NULL,20);
INSERT INTO emp VALUES
(7499,'ALLEN','SALESMAN',7698,'20-FEB-81',1600,300,30);

set jit_above_cost= 0;

select max(epno) from emp group by rollup((deptno,epno)) order by 1 asc;

without the ROLLUP, I don't see any problem with results.

Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: postgresql_fdw doesn't handle defaults correctly

2018-06-26 Thread Pavel Stehule
Hi

2018-06-25 4:30 GMT+02:00 Amit Langote :

> Hi.
>
> On 2018/06/24 2:23, Pavel Stehule wrote:
> > Hi
> >
> > I have a table boo
> >
> > create table boo(id serial primary key, inserted date default
> current_date,
> > v varchar);
> >
> > I imported this table via simple
> >
> > IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public;
>
> It seems you missed using OPTIONS (import_default 'true') here.
>
> create schema foo;
> create table foo.foo (a serial primary key, b date default current_date
> not null, c int);
>
> import foreign schema foo from server loopback into public options
> (import_default 'true');
>
> insert into public.foo (c) values (1);
> select * from public.foo;
>  a | b  | c
> ---+------------+---
>  1 | 2018-06-25 | 1
> (1 row)
>
> insert into foo.foo (c) values (2);
>

This insert doesn't use the foreign table, so it is a different case.

select * from public.foo;
>  a | b  | c
> ---+------------+---
>  1 | 2018-06-25 | 1
>  2 | 2018-06-25 | 2
> (2 rows)
>
>
It looks more different than I expected.

create database t1;
\c t1
create table foo(a serial primary key, b date default current_date, c int);
insert into foo(c) values(10),(20);
select * from foo;

t1=# select * from foo;
+---+------------+----+
| a | b          | c  |
+---+------------+----+
| 1 | 2018-06-26 | 10 |
| 2 | 2018-06-26 | 20 |
+---+------------+----+
(2 rows)

\c postgres
create server t1 foreign data wrapper postgres_fdw options (dbname 't1');
create user mapping for pavel server t1;

postgres=# import foreign schema public from server t1 into public options
(import_default 'true');
ERROR:  relation "public.foo_a_seq" does not exist
CONTEXT:  importing foreign table "foo"

So it fails as probably expected: we don't support foreign sequences, so
we cannot import a schema containing a table with a sequence when the
option import_default = 'true' is used.

This looks like an unsupported case: it is not possible to insert into a
foreign table whose remote table has a serial column.

Unfortunately, when I use an identity column

create table foo(a int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, b date
default current_date, c int);

then the import doesn't fail, but it still doesn't work
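
One pragmatic workaround, sketched here under the assumption of the same
loopback server t1 and that all writes go through this one foreign table:
drop the imported definition, declare the foreign table by hand, and give
the key column a *local* default drawn from a local sequence, so
postgres_fdw always ships a concrete value. The local and remote sequences
are independent, so this is only safe if nobody inserts into the remote
table directly.

DROP FOREIGN TABLE IF EXISTS public.foo;
CREATE SEQUENCE foo_local_seq;  -- local stand-in for the remote default

CREATE FOREIGN TABLE public.foo (
    a int NOT NULL DEFAULT nextval('foo_local_seq'),
    b date DEFAULT current_date,
    c int
) SERVER t1 OPTIONS (schema_name 'public', table_name 'foo');

INSERT INTO public.foo (c) VALUES (30);  -- a and b are filled in locally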

Regards

Pavel






>
> Thanks,
> Amit
>
>


Re: Global shared meta cache

2018-06-26 Thread Andres Freund
Hi,

On 2018-06-26 06:48:28 +, Ideriha, Takeshi wrote:
> > I think it would be interested for somebody to build a prototype here
> > that ignores all the problems but the first and uses some
> > straightforward, relatively unoptimized locking strategy for the first
> > problem. Then benchmark it. If the results show that the idea has
> > legs, then we can try to figure out what a real implementation would
> > look like.
> > (One possible approach: use Thomas Munro's DHT stuff to build the shared 
> > cache.)
> 
> I'm inspired by this comment and am now developing a prototype (please see
> attached), but I haven't yet put the cache structure in shared memory.

> Instead, I put dummy data on shared memory which is initialized at startup, 
> and then acquire/release lock just before/after searching/creating catcache 
> entry.

> I haven't considered relcache and catcachelist either.
> It is difficult for me to do everything at one time in the right direction.
> So I'm trying to make a small prototype and see whether I'm walking in the
> proper direction.
> 
> I tested pgbench to compare master branch with my patch. 
> 
> 0) Environment
>- RHEL 7.4
>- 16 cores
>- 128 GB memory
> 
> 1) Initialized with pgbench -i -s10
> 
> 2) benchmarked 3 times for each conditions and got the average result of TPS.
>                                   | master branch | prototype | proto/master (%)
>  ---------------------------------+---------------+-----------+-----------------
>  pgbench -c48 -T60 -Msimple -S    | 131297        | 130541    | 101%
>  pgbench -c48 -T60 -Msimple       | 4956          | 4965      | 95%
>  pgbench -c48 -T60 -Mprepared -S  | 129688        | 132538    | 97%
>  pgbench -c48 -T60 -Mprepared     | 5113          | 4615      | 84%
> 
>   This result seems to show that, except for the prepared protocol with
> "not only SELECT", it didn't make much difference.

This seems like a pretty large regression to me. And that's an
extremely simplistic case, with tiny caches, and barely any changes to
the cache contents.

Greetings,

Andres Freund



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Tom Lane
Pavel Stehule  writes:
> 2018-06-26 18:22 GMT+02:00 David G. Johnston :
>>> So I am not sure the proposed change is practical, because views and
>>> tables share the same namespace and the current behavior makes sense too.

>> I'm doubtful that there is any meaningful technical/practical challenge
>> involved here.

Certainly we *could* change it, but it's not at all clear that it's a good
idea.  The current behavior seemed sensible when it was implemented, and
it has stood for quite some years now.  Now, we have one person
complaining that it wasn't what he expected.  If we change the behavior on
the strength of that, will we change it back on the first complaint from
someone else?  What about the possibility that the change breaks people's
applications?

I think there might be grounds for clarifying the documentation, but
I'm quite hesitant to change the behavior without someone making a case
a whole lot stronger than this.

Also worth noting is that similar issues arise elsewhere, eg we now
have "procedures" vs "functions" in a single namespace.  Let's not have
DROP TABLE acting in a way that's inconsistent with other parts of the
system.

regards, tom lane



RE: Global shared meta cache

2018-06-26 Thread serge
Takeshi-san,


>My customer created hundreds of thousands of partition tables and tried to 
>select data from hundreds of applications,
>which resulted in enormous consumption of memory because it consumed # of 
>backends multiplied by #
> of local memory (ex. 100 backends X 1GB = 100GB).
>Relation caches are loaded into each backend's local memory.
My team and I have been working to make caches shared for the past two years,
but we have chosen not to share the system and rel caches.
The reason is that these caches play a big role in transactional DDL processing.
When you do DDL, your backend can see all the changes since you update your own
cache, but nobody else can see them until you commit.
You will find that dealing with that will be the true complexity.
 
 
Have you tried to simply cap the size of these caches?
That's a rather straightforward piece of work and will get you quite far.
We run with a 20MB syscache and a 10MB relcache with 100k+ objects and hundreds
of backends.
A dumb LRU is plenty good for the purpose.
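
(To make the "dumb LRU" idea concrete, a toy sketch of a byte-capped cache
with least-recently-used eviction; the entry layout, the cap, and all names
are illustrative, not our actual code.)

#include <stddef.h>

#define CACHE_CAP_BYTES (20 * 1024 * 1024)

typedef struct CacheEntry
{
	struct CacheEntry *lru_prev;	/* toward head = more recently used */
	struct CacheEntry *lru_next;	/* toward tail = least recently used */
	size_t		size;
	/* key and payload would live here */
} CacheEntry;

static size_t cache_used = 0;
static CacheEntry *lru_head = NULL;		/* most recently used */
static CacheEntry *lru_tail = NULL;		/* eviction candidate */

static void
cache_admit(CacheEntry *e)
{
	/* evict least-recently-used entries until the newcomer fits */
	while (cache_used + e->size > CACHE_CAP_BYTES && lru_tail != NULL)
	{
		CacheEntry *victim = lru_tail;

		lru_tail = victim->lru_prev;
		if (lru_tail)
			lru_tail->lru_next = NULL;
		else
			lru_head = NULL;
		cache_used -= victim->size;
		/* free the victim's key/payload here */
	}

	/* push the new entry at the head (most recently used) */
	e->lru_prev = NULL;
	e->lru_next = lru_head;
	if (lru_head)
		lru_head->lru_prev = e;
	else
		lru_tail = e;
	lru_head = e;
	cache_used += e->size;
}

/* a cache hit would unlink the entry and re-push it at the head */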
 
That being said I would love to see these caches shared. :-)
 
Cheers
Serge
Salesforce


Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread David G. Johnston
On Tuesday, June 26, 2018, Pavel Stehule  wrote:

>
> I just don't see this proposal as a clean win. Moreover, it is not limited
> to only this case. It should be consistent with DROP INDEX, SEQUENCE, ...
>

Yes, they are likely all broken in the same way, and whoever agrees with
the "it's bugged" conclusion and is willing and able to write a patch
should check them and possibly add those checks as regression tests.

But without a concrete example of problems I'm not going to be convinced
there is a downside to fixing this bug.  Given it's converting an error
into an equivalent end result without the error I'd support a committer who
wants to backpatch whatever fix is devised.

David J.


Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Pavel Stehule
2018-06-26 18:22 GMT+02:00 David G. Johnston :

> On Tuesday, June 26, 2018, Pavel Stehule  wrote:
>>
>> My note is related to (b). I understand the motivation, but I am not
>> sure if it is a good idea. Tables and views share one namespace.
>>
>
> But the command says "drop table" and so it must only be concerned with
> that subset of the namespace when searching.  The note about the view is
> helpful, but it shouldn't change whether the command succeeded with a
> notice or errors.
>
>
>> Often DROP TABLE IF EXISTS is used together with CREATE TABLE
>>
>> Now if someone makes a bad reference in the DROP TABLE command, then this
>> command fails (first). If we make the proposed change, then DROP TABLE does
>> nothing, and CREATE TABLE fails.
>>
>
> Yes, this is what should be happening.  CREATE does have to care about the
> whole namespace since it is creating a new identifier.
>
>
>> So I am not sure the proposed change is practical, because views and
>> tables share the same namespace and the current behavior makes sense too.
>>
>
> I'm doubtful that there is any meaningful technical/practical challenge
> involved here.  And as you say when doing paired drop/creates one of them
> is going to fail anyway so it doesn't really matter which one.  But if
> someone wants to convert a table to a view idempotently (the point of INE)
> right now they cannot, using the features that are documented to do just
> that.
>

I just don't see this proposal as a clean win. Moreover, it is not limited
to only this case. It should be consistent with DROP INDEX, SEQUENCE, ...

Regards

Pavel


> David J.
>
>


Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread David G. Johnston
On Tuesday, June 26, 2018, Pavel Stehule  wrote:
>
> My note is related to (b). I understand the motivation, but I am not
> sure if it is a good idea. Tables and views share one namespace.
>

But the command says "drop table" and so it must only be concerned with that
subset of the namespace when searching.  The note about the view is
helpful, but it shouldn't change whether the command succeeded with a
notice or errors.


> Often DROP TABLE IF EXISTS is used together with CREATE TABLE
>
> Now if someone makes a bad reference in the DROP TABLE command, then this
> command fails (first). If we make the proposed change, then DROP TABLE does
> nothing, and CREATE TABLE fails.
>

Yes, this is what should be happening.  CREATE does have to care about the
whole namespace since it is creating a new identifier.


> So I am not sure the proposed change is practical, because views and tables
> share the same namespace and the current behavior makes sense too.
>

I'm doubtful that there is any meaningful technical/practical challenge
involved here.  And as you say when doing paired drop/creates one of them
is going to fail anyway so it doesn't really matter which one.  But if
someone wants to convert a table to a view idempotently (the point of INE)
right now they cannot, using the features that are documented to do just
that.

David J.


Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Pavel Stehule
2018-06-26 17:48 GMT+02:00 David G. Johnston :

> On Tuesday, June 26, 2018, Pavel Stehule  wrote:
>
>> 2018-06-26 17:23 GMT+02:00 Peter Moser :
>>
>>> Hi,
>>> I want to delete a table X, that may not exist, hence I execute
>>>
>>> DROP TABLE IF EXISTS X;
>>>
>>> However, if X is a view, I get an error
>>>
>>> ERROR: "X" is not a table
>>>
>> HINT: Use DROP VIEW to remove a view.
>>> SQL state: 42809
>>>
>>> That is unexpected and also difficult to handle
>>>
>>
>> DROP TABLE should remove a table and nothing else, just as DROP VIEW should
>> drop just a view and nothing else. It is a safeguard.
>>
>
> Peter isn't asking for drop table to drop a view though, he's asking for
> the documented behavior:
>
> "Do not throw an error if the table does not exist. A notice is issued in
> this case."
>

This is a different issue.


> There is no table named X in the database, so the command should be a no-op
> with a notice.  I would concur, though I'm open to just fixing it in v12
> and back-patching a documentation bug fix stating the exception, due to
> relations sharing a namespace while there is no corresponding shared-
> namespace "drop relation" command.
>

There are two points:

a) a documentation issue
b) different behavior for the DROP TABLE IF EXISTS command

My note is related to (b). I understand the motivation, but I am not sure
if it is a good idea. Tables and views share one namespace.

Often DROP TABLE IF EXISTS is used together with CREATE TABLE

Now if someone makes a bad reference in the DROP TABLE command, then this
command fails (first). If we make the proposed change, then DROP TABLE does
nothing, and CREATE TABLE fails.

So I am not sure the proposed change is practical, because views and tables
share the same namespace and the current behavior makes sense too.

Regards

Pavel


> David J.
>
>


Re: ALTER TABLE does not check for column existence before starting operations

2018-06-26 Thread Arthur Zakirov
Hello,

On Fri, Mar 02, 2018 at 12:36:41PM +0100, Pierre Ducroquet wrote:
> The attached patch fixes this behaviour by adding a small check in the first 
> pass of alter table to make sure that a column referenced by an alter command 
> exists first. It also checks if the column is added by another alter sub-
> command. It does not handle every scenario (dropping a column and then 
> altering it for instance), these are left to the exec code to exclude.
> The patch has been checked with make check, and I see no documentation change 
> to do since this does not alter any existing documented behaviour.

I looked at the patch. It is true that there is no need to change the
documentation. Tests also pass (but maybe some changes would be
needed).

I have a couple comments:

> tuple = SearchSysCacheCopyAttName(RelationGetRelid(rel), colName);

I think it is necessary to release the heap tuple using heap_freetuple()
if it is valid, once all the work is done.
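
A sketch of the suggested cleanup (the error message and surrounding code
are illustrative, not the patch's actual code):

	HeapTuple	tuple;

	tuple = SearchSysCacheCopyAttName(RelationGetRelid(rel), colName);
	if (!HeapTupleIsValid(tuple))
		ereport(ERROR,
				(errcode(ERRCODE_UNDEFINED_COLUMN),
				 errmsg("column \"%s\" of relation \"%s\" does not exist",
						colName, RelationGetRelationName(rel))));

	/* ... inspect the attribute here ... */

	heap_freetuple(tuple);		/* release the copied tuple once done */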

Second comment is related with several subcommands (ALTER COLUMN
DEFAULT, SET NOT NULL, SET/RESET (options)). The following query fails
with the patch:

=# alter table test
   alter non_existing set not null,
   add non_existing text;

It raises the error 'column "non_existing" of relation "test" does not
exist'. But without the patch the query is executed without errors. It
is because of how Phase 2 is performed. Subcommands are executed in a pass
determined by subcommand type. AT_PASS_ADD_COL subcommands are executed
before AT_PASS_ADD_INDEX, AT_PASS_ADD_CONSTR and AT_PASS_MISC.

I'm not sure how important it is. But I think it could break backward
compatibility.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: Small fixes about backup history file in doc and pg_standby

2018-06-26 Thread Fujii Masao
On Tue, Jun 26, 2018 at 5:47 PM, Yugo Nagata  wrote:
> Hi,
>
> While looking into the backup and recovery code, I found small documentation 
> bugs.
> The documentation says that the backup history files can be requested for
> recovery,
> but they have not been used by the system or requested anymore since PG 9.0
> (commit 06f82b29616cd9effcaefd99c6b6e2e80697482f) and will never be requested.
>
> Attached patch (doc_backup_history_file.patch) corrects the description about 
> this.

Pushed. Thanks!

> In addition, the current pg_standby can still handle a backup history file
> that is
> never requested. It is harmless but unnecessary code. Another attached patch
> (pg_standby.patch) removes this part of code.

Since this is not a bug fix, let's discuss this in the 12dev cycle.

Regards,

-- 
Fujii Masao



Re: "wal receiver" process hang in syslog() while exiting after receiving SIGTERM while the postgres has been promoted.

2018-06-26 Thread Robert Haas
On Tue, Jun 26, 2018 at 4:12 AM, Chen, Yan-Jack (NSB - CN/Hangzhou)
 wrote:
> Hi,
>   Well, if you agree with do not write log in signal handling function in any 
> circumstance

I don't understand this sentence.

> I see that in many cases in PostgreSQL the signal handling function just sets
> one flag, which triggers the main process to handle the rest.

Right, that's a good way to do it.
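
For reference, a minimal sketch of that pattern in a backend context (the
names are illustrative, and a real handler also saves and restores errno,
as the walreceiver code does):

static volatile sig_atomic_t got_SIGTERM = false;

/* handler: record the fact and wake the main loop; do nothing else here */
static void
MySigtermHandler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	got_SIGTERM = true;
	SetLatch(MyLatch);

	errno = save_errno;
}

/* the main loop, outside signal context, reacts to the flag: */
/*
 * for (;;)
 * {
 *     if (got_SIGTERM)
 *         proc_exit(0);
 *     ... wait on the latch and do the real work ...
 * }
 */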

>   How about simply remove the code lines?
>
> --- walreceiver_old.c
> +++ walreceiver.c
> @@ -816,10 +816,6 @@
>
> SetLatch(&WalRcv->latch);
>
> -   /* Don't joggle the elbow of proc_exit */
> -   if (!proc_exit_inprogress && WalRcvImmediateInterruptOK)
> -   ProcessWalRcvInterrupts();
> -
> errno = save_errno;
>  }

Hmm, I doubt it's that simple but I haven't studied this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread David G. Johnston
On Tuesday, June 26, 2018, Pavel Stehule  wrote:

> 2018-06-26 17:23 GMT+02:00 Peter Moser :
>
>> Hi,
>> I want to delete a table X, that may not exist, hence I execute
>>
>> DROP TABLE IF EXISTS X;
>>
>> However, if X is a view, I get an error
>>
>> ERROR: "X" is not a table
>>
> HINT: Use DROP VIEW to remove a view.
>> SQL state: 42809
>>
>> That is unexpected and also difficult to handle
>>
>
> DROP TABLE should remove a table and nothing else, just as DROP VIEW should
> drop just a view and nothing else. It is a safeguard.
>

Peter isn't asking for drop table to drop a view though, he's asking for
the documented behavior:

"Do not throw an error if the table does not exist. A notice is issued in
this case."

There is no table named X in the database, so the command should be a no-op
with a notice.  I would concur, though I'm open to just fixing it in v12
and back-patching a documentation bug fix stating the exception, due to
relations sharing a namespace while there is no corresponding shared-
namespace "drop relation" command.

David J.


Re: ssl_library parameter

2018-06-26 Thread Tom Lane
Peter Eisentraut  writes:
> Extracted from the GnuTLS thread/patch, here is a patch to add a
> server-side read-only parameter ssl_library, which currently reports
> either 'OpenSSL' or an empty string, depending on what SSL library was
> built with.  This is analogous to the libpq function call
> PQsslAttribute(conn, "library"), but there was no equivalent
> functionality on the server side.

(1) I'm not really clear why we need this.  GUC variables aren't free.

(2) Are there security issues with exposing this info to everybody?

regards, tom lane



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Peter Moser

On 06/26/2018 05:34 PM, Pavel Stehule wrote:
2018-06-26 17:23 GMT+02:00 Peter Moser :

What do you think, is it worth creating a patch to solve this
issue, where a DROP TABLE does not fail if the given name is
actually a VIEW or vice-versa?


DROP TABLE should remove a table and nothing else, just as DROP VIEW
should drop just a view and nothing else. It is a safeguard.


My last sentence is misleading. What I thought is:

DROP TABLE X;

Should not delete a view X, and should result in an error only if table X
does not exist. It should not look at views for error handling, maybe just
as a hint that there is a view X and we might use DROP VIEW X instead.


DROP TABLE IF EXISTS X;

Should also not delete a view X, and not result in any error if X is a 
view and not a table.


Hopefully I explained things better now.

Cheers,
Peter



Re: Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Pavel Stehule
2018-06-26 17:23 GMT+02:00 Peter Moser :

> Hi,
> I want to delete a table X, that may not exist, hence I execute
>
> DROP TABLE IF EXISTS X;
>
> However, if X is a view, I get an error
>
> ERROR: "X" is not a table
>
HINT: Use DROP VIEW to remove a view.
> SQL state: 42809
>
> That is unexpected and also difficult to handle, if I want to be sure that
> I can delete all old tables *and* views called X, and create a new
> view for instance with
>
> CREATE VIEW X AS 
>
> I cannot do that safely, because having both DROP commands would for sure
> cause an error and therefore a rollback.
>
> What do you think, is it worth creating a patch to solve this issue,
> where a DROP TABLE does not fail if the given name is actually a VIEW or
> vice-versa?
>

DROP TABLE should remove a table and nothing else, just as DROP VIEW should
drop just a view and nothing else. It is a safeguard.

What is hard about code like this?

do $$
declare
  r record;
begin
  for r in select table_name,
                  case table_type
                    when 'BASE TABLE' then 'table'
                    when 'VIEW' then 'view'
                  end as tp
             from information_schema.tables
            where table_type in ('BASE TABLE', 'VIEW')
              and table_name = 'foo'
  loop
    raise notice '%', format('drop %s %I', r.tp, r.table_name);
  end loop;
end $$;

Replace the RAISE NOTICE with EXECUTE if you really want to drop some objects.

Regards

Pavel

>
> Best regards,
> Peter
>
>


Re: [GSoC] working status

2018-06-26 Thread Charles Cui
got it, will start to use these tools and add more comments.

On Tue, Jun 26, 2018, 7:44 AM Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Hello Charles,
>
> >Here is my current working status. Resolved all warnings found by
> > Aleksander previously. Having two threads in parallel. One is the
> > thrift binary type implementation, the other is the thrift compact byte
> > interface implementation. For these two threads, simple data types have
> > been implemented and tested, but I still need time to focus on
> > complicated data types like struct, map or list. Let me know if you
> > have any questions. Here is the repo with latest updates,
> > https://github.com/charles-cui/pg_thrift
>
> Thanks for keeping us informed!
>
> I noticed that there are not many comments in your code. It's
> considered good practice to write at least brief comments for every
> procedure with 1) short description of what it does 2) what every
> argument means 3) what the procedure returns 4) whether it changes
> global state somehow (better avoid it). However, there is no strict
> requirement to write these comments, especially if there are many
> similar procedures and (1-4) are obvious.
>
> Also I advise to check your code with following tools unless you've
> already done this:
>
> * lcov will show whether the code is well covered by tests. If some
>   part of the code is not covered, there is probably a test missing or maybe
>   the code is dead and should be removed.
> * Clang Static Analyzer may find some errors that are difficult to
>   notice with a naked eye.
> * Valgrind solves the same issue, but unlike Clang Static Analyzer
>   it checks your code not statically but in runtime.
>
> There are plenty of corresponding tutorials online. I wrote a few
> myself [1][2][3]. However, although they should be readable through
> Google Translate, I guess you may prefer tutorials in English.
>
> [1]: https://eax.me/c-code-coverage/
> [2]: https://eax.me/c-static-analysis/
> [3]: https://eax.me/valgrind/
>
> --
> Best regards,
> Aleksander Alekseev
>


Unexpected behavior of DROP VIEW/TABLE IF EXISTS

2018-06-26 Thread Peter Moser

Hi,
I want to delete a table X, that may not exist, hence I execute

DROP TABLE IF EXISTS X;

However, if X is a view, I get an error

ERROR: "X" is not a table
HINT: Use DROP VIEW to remove a view.
SQL state: 42809

That is unexpected and also difficult to handle, if I want to be sure 
that I can delete all old tables *and* views called X, and create a new

view for instance with

CREATE VIEW X AS 

I cannot do that safely, because having both DROP commands would for 
sure cause an error and therefore a rollback.
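
To make the failure concrete, this is the idempotent sequence one would
like to run; today whichever DROP matches the other relation kind errors
out and aborts the whole transaction (the view body here is a dummy, just
for illustration):

BEGIN;
DROP TABLE IF EXISTS X;   -- errors if X currently exists as a view
DROP VIEW IF EXISTS X;    -- errors if X currently exists as a table
CREATE VIEW X AS SELECT 1 AS dummy;
COMMIT;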


What do you think, is it worth creating a patch to solve this issue,
where a DROP TABLE does not fail if the given name is actually a VIEW
or vice-versa?


Best regards,
Peter



Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Amit Kapila
On Tue, Jun 26, 2018 at 7:02 PM, Ashutosh Bapat
 wrote:
> On Tue, Jun 26, 2018 at 6:18 PM, Amit Kapila  wrote:
>> On Tue, Jun 26, 2018 at 4:33 PM, Ashutosh Bapat
>>  wrote:
>>> Looks like we need similar adjustment in ExecSimpleRelationUpdate() as
>>> well. Updated the patch.
>>>
>>
>> - /* Store the slot into tuple that we can write. */
>> + /* Materialize slot into a tuple that we can inspect. */
>>   tuple = ExecMaterializeSlot(slot);
>>
>> I think it is better to keep the last word of the sentence as "write"
>> instead of "inspect" as was in the original sentence.
>
> A copy-pasto while correcting a typo :)
>
>> It makes more
>> sense as we are materializing the tuple to write it.  Similarly, in
>> the other change in the patch can use "write".
>
> Are you suggesting that we should use "write" in the modified comment
> instead of "inspect" in original comment.
>

Yes.

Another variant used in few other similar places is:

/*
* get the heap tuple out of the tuple table slot, making sure we have a
* writable copy
*/


> Ok, I have now corrected grammar as well. Here's updated patch.
>

Your new comment is also correct.  I like the comment already used in the
code, as that makes the code consistent; what do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused

2018-06-26 Thread Alexander Kuzmenkov

Daniel,

I took a look at the patch. It applies and compiles, the tests pass.

Some thoughts about the code:

* Postgres lists cache their lengths, so you don't need uniqueLen.

* Use an array of WindowClauseSortNode's instead of a list. It's more 
suitable here because you are going to sort (qsort_list creates a 
temporary array).


* Reversing the sorted list looks more confusing to me than just sorting 
it in the proper order right away. A qsort() comparator returning 
negative means "left goes before right", but here it is effectively the 
other way around.


* This isn't relevant given the previous points, but to reverse a list, 
you can walk it with foreach() and construct a reversed version with 
lcons().


* There is a function named make_pathkeys_for_window that makes a list 
of canonical pathkeys given a window clause. Using this function to give 
you the pathkeys, and then comparing them, would be more future-proof in 
case we ever start using hashing for windowing. Moreover, the canonical 
pathkeys can be compared with pointer comparison which is much faster 
than equal(). Still, I'm not sure whether it's going to be convenient to 
use in this case, or even whether it is a right thing to do. What do you 
think?


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: unexpected relkind: 73 ERROR with partition table index

2018-06-26 Thread Alvaro Herrera
On 2018-Jun-27, David Rowley wrote:

> On 27 June 2018 at 00:18, Rajkumar Raghuwanshi
>  wrote:
> > postgres=> ALTER INDEX part_idx RENAME TO part_idx_renamed;
> > ERROR:  unexpected relkind: 73
> 
> Seems to be caused by the auth failure code path in
> RangeVarCallbackForAlterRelation().

Ah, yeah, thanks.  No surprise this was missed, since I didn't add
tests for ALTER INDEX RENAME and Peter concurrently refactored the
handling code.  I propose we add a few more test lines, as attached.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 4369b4aaea349d427a4924df74e387eea33aad0b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 26 Jun 2018 10:45:38 -0400
Subject: [PATCH] fix get_relkind_objtype (introduced by
 8b9e9644dc6a9bd4b7a97950e6212f63880cf18b)

---
 src/backend/catalog/objectaddress.c  |  1 +
 src/test/regress/expected/alter_table.out| 18 ++
 src/test/regress/expected/object_address.out | 11 +--
 src/test/regress/sql/alter_table.sql | 15 +++
 src/test/regress/sql/object_address.sql  |  5 +
 5 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/objectaddress.c 
b/src/backend/catalog/objectaddress.c
index ad682673e6..7db942dcba 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -5248,6 +5248,7 @@ get_relkind_objtype(char relkind)
case RELKIND_PARTITIONED_TABLE:
return OBJECT_TABLE;
case RELKIND_INDEX:
+   case RELKIND_PARTITIONED_INDEX:
return OBJECT_INDEX;
case RELKIND_SEQUENCE:
return OBJECT_SEQUENCE;
diff --git a/src/test/regress/expected/alter_table.out 
b/src/test/regress/expected/alter_table.out
index b9fd6d1d1c..df604a326c 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -159,6 +159,24 @@ SELECT * FROM attmp_new2;
 
 DROP TABLE attmp_new;
 DROP TABLE attmp_new2;
+-- check rename of partitioned tables and indexes also
+CREATE TABLE part_attmp (a int primary key) partition by range (a);
+CREATE TABLE part_attmp1 PARTITION OF part_attmp FOR VALUES FROM (0) TO (100);
+ALTER INDEX part_attmp_pkey RENAME TO part_attmp_index;
+ALTER INDEX part_attmp1_pkey RENAME TO part_attmp1_index;
+ALTER TABLE part_attmp RENAME TO part_at2tmp;
+ALTER TABLE part_attmp1 RENAME TO part_at2tmp1;
+SET ROLE regress_alter_table_user1;
+ALTER INDEX part_attmp_index RENAME TO fail;
+ERROR:  must be owner of index part_attmp_index
+ALTER INDEX part_attmp1_index RENAME TO fail;
+ERROR:  must be owner of index part_attmp1_index
+ALTER TABLE part_at2tmp RENAME TO fail;
+ERROR:  must be owner of table part_at2tmp
+ALTER TABLE part_at2tmp1 RENAME TO fail;
+ERROR:  must be owner of table part_at2tmp1
+RESET ROLE;
+DROP TABLE part_at2tmp;
 --
 -- check renaming to a table's array type's autogenerated name
 -- (the array type's name should get out of the way)
diff --git a/src/test/regress/expected/object_address.out 
b/src/test/regress/expected/object_address.out
index d195a0d700..4085e451e4 100644
--- a/src/test/regress/expected/object_address.out
+++ b/src/test/regress/expected/object_address.out
@@ -19,6 +19,9 @@ CREATE TEXT SEARCH PARSER addr_ts_prs
 CREATE TABLE addr_nsp.gentable (
a serial primary key CONSTRAINT a_chk CHECK (a > 0),
b text DEFAULT 'hello');
+CREATE TABLE addr_nsp.parttable (
+   a int PRIMARY KEY
+) PARTITION BY RANGE (a);
 CREATE VIEW addr_nsp.genview AS SELECT * from addr_nsp.gentable;
 CREATE MATERIALIZED VIEW addr_nsp.genmatview AS SELECT * FROM 
addr_nsp.gentable;
 CREATE TYPE addr_nsp.gencomptype AS (a int);
@@ -368,7 +371,9 @@ ERROR:  name list length must be exactly 1
 -- test successful cases
 WITH objects (type, name, args) AS (VALUES
('table', '{addr_nsp, gentable}'::text[], 
'{}'::text[]),
+   ('table', '{addr_nsp, parttable}'::text[], 
'{}'::text[]),
('index', '{addr_nsp, gentable_pkey}', '{}'),
+   ('index', '{addr_nsp, parttable_pkey}', '{}'),
('sequence', '{addr_nsp, gentable_a_seq}', 
'{}'),
-- toast table
('view', '{addr_nsp, genview}', '{}'),
@@ -444,6 +449,8 @@ SELECT (pg_identify_object(addr1.classid, addr1.objid, 
addr1.objsubid)).*,
  table | addr_nsp   | gentable  | 
addr_nsp.gentable| t
  table column  | addr_nsp   | gentable  | 
addr_nsp.gentable.b  | t
  index | addr_nsp   | gentable_pkey | 
addr_nsp.gentable_pkey   | t
+ ta

Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-26 Thread Thomas Reiss



On 26/06/2018 at 16:43, Alvaro Herrera wrote:
> On 2018-Jun-25, Tom Lane wrote:
> 
>> Alvaro Herrera  writes:
>>> On 2018-Jun-18, David Rowley wrote:
 I've attached a patch which cleans up my earlier version and moves the
 setup of the append_rel_array into its own function instead of
 sneaking code into setup_simple_rel_arrays(). I've also now updated
 the comment above find_childrel_appendrelinfo(), which is now an
 unused function.
>>
>>> I checked that this patch fixes the originally reported performance
>>> regression.
>>> Unless there are objections, I intend to push this patch tomorrow.
>>
>> If find_childrel_appendrelinfo is now unused, we should remove it.
> 
> Agreed -- thanks for following up.  Pushed that way.

Thanks Alvaro, Ashutosh and David.




Re: [GSoC] working status

2018-06-26 Thread Aleksander Alekseev
Hello Charles,

>Here is my current working status. Resolved all warnings found by
> Aleksander previously. I have two threads of work in parallel: one is
> the thrift binary type implementation, the other is the thrift compact
> byte interface implementation. For both, simple data types have been
> implemented and tested, but I still need time to focus on complicated
> data types like struct, map or list. Let me know if you have any
> questions. Here is the repo with the latest updates:
> https://github.com/charles-cui/pg_thrift

Thanks for keeping us informed!

I noticed that there are not many comments in your code. It's
considered good practice to write at least a brief comment for every
procedure with 1) a short description of what it does, 2) what every
argument means, 3) what the procedure returns, and 4) whether it
changes global state somehow (better to avoid that). However, there is
no strict requirement to write these comments, especially if there are
many similar procedures and (1-4) are obvious.
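
For example, a header comment along these lines covers (1-4) (the
procedure below is hypothetical, made up only to illustrate the
convention):

/*
 * pg_thrift_decode_bool
 *      Decode a Thrift-encoded boolean from "data".
 *
 * "data" points to the raw bytes and "len" is the number of bytes
 * available.  Returns the decoded value, or raises an error if the
 * buffer is too short.  Does not change any global state.
 */
static bool
pg_thrift_decode_bool(const uint8 *data, Size len)
{
    if (len < 1)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                 errmsg("unexpected end of thrift data")));

    return data[0] != 0;
}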

Also I advise to check your code with following tools unless you've
already done this:

* lcov will show whether the code is covered with tests well. If some
  part of code is not covered there is probably a test missing or maybe
  the code is dead and should be removed.
* Clang Static Analyzer may find some errors that are difficult to
  notice with the naked eye.
* Valgrind solves the same issue, but unlike Clang Static Analyzer
  it checks your code not statically but in runtime.

There are plenty of corresponding tutorials online. I wrote a few
myself [1][2][3], though not in English; they should be readable
through Google Translate, but I guess you may prefer tutorials written
in English.

[1]: https://eax.me/c-code-coverage/
[2]: https://eax.me/c-static-analysis/
[3]: https://eax.me/valgrind/

-- 
Best regards,
Aleksander Alekseev



Re: Performance regression with PostgreSQL 11 and partitioning

2018-06-26 Thread Alvaro Herrera
On 2018-Jun-25, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2018-Jun-18, David Rowley wrote:
> >> I've attached a patch which cleans up my earlier version and moves the
> >> setup of the append_rel_array into its own function instead of
> >> sneaking code into setup_simple_rel_arrays(). I've also now updated
> >> the comment above find_childrel_appendrelinfo(), which is now an
> >> unused function.
> 
> > I checked that this patch fixes the originally reported performance
> > regression.
> > Unless there are objections, I intend to push this patch tomorrow.
> 
> If find_childrel_appendrelinfo is now unused, we should remove it.

Agreed -- thanks for following up.  Pushed that way.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Few cosmetic suggestions for commit 16828d5c (Fast Alter Table Add Column...)

2018-06-26 Thread Alvaro Herrera
On 2018-Jun-26, Amit Kapila wrote:

> Andrew, Alvaro, do you think we can go ahead with above naming
> suggestions or do we want to brainstorm more on it?

Looks good to me in a quick skim.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-06-26 Thread Arseny Sher


Alvaro Herrera  writes:

> Firstly -- this is top-notch detective work, kudos and thanks for the
> patch and test cases.  (I verified that both fail before the code fix.)

Thank you!

> Here's a v3.  I applied a lot of makeup in order to try to understand
> what's going on.  I *think* I have a grasp on the original code and your
> bugfix, not terribly firm I admit.

Well, when I started digging into it, I found the logical decoding code
somewhat underdocumented. If you or any other committer would consider
a patch that improves the comments, I can prepare one.

> * you don't need to Assert that things are not NULL if you're
>   immediately going to dereference them.

Indeed.

> * I think setting snapshot_base to NULL in ReorderBufferCleanupTXN is
>   pointless, since the struct is gonna be freed shortly afterwards.

Yeah, but it is the general style of the original code, e.g. see the
/* cosmetic... */
comments. It probably makes the picture a bit more aesthetic and
consistent, a kind of final chord, though I don't mind removing it.

> * I rewrote many comments (both existing and some of the ones your patch
>   adds), and added lots of comments where there were none.

Now the code is nicer, thanks.

> * I'm a bit unsure about the comment atop ReorderBufferSetBaseSnapshot.
>   Obviously, the bit within the #if 0/#endif I'm going to remove before
>   push.

It looks like you've started editing that bit and didn't finish.

>   I don't understand why it says "Needs to be called before any
>   changes are added with ReorderBufferQueueChange"; but if you edit that
>   function and add an assert that the base snapshot is set, it crashes
>   pretty quickly in the test_decoding tests.  (The supposedly bogus
>   comment was there before your patch -- I'm not saying your comment
>   addition is faulty.)

That's because REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID changes are
queued whenever we see that the xact has modified the catalog,
regardless of whether a base snapshot exists. Even if there are no
changes yet, we will need it for correct visibility of the catalog, so
I am inclined to remove the assert and either drop the comment or
rephrase it as 'any *data-carrying* changes'.

> * I also noticed that we're doing subtxn cleanup one by one in both
>   ReorderBufferAssignChild and ReorderBufferCommitChild, which means the
>   top-level txn is sought in the hash table over and over, which seems a
>   bit silly.  Not this patch's problem to fix ...

There is already a one-entry cache in ReorderBufferTXNByXid. We could
add a 'don't cache me' flag to it and use it with subxacts, or simply
pull the top-level reorderbuffer lookup out of the loops.

I'm fine with the rest of your edits. One more little comment:

@@ -543,9 +546,10 @@ ReorderBufferQueueChange(ReorderBuffer *rb, TransactionId 
xid, XLogRecPtr lsn,
ReorderBufferTXN *txn;

txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
+   Assert(txn->base_snapshot != NULL || txn->is_known_as_subxact);

change->lsn = lsn;
-   Assert(InvalidXLogRecPtr != lsn);
+   Assert(!XLogRecPtrIsInvalid(lsn));
dlist_push_tail(&txn->changes, &change->node);
txn->nentries++;
txn->nentries_mem++;

Since we do that, should we perhaps replace all LSN validity checks
with XLogRecPtrIsInvalid()?

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



patch to allow disable of WAL recycling

2018-06-26 Thread Jerry Jelinek
Hello All,

Attached is a patch to provide an option to disable WAL recycling. We have
found that this can help performance by eliminating read-modify-write
behavior on old WAL files that are no longer resident in the filesystem
cache. There is a lot more detail on the background and motivation for
this in the following thread.

https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com#cacukrjo7djvub8e2aijoayj8bfkk3xxbtwu3kkaritr67m3...@mail.gmail.com
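
To sketch the shape of the change (a standalone toy program; all names
here are hypothetical and the attached patch is authoritative), the
idea is to make the recycle-vs-unlink decision conditional on a boolean
setting:

#include <stdbool.h>
#include <stdio.h>

/* hypothetical GUC: when false, old segments are removed, never recycled */
static bool wal_recycle = true;

/* stand-in for the rename-into-place recycling done in xlog.c */
static bool
recycle_segment(const char *path)
{
    printf("recycling %s as a future WAL segment\n", path);
    return true;
}

static void
remove_old_segment(const char *path, bool want_more_segments)
{
    if (wal_recycle && want_more_segments && recycle_segment(path))
        return;     /* old file kept for reuse: cheap, but risks the
                     * read-modify-write behavior described above */
    printf("unlinking %s\n", path);     /* unlink(path) in the real code;
                                         * new segments get fresh writes */
}

int
main(void)
{
    wal_recycle = false;        /* the behavior such an option enables */
    remove_old_segment("000100000001000000A1", true);
    return 0;
}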

A similar change has been tested against our 9.6 branch that we're
currently running, but the attached patch is against master.

Thanks,
Jerry


0001-option-to-disable-WAL-recycling.patch
Description: Binary data


Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Ashutosh Bapat
On Tue, Jun 26, 2018 at 6:18 PM, Amit Kapila  wrote:
> On Tue, Jun 26, 2018 at 4:33 PM, Ashutosh Bapat
>  wrote:
>> Looks like we need similar adjustment in ExecSimpleRelationUpdate() as
>> well. Updated the patch.
>>
>
> - /* Store the slot into tuple that we can write. */
> + /* Materialize slot into a tuple that we can inspect. */
>   tuple = ExecMaterializeSlot(slot);
>
> I think it is better to keep the last word of the sentence as "write"
> instead of "inspect" as was in the original sentence.

A copy-pasto while correcting a typo :)

> It makes more
> sense as we are materializing the tuple to write it.  Similarly, in
> the other change in the patch can use "write".

Are you suggesting that we should use "write" in the modified comment
instead of "inspect" from the original comment?

Ok, I have now corrected the grammar as well. Here's the updated patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From fd22bc6d7d786680eb490a99da136a5dc8ba2c1b Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Tue, 26 Jun 2018 15:02:07 +0530
Subject: [PATCH 2/2] Fix a thinko/typo in ExecSimpleRelationInsert/Update

A slot cannot be stored into a tuple; it's the other way around.
---
 src/backend/executor/execReplication.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 4fbdfc0..74b3199 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -415,7 +415,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		if (rel->rd_att->constr)
 			ExecConstraints(resultRelInfo, slot, estate, true);
 
-		/* Store the slot into tuple that we can inspect. */
+		/* Materialize slot into a tuple that we can scribble upon. */
 		tuple = ExecMaterializeSlot(slot);
 
 		/* OK, store the tuple and create index entries for it */
@@ -480,7 +480,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		if (rel->rd_att->constr)
 			ExecConstraints(resultRelInfo, slot, estate, true);
 
-		/* Store the slot into tuple that we can write. */
+		/* Materialize slot into a tuple that we can scribble upon. */
 		tuple = ExecMaterializeSlot(slot);
 
 		/* OK, update the tuple and index entries for it */
-- 
2.7.4



Re: Online enabling of checksums

2018-06-26 Thread Sergei Kornilov
Hello

I tried build this patch and got error during make docs
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-limit"
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-delay"

Both new GUCs, checksumhelper_cost_delay and checksumhelper_cost_limit, are 
mentioned in postgresql.conf with the special value -1 (-1 to use 
vacuum_cost_limit), but this value is not mentioned in the docs. I noticed 
that the code and the documentation describe different defaults.
Also I found one "in progress" in the pg_enable_data_checksums() 
description. In other places the status is called "inprogress" (without a space).

>   VacuumPageHit = 0;
>   VacuumPageMiss = 0;
>   VacuumPageDirty = 0;
Hm, why are these settings set to 0 in the checksumhelper process?

> /*
> * Force a checkpoint to get everything out to disk. XXX: this should
> * probably not be an IMMEDIATE checkpoint, but leave it there for now for
> * testing
> */
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
We should not forget about that.

regards, Sergei



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Alexander Korotkov
On Tue, Jun 26, 2018 at 4:11 PM Darafei "Komяpa" Praliaskouski
 wrote:
> On Tue, Jun 26, 2018 at 15:42, Alexander Korotkov wrote:
>>
>> On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada  
>> wrote:
>> > On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
>> >  wrote:
>> > > So, I propose to just
>> > > increase maximum value for both GUC and reloption.  See the attached
>> > > patch.  It also changes calculations _bt_vacuum_needs_cleanup() for
>> > > better handling of large values (just some kind of overflow paranoia).
>> >
>> > The patch looks good to me.
>>
>> Pushed, thanks!
>
>
> Thank you for the enhancement. Now Index Only Scans over Append-Only tables 
> in Postgres  can be implemented, even if it requires manual kicking of VACUUM 
> over large table, and that's a great enhancement for moving object databases. 
> :)
>
> My eye catches another thing, the error message in tests is:
>
> DETAIL:  Valid values are between "0.00" and 
> "179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00".
>
> a) do we really need to print digits of dblmax? "Valid values are double 
> precision, non-negative"?
> b) double precision binary-to-decimal noise starts at 16th digit. Why does it 
> stop at the point, and we have precise ".00"? Does it bite the conversion 
> somewhere else too?

Thank you for pointing that out.  I'm proposing to change the output
format from "%f" to "%g" [1] [2].  It looks better and matches what we
do for GUCs.

1. 
https://www.postgresql.org/message-id/CAPpHfdvewmr4PcpRjrkstoNn1n2_6dL-iHRB21CCfZ0efZdBTg%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAPpHfdsFBbJAd24F0b_o9TfTtu%2B%2BjH0bR5XS_e9xbSwk8SJwyQ%40mail.gmail.com
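
For comparison, a tiny standalone illustration of the two formats:

#include <float.h>
#include <stdio.h>

int
main(void)
{
    printf("%f\n", DBL_MAX);    /* the long digit string from the DETAIL */
    printf("%g\n", DBL_MAX);    /* 1.79769e+308 */
    return 0;
}

(This also answers question b: DBL_MAX is an integer-valued binary
double, and "%f" prints the exact stored value, at least with common
libc implementations, so the fractional part is exactly zero rather
than conversion noise.)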

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Komяpa
On Tue, Jun 26, 2018 at 15:42, Alexander Korotkov wrote:

> On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada 
> wrote:
> > On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
> >  wrote:
> > > So, I propose to just
> > > increase maximum value for both GUC and reloption.  See the attached
> > > patch.  It also changes calculations _bt_vacuum_needs_cleanup() for
> > > better handling of large values (just some kind of overflow paranoia).
> >
> > The patch looks good to me.
>
> Pushed, thanks!
>

Thank you for the enhancement. Now Index Only Scans over Append-Only
tables in Postgres can be implemented, even if it requires manually
kicking off VACUUM over a large table, and that's a great enhancement
for moving-object databases. :)

My eye catches another thing, the error message in tests is:

DETAIL:  Valid values are between "0.00" and
"179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00".

a) do we really need to print the digits of DBL_MAX? "Valid values are
double precision, non-negative"?
b) double precision binary-to-decimal noise starts at the 16th digit.
Why does it stop at the decimal point, so that we get a precise ".00"?
Does it bite the conversion somewhere else too?


Re: unexpected relkind: 73 ERROR with partition table index

2018-06-26 Thread David Rowley
On 27 June 2018 at 00:18, Rajkumar Raghuwanshi
 wrote:
> postgres=> ALTER INDEX part_idx RENAME TO part_idx_renamed;
> ERROR:  unexpected relkind: 73

Seems to be caused by the auth failure code path in
RangeVarCallbackForAlterRelation().  (Relkind 73 is ASCII 'I', i.e.
RELKIND_PARTITIONED_INDEX, which get_relkind_objtype() didn't know
about.)

Patch attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


teach_get_relkind_objtype_about_partitioned_indexes.patch
Description: Binary data


Re: [PROPOSAL] Nepali Snowball dictionary

2018-06-26 Thread Arthur Zakirov
Hello all,

On Wed, Feb 28, 2018 at 11:16:24AM +0300, Arthur Zakirov wrote:
> I've sent a pull request with the Nepali snowball algorithm to
> https://github.com/snowballstem [1]. They aren't against the patch.
> 
> They haven't merged it yet, though. There are some problems with
> continuous testing via Travis CI which aren't related to the patch and
> require fixing some scripts.

The patch with the Nepali stemmer has been applied to the Snowball
upstream [1].

I attached the second version of the patch. The Nepali stemmer was
generated by the latest Snowball compiler (commit
3c3d3953e8174b55e86aedd89544ea4e5d76db78).

I will add a new commitfest entry.

Authors:
- Ingroj Shrestha, Nepali NLP Group
- Oleg Bartunov, Postgres Professional Ltd.
- Shreeya Singh Dhakal, Nepali NLP Group

1 - https://github.com/snowballstem/snowball/pull/70

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 8075ea94e7..61280b03fc 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -3803,6 +3803,7 @@ Parser: "pg_catalog.default"
  pg_catalog | german_stem | snowball stemmer for german language
  pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
  pg_catalog | italian_stem| snowball stemmer for italian language
+ pg_catalog | nepali_stem | snowball stemmer for nepali language
  pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
  pg_catalog | portuguese_stem | snowball stemmer for portuguese language
  pg_catalog | romanian_stem   | snowball stemmer for romanian language
diff --git a/src/backend/snowball/Makefile b/src/backend/snowball/Makefile
index 50cbace41d..c29f4184f2 100644
--- a/src/backend/snowball/Makefile
+++ b/src/backend/snowball/Makefile
@@ -40,6 +40,7 @@ OBJS= $(WIN32RES) dict_snowball.o api.o utilities.o \
stem_UTF_8_german.o \
stem_UTF_8_hungarian.o \
stem_UTF_8_italian.o \
+   stem_UTF_8_nepali.o \
stem_UTF_8_norwegian.o \
stem_UTF_8_porter.o \
stem_UTF_8_portuguese.o \
@@ -62,6 +63,7 @@ LANGUAGES=  \
german  german  \
hungarian   hungarian   \
italian italian \
+   nepali  nepali  \
norwegian   norwegian   \
portuguese  portuguese  \
romanianromanian\
diff --git a/src/backend/snowball/dict_snowball.c 
b/src/backend/snowball/dict_snowball.c
index 78c9f73ef0..d96c849118 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -49,6 +49,7 @@
 #include "snowball/libstemmer/stem_UTF_8_german.h"
 #include "snowball/libstemmer/stem_UTF_8_hungarian.h"
 #include "snowball/libstemmer/stem_UTF_8_italian.h"
+#include "snowball/libstemmer/stem_UTF_8_nepali.h"
 #include "snowball/libstemmer/stem_UTF_8_norwegian.h"
 #include "snowball/libstemmer/stem_UTF_8_porter.h"
 #include "snowball/libstemmer/stem_UTF_8_portuguese.h"
@@ -102,6 +103,7 @@ static const stemmer_module stemmer_modules[] =
{"german", PG_UTF8, german_UTF_8_create_env, german_UTF_8_close_env, 
german_UTF_8_stem},
{"hungarian", PG_UTF8, hungarian_UTF_8_create_env, 
hungarian_UTF_8_close_env, hungarian_UTF_8_stem},
{"italian", PG_UTF8, italian_UTF_8_create_env, italian_UTF_8_close_env, 
italian_UTF_8_stem},
+   {"nepali", PG_UTF8, nepali_UTF_8_create_env, nepali_UTF_8_close_env, 
nepali_UTF_8_stem},
{"norwegian", PG_UTF8, norwegian_UTF_8_create_env, 
norwegian_UTF_8_close_env, norwegian_UTF_8_stem},
{"porter", PG_UTF8, porter_UTF_8_create_env, porter_UTF_8_close_env, 
porter_UTF_8_stem},
{"portuguese", PG_UTF8, portuguese_UTF_8_create_env, 
portuguese_UTF_8_close_env, portuguese_UTF_8_stem},
diff --git a/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c 
b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c
new file mode 100644
index 00..d1c1be76f3
--- /dev/null
+++ b/src/backend/snowball/libstemmer/stem_UTF_8_nepali.c
@@ -0,0 +1,424 @@
+/* This file was generated automatically by the Snowball to ISO C compiler */
+/* http://snowballstem.org/ */
+
+#include "header.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+extern int nepali_UTF_8_stem(struct SN_env * z);
+#ifdef __cplusplus
+}
+#endif
+static int r_remove_category_3(struct SN_env * z);
+static int r_remove_category_2(struct SN_env * z);
+static int r_check_category_2(struct SN_env * z);
+static int r_remove_category_1(struct SN_env * z);
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+extern struct SN_env * nepali_UTF_8_create_env(void);
+extern void nepali_UTF_8_close_env(struct SN_env * z);
+
+
+#ifdef __cplusplus
+}
+#endif
+static const symbol s_0_0[6] = { 0xE0, 0xA4, 0x95, 0xE0, 0xA5, 0x80 };
+static const symbol s_0_1[9] = { 0xE0, 0xA4, 0xB2, 0xE0, 0xA4, 0xBE, 0xE0, 
0xA4, 0x87 };
+static const symbol s_0_2[6] = { 0xE0, 0xA4, 0xB2, 0xE0, 0x

Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Amit Kapila
On Tue, Jun 26, 2018 at 4:33 PM, Ashutosh Bapat
 wrote:
> Looks like we need similar adjustment in ExecSimpleRelationUpdate() as
> well. Updated the patch.
>

- /* Store the slot into tuple that we can write. */
+ /* Materialize slot into a tuple that we can inspect. */
  tuple = ExecMaterializeSlot(slot);

I think it is better to keep the last word of the sentence as "write"
instead of "inspect", as it was in the original sentence.  It makes more
sense as we are materializing the tuple in order to write it.  Similarly,
the other change in the patch can use "write".

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Alexander Korotkov
On Tue, Jun 26, 2018 at 1:46 PM Masahiko Sawada  wrote:
> On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
>  wrote:
> > So, I propose to just
> > increase maximum value for both GUC and reloption.  See the attached
> > patch.  It also changes calculations _bt_vacuum_needs_cleanup() for
> > better handling of large values (just some kind of overflow paranoia).
>
> The patch looks good to me.

Pushed, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Concurrency bug in UPDATE of partition-key

2018-06-26 Thread Amit Kapila
On Tue, Jun 26, 2018 at 11:40 AM, Amit Khandekar  wrote:
> On 25 June 2018 at 17:20, Amit Kapila  wrote:
>> On Mon, Jun 25, 2018 at 3:06 PM, Amit Khandekar  
>> wrote:
>>> On 23 June 2018 at 15:46, Amit Kapila  wrote:
>

 Why do you need to update when newslot is NULL?  Already *epqslot is
 initialized as NULL in the caller (ExecUpdate). I think we only want
 to update it when trigtuple is not null.
>>>
>>> But GetTupleForTrigger() updates the epqslot to NULL even when it
>>> returns NULL. So to be consistent with it, we want to do the same
>>> thing for ExecBRDeleteTriggers() : Update the epqslot even when
>>> GetTupleForTrigger() returns NULL.
>>>
>>> I think the reason we are doing "*newSlot = NULL" as the very first
>>> thing in the GetTupleForTrigger() code, is so that callers don't have
>>> to initialise newSlot to NULL before calling GetTupleForTrigger. And
>>> so ExecBRUpdateTriggers() does not initialize newSlot with NULL. We
>>> can follow the same approach while calling ExecDelete().
>>>
>>
>> Yeah, we can do that if it is required.
>
> It is not required as such, and there is no correctness issue.
>
>> I see your point, but I feel that is making code bit less readable.
>
> I did it that way only to be consistent with the existing trigger.c
> code, namely ExecBRUpdateTriggers() where it sets newSlot to NULL
> immediately. If you find some appropriate comments to make that
> snippet more readable, that can work.
>
> Or else, we can go ahead with the way you did it in your patch, and
> additionally mention in the ExecBRDeleteTriggers() header comments
> that the epqslot value is undefined if there was no concurrently
> updated tuple. To me, explaining this last part seems confusing.
>

Yeah, so let's leave it as it is in the patch.  I think now we should
wait and see what Alvaro has to say about the overall patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Internal error XX000 with enable_partition_pruning=on, pg 11 beta1 on Debian

2018-06-26 Thread David Rowley
On 20 June 2018 at 13:20, David Rowley  wrote:
> select count(*) from (select * from p union all select * from p) p;
>
> Unpatched:
> tps = 8.530355 (excluding connections establishing)
>
> Patched:
> tps = 7.853939 (excluding connections establishing)

I've been thinking about this and I'm not so happy about my earlier
proposed fix of not flattening the Appends for UNION ALL parents
with partitioned relation children.  Apart from the additional
overhead of pulling all tuples through an additional Append node, I
also don't like it because one day we might decide to fix it and make
it flatten these again.  It would be much nicer not to fool around
with the plan shapes like that from one release to the next.

So, today I decided to write some code to just make the UNION ALL
parents work with partitioned tables while maintaining the Append
hierarchy flattening.

I've only just completed reading back through all the code and I think
it's correct.  I ended up changing add_paths_to_append_rel() so that
instead of performing concatenation on partitioned_rels from two UNION
ALL children, it creates a List of lists.  I believe this list can
only end up with a 2-level hierarchy of partitioned rels.  I tested
this and it seems to work as I expect, although I think I need to
study the code a bit more to ensure deeper nesting can't happen. I need
to check if there are cases where nested UNION ALLs cannot be flattened
to have a single UNION ALL parent.  Supporting this did cause me to
have to check the List->type in a few places (see the sketch below). I
only saw one other place in the code where this is done, so I figured
that meant it was okay.
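
To illustrate the List->type check (process_hierarchy is a made-up name
here; the attached patch has the real code), the list's own node tag
tells you whether the cells are RT indexes or sub-lists:

if (IsA(partitioned_rels, IntList))
{
    /* a single partition hierarchy: flat list of partitioned-rel RT indexes */
    process_hierarchy(partitioned_rels);
}
else
{
    ListCell   *lc;

    /* UNION ALL case: one IntList per partition hierarchy */
    Assert(IsA(partitioned_rels, List));
    foreach(lc, partitioned_rels)
        process_hierarchy((List *) lfirst(lc));
}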

On the executor side, I didn't add any pre-calculation code for each
partition hierarchy. I did this previously to save having to re-prune
on individual partitions, but I figured that was at a good enough
level not to have to add it for each partition hierarchy.  I imagine
most partition hierarchies will just contain a single partitioned
table anyway, so an additional level of caching might not buy very
much, but I might just be trying to convince myself of that because
it's late here now.

Anyway... Patch attached.  This is method 3 of the 3 methods I thought
to fix this, so if this is not suitable then I'm out of ideas.

It would be great to get some feedback on this as I'd really like to
be done with it before July.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v1-0001-Fix-run-time-partition-pruning-for-UNION-ALL-pare.patch
Description: Binary data


unexpected relkind: 73 ERROR with partition table index

2018-06-26 Thread Rajkumar Raghuwanshi
Hi,

I am getting "ERROR:  unexpected relkind: 73" when trying to rename
partition table index with below test case.

create user u1 superuser;
create user u2 nosuperuser login;
\c postgres u1

create table public.part(c1 int, c2 int) partition by range(c1);
create table public.part_p1 partition of public.part for values from
(minvalue) to (0);
create table public.part_p2 partition of public.part for values from (0) to
(maxvalue);
create index part_idx on public.part(c1);

create table public.nopart (c1 int, c2 int);
create index nopart_idx on public.nopart(c1);

--switch to nonsuperuser
\c postgres u2

postgres=> --rename the index owned by another user --non partition table
postgres=> ALTER INDEX nopart_idx RENAME TO nopart_idx_renamed;
ERROR:  must be owner of index nopart_idx
postgres=>
postgres=> --rename the index owned by another user --partition table
postgres=> ALTER INDEX part_idx RENAME TO part_idx_renamed;
ERROR:  unexpected relkind: 73


Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [bug fix] ECPG: freeing memory for pgtypes crashes on Windows

2018-06-26 Thread Thomas Munro
On Mon, Jun 25, 2018 at 8:16 PM, Tsunakawa, Takayuki
 wrote:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>> However I also don't see a problem to back-patch it, I don't see
>> a problem on such difference between versions.
>
> Thank you, Horiguchi-san and Robert.

Ok, back-patched.

It seems like the other patch[1] might need the same treatment, right?

[1] https://commitfest.postgresql.org/18/1669/

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Online enabling of checksums

2018-06-26 Thread Magnus Hagander
On Mon, Apr 9, 2018 at 7:22 PM, Magnus Hagander  wrote:

> On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
>> > Note however that I'm sans-laptop until Sunday, so I will revert it
>> then or
>> > possibly Monday.
>>
>> I'll deactive the isolationtester tests until then. They've been
>> intermittently broken for days now and prevent other tests from being
>> exercised.
>>
>
> Thanks.
>
> I've pushed the revert now, and left the pg_verify_checksums in place for
> the time being.
>
>
PFA an updated version of the patch for the next CF. We believe this one
takes care of all the things pointed out so far.

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
simply requiring a restart of PostgreSQL to initiate the conversion in the
background. That is definitely going to guarantee a memory barrier. It's
certainly not ideal, but restarting the cluster is still a *lot* better
than having to do the entire conversion offline. This can of course be
improved upon in the future, but for now we stuck to the safe way.

Concurrent creation of a database from one that had no checksums yet is
handled by simply looping over the list of databases as long as new
databases show up, and by waiting for all open transactions to finish at
the right moment, so that none is running concurrently when we get the
database list.
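
In sketch form, that loop looks something like this (the helper names
here are made up for illustration; the patch contains the real code):

for (;;)
{
    List       *dbs = BuildDatabaseList();      /* hypothetical */
    bool        found_new = false;
    ListCell   *lc;

    foreach(lc, dbs)
    {
        Oid         dboid = lfirst_oid(lc);

        if (!DatabaseAlreadyChecksummed(dboid)) /* hypothetical */
        {
            ChecksumSingleDatabase(dboid);      /* hypothetical */
            found_new = true;
        }
    }

    if (!found_new)
        break;              /* no new databases appeared; we are done */
}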

Since the worker is now a regular background worker started from
postmaster, the cost-delay parameters had to be made GUCs instead of
function arguments.

(And the more or less broken isolation tests are simply removed)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7bfbc87109..108e049a85 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2011,6 +2011,42 @@ include_dir 'conf.d'
  
 
 
+
+ Online Checksumming
+
+ 
+  
+   checksumhelper_cost_delay (integer)
+   
+checksumhelper_cost_delay configuration parameter
+   
+   
+   
+
+ The length of time, in milliseconds, that the process will sleep when
+ the cost limit has been exceeded. The default value is zero, which
+ disables the cost-based checksumming delay feature. Positive values
+ enable cost-based checksumming.
+
+   
+  
+
+  
+   checksumhelper_cost_limit (integer)
+   
+checksumhelper_cost_limit configuration parameter
+   
+   
+   
+
+ The accumulated cost that will cause the checksumming process to sleep.
+ It is turned off by default.
+
+   
+  
+ 
+
+
 
  Asynchronous Behavior
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5dce8ef178..154cf40cd3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19582,6 +19582,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   
 
+  
+   Data Checksum Functions
+
+   
+The functions shown in  can
+be used to enable or disable data checksums in a running cluster.
+See  for details.
+   
+
+   
+Checksum SQL Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+  
+ 
+ 
+  
+   
+
+ pg_enable_data_checksums
+
+pg_enable_data_checksums()
+   
+   
+void
+   
+   
+
+ Initiates data checksums for the cluster. This will switch the data checksums mode
+ to in progress, but will not initiate the checksumming process.
+ In order to start checksumming the data pages the database must be restarted. Upon
+ restart a background worker will start processing all data in the database and enable
+ checksums for it. When all data pages have had checksums enabled, the cluster will
+ automatically switch to checksums on.
+
+
+ The  and
+  GUCs are used to
+ throttle the speed of the process, using the same principles as
+ Cost-based Vacuum Delay.
+
+   
+  
+  
+   
+
+ pg_disable_data_checksums
+
+pg_disable_data_checksums()
+   
+   
+void
+   
+   
+Disables data checksums for the cluster. This takes effect immediately.
+   
+  
+ 
+
+   
+
+  
+
   
Database Object Management Functions
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 4489b585c7..be489e78b9 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -214,9 +214,9 @@ PostgreSQL documentation

 Use checksums on data pages to help detect corruption by the
 I/O system that would otherwise be silent. 

Re: Small fixes about backup history file in doc and pg_standby

2018-06-26 Thread Kyotaro HORIGUCHI
Hello.

Good catch!

At Tue, 26 Jun 2018 17:47:52 +0900, Yugo Nagata  wrote in 
<20180626174752.0ce505e3.nag...@sraoss.co.jp>
> Hi,
> 
> While looking into the backup and recovery code, I found small documentation 
> bugs. 
> The documatation says that the backup history files can be requested for 
> recovery, 
> but it's not used by the system and not requested anymore since PG 9.0 
> (commit 06f82b29616cd9effcaefd99c6b6e2e80697482f) and never be requested.
> 
> Attached patch (doc_backup_history_file.patch) corrects the description about 
> this.
> 
> In addition, the current pg_standby still can handle a backup history file 
> that are
> never requested. It is harmless but unnecessary code. Another attached patch
> (pg_standby.patch) removes this part of code.

The comment fix seems fine, and it seems to cover all occurrences of
the word ".backup" in the context of recovery_command.

The definition of the symbol XLOG_BACKUP_LABEL is no longer
useful once your patch is applied. Removing the symbol makes
XLOG_DATA and the variable nextWALFileName useless, and finally we
can remove all branching that uses them.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Thinko/typo in ExecSimpleRelationInsert

2018-06-26 Thread Ashutosh Bapat
Looks like we need a similar adjustment in ExecSimpleRelationUpdate() as
well. Updated the patch.

On Tue, Jun 26, 2018 at 3:12 PM, Ashutosh Bapat
 wrote:
> Hi,
> There seems to be a thinko/typo in ExecSimpleRelationInsert(). A tuple
> can never store a slot, but a comment in that function says so. Tried
> to fix it in the patch attached.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 387148433d33eb7a9c2960e9ba554b1e28d04ef1 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Tue, 26 Jun 2018 15:02:07 +0530
Subject: [PATCH 2/2] Fix a thinko/typo in ExecSimpleRelationInsert/Update

A slot cannot be stored into a tuple; it's the other way around.
---
 src/backend/executor/execReplication.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 4fbdfc0..ee6ad37 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -415,7 +415,7 @@ ExecSimpleRelationInsert(EState *estate, TupleTableSlot *slot)
 		if (rel->rd_att->constr)
 			ExecConstraints(resultRelInfo, slot, estate, true);
 
-		/* Store the slot into tuple that we can inspect. */
+		/* Materialize slot into a tuple that we can inspect. */
 		tuple = ExecMaterializeSlot(slot);
 
 		/* OK, store the tuple and create index entries for it */
@@ -480,7 +480,7 @@ ExecSimpleRelationUpdate(EState *estate, EPQState *epqstate,
 		if (rel->rd_att->constr)
 			ExecConstraints(resultRelInfo, slot, estate, true);
 
-		/* Store the slot into tuple that we can write. */
+		/* Materialize slot into a tuple that we can inspect. */
 		tuple = ExecMaterializeSlot(slot);
 
 		/* OK, update the tuple and index entries for it */
-- 
2.7.4



Re: [HACKERS] GUC for cleanup indexes threshold.

2018-06-26 Thread Masahiko Sawada
On Fri, Jun 22, 2018 at 6:55 PM, Alexander Korotkov
 wrote:
> On Wed, Jun 20, 2018 at 12:00 PM Alexander Korotkov
>  wrote:
>> On Wed, Jun 20, 2018 at 8:32 AM Masahiko Sawada  
>> wrote:
>> > On Wed, Jun 20, 2018 at 1:00 AM, Alexander Korotkov
>> > > Ok.  I've rephrased comment a bit.  Also, you created "index vacuum"
>> > > subsection in the "resource usage" section.  I think it's not
>> > > appropriate for this option to be in "resource usage".  Ideally it
>> > > should be grouped with other vacuum options, but we don't have single
>> > > section for that.  "Autovacuum" section is also not appropriate,
>> > > because this guc works not only for autovacuum.  So, most semantically
>> > > close options, which affects vacuum in general, are
>> > > vacuum_freeze_min_age, vacuum_freeze_table_age,
>> > > vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age,
>> > > which are located in "client connection defaults" section.  So, I
>> > > decided to move this GUC into this section.  I also change the section
>> > > in GUC definition (guc.c) from AUTOVACUUM to CLIENT_CONN_STATEMENT.
>> >
>> > Agreed. So should we move it to 19.11 Client Connection Defaults in
>> > doc as well? And I think it's better if this option places next to
>> > other vacuum options for finding easier. Attached patch changes them
>> > so. Please review it.
>>
>> Right, thank you.  Looks good for me.
>> I'm going to commit this if no objections.
>
> Pushed.

Thank you!

>
> Regarding maximum value for vacuum_cleanup_index_scale_factor.  I
> think introducing special value with "never cleanup" meaning would be
> overkill for post feature freeze enhancement.

After more thought, adding a threshold for invoking index cleanup could
perhaps work to support "never cleanup", though it would be a
PostgreSQL 12 item. If an index has fewer tuples than the threshold
(say, vacuum_cleanup_index_threshold), index cleanup would never
be executed.
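
Something like the following (a rough standalone sketch of the idea,
not a patch; vacuum_cleanup_index_threshold is a made-up name):

#include <stdbool.h>

/* hypothetical reloption/GUC; 0 would keep today's behavior */
static double vacuum_cleanup_index_threshold = 0;
/* existing knob, redeclared here only to keep the sketch standalone */
static double vacuum_cleanup_index_scale_factor = 0.1;

static bool
btree_needs_cleanup(double num_index_tuples,
                    double prev_heap_tuples,
                    double cur_heap_tuples)
{
    /* "never cleanup" for indexes smaller than the threshold */
    if (num_index_tuples < vacuum_cleanup_index_threshold)
        return false;

    /* otherwise fall back to the existing scale-factor rule */
    return cur_heap_tuples >
        prev_heap_tuples * (1.0 + vacuum_cleanup_index_scale_factor);
}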

> So, I propose to just
> increase maximum value for both GUC and reloption.  See the attached
> patch.  It also changes calculations _bt_vacuum_needs_cleanup() for
> better handling of large values (just some kind of overflow paranoia).
>

The patch looks good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-06-26 Thread Masahiko Sawada
On Fri, Jun 22, 2018 at 8:24 PM, Andrey V. Lepikhov
 wrote:
> Hi,
> According to your feedback, I developed a second version of the patch.
> In this version:
> 1. High-level functions index_beginscan(), index_rescan() not used. Tree
> descent made by _bt_search(). _bt_binsrch() used for positioning on the
> page.
> 2. TID list introduced in amtargetdelete() interface. Now only one tree
> descent needed for deletion all tid's from the list with equal scan key
> value - logical duplicates deletion problem.
> 3. Only one WAL record for index tuple deletion per leaf page per
> amtargetdelete() call.
> 4. VACUUM can sort the TID list beforehand for a quicker search of duplicates.
>
> Background worker will come later.
>
>

Thank you for updating the patches! Here are some comments on the latest patch.

+static void
+quick_vacuum_index(Relation irel, Relation hrel,
+  IndexBulkDeleteResult **overall_stats,
+  LVRelStats *vacrelstats)
+{
(snip)
+   /*
+* Collect statistical info
+*/
+   lazy_cleanup_index(irel, *overall_stats, vacrelstats);
+}

I think that we should not call lazy_cleanup_index at the end of
quick_vacuum_index because we call it multiple times during a lazy
vacuum and index statistics can be changed during vacuum. We already
call lazy_cleanup_index at the end of lazy_scan_heap.

bttargetdelete doesn't delete btree pages even if pages become empty.
I think we should do that; otherwise empty pages will never be recycled.
But please note that if we delete btree pages during bttargetdelete,
recyclable pages might not be recycled. That is, if we choose the
target deletion method every time, then the deleted-but-not-recycled
pages could never be touched, unless we reach
vacuum_cleanup_index_scale_factor. So I think we need to either run the
bulk-deletion method or do an index cleanup before btpo.xact wraparound.

+   ivinfo.indexRelation = irel;
+   ivinfo.heapRelation = hrel;
+   qsort((void *)vacrelstats->dead_tuples,
vacrelstats->num_dead_tuples, sizeof(ItemPointerData),
tid_comparator);

I think sorting vacrelstats->dead_tuples is not necessary because the
garbage TIDs are already stored in sorted order.
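
For reference, the heap side already relies on that ordering:
lazy_tid_reaped() in vacuumlazy.c looks up TIDs with bsearch() over the
dead-TID array.  A minimal sketch of that lookup pattern (names
simplified here):

static int
cmp_itemptr(const void *left, const void *right)
{
    return ItemPointerCompare((ItemPointer) left, (ItemPointer) right);
}

static bool
tid_is_dead(ItemPointer tid, ItemPointerData *dead_tuples, int ndead)
{
    return bsearch(tid, dead_tuples, ndead,
                   sizeof(ItemPointerData), cmp_itemptr) != NULL;
}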

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: AtEOXact_ApplyLauncher() and subtransactions

2018-06-26 Thread Amit Khandekar
Added this into the July 2018 commitfest :

https://commitfest.postgresql.org/18/1696/

On 20 June 2018 at 22:22, Amit Khandekar  wrote:
> On 18 June 2018 at 15:02, Amit Khandekar  wrote:
>> On 16 June 2018 at 00:03, Amit Khandekar  wrote:
>>> The way I am implementing this can be seen in attached
>>> apply_launcher_subtrans_WIP.patch. (check launcher.c changes). I
>>> haven't started testing it yet though.
>>
>> Attached patch passes some basic testing I did. Will do some more
>> testing, and some self-review and code organising, if required.
>
> Done. Attached is v2 version of the patch. Comments welcome.
>
> Changed GetSubscriptionRelations() to GetSubscriptionRelids(), which
> now returns only the oids, not the subrel states. This was convenient
> for storing the exact returned list into the committed subscription
> rels. And anyway, the subrel states were not used anywhere.
>
>> I will also split the patch into two : one containing the main issue
>> regarding subtransaction, and the other containing the other issue I
>> mentioned earlier that shows up without subtransaction as well.
>
> Did not split the patch. The changes for the other issue that shows up
> without subtransaction are all in subscriptioncmds.c , and that file
> contains mostly the changes for this issue. So kept it as a single
> patch. But if it gets inconvenient for someone while reviewing, I will
> be happy to split it.
>
> --
> Thanks,
> -Amit Khandekar
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



RE: ON CONFLICT DO NOTHING on pg_dump

2018-06-26 Thread Ideriha, Takeshi
>> I agree with you though supporting MERGE or ON-CONFLICT-DO-UPDATE seems
>hard work.
>> Only the ON-CONFLICT-DO-NOTHING use case may be narrow.
>
>Is it narrow, or is it just easy enough to add quickly?

Sorry for the late reply.
I read your comment and rethought about it.
What I meant by "narrow" is that the number of people who would use this
new feature seems limited to me.
And I had the wrong impression that small improvements are always rejected
by hackers. But that's only the case for small improvements that need a
huge source code modification; in short, cases where the cost/benefit
ratio is low.

This patch has some benefits and the source code change is small.
So I'll just wait for reviews from other people.

>And by the way, you don't need MERGE.  You can just generate INSERT/
>UPDATE/DELETE statements -- MERGE is mainly an optimization on that, and could
>wait until PG has a MERGE.
Oh, thank you for clarifying this. Now I understand it.

Best regards,
Takeshi Ideriha





RE: "wal receiver" process hang in syslog() while exiting after receiving SIGTERM while the postgres has been promoted.

2018-06-26 Thread Chen, Yan-Jack (NSB - CN/Hangzhou)
Hi,
  Well, do you agree that we should not write to the log in a signal 
handling function under any circumstances?  In many cases in PostgreSQL, 
the signal handling function just sets one flag, which the main process 
then acts on (see the sketch after the diff below).
  How about simply removing these code lines?

--- walreceiver_old.c
+++ walreceiver.c
@@ -816,10 +816,6 @@
 
SetLatch(&WalRcv->latch);
 
-   /* Don't joggle the elbow of proc_exit */
-   if (!proc_exit_inprogress && WalRcvImmediateInterruptOK)
-   ProcessWalRcvInterrupts();
-
errno = save_errno;
 }
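
For reference, the flag-setting pattern looks roughly like this
(schematic only, not the actual walreceiver code):

static volatile sig_atomic_t got_SIGTERM = false;

static void
WalRcvSigtermHandler(SIGNAL_ARGS)
{
    int         save_errno = errno;

    got_SIGTERM = true;
    SetLatch(&WalRcv->latch);   /* just wake the main loop */

    errno = save_errno;
}

/* ...and in the main loop, where elog()/syslog() is safe: */
if (got_SIGTERM)
    ProcessWalRcvInterrupts();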
  


Br. ☺
Chen Yan(Jack)

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Friday, June 22, 2018 8:51 PM
To: Chen, Yan-Jack (NSB - CN/Hangzhou) 
Cc: PostgreSQL Hackers 
Subject: Re: "wal receiver" process hang in syslog() while exiting after 
receiving SIGTERM while the postgres has been promoted.

On Thu, Jun 21, 2018 at 1:11 AM, Chen, Yan-Jack (NSB - CN/Hangzhou)
 wrote:
> Hi Hackers,
>   We encountered a problem while trying to promote a standby
> postgres (version 9.6.9) server to active. From the trace (we triggered
> the process abort), we can see the process was hung in syslog() while
> handling SIGTERM. According to the article below, it looks risky to
> write to syslog in signal handling. Any idea how to avoid it?

Huh.  I thought that Andres had removed all of this kind of stuff back
in 675f55e1d9bcb9da4323556b456583624a07,
4f85fde8eb860f263384fffdca660e16e77c7f76, and
387da18874afa17156ee3af63766f17efb53c4b9, and related commits, but
rereading the commit message I see that it wasn't that ambitious.
Probably a similar approach would make sense here, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


  1   2   >