Re: plan_cache_mode and postgresql.conf.sample

2018-08-21 Thread Michael Paquier
On Wed, Aug 22, 2018 at 06:26:52PM +1200, Thomas Munro wrote:
> Thanks, pushed.  I removed one tab because it looks like the comments
> in that file are supposed to line up with tabstop=8 (unlike our source
> files which use 4), and this one didn't.  I hope I got that right!

This line now spans 87 characters for me, so that's not quite it.  I
think that you should just have split the list across two lines.
--
Michael




Re: plan_cache_mode and postgresql.conf.sample

2018-08-21 Thread Thomas Munro
On Wed, Aug 22, 2018 at 5:45 PM, David Rowley
 wrote:
> While testing something for which I needed to ensure a generic plan was
> being used, I couldn't quite remember, while editing postgresql.conf,
> the exact spelling of the option to do that.  I think the valid
> options for the setting should be listed in the sample config file.
>
> Patch attached.

Thanks, pushed.  I removed one tab because it looks like the comments
in that file are supposed to line up with tabstop=8 (unlike our source
files which use 4), and this one didn't.  I hope I got that right!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: JIT compiling with LLVM v12

2018-08-21 Thread Noah Misch
On Wed, Mar 28, 2018 at 02:27:51PM -0700, Andres Freund wrote:
> For now LLVM is enabled by default when compiled --with-llvm. I'm mildly
> inclined to leave it like that until shortly before the release, and
> then disable it by default (i.e. change the default of jit=off). But I
> think we can make that decision based on experience during the testing
> window. I'm opening an open items entry for that.

I'll vote for jit=on and letting any bugs shake out earlier, but it's not a
strong preference.

I see jit slows the regression tests considerably:

# x86_64, non-assert, w/o llvm
$ for n in 1 2 3; do env time make -C src/bin/pg_upgrade check; done 2>&1 | grep elapsed
7.64user 4.24system 0:36.40elapsed 32%CPU (0avgtext+0avgdata 36712maxresident)k
8.09user 4.50system 0:37.71elapsed 33%CPU (0avgtext+0avgdata 36712maxresident)k
7.53user 4.18system 0:36.54elapsed 32%CPU (0avgtext+0avgdata 36712maxresident)k

# x86_64, non-assert, w/  llvm trunk
$ for n in 1 2 3; do env time make -C src/bin/pg_upgrade check; done 2>&1 | grep elapsed
9.58user 5.79system 0:49.61elapsed 30%CPU (0avgtext+0avgdata 36712maxresident)k
9.47user 5.92system 0:47.84elapsed 32%CPU (0avgtext+0avgdata 36712maxresident)k
9.09user 5.51system 0:47.94elapsed 30%CPU (0avgtext+0avgdata 36712maxresident)k

# mips32el, assert, w/o llvm (buildfarm member topminnow) [1]
28min install-check-*
35min check-pg_upgrade

# mips32el, assert, w/  llvm 6.0.1 [1]
 63min install-check-*
166min check-pg_upgrade

Regardless of the choice of the jit={on|off} default, these numbers tell me
that some or all of the jit_*_cost defaults are too low.
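
For anyone wanting to experiment with that locally, here is a minimal SQL
sketch for inspecting and raising the relevant thresholds in a session (the
SET values are purely illustrative, not proposed defaults):

    -- Current JIT-related settings.
    SHOW jit;
    SHOW jit_above_cost;
    SHOW jit_optimize_above_cost;
    SHOW jit_inline_above_cost;

    -- Illustrative, higher thresholds for one session, to see how much of
    -- the slowdown disappears when JIT kicks in less often.
    SET jit_above_cost = 1000000;
    SET jit_optimize_above_cost = 5000000;
    SET jit_inline_above_cost = 5000000;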


[1] The mips32el runs used "nice -+20" and ran on a shared machine.  I include
them to show the trend, but exact figures may be non-reproducible.



plan_cache_mode and postgresql.conf.sample

2018-08-21 Thread David Rowley
Hi,

While testing something for which I needed to ensure a generic plan was
being used, I couldn't quite remember, while editing postgresql.conf,
the exact spelling of the option to do that.  I think the valid
options for the setting should be listed in the sample config file.

Patch attached.
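
For reference, a quick sketch of checking the valid values from a live
session (the exact wording of the comment added to postgresql.conf.sample
may differ from this):

    -- plan_cache_mode accepts: auto, force_generic_plan, force_custom_plan
    SHOW plan_cache_mode;
    SET plan_cache_mode = force_generic_plan;  -- e.g. to force a generic plan for testing
    RESET plan_cache_mode;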

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


sample_conf_plan_cache_mode.patch
Description: Binary data


Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-21 Thread Michael Paquier
On Wed, Aug 22, 2018 at 02:17:44AM +, Bossart, Nathan wrote:
> I think this is doable by locking the table in SHARE mode.  That won't
> conflict with the AccessShareLock that expand_vacuum_rel() obtains,
> but it will conflict with the ShareUpdateExclusiveLock or
> AccessExclusiveLock that vacuum_rel() takes.

Good point.  Still, is that really worth adding?  This implies a test
with at least two roles, one switching ownership to the other and back
again.  At least that should be in a different isolation spec file so as
not to complicate the first one.
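
A rough SQL sketch of that scenario, with made-up role and table names,
assuming the SHARE-mode locking trick discussed elsewhere in this thread:

    -- Setup (hypothetical names).
    CREATE ROLE regress_vacuum_a;
    CREATE ROLE regress_vacuum_b;
    CREATE TABLE vac_tab (a int);
    ALTER TABLE vac_tab OWNER TO regress_vacuum_a;

    -- Session 1: take a lock that does not conflict with the
    -- AccessShareLock of expand_vacuum_rel() but does conflict with the
    -- lock taken later by vacuum_rel().
    BEGIN;
    LOCK TABLE vac_tab IN SHARE MODE;

    -- Session 2 (connected as regress_vacuum_a): blocks in vacuum_rel().
    VACUUM vac_tab;

    -- Session 1: switch ownership away while session 2 waits, then commit.
    ALTER TABLE vac_tab OWNER TO regress_vacuum_b;
    COMMIT;

    -- Session 2 should now skip vac_tab with a WARNING instead of
    -- vacuuming it.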
--
Michael




RE: [HACKERS] Cached plans and statement generalization

2018-08-21 Thread Yamaji, Ryo
On Tuesday, August 7, 2018 at 0:36 AM, Konstantin Knizhnik wrote: 

> I have registered the patch for next commitfest.
> For some reasons it doesn't find the latest autoprepare-10.patch and still
> refer to autoprepare-6.patch as the latest attachement.

I am sorry for the long delay in my response.
I confirmed it and have become a reviewer of the patch.
I am going to review it.


> Right now each prepared statement has two memory contexts: one for raw
> parse tree used as hash table key and another for cached plan itself.
> May be it will be possible to combine them. To calculate memory consumed
> by cached plans, it will be necessary to calculate memory usage statistic
> for all this contexts (which requires traversal of all context's chunks)
> and sum them. It is much more complex and expensive than current check:
> (++autoprepare_cached_plans > autoprepare_limit), although I do not think
> that it will have measurable impact on performance...
> May be there should be some faster way to estimate memory consumed by
> prepared statements.
> 
> So, the current autoprepare_limit allows limiting the number of autoprepared
> statements and prevents memory overflow caused by execution of a large
> number of different statements.
> The question is whether we need more precise mechanism which will take
> in account difference between small and large queries. Definitely simple
> query can require 10-100 times less memory than complex query. But memory
> contexts themselves (even with small block size) somehow minimize
> difference in memory footprint of different queries, because of chunked
> allocation.

Thank you for explaining the possible implementation and its problems.
In many cases, when actually designing a system, we estimate the amount of
memory to use and provision memory accordingly. Therefore, as long as we can
estimate memory usage under either approach (limiting the amount of memory
used by prepared queries, or limiting the number of prepared statements), I
think there is no problem with the current implementation.

Apart from the patch, if there were an application that outputs an estimate
of the memory usage for a given query, autoprepare might be more comfortable
to use. But it sounds difficult..


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Wed, Aug 22, 2018 at 3:12 AM Masahiko Sawada 
wrote:

> On Tue, Aug 21, 2018 at 5:36 PM Chris Travers 
> wrote:
> >
> >
> >
> > On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada 
> wrote:
> >>
> >> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers 
> wrote:
> >> >
> >> >
> >> >
> >> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund 
> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> >> >> > 2.  TWOPHASECOMMIT=[off|on] option
> >> >>
> >> >> > The second major issue that I see with PostgreSQL's foreign
> database
> >> >> > wrappers is the fact that there is no two phase commit which means
> that a
> >> >> > single transaction writing to a group of tables has no expectation
> that all
> >> >> > backends will commit or rollback together.  With this patch an
> option would
> >> >> > be applied to foreign tables such that they could be set to use
> two phase
> >> >> > commit  When this is done, the first write to each backend would
> register a
> >> >> > connection with a global transaction handler and a pre-commit and
> commit
> >> >> > hooks would be set up to properly process these.
> >> >> >
> >> >> > On recommit a per-global-transaction file would be opened in the
> data
> >> >> > directory and prepare statements logged to the file.  On error, we
> simply
> >> >> > roll back our local transaction.
> >> >> >
> >> >> > On commit hook , we go through and start to commit the remote
> global
> >> >> > transactions.  At this point we make a best effort but track
> whether or not
> >> >> > we were successfully on all.  If successful on all, we delete the
> file.  If
> >> >> > unsuccessful we fire a background worker which re-reads the file
> and is
> >> >> > responsible for cleanup.  If global transactions persist, a SQL
> >> >> > administration function will be made available to restart the
> cleanup
> >> >> > process.  On rollback, we do like commit but we roll back all
> transactions
> >> >> > in the set.  The file has enough information to determine whether
> we should
> >> >> > be committing or rolling back on cleanup.
> >> >> >
> >> >> > I would like to push these both for Pg 12.  Is there any feedback
> on the
> >> >> > concepts and the problems first
> >> >>
> >>
> >> Thank you for the proposal. I agree that it's a major problem that
> >> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
> >> commit.
> >>
> >> >> There's been *substantial* work on this. You should at least read the
> >> >> discussion & coordinate with the relevant developers.
> >> >
> >> >
> >> > I suppose I should forward this to them directly also.
> >> >
> >> > Yeah.   Also the transaction manager code for this I wrote while
> helping with a proof of concept for this copy-to-remote extension.
> >> >
> >> > There are a few big differences in implementation with the patches
> you mention and the disagreement was part of why I thought about going this
> direction.
> >> >
> >> > First, discussion of differences in implementation:
> >> >
> >> > 1.  I treat the local and remote transactions symmetrically and I
> make no assumptions about what might happen between prepare and an
> attempted local commit.
> >> >prepare goes into the precommit hook
> >> >commit goes into the commit hook and we never raise errors if it
> fails (because you cannot rollback at that point).  Instead a warning is
> raised and cleanup commences.
> >> >rollback goes into the rollback hook and we never raise errors if
> it fails (because you are already rolling back).
> >> >
> >> > 2.  By treating this as a property of a table rather than a property
> of a foreign data wrapper or a server, we can better prevent prepared
> transactions where they have not been enabled.
> >> >This also ensures that we know whether we are guaranteeing two
> phase commit or not by looking at the table.
> >> >
> >> > 3.  By making this opt-in it avoids a lot of problems with regards to
> incorrect configuration etc since if the DBA says "use two phase commit"
> and failed to enable prepared transactions on the other side...
> >> >
> >> > On to failure modes:
> >> >  1.  Its possible that under high load too many foreign transactions
> are prepared and things start rolling back instead of committing.  Oh
> well
> >> >  2.  In the event that a foreign server goes away between prepare and
> commit, we continue to retry via the background worker.  The background
> worker is very pessimistic and checks every remote system for the named
> transaction.
> >>
> >> If some participant servers fail during COMMIT PREPARED, will the
> >> client get a "committed"? or an "aborted"? If the client gets
> >> "aborted", that's not correct because the local changes are already
> >> committed at that point.
> >
> >
> > Ok so let's discuss this in more detail here.
> >
> > You have basically 6 states a TPC global transaction can be in.
> > 1.  We haven't gotten to the point of trying to commit (BEGIN)
> > 2.  We are trying to commit (PRE

Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-21 Thread Bossart, Nathan
On 8/21/18, 7:44 PM, "Michael Paquier"  wrote:
> On Tue, Aug 21, 2018 at 04:01:50PM +, Bossart, Nathan wrote:
>> I think my biggest concern with this approach is that we'd be
>> introducing inconsistent behavior whenever there are concurrent
>> changes.  If a user never had permissions to VACUUM the partitioned
>> table, the partitions are skipped outright.  However, if the user
>> loses permissions to VACUUM the partitioned table between
>> expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
>> each individual partition.
>> 
>> I'll admit I don't have a great alternative proposal that doesn't
>> involve adding deadlock risk or complexity, but it still seems worth
>> mulling over.
>
> That applies only to a manual vacuum/analyze listing the relation in
> question directly.  When running a system-wide VACUUM, all the
> relations are still processed.  This is rather an edge case in my opinion,
> but..  I don't mind mulling over it (as you say), so please let me
> think it over for a couple of days.  I don't see a smart solution which
> does not create risks of lock upgrades and deadlocks right now; there may be
> one able to preserve the existing behavior.

Right.  If we don't come up with anything, the behavior change for
this edge case is probably reasonable as long as we update the
documentation like you proposed earlier.

>>> - 0002 is the original patch discussed here.
>> 
>> I'd suggest even splitting 0002 into two patches: one for refactoring
>> the existing permissions checks into vacuum_is_relation_owner() and
>> another for the new checks.
>
> Hmmm.  The second patch also changes some comment blocks when calling
> vacuum_is_relation_owner(), so we end up changing the same code
> areas, resulting in more code churn for no real gain.

I see.  I only made this suggestion so that we could get some of the
easy changes out of the way, but there's no need if it's just adding
unnecessary code churn.

>> +# The role doesn't have privileges to vacuum the table, so VACUUM should
>> +# immediately skip the table without waiting for a lock.
>> 
>> Can we add tests for concurrent changes that cause the relation to be
>> skipped in vacuum_rel() and analyze_rel() instead of
>> expand_vacuum_rel()?
>
> Doing that deterministically with concurrent tests looks difficult to me,
> as doing ALTER TABLE OWNER TO on a relation in a first session causes a
> second session running VACUUM to block in expand_vacuum_rel(), be it
> with a plain table or a partitioned table (doing the ALTER TABLE on a
> leaf will block scanning the parent as well).

I think this is doable by locking the table in SHARE mode.  That won't
conflict with the AccessShareLock that expand_vacuum_rel() obtains,
but it will conflict with the ShareUpdateExclusiveLock or
AccessExclusiveLock that vacuum_rel() takes.

session 1> BEGIN; LOCK test IN SHARE MODE;
session 2> VACUUM test;
session 1> ALTER TABLE test OWNER TO not_session_2_user; COMMIT;

Nathan



Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Masahiko Sawada
On Tue, Aug 21, 2018 at 5:36 PM Chris Travers  wrote:
>
>
>
> On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada  wrote:
>>
>> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers  
>> wrote:
>> >
>> >
>> >
>> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund  wrote:
>> >>
>> >> Hi,
>> >>
>> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
>> >> > 2.  TWOPHASECOMMIT=[off|on] option
>> >>
>> >> > The second major issue that I see with PostgreSQL's foreign database
>> >> > wrappers is the fact that there is no two phase commit which means that 
>> >> > a
>> >> > single transaction writing to a group of tables has no expectation that 
>> >> > all
>> >> > backends will commit or rollback together.  With this patch an option 
>> >> > would
>> >> > be applied to foreign tables such that they could be set to use two 
>> >> > phase
>> >> > commit  When this is done, the first write to each backend would 
>> >> > register a
>> >> > connection with a global transaction handler and a pre-commit and commit
>> >> > hooks would be set up to properly process these.
>> >> >
>> >> > On recommit a per-global-transaction file would be opened in the data
>> >> > directory and prepare statements logged to the file.  On error, we 
>> >> > simply
>> >> > roll back our local transaction.
>> >> >
>> >> > On commit hook , we go through and start to commit the remote global
>> >> > transactions.  At this point we make a best effort but track whether or 
>> >> > not
>> >> > we were successfully on all.  If successful on all, we delete the file. 
>> >> >  If
>> >> > unsuccessful we fire a background worker which re-reads the file and is
>> >> > responsible for cleanup.  If global transactions persist, a SQL
>> >> > administration function will be made available to restart the cleanup
>> >> > process.  On rollback, we do like commit but we roll back all 
>> >> > transactions
>> >> > in the set.  The file has enough information to determine whether we 
>> >> > should
>> >> > be committing or rolling back on cleanup.
>> >> >
>> >> > I would like to push these both for Pg 12.  Is there any feedback on the
>> >> > concepts and the problems first
>> >>
>>
>> Thank you for the proposal. I agree that it's a major problem that
>> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
>> commit.
>>
>> >> There's been *substantial* work on this. You should at least read the
>> >> discussion & coordinate with the relevant developers.
>> >
>> >
>> > I suppose I should forward this to them directly also.
>> >
>> > Yeah.   Also the transaction manager code for this I wrote while helping 
>> > with a proof of concept for this copy-to-remote extension.
>> >
>> > There are a few big differences in implementation with the patches you 
>> > mention and the disagreement was part of why I thought about going this 
>> > direction.
>> >
>> > First, discussion of differences in implementation:
>> >
>> > 1.  I treat the local and remote transactions symmetrically and I make no 
>> > assumptions about what might happen between prepare and an attempted local 
>> > commit.
>> >prepare goes into the precommit hook
>> >commit goes into the commit hook and we never raise errors if it fails 
>> > (because you cannot rollback at that point).  Instead a warning is raised 
>> > and cleanup commences.
>> >rollback goes into the rollback hook and we never raise errors if it 
>> > fails (because you are already rolling back).
>> >
>> > 2.  By treating this as a property of a table rather than a property of a 
>> > foreign data wrapper or a server, we can better prevent prepared 
>> > transactions where they have not been enabled.
>> >This also ensures that we know whether we are guaranteeing two phase 
>> > commit or not by looking at the table.
>> >
>> > 3.  By making this opt-in it avoids a lot of problems with regards to 
>> > incorrect configuration etc since if the DBA says "use two phase commit" 
>> > and failed to enable prepared transactions on the other side...
>> >
>> > On to failure modes:
>> >  1.  Its possible that under high load too many foreign transactions are 
>> > prepared and things start rolling back instead of committing.  Oh well
>> >  2.  In the event that a foreign server goes away between prepare and 
>> > commit, we continue to retry via the background worker.  The background 
>> > worker is very pessimistic and checks every remote system for the named 
>> > transaction.
>>
>> If some participant servers fail during COMMIT PREPARED, will the
>> client get a "committed"? or an "aborted"? If the client gets
>> "aborted", that's not correct because the local changes are already
>> committed at that point.
>
>
> Ok so let's discuss this in more detail here.
>
> You have basically 6 states a TPC global transaction can be in.
> 1.  We haven't gotten to the point of trying to commit (BEGIN)
> 2.  We are trying to commit (PREPARE)
> 3.  We have committed to committing all transactions (COMMIT)
> 4.  We have committed to rolling back

Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-21 Thread Michael Paquier
On Tue, Aug 21, 2018 at 04:01:50PM +, Bossart, Nathan wrote:
> I think my biggest concern with this approach is that we'd be
> introducing inconsistent behavior whenever there are concurrent
> changes.  If a user never had permissions to VACUUM the partitioned
> table, the partitions are skipped outright.  However, if the user
> loses permissions to VACUUM the partitioned table between
> expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
> each individual partition.
> 
> I'll admit I don't have a great alternative proposal that doesn't
> involve adding deadlock risk or complexity, but it still seems worth
> mulling over.

That applies only to a manual vacuum/analyze listing the relation in
question directly.  When running a system-wide VACUUM, all the
relations are still processed.  This is rather an edge case in my opinion,
but..  I don't mind mulling over it (as you say), so please let me
think it over for a couple of days.  I don't see a smart solution which
does not create risks of lock upgrades and deadlocks right now; there may be
one able to preserve the existing behavior.

>> I have split the patch into two parts:
>> - 0001 includes new tests which generate WARNING messages for VACUUM,
>> ANALYZE and VACUUM (ANALYZE).  That's useful separately.
> 
> 0001 looks good to me.

Thanks, I have pushed this one.

>> - 0002 is the original patch discussed here.
> 
> I'd suggest even splitting 0002 into two patches: one for refactoring
> the existing permissions checks into vacuum_is_relation_owner() and
> another for the new checks.

Hmmm.  The second patch also changes some comment blocks when calling
vacuum_is_relation_owner(), so we end up changing the same code
areas, resulting in more code churn for no real gain.

> +# The role doesn't have privileges to vacuum the table, so VACUUM should
> +# immediately skip the table without waiting for a lock.
> 
> Can we add tests for concurrent changes that cause the relation to be
> skipped in vacuum_rel() and analyze_rel() instead of
> expand_vacuum_rel()?

Doing that deterministically with concurrent tests looks difficult to me,
as doing ALTER TABLE OWNER TO on a relation in a first session causes a
second session running VACUUM to block in expand_vacuum_rel(), be it
with a plain table or a partitioned table (doing the ALTER TABLE on a
leaf will block scanning the parent as well).
--
Michael




Re: Getting NOT NULL constraint from pg_attribute

2018-08-21 Thread Wu Ivy
Thanks for the response. Really appreciate it!

Regards,
Ivy

2018-08-20 10:40 GMT-07:00 David G. Johnston :

> On Monday, August 20, 2018, Wu Ivy  wrote:
>>
>> Thanks for the quick response.
>> Why are SELECT queries never marked nullable? For nullable columns, when I
>> call SPI_getvalue(), the result (in char*) is NULL. I don’t think I’m too
>> clear on the definition of *attnotnull*. Can you give me an example in
>> which the tupleTable can be marked nullable?
>> Also, are there any other ways to get the nullability of each column while
>> getting the data from SPI_cursor_fetch? The only way I can think of is to
>> call another separate command to query the table schema, but it will be in
>> a separate transaction in that case.
>>
>
> Basically the nullability property is used by the planner for optimization
> during the joining of physical tables.  As soon as you try outputting
> columns, the ability to enforce not null goes away because of, in
> particular, outer joins.  While some changes could maybe be made, the
> cost-benefit to do so doesn't seem favorable.
>
> David J.
>
>
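
For completeness, the stored not-null flag can always be read from the
catalog directly; a minimal sketch (the table name is illustrative):

    -- attnotnull reflects the column's NOT NULL constraint as stored in the
    -- catalog, independently of what a query's result descriptor reports.
    SELECT attname, attnotnull
    FROM pg_attribute
    WHERE attrelid = 'my_table'::regclass
      AND attnum > 0
      AND NOT attisdropped;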


Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andres Freund
On 2018-08-21 17:58:00 -0400, Andrew Dunstan wrote:
> 
> 
> On 08/21/2018 04:49 PM, Andres Freund wrote:
> > On 2018-08-21 11:09:15 -0700, Joshua D. Drake wrote:
> > > On 08/21/2018 11:06 AM, Andrew Dunstan wrote:
> > > > 
> > > > 
> > > > XP at least is essentially a dead platform for us. My animals are not
> > > > able to build anything after release 10.
> > > I wouldn't think XP should even be on our list anymore. Microsoft hasn't
> > > supported it in 4 years.
> > XP isn't the only thing relevant here, vista and 2008 R1 are in the same
> > class.
> > 
> 
> 
> I do have a machine in my laptop graveyard with Vista. The only WS2008
> instance I have available is R2, and AWS doesn't seem to have any AMIs for R1.
> 
> Honestly, I don't think these matter terribly much. Anyone building now is
> not likely to be targeting them.

I agree, I think we should just decree that the minimum is MSVC 2013 and
that people building 12 need to deal with that.  I would personally
*additionally* say that we officially don't support *running* (not
compiling) on XP, 2003, 2008R1 and Vista (all unsupported by MS) - but
that's a somewhat orthogonal decision.

Greetings,

Andres Freund



Re: patch to allow disable of WAL recycling

2018-08-21 Thread Jerry Jelinek
Tomas,

Thanks for doing all of this testing. Your testing and results are much
more detailed than anything I did. Please let me know if there is any
follow-up that I should attempt.

Thanks again,
Jerry


On Thu, Aug 16, 2018 at 3:43 PM, Tomas Vondra 
wrote:

> On 07/22/2018 10:50 PM, Tomas Vondra wrote:
> > On 07/21/2018 12:04 AM, Jerry Jelinek wrote:
> >> Thomas,
> >>
> >> Thanks for your offer to run some tests on different OSes and
> >> filesystems that you have. Anything you can provide here would be much
> >> appreciated. I don't have anything other than our native SmartOS/ZFS
> >> based configurations, but I might be able to setup some VMs and get
> >> results that way. I should be able to setup a VM running FreeBSD. If you
> >> have a chance to collect some data, just let me know the exact
> >> benchmarks you ran and I'll run the same things on the FreeBSD VM.
> >> Obviously you're under no obligation to do any of this, so if you don't
> >> have time, just let me know and I'll see what I can do on my own.
> >>
> >
> > Sounds good. I plan to start with the testing in a couple of days - the
> > boxes are currently running some other tests at the moment. Once I have
> > some numbers I'll share them here, along with the test scripts etc.
> >
>
> I do have initial results from one of the boxes. It's not complete, and
> further tests are still running, but I suppose it's worth sharing what I
> have at this point.
>
> As usual, the full data and ugly scripts are available in a git repo:
>
>https://bitbucket.org/tvondra/wal-recycle-test-xeon/src/master/
>
> Considering the WAL recycling only kicks in after a while, I've decided
> to do a single long (6-hour) pgbench run for each scale, instead of the
> usual "multiple short runs" approach.
>
> So far I've tried on these filesystems:
>
> * btrfs
> * ext4 / delayed allocation enabled (default)
> * ext4 / delayed allocation disabled
> * xfs
>
> The machine has 64GB of RAM, so I've chosen scales 200 (fits into
> shared_buffers), 2000 (in RAM) and 8000 (exceeds RAM), to trigger
> different I/O patterns. I've used the per-second aggregated logging,
> with the raw data available in the git repo. The charts attached to this
> message are per-minute tps averages, to demonstrate the overall impact
> on throughput which would otherwise be hidden in jitter.
>
> All these tests are done on Optane 900P 280GB SSD, which is pretty nice
> storage but the limited size is somewhat tight for the scale 8000 test.
>
> For the traditional filesystems (ext4, xfs) the WAL recycling seems to
> be clearly beneficial - for the in-memory datasets the difference seems
> to be negligible, but for the largest scale it gives maybe +20% benefit.
> The delalloc/nodelalloc option on ext4 makes pretty much no difference here,
> and both xfs and ext4 perform almost exactly the same here - the main
> difference seems to be that on ext4 the largest scale ran out of disk
> space while xfs managed to keep running. Clearly there's a difference in
> free space management, but that's unrelated to this patch.
>
> On BTRFS, the results on the two smaller scales show about the same
> behavior (minimal difference between WAL recycling and not recycling),
> except that the throughput is perhaps 25-50% of ext4/xfs. Fair enough, a
> different type of filesystem, and LVM snapshots would likely have the
> same impact. But no clear win with recycling disabled. On the largest
> scale, the device ran out of space after 10-20 minutes, which makes it
> impossible to make any reasonable conclusions :-(
>
>
> I plan to do some more tests with zfsonlinux, and LVM with snapshots. I
> wonder if those will show some benefit of disabling the WAL recycling.
> And then, if time permits, I'll redo some of those tests with a small
> SATA-based RAID array (aka spinning rust). Mostly out of curiosity.
>
> FWIW I've planned to do these tests on another machine, but I've ran
> into some strange data corruption issues on it, and I've spent quite a
> bit of time investigating that and trying to reproduce it, which delayed
> these tests a bit. And of course, once I added elog(PANIC) to the right
> place it stopped happening :-/
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andrew Dunstan




On 08/21/2018 04:49 PM, Andres Freund wrote:

On 2018-08-21 11:09:15 -0700, Joshua D. Drake wrote:

On 08/21/2018 11:06 AM, Andrew Dunstan wrote:



XP at least is essentially a dead platform for us. My animals are not
able to build anything after release 10.

I wouldn't think XP should even be on our list anymore. Microsoft hasn't
supported it in 4 years.

XP isn't the only thing relevant here, vista and 2008 R1 are in the same
class.




I do have a machine in my laptop graveyard with Vista. The only WS2008 
instance I have available is R2, and AWS doesn't seem to have any AMIs for R1.


Honestly, I don't think these matter terribly much. Anyone building now 
is not likely to be targeting them.


Of the buildfarm animals, dory looks OK, hamerkop, bowerbird and thrips 
look not ok, and whelk and woodlouse look borderline.


I'm perfectly prepared to upgrade bowerbird if necessary.

cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andres Freund
On 2018-08-21 11:09:15 -0700, Joshua D. Drake wrote:
> On 08/21/2018 11:06 AM, Andrew Dunstan wrote:
> > 
> > 
> > 
> > XP at least is essentially a dead platform for us. My animals are not
> > able to build anything after release 10.
> 
> I wouldn't think XP should even be on our list anymore. Microsoft hasn't
> supported it in 4 years.

XP isn't the only thing relevant here, vista and 2008 R1 are in the same
class.

Greetings,

Andres Freund



Re: [HACKERS] proposal: schema variables

2018-08-21 Thread Pavel Stehule
Hi Fabien

Dne út 21. 8. 2018 19:56 uživatel Fabien COELHO 
napsal:

>
> Hello Pavel,
>
> AFAICR, I had an objection on such new objects when you first proposed
> something similar in October 2016.
>
> Namely, if session variables are not transactional, they cannot be used to
> implement security related auditing features which were advertised as the
> motivating use case: the audit check may fail on a commit because of a
> deferred constraint, but the variable would keep its "okay" value unduly,
> which would create a latent security issue, the audit check having failed
> but the variable saying the opposite.
>
> So my point was that they should be transactional by default, although I
> would be ok with an option for having a voluntary non transactional
> version.
>
> Is this issue addressed somehow with this ?



1. I respect your opinion, but I don't agree with it. Oracle and DB2 have a
similar or very similar feature that is non-transactional, and I didn't find
any requests to change it.

2. The prototype implementation was based on relclass items, and some
transactional behavior was possible. Peter E. had objections to this design
and proposed a dedicated catalog table, which I did. Now the transactional
behavior is harder to implement, although it is not impossible. This patch is
not small now, so I didn't implement it. My strong opinion is that the
default behavior has to be non-transactional.

Transactional variables significantly increase the complexity of this patch;
now it is simple, because we can reset a variable on the drop variable
command. Maybe I am missing some simple implementation, but I have spent more
than a few days on it. Still, any cooperation is welcome.

Regards

Pavel




> --
> Fabien.
>


Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andres Freund
On 2018-08-21 14:06:18 -0400, Andrew Dunstan wrote:
> 
> 
> On 08/21/2018 01:31 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2018-08-21 13:29:20 -0400, Tom Lane wrote:
> > > Peter Eisentraut  writes:
> > > > So, does anyone with Windows build experience want to comment on this?
> > > > The proposal is to desupport anything older than (probably) MSVC 2013,
> > > > or alternatively anything that cannot compile the attached test file.
> > > We've got a buildfarm handy that could answer the question.
> > > Let's just stick a test function in there for a day and see
> > > which animals fail.
> > I think we pretty much know the answer already, anything before 2013
> > will fail. The question is more whether that's problematic for the
> > people building on windows.  My theory, quoted by Peter upthread, is
> > that it shouldn't be problematic because 2013 can build binaries that
> > run on XP etc.

> XP at least is essentially a dead platform for us. My animals are not able
> to build anything after release 10.

I'm perfectly fine with that, FWIW. It's out of extended support for
several years now.  But my point was that you can use newer versions of MSVC
to build things that run on XP (and more relevantly 2008, vista, 7 etc).
No idea if we'd need to change anything in our build infra for that.

Greetings,

Andres Freund



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Joshua D. Drake

On 08/21/2018 11:06 AM, Andrew Dunstan wrote:




XP at least is essentially a dead platform for us. My animals are not 
able to build anything after release 10.


I wouldn't think XP should even be on our list anymore. Microsoft hasn't 
supported it in 4 years.


JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andrew Dunstan




On 08/21/2018 01:31 PM, Andres Freund wrote:

Hi,

On 2018-08-21 13:29:20 -0400, Tom Lane wrote:

Peter Eisentraut  writes:

So, does anyone with Windows build experience want to comment on this?
The proposal is to desupport anything older than (probably) MSVC 2013,
or alternatively anything that cannot compile the attached test file.

We've got a buildfarm handy that could answer the question.
Let's just stick a test function in there for a day and see
which animals fail.

I think we pretty much know the answer already, anything before 2013
will fail. The question is more whether that's problematic for the
people building on windows.  My theory, quoted by Peter upthread, is
that it shouldn't be problematic because 2013 can build binaries that
run on XP etc.





XP at least is essentially a dead platform for us. My animals are not 
able to build anything after release 10.


cheers

andrew

--
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Chapman Flack
On 08/21/2018 01:46 PM, Tom Lane wrote:
> Andres Freund  writes:
>> On 2018-08-21 13:29:20 -0400, Tom Lane wrote:
>>> We've got a buildfarm handy that could answer the question.
>>> Let's just stick a test function in there for a day and see
>>> which animals fail.
> 
>> I think we pretty much know the answer already, anything before 2013
>> will fail.
> 
> Do we know that for sure?  I thought it was theoretical.

I thought I remembered a message where it had been looked up in docs,
but I think the one I was remembering was Peter's "According to my
research (completely untested in practice), you need 2010 for
mixed code and declarations and 2013 for named initialization
of structs." [1] which didn't quite actually say it was documented.

-Chap

[1]
https://www.postgresql.org/message-id/ef986aa7-c7ca-ec34-19d9-fef38716b109%402ndquadrant.com



Re: [HACKERS] proposal: schema variables

2018-08-21 Thread Fabien COELHO



Hello Pavel,

AFAICR, I had an objection on such new objects when you first proposed 
something similar in October 2016.


Namely, if session variables are not transactional, they cannot be used to 
implement security related auditing features which were advertised as the 
motivating use case: the audit check may fail on a commit because of a 
deferred constraint, but the variable would keep its "okay" value unduly, 
which would create a latent security issue, the audit check having failed 
but the variable saying the opposite.
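
To make the concern concrete, a hypothetical sketch (the syntax is
illustrative only, not necessarily what the patch actually implements):

    -- Hypothetical schema-variable syntax, for illustration only.
    CREATE VARIABLE audit_ok AS boolean DEFAULT false;

    BEGIN;
    LET audit_ok = true;    -- audit check believed to have passed
    -- ... a deferred constraint fails at commit time ...
    COMMIT;                 -- the transaction is rolled back, but a
                            -- non-transactional variable keeps the value true
    SELECT audit_ok;        -- still true, despite the failed commit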


So my point was that they should be transactional by default, although I 
would be ok with an option for having a voluntary non transactional 
version.


Is this issue addressed somehow with this version?

--
Fabien.



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andres Freund
Hi,

On 2018-08-21 13:46:10 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-08-21 13:29:20 -0400, Tom Lane wrote:
> >> We've got a buildfarm handy that could answer the question.
> >> Let's just stick a test function in there for a day and see
> >> which animals fail.
> 
> > I think we pretty much know the answer already, anything before 2013
> > will fail.
> 
> Do we know that for sure?  I thought it was theoretical.

Pretty much. I'm on mobile data so I don't want to search too much, but
I've previously looked it up, and designated initializer support was
introduced in 2013.  See e.g. the graph in
https://blogs.msdn.microsoft.com/somasegar/2013/06/28/c-conformance-roadmap/

Greetings,

Andres Freund



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Tom Lane
Andres Freund  writes:
> On 2018-08-21 13:29:20 -0400, Tom Lane wrote:
>> We've got a buildfarm handy that could answer the question.
>> Let's just stick a test function in there for a day and see
>> which animals fail.

> I think we pretty much know the answer already, anything before 2013
> will fail.

Do we know that for sure?  I thought it was theoretical.

regards, tom lane



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Andres Freund
Hi,

On 2018-08-21 13:29:20 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > So, does anyone with Windows build experience want to comment on this?
> > The proposal is to desupport anything older than (probably) MSVC 2013,
> > or alternatively anything that cannot compile the attached test file.
> 
> We've got a buildfarm handy that could answer the question.
> Let's just stick a test function in there for a day and see
> which animals fail.

I think we pretty much know the answer already, anything before 2013
will fail. The question is more whether that's problematic for the
people building on windows.  My theory, quoted by Peter upthread, is
that it shouldn't be problematic because 2013 can build binaries that
run on XP etc.

Greetings,

Andres Freund



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Tom Lane
Peter Eisentraut  writes:
> So, does anyone with Windows build experience want to comment on this?
> The proposal is to desupport anything older than (probably) MSVC 2013,
> or alternatively anything that cannot compile the attached test file.

We've got a buildfarm handy that could answer the question.
Let's just stick a test function in there for a day and see
which animals fail.

regards, tom lane



Re: [PATCH] Add regress test for pg_read_all_stats role

2018-08-21 Thread Alexandra Ryzhevich
>
> - There is no need for the initial DROP ROLE commands as those already
> get dropped at the end of the tests.
>
Removed.

> - There is already rolenames.sql which has a tiny coverage for default
> roles, why not just using it?
>
Moved changes to rolenames.sql.


> +-- should fail because regress_role_nopriv has not CONNECT permission
> on this db
> +SELECT pg_database_size('regression') > 0 AS canread;
> +ERROR:  permission denied for database regression
> +-- should fail because regress_role_nopriv has not CREATE permission on
> this tablespace
> +SELECT pg_tablespace_size('pg_global') > 0 AS canread;
> +ERROR:  permission denied for tablespace pg_global
> Why is that part of a test suite for default roles?
>
Just to check that the changes did not break anything. I haven't found these
checks in other regress tests, so otherwise we only get positive tests. If
this is not needed, should I remove all the checks for the
regress_role_nopriv role, or only the negative regress_role_nopriv tests?

> 2) is easy to be triggered as a negative test (which fails), less as a
> positive test.  In order to make a positive test failure-proof with
> installcheck you would need to have a parameter which can be changed by
> a superuser at session level which gets updated to a certain value, and
> would fail to show for another user, so you could use one which is
> GUC_SUPERUSER_ONLY and of category PGC_SUSET, like
> session_preload_libraries or dynamic_preload_libraries.  Still that's
> pretty restrictive, and would only test one out of the three code paths
> available.
>
Changed to use session_preload_libraries.

Alexandra
From 9d16cdb419b8cea547a40bf4f188b0bd744de310 Mon Sep 17 00:00:00 2001
From: Alexandra Ryzhevich 
Date: Tue, 21 Aug 2018 16:01:30 +0100
Subject: [PATCH 1/1] Add regress tests for default monitoring roles

---
 src/test/regress/expected/rolenames.out | 98 +
 src/test/regress/sql/rolenames.sql  | 70 ++
 2 files changed, 168 insertions(+)

diff --git a/src/test/regress/expected/rolenames.out b/src/test/regress/expected/rolenames.out
index 7daba9fc12..67a9cb75a2 100644
--- a/src/test/regress/expected/rolenames.out
+++ b/src/test/regress/expected/rolenames.out
@@ -944,9 +944,107 @@ SELECT proname, proacl FROM pg_proc WHERE proname LIKE 'testagg_';
  testagg9 | 
 (9 rows)
 
+-- DEFAULT MONITORING ROLES
+CREATE ROLE regress_role_haspriv;
+CREATE ROLE regress_role_nopriv;
+-- pg_read_all_stats
+REVOKE CONNECT ON DATABASE regression FROM public;
+GRANT pg_read_all_stats TO regress_role_haspriv;
+SET SESSION AUTHORIZATION regress_role_haspriv;
+-- should not fail because regress_role_haspriv is a member of pg_read_all_stats
+SELECT pg_database_size('regression') > 0 AS canread;
+ canread 
+-
+ t
+(1 row)
+
+-- should not fail because regress_role_haspriv is a member of pg_read_all_stats
+SELECT pg_tablespace_size('pg_global') > 0 AS canread;
+ canread 
+-
+ t
+(1 row)
+
+-- should not fail because it is a default tablespace
+SELECT pg_tablespace_size('pg_default') > 0 AS canread;
+ canread 
+-
+ t
+(1 row)
+
+-- should be true because regress_role_haspriv is member of pg_read_all_stats
+SELECT COUNT(*) = 0 AS haspriv FROM pg_stat_activity WHERE "query" = '';
+ haspriv 
+-
+ t
+(1 row)
+
+SET SESSION AUTHORIZATION regress_role_nopriv;
+-- should fail because regress_role_nopriv has not CONNECT permission on this db
+SELECT pg_database_size('regression') > 0 AS canread;
+ERROR:  permission denied for database regression
+-- should fail because regress_role_nopriv has not CREATE permission on this tablespace
+SELECT pg_tablespace_size('pg_global') > 0 AS canread;
+ERROR:  permission denied for tablespace pg_global
+-- should not fail because it is a default tablespace
+SELECT pg_tablespace_size('pg_default') > 0 AS canread;
+ canread 
+-
+ t
+(1 row)
+
+-- should be false because current session belongs to superuser
+SELECT COUNT(*) = 0 AS haspriv FROM pg_stat_activity WHERE "query" = '';
+ haspriv 
+-
+ f
+(1 row)
+
+RESET SESSION AUTHORIZATION;
+GRANT CONNECT ON DATABASE regression TO public;
+REVOKE pg_read_all_stats FROM regress_role_haspriv;
+-- pg_read_all_settings
+GRANT pg_read_all_settings TO regress_role_haspriv;
+BEGIN;
+-- session_preload_libraries is of PGC_SUSET category
+SET LOCAL session_preload_libraries TO '/tmp/';
+-- vacuum_cost_delay is of PGC_USERSET category
+SET LOCAL vacuum_cost_delay TO 40;
+SET SESSION AUTHORIZATION regress_role_haspriv;
+-- should not fail because regress_role_haspriv is a member of pg_read_all_settings
+SHOW session_preload_libraries;
+ session_preload_libraries 
+---
+ "/tmp/"
+(1 row)
+
+-- should not fail because this config option has not GUC_SUPERUSER_ONLY flag
+SHOW vacuum_cost_delay;
+ vacuum_cost_delay 
+---
+ 40ms
+(1 row)
+
+SET SESSION AUTHORIZATION regress_role_nopriv;
+-- should not fail because this config option has not GUC_SUPERUSER_ONLY flag
+SH

Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-21 Thread Bossart, Nathan
On 8/20/18, 8:29 PM, "Michael Paquier"  wrote:
>> In short, my vote would be to maintain the current behavior for now
>> and to bring up any logging improvements separately.
>
> On the other hand, it would be useful for the user to know exactly what
> is getting skipped.  For example if VACUUM ANALYZE is used then both
> operations would happen, but now the user would only know that VACUUM
> has been skipped, and may miss the fact that ANALYZE was not attempted.
> Let's do as you suggest at the end, aka if both VACOPT_VACUUM and
> VACOPT_ANALYZE are passed down to vacuum_is_relation_owner, then only
> the log for VACUUM is generated, which is consistent.  Any other changes
> could happen later on if necessary.

Sounds good.

> If we don't want to change the current behavior, then one simple
> solution would be close to what you mention, aka skip adding the
> partitioned table to the list, include *all* the partitions in the list
> as we cannot sanely check their ACLs at this stage, and rely on the
> checks already happening in vacuum_rel() and analyze_rel().  This would
> cause the original early lock attempts to not be solved for partitions,
> which is why the approach taken in the patches makes the most sense.

I think my biggest concern with this approach is that we'd be
introducing inconsistent behavior whenever there are concurrent
changes.  If a user never had permissions to VACUUM the partitioned
table, the partitions are skipped outright.  However, if the user
loses permissions to VACUUM the partitioned table between
expand_vacuum_rel() and vacuum_rel(), we'll still attempt to VACUUM
each individual partition.

I'll admit I don't have a great alternative proposal that doesn't
involve adding deadlock risk or complexity, but it still seems worth
mulling over.

> I have split the patch into two parts:
> - 0001 includes new tests which generate WARNING messages for VACUUM,
> ANALYZE and VACUUM (ANALYZE).  That's useful separately.

0001 looks good to me.

> - 0002 is the original patch discussed here.

I'd suggest even splitting 0002 into two patches: one for refactoring
the existing permissions checks into vacuum_is_relation_owner() and
another for the new checks.

+# The role doesn't have privileges to vacuum the table, so VACUUM should
+# immediately skip the table without waiting for a lock.

Can we add tests for concurrent changes that cause the relation to be
skipped in vacuum_rel() and analyze_rel() instead of
expand_vacuum_rel()?

Nathan



Re: Pre-v11 appearances of the word "procedure" in v11 docs

2018-08-21 Thread Peter Eisentraut
On 17/08/2018 21:57, Peter Geoghegan wrote:
> On Fri, Aug 17, 2018 at 7:15 AM, Peter Eisentraut
>  wrote:
>> Attached are my proposed patches.
> 
> I take it that you propose all 3 for backpatch to v11?

yes

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-21 Thread Peter Eisentraut
On 16/08/2018 15:00, Andres Freund wrote:
>> According to my research (completely untested in practice), you need
>> 2010 for mixed code and declarations and 2013 for named initialization
>> of structs.
>>
>> I wonder what raising the msvc requirement would imply for supporting
>> older Windows versions.
> One relevant tidbit is that afaict 2013 still allows *targeting* older
> versions of windows, down to XP and 2003, while requiring a newer
> platforms to run. See:
> https://docs.microsoft.com/en-us/visualstudio/productinfo/vs2013-compatibility-vs
> I don't know if that's hard to do, but I strongly suspect that the
> existing installers already do that (otherwise supporting newer versions
> would likely require separate builds).

So, does anyone with Windows build experience want to comment on this?

The proposal is to desupport anything older than (probably) MSVC 2013,
or alternatively anything that cannot compile the attached test file.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
struct foo
{
int a;
int b;
};

struct foo f = {
.a = 1,
.b = 2,
};

int
bar()
{
int x;

x = 1;

int y;

y = 2;

for (int i = 0; i < 5; i++)
;

return 0;
}


Re: remove ATTRIBUTE_FIXED_PART_SIZE

2018-08-21 Thread Peter Eisentraut
On 20/08/2018 15:14, Tom Lane wrote:
> I agree this is all moot as long as there's no pad bytes.  What I'm
> trying to figure out is if we need to put in place some provisions
> to prevent there from being pad bytes at the end of any catalog struct.
> According to what Andres is saying, it seems like we do (at least for
> ones with varlena fields).

Yes, I think there could be a problem.  I took a brief look through the
catalogs, and while there are plenty of catalogs with trailing padding,
finding that in combination with trailing varlena fields that might
legitimately be all null in practice might require a closer look.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ALTER TABLE on system catalogs

2018-08-21 Thread Andres Freund
On 2018-08-21 17:04:41 +0200, Peter Eisentraut wrote:
> On 20/08/2018 15:34, Tom Lane wrote:
> > Andres Freund  writes:
> >> On 2018-08-20 14:38:25 +0200, Peter Eisentraut wrote:
> >>> Do you have an alternative in mind?
> > 
> >> One is to just not do anything. I'm not sure I'm on board with the goal
> >> of changing things to make DDL on system tables more palatable.
> > 
> > FWIW, I agree with doing as little as possible here.  I'd be on board
> > with Andres' suggestion of just swapping the two code stanzas so that
> > the no-op case doesn't error out.  As soon as you go beyond that, you
> > are in wildly unsafe and untested territory.
> 
> That doesn't solve the original problem, which is being able to set
> reloptions on pg_attribute, because pg_attribute doesn't have a toast
> table but would need one according to existing rules.

I still think it's wrong to work around this rather than to actually fix
the issue of pg_attribute not having a toast table.


> Attached is a patch that instead moves those special cases into
> needs_toast_table(), which was one of the options suggested by Andres.
> There were already similar checks there, so I guess this makes a bit of
> sense.

The big difference is that it then only takes effect when called with
check=true. The callers without it, importantly NewHeapCreateToastTable
& NewRelationCreateToastTable, then won't run into this check, but they
will still run into the error (see below).


> @@ -145,21 +146,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid 
> toastIndexOid,
>   ObjectAddress baseobject,
>   toastobject;
>  
> - /*
> -  * Toast table is shared if and only if its parent is.
> -  *
> -  * We cannot allow toasting a shared relation after initdb (because
> -  * there's no way to mark it toasted in other databases' pg_class).
> -  */
> - shared_relation = rel->rd_rel->relisshared;
> - if (shared_relation && !IsBootstrapProcessingMode())
> - ereport(ERROR,
> - 
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> -  errmsg("shared tables cannot be toasted after 
> initdb")));

This error check imo shouldn't be removed, but moved down.


Greetings,

Andres Freund



Re: remove ATTRIBUTE_FIXED_PART_SIZE

2018-08-21 Thread Peter Eisentraut
On 20/08/2018 12:46, Andres Freund wrote:
> static VacAttrStats *
> examine_attribute(Relation onerel, int attnum, Node *index_expr)
> {
> ...
>   /*
>* Create the VacAttrStats struct.  Note that we only have a copy of the
>* fixed fields of the pg_attribute tuple.
>*/
>   stats = (VacAttrStats *) palloc0(sizeof(VacAttrStats));
>   stats->attr = (Form_pg_attribute) palloc(ATTRIBUTE_FIXED_PART_SIZE);
>   memcpy(stats->attr, attr, ATTRIBUTE_FIXED_PART_SIZE);
> 
> Accesses to stats->attr can legitimately assume that the padding at the
> tail end of attr is present (i.e. widening a load / store).

Yeah, that's probably just broken.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WaitForOlderSnapshots refactoring

2018-08-21 Thread Peter Eisentraut
On 20/08/2018 14:39, Andres Freund wrote:
>> The question is where to put it.  This patch just leaves it static in
>> indexcmds.c, which doesn't help other uses.  A sensible place might be a
>> new src/backend/commands/common.c.  Or we make it non-static in
>> indexcmds.c when the need arises.
> Why not move it to procarray.c?  Most of the referenced functionality
> resides there IIRC.

I was thinking about that, too.  I thought that that would create a
circular dependency between lock.c and procarray.c, but seeing that
that's already the case, I guess it's OK.  (lock.c includes procarray.h,
but procarray.c uses stuff from lock.h, even though it doesn't include
it directly.)  I'll rework the patch accordingly.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [GSoC] Summery of pg performance farm

2018-08-21 Thread Mark Wong
On Sun, Aug 19, 2018 at 12:20:37PM -0300, Alvaro Herrera wrote:
> On 2018-Aug-19, Hongyuan Ma wrote:
> > 2. Implementation of the data report related to the list page. Compare each
> > metric with the previous results. If any of the metrics shows a 5%
> > improvement (or regression), there is one aspect that is progressive (or
> > regressive). This means there may be aspects of "improvement", "regression"
> > and "status quo" in one test result. This is the report list page, for
> > example: http://140.211.168.111/#/machineInfo/pmJEjJjSk3WREM3Q
> 
> Great stuff!
> 
> I think the visualization that many had in mind was that the numbers
> would be displayed in charts where time is the horizontal axis, and the
> numerical performance result is the other axis.  That way, we
> can see how the results go up or down with commits.
> 
> Individual happy/sad faces for individual commits are not enough to see
> the bigger picture.

I advised Hongyuan to try something simple here at first.  My initial
thought was a quick indicator (to not take up too much space on the
screen) and to drill down into the individual plant specifics to view
more details of history.

pgbench is running read-only and read-write tests with scale factors at
10, 1 x memory, and 2 x memory.  We could reduce the variations of the
tests if folks feel that would make more sense.  I thought the current
number of variations might be too many things to trend on this page, but
we can change that.

Regards,
Mark
-- 
Mark Wong  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



ALTER TABLE on system catalogs

2018-08-21 Thread Peter Eisentraut
On 20/08/2018 15:34, Tom Lane wrote:
> Andres Freund  writes:
>> On 2018-08-20 14:38:25 +0200, Peter Eisentraut wrote:
>>> Do you have an alternative in mind?
> 
>> One is to just not do anything. I'm not sure I'm on board with the goal
>> of changing things to make DDL on system tables more palatable.
> 
> FWIW, I agree with doing as little as possible here.  I'd be on board
> with Andres' suggestion of just swapping the two code stanzas so that
> the no-op case doesn't error out.  As soon as you go beyond that, you
> are in wildly unsafe and untested territory.

That doesn't solve the original problem, which is being able to set
reloptions on pg_attribute, because pg_attribute doesn't have a toast
table but would need one according to existing rules.

Attached is a patch that instead moves those special cases into
needs_toast_table(), which was one of the options suggested by Andres.
There were already similar checks there, so I guess this makes a bit of
sense.
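
To make that concrete, the new early-exit checks at the top of
needs_toast_table() would look roughly like this (a sketch only; the
attached patches are the authoritative version):

	/* No need to create a TOAST table for partitioned tables. */
	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
		return false;

	/*
	 * We cannot allow toasting a shared relation after initdb (because
	 * there's no way to mark it toasted in other databases' pg_class),
	 * so just report that it doesn't need a TOAST table.
	 */
	if (rel->rd_rel->relisshared && !IsBootstrapProcessingMode())
		return false;

	/* Likewise, skip system catalogs that we don't want toasted. */
	if (IsCatalogRelation(rel) && !IsBootstrapProcessingMode())
		return false;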

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From c83d2f64195d93991c129812d91dfc6118bae44b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 18 May 2018 17:05:27 -0400
Subject: [PATCH v2 1/2] Ignore attempts to add TOAST table to shared or
 catalog tables

Running ALTER TABLE on any table will check if a TOAST table needs to be
added.  On shared tables, this would previously fail, thus effectively
disabling ALTER TABLE for those tables.  On (non-shared) system
catalogs, on the other hand, it would add a TOAST table, even though we
don't really want TOAST tables on some system catalogs.  In some cases,
it would also fail with an error "AccessExclusiveLock required to add
toast table.", depending on what locks the ALTER TABLE actions had
already taken.

So instead, just ignore attempts to add TOAST tables to such tables,
outside of bootstrap mode, pretending they don't need one.

This allows running ALTER TABLE on such tables without messing up the
TOAST situation.  (All this still requires allow_system_table_mods,
which is independent of this.)
---
 src/backend/catalog/toasting.c | 42 +-
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 3baaa08238..c94d7ef9a5 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -17,6 +17,7 @@
 #include "access/tuptoaster.h"
 #include "access/xact.h"
 #include "catalog/binary_upgrade.h"
+#include "catalog/catalog.h"
 #include "catalog/dependency.h"
 #include "catalog/heap.h"
 #include "catalog/index.h"
@@ -145,21 +146,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
ObjectAddress baseobject,
toastobject;
 
-   /*
-* Toast table is shared if and only if its parent is.
-*
-* We cannot allow toasting a shared relation after initdb (because
-* there's no way to mark it toasted in other databases' pg_class).
-*/
-   shared_relation = rel->rd_rel->relisshared;
-   if (shared_relation && !IsBootstrapProcessingMode())
-   ereport(ERROR,
-           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-            errmsg("shared tables cannot be toasted after initdb")));
-
-   /* It's mapped if and only if its parent is, too */
-   mapped_relation = RelationIsMapped(rel);
-
/*
 * Is it already toasted?
 */
@@ -259,6 +245,12 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
binary_upgrade_next_toast_pg_type_oid = InvalidOid;
}
 
+   /* Toast table is shared if and only if its parent is. */
+   shared_relation = rel->rd_rel->relisshared;
+
+   /* It's mapped if and only if its parent is, too */
+   mapped_relation = RelationIsMapped(rel);
+
    toast_relid = heap_create_with_catalog(toast_relname,
                                           namespaceid,
                                           rel->rd_rel->reltablespace,
@@ -398,7 +390,6 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
  * (1) there are any toastable attributes, and (2) the maximum length
  * of a tuple could exceed TOAST_TUPLE_THRESHOLD.  (We don't want to
  * create a toast table for something like "f1 varchar(20)".)
- * No need to create a TOAST table for partitioned tables.
  */
 static bool
 needs_toast_table(Relation rel)
@@ -410,9 +401,28 @@ needs_toast_table(Relation rel)
int32   tuple_length;
int i;
 
+   /*
+* No need to create a TOAST table for partitioned tables.
+*/
if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
return false;
 
+   /*
+*

Re: Proposal: SLRU to Buffer Cache

2018-08-21 Thread Andres Freund
Hi,

On 2018-08-21 09:53:21 -0400, Shawn Debnath wrote:
> > I was wondering what the point of exposing the OIDs to users in a
> > catalog would be though.  It's not necessary to do that to reserve
> > them (and even if it were, pg_database would be the place): the OIDs
> > we choose for undo, clog, ... just have to be in the system reserved
> > range to be safe from collisions.

Maybe I'm missing something, but how are conflicts prevented just by
being in the system range?  It's very common for multiple patches to
try to use the same oid, and that is only discovered by the
'duplicate_oids' script.  But if there's no catalog representation, I
don't see how that'd discover them?



> > I suppose one benefit would be the
> > ability to join eg pg_buffer_cache against it to get a human readable
> > name like "clog", but that'd be slightly odd because the DB OID field
> > would refer to entries in pg_database or pg_storage_manager depending
> > on the number range.

> Good points. However, there are very few cases where our internal 
> representation using DB OIDs will be exposed, one such being 
> pg_buffercache. Wondering if updating the documentation here would be 
> sufficient as pg_buffercache is an extension used by developers and DBEs 
> rather than by consumers. We can circle back to this after the initial 
> set of patches are out.

Showing the oids in pg_database or such seems like it'd make it a bit
harder to change later because people rely on things like joining
against it.  I don't think I like that.  I'm kinda inclined to something
somewhat crazy like instead having a reserved & shared pg_class entry or
such.  Don't like that that much either. Hm.


> > >   5. Due to the on-disk format changes, simply copying the segments
> > >  during upgrade wouldn't work anymore. Given the nature of data
> > >  stored within SLRU segments today, we can extend pg_upgrade to
> > >  translate the segment files by scanning from relfrozenxid and
> > >  relminmxid and recording the corresponding values at the new
> > >  offsets in the target segments.
> > 
> > +1
> > 
> > (Hmm, if we're going to change all this stuff, I wonder if there would
> > be any benefit to switching to 64 bit xids for the xid-based SLRUs
> > while we're here...)
> 
> Do you mean switching or reserving space for it on the block? The latter 
> I hope :-)

I'd make the addressing work in a way that never requires wraparounds,
but instead allows trimming at the beginning. That shouldn't result in
any additional space, while allowing to fully switch to 64bit xids.
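
As a toy illustration (mine, not from any patch): with a 64-bit position
the page number simply grows monotonically, so there is no wraparound
arithmetic anywhere and truncation only ever trims from the low end:

/* 2 status bits per xact, i.e. 4 xacts/byte, as in clog today */
#define CLOG_XACTS_PER_PAGE	(BLCKSZ * 4)

static inline uint64
clog_page_for_xid64(uint64 xid)	/* hypothetical 64-bit xid */
{
	return xid / CLOG_XACTS_PER_PAGE;	/* no modulo, nothing ever wraps */
}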

Greetings,

Andres Freund



Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2018-08-21 Thread Alexander Korotkov
On Fri, Aug 17, 2018 at 9:55 PM Tom Lane  wrote:
> But then you are injecting bad pages into the shared buffer arena.
> In any case, depending on that behavior seems like a bad idea, because
> it's a pretty questionable kluge in itself.
>
> Another point is that the truncation code attempts to remove all
> to-be-truncated-away pages from the shared buffer arena, but that only
> works if nobody else is loading such pages into shared buffers
> concurrently.  In the presence of concurrent scans, we might be left
> with valid-looking buffers for pages that have been truncated away
> on-disk.  That could cause all sorts of fun later.  Yeah, the buffers
> should contain only dead tuples ... but, for example, they might not
> be hinted dead.  If somebody sets one of those hint bits and then
> writes the buffer back out to disk, you've got real problems.

I didn't yet get how that's possible... count_nondeletable_pages()
ensures that every to-be-truncated-away page doesn't contain any used
item.  From reading [1] I got that we might even have live tuples if
truncation failed.  But if truncation didn't fail, we shouldn't
get this hint problem.  Please correct me if I'm wrong.

After reading [1] and [2] I got that there are at least 3 different
issues with heap truncation:
1) Data corruption on file truncation error (explained in [1]).
2) Expensive scanning of the whole shared buffers before file truncation.
3) Cancel of read-only queries on standby even if hot_standby_feedback
is on, caused by replication of AccessExclusiveLock.

It seems that fixing any of those issues requires redesign of heap
truncation.  So, ideally redesign of heap truncation should fix all
the issues of above.  Or at least it should be understood how the rest
of issues can be fixed later using the new design.

I would like to share some of my sketchy thoughts about a new heap
truncation design.  Let's imagine we introduced a dirty_barrier buffer
flag, which prevents a dirty buffer from being written out (and
correspondingly from being evicted).  Then the truncation algorithm could
look like this:

1) Acquire ExclusiveLock on relation.
2) Calculate the truncation point using count_nondeletable_pages(), while
simultaneously placing the dirty_barrier flag on dirty buffers and saving
their numbers to an array.  Assuming no writes are being performed
concurrently, no to-be-truncated-away pages should be written out from
this point on.
3) Truncate the data files.
4) Iterate over the past-truncation-point buffers and clear the dirty and
dirty_barrier flags from them (using the numbers we saved to the array in
step #2).
5) Release the relation lock.
*) If an exception happens after step #2, iterate over the
past-truncation-point buffers and clear the dirty_barrier flags from them
(using the numbers we saved to the array in step #2).

After heap truncation using this algorithm, shared buffers may contain
past-EOF buffers.  But those buffers are empty (no used items) and
clean.  So, read-only queries shouldn't hint those buffers dirty
because there are no used items.  Normally, these buffers will just be
evicted from the shared buffer arena.  If relation extension happens
shortly after heap truncation, then some of those buffers could be
found again after the extension.  I think this situation could be
handled.  For instance, we can teach vacuum to claim a page as new once
all the tuples are gone.

We're taking only an exclusive lock here.  And assuming we teach our
scans to treat a page-past-EOF situation as no-visible-tuples-found,
concurrent read-only queries will work concurrently with the heap
truncation.  Also, we don't have to scan the whole of shared buffers;
only past-truncation-point buffers are scanned in step #2.  Later,
flags are cleared only from the past-truncation-point dirty buffers.
Data corruption on truncation error shouldn't happen either, because we
don't throw away any dirty buffers before making sure the data files
were successfully truncated.

The problem I see with this approach so far is that placing too many
dirty_barrier flags can affect concurrent activity.  In order to cope
with that we may, for instance, truncate the relation in multiple
iterations when we find too many past-truncation-point dirty buffers.

Any thoughts?

Links.
1. 
https://www.postgresql.org/message-id/flat/5BBC590AE8DF4ED1A170E4D48F1B53AC%40tunaPC
2. 
https://www.postgresql.org/message-id/flat/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: FETCH FIRST clause PERCENT option

2018-08-21 Thread Andres Freund
On 2018-08-21 15:47:07 +0300, Surafel Temesgen wrote:
> On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund  wrote:
> 
> >
> > Won't that have rather massive issues with multiple evaluations of
> > clauses in the query?  Besides being really expensive?
> >
> The plan is re-scanned after the first execution; I can't see an issue
> with multiple executions of a clause in this case.

Imagine volatile functions being used. That can be problematic because
of repeated side-effects, but also will lead to plainly wrong
results. Consider what happens with your approach where somebody does
something like WHERE value < random().

Greetings,

Andres Freund



Re: FETCH FIRST clause PERCENT option

2018-08-21 Thread Surafel Temesgen
On Thu, Aug 16, 2018 at 5:34 PM, Andres Freund  wrote:

>
> Won't that have rather massive issues with multiple evaluations of
> clauses in the query?  Besides being really expensive?
>
The plan is re-scanned after the first execution; I can't see an issue
with multiple executions of a clause in this case.

>
> I think you'd have to instead spill the query results into a tuplestore
>

The downside of that is that all the results have to be stored even if the
needed tuples are only a fraction of them, and they have to be stored for
longer, so the choice becomes memory versus CPU utilization.
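
For what it's worth, the tuplestore approach would look roughly like the
sketch below inside the node's first execution (ignoring rescans, bounded
cases and memory accounting; names and types are illustrative only):

/*
 * Sketch: pull every row from the child plan into a tuplestore, remember
 * how many there were, then emit only the first "percent" percent of them.
 */
static void
limit_percent_sketch(PlanState *child, TupleTableSlot *slot, int percent)
{
	Tuplestorestate *store = tuplestore_begin_heap(false, false, work_mem);
	int64		ntuples = 0;
	int64		to_emit;

	for (;;)
	{
		TupleTableSlot *s = ExecProcNode(child);

		if (TupIsNull(s))
			break;
		tuplestore_puttupleslot(store, s);
		ntuples++;
	}

	/* integer ceiling of ntuples * percent / 100 */
	to_emit = (ntuples * percent + 99) / 100;

	while (to_emit-- > 0 && tuplestore_gettupleslot(store, true, false, slot))
	{
		/* hand "slot" back to the caller here, one row per call */
	}

	tuplestore_end(store);
}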


>
> - Andres
>
Regards
Surafel


Re: Pluggable Storage - Andres's take

2018-08-21 Thread Andres Freund
Hi,

On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> On Sun, Aug 5, 2018 at 7:48 PM Andres Freund  wrote:
> > I'm currently in the process of rebasing zheap onto the pluggable
> > storage work. The goal, which seems to work surprisingly well, is to
> > find issues that the current pluggable storage patch doesn't yet deal
> > with.  I plan to push a tree including a lot of fixes and improvements
> > soon.
> >
> 
> Sorry for coming late to this thread.

No worries.


> That's good. Did you find any problems in porting zheap into pluggable
> storage? Does it need any API changes or new API requirements?

A lot, yes. The big changes are:
- removal of HeapPageScanDesc
- introduction of explicit support functions for tablesample & bitmap scans
- introduction of callbacks for vacuum_rel, cluster

And quite a bit more along those lines.

> Have the new TupleTableSlot abstraction patches fixed any of these
> issues in the recent thread [1]?  So that I can look into changing the
> FDW API to return a slot instead of a tuple.

Yea, that'd be a good thing to start with.

Greetings,

Andres Freund



Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Mon, Aug 20, 2018 at 4:44 PM Tom Lane  wrote:

> Chris Travers  writes:
> > I am looking at trying to make two modifications to the PostgreSQL FDW
> and
> > would like feedback on this before I do.
>
> > 1.  INSERTMETHOD=[insert|copy] option on foreign table.
>
> > One significant limitation of the PostgreSQL FDW is that it does a
> prepared
> > statement insert on each row written which imposes a per-row latency.
> This
> > hits environments where there is significant latency or few latency
> > guarantees particularly hard, for example, writing to a foreign table
> that
> > might be physically located on another continent.  The idea is that
> > INSERTMETHOD would default to insert and therefore have no changes but
> > where needed people could specify COPY which would stream the data out.
> > Updates would still be unaffected.
>
> It seems unlikely to me that an FDW option would be at all convenient
> for this.  What about selecting it dynamically based on the planner's
> estimate of the number of rows to be inserted?
>
> A different thing we could think about is enabling COPY TO/FROM a
> foreign table.
>

Actually, as I start to understand some aspects of Andres's concern above,
there are issues beyond the number of rows.  But yes, selecting dynamically
would be preferable.

Two major things I think we cannot support with this are RETURNING clauses
and ON CONFLICT clauses.  So anywhere we need to worry about those, a copy
node could not be used.  So it is more complex than merely row estimates.

>
> > 2.  TWOPHASECOMMIT=[off|on] option
>
> > The second major issue that I see with PostgreSQL's foreign database
> > wrappers is the fact that there is no two phase commit which means that a
> > single transaction writing to a group of tables has no expectation that
> all
> > backends will commit or rollback together.  With this patch an option
> would
> > be applied to foreign tables such that they could be set to use two phase
> > commit  When this is done, the first write to each backend would
> register a
> > connection with a global transaction handler and a pre-commit and commit
> > hooks would be set up to properly process these.
>
> ENOINFRASTRUCTURE ... and the FDW pieces of that hardly seem like the
> place to start.
>

I disagree about the lack of infrastructure.  We have every piece of
infrastructure we need for a minimum viable offering.
1.  Two Phase Commit in PostgreSQL
2.  Custom Background Workers
3.  Pre/Post Commit/Rollback hooks for callbacks.

Those are sufficient to handle the vast majority of error cases.
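
To be concrete about item 3, what I have in mind is the existing xact
callback mechanism; a minimal sketch (names are mine, not an actual patch):

#include "access/xact.h"

static void
fdw_2pc_xact_callback(XactEvent event, void *arg)
{
	switch (event)
	{
		case XACT_EVENT_PRE_COMMIT:
			/*
			 * PREPARE TRANSACTION on every participating foreign server.
			 * An ERROR here still aborts the local transaction cleanly.
			 */
			break;
		case XACT_EVENT_COMMIT:
			/*
			 * COMMIT PREPARED everywhere.  Must not ERROR at this point;
			 * only WARN and hand leftovers to the cleanup worker.
			 */
			break;
		case XACT_EVENT_ABORT:
			/* ROLLBACK PREPARED everywhere; likewise no ERROR allowed. */
			break;
		default:
			break;
	}
}

/* registered once per backend, e.g. when the first connection is made */
void
register_fdw_2pc_callback(void)
{
	RegisterXactCallback(fdw_2pc_xact_callback, NULL);
}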

The one thing we *might* want that we don't have is a startup process to
scan a directory of background worker status files and fire off appropriate
background workers on database start.  That hardly seems difficult though.


>
> regards, tom lane
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Chris Travers
On Tue, Aug 21, 2018 at 8:42 AM Masahiko Sawada 
wrote:

> On Tue, Aug 21, 2018 at 1:47 AM Chris Travers 
> wrote:
> >
> >
> >
> > On Mon, Aug 20, 2018 at 4:41 PM Andres Freund 
> wrote:
> >>
> >> Hi,
> >>
> >> On 2018-08-20 16:28:01 +0200, Chris Travers wrote:
> >> > 2.  TWOPHASECOMMIT=[off|on] option
> >>
> >> > The second major issue that I see with PostgreSQL's foreign database
> >> > wrappers is the fact that there is no two phase commit which means
> that a
> >> > single transaction writing to a group of tables has no expectation
> that all
> >> > backends will commit or rollback together.  With this patch an option
> would
> >> > be applied to foreign tables such that they could be set to use two
> phase
> >> > commit  When this is done, the first write to each backend would
> register a
> >> > connection with a global transaction handler and a pre-commit and
> commit
> >> > hooks would be set up to properly process these.
> >> >
> >> > On recommit a per-global-transaction file would be opened in the data
> >> > directory and prepare statements logged to the file.  On error, we
> simply
> >> > roll back our local transaction.
> >> >
> >> > On commit hook , we go through and start to commit the remote global
> >> > transactions.  At this point we make a best effort but track whether
> or not
> >> > we were successfully on all.  If successful on all, we delete the
> file.  If
> >> > unsuccessful we fire a background worker which re-reads the file and
> is
> >> > responsible for cleanup.  If global transactions persist, a SQL
> >> > administration function will be made available to restart the cleanup
> >> > process.  On rollback, we do like commit but we roll back all
> transactions
> >> > in the set.  The file has enough information to determine whether we
> should
> >> > be committing or rolling back on cleanup.
> >> >
> >> > I would like to push these both for Pg 12.  Is there any feedback on
> the
> >> > concepts and the problems first
> >>
>
> Thank you for the proposal. I agree that it's a major problem that
> postgres_fdw (or PostgreSQL core API) doesn't support two-phase
> commit.
>
> >> There's been *substantial* work on this. You should at least read the
> >> discussion & coordinate with the relevant developers.
> >
> >
> > I suppose I should forward this to them directly also.
> >
> > Yeah.   Also the transaction manager code for this I wrote while helping
> with a proof of concept for this copy-to-remote extension.
> >
> > There are a few big differences in implementation with the patches you
> mention and the disagreement was part of why I thought about going this
> direction.
> >
> > First, discussion of differences in implementation:
> >
> > 1.  I treat the local and remote transactions symmetrically and I make
> no assumptions about what might happen between prepare and an attempted
> local commit.
> >prepare goes into the precommit hook
> >commit goes into the commit hook and we never raise errors if it
> fails (because you cannot rollback at that point).  Instead a warning is
> raised and cleanup commences.
> >rollback goes into the rollback hook and we never raise errors if it
> fails (because you are already rolling back).
> >
> > 2.  By treating this as a property of a table rather than a property of
> a foreign data wrapper or a server, we can better prevent prepared
> transactions where they have not been enabled.
> >This also ensures that we know whether we are guaranteeing two phase
> commit or not by looking at the table.
> >
> > 3.  By making this opt-in it avoids a lot of problems with regards to
> incorrect configuration etc since if the DBA says "use two phase commit"
> and failed to enable prepared transactions on the other side...
> >
> > On to failure modes:
> >  1.  It's possible that under high load too many foreign transactions are
> prepared and things start rolling back instead of committing.  Oh well
> >  2.  In the event that a foreign server goes away between prepare and
> commit, we continue to retry via the background worker.  The background
> worker is very pessimistic and checks every remote system for the named
> transaction.
>
> If some participant servers fail during COMMIT PREPARED, will the
> client get a "committed"? or an "aborted"? If the client gets
> "aborted", that's not correct because the local changes are already
> committed at that point.


Ok so let's discuss this in more detail here.

You have basically 6 states a TPC global transaction can be in (sketched
as an enum below).
1.  We haven't gotten to the point of trying to commit (BEGIN)
2.  We are trying to commit (PREPARE)
3.  We have committed to committing all transactions (COMMIT)
4.  We have committed to rolling back all transactions (ROLLBACK)
5.  We have successfully committed OR rolled back all transactions
(COMPLETE)
6.  We tried to commit or rollback all transactions and got some errors
(INCOMPLETE)
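
In code form, the six states above might be written as (names are mine,
purely illustrative):

typedef enum GlobalTxnState
{
	GTX_BEGIN,			/* not yet trying to commit */
	GTX_PREPARE,		/* trying to commit: PREPARE issued on participants */
	GTX_COMMIT,			/* committed to committing all participants */
	GTX_ROLLBACK,		/* committed to rolling back all participants */
	GTX_COMPLETE,		/* every participant committed or rolled back */
	GTX_INCOMPLETE		/* some participants errored; cleanup still pending */
} GlobalTxnState;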

During COMMIT PREPARED we cannot raise errors to PostgreSQL.  We have
already committed to

Re: A typo in guc.c

2018-08-21 Thread Kyotaro HORIGUCHI
At Tue, 21 Aug 2018 12:17:57 +0900, Michael Paquier  wrote 
in <20180821031757.gf2...@paquier.xyz>
> On Tue, Aug 21, 2018 at 11:58:41AM +0900, Kyotaro HORIGUCHI wrote:
> > The "user" should be "use".
> > 
> > As you(who?) know, this applies only 11 and dev.
> 
> Thanks, applied.

Thank you.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Slotification of partition tuple conversion

2018-08-21 Thread Amit Khandekar
On 21 August 2018 at 08:12, Amit Langote  wrote:
> Here are some comments on the patch:

Thanks for the review.

>
> +ConvertTupleSlot
>
> Might be a good idea to call this ConvertSlotTuple?

I thought the name 'ConvertTupleSlot' emphasizes the fact that we are
operating on a slot, without having to touch the tuple wherever
possible.  Hence I chose the name.  But I am open to suggestions if
there are more votes against this.
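
To illustrate what I mean by operating on the slot, the conversion boils
down to something like the following (a rough sketch only, not the
attached patch; the actual signature and details may differ):

/*
 * Convert a tuple between two row types by moving datums/nulls directly
 * between slots, instead of forming and deforming an intermediate
 * HeapTuple the way do_convert_tuple() does.  Assumes "out" was created
 * with map->outdesc.
 */
TupleTableSlot *
ConvertTupleSlot(TupleConversionMap *map, TupleTableSlot *in, TupleTableSlot *out)
{
	AttrNumber *attrMap = map->attrMap;
	int			natts = map->outdesc->natts;
	int			i;

	slot_getallattrs(in);		/* deform the source into values/isnull */
	ExecClearTuple(out);

	for (i = 0; i < natts; i++)
	{
		if (attrMap[i] != 0)
		{
			out->tts_values[i] = in->tts_values[attrMap[i] - 1];
			out->tts_isnull[i] = in->tts_isnull[attrMap[i] - 1];
		}
		else
		{
			out->tts_values[i] = (Datum) 0;
			out->tts_isnull[i] = true;
		}
	}

	return ExecStoreVirtualTuple(out);
}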

>
> +/*
> + * Get the tuple in the per-tuple context. Also, we
> cannot call
> + * ExecMaterializeSlot(), otherwise the tuple will get freed
> + * while storing the next tuple.
> + */
> +oldcontext =
> MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
> +tuple = ExecCopySlotTuple(slot);
> +MemoryContextSwitchTo(oldcontext);
>
> The comment about why we cannot call ExecMaterializeSlot is a bit unclear.
>  Maybe, the comment doesn't need to mention that?  Instead, expand a bit
> more on why the context switch here or how it interacts with recently
> tuple buffering (0d5f05cde01)?

Done :

-   * Get the tuple in the per-tuple context. Also, we cannot call
-   * ExecMaterializeSlot(), otherwise the tuple will get freed
-   * while storing the next tuple.
+  * Get the tuple in the per-tuple context, so that it will be
+  * freed after each batch insert.

Specifically, we could not call ExecMaterializeSlot() because it sets
tts_shouldFree to true which we don't want for batch inserts.

>
> Seeing that all the sites in the partitioning code that previously called
> do_convert_tuple() now call ConvertTupleSlot, I wonder if we don't need a
> full TupleConversionMap, just the attribute map in it.  We don't need the
> input/output Datum and bool arrays in it, because we'd be using the ones
> from input and output slots of ConvertTupleSlot.  So, can we replace all
> instances of TupleConversionMap in the partitioning code and data
> structures by AttributeNumber arrays.

Yeah, I earlier thought I could get rid of do_convert_tuple()
altogether. But there are places where we currently deal with tuples
rather than slots. And here, we could not replace do_convert_tuple()
unless we slotify the surrounding code, similarly to how you did in the
Allocate-dedicated-slots patch.  E.g. ExecEvalConvertRowtype() and
AfterTriggerSaveEvent() deal with tuples, so the do_convert_tuple()
calls there couldn't be replaced.

So I think we can work towards what you have in mind in form of
incremental patches.

>
> Also, how about putting ConvertTupleSlot in execPartition.c and exporting
> it in execPartition.h, instead of tupconvert.c and tupconvert.h, resp.?

I think the goal of ConvertTupleSlot() is to eventually replace
do_convert_tuple() as well, so it would be more of a general conversion
facility rather than something specific to partitions.  Hence I think
it's better to have it in tupconvert.c.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


tup_convert_v3.patch
Description: Binary data


Re: Two proposed modifications to the PostgreSQL FDW

2018-08-21 Thread Amit Langote
On 2018/08/20 23:43, Tom Lane wrote:
> Chris Travers  writes:
>> I am looking at trying to make two modifications to the PostgreSQL FDW and
>> would like feedback on this before I do.
> 
>> 1.  INSERTMETHOD=[insert|copy] option on foreign table.
> 
>> One significant limitation of the PostgreSQL FDW is that it does a prepared
>> statement insert on each row written which imposes a per-row latency.  This
>> hits environments where there is significant latency or few latency
>> guarantees particularly hard, for example, writing to a foreign table that
>> might be physically located on another continent.  The idea is that
>> INSERTMETHOD would default to insert and therefore have no changes but
>> where needed people could specify COPY which would stream the data out.
>> Updates would still be unaffected.
> 
> It seems unlikely to me that an FDW option would be at all convenient
> for this.  What about selecting it dynamically based on the planner's
> estimate of the number of rows to be inserted?
> 
> A different thing we could think about is enabling COPY TO/FROM a
> foreign table.

Fwiw, the following commit did introduce COPY FROM support for foreign
tables, although using a FDW INSERT interface, so not exactly optimized
for bulk-loading yet.

commit 3d956d9562aa4811b5eaaaf5314d361c61be2ae0
Author: Robert Haas 
Date:   Fri Apr 6 19:16:11 2018 -0400

Allow insert and update tuple routing and COPY for foreign tables.

Thanks,
Amit