Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-10-20 Thread Ashutosh Bapat
On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas  wrote:
> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote
>  wrote:
>> However, when I briefly read the description in "Transaction Management in
>> the R* Distributed Database Management System (C. Mohan et al)" [2], it
>> seems that what Ashutosh is saying might be a correct way to proceed after
>> all:
>
> I think Ashutosh is mostly right, but I think there's a lot of room to
> doubt whether the design of this patch is good enough that we should
> adopt it.
>
> Consider two possible designs.  In design #1, the leader performs the
> commit locally and then tries to send COMMIT PREPARED to every standby
> server afterward, and only then acknowledges the commit to the client.
> In design #2, the leader performs the commit locally and then
> acknowledges the commit to the client at once, leaving the task of
> running COMMIT PREPARED to some background process.  Design #2
> involves a race condition, because it's possible that the background
> process might not complete COMMIT PREPARED on every node before the
> user submits the next query, and that query might then fail to see
> supposedly-committed changes.  This can't happen in design #1.  On the
> other hand, there's always the possibility that the leader's session
> is forcibly killed, even perhaps by pulling the plug.  If the
> background process contemplated by design #2 is well-designed, it can
> recover and finish sending COMMIT PREPARED to each relevant server
> after the next restart.  In design #1, that background process doesn't
> necessarily exist, so inevitably there is a possibility of orphaning
> prepared transactions on the remote servers, which is not good. Even
> if the DBA notices them, it won't be easy to figure out whether to
> commit them or roll them back.
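For illustration, the crash-proof bookkeeping that design #2 depends on can be sketched in a few lines of Python. This is a toy model, not PostgreSQL code; the `Resolver` class, the JSON state file, and all names are invented for the sketch. The two essential properties are that the pending-participant list is durably written before the client sees the commit acknowledged, and that a participant is forgotten only after COMMIT PREPARED succeeds there, so a restart can simply re-read the file and retry:

```python
import json, os, tempfile

class Resolver:
    """Toy model of design #2: durable in-doubt state plus a retry loop."""

    def __init__(self, state_path):
        self.state_path = state_path

    def _load(self):
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                return json.load(f)
        return {}

    def _store(self, state):
        # Write-then-rename so the state file is replaced atomically.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def record_pending(self, gid, participants):
        # Must reach disk *before* the commit is acknowledged to the client.
        state = self._load()
        state[gid] = sorted(participants)
        self._store(state)

    def resolve(self, send_commit_prepared):
        # A server is dropped from the list only once COMMIT PREPARED
        # succeeded there; failures are simply retried on the next pass.
        state = self._load()
        for gid in list(state):
            remaining = [s for s in state[gid]
                         if not send_commit_prepared(gid, s)]
            if remaining:
                state[gid] = remaining
            else:
                del state[gid]
        self._store(state)
        return state
```

A simulated crash is just constructing a fresh `Resolver` over the same file: the surviving state tells it exactly which COMMIT PREPARED calls are still owed.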
>
> I think this thought experiment shows that, on the one hand, there is
> a point to waiting for commits on the foreign servers, because it can
> avoid the anomaly of not seeing the effects of your own commits.  On
> the other hand, it's ridiculous to suppose that every case can be
> handled by waiting, because that just isn't true.  You can't be sure
> that you'll be able to wait long enough for COMMIT PREPARED to
> complete, and even if that works out, you may not want to wait
> indefinitely for a dead server.  Waiting for a ROLLBACK PREPARED has
> no value whatsoever unless the system design is such that failing to
> wait for it results in the ROLLBACK PREPARED never getting performed
> -- which is a pretty poor excuse.
>
> Moreover, there are good reasons to think that doing this kind of
> cleanup work in the post-commit hooks is never going to be acceptable.
> Generally, the post-commit hooks need to be no-fail, because it's too
> late to throw an ERROR.  But there's very little hope that a
> connection to a remote server can be no-fail; anything that involves a
> network connection is, by definition, prone to failure.  We can try to
> guarantee that every single bit of code that runs in the path that
> sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an
> ERROR, but that's going to be quite difficult to do: even palloc() can
> throw an error.  And what about interrupts?  We don't want to be stuck
> inside this code for a long time without any hope of the user
> recovering control of the session by pressing ^C, but of course the
> way that works is it throws an ERROR, which we can't handle here.  We
> fixed a similar issue for synchronous replication in
> 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a
> WARNING in that case and then return control to the user.  But if we
> do that here, all of the code that every FDW emits has to be aware of
> that rule and follow it, and it just adds to the list of ways that the
> user backend can escape this code without having cleaned up all of the
> prepared transactions on the remote side.

Hmm, IIRC, my patch, and possibly the patch by Masahiko-san and
Vinayak, tries to resolve prepared transactions in post-commit code. I
agree with you here that it should be avoided and that the backend
should take over the job of resolving transactions.

>
> It seems to me that the only way to really make this feature robust is
> to have a background worker as part of the equation.  The background
> worker launches at startup and looks around for local state that tells
> it whether there are any COMMIT PREPARED or ROLLBACK PREPARED
> operations pending that weren't completed during the last server
> lifetime, whether because of a local crash or remote unavailability.
> It attempts to complete those and retries periodically.  When a new
> transaction needs this type of coordination, it adds the necessary
> crash-proof state and then signals the background worker.  If
> appropriate, it can wait for the background worker to complete, just
> like a CHECKPOINT waits for the checkpointer to finish -- but if the
> CHECKPOINT 

Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-10-20 Thread Michael Paquier
On Fri, Oct 21, 2016 at 2:29 PM, Michael Paquier
 wrote:
> On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan  wrote:
>> I tried to fix the problem with a new backend not being
>> able to reuse a temporary namespace when it contains
>> thousands of temporary tables. I disabled locking of objects
>> during namespace clearing process. See the patch attached.
>> Please tell me if there are any reasons why this is wrong.
>
> That's invasive. I am wondering if a cleaner approach here would be a
> flag in deleteOneObject() that performs the lock cleanup, as that's
> what you are trying to solve here.
>
>> I also added a GUC variable and changed the condition in
>> autovacuum to let it nuke orphan tables on its way.
>> See another patch attached.
>
> It seems to me that you'd even want to make the drop of orphaned
> tables mandatory once it is detected, even if it is not a wraparound
> autovacuum. Dangling temp tables have a higher chance of hurting users
> than diagnostics on orphaned temp tables after a crash have of helping
> them. (A background worker could be used for existing versions to
> clean that up more aggressively, actually.)

You should add your patch to the next commitfest as well, so as to be
sure that it will attract more reviews and attention:
https://commitfest.postgresql.org/11/
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-10-20 Thread Michael Paquier
On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan  wrote:
> I tried to fix the problem with a new backend not being
> able to reuse a temporary namespace when it contains
> thousands of temporary tables. I disabled locking of objects
> during namespace clearing process. See the patch attached.
> Please tell me if there are any reasons why this is wrong.

That's invasive. I am wondering if a cleaner approach here would be a
flag in deleteOneObject() that performs the lock cleanup, as that's
what you are trying to solve here.

> I also added a GUC variable and changed the condition in
> autovacuum to let it nuke orphan tables on its way.
> See another patch attached.

It seems to me that you'd even want to make the drop of orphaned
tables mandatory once it is detected, even if it is not a wraparound
autovacuum. Dangling temp tables have a higher chance of hurting users
than diagnostics on orphaned temp tables after a crash have of helping
them. (A background worker could be used for existing versions to
clean that up more aggressively, actually.)
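The drop rule being discussed reduces to a simple membership check. As a toy sketch (invented names, not the actual autovacuum code): a temp table lives in a per-backend schema such as pg_temp_3, so a table whose owning backend is no longer alive is an orphan and, per the proposal, should be dropped even outside a wraparound run:

```python
def orphaned_temp_tables(temp_tables, active_backend_ids):
    """temp_tables: (table_name, owning_backend_id) pairs.

    A temp schema like pg_temp_3 belongs to backend id 3; if that
    backend is not in the set of live backends, its tables are
    orphans and are candidates for an unconditional drop.
    """
    return [name for name, backend in temp_tables
            if backend not in active_backend_ids]
```

The point of making the drop mandatory is precisely that this check is cheap and unambiguous once the owning backend is known to be gone.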
-- 
Michael




Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-10-20 Thread Michael Paquier
On Thu, Sep 8, 2016 at 12:38 AM, Robert Haas  wrote:
> On Mon, Sep 5, 2016 at 1:14 PM, Bruce Momjian  wrote:
>> I don't think we look at those temp tables frequently enough to justify
>> keeping them around for all users.
>
> +1.  I think it would be much better to nuke them more aggressively.

+1 from here as well. Making the deletion of orphaned temp tables
mandatory even in the non-wraparound autovacuum case looks like the
better move to me. I can see that it could be important to be able to
look at some temp tables' data after a crash, but that argument looks
weak compared to the potential bloat of catalog tables caused by those
dangling temp relations. And I'd suspect that far more users would
like to see this removal made more aggressive than care about having a
look at those orphaned tables after a crash.
-- 
Michael




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Craig Ringer
On 21 Oct. 2016 12:57 am, "Joshua D. Drake"  wrote:
>
> Hello,
>
> What about a simpler solution to all of this? Let's just remove it from
postgresql.conf. Out of sight. If someone needs to test, they can, but an
uneducated user won't immediately know what to do about that "autovacuum
process", and when they look it up the documentation is exceedingly blunt
about why *not* to turn it off.

Then they'll just do what I've seen at multiple sites: create a cron job
that kills it as soon as it starts. Then their DB performance issues go
away ... for a while. By the time they're forced to confront it, their DB is
immensely bloated and barely staggering along, or has reached wraparound
shutdown. So we get the fun job of trying to fix it using special freeze
tools etc. because they broke the normal ones...

We still have fsync=off available. If you want a user foot-gun to crusade
against, start there. Even that's useful and legitimate, though I wish it
were called enable_crash_safety = off. It's legit to use it in testing, in
data ingestion where you'll fsync afterward, or in cloud deployments where
you rely on replication and the whole instance gets nuked if it crashes anyway.

There are similarly legit reasons to turn autovac off but the consequences
are less bad.

Personally what I think is needed here is to make monitoring and bloat
visibility not completely suck. So we can warn users if tables haven't been
vac'd in ages and have recent churn. And so they can easily SELECT a view
to get bloat estimates with an estimate of how much drift there could've
been since last vacuum.

Users turn off vacuum because they cannot see that it is doing anything
except wasting I/O and cpu. So:

* A TL;DR in the docs saying what vacuum does and why not to turn it off, in
particular warning that turning autovacuum off will make a slow DB get slower
even though it seems to help at first.

* A comment in the conf file with the same TL;DR. Comments are free, let's
use a few lines.

* Warn on startup when autovac is off?

Personally I wouldn't mind encouraging most users to prefer table or db
level autovac controls. Though we really need to make them more visible. If
that improved I wouldn't really mind removing the global autovac option
from the conf file though I'd prefer to just give it a decent comment.


Re: [HACKERS] FSM corruption leading to errors

2016-10-20 Thread Michael Paquier
On Thu, Oct 20, 2016 at 3:37 PM, Pavan Deolasee
 wrote:
> Just to clarify, I meant that if we truncate the entire FSM then we'll need
> an API to truncate the VM as well so that VACUUM rebuilds everything
> completely. OTOH if we provide a function just to truncate the FSM to match
> the size of the table, then we don't need to rebuild the FSM. So that's
> probably a better way to handle FSM corruption, as far as this particular
> issue is concerned.

To be honest, I think that just having in the release notes a method
that does not involve the use of any extra extension or SQL routine is
fine. So we could just tell users to:
1) Run something like the query you gave upthread, giving to the user
a list of the files that are corrupted. And add this query to the
release notes.
2) If anything is found, stop the server and delete the files manually.
3) Re-start the server.
OK, that's troublesome and costly for large relations, but we know
that's the safest way to go for all versions, and there is no need to
complicate the code with any one-time repairing extensions.

Speaking of which, I implemented a small extension able to truncate
the FSM up to the size of the relation, as attached, but as I looked at
it, SMGR_TRUNCATE_FSM was introduced in 9.6, so its range of action
is rather limited... And I pushed a version to github as well:
https://github.com/michaelpq/pg_plugins/tree/master/pg_fix_truncation
The limited range of such an extension is an argument good enough to
just rely on the stop/delete-FSM/start method to fix an instance and
let VACUUM do the rest of the work. That looks to work, but use it at
your own risk.

This bug would be a good blog topic by the way...
-- 
Michael


pg_fix_truncation.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 20, 2016 at 9:15 PM, Robert Haas  wrote:
> So here's my theory.  The whole reason why Tomas is having difficulty
> seeing any big effect from these patches is because he's testing on
> x86.  When Dilip tests on x86, he doesn't see a big effect either,
> regardless of workload.  But when Dilip tests on POWER, which I think
> is where he's mostly been testing, he sees a huge effect, because for
> some reason POWER has major problems with this lock that don't exist
> on x86.

Right, because on POWER we can see big contention on ClogControlLock
at 300 scale factor, even at 96 clients, but on X86 with 300 scale
factor there is almost no contention on ClogControlLock.

However, at 1000 scale factor we can see significant contention on
ClogControlLock on the X86 machine.

I want to test on POWER with 1000 scale factor to see whether
contention on ClogControlLock becomes much worse.

I will run this test and post the results.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra
 wrote:

> In the results you've posted on 10/12, you've mentioned a regression with 32
> clients, where you got 52k tps on master but only 48k tps with the patch (so
> ~10% difference). I have no idea what scale was used for those tests,

That test was with scale factor 300 on the 4-socket POWER machine. I
think I need to repeat this test with multiple readings to confirm
whether it was a regression or run-to-run variation. I will do that
soon and post the results.

> and I
> see no such regression in the current results (but you only report results
> for some of the client counts).

This test is on the 8-socket X86 machine. At 1000 scale factor I have
given readings with all client counts (32, 64, 96, 192), but at 300
scale factor I posted only 192 because on this machine (the 8-socket
X86 machine) I did not see much load on ClogControlLock at 300 scale
factor.
> Also, which of the proposed patches have you been testing?

I tested with the GroupLock patch.

> Can you collect and share a more complete set of data, perhaps based on the
> scripts I use to do tests on the large machine with 36/72 cores, available
> at https://bitbucket.org/tvondra/hp05-results ?

I think from my last run I did not share data for the 8-socket X86
machine at 300 scale factor with 32, 64, and 96 clients. I already
have those data, so I am sharing them. (Please let me know if you want
to see some other client count; for that I need to run another test.)

Head:
scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77233356
latency average: 0.746 ms
tps = 42907.363243 (including connections establishing)
tps = 42907.546190 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_32_ul.txt
 111757  |
   3666
   1289  LWLockNamed | ProcArrayLock
   1142  Lock| transactionid
318  LWLockNamed | CLogControlLock
299  Lock| extend
109  LWLockNamed | XidGenLock
 70  LWLockTranche   | buffer_content
 35  Lock| tuple
 29  LWLockTranche   | lock_manager
 14  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockNamed | CheckpointerCommLock

Group Lock Patch:

scaling factor: 300
query mode: prepared
number of clients: 32
number of threads: 32
duration: 1800 s
number of transactions actually processed: 77544028
latency average: 0.743 ms
tps = 43079.783906 (including connections establishing)
tps = 43079.960331 (excluding connections establishing)
 112209  |
   3718
   1402  LWLockNamed | ProcArrayLock
   1070  Lock| transactionid
245  LWLockNamed | CLogControlLock
188  Lock| extend
 80  LWLockNamed | XidGenLock
 76  LWLockTranche   | buffer_content
 39  LWLockTranche   | lock_manager
 31  Lock| tuple
  7  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockTranche   | buffer_mapping

Head:
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76211698
latency average: 1.512 ms
tps = 42339.731054 (including connections establishing)
tps = 42339.930464 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt
 215734  |
   5106  Lock| transactionid
   3754  LWLockNamed | ProcArrayLock
   3669
   3267  LWLockNamed | CLogControlLock
661  Lock| extend
339  LWLockNamed | XidGenLock
310  Lock| tuple
289  LWLockTranche   | buffer_content
205  LWLockTranche   | lock_manager
 50  LWLockTranche   | wal_insert
  2  LWLockTranche   | buffer_mapping
  1 Tuples only is on.
  1  LWLockTranche   | proc

GroupLock patch:
scaling factor: 300
query mode: prepared
number of clients: 64
number of threads: 64
duration: 1800 s
number of transactions actually processed: 76629309
latency average: 1.503 ms
tps = 42571.704635 (including connections establishing)
tps = 42571.905157 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt
 217840  |
   5197  Lock| transactionid
   3744  LWLockNamed | ProcArrayLock
   3663
966  Lock| extend
849  LWLockNamed | CLogControlLock
372  Lock| tuple
305  LWLockNamed | XidGenLock
199  LWLockTranche   | buffer_content
184  LWLockTranche   | lock_manager
 35  LWLockTranche   | wal_insert
  1 Tuples only is on.
  1  LWLockTranche   | proc
  1  LWLockTranche   | buffer_mapping

Head:
scaling factor: 300
query mode: prepared
number of clients: 96
number of threads: 96
duration: 1800 s
number of transactions actually processed: 77663593
latency average: 2.225 ms
tps = 43145.624864 (including connections establishing)
tps = 43145.838167 (excluding connections establishing)

 302317 

Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread David G. Johnston
On Thu, Oct 20, 2016 at 6:03 PM, Robert Haas  wrote:

> On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane  wrote:
> > I'm mostly with Stephen on this.  As the names stand, they encourage
> > people to go look at the documentation,
> > https://www.postgresql.org/docs/devel/static/storage-file-layout.html
> > which will provide more information than you'd ever get out of any
> > reasonable directory name.
>
> Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which
> would encourage that even more strongly.  But I don't think that
> proposal can be taken seriously.  Giving things meaningful names is a
> good practice in almost every case.
>

Those don't have the virtue of being at least somewhat mnemonic
like pg_xact.

I'll toss my lot in with Stephen's and Tom's on this.

I have no problem continuing to keep with historical precedent and
allowing mnemonic abbreviations in our directory and file names at
this point.

David J.


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Michael Paquier
On Fri, Oct 21, 2016 at 10:03 AM, Robert Haas  wrote:
> On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane  wrote:
>> I'm mostly with Stephen on this.  As the names stand, they encourage
>> people to go look at the documentation,
>> https://www.postgresql.org/docs/devel/static/storage-file-layout.html
>> which will provide more information than you'd ever get out of any
>> reasonable directory name.
>
> Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which
> would encourage that even more strongly.  But I don't think that
> proposal can be taken seriously.  Giving things meaningful names is a
> good practice in almost every case.

Moving on with the topic of this thread... I would think that
"pg_xact" is what we are going to rename pg_clog to.

On top of that, after reading the thread, here are the options and
binaries that could be renamed for consistency with the renaming of
pg_xlog:
- pg_xlogdump -> pg_waldump
- pg_resetxlog -> pg_resetwal
- pg_receivexlog -> pg_receivewal
- initdb --xlogdir -> --waldir
- pg_basebackup --xlogmethod --xlogdir -> --walmethod --waldir
That's quite a number, and each change is trivial. Previous options
would be better kept for backward-compatibility. Personally, I see no
urge in changing all that and I am fine with just renaming the *log
directories of PGDATA for this thread. But as the point has been
raised, here are all things that could be done.
-- 
Michael




[HACKERS] Typo in pgstat.c

2016-10-20 Thread vinayak

Hi,

Attached patch fixes a typo in pgstat.c

s/addtions/additions/g

Regards,

Vinayak Pokale

NTT Open Source Software Center



typo-pgstat-c.patch
Description: application/download



Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane  wrote:
> I'm mostly with Stephen on this.  As the names stand, they encourage
> people to go look at the documentation,
> https://www.postgresql.org/docs/devel/static/storage-file-layout.html
> which will provide more information than you'd ever get out of any
> reasonable directory name.

Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which
would encourage that even more strongly.  But I don't think that
proposal can be taken seriously.  Giving things meaningful names is a
good practice in almost every case.

> Having said that, I still don't like "pg_logical", but I suppose
> renaming it would have more downsides than upsides.

Remind me what your beef is?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra
 wrote:
>> I then started a run at 96 clients which I accidentally killed shortly
>> before it was scheduled to finish, but the results are not much
>> different; there is no hint of the runaway CLogControlLock contention
>> that Dilip sees on power2.
>>
> What shared_buffer size were you using? I assume the data set fit into
> shared buffers, right?

8GB.

> FWIW as I explained in the lengthy post earlier today, I can actually
> reproduce the significant CLogControlLock contention (and the patches do
> reduce it), even on x86_64.

/me goes back, rereads post.  Sorry, I didn't look at this carefully
the first time.

> For example consider these two tests:
>
> * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
> * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
>
> However, it seems I can also reproduce fairly bad regressions, like for
> example this case with data set exceeding shared_buffers:
>
> * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

I'm not sure how seriously we should take the regressions.  I mean,
what I see there is that CLogControlLock contention goes down by about
50% -- which is the point of the patch -- and WALWriteLock contention
goes up dramatically -- which sucks, but can't really be blamed on the
patch except in the indirect sense that a backend can't spend much
time waiting for A if it's already spending all of its time waiting
for B.  It would be nice to know why it happened, but we shouldn't
allow CLogControlLock to act as an admission control facility for
WALWriteLock (I think).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Typo in pgstat.h

2016-10-20 Thread vinayak

Hi,


- * st_progress_command_target, and st_progress_command[].

+ * st_progress_command_target, and st_progress_param[].


Attached patch fixes the typo.


Regards,

Vinayak Pokale

NTT Open Source Software Center




typo-pgstat-h.patch
Description: application/download



Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Michael Paquier
On Fri, Oct 21, 2016 at 12:35 AM, Robert Haas  wrote:
> On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier
>  wrote:
>> OK. I can live with that as well. Attached are three patches. The
>> pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the
>> pg_clog -> pg_xact move. Only one survivor to be chosen among the last
>> two ones.
>
> Committed 0001.

Thanks!
-- 
Michael




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Claudio Freire
On Thu, Oct 20, 2016 at 1:08 PM, Pavan Deolasee
 wrote:
> On Thu, Oct 20, 2016 at 9:20 PM, Claudio Freire 
> wrote:
>>
>>
>>
>> With indirect indexes, since you don't need to insert a tid, you can
>> just "insert on conflict do nothing" on the index.
>
>
> Would that work with non-unique indexes? Anyway, the point I was trying to
> make is that there are similar technical challenges, and we could solve them
> for WARM as well with your work for finding conflicting pairs and
> then not doing inserts. My thinking currently is that it will lead to other
> challenges, especially around vacuum, but I could be wrong.

Consider this:

Have a vacuum_cycle_id field in the metapage, with one bit reserved to
whether there's a vacuum in progress.

While there is a vacuum in progress on the index, all kinds of
modifications will look up the  entry, and store the current
vacuum_cycle_id on the unused space for the tid pointer on the index
entry. When not, only new entries will be added (with the current
vacuum cycle id). So, during vacuum, indirect indexes incur a similar
cost to that of regular indexes, but only during vacuum.

When vacuuming, allocate 1/2 maintenance_work_mem for a bloom filter,
and increase all vacuum cycle ids (on the metapage) and mark a vacuum
in progress.

Scan the heap, add  pairs of *non-dead* tuples to the bloom
filter. That's one BF per index, sadly, but bear with me.

Then scan the indexes.  pairs *not* in the BF that have the
*old* vacuum cycle id get removed.

Clear the vacuum in progress flag on all indexes' metapage.

The only drawback here is that mwm dictates the amount of uncleanable
waste left on the indexes (BF false positives). Surely, the BF could
be replaced with an accurate set rather than an approximate one, but
that could require a lot of memory if keys are big, and a lot of scans
of the indexes. The BF trick bounds the amount of waste left while
minimizing work.
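The scheme above can be sketched concretely. This is a toy model in plain Python (not PostgreSQL code; the Bloom filter parameters and all names are invented): live (key, pk) pairs from the heap scan go into the filter, and an index entry stamped with the *old* vacuum cycle id is removed only when the filter does not contain its pair. False positives leave some uncleanable waste behind, as noted, but can never remove a live entry:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter; sizes chosen arbitrarily for the sketch."""

    def __init__(self, nbits=1 << 16, nhashes=4):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        # Derive nhashes independent bit positions from one hash family.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # May return a false positive; never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def vacuum_index(index_entries, live_pairs, old_cycle_id):
    """index_entries: ((key, pk), cycle_id) tuples; returns surviving entries.

    Heap scan fills the filter with non-dead pairs; the index scan then
    removes entries with the old cycle id that are not (probably) live.
    """
    bf = BloomFilter()
    for pair in live_pairs:
        bf.add(pair)
    return [(pair, cycle) for pair, cycle in index_entries
            if cycle != old_cycle_id or bf.might_contain(pair)]
```

Note how entries tagged with the new cycle id are untouched regardless of the filter, which is what makes concurrent insertions during the vacuum safe in this model.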




Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE

2016-10-20 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> That would break code that tries to parse that stuff, eg depesz.com.

> I don't believe Jim was suggesting that we back-patch such a change.

I don't either.

> Changing it in a new major release seems entirely reasonable.

It's still a crock though.  I wonder whether it wouldn't be better to
change the nodeBitmap code so that when EXPLAIN ANALYZE is active,
it expends extra effort to try to produce a rowcount number.

We could certainly run through the result bitmap and count the number
of exact-TID bits.  I don't see a practical way of doing something
with lossy page bits, but maybe those occur infrequently enough
that we could ignore them?  Or we could arbitrarily decide that
a lossy page should be counted as MaxHeapTuplesPerPage, or a bit
less arbitrarily, count it as the relation's average number
of tuples per page.
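That estimator is just a weighted sum. As a hypothetical sketch (invented function name, not the actual nodeBitmap code): exact pages contribute their counted TID bits, and each lossy page is charged the relation's average tuples per page, per the suggestion above:

```python
def estimate_bitmap_rows(exact_tids_per_page, lossy_page_count,
                         avg_tuples_per_page):
    # Exact pages: per-tuple bits survive, so count them directly.
    exact = sum(exact_tids_per_page)
    # Lossy pages: only the page bit survives, so approximate each page
    # by the relation's average tuple density.
    return exact + lossy_page_count * avg_tuples_per_page
```

If lossy pages really are rare, the second term stays small and the estimate degrades gracefully toward an exact count.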

regards, tom lane




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 3:16 PM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>
>> single user mode dumps core :(
>>
>> bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
>> LOG:  0: could not change directory to "/root": Permission denied
>> LOCATION:  resolve_symlinks, exec.c:293
>> Segmentation fault (core dumped)
>>
>> Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data 
>> castaging'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00797d6f in ?? ()
>> Missing separate debuginfos, use: debuginfo-install
>> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
>> (gdb) bt
>> #0  0x00797d6f in ?? ()
>> #1  0x0079acf1 in RelationCacheInitializePhase3 ()
>> #2  0x007b35c5 in InitPostgres ()
>> #3  0x006b9b53 in PostgresMain ()
>> #4  0x005f30fb in main ()
>
> Maybe
>   rm global/pg_internal.init
> and try again?

Will do when I can. I had to do an emergency restore plus some unfun
data reconstruction from the query log.

Notably there is a much larger database in the same cluster which is
undamaged.  This server is new to production usage, maybe 2 months.

Here are the contents of pg_extension:
 plpgsql
 dblink
 hstore
 postgres_fdw
 plsh * not used
 pg_trgm * not used
 plr * not used
 tablefunc * not used
 adminpack * not used
 plpythonu * not used
 postgis * not used
 postgis_topology * not used

Short term plan is to separate the database into its own cluster and
install replication and checksums. All queries to this database are
logged. Here are the contents of the log leading into and after the
crash:

oct 17 crash:
2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL:  parameters: $1 =
'21121', $2 = '8', $3 = '2016-10-13', $4 = NULL, $5 = NULL, $6 = NULL,
$7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = 't', $12
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
SELECT NULL AS PROCEDURE_CAT, n.nspname AS PROCEDURE_SCHEM, p.proname
AS PROCEDURE_NAME, NULL, NULL, NULL, d.description AS REMARKS
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
SELECT n.nspname,p.proname,p.prorettype,p.proargtypes,
t.typtype,t.typrelid , p.proargnames, p.proargmodes, p.proallargtypes
, p.o
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute :
select * from checkin($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11,
$12) as result
2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL:  parameters: $1 =
'114333', $2 = 'rrosillo', $3 = 'CALLER', $4 = 'Survey', $5 = 'Happy',
$6 = 'Callback', $7 = 'OTHER', $8 = '2016-10-18 01:05:00',
2016-10-17 12:12:24 CDT [rms@castaging]: LOG:  execute S_3: COMMIT
2016-10-17 12:12:25 CDT [@]: ERROR:  could not open relation with OID
1203933 <-- first sign of damage
2016-10-17 12:12:25 CDT [@]: CONTEXT:  automatic analyze of table
"castaging.public.apartment"

oct 20 crash:
2016-10-20 12:46:38 CDT [postgres@castaging]: LOG:  statement: SELECT
CallsByUser() AS byuser
2016-10-20 12:46:40 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterOverviewJSON() AS overview
2016-10-20 12:46:41 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:41 CDT [postgres@castaging]: LOG:  statement: SELECT
MarketOverviewJSON() AS market
2016-10-20 12:46:42 CDT [postgres@castaging]: LOG:  execute :
SELECT SubMarketOverviewJSON($1::TEXT) AS submkt
2016-10-20 12:46:42 CDT [postgres@castaging]: DETAIL:  parameters: $1 = '640'
2016-10-20 12:46:44 CDT [postgres@castaging]: LOG:  statement: SELECT
CallsByUser() AS byuser
2016-10-20 12:46:46 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterOverviewJSON() AS overview
2016-10-20 12:46:47 CDT [postgres@castaging]: LOG:  statement: SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:47 CDT [postgres@castaging]: ERROR:
"pg_description_o_c_o_index" is an index <-- first sign of damage
2016-10-20 12:46:47 CDT [postgres@castaging]: CONTEXT:  SQL function
"callcenterusertrackingjson" during startup
2016-10-20 12:46:47 CDT [postgres@castaging]: STATEMENT:  SELECT
CallCenterUserTrackingJSON() AS tracking
2016-10-20 12:46:47 CDT [postgres@castaging]: WARNING:  leaking
still-referenced relcache entry for "pg_class_oid_index"

CallCenterUserTrackingJSON() and friends are not particularly
interesting except that they make use of json_agg().  They were also
called basically all day long at 5-second intervals.  I guess this
isn't saying very much, but I'm starting to smell a rat here.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fix invalid XML explain plans for track_io_timing

2016-10-20 Thread Tom Lane
I wrote:
> Markus Winand  writes:
>> The XML output of explain potentially outputs the XML tag names 
>> "I/O-Write-Time"
>> and "I/O-Read-Time", which are invalid due to the slash.

> Ooops.

After further thought I decided we should go with the whitelist solution.
The extra time needed to produce XML-format output isn't really likely to
bother anyone.  And given that this bug escaped notice for several years,
it seems likely that the next time somebody decides to be creative about
picking a tag name, we might not notice an XML syntax violation for
several more years.  So a future-proof fix seems advisable.

I pushed a patch using the strchr() approach.  Thanks for reporting this!
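For illustration, the whitelist approach can be sketched as a standalone
function (the function name and the exact allowed character set here are
assumptions for the example, not the committed explain.c patch):

```c
#include <string.h>

/*
 * Map any character that is not on a conservative whitelist of
 * XML-name-safe characters (letters, digits, '-', '_', '.') to '-'.
 * This way a tag name stays a valid XML element name even if someone
 * later picks a creative label like "I/O Write Time".
 */
static void
sanitize_xml_tag(char *tag)
{
    static const char ok[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-_.";

    for (char *p = tag; *p; p++)
    {
        if (strchr(ok, *p) == NULL)
            *p = '-';
    }
}
```

With this, "I/O Write Time" becomes the valid element name
"I-O-Write-Time" no matter what punctuation future tag names contain.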

regards, tom lane




Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE

2016-10-20 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Jim Nasby  writes:
> > A customer just pinged me wondering how it was that a BitmapAnd node was 
> > reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
> > fact generated tuples.
> 
> > While this is mentioned in the docs, I think it would be very helpful to 
> > have ANALYZE spit out "N/A" instead of 0 for these nodes.
> 
> That would break code that tries to parse that stuff, eg depesz.com.

I don't believe Jim was suggesting that we back-patch such a change.

Changing it in a new major release seems entirely reasonable.

Thanks!

Stephen




Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE

2016-10-20 Thread Tom Lane
Jim Nasby  writes:
> A customer just pinged me wondering how it was that a BitmapAnd node was 
> reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
> fact generated tuples.

> While this is mentioned in the docs, I think it would be very helpful to 
> have ANALYZE spit out "N/A" instead of 0 for these nodes.

That would break code that tries to parse that stuff, eg depesz.com.

regards, tom lane




[HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE

2016-10-20 Thread Jim Nasby
A customer just pinged me wondering how it was that a BitmapAnd node was 
reporting 0 tuples when the Bitmap Heap Scan above it showed it had in 
fact generated tuples.


While this is mentioned in the docs, I think it would be very helpful to 
have ANALYZE spit out "N/A" instead of 0 for these nodes. AFAICT that 
would just require adding a special case to the "if (es->costs)" block 
at line ~1204 in explain.c?


BTW, it looks like it would actually be possible to return a real 
row-count if none of the TIDBitmap pages are chunks, but I'm not sure if 
it's worth the extra effort.
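The proposed output change could be sketched standalone like this (the
enum and function names are made up for illustration; this is not the
actual explain.c code):

```c
#include <stdio.h>

/* Illustrative stand-ins for the real node tags in nodes.h. */
typedef enum { T_BitmapAnd, T_BitmapOr, T_SeqScan } NodeTag;

/*
 * Bitmap combination nodes never return tuples themselves, so report
 * "N/A" for them instead of a misleading zero; every other node keeps
 * its numeric row count.
 */
static void
format_actual_rows(NodeTag tag, double rows, char *buf, size_t buflen)
{
    if (tag == T_BitmapAnd || tag == T_BitmapOr)
        snprintf(buf, buflen, "actual rows=N/A");
    else
        snprintf(buf, buflen, "actual rows=%.0f", rows);
}
```

Whether tools that parse EXPLAIN ANALYZE output could tolerate such a
change is a separate question from whether the display is clearer.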

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Alvaro Herrera
Merlin Moncure wrote:

> single user mode dumps core :(
> 
> bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
> LOG:  0: could not change directory to "/root": Permission denied
> LOCATION:  resolve_symlinks, exec.c:293
> Segmentation fault (core dumped)
> 
> Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data castaging'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00797d6f in ?? ()
> Missing separate debuginfos, use: debuginfo-install
> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
> (gdb) bt
> #0  0x00797d6f in ?? ()
> #1  0x0079acf1 in RelationCacheInitializePhase3 ()
> #2  0x007b35c5 in InitPostgres ()
> #3  0x006b9b53 in PostgresMain ()
> #4  0x005f30fb in main ()

Maybe
  rm global/pg_internal.init
and try again?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Tomas Vondra

On 10/20/2016 07:59 PM, Robert Haas wrote:

On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas  wrote:

On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:

On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:

...

So here's my theory.  The whole reason why Tomas is having difficulty
seeing any big effect from these patches is because he's testing on
x86.  When Dilip tests on x86, he doesn't see a big effect either,
regardless of workload.  But when Dilip tests on POWER, which I think
is where he's mostly been testing, he sees a huge effect, because for
some reason POWER has major problems with this lock that don't exist
on x86.

If that's so, then we ought to be able to reproduce the big gains on
hydra, a community POWER server.  In fact, I think I'll go run a quick
test over there right now...


And ... nope.  I ran a 30-minute pgbench test on unpatched master
using unlogged tables at scale factor 300 with 64 clients and got
these results:

     14  LWLockTranche | wal_insert
     36  LWLockTranche | lock_manager
     45  LWLockTranche | buffer_content
    223  Lock          | tuple
    527  LWLockNamed   | CLogControlLock
    921  Lock          | extend
   1195  LWLockNamed   | XidGenLock
   1248  LWLockNamed   | ProcArrayLock
   3349  Lock          | transactionid
  85957  Client        | ClientRead
 135935                |

I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.



What shared_buffers size were you using? I assume the data set fit into 
shared buffers, right?


FWIW as I explained in the lengthy post earlier today, I can actually 
reproduce the significant CLogControlLock contention (and the patches do 
reduce it), even on x86_64.


For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, for 
example in this case with a data set exceeding shared_buffers:


* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Tom Lane
Stephen Frost  writes:
> * David Fetter (da...@fetter.org) wrote:
>> On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote:
>>> I don't see one single one of those subdirectory names that I'd call
>>> self-documenting.

>> That's a problem we should do something about, even if we can't do it
>> by renaming these all in one go.  At the very least, we can do this
>> for any new names.

> I disagree that excessively long, fully self-documenting names for
> files or directories are useful.

I'm mostly with Stephen on this.  As the names stand, they encourage
people to go look at the documentation,
https://www.postgresql.org/docs/devel/static/storage-file-layout.html
which will provide more information than you'd ever get out of any
reasonable directory name.

Having said that, I still don't like "pg_logical", but I suppose
renaming it would have more downsides than upsides.

regards, tom lane




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Stephen Frost
* David Fetter (da...@fetter.org) wrote:
> On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote:
> > Robert Haas  writes:
> > > On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane  wrote:
> > >> We have the two precedents "pg_subtrans" and "pg_multixact", so
> > >> unless we want to get into renaming those too, I think "pg_trans"
> > >> and "pg_xact" are really the only options worth considering.
> > >> 
> > >> Personally I'd go for "pg_trans", but it's only a weak preference.
> > 
> > > Heaven forfend we actually use enough characters to make it 
> > > self-documenting.
> > 
> > $ ls $PGDATA
> > PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
> > base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
> > global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
> > pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
> > pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/
> > 
> > I don't see one single one of those subdirectory names that I'd call
> > self-documenting.
> 
> That's a problem we should do something about, even if we can't do it
> by renaming these all in one go.  At the very least, we can do this
> for any new names.

I disagree that excessively long, fully self-documenting names for
files or directories are useful.  We should name our directories with
useful hints at what they are used for, hints that remind those who are
knowledgeable where things live.  Secondary to that is using an
approach to naming which avoids implying anything about the
directories to those who are *not* knowledgeable, which leads us to the
current discussion regarding the removal of directories with 'log' in
the name.

Individuals who are not knowledgeable in this area are not going to get
any more benefit from a directory named 'pg_trans' or
'pg_transaction_status' than from one named 'pg_clog' or 'pg_xact', and
therefore this whole line of reasoning is a red herring.

Thanks!

Stephen




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 2:07 PM, Tom Lane  wrote:
> Merlin Moncure  writes:
>> single user mode dumps core :(
>
> You've got a mess there :-(
>
>> Missing separate debuginfos, use: debuginfo-install
>> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
>
> This backtrace would likely be much more informative if you did the above.

can't; don't have the package unfortunately.

merlin




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Tom Lane
Merlin Moncure  writes:
> single user mode dumps core :(

You've got a mess there :-(

> Missing separate debuginfos, use: debuginfo-install
> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64

This backtrace would likely be much more informative if you did the above.

regards, tom lane




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure  wrote:
> On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure  wrote:
>> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian  wrote:
>>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
 > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
 > to be to the system catalogs with at least two critical tables dropped
 > or inaccessible in some fashion.  A lot of the OIDs seemed to be
 > pointing at the wrong thing.  Couple more datapoints here.
 >
 > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
 > *) Another database on the same cluster was not impacted.  However
 > it's more olap style and may not have been written to during the
 > outage
 >
 > Now, this infrastructure running this system is running maybe 100ish
 > postgres clusters and maybe 1000ish sql server instances with
 > approximately zero unexplained data corruption issues in the 5 years
 > I've been here.  Having said that, this definitely smells and feels
 > like something on the infrastructure side.  I'll follow up if I have
 > any useful info.

 After a thorough investigation I now have credible evidence the source
 of the damage did not originate from the database itself.
 Specifically, this database is mounted on the same volume as the
 operating system (I know, I know) and something non database driven
 sucked up disk space very rapidly and exhausted the volume -- fast
 enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>>
>>> However, disk space exhaustion should not lead to corruption unless the
>>> underlying layers lied in some way.
>>
>> I agree -- however I'm sufficiently separated from the things doing
>> the things that I can't verify that in any real way.   In the meantime
>> I'm going to take standard precautions (enable checksums/dedicated
>> volume/replication).  Low disk space also does not explain the bizarre
>> outage I had last Friday.
>
> ok, data corruption struck again.  This time disk space is ruled out,
> and access to the database is completely denied:
> postgres=# \c castaging
> WARNING:  leaking still-referenced relcache entry for
> "pg_index_indexrelid_index"

single user mode dumps core :(

bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging
LOG:  0: could not change directory to "/root": Permission denied
LOCATION:  resolve_symlinks, exec.c:293
Segmentation fault (core dumped)

Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data castaging'.
Program terminated with signal 11, Segmentation fault.
#0  0x00797d6f in ?? ()
Missing separate debuginfos, use: debuginfo-install
postgresql95-server-9.5.2-1PGDG.rhel6.x86_64
(gdb) bt
#0  0x00797d6f in ?? ()
#1  0x0079acf1 in RelationCacheInitializePhase3 ()
#2  0x007b35c5 in InitPostgres ()
#3  0x006b9b53 in PostgresMain ()
#4  0x005f30fb in main ()
(gdb)


merlin




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread David Fetter
On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane  wrote:
> >> We have the two precedents "pg_subtrans" and "pg_multixact", so
> >> unless we want to get into renaming those too, I think "pg_trans"
> >> and "pg_xact" are really the only options worth considering.
> >> 
> >> Personally I'd go for "pg_trans", but it's only a weak preference.
> 
> > Heaven forfend we actually use enough characters to make it 
> > self-documenting.
> 
> $ ls $PGDATA
> PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
> base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
> global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
> pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
> pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/
> 
> I don't see one single one of those subdirectory names that I'd call
> self-documenting.

That's a problem we should do something about, even if we can't do it
by renaming these all in one go.  At the very least, we can do this
for any new names.

> Are you proposing we rename them all with carpal-
> tunnel-syndrome-promoting names?

Are you saying that people are getting carpal tunnel syndrome from
hitting the tab key, which has been standard for completion in shells
for decades?  I'm pretty sure that doesn't actually happen.

> There's certainly some case to be made for renaming at least one of
> "pg_subtrans" and "pg_multixact" so that these three similarly-purposed
> subdirectories can all have similar names.  But I think on the whole
> that's (a) fixing what ain't broken, and (b) making it even more unlikely
> that we'll ever get to consensus on changing anything.  We've managed to
> agree that we need to change the names ending in "log"; let's do that
> and be happy that we've removed one foot-gun from the system.

Removing foot guns, un-sexy as it may be from a developer's
perspective, is very useful work.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] emergency outage requiring database restart

2016-10-20 Thread Merlin Moncure
On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure  wrote:
> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian  wrote:
>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote:
>>> > Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
>>> > to be to the system catalogs with at least two critical tables dropped
>>> > or inaccessible in some fashion.  A lot of the OIDs seemed to be
>>> > pointing at the wrong thing.  Couple more datapoints here.
>>> >
>>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty)
>>> > *) Another database on the same cluster was not impacted.  However
>>> > it's more olap style and may not have been written to during the
>>> > outage
>>> >
>>> > Now, this infrastructure running this system is running maybe 100ish
>>> > postgres clusters and maybe 1000ish sql server instances with
>>> > approximately zero unexplained data corruption issues in the 5 years
>>> > I've been here.  Having said that, this definitely smells and feels
>>> > like something on the infrastructure side.  I'll follow up if I have
>>> > any useful info.
>>>
>>> After a thorough investigation I now have credible evidence the source
>>> of the damage did not originate from the database itself.
>>> Specifically, this database is mounted on the same volume as the
>>> operating system (I know, I know) and something non database driven
>>> sucked up disk space very rapidly and exhausted the volume -- fast
>>> enough that sar didn't pick it up.  Oh well :-) -- thanks for the help
>>
>> However, disk space exhaustion should not lead to corruption unless the
>> underlying layers lied in some way.
>
> I agree -- however I'm sufficiently separated from the things doing
> the things that I can't verify that in any real way.   In the meantime
> I'm going to take standard precautions (enable checksums/dedicated
> volume/replication).  Low disk space also does not explain the bizarre
> outage I had last Friday.

ok, data corruption struck again.  This time disk space is ruled out,
and access to the database is completely denied:
postgres=# \c castaging
WARNING:  leaking still-referenced relcache entry for
"pg_index_indexrelid_index"

merlin




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 2:23 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane  wrote:
>>> We have the two precedents "pg_subtrans" and "pg_multixact", so
>>> unless we want to get into renaming those too, I think "pg_trans"
>>> and "pg_xact" are really the only options worth considering.
>>>
>>> Personally I'd go for "pg_trans", but it's only a weak preference.
>
>> Heaven forfend we actually use enough characters to make it self-documenting.
>
> $ ls $PGDATA
> PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
> base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
> global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
> pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
> pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/
>
> I don't see one single one of those subdirectory names that I'd call
> self-documenting.  Are you proposing we rename them all with carpal-
> tunnel-syndrome-promoting names?

No.  Are you proposing that self-documenting names are a bad thing
rather than a good thing?

> There's certainly some case to be made for renaming at least one of
> "pg_subtrans" and "pg_multixact" so that these three similarly-purposed
> subdirectories can all have similar names.  But I think on the whole
> that's (a) fixing what ain't broken, and (b) making it even more unlikely
> that we'll ever get to consensus on changing anything.  We've managed to
> agree that we need to change the names ending in "log"; let's do that
> and be happy that we've removed one foot-gun from the system.

I agree that there is an overwhelming consensus in favor of getting
"log" out of the names, but I do not agree that the only two possible
alternative names are "pg_trans" and "pg_xact", which strikes me as
being akin to saying that the only two options for dinner tonight are
overripe peaches and lunch meats a week past the sell-by date.  If I
had to pick only between those two, I suppose I'd go with "pg_xact"
which is clear enough to someone familiar with PostgreSQL internals,
as opposed to "pg_trans" which I don't think will be clear even to
those people.  But I don't like either one very much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Bruce Momjian
On Thu, Oct 20, 2016 at 02:02:27PM -0400, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 1:39 PM, Bruce Momjian  wrote:
> > On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote:
> >> > When it comes to the name, I tend to think of 'pg_xact' as saying "this
> >> > is where we persist info we need to keep about transactions."  Today
> >> > that's just the commit status info, but I could imagine that there
> >> > might, someday, be other things that go in there.  "pg_multixact" is
> >> > an example of something quite similar but does have more than just one
> >> > "thing."  Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
> >> > bring them all under one consistent naming scheme.
> >>
> >> I don't dispute the fact that you tend to think of it that way, but I
> >> think it's a real stretch to say that "pg_xact" is a clear name from
> >> the point of view of the uninitiated.  Now, maybe the point is to be a
> >> little bit deliberately unclear, but "xact" for "transaction" is not a
> >> lot better than "xlog" for "write-ahead log".  It's just arbitrary
> >> abbreviations we made up and you either know what they mean or you
> >> don't.  We could call it "pg_xkcd" and we wouldn't be removing much in
> >> the way of clarity.
> >
> > What is your suggestion for a name?  If you have none, I suggest we use
> > "pg_xact".
> 
> I'm not sure.  pg_transaction_status would be clear, but it's long.
> Is pg_xact actually better than pg_clog?

Uh, yeah, no "log".  Wasn't that the point?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Tom Lane
Robert Haas  writes:
> On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane  wrote:
>> We have the two precedents "pg_subtrans" and "pg_multixact", so
>> unless we want to get into renaming those too, I think "pg_trans"
>> and "pg_xact" are really the only options worth considering.
>> 
>> Personally I'd go for "pg_trans", but it's only a weak preference.

> Heaven forfend we actually use enough characters to make it self-documenting.

$ ls $PGDATA
PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/

I don't see one single one of those subdirectory names that I'd call
self-documenting.  Are you proposing we rename them all with carpal-
tunnel-syndrome-promoting names?

There's certainly some case to be made for renaming at least one of
"pg_subtrans" and "pg_multixact" so that these three similarly-purposed
subdirectories can all have similar names.  But I think on the whole
that's (a) fixing what ain't broken, and (b) making it even more unlikely
that we'll ever get to consensus on changing anything.  We've managed to
agree that we need to change the names ending in "log"; let's do that
and be happy that we've removed one foot-gun from the system.

regards, tom lane




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Is pg_xact actually better than pg_clog?
>
> Yes, because it doesn't contain the three letters "log".

I figured somebody was going to say that.

> We have the two precedents "pg_subtrans" and "pg_multixact", so
> unless we want to get into renaming those too, I think "pg_trans"
> and "pg_xact" are really the only options worth considering.
>
> Personally I'd go for "pg_trans", but it's only a weak preference.

Heaven forfend we actually use enough characters to make it self-documenting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Tom Lane
Robert Haas  writes:
> Is pg_xact actually better than pg_clog?

Yes, because it doesn't contain the three letters "log".

We have the two precedents "pg_subtrans" and "pg_multixact", so
unless we want to get into renaming those too, I think "pg_trans"
and "pg_xact" are really the only options worth considering.

Personally I'd go for "pg_trans", but it's only a weak preference.

regards, tom lane




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 1:39 PM, Bruce Momjian  wrote:
> On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote:
>> > When it comes to the name, I tend to think of 'pg_xact' as saying "this
>> > is where we persist info we need to keep about transactions."  Today
>> > that's just the commit status info, but I could imagine that there
>> > might, someday, be other things that go in there.  "pg_multixact" is
>> > an example of something quite similar but does have more than just one
>> > "thing."  Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
>> > bring them all under one consistent naming scheme.
>>
>> I don't dispute the fact that you tend to think of it that way, but I
>> think it's a real stretch to say that "pg_xact" is a clear name from
>> the point of view of the uninitiated.  Now, maybe the point is to be a
>> little bit deliberately unclear, but "xact" for "transaction" is not a
>> lot better than "xlog" for "write-ahead log".  It's just arbitrary
>> abbreviations we made up and you either know what they mean or you
>> don't.  We could call it "pg_xkcd" and we wouldn't be removing much in
>> the way of clarity.
>
> What is your suggestion for a name?  If you have none, I suggest we use
> "pg_xact".

I'm not sure.  pg_transaction_status would be clear, but it's long.
Is pg_xact actually better than pg_clog?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas  wrote:
> On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:
>> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
>>> I agree with these conclusions.  I had a chance to talk with Andres
>>> this morning at Postgres Vision and based on that conversation I'd
>>> like to suggest a couple of additional tests:
>>>
>>> 1. Repeat this test on x86.  In particular, I think you should test on
>>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>>
>> I have done my test on cthulhu. The basic difference is that on POWER we
>> saw ClogControlLock on top at 96 clients and above with scale factor
>> 300, but on cthulhu at scale factor 300 the transactionid lock is
>> always on top. So I repeated my test with scale factor 1000 on cthulhu
>> as well.
>
> So the upshot appears to be that this problem is a lot worse on power2
> than cthulhu, which suggests that this is architecture-dependent.  I
> guess it could also be kernel-dependent, but it doesn't seem likely,
> because:
>
> power2: Red Hat Enterprise Linux Server release 7.1 (Maipo),
> 3.10.0-229.14.1.ael7b.ppc64le
> cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64
>
> So here's my theory.  The whole reason why Tomas is having difficulty
> seeing any big effect from these patches is because he's testing on
> x86.  When Dilip tests on x86, he doesn't see a big effect either,
> regardless of workload.  But when Dilip tests on POWER, which I think
> is where he's mostly been testing, he sees a huge effect, because for
> some reason POWER has major problems with this lock that don't exist
> on x86.
>
> If that's so, then we ought to be able to reproduce the big gains on
> hydra, a community POWER server.  In fact, I think I'll go run a quick
> test over there right now...

And ... nope.  I ran a 30-minute pgbench test on unpatched master
using unlogged tables at scale factor 300 with 64 clients and got
these results:

     14  LWLockTranche | wal_insert
     36  LWLockTranche | lock_manager
     45  LWLockTranche | buffer_content
    223  Lock          | tuple
    527  LWLockNamed   | CLogControlLock
    921  Lock          | extend
   1195  LWLockNamed   | XidGenLock
   1248  LWLockNamed   | ProcArrayLock
   3349  Lock          | transactionid
  85957  Client        | ClientRead
 135935                |
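
Counts like the ones above are typically gathered by polling pg_stat_activity's wait_event columns (new in 9.6) during the run. A rough sketch of one way to do it; the sampling table name is my own invention, not something from the test harness:

```sql
-- Hypothetical sketch: poll pg_stat_activity while pgbench runs and
-- accumulate the observed wait events.
CREATE TABLE IF NOT EXISTS wait_samples (wait_event_type text, wait_event text);

-- Run this repeatedly (e.g. from a shell loop or psql's \watch) during the test:
INSERT INTO wait_samples
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();

-- Afterwards, tally the samples into the kind of table shown above:
SELECT count(*), wait_event_type, wait_event
FROM wait_samples
GROUP BY 2, 3
ORDER BY 1;
```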

I then started a run at 96 clients which I accidentally killed shortly
before it was scheduled to finish, but the results are not much
different; there is no hint of the runaway CLogControlLock contention
that Dilip sees on power2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Bruce Momjian
On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote:
> > When it comes to the name, I tend to think of 'pg_xact' as saying "this
> > is where we persist info we need to keep about transactions."  Today
> > that's just the commit status info, but I could imagine that there
> > might, someday, be other things that go in there.  "pg_multixact" is
> > an example of something quite similar but does have more than just one
> > "thing."  Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
> > bring them all under one consistent naming scheme.
> 
> I don't dispute the fact that you tend to think of it that way, but I
> think it's a real stretch to say that "pg_xact" is a clear name from
> the point of view of the uninitiated.  Now, maybe the point is to be a
> little bit deliberately unclear, but "xact" for "transaction" is not a
> lot better than "xlog" for "write-ahead log".  It's just arbitrary
> abbreviations we made up and you either know what they mean or you
> don't.  We could call it "pg_xkcd" and we wouldn't be removing much in
> the way of clarity.

What is your suggestion for a name?  If you have none, I suggest we use
"pg_xact".

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-10-20 Thread Greg Stark
On Oct 20, 2016 5:27 PM, "Noah Misch"  wrote:
>
> On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
>
> > The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> > have convenient access to a size argument. It could call the
> > GetChunkSpace method but that will include the allocation overhead and
>
> That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
> VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested.  Calling
> GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
> assumption of mcxt.c, which is messy.  Including the allocation overhead is
> fine, though.

I think the way out is to simply have aset.c make the memory
undefined/noaccess even if it's redundant under valgrind. It's a bit
unfortunate that the macros would have different semantics under the two
tools. If it's too confusing, or we're worried about the performance
overhead, we could add a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL(), but
I don't think it's worth it myself.

> > in any case isn't this memory already marked noaccess by aset.c?
>
> Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().

I think if I did further surgery here I would rename wipe_mem and
randomise_mem and make them responsible for making the memory undefined and
noaccess as well. They would always be defined so that would get rid of all
the ifdefs except within those functions.

I have a patch working under asan on both gcc and clang. That handles
noaccess but not undefined memory reads. I need to try msan to make sure
the macro definitions work well for that API too.

There are a couple of build oddities with both gcc and clang to sort out
before I can commit anything, though. And I can't test valgrind to be sure
the redundant calls aren't causing a problem.


Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 12:32 PM, Joshua D. Drake  
wrote:
> That argument suggests we shouldn't have autovacuum :P

It certainly does not.  That, too, would be removing a useful option.
In fact, it would be removing the most useful option that is the right
choice for most users in 99% of cases.

> As mentioned in another reply, I am not suggesting we can't turn off
> autovacuum as a whole. Heck, I even suggested being able to turn it off per
> database (versus just per table). I am suggesting that we get rid of a
> foot-gun for unsophisticated users (and thus the majority of our users).

It has to be possible to shut it off in postgresql.conf, before
starting the server.  Anything per-database wouldn't have that
characteristic.

>> I've run into these kinds of situations, but I know for a fact that
>> there are quite a few EnterpriseDB customers who test extremely
>> thoroughly, read the documentation in depth, and really understand the
>> system at a very deep level.
>
> Those aren't exactly the users we are talking about are we? I also run into
> those users all the time.

Well, we can't very well ship the autovacuum option only to the smart
customers and remove it for the dumb ones, can we?  The option either
exists or it doesn't.

> 1 != 10

I concede the truth of that statement, but not whatever it is you
intend to imply thereby.

> I find it interesting that we are willing to do that every time we add a
> feature but once we have that feature it is like pulling teeth to show the
> people that implemented those features that some people don't think it was
> better :P

Well, we don't allow much dumb stuff to get added in the first place,
so there isn't much to take out.

I'm not trying to argue that we're perfect here.  There are certainly
changes I'd like to make that other people oppose and, well, I think
they are wrong.  And I know I'm wrong about some things, too: I just
don't know which things, or I'd change my mind about just those.

> Seriously though. I am only speaking from experience from 20 years of
> customers. CMD also has customers just like yours but we also run into lots
> and lots of people that still do really silly things like we have already
> discussed.

I think it would be better to confine this thread to the specific
issue of whether removing the autovacuum GUC is a good idea rather
than turning it into a referendum on whether you are an experienced
PostgreSQL professional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Index Scans

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 11:07 PM, Amit Kapila  wrote:
>> Ideally, the parallel_workers storage parameter will rarely be
>> necessary because the optimizer will generally do the right thing in
>> all cases.
>
> Yeah, we can choose not to provide any parameter for parallel index
> scans, but some users might want to have a parameter similar to
> parallel table scans, so it could be handy for them to use.

I think the parallel_workers reloption should override the degree of
parallelism for any sort of parallel scan on that table.  Had I
intended it to apply only to sequential scans, I would have named it
differently.
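
For reference, parallel_workers is set per table; a sketch (the table name is hypothetical, and whether the setting governs parallel index scans is exactly the question being settled here):

```sql
-- parallel_workers is the existing storage parameter for table scans;
-- under Robert's reading it would cap the worker count for any parallel
-- scan of the table, index scans included. Table name is hypothetical.
ALTER TABLE measurements SET (parallel_workers = 4);

-- Let the planner choose the degree of parallelism again:
ALTER TABLE measurements RESET (parallel_workers);
```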

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Joshua D. Drake

Hello,

What about a simpler solution to all of this? Let's just remove it from 
postgresql.conf. Out of sight. If someone needs to test, they can, but an 
uneducated user won't immediately know what to do about that "autovacuum 
process", and when they look it up the documentation is exceedingly blunt 
about why *not* to turn it off.


Sincerely,

JD




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 12:12 PM, Stephen Frost  wrote:
> That said, I'd also like to see a --force or similar option or mechanism
> put in place to reduce the risk of users trashing their system because
> they think pg_resetwal is "safe." ("It's just gonna reset things to make
> the database start again, should be fine.").

You know we already have that, right?

> pg_destroydb almost seems like a better choice, though I suppose
> 'pg_clearwal' would be more acceptable.  Doesn't have quite the same
> impact though.
>
> Not sure on the best answer here, but it's definitely a foot-gun that some
> users have ended up using on themselves with depressing regularity.

Just to provide some perspective from the other side of this, I
handled a severity-1 escalation a few weeks ago where a user basically
called up and explained their situation and I told them based on all
of the facts and circumstances presented that running pg_resetxlog was
going to be the best way forward and they said that was pretty much
what they were expecting to hear, and did so.  I think there's far too
much anti-pg_resetxlog bias on this list.  The situation where you
need to use it is not common in general, but it's pretty common in
support cases that reach me.  I don't think I'd be exaggerating if I
said that 25% of the customer escalations I handle personally require
the use of pg_resetxlog.  Of course, that's because by the time the
ticket gets to me, it's typically the case that all of the easy, neat
solutions have already been ruled out.

My experience is that users who are faced with this situation
typically understand that they are in a bad situation and when I
explain to them --- in a clear and direct way but without the sort of
hysterical exaggeration sometimes seen on this mailing list --- what
the consequences are, they are not surprised by those consequences and
they are willing to accept them because they know that the
alternatives are worse.  For example, consider a customer with a
mostly read-only 20TB database.  If that user runs pg_resetxlog, they
can be back up in 5 minutes and it is likely (though of course not
certain) that corruption will be minimal.  If they restore from
backup, they will be down for many more hours.  After pg_resetxlog,
they can restore from backup on another system or do the
dump-and-restore which I invariably recommend while they are up.  For
many customers, this is a good trade-off.  The decision, because of
the risks associated with it, is usually passed up from the DBA I'm
working with to some business leader.  If that business leader feels
that benefits of being up immediately rather than many hours later are
sufficient to justify the risk of a possibly-corrupted database,
running pg_resetxlog is a perfectly reasonable decision.

I have not yet had one DBA or business leader call back and say "hey,
that thing where we ran pg_resetxlog?  you should have discouraged us
more, because that was a disaster".  Now maybe those disasters have
happened and I am just not aware of them, but I think we should all
take a deep breath and remind ourselves that the database exists in
service to the business and not the other way around.  My job -- when
doing support -- is to tell you what your options are and what
consequences they may have, good and bad -- not to tell you how to run
your company.  Telling users that pg_resetxlog will "destroy your
database" is over-the-top hyperbole.  It's a sharp tool with risks and
benefits and it deserves to be presented exactly that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 12:05 PM, Stephen Frost  wrote:
>> To be honest, I don't really like either pg_transaction or pg_xact.
>
>> Neither name captures the fact that what we're really tracking here is
>> the transaction *status*.  pg_xact is slightly worse because it's a
>> poor abbreviation for transaction, but I think the argument against
>> even pg_transaction is similar to the one Tom previously levied
>> against pg_logical - viz. "logical what?".  The transactions themselves
>> are not stored in the directory, just the commit status.  The fact
>> that commit status is saved is the source of the "c" in "clog".
>
> This really needs to move forward also.
>
> When it comes to the name, I tend to think of 'pg_xact' as saying "this
> is where we persist info we need to keep about transactions."  Today
> that's just the commit status info, but I could imagine that there
> might, someday, be other things that go in there.  "pg_multixact" is
> an example of something quite similar but does have more than just one
> "thing."  Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
> brings them all under one consistent naming scheme.

I don't dispute the fact that you tend to think of it that way, but I
think it's a real stretch to say that "pg_xact" is a clear name from
the point of view of the uninitiated.  Now, maybe the point is to be a
little bit deliberately unclear, but "xact" for "transaction" is not a
lot better than "xlog" for "write-ahead log".  It's just arbitrary
abbreviations we made up and you either know what they mean or you
don't.  We could call it "pg_xkcd" and we wouldn't be removing much in
the way of clarity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Joshua D. Drake

On 10/20/2016 09:24 AM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake  wrote:
>> The right answer isn't the answer founded in the reality for many if not
>> most of our users.
>
> I think that's high-handed nonsense.  Sure, there are some
> unsophisticated users who do incredibly stupid things and pay the
> price for it, but there are many users who are very sophisticated and
> make good decisions and don't want or need the system itself to act as
> a nanny.  When we constrain the range of possible choices because

That argument suggests we shouldn't have autovacuum :P

> somebody might do the wrong thing, sophisticated users are hurt
> because they can no longer make meaningful choices, and stupid users
> still get in trouble because that's what being stupid does.  The only
> way to fix that is to help people be less stupid.  You can't tell
> adult users running enterprise-grade software "I'm sorry, Dave, I
> can't do that".  Or at least it can cause a few problems.

As mentioned in another reply, I am not suggesting we can't turn off 
autovacuum as a whole. Heck, I even suggested being able to turn it off 
per database (versus just per table). I am suggesting that we get rid of 
a foot-gun for unsophisticated users (and thus the majority of our users).

>> I mean that the right answer for -hackers isn't necessarily the right answer
>> for users. Testing? Users don't test. They deploy. Education? If most people
>> read the docs, CMD and a host of other companies would be out of business.
>
> I've run into these kinds of situations, but I know for a fact that
> there are quite a few EnterpriseDB customers who test extremely
> thoroughly, read the documentation in depth, and really understand the
> system at a very deep level.

Those aren't exactly the users we are talking about, are we? I also run 
into those users all the time.

> And I've seen customers solve real production problems by doing
> exactly that, and scheduling vacuums manually.  Have I seen people

1 != 10

> cause bigger problems than the ones they were trying to solve?  Yes.
> Have I recommended something a little less aggressive than completely
> shutting autovacuum off as perhaps being a better solution?  Of
> course.  But I'm not going to sit here and tell somebody "well, you
> know, what you are doing is working whereas the old thing was not
> working, but trust me, the way that didn't work was way better...".

I find it interesting that we are willing to do that every time we add a 
feature, but once we have that feature it is like pulling teeth to show 
the people that implemented those features that some people don't think 
it was better :P

Seriously though. I am only speaking from experience from 20 years of 
customers. CMD also has customers just like yours, but we also run into 
lots and lots of people that still do really silly things like we have 
already discussed.


Sincerely,

Joshua D. Drake


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Peter Geoghegan
On Thu, Oct 20, 2016 at 9:24 AM, Robert Haas  wrote:
> On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake  
> wrote:
>> The right answer isn't the answer founded in the reality for many if not
>> most of our users.
>
> I think that's high-handed nonsense.  Sure, there are some
> unsophisticated users who do incredibly stupid things and pay the
> price for it, but there are many users who are very sophisticated and
> make good decisions and don't want or need the system itself to act as
> a nanny.  When we constrain the range of possible choices because
> somebody might do the wrong thing, sophisticated users are hurt
> because they can no longer make meaningful choices, and stupid users
> still get in trouble because that's what being stupid does.  The only
> way to fix that is to help people be less stupid.  You can't tell
> adult users running enterprise-grade software "I'm sorry, Dave, I
> can't do that".  Or at least it can cause a few problems.

+1

I really don't like this paternalistic mindset.

-- 
Peter Geoghegan




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake  
wrote:
> The right answer isn't the answer founded in the reality for many if not
> most of our users.

I think that's high-handed nonsense.  Sure, there are some
unsophisticated users who do incredibly stupid things and pay the
price for it, but there are many users who are very sophisticated and
make good decisions and don't want or need the system itself to act as
a nanny.  When we constrain the range of possible choices because
somebody might do the wrong thing, sophisticated users are hurt
because they can no longer make meaningful choices, and stupid users
still get in trouble because that's what being stupid does.  The only
way to fix that is to help people be less stupid.  You can't tell
adult users running enterprise-grade software "I'm sorry, Dave, I
can't do that".  Or at least it can cause a few problems.

> I mean that the right answer for -hackers isn't necessarily the right answer
> for users. Testing? Users don't test. They deploy. Education? If most people
> read the docs, CMD and a host of other companies would be out of business.

I've run into these kinds of situations, but I know for a fact that
there are quite a few EnterpriseDB customers who test extremely
thoroughly, read the documentation in depth, and really understand the
system at a very deep level.   I can't say whether the majority of our
customers fall into that category, but we certainly spend a lot of
time working with the ones who do.

> I am not saying I have the right solution but I am saying I think we need a
> *different* solution. Something that limits a *USERS* choice to turn off
> autovacuum. If -hackers need testing or enterprise developers need testing,
> let's account for that but for the user that says this:
>
> My machine/instance bogs down every time autovacuum runs, oh I can turn it
> off

And I've seen customers solve real production problems by doing
exactly that, and scheduling vacuums manually.  Have I seen people
cause bigger problems than the ones they were trying to solve?  Yes.
Have I recommended something a little less aggressive than completely
shutting autovacuum off as perhaps being a better solution?  Of
course.  But I'm not going to sit here and tell somebody "well, you
know, what you are doing is working whereas the old thing was not
working, but trust me, the way that didn't work was way better...".

> Let's fix *that* problem.

I will say again that I do not think that problem has a technical
solution.  It is a problem of knowledge, not technology.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Joshua D. Drake

On 10/20/2016 09:12 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Robert Haas  writes:
>
> That said, I'd also like to see a --force or similar option or mechanism
> put in place to reduce the risk of users trashing their system because
> they think pg_resetwal is "safe." ("It's just gonna reset things to make
> the database start again, should be fine.").
>
> pg_destroydb almost seems like a better choice, though I suppose
> 'pg_clearwal' would be more acceptable.  Doesn't have quite the same
> impact though.

pg_dropwal

Users won't *drop* things they shouldn't on purpose (usually), but they 
will reset and will clear them. "Destroydb" isn't any more accurate, 
because it doesn't actually destroy the database. Instead it makes it so 
I can log in again and see my data.

(Yes, we all know the real implications of it, but from a DUH user 
perspective...)


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > One idea would be to rename pg_resetxlog to pg_resetwal.  I think
> > that's actually an improvement.
> 
> This would fit in as part of a general plan to s/xlog/wal/g throughout
> our user-visible names and documentation.  Which seems like a good idea
> to me; there's no need to expose two different terms for the same thing.
> 
> (I don't feel a great need to unify the terminology in the code, though.
> Just in stuff users see.)

+1 on the general change of xlog -> wal.

That said, I'd also like to see a --force or similar option or mechanism
put in place to reduce the risk of users trashing their system because
they think pg_resetwal is "safe." ("It's just gonna reset things to make
the database start again, should be fine.").

pg_destroydb almost seems like a better choice, though I suppose
'pg_clearwal' would be more acceptable.  Doesn't have quite the same
impact though.

Not sure on the best answer here, but it's definitely a foot-gun that some
users have ended up using on themselves with depressing regularity.

Thanks!

Stephen




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Pavan Deolasee
On Thu, Oct 20, 2016 at 9:20 PM, Claudio Freire 
wrote:

>
>
> With indirect indexes, since you don't need to insert a tid, you can
> just "insert on conflict do nothing" on the index.
>

Would that work with non-unique indexes? Anyway, the point I was trying to
make is that there are similar technical challenges, and we could solve it
for WARM as well with your work for finding conflicting  pairs
and then not doing inserts. My thinking currently is that it will lead to
other challenges, especially around vacuum, but I could be wrong.

What I tried to do with initial WARM patch is to show significant
improvement even with just 50% WARM updates, yet keep the patch simple. But
there are of course several things we can do to improve it further and
support other index types.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier
>  wrote:
> > OK. I can live with that as well. Attached are three patches. The
> > pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the
> > pg_clog -> pg_xact move. Only one survivor to be chosen among the last
> > two ones.
> 
> Committed 0001.

Glad to see that happen, finally.

> To be honest, I don't really like either pg_transaction or pg_xact.

> Neither name captures the fact that what we're really tracking here is
> the transaction *status*.  pg_xact is slightly worse because it's a
> poor abbreviation for transaction, but I think the argument against
> even pg_transaction is similar to the one Tom previously levied
> against pg_logical - viz. "logical what?".  The transactions themselves
> are not stored in the directory, just the commit status.  The fact
> that commit status is saved is the source of the "c" in "clog".

This really needs to move forward also.

When it comes to the name, I tend to think of 'pg_xact' as saying "this
is where we persist info we need to keep about transactions."  Today
that's just the commit status info, but I could imagine that there
might, someday, be other things that go in there.  "pg_multixact" is
an example of something quite similar but does have more than just one
"thing."  Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
brings them all under one consistent naming scheme.

Thanks!

Stephen




Re: [HACKERS] Disable autovacuum guc?

2016-10-20 Thread Joshua D. Drake

On 10/20/2016 08:54 AM, Josh Berkus wrote:
> On 10/20/2016 06:34 AM, Joshua D. Drake wrote:
>> On 10/19/2016 07:22 PM, Josh Berkus wrote:
>>> On 10/19/2016 06:27 PM, Joshua D. Drake wrote:
>>
>> Hrm, true although that is by far a minority of our users. What if we
>> made it so we disabled the autovacuum guc but made it so you could
>> disable autovacuum per database (ALTER DATABASE SET or something such
>> thing?).
>
> Well, that wouldn't fix the problem; people would just disable it per
> database, even if it was a bad idea.

I doubt this very much. It requires a different level of sophistication.

General users (not you, not me, and certainly not Haas or Lane) don't 
run anything but an application backed by an ORM. They understand a conf 
file, but they aren't going to touch anything they consider 
"underneath".

Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Disable autovacuum guc?

2016-10-20 Thread Josh Berkus
On 10/20/2016 06:34 AM, Joshua D. Drake wrote:
> On 10/19/2016 07:22 PM, Josh Berkus wrote:
>> On 10/19/2016 06:27 PM, Joshua D. Drake wrote:
>>> Hello,
>>>
>>> After all these years, we are still regularly running into people who
>>> say, "performance was bad so we disabled autovacuum". I am not talking
>>> about once in a while, it is often. I would like us to consider removing
>>> the autovacuum option. Here are a few reasons:
>>>
>>> 1. It does not hurt anyone
>>> 2. It removes a foot gun
>>> 3. Autovacuum is *not* optional, we shouldn't let it be
>>> 4. People could still disable it at the table level for those tables
>>> that do fall into the small window of, no maintenance is o.k.
>>> 5. People would still have the ability to decrease the max_workers to 1
>>> (although I could argue about that too).
>>
>> People who run data warehouses where all of the data comes in as batch
>> loads regularly disable autovacuum, and should do so.  For the DW/batch
>> load use-case, it makes far more sense to do batch loads interspersed
>> with ANALYZEs and VACUUMS of loaded/updated tables.
> 
> Hrm, true although that is by far a minority of our users. What if we
> made it so we disabled the autovacuum guc but made it so you could
> disable autovacuum per database (ALTER DATABASE SET or something such
> thing?).

Well, that wouldn't fix the problem; people would just disable it per
database, even if it was a bad idea.

If I can't get rid of vacuum_defer_cleanup_age, you're not going to be
able to get rid of autovacuum.

Now, if you want to "fix" this issue, one thing which would help a lot
is making it possible to disable/enable Autovacuum on one table without
exclusive locking (for the ALTER statement), and create a how-to doc
somewhere which explains how and why to disable autovac per-table.  Most
of our users don't know that it's even possible to adjust this per-table.
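
For the record, the per-table switch in question is an ordinary storage parameter. A sketch of the batch-load pattern (the table name is hypothetical), with the caveat that the ALTER currently takes the exclusive lock Josh mentions:

```sql
-- Take one batch-loaded table out of autovacuum's hands (this ALTER
-- currently requires an exclusive lock on the table, which is the
-- complaint above).
ALTER TABLE batch_loaded SET (autovacuum_enabled = false);

-- ... run the batch load, then do the maintenance manually ...
VACUUM ANALYZE batch_loaded;

-- Put the table back under autovacuum's control:
ALTER TABLE batch_loaded RESET (autovacuum_enabled);
```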


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Joshua D. Drake

On 10/20/2016 07:12 AM, Robert Haas wrote:

On Wed, Oct 19, 2016 at 9:24 PM, Joshua D. Drake  wrote:

Setting autovacuum=off is at least useful for testing purposes and
I've used it that way.  On the other hand, I haven't seen a customer
disable this unintentionally in years.  Generally, the customers I've
worked with have found subtler ways of hosing themselves with
autovacuum.  One of my personal favorites is autovacuum_naptime='1 d'
-- for the record, that did indeed work out very poorly.


Yes, I have seen that as well and you are right, it ends poorly.



I think that this is the kind of problem that can only properly be solved
by education.  If somebody thinks that they want to turn off
autovacuum, and you keep them from turning it off, they just get
frustrated.  Sometimes, they then find a back-door way of getting what


I think I am coming at this from a different perspective than the 
-hackers. Let me put this another way.


The "right" answer isn't an answer grounded in the reality of many, if
not most, of our users.


What do I mean by that?

I mean that the right answer for -hackers isn't necessarily the right 
answer for users. Testing? Users don't test. They deploy. Education? If 
most people read the docs, CMD and a host of other companies would be 
out of business.


I am not saying I have the right solution, but I do think we need a 
*different* solution: something that limits a *USER'S* choice to turn 
off autovacuum. If -hackers or enterprise developers need it off for 
testing, let's account for that; but for the user who says this:


My machine/instance bogs down every time autovacuum runs, oh I can turn 
it off


Let's fix *that* problem.

Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Claudio Freire
On Thu, Oct 20, 2016 at 12:30 PM, Pavan Deolasee
 wrote:
>
>
> On Thu, Oct 20, 2016 at 8:44 PM, Petr Jelinek  wrote:
>>
>>
>>
>> WARM can do WARM update 50% of time, indirect index can do HOT update
>> 100% of time (provided the column is not changed), I don't see why we
>> could not have both solutions.
>>
>
> I think the reason why I restricted WARM to one update per chain, also
> applies to indirect indexes. For example, if an indirect column value is
> changed from 'a' to 'b' and back to 'a', there will be two pointers from 'a'
> to the PK and AFAICS that would lead to the same duplicate scan issue.
>
> We have a design to convert WARM chains back to HOT and that will increase
> the percentage of WARM updates much beyond 50%. I was waiting for feedback
> on the basic patch before putting in more efforts, but it went unnoticed
> last CF.

With indirect indexes, since you don't need to insert a tid, you can
just "insert on conflict do nothing" on the index.
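As a loose SQL-level analogy (the table and column names below are hypothetical, modelling the proposal rather than its implementation), an indirect index can be thought of as a mapping from indexed value to primary key, where the a→b→a update chain collapses into a duplicate-key no-op:

```sql
-- Hypothetical model of an indirect index on t(col), with primary key id.
CREATE TABLE indirect_idx (col text, pk int, PRIMARY KEY (col, pk));

-- First entry in the update chain: value 'a' points at row 42.
INSERT INTO indirect_idx VALUES ('a', 42) ON CONFLICT DO NOTHING;

-- The value changes 'a' -> 'b' -> back to 'a': the second ('a', 42)
-- entry is simply ignored, avoiding the duplicate-pointer issue above.
INSERT INTO indirect_idx VALUES ('a', 42) ON CONFLICT DO NOTHING;
```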




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Tom Lane
Robert Haas  writes:
> One idea would be to rename pg_resetxlog to pg_resetwal.  I think
> that's actually an improvement.

This would fit in as part of a general plan to s/xlog/wal/g throughout
our user-visible names and documentation.  Which seems like a good idea
to me; there's no need to expose two different terms for the same thing.

(I don't feel a great need to unify the terminology in the code, though.
Just in stuff users see.)

regards, tom lane




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Robert Haas
On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar  wrote:
> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
>> I agree with these conclusions.  I had a chance to talk with Andres
>> this morning at Postgres Vision and based on that conversation I'd
>> like to suggest a couple of additional tests:
>>
>> 1. Repeat this test on x86.  In particular, I think you should test on
>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>
> I have done my test on cthulhu. The basic difference is that on POWER we
> saw ClogControlLock on top at 96 and more clients with scale factor 300,
> but on cthulhu at scale factor 300 the transactionid lock is always on
> top. So I repeated my test on cthulhu with scale factor 1000 as well.

So the upshot appears to be that this problem is a lot worse on power2
than cthulhu, which suggests that this is architecture-dependent.  I
guess it could also be kernel-dependent, but it doesn't seem likely,
because:

power2: Red Hat Enterprise Linux Server release 7.1 (Maipo),
3.10.0-229.14.1.ael7b.ppc64le
cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64

So here's my theory.  The whole reason why Tomas is having difficulty
seeing any big effect from these patches is because he's testing on
x86.  When Dilip tests on x86, he doesn't see a big effect either,
regardless of workload.  But when Dilip tests on POWER, which I think
is where he's mostly been testing, he sees a huge effect, because for
some reason POWER has major problems with this lock that don't exist
on x86.

If that's so, then we ought to be able to reproduce the big gains on
hydra, a community POWER server.  In fact, I think I'll go run a quick
test over there right now...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier
 wrote:
> OK. I can live with that as well. Attached are three patches. The
> pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the
> pg_clog -> pg_xact move. Only one survivor to be chosen among the last
> two ones.

Committed 0001.

To be honest, I don't really like either pg_transaction or pg_xact.
Neither name captures the fact that what we're really tracking here is
the transaction *status*.  pg_xact is slightly worse because it's a
poor abbreviation for transaction, but I think the argument against
even pg_transaction is similar to the one Tom previously levied
against pg_logical - viz. "logical what?".  The transactions themselves
are not stored in the directory, just the commit status.  The fact
that commit status is saved is the source of the "c" in "clog".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Petr Jelinek
On 20/10/16 17:24, Bruce Momjian wrote:
> On Thu, Oct 20, 2016 at 05:14:51PM +0200, Petr Jelinek wrote:
>>> Also, it seems indirect indexes would be useful for indexing columns
>>> that are not updated frequently on tables that are updated frequently,
>>> and whose primary key is not updated frequently.  That's quite a logic
>>> problem for users to understand.
>>>
>>
>> Which covers like 99.9% of problematic cases I see on daily basis.
>>
>> And by that logic we should not have indexes at all, they are not
>> automatically created and user needs to think about if they need them or
>> not.
> 
> Do you have to resort to extreme statements to make your point?  The use
> of indexes is clear to most users, while the use of indirect indexes
> would not be, as I stated earlier.
> 

Not an extreme statement, just pointing out a flaw in that logic. People
need to understand the same limitation when using, for example, most
current trigger-based replication systems, as they don't support pkey
updates. And no, many users don't know when to use indexes, or which one
is most appropriate, even though indexes have been here for decades.

The fact that some feature is not useful for everybody never stopped us
from adding it before, especially when it can be extremely useful to some.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Tomas Vondra

On 10/20/2016 09:36 AM, Dilip Kumar wrote:

On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:

I agree with these conclusions.  I had a chance to talk with Andres
this morning at Postgres Vision and based on that conversation I'd
like to suggest a couple of additional tests:

1. Repeat this test on x86.  In particular, I think you should test on
the EnterpriseDB server cthulhu, which is an 8-socket x86 server.


I have done my test on cthulhu. The basic difference is that on POWER we
saw ClogControlLock on top at 96 and more clients with scale factor 300,
but on cthulhu at scale factor 300 the transactionid lock is always on
top. So I repeated my test on cthulhu with scale factor 1000 as well.

All configuration is the same as in my last test.

Test with 1000 scale factor
-

Test1: number of clients: 192

Head:
tps = 21206.108856 (including connections establishing)
tps = 21206.245441 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 310489  LWLockNamed | CLogControlLock
 296152  |
  35537  Lock| transactionid
  15821  LWLockTranche   | buffer_mapping
  10342  LWLockTranche   | buffer_content
   8427  LWLockTranche   | clog
   3961
   3165  Lock| extend
   2861  Lock| tuple
   2781  LWLockNamed | ProcArrayLock
   1104  LWLockNamed | XidGenLock
745  LWLockTranche   | lock_manager
371  LWLockNamed | CheckpointerCommLock
 70  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin
  3  LWLockTranche   | proc

Patch:
tps = 28725.038933 (including connections establishing)
tps = 28725.367102 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 540061  |
  57810  LWLockNamed | CLogControlLock
  36264  LWLockTranche   | buffer_mapping
  29976  Lock| transactionid
   4770  Lock| extend
   4735  LWLockTranche   | clog
   4479  LWLockNamed | ProcArrayLock
   4006
   3955  LWLockTranche   | buffer_content
   2505  LWLockTranche   | lock_manager
   2179  Lock| tuple
   1977  LWLockNamed | XidGenLock
905  LWLockNamed | CheckpointerCommLock
222  LWLockTranche   | wal_insert
  8  LWLockTranche   | proc

Test2: number of clients: 96

Head:
tps = 25447.861572 (including connections establishing)
tps = 25448.012739 (excluding connections establishing)
 261611  |
  69604  LWLockNamed | CLogControlLock
   6119  Lock| transactionid
   4008
   2874  LWLockTranche   | buffer_mapping
   2578  LWLockTranche   | buffer_content
   2355  LWLockNamed | ProcArrayLock
   1245  Lock| extend
   1168  LWLockTranche   | clog
232  Lock| tuple
217  LWLockNamed | CheckpointerCommLock
160  LWLockNamed | XidGenLock
158  LWLockTranche   | lock_manager
 78  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin

Patch:
tps = 32708.368938 (including connections establishing)
tps = 32708.765989 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
 326601  |
   7471  LWLockNamed | CLogControlLock
   5387  Lock| transactionid
   4018
   3331  LWLockTranche   | buffer_mapping
   3144  LWLockNamed | ProcArrayLock
   1372  Lock| extend
722  LWLockTranche   | buffer_content
393  LWLockNamed | XidGenLock
237  LWLockTranche   | lock_manager
234  Lock| tuple
194  LWLockTranche   | clog
 96  Lock| relation
 88  LWLockTranche   | wal_insert
 34  LWLockNamed | CheckpointerCommLock

Test3: number of clients: 64

Head:

tps = 28264.194438 (including connections establishing)
tps = 28264.336270 (excluding connections establishing)

 218264  |
  10314  LWLockNamed | CLogControlLock
   4019
   2067  Lock| transactionid
   1950  LWLockTranche   | buffer_mapping
   1879  LWLockNamed | ProcArrayLock
592  Lock| extend
565  LWLockTranche   | buffer_content
222  LWLockNamed | XidGenLock
143  LWLockTranche   | clog
131  LWLockNamed | CheckpointerCommLock
 63  LWLockTranche   | lock_manager
 52  Lock| tuple
 35  LWLockTranche   | wal_insert

Patch:
tps = 27906.376194 (including connections establishing)
tps = 27906.531392 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
 228108  |
   4039
   2294  Lock| transactionid
   2116  LWLockTranche   | buffer_mapping
   1757  LWLockNamed | ProcArrayLock
   1553  LWLockNamed | CLogControlLock
800  Lock| extend
403  LWLockTranche   | buffer_content
 92  LWLockNamed | XidGenLock
 74  LWLockTranche   | lock_manager
 42  Lock| tuple
 35  LWLockTranche   | wal_insert
 34  LWLockTranche   | clog
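The wait-event counts above look like aggregated samples of pg_stat_activity, which exposes wait_event_type and wait_event from 9.6 on; a sketch of how such numbers can be collected during a pgbench run (the aggregation pipeline is an assumption, not something stated in the thread):

```sql
-- Sample the current wait events, excluding this monitoring session.
-- Run repeatedly (e.g. from a shell loop) and pipe the accumulated
-- output through `sort | uniq -c | sort -rn` to get counts like the
-- tables above.
SELECT wait_event_type, wait_event
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();
```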
 

Re: [HACKERS] Indirect indexes

2016-10-20 Thread Pavan Deolasee
On Thu, Oct 20, 2016 at 8:44 PM, Petr Jelinek  wrote:

>
>
> WARM can do WARM update 50% of time, indirect index can do HOT update
> 100% of time (provided the column is not changed), I don't see why we
> could not have both solutions.
>
>
I think the reason why I restricted WARM to one update per chain, also
applies to indirect indexes. For example, if an indirect column value is
changed from 'a' to 'b' and back to 'a', there will be two pointers from
'a' to the PK and AFAICS that would lead to the same duplicate scan issue.

We have a design to convert WARM chains back to HOT and that will increase
the percentage of WARM updates much beyond 50%. I was waiting for feedback
on the basic patch before putting in more efforts, but it went unnoticed
last CF.


> That all being said, it would be interesting to hear Álvaro's thoughts
> about which use-cases he expects indirect indexes to work better than WARM.
>
>
Yes, it will be interesting to see that comparison. Maybe we need both,
or maybe just one. Even better, maybe they complement each other. I'll
also put in some thoughts in this area.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Indirect indexes

2016-10-20 Thread Bruce Momjian
On Thu, Oct 20, 2016 at 05:14:51PM +0200, Petr Jelinek wrote:
> > Also, it seems indirect indexes would be useful for indexing columns
> > that are not updated frequently on tables that are updated frequently,
> > and whose primary key is not updated frequently.  That's quite a logic
> > problem for users to understand.
> > 
> 
> Which covers like 99.9% of problematic cases I see on daily basis.
> 
> And by that logic we should not have indexes at all, they are not
> automatically created and user needs to think about if they need them or
> not.

Do you have to resort to extreme statements to make your point?  The use
of indexes is clear to most users, while the use of indirect indexes
would not be, as I stated earlier.

> Also helping user who does not have performance problem by 1% is very
> different from helping user who has performance problem by 50% even if
> she needs to think about the solution a bit.
> 
> WARM can do WARM update 50% of time, indirect index can do HOT update
> 100% of time (provided the column is not changed), I don't see why we
> could not have both solutions.

We don't know enough about the limits of WARM to say it is limited to
50%.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Claudio Freire
On Thu, Oct 20, 2016 at 12:14 PM, Petr Jelinek  wrote:
> WARM can do WARM update 50% of time, indirect index can do HOT update
> 100% of time (provided the column is not changed), I don't see why we
> could not have both solutions.
>
> That all being said, it would be interesting to hear Álvaro's thoughts
> about which use-cases he expects indirect indexes to work better than WARM.

I'm not Alvaro, but it's quite evident that indirect indexes don't
need space on the same page to get the benefits of HOT update (even
though it wouldn't be HOT).

That's a big difference IMO.

That said, WARM isn't inherently limited to 50%, but it *is* limited
to HOT-like updates (new tuple is in the same page as the old), and
since in many cases that is a limiting factor for HOT updates, one can
expect WARM will be equally limited.




Re: [HACKERS] WIP: Fix invalid XML explain plans for track_io_timing

2016-10-20 Thread Tom Lane
Markus Winand  writes:
> The XML output of explain potentially outputs the XML tag names 
> "I/O-Write-Time"
> and "I/O-Read-Time", which are invalid due to the slash.

Ooops.

> Although the patch fixes the problem for the moment, it is incomplete in that
> sense that it continues to check against an incomplete black list. I guess
> this is how it slipped in: XML explain was added in 9.0, I/O timings in 9.2.

Yeah.  The whitelist approach would look something like

appendStringInfoChar(es->str, strchr(XMLCHARS, *s) ? *s : '-');

which would be quite a few more cycles than just testing for ' ' and '/'.
So I'm not sure it's worth it.  On the other hand, I have little faith
that we wouldn't make a similar mistake in future.
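A self-contained sketch of the whitelist idea (the names XMLCHARS and xml_escape_char are illustrative, not actual PostgreSQL source); it keeps the current space- and slash-to-dash behavior while also rejecting any other non-whitelisted byte:

```c
#include <string.h>

/* Characters allowed in the XML tag names EXPLAIN generates; anything
 * else is mapped to '-', as the current code already does for ' ' and '/'. */
static const char XMLCHARS[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"abcdefghijklmnopqrstuvwxyz"
	"0123456789-_.";

char
xml_escape_char(char c)
{
	/* Guard against '\0': strchr() would find the string terminator
	 * and wrongly report it as whitelisted. */
	return (c != '\0' && strchr(XMLCHARS, c) != NULL) ? c : '-';
}
```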

regards, tom lane




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Petr Jelinek
On 20/10/16 14:29, Bruce Momjian wrote:
> On Wed, Oct 19, 2016 at 01:04:16PM -0400, Bruce Momjian wrote:
>> On Wed, Oct 19, 2016 at 06:58:05PM +0200, Simon Riggs wrote:
> I agree. Also, I think the recheck mechanism will have to be something 
> like
> what I wrote for WARM i.e. only checking for index quals won't be enough 
> and we
> would actually need to verify that the heap tuple satisfies the key in the
> indirect index.

 I personally would like to see how far we get with WARM before adding
 this feature that requires a DBA to evaluate and enable it.
>>>
>>> Assuming WARM is accepted, that *might* be fine.
>>
>> First, I love WARM because everyone gets the benefits by default.  For
>> example, a feature that improves performance by 10% but is only used by
>> 1% of users has a usefulness of 0.1% --- at least that is how I think of
>> it.
> 
> Just to clarify, if a feature improves performance by 1%, but is enabled
> by default, that is 10x more useful across our entire user base as the
> feature numbers listed above, 1% vs 0.1%.
> 
> Also, it seems indirect indexes would be useful for indexing columns
> that are not updated frequently on tables that are updated frequently,
> and whose primary key is not updated frequently.  That's quite a logic
> problem for users to understand.
> 

Which covers like 99.9% of problematic cases I see on daily basis.

And by that logic we should not have indexes at all, they are not
automatically created and user needs to think about if they need them or
not.

Also helping user who does not have performance problem by 1% is very
different from helping user who has performance problem by 50% even if
she needs to think about the solution a bit.

WARM can do WARM update 50% of time, indirect index can do HOT update
100% of time (provided the column is not changed), I don't see why we
could not have both solutions.

That all being said, it would be interesting to hear Álvaro's thoughts
about which use-cases he expects indirect indexes to work better than WARM.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] File content logging during execution of COPY queries

2016-10-20 Thread Stephen Frost
Grigory,

* Grigory Smolkin (g.smol...@postgrespro.ru) wrote:
> On 10/20/2016 12:36 PM, Aleksander Alekseev wrote:
> According to my colleagues it would be very nice to have this feature.
> For instance, if you are trying to optimize PostgreSQL for application
> that uses COPY and you don't have access to or something like this.
> It could also be useful in some other cases.
> >>>This use-case doesn't really make much sense to me.  Can you explain it
> >>>in more detail?  Is the goal here to replicate all of the statements
> >>>that are changing data in the database?
> >>The idea is to record application workload in real environment and write
> >>a benchmark based on this record. Then using this benchmark we could try
> >>different OS/DBMS configuration (or maybe hardware), find an extremum,
> >>then change configuration in production environment.
> >>
> >>It's not always possible to change an application or even database (e.g.
> >>to use triggers) for this purpose. For instance, if DBMS is provided as
> >>a service.
> >>
> >>Currently PostgreSQL allows to record all workload _except_ COPY
> >>queries. Considering how easily it could be done I think it's wrong.
> >>Basically the only real question here is how it should look like in
> >>postgresql.conf.
> >OK, how about introducing a new boolean parameter named log_copy?
> >Corresponding patch is attached.
>
> This is a useful feature that I have been waiting on for some time.
> If an application whose workload you want to collect uses COPY
> statements, then recording network traffic was your only option.

As I pointed out to Aleksander, you would still need to record network
traffic to see all of the data going to and from the database since we
do not include SELECT or ... RETURNING results in the log files.  If
that is needed then that's a whole different discussion.

Thanks!

Stephen




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 7:05 AM, Christoph Berg  wrote:
> (tl;dr: rename pg_xlog yes, rename pg_resetxlog only if we have a good
> alternative.)

I'm amused by the idea of a TL;DR in parentheses at the very bottom of
the email, but maybe I'm just easily amused.

One idea would be to rename pg_resetxlog to pg_resetwal.  I think
that's actually an improvement.  The "x" in "xlog" is not a clear
abbreviation for "write-ahead", especially when we also use "x" in
other contexts to mean "transaction" - think "xid".  Given our current
practices, we could justifiably rename pg_clog to pg_xlog -- after
all, that's where we log the status of the xacts.  Ugh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] WIP: Fix invalid XML explain plans for track_io_timing

2016-10-20 Thread Markus Winand
Hi!

The XML output of explain potentially outputs the XML tag names "I/O-Write-Time"
and "I/O-Read-Time", which are invalid due to the slash.

This can easily be seen with this:
   set track_io_timing = on;
   explain (analyze true, buffers true, format xml) select 1;

   [...]
   <I/O-Read-Time>0.000</I/O-Read-Time>
   <I/O-Write-Time>0.000</I/O-Write-Time>
   [...]

Attached is a patch against master that translates slashes to dashes during XML
formatting (very much like spaces are already translated to dashes).

Removing the slash from those property names is another option, but is an
incompatible change to the other formats (neither JSON nor YAML have issues
with '/' in key names).

Although the patch fixes the problem for the moment, it is incomplete in that
sense that it continues to check against an incomplete black list. I guess
this is how it slipped in: XML explain was added in 9.0, I/O timings in 9.2.

Checking against an (abbreviated?) white list would be more future proof if new
explain properties are added. Let me know if you consider this a better 
approach.

I've also done a simple check to see if there are other dangerous
characters used in explain properties at the moment:

   sed -n 's/.*ExplainProperty[^(]*(\s*"\([^"]*\)\".*/\1/p' src/backend/commands/explain.c | grep '[^-A-Za-z /]'

Result: none.

A similar check could be used at build-time to prevent introducing new property
names that invalidate the XML output (not sure if this could ever reach 100%
safety).

Comments?

--
Markus Winand - winand.at








Re: [HACKERS] File content logging during execution of COPY queries

2016-10-20 Thread Grigory Smolkin



On 10/20/2016 12:36 PM, Aleksander Alekseev wrote:

According to my colleagues it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for application
that uses COPY and you don't have access to or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me.  Can you explain it
in more detail?  Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record application workload in real environment and write
a benchmark based on this record. Then using this benchmark we could try
different OS/DBMS configuration (or maybe hardware), find an extremum,
then change configuration in production environment.

It's not always possible to change an application or even database (e.g.
to use triggers) for this purpose. For instance, if DBMS is provided as
a service.

Currently PostgreSQL allows to record all workload _except_ COPY
queries. Considering how easily it could be done I think it's wrong.
Basically the only real question here is how it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
Corresponding patch is attached.


This is a useful feature that I have been waiting on for some time.
If an application whose workload you want to collect uses COPY
statements, then recording network traffic was your only option.


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Avoiding pin scan during btree vacuum

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 6:30 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Mon, Jan 4, 2016 at 10:30 AM, Tom Lane  wrote:
>> >> This seems like a mighty subtle thing to backpatch. If we really want to
>> >> go there, ISTM that the relevant code should stew in an unreleased
>> >> branch for a while, before being backpatched.
>> >
>> > I'm definitely -1 on back-patching such a thing.  Put it in HEAD for
>> > awhile.  If it survives six months or so then we could discuss it again.
>>
>> I agree with Tom.
>
> Okay, several months have passed with this in the development branch and
> now seems a good time to backpatch this all the way back to 9.4.
>
> Any objections?

Although the code has now been in the tree for six months, it's only
been in a released branch for three weeks, which is something about
which to think.

I guess what's really bothering me is that I don't think this is a bug
fix.  It seems like a performance optimization.  And I think that we
generally have a policy that we don't back-patch performance
optimizations.  Of course, there is some fuzziness because when the
performance gets sufficiently bad, we sometimes decide that it amounts
to a bug, as in 73cc7d3b240e1d46b1996382e5735a820f8bc3f7.  Maybe this
case qualifies for similar treatment, but I'm not sure.

Why are you talking about back-patching this to 9.4 rather than all
supported branches?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-20 Thread Robert Haas
On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek  wrote:
> I agree though that the usability beyond the ReoderBuffer is limited
> because everything that will want to use it for part of allocations will
> get much more complicated by the fact that it will have to use two
> different allocators.
>
> I was wondering if rather than trying to implement new allocator we
> should maybe implement palloc_fixed which would use some optimized
> algorithm for fixed sized objects in our current allocator. The
> advantage of that would be that we could for example use that for things
> like ListCell easily (memory management of which I see quite often in
> profiles).

The sb_alloc allocator I proposed a couple of years ago would work
well for this case, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] incorrect libpq comment

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 1:35 PM, Bruce Momjian  wrote:
> On Wed, Oct 19, 2016 at 01:16:28PM -0400, Robert Haas wrote:
>> Bruce's commit 5d305d86bd917723f09ab4f15c075d90586a210a back in April
>> of 2014 includes this change:
>>
>>  /* See PQconnectPoll() for how we use 'int' and not 'pgsocket'. */
>> -int sock;   /* Unix FD for socket, -1 if not connected 
>> */
>> +pgsocketsock;   /* FD for socket, PGINVALID_SOCKET if
>> unconnected */
>>
>> I suppose Bruce must have overlooked the fact that the comment on the
>> previous line is now false.  I think we should remove it, because it
>> makes no sense to say how we are using 'int' rather than 'pgsocket'
>> when in fact we are not using 'int' any more.
>
> Agreed.

Great.  Nuked it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support

2016-10-20 Thread Noah Misch
On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
> On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch  wrote:
> > aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
> > VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
> > VALGRIND_MAKE_MEM_NOACCESS().  #define those two accordingly.  If ASAN has 
> > no
> 
> Actually this is confusing.
> 
> aset.c doesn't actually use the MEMPOOL_ALLOC macro at all, it just
> calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c does however do the
> MEMPOOL_ALLOC/FREE.

Correct.

> So both layers are calling these macros for
> overlapping memory areas which I find very confusing and I'm not sure
> what the net effect is.

The net effect looks like this, at the instant an mcxt.c function returns:

NOACCESS:
- requested_size field of AllocChunkData
- Trailing padding, if any, of AllocChunkData
- Any chunk bytes after the last byte of the most recent requested size

DEFINED:
- palloc0() return value, up to requested size

UNDEFINED:
- palloc() return value, up to requested size
- repalloc() new portion, after size increase (with MEMORY_CONTEXT_CHECKING
  disabled, memory unfortunately becomes DEFINED instead)

> The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> have convenient access to a size argument. It could call the
> GetChunkSpace method but that will include the allocation overhead and

That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested.  Calling
GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
assumption of mcxt.c, which is messy.  Including the allocation overhead is
fine, though.

> in any case isn't this memory already marked noaccess by aset.c?

Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().




Re: [HACKERS] Hash Indexes

2016-10-20 Thread Robert Haas
On Tue, Oct 18, 2016 at 8:27 PM, Amit Kapila  wrote:
> By this problem, I mean to say deadlocks for suspended scans, that can
> happen in btree for non-Mvcc or other type of scans where we don't
> release pin during scan.  In my mind, we have below options:
>
> a. problem of deadlocks for suspended scans should be tackled as a
> separate patch as it exists for other indexes (at least for some type
> of scans).
> b. Implement page-scan mode and then we won't have deadlock problem
> for MVCC scans.
> c. Let's not care for non-MVCC scans unless we have some way to hit
> those for hash indexes and proceed with Dead tuple marking idea.  I
> think even if we don't care for non-MVCC scans, we might hit this
> problem (deadlocks) when the index relation is unlogged.
>
> Here, even if we want to go with (b), I think we can handle it in a
> separate patch, unless you think otherwise.

After some off-list discussion with Amit, I think I get his point
here: the deadlock hazard which is introduced by this patch already
exists for btree and has for a long time, and nobody's gotten around
to fixing it (although 2ed5b87f96d473962ec5230fd820abfeaccb2069
improved things).  So it's probably OK for hash indexes to have the
same issue.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Alvaro Herrera
Joshua D. Drake wrote:

> That said would it be possible to make this index an extension (like rum?).
> Assuming of course we can get any required infrastructure changes done in a
> general way.

Well, the patch I currently have creates a separate index AM called
"ibtree" which is an indirect btree that internally calls the regular
btree code -- pretty much what you propose.  I think it can work that
way, but it's not as efficient as it can be done if the feature is
incorporated into core.  There are things like obtaining the primary key
value from the indexed tuple: in my extension I simply do "heap_fetch"
to obtain the heap tuple, then heap_getattr() to obtain the values from
the primary key columns.  This works, but if instead the executor passes
the primary key values as parameters, I can save both those steps, which
are slow.

Now if we do incorporate the infrastructure changes in core in a general
way, which is what I am proposing, then the in-core provided btree AM
can do indirect indexes without any further extension.  The same
infrastructure changes can later provide support for GIN indexes.

> I do think the feature has merit.

Thanks.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 9:09 PM, Bruce Momjian  wrote:
> On Wed, Oct 19, 2016 at 08:17:46PM -0400, Robert Haas wrote:
>> On Wed, Oct 19, 2016 at 12:59 PM, Bruce Momjian  wrote:
>> > Uh, vacuum_defer_cleanup_age sets an upper limit on how long, in terms
>> > of xids, that a standby query can run before cancel, like
>> > old_snapshot_threshold, no?
>>
>> No, not really.  It affects the behavior of the master, not the standby.
>
> But it controls when/if cancels happen on the standby.

True.  I see your point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Robert Haas
On Wed, Oct 19, 2016 at 9:24 PM, Joshua D. Drake  wrote:
> After all these years, we are still regularly running into people who say,
> "performance was bad so we disabled autovacuum". I am not talking about once
> in a while, it is often. I would like us to consider removing the autovacuum
> option. Here are a few reasons:
>
> 1. It does not hurt anyone
> 2. It removes a foot gun
> 3. Autovacuum is *not* optional, we shouldn't let it be
> 4. People could still disable it at the table level for those tables that do
> fall into the small window of, no maintenance is o.k.
> 5. People would still have the ability to decrease the max_workers to 1
> (although I could argue about that too).

Setting autovacuum=off is at least useful for testing purposes and
I've used it that way.  On the other hand, I haven't seen a customer
disable this unintentionally in years.  Generally, the customers I've
worked with have found subtler ways of hosing themselves with
autovacuum.  One of my personal favorites is autovacuum_naptime='1 d'
-- for the record, that did indeed work out very poorly.

I think that this is the kind of problem that can only properly be solved
by education.  If somebody thinks that they want to turn off
autovacuum, and you keep them from turning it off, they just get
frustrated.  Sometimes, they then find a back-door way of getting what
they want, like setting up a script to kill it whenever it starts, or
changing the other thresholds so that it barely ever runs.  But
whether they resort to such measures or not, there is no real chance
that they will be happy with PostgreSQL.  And why should they be?  It
doesn't let them configure the settings that they want to configure.
When some other program doesn't let me do what I want, I decide it's
stupid.  Pretty much the same thing here.  The only way you actually
get out from under this problem is by teaching people the right way to
think about the settings they're busy misconfiguring.

I'd actually rather go the other way with this and add a new
autovacuum setting, autovacuum=really_off, that doesn't let autovacuum
run even for wraparound.  For example, let's say I've just recovered a
badly damaged cluster using pg_resetxlog.  I want to start it up and
try to recover my data.  I do *not* want VACUUM to decide to start
removing things that I'm trying to recover.  But there's no way to
guarantee that today.  So, you can't start up the cluster, look
around, and then shut it down with the intent to change the next
transaction ID if it's not right.  Your data will be disappearing
underneath you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Joshua D. Drake

On 10/20/2016 06:39 AM, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
>> Also, it seems indirect indexes would be useful for indexing columns
>> that are not updated frequently on tables that are updated frequently,
>> and whose primary key is not updated frequently.  That's quite a logic
>> problem for users to understand.
>
> I don't think we should be optimizing only for dumb users.  In any case,
> updating primary key values is very rare; some would say it never
> happens.

Just because a person doesn't understand a use case doesn't make them dumb.

That said would it be possible to make this index an extension (like 
rum?). Assuming of course we can get any required infrastructure changes 
done in a general way.

I do think the feature has merit.

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Bruce Momjian
On Thu, Oct 20, 2016 at 10:39:23AM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Just to clarify, if a feature improves performance by 1%, but is enabled
> > by default, that is 10x more useful across our entire user base as the
> > feature numbers listed above, 1% vs 0.1%.
> 
> Great.  But not all users are alike.  We have big profile users that
> write blog posts that get echoed all over the world who would benefit
> from things that perhaps other users would not benefit that much from.
> 
> > Also, it seems indirect indexes would be useful for indexing columns
> > that are not updated frequently on tables that are updated frequently,
> > and whose primary key is not updated frequently.  That's quite a logic
> > problem for users to understand.
> 
> I don't think we should be optimizing only for dumb users.  In any case,
> updating primary key values is very rare; some would say it never
> happens.

Uh, so we are writing the database for sophisticated users who write
blog posts who make us look bad?  Really?

My point is that we need to look at the entire user base, and look at
the adoption rates of features, and figure out how to maximize that. 

You are right that sophisticated users can hit roadblocks, and we need
to give them a solution that keeps them on Postgres.  However, if we can
solve the problem in a way that everyone benefits from by default (e.g.
WARM), why not implement that instead, or at least go as far as we can
with that before adding a feature that only sophisticated users will
know to enable.  (My "logic problem" above shows that only sophisticated
users will likely use indirect indexes.)

If we have to solve a vacuum problem with indirect indexes, and WARM
also is limited by the same vacuum problem, let's fix the vacuum problem
for WARM, which will be used by all users.

In summary, I need to know what problems indirect indexes fix that WARM
cannot, or eventually cannot if given the amount of engineering work we
would need to implement indirect indexes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Indirect indexes

2016-10-20 Thread Alvaro Herrera
Bruce Momjian wrote:

> Just to clarify, if a feature improves performance by 1%, but is enabled
> by default, that is 10x more useful across our entire user base as the
> feature numbers listed above, 1% vs 0.1%.

Great.  But not all users are alike.  We have big profile users that
write blog posts that get echoed all over the world who would benefit
from things that perhaps other users would not benefit that much from.

> Also, it seems indirect indexes would be useful for indexing columns
> that are not updated frequently on tables that are updated frequently,
> and whose primary key is not updated frequently.  That's quite a logic
> problem for users to understand.

I don't think we should be optimizing only for dumb users.  In any case,
updating primary key values is very rare; some would say it never
happens.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-20 Thread Tomas Vondra

On 10/19/2016 02:51 PM, Tomas Vondra wrote:

...


Yeah. There are three contexts in reorder buffers:

- changes (fixed size)
- txns (fixed size)
- tuples (variable size)

The first two work perfectly fine with Slab.

The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.


And here is a fairly complete prototype of this idea, adding "Gen" 
generational memory context based only on the "similar lifespan" 
assumption (and abandoning the fixed-size assumption). It's much simpler 
than GenSlab (which it's supposed to replace), and abandoning the idea 
of composing two memory contexts also fixed the warts with some of the 
API methods (e.g. repalloc).


I've also been thinking that perhaps "Gen" would be useful for all three 
contexts in ReorderBuffer - so I've done a quick test comparing the 
various combinations (using the test1() function used before).


             master   slabs   slabs+genslab   slabs+gen    gens
    50k       18700     210             220         190     190
   100k          16     380             470         350     350
   200k         N/A     750             920         740     679
   500k         N/A    2250            2240        1790    1740
  1000k         N/A    4600            5000        3910    3700

Where:

* master - 23ed2ba812117
* slabs  - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen

The results are a bit noisy, but I think it's clear the new Gen context 
performs well - it actually seems a bit faster than GenSlab, and using 
only Gen for all three contexts does not hurt performance.


This is most likely due to the trivial (practically absent) freespace 
management in Gen context, compared to both Slab and GenSlab. So the 
speed is not the only criteria - I haven't measured memory consumption, 
but I'm pretty sure there are cases where Slab consumes much less memory 
than Gen, thanks to reusing free space.


I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.

There's still a fair bit of work on this, particularly implementing the 
missing API methods in Gen - GenCheck() and GenStats(). As Robert 
pointed out, there's also quite a bit of duplicated code between the 
different memory contexts (randomization and valgrind-related), so this 
needs to be moved to a shared place.


I'm also thinking that we need better names, particularly for the Gen 
allocator. It's supposed to mean Generational, although there are no 
explicit generations anymore. Slab is probably OK - it does not match 
any of the existing kernel slab allocators exactly, but it follows the 
same principles, which is what matters.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v4.tgz
Description: application/compressed-tar



Re: [HACKERS] Disable autovacuum guc?

2016-10-20 Thread Joshua D. Drake

On 10/19/2016 07:22 PM, Josh Berkus wrote:
> On 10/19/2016 06:27 PM, Joshua D. Drake wrote:
>> Hello,
>>
>> After all these years, we are still regularly running into people who
>> say, "performance was bad so we disabled autovacuum". I am not talking
>> about once in a while, it is often. I would like us to consider removing
>> the autovacuum option. Here are a few reasons:
>>
>> 1. It does not hurt anyone
>> 2. It removes a foot gun
>> 3. Autovacuum is *not* optional, we shouldn't let it be
>> 4. People could still disable it at the table level for those tables
>> that do fall into the small window of, no maintenance is o.k.
>> 5. People would still have the ability to decrease the max_workers to 1
>> (although I could argue about that too).
>
> People who run data warehouses where all of the data comes in as batch
> loads regularly disable autovacuum, and should do so.  For the DW/batch
> load use-case, it makes far more sense to do batch loads interspersed
> with ANALYZEs and VACUUMs of loaded/updated tables.

Hrm, true, although that is by far a minority of our users. What if we 
removed the autovacuum GUC but made it possible to disable autovacuum 
per database (ALTER DATABASE ... SET or some such thing)?


Sincerely,

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




[HACKERS] Remove autovacuum GUC?

2016-10-20 Thread Joshua D. Drake

Hello,

After all these years, we are still regularly running into people who 
say, "performance was bad so we disabled autovacuum". I am not talking 
about once in a while, it is often. I would like us to consider removing 
the autovacuum option. Here are a few reasons:


1. It does not hurt anyone
2. It removes a foot gun
3. Autovacuum is *not* optional, we shouldn't let it be
4. People could still disable it at the table level for those tables 
that do fall into the small window of, no maintenance is o.k.
5. People would still have the ability to decrease the max_workers to 1 
(although I could argue about that too).


Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




[HACKERS] Re: File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')

2016-10-20 Thread Stephen Frost
Aleksander,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> > The idea is to record application workload in real environment and write
> > a benchmark based on this record. Then using this benchmark we could try
> > different OS/DBMS configuration (or maybe hardware), find an extremum,
> > then change configuration in production environment.
> > 
> > It's not always possible to change an application or even database (e.g.
> > to use triggers) for this purpose. For instance, if DBMS is provided as
> > a service.
> > 
> > Currently PostgreSQL allows to record all workload _except_ COPY
> > queries. Considering how easily it could be done I think it's wrong.
> > Basically the only real question here is how it should look like in
> > postgresql.conf.
> 
> OK, how about introducing a new boolean parameter named log_copy?
> Corresponding patch is attached.

The parameter would be better as 'log_copy_data', I believe.  The actual
COPY command is already logged with just 'log_statement = all', of
course.

Also..

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index 8c25b45..84a7542 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -5205,6 +5205,20 @@ FROM pg_stat_activity;
>
>   
>  
> + 
> +  log_copy (boolean)
> +  
> +   log_copy configuration parameter
> +  
> +  
> +  
> +   
> +Controls whether file content is logged during execution of
> +COPY queries.  The default is off.
> +   
> +  
> + 

"file" isn't accurate here and I don't know that it actually makes sense
to log "COPY TO" data- we don't log the results of SELECT statements,
after all, and the use-case you outlined above (which I generally agree
is one we should consider) doesn't have any need for the data of "COPY
TO" statements to be in the log, it seems to me.

Can you elaborate on why we would want to log the data sent to the
client with a COPY TO command?  If there is a reason, why wouldn't we
want to support that for SELECT and ... RETURNING commands too?

Thanks!

Stephen




Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-10-20 Thread Constantin S. Pan
On Mon, 5 Sep 2016 14:54:05 +0300
Grigory Smolkin  wrote:

> Hello, hackers!
> 
> We were testing how well some application works with PostgreSQL and 
> stumbled upon an autovacuum behavior which I fail to understand.
> Application in question have a habit to heavily use temporary tables
> in funny ways.
> For example it creates A LOT of them.
> Which is ok.
> Funny part is that it never drops them. So when backend is finally 
> terminated, it tries to drop them and fails with error:
> 
> FATAL:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction
> 
> If I understand that right, we are trying to drop all these temp
> tables in one transaction and running out of locks to do so.
> After that postgresql.log is flooded at the rate 1k/s with messages
> like that:
> 
> LOG:  autovacuum: found orphan temp table "pg_temp_15"."tt38147" in 
> database "DB_TEST"
> 
> It produces a noticeable load on the system and it`s getting worst
> with every terminated backend or restart.
> I did some RTFS and it appears that autovacuum has no intention of 
> cleaning that orphan tables unless
> it`s wraparound time:
> 
> src/backend/postmaster/autovacuum.c
>
>     /* We just ignore it if the owning backend is still active */
>     if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
>     {
>         /*
>          * We found an orphan temp table (which was probably left
>          * behind by a crashed backend).  If it's so old as to need
>          * vacuum for wraparound, forcibly drop it.  Otherwise just
>          * log a complaint.
>          */
>         if (wraparound)
>         {
>             ObjectAddress object;
>
>             ereport(LOG,
>                     (errmsg("autovacuum: dropping orphan temp table \"%s\".\"%s\" in database \"%s\"",
>                             get_namespace_name(classForm->relnamespace),
>                             NameStr(classForm->relname),
>                             get_database_name(MyDatabaseId))));
>             object.classId = RelationRelationId;
>             object.objectId = relid;
>             object.objectSubId = 0;
>             performDeletion(&object, DROP_CASCADE,
>                             PERFORM_DELETION_INTERNAL);
>         }
>         else
>         {
>             ereport(LOG,
>                     (errmsg("autovacuum: found orphan temp table \"%s\".\"%s\" in database \"%s\"",
>                             get_namespace_name(classForm->relnamespace),
>                             NameStr(classForm->relname),
>                             get_database_name(MyDatabaseId))));
>         }
>     }
> }
> 
> 
> What is more troubling is that pg_statistic is starting to bloat
> badly.
> 
> LOG:  automatic vacuum of table "DB_TEST.pg_catalog.pg_statistic":
> index scans: 0
>  pages: 0 removed, 68225 remain, 0 skipped due to pins
>  tuples: 0 removed, 2458382 remain, 2408081 are dead but not
> yet removable
>  buffer usage: 146450 hits, 31 misses, 0 dirtied
>  avg read rate: 0.010 MB/s, avg write rate: 0.000 MB/s
>  system usage: CPU 3.27s/6.92u sec elapsed 23.87 sec
> 
> What is the purpose of keeping orphan tables around and not dropping 
> them on the spot?
> 
> 

Hey Hackers,

I tried to fix the problem with a new backend not being
able to reuse a temporary namespace when it contains
thousands of temporary tables. I disabled locking of objects
during namespace clearing process. See the patch attached.
Please tell me if there are any reasons why this is wrong.

I also added a GUC variable and changed the condition in
autovacuum to let it nuke orphan tables on its way.
See another patch attached.

Regards,
Constantin Pan
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5c8db97..d7707ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5665,6 +5665,20 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_wipe_orphan_temp_tables (boolean)
+  
+   autovacuum_wipe_orphan_temp_tables configuration parameter
+  
+  
+  
+   
+Controls whether the server should drop orphan temporary tables during
+autovacuum. This is on by default.
+   
+  
+ 
+
  
   log_autovacuum_min_duration (integer)
   
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3768f50..e5318f4 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -123,6 +123,8 @@ int			autovacuum_vac_cost_limit;
 
 int			Log_autovacuum_min_duration = -1;
 
+bool		autovacuum_wipe_orphan_temp_tables = true;
+
 /* how long to keep 

Re: [HACKERS] Indirect indexes

2016-10-20 Thread Bruce Momjian
On Wed, Oct 19, 2016 at 01:04:16PM -0400, Bruce Momjian wrote:
> On Wed, Oct 19, 2016 at 06:58:05PM +0200, Simon Riggs wrote:
> > >> I agree. Also, I think the recheck mechanism will have to be something 
> > >> like
> > >> what I wrote for WARM i.e. only checking for index quals won't be enough 
> > >> and we
> > >> would actually need to verify that the heap tuple satisfies the key in 
> > >> the
> > >> indirect index.
> > >
> > > I personally would like to see how far we get with WARM before adding
> > > this feature that requires a DBA to evaluate and enable it.
> > 
> > Assuming WARM is accepted, that *might* be fine.
> 
> First, I love WARM because everyone gets the benefits by default.  For
> example, a feature that improves performance by 10% but is only used by
> 1% of users has a usefulness of 0.1% --- at least that is how I think of
> it.

Just to clarify, if a feature improves performance by 1%, but is enabled
by default, that is 10x more useful across our entire user base than the
feature numbers listed above, 1% vs 0.1%.

Also, it seems indirect indexes would be useful for indexing columns
that are not updated frequently on tables that are updated frequently,
and whose primary key is not updated frequently.  That's quite a logic
problem for users to understand.

Also, if vacuum is difficult for indirect indexes, and also for WARM,
perhaps the vacuum problem can be solved for WARM and WARM can be used
in this case too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Danger of automatic connection reset in psql

2016-10-20 Thread Oleksandr Shulgin
On Thu, Oct 20, 2016 at 12:28 PM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

>
> Since this is already an improvement, I'm attaching a patch.
>
> If on the other hand, someone is pasting into psql's terminal a block of
> commands enclosed in BEGIN/COMMIT, the same bug is triggered: BEGIN doesn't
> have effect and the rest of commands run outside of transaction.
>
> Is it possible at all to protect against the latter case?  How?
>

One idea I was just suggested is to make interactive psql sessions buffer
in all available input, before it's going to block.  This way it doesn't
matter if the multiple statements are appearing on one line or are being
pasted.

Feasible?

--
Alex


Re: [HACKERS] Danger of automatic connection reset in psql

2016-10-20 Thread Oleksandr Shulgin
On Thu, Oct 20, 2016 at 12:28 PM, Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:
>
>
> I'm not the first one to discover that, a search in archives gives at
least 3 results:
>
>
https://www.postgresql.org/message-id/flat/1097174870.9273.8.camel%40ipso.snappymail.ca#1097174870.9273.8.ca...@ipso.snappymail.ca
> https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.c

Sorry that one got broken, should be:

  https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.cz


Re: [HACKERS] Gather Merge

2016-10-20 Thread Rushabh Lathia
On Thu, Oct 20, 2016 at 12:22 AM, Peter Geoghegan  wrote:

> On Tue, Oct 4, 2016 at 11:05 PM, Rushabh Lathia
>  wrote:
> > Query 4:  With GM 7901.480 -> Without GM 9064.776
> > Query 5:  With GM 53452.126 -> Without GM 55059.511
> > Query 9:  With GM 52613.132 -> Without GM 98206.793
> > Query 15: With GM 68051.058 -> Without GM 68918.378
> > Query 17: With GM 129236.075 -> Without GM 160451.094
> > Query 20: With GM 259144.232 -> Without GM 306256.322
> > Query 21: With GM 153483.497 -> Without GM 168169.916
> >
> > Here from the results we can see that query 9, 17 and 20 are the one
> which
> > show good performance benefit with the Gather Merge.
>
> Were all other TPC-H queries unaffected? IOW, did they have the same
> plan as before with your patch applied? Did you see any regressions?
>
>
Yes, all other TPC-H queries were unaffected by the patch. At the
initial stage of patch development I noticed regressions, but then
realized that it was because I was not allowing the leader to participate
in the GM. Later on I fixed that, and after that I didn't notice any
regressions.

I assume that this patch has each worker use work_mem for its own
> sort, as with hash joins today. One concern with that model when
> testing is that you could end up with a bunch of internal sorts for
> cases with a GM node, where you get one big external sort for cases
> without one. Did you take that into consideration?
>
>
Yes, but isn't that good? Please correct me if I am missing anything.


> --
> Peter Geoghegan
>



--
Rushabh Lathia
www.EnterpriseDB.com


[HACKERS] Danger of automatic connection reset in psql

2016-10-20 Thread Oleksandr Shulgin
Hi Hackers!

When using psql interactively one might be tempted to guard potentially
destructive commands such as "UPDATE / DELETE / DROP " by starting
the input line with an explicit "BEGIN; ...".  This has the added benefit
that when you invoke the command by reverse-searching the command history,
you get it together with the guarding transaction open statement.

This, however, is not 100% safe as I've found out a few days ago.  Should
the connection to the server get lost, the first of semicolon-separated
statements, "BEGIN;" will only trigger connection reset, and if that is
successful the following command(s) are going to be executed on the newly
opened connection, but without the transaction guard.

I'm not the first one to discover this; a search in the archives gives at
least 3 results:

https://www.postgresql.org/message-id/flat/1097174870.9273.8.camel%40ipso.snappymail.ca#1097174870.9273.8.ca...@ipso.snappymail.ca
https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.c
https://www.postgresql.org/message-id/flat/CAD3a31U%2BfSBsq%3Dtxw2G-D%2BfPND_UN-nSojrGyaD%2BhkYUzvxusQ%40mail.gmail.com

The second one even resulted in a TODO item:

  Prevent psql from sending remaining single-line multi-statement queries
  after reconnection

I was thinking that simply adding a bool flag in the pset struct, to
indicate that the connection was reset during the attempt to execute the
last query, would do the trick, but it only helps in exactly the case
described above.

Since this is already an improvement, I'm attaching a patch.

If, on the other hand, someone pastes into psql's terminal a block of
commands enclosed in BEGIN/COMMIT, the same bug is triggered: BEGIN has
no effect and the rest of the commands run outside of a transaction.

Is it possible at all to protect against the latter case?  How?

--
Alex
From 3eae5e5d91084e0882a286bac464782701e17d21 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin 
Date: Thu, 20 Oct 2016 12:24:48 +0200
Subject: [PATCH] psql: stop sending commands after connection reset

Previously an input line such as "BEGIN; UPDATE something..." could
result in UPDATE running outside of a transaction if the first statement
happened to trigger a connection reset.

NB: this does *not* protect from blocks of commands pasted into the
terminal.
---
 src/bin/psql/common.c   | 2 ++
 src/bin/psql/mainloop.c | 5 -
 src/bin/psql/settings.h | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a7789df..34a4507 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -386,6 +386,8 @@ CheckConnection(void)
 		}
 		else
 			psql_error("Succeeded.\n");
+
+		pset.conn_was_reset = true;
 	}
 
 	return OK;
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 37dfa4d..6d39ce8 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -391,11 +391,14 @@ MainLoop(FILE *source)
 			}
 
 			/* fall out of loop if lexer reached EOL */
-			if (scan_result == PSCAN_INCOMPLETE ||
+			if (pset.conn_was_reset ||
+scan_result == PSCAN_INCOMPLETE ||
 scan_result == PSCAN_EOL)
 break;
 		}
 
+		pset.conn_was_reset = false;
+
 		/* Add line to pending history if we didn't execute anything yet */
 		if (pset.cur_cmd_interactive && !line_saved_in_history)
 			pg_append_history(line, history_buf);
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 8cfe9d2..39a4be0 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -102,6 +102,9 @@ typedef struct _psqlSettings
 	FILE	   *cur_cmd_source; /* describe the status of the current main
  * loop */
 	bool		cur_cmd_interactive;
+	bool		conn_was_reset;	/* indicates that the connection was reset
+ * during the last attempt to execute an
+ * interactive command */
 	int			sversion;		/* backend server version */
 	const char *progname;		/* in case you renamed psql */
 	char	   *inputfile;		/* file being currently processed, if any */
-- 
2.7.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')

2016-10-20 Thread Aleksander Alekseev
> > > According to my colleagues it would be very nice to have this feature.
> > > For instance, if you are trying to optimize PostgreSQL for application
> > > that uses COPY and you don't have access to or something like this. 
> > > It could also be useful in some other cases.
> > 
> > This use-case doesn't really make much sense to me.  Can you explain it
> > in more detail?  Is the goal here to replicate all of the statements
> > that are changing data in the database?
> 
> The idea is to record application workload in a real environment and
> write a benchmark based on this record. Then, using this benchmark, we
> could try different OS/DBMS configurations (or maybe hardware), find an
> extremum, and then change the configuration in the production environment.
> 
> It's not always possible to change an application or even the database
> (e.g. to use triggers) for this purpose - for instance, if the DBMS is
> provided as a service.
> 
> Currently PostgreSQL allows recording all workload _except_ COPY
> queries. Considering how easily this could be done, I think that's wrong.
> Basically the only real question here is what it should look like in
> postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?  A
corresponding patch is attached.
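
For reference, using the proposed parameter would look like this (a sketch
assuming the patch as posted; log_copy does not exist in stock PostgreSQL):

```
# postgresql.conf -- hypothetical GUC added by the attached patch
log_copy = on
```

With this enabled, each row passing through COPY ... FROM or COPY ... TO
is written to the server log via ereport(LOG), so expect substantial log
volume on bulk loads.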

-- 
Best regards,
Aleksander Alekseev
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c25b45..84a7542 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5205,6 +5205,20 @@ FROM pg_stat_activity;
       </para>
      </listitem>
 
+     <varlistentry>
+      <term><varname>log_copy</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>log_copy</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Controls whether file content is logged during execution of
+        <command>COPY</command> queries.  The default is <literal>off</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-replication-commands" xreflabel="log_replication_commands">
       <term><varname>log_replication_commands</varname> (<type>boolean</type>)
       <indexterm>
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..1863e27 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -331,6 +331,38 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
 
 
 /*
+ * Logs file content during COPY ... FROM / COPY ... TO execution if
+ * log_copy = 'on'.
+ */
+static void
+CopyLogStatement(const char* str, bool flush)
+{
+	static StringInfo logString = NULL;
+
+	if(log_copy == false)
+		return;
+
+	if(logString == NULL)
+	{
+		MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
+		logString = makeStringInfo();
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	appendStringInfoString(logString, str);
+
+	if(flush)
+	{
+		ereport(LOG,
+(errmsg("statement: %s", logString->data),
+ errhidestmt(true),
+ errhidecontext(true)));
+
+		resetStringInfo(logString);
+	}
+}
+
+/*
  * Send copy start/stop messages for frontend copies.  These have changed
  * in past protocol redesigns.
  */
@@ -2045,14 +2077,20 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 		if (!cstate->binary)
 		{
 			if (need_delim)
+			{
 CopySendChar(cstate, cstate->delim[0]);
+CopyLogStatement(cstate->delim, false);
+			}
 			need_delim = true;
 		}
 
 		if (isnull)
 		{
 			if (!cstate->binary)
+			{
 CopySendString(cstate, cstate->null_print_client);
+CopyLogStatement(cstate->null_print_client, false);
+			}
 			else
 CopySendInt32(cstate, -1);
 		}
@@ -2062,6 +2100,9 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 			{
string = OutputFunctionCall(&cstate->out_functions[attnum - 1],
			value);
+
+CopyLogStatement(string, false);
+
 if (cstate->csv_mode)
 	CopyAttributeOutCSV(cstate, string,
 		cstate->force_quote_flags[attnum - 1],
@@ -2083,6 +2124,7 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	}
 
 	CopySendEndOfRow(cstate);
+	CopyLogStatement("", true);
 
 	MemoryContextSwitchTo(oldcontext);
 }
@@ -2914,6 +2956,8 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;
 
+	CopyLogStatement(cstate->line_buf.data, true);
+
 	/* Parse the line into de-escaped field values */
 	if (cstate->csv_mode)
 		fldct = CopyReadAttributesCSV(cstate);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bc9d33f..0f035ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -415,6 +415,8 @@ bool		log_planner_stats = false;
 bool		log_executor_stats = false;
 bool		log_statement_stats = false;		/* this is sort of all three
  * above together */
+bool		log_copy = false;
+
 bool		log_btree_build_stats = false;
 char	   *event_source;
 
@@ -1161,6 +1163,15 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		check_log_stats, NULL, NULL
 	},
+	{
+		{"log_copy", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Writes file content during COPY queries to the server log."),
+			NULL
+		},
+		&log_copy,
+		false,
+		NULL, NULL, NULL
+	},
 #ifdef BTREE_BUILD_STATS
 	{
 		{"log_btree_build_stats", 

Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-10-20 Thread Jeevan Chalke
On Thu, Oct 20, 2016 at 12:49 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> The patch compiles and make check-world doesn't show any failures.
>
> >>
> >
> >
> > I have tried it. Attached separate patch for it.
> > However I have noticed that istoplevel is always false (at least for the
> > testcase we have, I get it false always). And I also think that taking
> > this decision only on PlanState is enough. Does that in attached patch.
> > To fix this, I have passed deparse_namespace to the private argument and
> > accessed it in get_special_variable(). Changes looks very simple. Please
> > have a look and let me know your views.
> > This issue is not introduced by the changes done for the aggregate push
> > down, it only got exposed as we now ship expressions in the target list.
> > Thus I think it will be good if we make these changes separately as new
> > patch, if required.
> >
>
>
> The patch looks good and pretty simple.
>
> +* expression.  However, if the underlying PlanState is ForeignScanState,
> +* then don't force parentheses.
> We need to explain why it's safe not to add parentheses. The reason
> this function adds parentheses is to preserve any operator precedences
> imposed by the expression tree of which this IndexVar is part. For
> ForeignScanState, the Var may represent the root of the expression
> tree, and thus not need parentheses. But we need to verify this and
> explain it in the comments.
>
> As you have explained, this is an issue exposed by this patch, not
> something that is a must-have in this patch. If committers feel that
> aggregate pushdown needs this fix as well, please provide a patch
> addressing the above comment.
>
>
Sure.


>
> Ok. PFA patch with cosmetic changes (mostly rewording comments). Let
> me know if those look good. I am marking this patch is ready for
> committer.
>

Changes look good to me.
Thanks for the detailed review.

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-20 Thread Etsuro Fujita

On 2016/10/20 16:27, Ashutosh Bapat wrote:

I wrote:

Besides
that, I modified add_rte_to_flat_rtable so that the plan's dependencies
are
tracked, but that would lead to tracking the dependencies of unreferenced
foreign tables in dead subqueries or the dependencies of foreign tables
excluded from the plan by eg, constraint exclusion.  But I thought that
would be also OK by the same reason as above.


You wrote:

If those unreferenced relations become references because of the
changes in options, we will require those query dependencies to be
recorded. So, if we are recording query dependencies, we should record
the ones on unreferenced relations as well.


I wrote:

I mean plan dependencies here, not query dependencies.



A query dependency also implies plan dependency since changing query
implies plan changes. So, if query depends upon something, so does the
plan.


Right.

Sorry, my explanation was not enough, but what I just wanted to mention 
is: for unreferenced relations in the plan (eg, foreign tables excluded 
from the plan by constraint exclusion), we don't need to record the 
dependencies of those relations on FDW-related objects in the plan 
dependency list (ie, glob->invalItems), because the changes to 
attributes of the FDW-related objects for those relations wouldn't 
affect the plan.


I wrote:

Having said that, I like the latest version (v6), so I'd vote for marking
this as Ready For Committer.



Marked that way.


Thanks!

Best regards,
Etsuro Fujita






Re: [HACKERS] Gather Merge

2016-10-20 Thread Amit Kapila
On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia
 wrote:
> On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
> wrote:
>>
>> There is lot of common code between ExecGatherMerge and ExecGather.
>> Do you think it makes sense to have a common function to avoid the
>> duplicity?
>>
>> I see there are small discrepancies in both the codes like I don't see
>> the use of single_copy flag, as it is present in gather node.
>>
>
> Yes, even I thought to centralize some things of ExecGather and
> ExecGatherMerge, but it's really not something that is settled yet. And
> I thought it might change, particularly for Gather Merge. And as
> explained by Robert, single_copy doesn't make sense for Gather Merge. I
> will still look into this to see if something can be made common.
>

Okay, I haven't thought about it, but do let me know if you couldn't
find any way to merge the code.

>>
>> 3.
>> +gather_merge_readnext(GatherMergeState *gm_state, int reader, bool force)
>> {
>> ..
>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>> +
>> + /*
>> +  * Try to read more tuples in nowait mode and store them into the
>> +  * tuple array.
>> +  */
>> + if (HeapTupleIsValid(tup))
>> + fill_tuple_array(gm_state, reader);
>>
>> How is the above read tuple stored in the array?  In any case, the
>> above interface seems slightly awkward to me. Basically, I think what
>> you are trying to do here is: after reading the first tuple in wait
>> mode, you are trying to fill the array by reading more tuples.  So,
>> can't we push reading of this first tuple into that function and name
>> it form_tuple_array()?
>>
>
> Yes, you are right.
>

You have not answered my first question.  I will try to ask again: how
is the tuple read by the below code stored in the array?

>> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);

> First it's trying to read a tuple in wait mode, and once it finds a
> tuple then it tries to fill the tuple array (which basically tries to
> read tuples in nowait mode). The reason I keep it separate is because,
> in case of initializing the gather merge, if we are unable to read a
> tuple from all the workers, then while re-reading we again try to fill
> the tuple array for the workers that have already produced at least a
> single tuple (see gather_merge_init() for more details).
>

Whenever any worker produces a tuple, you already try to fill the
array in gather_merge_readnext(), so does the above code help much?

> Also I thought filling the tuple array (which basically reads tuples in
> nowait mode) and reading a tuple (in wait mode) are two separate tasks -
> and if that's in a separate function the code looks more clear.

To me that looks slightly confusing.

> If you have any suggestions for the function name (fill_tuple_array)
> then I am open to changing that.
>

form_tuple_array (form_tuple is used at many places in the code, so it
should look okay).  I think you might want to consider forming the
array even for the leader; although it might not be as beneficial as
for workers, OTOH, the code will get simplified if we do it that way.


Today, I observed another issue in code:

+gather_merge_init(GatherMergeState *gm_state)
{
..
+reread:
+	for (i = 0; i < nreaders + 1; i++)
+	{
+		if (TupIsNull(gm_state->gm_slots[i]) ||
+			gm_state->gm_slots[i]->tts_isempty)
+		{
+			if (gather_merge_readnext(gm_state, i, initialize ? false : true))
+			{
+				binaryheap_add_unordered(gm_state->gm_heap,
+										 Int32GetDatum(i));
+			}
+		}
+		else
+			fill_tuple_array(gm_state, i);
+	}
+	initialize = false;
+
+	for (i = 0; i < nreaders; i++)
+		if (TupIsNull(gm_state->gm_slots[i]) ||
+			gm_state->gm_slots[i]->tts_isempty)
+			goto reread;
..
}

This code can cause an infinite loop.  Consider a case where one of the
workers doesn't get any tuples, because by the time it starts, all the
tuples have been consumed by the other workers. The above code will keep
on looping to fetch a tuple from that worker, whereas that worker can't
return any tuple.  I think you can fix it by checking whether the
particular queue has been exhausted.

>> >
>> > Open Issue:
>> >
>> > - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak in
>> > tqueue.c by calling gather_readnext() in a per-tuple context. Now for
>> > gather merge that is not possible, as we store the tuple into the
>> > tuple array and we want the tuple to be freed only once it has passed
>> > through the merge sort algorithm. One idea is, we can also call
>> > gm_readnext_tuple() under the per-tuple context (which will fix the
>> > leak in tqueue.c) and then store a copy of the tuple into the tuple
>> > array.
>> >
>>
>> Won't an extra copy per tuple impact performance?  Was the fix in the
>> mentioned commit for record or composite types?  If so, does
>> GatherMerge support such types, and if it does, does it provide any
>> benefit over Gather?
>>
>
> I don't think it was specifically for record or composite types - but
> I might be wrong. As per my understanding 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-10-20 Thread Dilip Kumar
On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas  wrote:
> I agree with these conclusions.  I had a chance to talk with Andres
> this morning at Postgres Vision and based on that conversation I'd
> like to suggest a couple of additional tests:
>
> 1. Repeat this test on x86.  In particular, I think you should test on
> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.

I have done my tests on cthulhu. The basic difference is that on POWER
we saw ClogControlLock on top at 96 and more clients with scale factor
300, but on cthulhu at scale factor 300 the transactionid lock is
always on top. So I repeated my tests with scale factor 1000 on cthulhu
as well.

All configuration is same as my last test.
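
For context, wait-event counts like the ones below can be produced by
sampling pg_stat_activity (9.6+ exposes wait_event_type/wait_event)
repeatedly while the benchmark runs and then aggregating the samples. A
sketch of the aggregation step, where the samples table is hypothetical:

```sql
-- Aggregate collected wait-event samples; wait_event_samples is a
-- hypothetical table holding repeated snapshots of pg_stat_activity.
SELECT wait_event_type, wait_event, count(*)
FROM wait_event_samples
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
```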

Test with 1000 scale factor
---------------------------

Test1: number of clients: 192

Head:
tps = 21206.108856 (including connections establishing)
tps = 21206.245441 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 310489  LWLockNamed | CLogControlLock
 296152  |
  35537  Lock| transactionid
  15821  LWLockTranche   | buffer_mapping
  10342  LWLockTranche   | buffer_content
   8427  LWLockTranche   | clog
   3961
   3165  Lock| extend
   2861  Lock| tuple
   2781  LWLockNamed | ProcArrayLock
   1104  LWLockNamed | XidGenLock
745  LWLockTranche   | lock_manager
371  LWLockNamed | CheckpointerCommLock
 70  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin
  3  LWLockTranche   | proc

Patch:
tps = 28725.038933 (including connections establishing)
tps = 28725.367102 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 540061  |
  57810  LWLockNamed | CLogControlLock
  36264  LWLockTranche   | buffer_mapping
  29976  Lock| transactionid
   4770  Lock| extend
   4735  LWLockTranche   | clog
   4479  LWLockNamed | ProcArrayLock
   4006
   3955  LWLockTranche   | buffer_content
   2505  LWLockTranche   | lock_manager
   2179  Lock| tuple
   1977  LWLockNamed | XidGenLock
905  LWLockNamed | CheckpointerCommLock
222  LWLockTranche   | wal_insert
  8  LWLockTranche   | proc

Test2: number of clients: 96

Head:
tps = 25447.861572 (including connections establishing)
tps = 25448.012739 (excluding connections establishing)
 261611  |
  69604  LWLockNamed | CLogControlLock
   6119  Lock| transactionid
   4008
   2874  LWLockTranche   | buffer_mapping
   2578  LWLockTranche   | buffer_content
   2355  LWLockNamed | ProcArrayLock
   1245  Lock| extend
   1168  LWLockTranche   | clog
232  Lock| tuple
217  LWLockNamed | CheckpointerCommLock
160  LWLockNamed | XidGenLock
158  LWLockTranche   | lock_manager
 78  LWLockTranche   | wal_insert
  5  BufferPin   | BufferPin

Patch:
tps = 32708.368938 (including connections establishing)
tps = 32708.765989 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
 326601  |
   7471  LWLockNamed | CLogControlLock
   5387  Lock| transactionid
   4018
   3331  LWLockTranche   | buffer_mapping
   3144  LWLockNamed | ProcArrayLock
   1372  Lock| extend
722  LWLockTranche   | buffer_content
393  LWLockNamed | XidGenLock
237  LWLockTranche   | lock_manager
234  Lock| tuple
194  LWLockTranche   | clog
 96  Lock| relation
 88  LWLockTranche   | wal_insert
 34  LWLockNamed | CheckpointerCommLock

Test3: number of clients: 64

Head:

tps = 28264.194438 (including connections establishing)
tps = 28264.336270 (excluding connections establishing)

 218264  |
  10314  LWLockNamed | CLogControlLock
   4019
   2067  Lock| transactionid
   1950  LWLockTranche   | buffer_mapping
   1879  LWLockNamed | ProcArrayLock
592  Lock| extend
565  LWLockTranche   | buffer_content
222  LWLockNamed | XidGenLock
143  LWLockTranche   | clog
131  LWLockNamed | CheckpointerCommLock
 63  LWLockTranche   | lock_manager
 52  Lock| tuple
 35  LWLockTranche   | wal_insert

Patch:
tps = 27906.376194 (including connections establishing)
tps = 27906.531392 (excluding connections establishing)
[dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
 228108  |
   4039
   2294  Lock| transactionid
   2116  LWLockTranche   | buffer_mapping
   1757  LWLockNamed | ProcArrayLock
   1553  LWLockNamed | CLogControlLock
800  Lock| extend
403  LWLockTranche   | buffer_content
 92  LWLockNamed | XidGenLock
 74  LWLockTranche   | lock_manager
 42  Lock| tuple
 35  LWLockTranche   | wal_insert
 34  LWLockTranche   | clog
 14  LWLockNamed | 

Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-20 Thread Ashutosh Bapat
>
>
>> In fact, I am not in
>> favour of tracking the query dependencies for UPDATE/DELETE since we
>> don't have any concrete example as to when that would be needed.
>
>
> Right, but as I said before, some FDW might consult FDW options stored in
> those objects during AddForeignUpdateTargets, so we should do that.
>
>>> Besides
>>> that, I modified add_rte_to_flat_rtable so that the plan's dependencies
>>> are
>>> tracked, but that would lead to tracking the dependencies of unreferenced
>>> foreign tables in dead subqueries or the dependencies of foreign tables
>>> excluded from the plan by eg, constraint exclusion.  But I thought that
>>> would be also OK by the same reason as above.  (Another reason for that
>>> was
>>> it seemed better to me to collect the dependencies in the same place as
>>> for
>>> relation OIDs.)
>
>
>> If those unreferenced relations become references because of the
>> changes in options, we will require those query dependencies to be
>> recorded. So, if we are recording query dependencies, we should record
>> the ones on unreferenced relations as well.
>
>
> I mean plan dependencies here, not query dependencies.

A query dependency also implies plan dependency since changing query
implies plan changes. So, if query depends upon something, so does the
plan.

>
> Having said that, I like the latest version (v6), so I'd vote for marking
> this as Ready For Committer.
>

Marked that way.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] FSM corruption leading to errors

2016-10-20 Thread Pavan Deolasee
On Thu, Oct 20, 2016 at 11:34 AM, Michael Paquier  wrote:

> On Thu, Oct 20, 2016 at 2:50 PM, Pavan Deolasee
>  wrote:
> > Actually, if we could add an API which can truncate the FSM to the
> > given heap block, then the user may not even need to run VACUUM, which
> > could be costly for very large tables.
>
> FreeSpaceMapTruncateRel()?
>

Right. We need a SQL callable function to invoke that.
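
A pseudocode-level sketch of what such a wrapper might look like against
the backend API (the function name is hypothetical, this is not compilable
standalone, and real code would need permission checks and validation
that the relation actually has a FSM):

```c
/* Hypothetical SQL-callable function; a sketch, not a real patch. */
Datum
pg_truncate_freespace_map(PG_FUNCTION_ARGS)
{
	Oid			relid = PG_GETARG_OID(0);
	Relation	rel = relation_open(relid, AccessExclusiveLock);

	/* Drop FSM entries beyond the end of the heap, per the discussion. */
	FreeSpaceMapTruncateRel(rel, RelationGetNumberOfBlocks(rel));

	relation_close(rel, AccessExclusiveLock);
	PG_RETURN_VOID();
}
```

Truncating to RelationGetNumberOfBlocks(rel) matches the suggestion of
cutting the FSM back to the table size, so a full rebuild by VACUUM is
not required.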


>
> > Also, AFAICS we will need to backport
> > pg_truncate_visibility_map() to older releases because unless the VM is
> > truncated along with the FSM, VACUUM may not scan all pages and the FSM
> for
> > those pages won't be recorded.
>
> This would need a careful lookup, and it would not be that complicated
> to implement. And this bug justifies the presence of a tool like that
> actually... But usually those don't get a backpatch, so the
> probability of getting that backported is low IMO, personally I am not
> sure I like that either.
>

Just to clarify, I meant that if we truncate the entire FSM then we'll
need an API to truncate the VM as well, so that VACUUM rebuilds
everything completely. OTOH if we provide a function just to truncate
the FSM to match the size of the table, then we don't need to rebuild
the FSM. So that's probably a better way to handle FSM corruption, as
far as this particular issue is concerned.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] FSM corruption leading to errors

2016-10-20 Thread Michael Paquier
On Thu, Oct 20, 2016 at 2:50 PM, Pavan Deolasee
 wrote:
> Actually, if we could add an API which can truncate FSM to the given heap
> block, then the user may not even need to run VACUUM, which could be costly
> for very large tables.

FreeSpaceMapTruncateRel()?

> Also, AFAICS we will need to backport
> pg_truncate_visibility_map() to older releases because unless the VM is
> truncated along with the FSM, VACUUM may not scan all pages and the FSM for
> those pages won't be recorded.

This would need a careful lookup, and it would not be that complicated
to implement. And this bug justifies the presence of a tool like that
actually... But usually those don't get a backpatch, so the
probability of getting that backported is low IMO, personally I am not
sure I like that either.
-- 
Michael

