Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-04 Thread Brent Dearth
Tom, Andres -

Is there an issue tracker I could be looking at to follow along on the
progress on this issue?

Thanks so much!

On Mon, Oct 2, 2017 at 9:06 PM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2017-10-02 19:50:51 -0400, Tom Lane wrote:
> >> What I saw was that the backend process was consuming 100% of (one) CPU,
> >> while the I/O transaction rate viewed by "iostat 1" started pretty low
> >> --- under 10% of what the machine is capable of --- and dropped from
> >> there as the copy proceeded.  I did not think to check if that was user
> >> or kernel-space CPU, but I imagine it has to be the latter.
>
> > So that's pretty clearly a kernel bug... Hm. I wonder if it's mmap() or
> > msync() that's the problem here. I guess you didn't run a profile?
>
> Interestingly, profiling with Activity Monitor seems to blame the problem
> entirely on munmap() ... which squares with the place I hit every time
> when randomly stopping the process with gdb^Hlldb, so I'm inclined to
> believe it.
>
> This still offers no insight as to why CREATE DATABASE is hitting the
> problem while regular flush activity doesn't.
>
> > One interesting thing here is that in the CREATE DATABASE case there'll
> > probably be a lot larger contiguous mappings than in *_flush_after
> > cases. So it might be related to the size of the mapping / flush "unit".
>
> Meh, the mapping is only 64K in this case vs. 8K in the other.  Hard
> to credit that it breaks that easily.
>
> regards, tom lane
>


Re: [HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Brent Dearth
Thanks for this breakdown Tom!

FWIW - I'm on Postgres 9.6.5 as bundled with Postgres.app (2.0.5) running
on 2013 MBP (2.7GHz i7 / 16GB / SSD) setup. It looks like this might be a
priority for an upcoming release, so I might try to hold out for
downstream, but thanks for the patch. It will help if we need get custom
builds out to fellow devs if this becomes too unbearable.

On Mon, Oct 2, 2017 at 1:42 PM, Tom Lane  wrote:

> I wrote:
> > In short, therefore, APFS cannot cope with the way we're using msync().
>
> I experimented with this further by seeing whether the msync() code path
> is of any value on Sierra either.  The answer seems to be "no": cloning
> a scale-1000 pgbench database takes about 17-18 seconds on my Sierra
> laptop using unmodified HEAD, but if I dike out the msync() logic then
> it takes 16-17 seconds.  Both numbers jump around a little, but using
> msync is strictly worse.
>
> I propose therefore that an appropriate fix is to unconditionally disable
> the msync code path on Darwin, as we have already done for Windows.  When
> and if Apple changes their kernel so that this path is actually of some
> value, we can figure out how to detect whether to use it.
>
> The msync logic seems to date back to this thread:
>
> https://www.postgresql.org/message-id/flat/alpine.DEB.2.
> 10.150601132.28433%40sto
>
> wherein Andres opined
> >> I think this patch primarily needs:
> >> * Benchmarking on FreeBSD/OSX to see whether we should enable the
> >>   mmap()/msync(MS_ASYNC) method by default. Unless somebody does so, I'm
> >>   inclined to leave it off till then.
>
> but so far as I can tell from the thread, only testing on FreeBSD ever
> got done.  So there's no evidence that this was ever beneficial on macOS,
> and we now have evidence that it's between counterproductive and
> catastrophic depending on which kernel version you look at.
>
> regards, tom lane
>


[HACKERS] Horrible CREATE DATABASE Performance in High Sierra

2017-10-02 Thread Brent Dearth
I just recently "upgraded" to High Sierra and experiencing horrendous CREATE
DATABASE performance. Creating a database from a 3G template DB used to
take ~1m but post-upgrade is taking ~22m at a sustained write of around
4MB/s. Occasionally, attempting to create an empty database hangs
indefinitely as well. When this happens, restarting the Postgres server
allows empty database initialization in ~1s.

I had been running on an encrypted APFS volume (FileVault), but after
dropping encryption, saw the tasks drop to about *~15m* per run. Still a
far cry from the previous *~1m* threshold.

A multi-threaded pg_restore seems to sustain writes of ~38M/s and completes
in about the same time as pre-upgrade (Sierra), so I'm not sure it's
entirely related to APFS / disk IO.

I've completely rebuilt the Postgres data directory, re-installed Postgres
(Postgres.app 2.0.5) etc. I don't have any reasonable explanation for what
could have broken so catastrophically.

Coworker has seen the exact same issue. Has anyone else experienced this
yet or have any insight as to what could be happening?


Thanks in advance!