Re: GetSnapshotData round two(for me)

2018-09-24 Thread Dilip Kumar
On Tue, Sep 25, 2018 at 11:00 AM, Daniel Wood  wrote:
> I was about to suggest creating a single shared snapshot instead of having
> multiple backends compute what is essentially the same snapshot.  Luckily,
> before posting, I discovered Avoiding repeated snapshot computation from
> Pavan and POC: Cache data in GetSnapshotData() from Andres.
>
I think Mithun has also worked on this [1] and posted some analysis of
the cases where he saw an improvement and the cases where it did not
perform well; you might want to have a look.

[1] 
https://www.postgresql.org/message-id/CAD__OujRZEjE5y3vfmmZmSSr3oYGZSHRxwDwF7kyhBHB2BpW_g%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



GetSnapshotData round two(for me)

2018-09-24 Thread Daniel Wood
I was about to suggest creating a single shared snapshot instead of having 
multiple backends compute what is essentially the same snapshot.  Luckily, 
before posting, I discovered Avoiding repeated snapshot computation 
https://www.postgresql.org/message-id/caboikdmsj4osxta7xbv2quhkyuo_4105fjf4n+uyroybazs...@mail.gmail.com
  from Pavan and POC: Cache data in GetSnapshotData() 
https://www.postgresql.org/message-id/20150202152706.gd9...@alap3.anarazel.de 
from Andres.


Andres, could I get a short summary of the biggest drawback that may have
prevented this from being released?  Before I saw this I had done my own
implementation and saw some promising results (25% on 48 cores).  I do need to
do some mixed RO and RW workloads to see how the invalidations of the shared
copy, at EOT time, affect the results.  There are some differences in my
implementation.  I chose, perhaps incorrectly, to busy-spin other users
trying to get a snapshot while the first backend in builds the shared copy.  My
thinking is to not increase the latency of using the snapshot.  The improvement
from the idea doesn't come from getting off the CPU by using a wait, but from
not reading, on all the CPUs acquiring the snapshot, PGXACT cache lines that are
constantly being dirtied.  One backend can do the heavy lifting and the others
can immediately jump on the shared copy once created.
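
To make that shape concrete, here is a minimal standalone sketch of the
pattern, using C11 atomics and pthreads.  Everything in it (SharedSnapshot,
build_real_snapshot, and so on) is a made-up stand-in for the real
PGXACT/ProcArray machinery -- it is not the actual patch, and it omits the
interesting part, namely invalidating the shared copy at EOT time:

/* Build with: cc -std=c11 -pthread sketch.c */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <pthread.h>

typedef struct
{
    atomic_int  state;      /* 0 = invalid, 1 = being built, 2 = valid */
    int         xmin;       /* stand-ins for the real snapshot contents */
    int         xmax;
} SharedSnapshot;

static SharedSnapshot shared;   /* static storage: state starts at 0 */

static void
build_real_snapshot(int *xmin, int *xmax)
{
    /* pretend this is the expensive scan over all PGXACT entries */
    *xmin = 100;
    *xmax = 200;
}

static void
get_snapshot(int *xmin, int *xmax)
{
    int         expected = 0;

    if (atomic_compare_exchange_strong(&shared.state, &expected, 1))
    {
        /* first one in: build the shared copy and publish it */
        build_real_snapshot(&shared.xmin, &shared.xmax);
        atomic_store(&shared.state, 2);
    }
    else
    {
        /* everyone else busy-spins until the shared copy is published */
        while (atomic_load(&shared.state) != 2)
            ;               /* the real thing might use pg_spin_delay() here */
    }

    *xmin = shared.xmin;
    *xmax = shared.xmax;
}

static void *
worker(void *arg)
{
    int         xmin, xmax;

    get_snapshot(&xmin, &xmax);
    printf("thread %ld got snapshot [%d, %d]\n",
           (long) (intptr_t) arg, xmin, xmax);
    return NULL;
}

int
main(void)
{
    pthread_t   threads[4];

    for (intptr_t i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}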


And something else quite weird:  As I was evolving a standard setup for
benchmark runs and getting baselines, I was getting horrible numbers
sometimes (680K) and other times I'd get over 1 million QPS.  I was thinking I
had a bad machine.  What I found was that even though I was running a fixed 192
clients, I had set max_connections to 600 sometimes and 1000 on other runs.
Here is what I see running select-only, scale-1000 pgbench with 192 clients on a
48-core box (2 sockets) using different values for max_connections:


 200 tps = 1092043
 250 tps = 1149490
 300 tps =  732080
 350 tps =  719611
 400 tps =  681170
 450 tps =  687527
 500 tps =  859978
 550 tps =  927161
 600 tps = 1092283
 650 tps = 1154916
 700 tps = 1237271
 750 tps = 1195968
 800 tps = 1162221
 850 tps = 1140626
 900 tps =  749519
 950 tps =  648398
1000 tps =  653460


This is on the base PG 12 codeline.  The only thought I've had so far is that
each PGXACT in use (192) is being scattered across the full set of
max_connections slots, instead of being physically contiguous in the first 192
slots.  This would cause more cache lines to be scanned.  It doesn't make a lot
of sense given that it goes back up again from 500, peaking at 700.  Also, this
is after a fresh restart, so the procs in the freelist shouldn't have been
scrambled yet in terms of ordering.
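
For what it's worth, some back-of-the-envelope arithmetic on the cache-line
theory looks like this.  It assumes a ~12-byte PGXACT and 64-byte cache lines
(both assumptions, not measured values), and it only quantifies the scattering
effect, not the odd dip-and-recover shape above:

#include <stdio.h>

int
main(void)
{
    const int   pgxact_size = 12;   /* assumed sizeof(PGXACT) */
    const int   cacheline = 64;
    const int   active = 192;       /* pgbench clients actually connected */
    const int   slots[] = {200, 600, 1000}; /* max_connections settings */

    for (int i = 0; i < 3; i++)
    {
        int     n = slots[i];
        /* contiguous: 192 entries packed into the first slots */
        int     packed = (active * pgxact_size + cacheline - 1) / cacheline;
        /*
         * scattered: worst case every active entry lands on its own line,
         * bounded by the number of lines the whole array spans
         */
        int     total_lines = (n * pgxact_size + cacheline - 1) / cacheline;
        int     scattered = active < total_lines ? active : total_lines;

        printf("max_connections=%4d: packed ~%d lines, scattered worst case ~%d lines\n",
               n, packed, scattered);
    }
    return 0;
}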


NOTE: I believe you'll only see this huge difference on a dual-socket machine.
It'd probably only take 30 minutes or so on a big machine to confirm, with a
couple of few-minute runs at different values for max_connections.  I'll be
debugging this soon, but I've been postponing it while experimenting with my
shared snapshot code.






Re: SSL tests failing with "ee key too small" error on Debian SID

2018-09-24 Thread Michael Paquier
On Tue, Sep 25, 2018 at 12:48:57PM +0900, Kyotaro HORIGUCHI wrote:
> Do you mean that cert/key files are generated on-the-fly while
> running 'make check'?  It sounds reasonable as long as just
> replaceing existing files with those with longer (2048bits?) keys
> doesn't work for all supported platforms.

The files are present by default in the tree, but can be regenerated
easily by using the makefile rule "sslfiles".  From what I can see, this
is caused by OpenSSL 1.1.1, to which Debian SID has visibly upgraded
recently.  That's the version I have on my system.  I have not dug much
into the Makefile to see if things could be done right by changing the
openssl commands, though...
--
Michael




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Michael Paquier
On Tue, Sep 25, 2018 at 01:49:09PM +1000, Haribabu Kommi wrote:
> Thanks for the review.
> Fixed in the attached patch as per your suggestion.

Hmm.  I see a problem with the tests and the stability of what
pg_stat_statements_reset() can return.  Normally installcheck is
disabled in contrib/pg_stat_statements/Makefile but if you remove this
barrier and run the tests with a server loading the module in
shared_preload_libraries then things are not stable.  We don't have this
kind of instability on HEAD.  A system-wide call to
pg_stat_statements_reset() is visibly missing.

+   if (!pgss || !pgss_hash)
+   ereport(ERROR,
+   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+errmsg("pg_stat_statements must be loaded via 
shared_preload_libraries")));
This check can be within entry_reset().

+  the specified userid, dbid and queryid. Returns the total number of
+  statement statistics that are reset based on the specified input.
+  If any of the parameter is not specified, the default value NULL(invalid)
Missing some markup for the three field names here, as well as for NULL,
which is a value.

I can buy the compatibility breakage with the return result of
pg_stat_statements_reset when specified without arguments.

Some nannyism: If all entries are removed and a new file needs to be
written, you could save a bit of indentation by returning immediately
when (num_entries != num_remove).
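
In other words, something with roughly this shape (an illustrative sketch
with stand-in names; only entry_reset, num_entries and num_remove are taken
from the discussion above, and the bodies are placeholders rather than the
actual pg_stat_statements code):

#include <stdbool.h>
#include <stdio.h>

static bool pgss_loaded = true; /* stand-in for (pgss && pgss_hash) */

static long
entry_reset(long num_entries, long num_remove)
{
    if (!pgss_loaded)
    {
        fprintf(stderr, "pg_stat_statements must be loaded via shared_preload_libraries\n");
        return 0;
    }

    /* ... scan the hash table, removing matching entries (num_remove) ... */

    if (num_entries != num_remove)
        return num_remove;      /* partial reset: keep the query-text file */

    /* Full reset: truncate and rewrite the external query-text file. */
    /* ... gc/rewrite logic would go here, with no extra indentation ... */
    return num_remove;
}

int
main(void)
{
    printf("partial reset removed %ld entries\n", entry_reset(10, 3));
    printf("full reset removed %ld entries\n", entry_reset(10, 10));
    return 0;
}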
--
Michael




PG vs macOS Mojave

2018-09-24 Thread Tom Lane
Well, macOS 10.14 (Mojave) is out, so I installed it on a spare machine,
and naturally the first thing I tried was to build PG with it.  Our
core code seems fine, but:

* --with-perl fails in configure, complaining that it can't find perl.h.

* --with-tcl fails in configure, complaining that it can't find
tclConfig.sh.  Furthermore, the historical workaround for that
(--with-tclconfig=/System/Library/Frameworks/Tcl.framework) doesn't fix it.

After some investigation, it seems that Apple has been busy moving
header files (not libraries) under SDK-specific "sysroot" directories,
with the expectation that you'd compile using "-isysroot $SYSROOT".
There's some mention of that here, for example:

https://developer.apple.com/library/archive/documentation/DeveloperTools/Conceptual/cross_development/Configuring/configuring.html#//apple_ref/doc/uid/1163i-CH1-SW7

The sysroot seems to contain only headers; stuff you need at runtime,
such as shared libraries, is still where it used to be.

The recommended way to get the appropriate sysroot path seems to be

SYSROOT=`xcodebuild -version -sdk macosx Path`

Attached is a draft patch to fix things up.  The core ideas are

(1) Stop assuming that the Perl headers and library are necessarily
in the same place; create a perl_includedir variable to represent the
path to the former.

(2) Tweak src/template/darwin to inject the appropriate -isysroot
option into CPPFLAGS.

(3) Remove the need to manually specify the path to tclConfig.sh,
which has gotten even more painful than before because now it's
somewhere under the sysroot.  You can still specify --with-tclconfig
if you really want to, but it's not necessary anymore to build pltcl
under recent macOS.

Note that (3) alone is not sufficient to fix pltcl; we must do (2)
as well because tclConfig.sh now reports the Tcl include flags as
TCL_INCLUDE_SPEC= -iwithsysroot 
/System/Library/Frameworks/Tcl.framework/Versions/8.5/Headers
so unless we also set -isysroot this doesn't work.

It's a bit scary to be adding -isysroot globally.  I thought
briefly about using it only while building pltcl, but that seems
even more dangerous: if there were any discrepancies between the
headers in the sysroot and those in the normal include directories,
building pltcl with different headers from the rest of the system
would surely be disastrous.  In any case, I suspect that the handwriting
is on the wall, and before very much longer it's going to be impossible
to build meaningful code on macOS without -isysroot anyway.

I've tested this on all the macOS versions I have at hand, and it
doesn't seem to break anything.  Only part (1) could possibly
affect other platforms, and that seems safe enough.

I'd like to commit and backpatch this, because otherwise longfin
is going to start falling over when I upgrade its host to Mojave.

Thoughts?

regards, tom lane

diff --git a/configure b/configure
index 21ecd29..879eee4 100755
--- a/configure
+++ b/configure
@@ -668,6 +668,7 @@ python_majorversion
 PYTHON
 perl_embed_ldflags
 perl_embed_ccflags
+perl_includedir
 perl_useshrplib
 perl_privlibexp
 perl_archlibexp
@@ -9773,6 +9774,14 @@ You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl." "$LINENO" 5
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+  perl_includedir="$PG_SYSROOT$perl_archlibexp"
+fi
+  fi
 
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for CFLAGS recommended by Perl" >&5
 $as_echo_n "checking for CFLAGS recommended by Perl... " >&6; }
@@ -18355,7 +18364,7 @@ fi
 # check for 
 if test "$with_perl" = yes; then
   ac_save_CPPFLAGS=$CPPFLAGS
-  CPPFLAGS="$CPPFLAGS -I$perl_archlibexp/CORE"
+  CPPFLAGS="$CPPFLAGS -I$perl_includedir/CORE"
   ac_fn_c_check_header_compile "$LINENO" "perl.h" "ac_cv_header_perl_h" "#include 
 "
 if test "x$ac_cv_header_perl_h" = xyes; then :
diff --git a/configure.in b/configure.in
index 8fe6894..530f275 100644
--- a/configure.in
+++ b/configure.in
@@ -1044,6 +1044,15 @@ You might have to rebuild your Perl installation.  Refer to the
 documentation for details.  Use --without-perl to disable building
 PL/Perl.])
   fi
+  # On most platforms, archlibexp is also where the Perl include files live ...
+  perl_includedir="$perl_archlibexp"
+  # ... but on some macOS versions, we must look under $PG_SYSROOT instead
+  if test x"$PG_SYSROOT" != x"" ; then
+if test -d "$PG_SYSROOT$perl_archlibexp" ; then
+  perl_includedir="$PG_SYSROOT$perl_archlibexp"
+fi
+  fi
+  AC_SUBST(perl_includedir)dnl
   PGAC_CHECK_PERL_EMBED_CCFLAGS
   PGAC_CHECK_PERL_EMBED_LDFLAGS
 fi
@@ -2229,7 +2238,7 @@ fi
 # check for 
 if test "$with_perl" = yes; then
 

Re: when set track_commit_timestamp on, database system abort startup

2018-09-24 Thread Michael Paquier
Sawada-san,

On Mon, Sep 24, 2018 at 08:28:45AM +0900, Michael Paquier wrote:
> Wouldn't it be better to incorporate the new test as part of
> 004_restart.pl?  This way, we avoid initializing a full instance, which
> is always a good thing as that's very costly.  The top of this file also
> mentions that it tests clean restarts, but it bothers also about crash
> recovery.

I have been able to work on this bug, and rewrote the proposed test case
as attached.  The test can only go into v11 and HEAD.  What do you think?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4754e75436..ddc999b687 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6833,11 +6833,13 @@ StartupXLOG(void)
 	StartupMultiXact();
 
 	/*
-	 * Ditto commit timestamps.  In a standby, we do it if setting is enabled
-	 * in ControlFile; in a master we base the decision on the GUC itself.
+	 * Ditto for commit timestamps.  Activate the facility if the setting
+	 * is enabled in the control file, as there should be no tracking of
+	 * commit timestamps done when the setting was disabled.  This facility
+	 * can be started or stopped when replaying a XLOG_PARAMETER_CHANGE
+	 * record.
 	 */
-	if (ArchiveRecoveryRequested ?
-		ControlFile->track_commit_timestamp : track_commit_timestamp)
+	if (ControlFile->track_commit_timestamp)
 		StartupCommitTs();
 
 	/*
diff --git a/src/test/modules/commit_ts/t/004_restart.pl b/src/test/modules/commit_ts/t/004_restart.pl
index daf42d3a02..4efe30a559 100644
--- a/src/test/modules/commit_ts/t/004_restart.pl
+++ b/src/test/modules/commit_ts/t/004_restart.pl
@@ -1,4 +1,4 @@
-# Testing of commit timestamps preservation across clean restarts
+# Testing of commit timestamps preservation across restarts
 use strict;
 use warnings;
 use PostgresNode;
@@ -71,12 +71,36 @@ is($after_restart_ts, $before_restart_ts,
 	'timestamps before and after restart are equal');
 
 # Now disable commit timestamps
-
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = off');
-
 $node_master->stop('fast');
+
+# Start the server, which generates a XLOG_PARAMETER_CHANGE record where
+# the parameter change is registered.
 $node_master->start;
 
+# Now restart again the server so as no record XLOG_PARAMETER_CHANGE are
+# replayed with the follow-up immediate shutdown.
+$node_master->restart;
+
+# Move commit timestamps across page boundaries.  Things should still
+# be able to work across restarts with those transactions committed while
+# track_commit_timestamp is disabled.
+$node_master->safe_psql('postgres',
+qq(CREATE PROCEDURE consume_xid(cnt int)
+AS \$\$
+DECLARE
+i int;
+BEGIN
+FOR i in 1..cnt LOOP
+EXECUTE 'SELECT txid_current()';
+COMMIT;
+END LOOP;
+END;
+\$\$
+LANGUAGE plpgsql;
+));
+$node_master->safe_psql('postgres', 'CALL consume_xid(2000)');
+
 ($ret, $stdout, $stderr) = $node_master->psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($ret, 3, 'no commit timestamp from enable tx when cts disabled');
@@ -106,10 +130,12 @@ like(
 # Re-enable, restart and ensure we can still get the old timestamps
 $node_master->append_conf('postgresql.conf', 'track_commit_timestamp = on');
 
-$node_master->stop('fast');
+# An immediate shutdown is used here.  At next startup recovery will
+# replay transactions which committed when track_commit_timestamp was
+# disabled, and the facility should be able to work properly.
+$node_master->stop('immediate');
 $node_master->start;
 
-
 my $after_enable_ts = $node_master->safe_psql('postgres',
 	qq[SELECT pg_xact_commit_timestamp('$xid');]);
 is($after_enable_ts, '', 'timestamp of enabled tx null after re-enable');




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2018-09-24 Thread Andrew Gierth
So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

  +: relatively straightforward parser changes
  +: no new reserved words
  +: has the option of working extensibly with all functions

  -: base_yylex needs extending to 3 lookahead tokens

** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls

  If the clauses are legal on all window functions, what to do about existing
  window functions for which the clauses do not make sense?

*** 1.1.1. Ignore the clause when the function isn't aware of it

  +: simple
  -: somewhat surprising for users perhaps?

*** 1.1.2. Change the behavior of the windowapi in some consistent way

  Not sure if this can work.
  +: fairly simple (maybe?) and predictable
  -: changes the behavior of existing window functions

** 1.2. Allow from/ignore clause on only certain functions

  +: avoids any unexpected behavior
  -: needs some way to control what functions allow it

*** 1.2.1. Check the function name in parse analysis against a fixed list.

  +: simple
  -: not extensible

*** 1.2.2. Provide some option in CREATE FUNCTION

  +: extensible
  -: fairly intrusive, adding stuff to create function and pg_proc

*** 1.2.3. Do something magical with function argument types

  +: doesn't need changes in create function / pg_proc
  -: it's an ugly hack

* 2. reserve nth_value etc. as functions

  +: follows the spec reasonably well
  +: less of a hack than extending base_yylex

  -: new reserved words
  -: more parser rules
  -: not extensible

  (now goto 1.2.1)

* 3. "just say no" to the spec

  e.g. add new functions like lead_ignore_nulls(), or add extra boolean
  args to lead() etc. telling them to skip nulls

  +: simple
  -: doesn't conform to spec
  -: using extra args isn't quite the right semantics

-- 
Andrew (irc:RhodiumToad)



Re: SSL tests failing with "ee key too small" error on Debian SID

2018-09-24 Thread Kyotaro HORIGUCHI
Hello.

At Mon, 17 Sep 2018 22:13:40 +0900, Michael Paquier  wrote 
in <20180917131340.ge31...@paquier.xyz>
> Hi all,
> 
> On a rather freshly-updated Debian SID server, I am able to see failures
> for the SSL TAP tests:
> 2018-09-17 22:00:27.389 JST [13072] LOG:  database system is shut down
> 2018-09-17 22:00:27.506 JST [13082] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
> 2018-09-17 22:00:27.506 JST [13082] LOG:  database system is shut down
> 2018-09-17 22:00:27.720 JST [13084] FATAL:  could not load server
> certificate file "server-cn-only.crt": ee key too small
> 
> Wouldn't it be better to rework the rules used to generate the different
> certificates and reissue them in the tree?  It seems to me that this is
> just waiting to fail in other platforms as well..

I agree that we could get into the same trouble sooner or later.

Do you mean that cert/key files are generated on-the-fly while
running 'make check'? It sounds reasonable as long as just
replacing existing files with those with longer (2048-bit?) keys
doesn't work for all supported platforms.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Haribabu Kommi
On Tue, Sep 25, 2018 at 1:39 AM Michael Paquier  wrote:

> On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote:
> > Attached new rebased version of the patch that enhances the
> > pg_stat_statements_reset()
> > function. This needs to be applied on top of the patch that is posted in
> > [1].
>
> +CREATE ROLE stats_regress_user1;
> +CREATE ROLE stats_regress_user2;
> Just a short note: regression tests creating roles should use regress_
> as prefix.
>

Thanks for the review.
Fixed in the attached patch as per your suggestion.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-pg_stat_statements_reset-to-reset-specific-query-use_v6.patch
Description: Binary data


Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Haribabu Kommi
On Tue, Sep 25, 2018 at 10:58 AM Michael Paquier 
wrote:

> On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote:
> > For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
> > need to add a delta script not a new base script.  (If you had to
> > back-patch before v10, it might be best to add a new base script in all
> > the branches just to keep the patches consistent; but IIUC this issue
> only
> > arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
> > prototype instead.
>
> Of course, thanks.  Sorry for the incorrect reference pointing to a
> commit of REL9_6_STABLE.  As the patch only needs to be applied down to
> v10, there is no need to do anything more complicated than what Hari has
> proposed.  So, committed after a bit of comment and format tweaks.
>

Thanks for the changes and commit.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Missing const in DSA.

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 7:46 AM Thomas Munro
 wrote:
> On Tue, Sep 25, 2018 at 4:17 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
> > >> Mark G  writes:
> > >>> While looking at some of the recent churn in DSA I noticed that
> > >>> dsa_size_class_map should probably be declared const.
> >
> > >> +1 ... also, given the contents of the array, "char" seems like
> > >> rather a misnomer.  I'd be happier if it were declared as uint8, say.

Pushed.  Thanks both for the code review!

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 10:39:40PM -0400, Tom Lane wrote:
> I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

Same question here.  As that's kind of a separate discussion, I left it
out.  Still, I don't mind changing that at the same time, as it's
harmless and this is only a patch for HEAD.  PGDLLIMPORT changes
don't get back-patched anyway...
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> Let's change it then.  ClientConnectionLost also needs to be changed, as
> miscadmin.h says that it could be used in a signal handler.  What do you
> think about the attached?

Looks reasonable to me (I've not tested though).

I wonder why ClientConnectionLost isn't PGDLLIMPORT like the rest.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 09:38:11PM -0400, Tom Lane wrote:
> Yeah, in principle any global variable touched by a signal handler should
> be sig_atomic_t.  I don't know of any modern platform where using "bool"
> is unsafe, but per the C standard it could be.  The case that would be
> worrisome is if setting the variable requires a load/modify/store, which
> does apply to char-sized variables on some ancient platforms.  I think
> there's no need to worry for int-sized variables.

Let's change it then.  ClientConnectionLost also needs to be changed, as
miscadmin.h says that it could be used in a signal handler.  What do you
think about the attached?
--
Michael
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f7d6617a13..5971310aab 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -27,11 +27,11 @@
 
 ProtocolVersion FrontendProtocol;
 
-volatile bool InterruptPending = false;
-volatile bool QueryCancelPending = false;
-volatile bool ProcDiePending = false;
-volatile bool ClientConnectionLost = false;
-volatile bool IdleInTransactionSessionTimeoutPending = false;
+volatile sig_atomic_t InterruptPending = false;
+volatile sig_atomic_t QueryCancelPending = false;
+volatile sig_atomic_t ProcDiePending = false;
+volatile sig_atomic_t ClientConnectionLost = false;
+volatile sig_atomic_t IdleInTransactionSessionTimeoutPending = false;
 volatile sig_atomic_t ConfigReloadPending = false;
 volatile uint32 InterruptHoldoffCount = 0;
 volatile uint32 QueryCancelHoldoffCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index e167ee8fcb..8ff106e051 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -77,13 +77,13 @@
 
 /* in globals.c */
 /* these are marked volatile because they are set by signal handlers: */
-extern PGDLLIMPORT volatile bool InterruptPending;
-extern PGDLLIMPORT volatile bool QueryCancelPending;
-extern PGDLLIMPORT volatile bool ProcDiePending;
-extern PGDLLIMPORT volatile bool IdleInTransactionSessionTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
+extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
+extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
+extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
 extern PGDLLIMPORT volatile sig_atomic_t ConfigReloadPending;
 
-extern volatile bool ClientConnectionLost;
+extern volatile sig_atomic_t ClientConnectionLost;
 
 /* these are marked volatile because they are examined by signal handlers: */
 extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;




Re: DNS SRV support for LDAP authentication

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 2:09 PM Thomas Munro
 wrote:
> 2.  Define a new zone for testing, by adding the following to the end
> 3.  Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

Oops, I changed my testing domain name in the middle of my experiment,
but pasted the older version into the previous message.  Here are the
corrected steps 2 and 3, consistent with the rest:

= end of /usr/local/etc/namedb/named.conf =
zone "my-domain.com" {
type master;
file "/usr/local/etc/namedb/master/my-domain.com";
};
=

= /usr/local/etc/namedb/master/my-domain.com =
$TTL    10
@   IN  SOA ns.my-domain.com. admin.my-domain.com. (
                  2 ; Serial
             604800 ; Refresh
              86400 ; Retry
            2419200 ; Expire
             604800 )   ; Negative Cache TTL
    IN  NS  ns.my-domain.com.
ns.my-domain.com.   IN  A   127.0.0.1
my-domain.com.  IN  A   127.0.0.1
ldap-server.my-domain.com.  IN  A   127.0.0.1
_ldap._tcp.my-domain.com.   IN  SRV 0   0   389 ldap-server
=

-- 
Thomas Munro
http://www.enterprisedb.com



RE: Changing the setting of wal_sender_timeout per standby

2018-09-24 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> Okay, I have pushed the patch with all your suggestions included.

Thanks so much!

Regards
Takayuki Tsunakawa





DNS SRV support for LDAP authentication

2018-09-24 Thread Thomas Munro
Hello hackers,

Some people like to use DNS SRV records to advertise LDAP servers on
their network.  Microsoft Active Directory is usually (always?) set up
that way.  Here is a patch to allow our LDAP auth module to support
that kind of discovery.  It copies the convention of the OpenLDAP
command line tools: if you give it a URL that has no hostname, it'll
try to extract a domain name from the bind DN, and then ask your DNS
server for a SRV record for LDAP-over-TCP at that domain.  The
OpenLDAP version of libldap.so exports the magic to do that, so the
patch is very small (but the infrastructure set-up to test it is a bit
of a schlep, see below).  I'll add this to the next Commitfest.

Testing instructions (paths and commands given for FreeBSD; adjust
as appropriate):

1.  Install BIND:

$ sudo pkg install bind99

2.  Define a new zone for testing, by adding the following to the end
of /usr/local/etc/namedb/named.conf:

= 8< =
zone "my.test.domain" {
type master;
file "/usr/local/etc/namedb/master/my.test.domain";
};
= 8< =


3.  Create that zone file in /usr/local/etc/namedb/master/my.test.domain:

= 8< =
$TTL    10
@   IN  SOA ns.my.test.domain. admin.my.test.domain. (
                  2 ; Serial
             604800 ; Refresh
              86400 ; Retry
            2419200 ; Expire
             604800 )   ; Negative Cache TTL
    IN  NS  ns.my.test.domain.
ns.my.test.domain.  IN  A   127.0.0.1
my.test.domain. IN  A   127.0.0.1
ldap-server.my.test.domain. IN  A   127.0.0.1
_ldap._tcp.my.test.domain.  IN  SRV 0   0   389 ldap-server
= 8< =

4.  Start up bind:

# service named onestart

5.  Confirm that SRV lookups find our record:

$ dig @localhost _ldap._tcp.my-domain.com SRV
...
;; ANSWER SECTION:
_ldap._tcp.my-domain.com. 10 IN  SRV 0 0 389 ldap-server.my-domain.com.

6.  Tell your system libraries to use this DNS server by temporarily
changing /etc/resolv.conf to say:

= 8< =
nameserver 127.0.0.1
= 8< =

7.  Confirm that the OpenLDAP tools can look that SRV record up:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcom'

(That's "ou=blah,dc=my-domain,dc=com" URL-encoded, from which
"my-domain.com" will be extracted.)  You should see that it's trying
to connect to ldap-server port 389, and you can stick 'x' on the end
of it to see what it looks like when it can't find a SRV record, as a
sanity check:

$ ldapsearch -H 'ldap:///ou%3Dblah%2Cdc%3Dmy-domain%2Cdc%3Dcomx'
DNS SRV: Could not turn domain=my-domain.comx into a hostlist

8.  Set up an LDAP server listening on localhost port 389, and create
a user, such that you can actually authenticate from PostgreSQL with
it.  Gory details omitted.  First test that you can log in with LDAP
authentication when using a pg_hba.conf line like this:

host all fred 127.0.0.1/32 ldap
ldapurl="ldap://ldap-server.my-domain.com/dc=my-domain,dc=com?cn?sub"

9.  Next apply the patch and verify that you can take out the hostname
and let it be discovered via DNS SRV:

host all fred 127.0.0.1/32 ldap ldapurl="ldap:///dc=my-domain,dc=com?cn?sub"

(You can stick some elog(LOG, ...) lines into
InitializeLDAPConnection() if you want to check that
ldap_domain2hostlist() is in fact finding the hostname and port.)

This is a first draft.  Not tested much yet.  I wonder if
HAVE_LDAP_INITIALIZE is a reasonable way to detect OpenLDAP.  The
documentation was written in about 7 seconds so probably needs work.
There is probably a Windowsy way to do this too but I didn't look into
that.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-DNS-SRV-support-for-LDAP-server-discovery.patch
Description: Binary data


Re: [patch] Bug in pg_dump/pg_restore using --no-publication

2018-09-24 Thread Michael Paquier
On Fri, Sep 21, 2018 at 05:44:02PM +0200, Gilles Darold wrote:
> Attached is a patch that fixes a bug in pg_dump since 10.0 and
> reproducible in master. When using option --no-publication : ALTER
> PUBLICATION orders are still present in the dump.

Thanks for the report, the patch, and the test case, Gilles!

> pg_restore --no-publication test.dump -l test.dump| grep PUBLICATION
> 2230; 6106 16389 PUBLICATION TABLE public p1 t1
> 2231; 6106 16392 PUBLICATION TABLE public p_insert_only t1

This command does not actually work ;)

> Should I add it to current commitfest ?

No need to.  I have committed your patch down to v10 after making sure
that we are not missing any other spots, and testing each branch
manually.
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> At the same time, all the pending flags in miscadmin.h could be switched
> to sig_atomic_t if we were to be correct, no?  The counters could be
> higher than 256 so that's not really possible. 

Yeah, in principle any global variable touched by a signal handler should
be sig_atomic_t.  I don't know of any modern platform where using "bool"
is unsafe, but per the C standard it could be.  The case that would be
worrisome is if setting the variable requires a load/modify/store, which
does apply to char-sized variables on some ancient platforms.  I think
there's no need to worry for int-sized variables.
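
For readers following along, the per-flag pattern being defended here looks
like this in miniature.  This is generic C, not the PostgreSQL sources; the
real backend handlers also save and restore errno and set the process latch
rather than relying on pause():

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/*
 * One flag per event, each written with a bare assignment in the handler:
 * a plain store to a volatile sig_atomic_t is the only kind of write the C
 * standard guarantees to be safe from a signal handler.
 */
static volatile sig_atomic_t reload_pending = 0;
static volatile sig_atomic_t shutdown_pending = 0;

static void
sighup_handler(int signo)
{
    (void) signo;
    reload_pending = 1;         /* plain store, no read-modify-write */
}

static void
sigterm_handler(int signo)
{
    (void) signo;
    shutdown_pending = 1;
}

int
main(void)
{
    signal(SIGHUP, sighup_handler);
    signal(SIGTERM, sigterm_handler);

    for (;;)
    {
        if (reload_pending)
        {
            reload_pending = 0;
            printf("would reload configuration here\n");
        }
        if (shutdown_pending)
        {
            printf("shutting down\n");
            break;
        }
        pause();                /* wait for the next signal */
    }
    return 0;
}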

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 09:03:49PM -0400, Tom Lane wrote:
> You could only fix that by blocking all signal handling during the
> handler, which would be expensive and rather pointless.
> 
> I do not think that it's readily possible to improve on the current
> situation with one sig_atomic_t per flag.

Okay, thanks for the confirmation.

At the same time, all the pending flags in miscadmin.h could be switched
to sig_atomic_t if we were to be correct, no?  The counters could be
higher than 256 so that's not really possible. 
--
Michael




Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> And then within separate signal handlers things like:
> void
> StatementCancelHandler(SIGNAL_ARGS)
> {
> [...]
> signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
> [...]
> }

AFAICS this still wouldn't work.  The machine code is still going to
look (on many machines) like "load from signalPendingFlags,
OR in some bits, store to signalPendingFlags".  So there's still a
window for another signal handler to interrupt that and store some
bits that will get lost.

You could only fix that by blocking all signal handling during the
handler, which would be expensive and rather pointless.

I do not think that it's readily possible to improve on the current
situation with one sig_atomic_t per flag.

regards, tom lane



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 05:23:56PM -0700, Andres Freund wrote:
> On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
>> Anyway, putting the back-patching pain aside, and just for my own
>> knowledge...  Andres, would it be fine to just use one sig_atomic_t
>> field which can be set from different code paths?  Say:
>> typedef enum SignalPendingType {
>> PENDING_INTERRUPT,
>> PENDING_CANCEL_QUERY,
>> PENDING_PROC_DIE,
>> PENDING_RELOAD,
>> PENDING_SESSION_TIMEOUT
>> };
> 
> Well, they'd have to be different bits...

Sure, I forgot to write the "foo = 1 << 0" and such ;)

>> extern volatile sig_atomic_t signalPendingFlags;
> 
> Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits
> wide - so you couldn't have that many flags.

I see.  I thought that it mapped to at least an int, but C99 says that it
can be as small as a char.  That's indeed quite limited...
--
Michael




Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 12:02:35PM -0400, Tom Lane wrote:
> For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
> need to add a delta script not a new base script.  (If you had to
> back-patch before v10, it might be best to add a new base script in all
> the branches just to keep the patches consistent; but IIUC this issue only
> arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
> prototype instead.

Of course, thanks.  Sorry for the incorrect reference pointing to a
commit of REL9_6_STABLE.  As the patch only needs to be applied down to
v10, there is no need to do anything more complicated than what Hari has
proposed.  So, committed after a bit of comment and format tweaks.
--
Michael




Re: Implementing SQL ASSERTION

2018-09-24 Thread Andrew Gierth
> "Joe" == Joe Wildish  writes:

 Joe> Agreed. My assumption was that we would record in the data
 Joe> dictionary the behaviour (or “polarity") of each aggregate
 Joe> function with respect to the various operators. Column in
 Joe> pg_aggregate? I don’t know how we’d record it exactly.

I haven't looked at the background of this, but if what you want to know
is whether the aggregate function has the semantics of min() or max()
(and if so, which) then the place to look is pg_aggregate.aggsortop.

(For a given aggregate foo(x), the presence of an operator oid in
aggsortop means something like "foo(x) is equivalent to (select x from
... order by x using OP limit 1)", and the planner will replace the
aggregate by the applicable subquery if it thinks it'd be faster.)

As for operators, you can only make assumptions about their meaning if
the operator is a member of some opfamily that assigns it some
semantics. For example, the planner can assume that WHERE x=y AND x=1
implies that y=1 (assuming x and y are of appropriate types) not because
it assumes that "=" is the name of a transitive operator, but because
the operators actually selected for (x=1) and (x=y) are both "equality"
members of the same btree operator family. Likewise proving that (a>2)
implies (a>1) requires knowing that > is a btree comparison op.

-- 
Andrew (irc:RhodiumToad)



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Andres Freund
On 2018-09-25 08:57:25 +0900, Michael Paquier wrote:
> On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> > This doesn't seem to solve an actual problem, why are we discussing
> > changing this? What'd be measurably improved, worth the cost of making
> > backpatching more painful?
> 
> My point was just to reduce the number of variables used and ease
> debugger lookups with what is on the stack.

I'm not sure a bitflag really gives you that - before gdb gives you the
plain value, afterwards you need to know the enum values and do bit math
to know.


> Anyway, putting the back-patching pain aside, and just for my own
> knowledge...  Andres, would it be fine to just use one sig_atomic_t
> field which can be set from different code paths?  Say:
> typedef enum SignalPendingType {
> PENDING_INTERRUPT,
> PENDING_CANCEL_QUERY,
> PENDING_PROC_DIE,
> PENDING_RELOAD,
> PENDING_SESSION_TIMEOUT
> };

Well, they'd have to be different bits...


> extern volatile sig_atomic_t signalPendingFlags;

Note that sig_atomic_t IIRC is only guaranteed to effectively be 8 bits
wide - so you couldn't have that many flags.


Greetings,

Andres Freund



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 10:06:40AM -0700, Andres Freund wrote:
> This doesn't seem to solve an actual problem, why are we discussing
> changing this? What'd be measurably improved, worth the cost of making
> backpatching more painful?

My point was just to reduce the number of variables used and ease
debugger lookups with what is on the stack.

Anyway, putting the back-patching pain aside, and just for my own
knowledge...  Andres, would it be fine to just use one sig_atomic_t
field which can be set from different code paths?  Say:
typedef enum SignalPendingType {
PENDING_INTERRUPT,
PENDING_CANCEL_QUERY,
PENDING_PROC_DIE,
PENDING_RELOAD,
PENDING_SESSION_TIMEOUT
};
extern volatile sig_atomic_t signalPendingFlags;

And then within separate signal handlers things like:
void
StatementCancelHandler(SIGNAL_ARGS)
{
[...]
signalPendingFlags |= PENDING_INTERRUPT | PENDING_CANCEL_QUERY;
[...]
}

void
PostgresSigHupHandler(SIGNAL_ARGS)
{
[...]
signalPendingFlags |= ConfigReloadPending;
[...]
}
--
Michael




Re[2]: Adding a note to protocol.sgml regarding CopyData

2018-09-24 Thread Bradley DeJong

Thanks for the feedback.

On 2018-09-22, Amit Kapila wrote ...
> ... Why can't we just extend the current Note where it is currently 
...


Because information about how the protocol works belongs in the protocol
documentation, not in the documentation for one implementation of the
protocol.


Think of it this way: if the only full explanation of this information
were in the psqlODBC or pgJDBC documentation, would you feel comfortable
just referencing it from protocol.sgml?  I would not, and, in my opinion,
libpq's being the reference client implementation should not change
that.


On top of that, in the libpq documentation the termination line is only 
mentioned in a section titled "Obsolete Functions for COPY" which makes 
it even less likely that someone working on a different implementation 
of the protocol will notice it.


[strong opinion - I would object to leaving the only description in the 
libpq documentation]


> ... why do we want to duplicate the same information ...

The change to the CopyData message documentation does refer back to the 
full description. My intent with the brief description of the \. line 
was to include enough information so that the reader could decide 
whether or not skipping back to the full description would be useful. I 
think that usefulness outweighs the minor duplication.


[moderate opinion - I plan to leave it as is unless others weigh in in 
favor of just keeping the reference]


But given that I don't work on libpq or even use it, I'm not comfortable 
changing the documentation of 4 different libpq methods (even obsolete 
methods) on my own initiative. If the committer who picks this up wants 
the libpq documentation changed as part of this, that would be different 
and I'd be willing to give it a shot.


[no strong feelings one way or the other - I would leave the libpq 
documentation as is but could easily be swayed]


> ... duplicate the same information in different words at three 
different places ...


I count 7 different places.  In the protocol docs, there is the old
mention in the "Summary of Changes since Protocol 2.0" section and the
two new mentions in the protocol definitions; plus, after reading through
libpq-copy.html again, I think all 4 of these sections refer to the
terminating line / end-of-copy-data marker.


PQgetline - "... the application must check to see if a new line 
consists of the two characters \., which indicates ..."
PQgetlineAsync - "... returns -1 if the end-of-copy-data marker has 
been recognized ..."
PQputline - "... Note ...send the two characters \. as a final line 
..."
PQendcopy - "... followed by PQendcopy after the terminator line is 
seen ..."


[informational - lists references to remove when a future protocol drops 
the \. line entirely]
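
For context, the obsolete interfaces listed above use the terminator like
this.  It is only a minimal fragment: my_table is a hypothetical table,
connection parameters come from the environment, and error handling (for
example checking for PGRES_COPY_IN) is mostly omitted:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    res = PQexec(conn, "COPY my_table FROM STDIN");
    PQclear(res);

    PQputline(conn, "1\tone\n");
    PQputline(conn, "2\ttwo\n");
    PQputline(conn, "\\.\n");   /* the end-of-copy-data terminator line */
    PQendcopy(conn);            /* finish the COPY after the terminator */

    PQfinish(conn);
    return 0;
}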





Re: Implementing SQL ASSERTION

2018-09-24 Thread Joe Wildish
Hi Peter,

> On 24 Sep 2018, at 15:06, Peter Eisentraut
>  wrote:
> 
> On 29/04/2018 20:18, Joe Wildish wrote:
>> 
>> Attached is a rebased patch for the prototype.
> 
> I took a look at this.

Thank you for reviewing.

> This has been lying around for a few months, so it will need to be
> rebased again.
> 
> 8< - - - snipped for brevity - - - 8<
> 
> All this new code in constraint.c that checks the assertion expression
> needs more comments and documentation.

All agreed.  I’ll give the patch some TLC and get a new version that
addresses the above.

> Stuff like this isn't going to work:
> 
> static int
> funcMaskForFuncOid(Oid funcOid)
> {
>char *name = get_func_name(funcOid);
> 
>if (name == NULL)
>return OTHER_FUNC;
>else if (strncmp(name, "min", strlen("min")) == 0)
>return MIN_AGG_FUNC;
>else if (strncmp(name, "max", strlen("max")) == 0)
>return MAX_AGG_FUNC;
> 
> You can't assume from the name of a function what it's going to do.
> Solving this properly might be hard.

Agreed. My assumption was that we would record in the data dictionary the
behaviour (or “polarity") of each aggregate function with respect to the
various operators. Column in pg_aggregate? I don’t know how we’d record it
exactly. A bitmask would be a possibility. Also, I don’t know what we’d do
with custom aggregate functions (or indeed custom operators). Allowing end
users to determine the value would potentially lead to assertion checks
being incorrectly skipped. Maybe we’d say that custom aggregates always
have a neutral polarity and are therefore not subject to this
optimisation.

> This ought to be reproducible for you if you build with assertions.

Yes. I shall correct this when I do the aforementioned rebase and
application of TLC.

> My feeling is that if we want to move forward on this topic, we need to
> solve the concurrency question first.  All these optimizations for when
> we don't need to check the assertion are cool, but they are just
> optimizations that we can apply later on, once we have solved the
> critical problems.

I obviously agree that the concurrency issue needs solving. But I don’t
see that at all as a separate matter from the algos. Far from being merely
optimisations, the research indicates we can go a lot further toward
reducing the need for rechecks and, therefore, reducing the chance of
concurrency conflicts from occurring in the first place. This is true
regardless of whatever mechanism we use to enforce correct behaviour under
concurrent modifications -- e.g. a lock on the ASSERTION object itself,
enforced use of SERIALIZABLE, etc.

By way of example (lifted directly from the AM4DP book):

CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  dept INTEGER NOT NULL,
  job TEXT NOT NULL
);

CREATE ASSERTION department_managers_need_administrators CHECK
  (NOT EXISTS
    (SELECT dept
       FROM employee a
      WHERE EXISTS (SELECT * FROM employee b
                     WHERE a.dept = b.dept
                       AND b.job IN ('Manager', 'Senior Manager'))
        AND NOT EXISTS (SELECT * FROM employee b
                         WHERE a.dept = b.dept
                           AND b.job = 'Administrator')));

The current implementation derives "DELETE(employee), INSERT(employee) and
UPDATE(employee.dept, employee.job)" as the set of invalidating operations
and triggers accordingly. However, in this case, we can supplement the
triggers by having them inspect the transition tables to see if the actual
data from the triggering DML statement could in fact affect the truth of
the expression: specifically, only do the recheck on DELETE of an
"Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when
the new job is a "Manager" or "Senior Manager" or the old job was an
"Administrator".

Now, if this is a company with 10,000 employees, which would therefore
presumably only require a handful of managers, right? ;-), then the
potential for a concurrency conflict is massively reduced compared to
rechecking every time the employee table is touched.

(This optimisation has some caveats and is reliant upon being able to
derive the key of an expression from the underlying base tables plus some
stuff about functional dependencies. I have started work on it but sadly
not had time to progress it in recent months).

Having said all that: there are obviously going to be some expressions
that cannot be proven to have no potential for invalidating the assertion
truth. I guess this is the prime concern from a concurrency PoV? Example:

CREATE TABLE t (
  b BOOLEAN NOT NULL,
  n INTEGER NOT NULL,
  PRIMARY KEY (b, n)
);

CREATE ASSERTION sum_per_b_less_than_10 CHECK
  (NOT EXISTS
(SELECT FROM (SELECT b, SUM(n)
FROM t
   GROUP BY b) AS v(b, sum_n)
  WHERE sum_n > 10));

Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the
interesting case, from a concurrency perspective, is how do we avoid an
INSERT WHERE 

Re: Calculate total_table_pages after set_base_rel_sizes()

2018-09-24 Thread Edmund Horner
David Rowley said:
> I believe that we should be delaying the PlannerInfo's
> total_table_pages calculation until after constraint exclusion and
> partition pruning have taken place. Doing this calculation before we
> determine which relations we don't need to scan can lead to
> incorrectly applying random_page_cost to too many pages processed
> during an Index Scan.
> 
> We already don't count relations removed by join removals from this
> calculation, so counting pruned partitions seems like an omission.
> 
> The attached patch moves the calculation to after set_base_rel_sizes()
> is called and before set_base_rel_pathlists() is called, where the
> information is actually used.
> 
> I am considering this a bug fix, but I'm proposing this for PG12 only
> as I don't think destabilising plans in the back branches is a good
> idea. I'll add this to the September commitfest.

Hi David, I had a quick look at this.  (I haven't tested it so this isn't a 
full review.)

It looks like a fairly straightforward code move.  And I think it's correct to 
exclude the pages from partitions that won't be read.

I have a small tweak.  In make_one_rel, we currently have:

/*
 * Compute size estimates and consider_parallel flags for each base rel,
 * then generate access paths.
 */
set_base_rel_sizes(root);
set_base_rel_pathlists(root);

Your patch inserts code between the two lines.  I think the comment should be 
split too.

/* Compute size estimates and consider_parallel flags for each base rel. */
set_base_rel_sizes(root);

// NEW CODE

/* Generate access paths. */
set_base_rel_pathlists(root);

Cheers,
Edmund

Re: auto_explain: Include JIT output if applicable

2018-09-24 Thread Lukas Fittl
On Mon, Sep 24, 2018 at 1:48 PM, Andres Freund  wrote:
>
> Thanks for noticing - pushed!
>

Thanks!

Best,
Lukas

-- 
Lukas Fittl


Re: PATCH: Update snowball stemmers

2018-09-24 Thread Tom Lane
Arthur Zakirov  writes:
> Ah, I see. I attached new version made with --no-renames. Will wait for
> what cfbot will say.

I reviewed and pushed this.

As a cross-check on the patch, I cloned the Snowball github repo
and built the derived files in it.  I noticed that they'd incorporated
several new stemmers since 2007 --- not only your Nepali one, but
half a dozen more besides.  Since the point here is (IMO) mostly to
follow their lead on what's interesting, I went ahead and added those
as well.

In short, therefore, the commit includes the Nepali stuff from your
other thread as well as what was in this one.

Although I added nepali.stop from the other patch, I've not done
anything about updating our other stopword lists.  Presumably those
are a bit obsolete by now as well.  I wonder if we can prevail on
the Snowball people to make those available in some less painful way
than scraping them off assorted web pages.  Ideally they'd stick them
into their git repo ...

regards, tom lane



Re: Collation versioning

2018-09-24 Thread Peter Geoghegan
On Mon, Sep 24, 2018 at 1:47 PM Thomas Munro
 wrote:
> Personally I'm not planning to work on multi-version installation any
> time soon, I was just scoping out some basic facts about all this.  I
> think the primary problem that affects most of our users is the
> shifting-under-your-feet problem, which we now see applies equally to
> libc and libicu.

Are we sure about that? Could it just be that ICU will fix bugs that
cause their strcoll()-alike and strxfrm()-alike functions to give
behavior that isn't consistent with the behavior required by the CLDR
version in use?

This seems like it might be a very useful distinction. We know that
glibc had bugs that were caused by strxfrm() not agreeing with
strcoll() -- that was behind the 9.5-era abbreviated keys issues. But
that was actually a bug in an optimization in strcoll(), rather than a
strxfrm() bug. strxfrm() gave the correct answer, which is to say the
answer that was right according to the high level collation
definition. It merely failed to be bug-compatible with strcoll().
What's ICU supposed to do about an issue like that?

If we're going to continue to rely on the strxfrm() equivalent from
ICU, then it seems to me that ICU should be able to change behaviors
in a stable release, provided the behavior they're changing is down to
a bug in their infrastructure, as opposed to an organic evolution in
how some locale sorts text (CLDR update). My understanding is that ICU
is designed to decouple technical issues with issues of concern to
natural language experts, so we as an ICU client can limit ourselves
to worrying about one of the two at any given time.

-- 
Peter Geoghegan



Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Alvaro Herrera
On 2018-Sep-24, Sergei Kornilov wrote:

> Hi
> 
> > An autovacuum can't be just aggressive; it's either anti-wraparound or 
> > normal.
> But autovacuum _can_ be aggressive and not anti-wraparound.
> I build current master and can see 3 different line types:
> 2018-09-24 23:47:31.500 MSK 27939 @ from  [vxid:4/272032 txid:0] [] LOG:  
> automatic aggressive vacuum of table "postgres.public.foo": index scans: 0
> 2018-09-24 23:49:27.892 MSK 28333 @ from  [vxid:4/284297 txid:0] [] LOG:  
> automatic aggressive vacuum to prevent wraparound of table 
> "postgres.public.foo": index scans: 0
> 2018-09-24 23:49:29.093 MSK 28337 @ from  [vxid:4/284412 txid:0] [] LOG:  
> automatic vacuum of table "postgres.public.foo": index scans: 0

Exactly.

It cannot be anti-wraparound and not aggressive, which is the line type
not shown.

"Aggressive" means it scans all pages; "anti-wraparound" means it does
not let itself be cancelled because of another process waiting for a
lock on the table.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Include application_name in "connection authorized" log message

2018-09-24 Thread Stephen Frost
Greetings Don!

* Don Seiler (d...@seiler.us) wrote:
> On Tue, Aug 7, 2018 at 12:32 PM Tom Lane  wrote:
> > Don Seiler  writes:
> >
> > > 1. We want to make a generic, central ascii-lobotomizing function similar
> > > to check_application_name that we can re-use there and for other checks
> > (eg
> > > user name).
> > > 2. Change check_application_name to call this function (or just call this
> > > function instead of check_application_name()?)
> >
> > check_application_name's API is dictated by the GUC check-hook interface,
> > and doesn't really make sense for this other use.  So the first part of
> > that, not the second.
> >
> > > 3. Call this function when storing the value in the port struct.
> >
> > I'm not sure where exactly is the most sensible place to call it,
> > but trying to minimize the number of places that know about this
> > kluge seems like a good principle.
> 
> OK I created a new function called clean_ascii() in common/string.c. I call
> this from my new logic in postmaster.c as well as replacing the logic in
> guc.c's check_application_name() and check_cluster_name().

Since we're putting it into common/string.c (which seems pretty
reasonable to me, at least), I went ahead and changed it to be
'pg_clean_ascii'.  I didn't see any other obvious cases where we could
use this function (though typecmds.c does have an interesting ASCII
check for type categories..).
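
For readers who haven't looked at the GUC check hooks, the kind of
ascii-lobotomizing in question is roughly this.  It is an illustrative
sketch of the idea only; the actual pg_clean_ascii in the attached patch
may differ in naming, allocation, and exactly which characters it accepts:

#include <stdio.h>

/* Replace anything outside the printable ASCII range with '?', in place. */
static void
clean_ascii_sketch(char *str)
{
    for (char *p = str; *p; p++)
    {
        if (*p < 32 || *p > 126)
            *p = '?';
    }
}

int
main(void)
{
    char        name[] = "app\xc3\xa9\tname";

    clean_ascii_sketch(name);
    printf("%s\n", name);       /* prints "app???name" */
    return 0;
}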

Otherwise, I added some comments, added application_name to the
replication 'connection authorized' messages (seems like we really
should be consistent across all of them...), ran it through pgindent,
and updated a variable name or two here and there.

> I've been fighting my own confusion with git and rebasing and fighting the
> same conflicts over and over and over, but this patch should be what I
> want. If anyone has time to review my git process, I would appreciate it. I
> must be doing something wrong to have these same conflicts every time I
> rebase (or I completely misunderstand what it actually does).

I'd be happy to chat about it sometime, of course, just have to find
time when we both have a free moment. :)

Attached is the updated patch.  If you get a chance to look over it
again and make sure it looks good to you, that'd be great.  I did a bit
of testing of it myself but wouldn't complain if someone else wanted to
also.

One thing I noticed while testing is that our 'disconnection' message
still emits 'database=' for replication connections even though the
'connection authorized' message doesn't (presumably because we realized
it's a bit silly to do so when we say 'replication connection'...).
Seems like it'd be nice to have the log_connection / log_disconnection
messages have some consistency about them but that's really a different
discussion from this.

Thanks!

Stephen
From 7151ee89e1663a762f928f33ad4023e70257ef9e Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Mon, 24 Sep 2018 15:59:50 -0400
Subject: [PATCH] Add application_name to connection authorized msg

The connection authorized message has quite a bit of useful information
in it, but didn't include the application_name (when provided), so let's
add that as it can be very useful.

Note that at the point where we're emitting the connection authorized
message, we haven't processed GUCs, so it's not possible to get this by
using log_line_prefix (which pulls from the GUC).  There's also
something to be said for having this included in the connection
authorized message and then not needing to repeat it for every line, as
having it in log_line_prefix would do.

The GUC cleans the application name to pure-ascii, so do that here too,
but pull out the logic for cleaning up a string into its own function
in common and re-use it from those places, and check_cluster_name which
was doing the same thing.

Author: Don Seiler 
Discussion: https://postgr.es/m/CAHJZqBB_Pxv8HRfoh%2BAB4KxSQQuPVvtYCzMg7woNR3r7dfmopw%40mail.gmail.com
---
 src/backend/postmaster/postmaster.c | 16 +
 src/backend/utils/init/postinit.c   | 54 -
 src/backend/utils/misc/guc.c| 17 ++---
 src/common/string.c | 19 ++
 src/include/common/string.h |  1 +
 src/include/libpq/libpq-be.h|  7 
 6 files changed, 84 insertions(+), 30 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 305ff36258..41de140ae0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -99,6 +99,7 @@
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
 #include "common/ip.h"
+#include "common/string.h"
 #include "lib/ilist.h"
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
@@ -2096,6 +2097,21 @@ retry1:
 			pstrdup(nameptr));
 port->guc_options = lappend(port->guc_options,
 			pstrdup(valptr));
+
+/*
+ * Copy application_name to port if we come across it.  This
+ * is done so we can log the 

Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Sergei Kornilov
Hi

> An autovacuum can't be just aggressive; it's either anti-wraparound or normal.
But autovacuum _can_ be aggressive and not anti-wraparound.
I build current master and can see 3 different line types:
2018-09-24 23:47:31.500 MSK 27939 @ from  [vxid:4/272032 txid:0] [] LOG:  automatic aggressive vacuum of table "postgres.public.foo": index scans: 0
2018-09-24 23:49:27.892 MSK 28333 @ from  [vxid:4/284297 txid:0] [] LOG:  automatic aggressive vacuum to prevent wraparound of table "postgres.public.foo": index scans: 0
2018-09-24 23:49:29.093 MSK 28337 @ from  [vxid:4/284412 txid:0] [] LOG:  automatic vacuum of table "postgres.public.foo": index scans: 0

regards, Sergei



Re: auto_explain: Include JIT output if applicable

2018-09-24 Thread Andres Freund
Hi,

On 2018-09-24 11:34:38 -0700, Lukas Fittl wrote:
> Hi,
> 
> Whilst playing around with auto_explain and JIT today, I realized that
> auto_explain currently doesn't output JIT information, which is rather
> unfortunate when analyzing a larger set of queries in a semi-automated
> manner.
> 
> Attached a trivial patch that fixes the issue and adds JIT information to
> auto_explain with the same logic as used for regular EXPLAIN.

Thanks for noticing - pushed!

It's pretty annoying that so much of this code is duplicated in
auto_explain. It'd be good if we refactored explain.c so that there's
less duplication. But that seems like it'd not be v11 work, so...

- Andres



Re: Collation versioning

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 4:26 AM Douglas Doole  wrote:
> On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro  
> wrote:
>> Admittedly that creates a whole can
>> of worms for initdb-time catalog creation, package maintainers' jobs,
>> how long old versions have to be supported and how you upgraded
>> database objects to new ICU versions.
>
>
> Yep. We never came up with a good answer for that before I left IBM. At the
> time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as
> part of the install bundle.
>
> Long term, I think the only viable approach to supporting multiple versions 
> of ICU is runtime loading of the libraries. Then it's up to the system 
> administrator to make sure the necessary versions are installed on the system.

I wonder if we would be practically constrained to using the
distro-supplied ICU (by their policies of not allowing packages to
ship their own copies ICU); it seems like it.  I wonder which distros
allow multiple versions of ICU to be installed.  I see that Debian 9.5
only has 57 in the default repo, but the major version is in the
package name (what is the proper term for that kind of versioning?)
and it doesn't declare a conflict with other versions, so that's
promising.  Poking around with nm I noticed also that both the RHEL
and Debian ICU libraries have explicitly versioned symbol names like
"ucol_strcollUTF8_57", which is also promising.  FreeBSD seems to have
used "--disable-renaming" and therefore defines only
"ucol_strcollUTF8"; doh.

This topic is discussed here:
http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library

Personally I'm not planning to work on multi-version installation any
time soon, I was just scoping out some basic facts about all this.  I
think the primary problem that affects most of our users is the
shifting-under-your-feet problem, which we now see applies equally to
libc and libicu.

>> Yeah, it seems like ICU is *also* subject to minor changes that happen
>> under your feet, much like libc.  For example maintenance release 60.2
>> (you can't install that at the same time as 60.1, but you can install
>> it at the same time as 59.2).  You'd be linked against libicu.so.60
>> (and thence libicudata.so.60), and it gets upgraded in place when you
>> run the local equivalent of apt-get upgrade.
>
> This always worried me because an unexpected collation change is so painful 
> for a database. And I was never able to think of a way of reliably testing 
> compatibility either because of ICU's ability to reorder and group characters 
> when collating.

I think the best we can do is to track versions per dependency (ie
record it when the CHECK is created, when the index is created or
rebuilt, ...) and generate loud warnings until you've dealt with each
version dependency.  That's why I've suggested we could consider
sticking it on pg_depend (though I have apparently failed to convince
Stephen so far).  I think something like that is better than the
current collversion design, which punts the problem to the DBA: "hey,
human, there might be some problems, but I don't know where!  Please
tell me when you've fixed them by running ALTER COLLATION ... REFRESH
VERSION!" instead of having the computer track of what actually needs
to be done on an object-by-object basis and update the versions
one-by-one automatically when the problems are resolved.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: fast default vs triggers

2018-09-24 Thread Andrew Dunstan




On 09/20/2018 09:04 AM, Tomas Vondra wrote:



On 09/19/2018 10:35 PM, Andrew Dunstan wrote:



On 09/18/2018 03:36 PM, Andrew Dunstan wrote:



Tomas Vondra has pointed out to me that there's an issue with 
triggers not getting expanded tuples for columns with fast defaults. 
Here is an example that shows the issue:



   andrew=# create table blurfl (id int);
   CREATE TABLE
   andrew=# insert into blurfl select x from generate_series(1,5) x;
   INSERT 0 5
   andrew=# alter table blurfl add column x int default 100;
   ALTER TABLE
   andrew=# create or replace function showmej() returns trigger
   language plpgsql as $$ declare j json; begin j := to_json(old);
   raise notice 'old x: %', j->>'x'; return new; end; $$;
   CREATE FUNCTION
   andrew=# create trigger show_x before update on blurfl for each row
   execute procedure showmej();
   CREATE TRIGGER
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 
   UPDATE 1
   andrew=# update blurfl set id = id where id = 1;
   NOTICE:  old x: 100
   UPDATE 1
   andrew=#


The error is fixed with this patch:


   diff --git a/src/backend/commands/trigger.c 
b/src/backend/commands/trigger.c

   index 2436692..f34a72a 100644
   --- a/src/backend/commands/trigger.c
   +++ b/src/backend/commands/trigger.c
   @@ -3396,7 +3396,11 @@ ltrmark:;
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    }
    -   result = heap_copytuple(&tuple);
   +   if (HeapTupleHeaderGetNatts(tuple.t_data) < relation->rd_att->natts)
   +       result = heap_expand_tuple(&tuple, relation->rd_att);
   +   else
   +       result = heap_copytuple(&tuple);
   +
    ReleaseBuffer(buffer);
     return result;

I'm going to re-check the various places that this might have been 
missed. I guess it belongs on the open items list.








I haven't found anything further that is misbehaving. I paid 
particular attention to indexes and index-only scans.


I propose to commit this along with an appropriate regression test.



Seems reasonable to me.




This exposed a further issue with nulls in certain positions. A fix for 
both issues, and some regression tests, has been committed.


Thanks to Tomas for his help.

cheers

andrew

--
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Nasby, Jim

> On Sep 24, 2018, at 1:29 PM, Andres Freund  wrote:
> 
> I'm very doubtful this is an improvement. Especially with the upcoming
> pluggable storage work making vacuumlazy.c heap specific, while vacuum.c
> stays generic.  The concept of something like
> PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much
> (even if criteria for it might).

That’s already a problem since vacuum logging is spread all over while autovac 
logging is not. Perhaps there needs to be some sort of vacuum_log() function 
that immediately provides output for manual vacuums, but aggregates output for 
autovac. AFAIK that’s the only real reason for autovac logging being a special 
case today.

Re: Participate in GCI as a Mentor

2018-09-24 Thread Tahir Ramzan
Thanks Sarah,

I have received the emails and invitation. Now, I am reviewing all stuff
and will respond with my ideas within a day.

Regards
Tahir

On Tue, Sep 25, 2018 at 12:34 AM Sarah Conway Schnurr <
xenophene...@gmail.com> wrote:

> Tahir,
>
> I've extended an invitation through GCI for you to become a mentor for the
> PostgreSQL
> organization, and have sent an additional email containing information on
> how to get started
> with contributing to this year's contest. We have one month left to get
> over a hundred tasks
> uploaded, so that is our current priority!
>
> The PostgreSQL wiki link has the current list of tasks available as well
> as a full list of other
> mentors in the program for this year, found here:
> https://wiki.postgresql.org/wiki/GCI_2018
>
> Thank you very much for your interest - we're thrilled to have you be a
> part of GCI 2018!
>
> Sarah
>
> On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan <
> tahirram...@alumni.vu.edu.pk> wrote:
>
>> Honorable Concern,
>>
>> I want to join GCI as a mentor, please guide me about the procedure,
>> thanks in anticipation.
>>
>> --
>> Regards
>> Tahir Ramzan
>> MSCS Research Scholar
>> Google Summer of Code 2015 (CiviCRM)
>> Google Summer of Code 2016 (ModSecurity)
>> Outside Collaborator of SpiderLabs (Powered by TrustWave)
>> Google Android Students Club Facilitator and Organizer 2015
>>
>> Contact:
>>
>> +92-312-5518018
>>
>> tahirram...@alumni.vu.edu.pk
>>
>>
>> More details about me and my work:
>>
>> GitHub Profile: https://github.com/tahirramzan
>>
>> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
>>
>
>
> --
> Sarah Conway Schnurr
>


-- 
Regards
Tahir Ramzan
MSCS Research Scholar
Google Summer of Code 2015 (CiviCRM)
Google Summer of Code 2016 (ModSecurity)
Outside Collaborator of SpiderLabs (Powered by TrustWave)
Google Android Students Club Facilitator and Organizer 2015

Contact:

+92-312-5518018

tahirram...@alumni.vu.edu.pk


More details about me and my work:

GitHub Profile: https://github.com/tahirramzan

LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan


Re: Missing const in DSA.

2018-09-24 Thread Thomas Munro
On Tue, Sep 25, 2018 at 4:17 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
> >> Mark G  writes:
> >>> While looking at some of the recent churn in DSA I noticed that
> >>> dsa_size_class_map should probably be declared const.
>
> >> +1 ... also, given the contents of the array, "char" seems like
> >> rather a misnomer.  I'd be happier if it were declared as uint8, say.
>
> > +1
>
> Are you planning to take care of this?

Will do.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Participate in GCI as a Mentor

2018-09-24 Thread Sarah Conway Schnurr
Tahir,

I've extended an invitation through GCI for you to become a mentor for the
PostgreSQL
organization, and have sent an additional email containing information on
how to get started
with contributing to this year's contest. We have one month left to get
over a hundred tasks
uploaded, so that is our current priority!

The PostgreSQL wiki link has the current list of tasks available as well as
a full list of other
mentors in the program for this year, found here:
https://wiki.postgresql.org/wiki/GCI_2018

Thank you very much for your interest - we're thrilled to have you be a
part of GCI 2018!

Sarah

On Sat, Sep 22, 2018 at 12:39 PM Tahir Ramzan 
wrote:

> Honorable Concern,
>
> I want to join GCI as a mentor, please guide me about the procedure,
> thanks in anticipation.
>
> --
> Regards
> Tahir Ramzan
> MSCS Research Scholar
> Google Summer of Code 2015 (CiviCRM)
> Google Summer of Code 2016 (ModSecurity)
> Outside Collaborator of SpiderLabs (Powered by TrustWave)
> Google Android Students Club Facilitator and Organizer 2015
>
> Contact:
>
> +92-312-5518018
>
> tahirram...@alumni.vu.edu.pk
>
>
> More details about me and my work:
>
> GitHub Profile: https://github.com/tahirramzan
>
> LinkedIn Profile: https://pk.linkedin.com/in/tahirramzan
>


-- 
Sarah Conway Schnurr


auto_explain: Include JIT output if applicable

2018-09-24 Thread Lukas Fittl
Hi,

Whilst playing around with auto_explain and JIT today, I realized that
auto_explain currently doesn't output JIT information, which is rather
unfortunate when analyzing a larger set of queries in a semi-automated
manner.

Attached a trivial patch that fixes the issue and adds JIT information to
auto_explain with the same logic as used for regular EXPLAIN.

Thanks,
Lukas

-- 
Lukas Fittl


auto_explain-include-jit-output-v1.patch
Description: Binary data


Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru

2018-09-24 Thread Andres Freund
On 2018-09-24 18:25:46 +, Nasby, Jim wrote:
> 
> > On Sep 21, 2018, at 12:43 PM, Andres Freund  wrote:
> > 
> >> But as far as I can see it is possible to have an aggressive non-wraparound vacuum.
> >> One important difference - regular and aggressive regular can be canceled
> >> by a backend; wraparound autovacuum can not. (by checking 
> >> PROC_VACUUM_FOR_WRAPAROUND in src/backend/storage/lmgr/proc.c )
> > 
> > Yes, without checking the code, they should be different. Aggressive is
> > controlled by vacuum_freeze_table_age whereas anti-wrap is controlled by
> > autovacuum_freeze_max_age (but also implies aggressive).
> 
> Right, except that by the time you get into the vacuum code itself nothing 
> should really care about that difference. AFAICT, the only thing 
> is_wraparound is being used for is to set MyPgXact->vacuumFlags |= 
> PROC_VACUUM_FOR_WRAPAROUND, which prevents the deadlock detector from killing 
> an autovac process that’s trying to prevent a wraparound. I think it’d be 
> clearer to remove is_wraparound and move the check from vacuum_rel() into 
> lazy_vacuum_rel() (which is where the limits for HeapTupleSatisfiesVacuum get 
> determined). Something like the attached.

I'm very doubtful this is an improvement. Especially with the upcoming
pluggable storage work making vacuumlazy.c heap specific, while vacuum.c
stays generic.  The concept of something like
PROC_VACUUM_FOR_WRAPAROUND, should imo not be pushed down that much
(even if criteria for it might).

Greetings,

Andres Freund



Re: Query is over 2x slower with jit=on

2018-09-24 Thread Andres Freund
Hi,

On 2018-09-19 20:39:22 -0700, Andres Freund wrote:
> On 2018-09-19 23:26:52 -0400, Tom Lane wrote:
> > That's going in the right direction.  Personally I'd make the last line
> > more like
> >
> > Times: generation 0.680 ms, inlining 7.591 ms, optimization 20.522 ms, 
> > emission 14.607 ms, total 43.4 ms
>
> Yea, that's probably easier to read.

I'm wondering about upper-casing the individual times (and options) -
we're largely upper-casing properties, and for json/xml output each
would still be a property. Seems a tad bit more consistent.  I now have:

FORMAT text:
 JIT:
   Functions: 2
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.298 ms, Inlining 2.250 ms, Optimization 5.797 ms, Emission 5.246 ms, Total 13.591 ms

FORMAT xml:
 <JIT>
   <Functions>2</Functions>
   <Options>
     <Inlining>true</Inlining>
     <Optimization>true</Optimization>
     <Expressions>true</Expressions>
     <Deforming>true</Deforming>
   </Options>
   <Timing>
     <Generation>0.651</Generation>
     <Inlining>2.260</Inlining>
     <Optimization>14.752</Optimization>
     <Emission>7.764</Emission>
     <Total>25.427</Total>
   </Timing>
 </JIT>

FORMAT json:
 "JIT": {
   "Functions": 2,
   "Options": {
 "Inlining": true,
 "Optimization": true,
 "Expressions": true,
 "Deforming": true
   },
   "Timing": {
 "Generation": 0.238,
 "Inlining": 0.807,
 "Optimization": 4.661,
 "Emission": 4.236,
 "Total": 9.942
   }
 },

>
> > (total at the end seems more natural to me, YMMV).
>
> I kind of think doing it first is best, because that's usually the first
> thing one wants to know.
>
>
> > Also, the "options" format you suggest here seems a bit too biased
> > towards binary on/off options --- what happens when there's a
> > three-way option?  So maybe that line should be like
> >
> > Options: inlining on, optimization on
> >
> > though I'm less sure about that part.

Now that space is less of a concern, I added expressions, and deforming
as additional options - seems reasonable to have all PGJIT_* options
imo.

Btw, I chose true/false rather than on/off, to be consistent with
ExplainPropertyBool - but I've no strong feelings about it.

Greetings,

Andres Freund



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-09-24 Thread Peter Geoghegan
On Wed, Sep 19, 2018 at 11:23 AM Peter Geoghegan  wrote:
> 3 modes
> ---
>
> My new approach is to teach _bt_findsplitloc() 3 distinct modes of
> operation: Regular/default mode, many duplicates mode, and single
> value mode.

I think that I'll have to add a fourth mode, since I came up with
another strategy that is really effective though totally complementary
to the other 3 -- "multiple insertion point" mode. Credit goes to
Kevin Grittner for pointing out that this technique exists about 2
years ago [1]. The general idea is to pick a split point just after
the insertion point of the new item (the incoming tuple that prompted
a page split) when it looks like there are localized monotonically
increasing ranges.  This is like a rightmost 90:10 page split, except
the insertion point is not at the rightmost page on the level -- it's
rightmost within some local grouping of values.

This makes the two largest TPC-C indexes *much* smaller. Previously,
they were shrunk by a little over 5% by using the new generic
strategy, a win that now seems like small potatoes. With this new
mode, TPC-C's order_line primary key, which is the largest index of
all, is ~45% smaller following a standard initial bulk load at
scalefactor 50. It shrinks from 99,085 blocks (774.10 MiB) to 55,020
blocks (429.84 MiB). It's actually slightly smaller than it would be
after a fresh REINDEX with the new strategy. We see almost as big a
win with the second largest TPC-C index, the stock table's primary key
-- it's ~40% smaller.

Here is the definition of the biggest index, the order line primary key index:

pg@tpcc[3666]=# \d order_line_pkey
 Index "public.order_line_pkey"
  Column   │  Type   │ Key? │ Definition
───┼─┼──┼
 ol_w_id   │ integer │ yes  │ ol_w_id
 ol_d_id   │ integer │ yes  │ ol_d_id
 ol_o_id   │ integer │ yes  │ ol_o_id
 ol_number │ integer │ yes  │ ol_number
primary key, btree, for table "public.order_line"

The new strategy/mode works very well because we see monotonically
increasing inserts on ol_number (an order's item number), but those
are grouped by order. It's kind of an adversarial case for our
existing implementation, and yet it seems like it's probably a fairly
common scenario in the real world.

Obviously these are very significant improvements. They really exceed
my initial expectations for the patch. TPC-C is generally considered
to be by far the most influential database benchmark of all time, and
this is something that we need to pay more attention to. My sense is
that the TPC-C benchmark is deliberately designed to almost require
that the system under test have this "multiple insertion point" B-Tree
optimization, suffix truncation, etc. This is exactly the same index
that we've seen reports of out of control bloat on when people run
TPC-C over hours or days [2].

My next task is to find heuristics to make the new page split
mode/strategy kick in when it's likely to help, but not kick in when
it isn't (when we want something close to a generic 50:50 page split).
These heuristics should look similar to what I've already done to get
cases with lots of duplicates to behave sensibly. Anyone have any
ideas on how to do this? I might end up inferring a "multiple
insertion point" case from the fact that there are multiple
pass-by-value attributes for the index, with the new/incoming tuple
having distinct-to-immediate-left-tuple attribute values for the last
column, but not the first few. It also occurs to me to consider the
fragmentation of the page as a guide, though I'm less sure about that.
I'll probably need to experiment with a variety of datasets before I
settle on something that looks good. Forcing the new strategy without
considering any of this actually works surprisingly well on cases
where you'd think it wouldn't, since a 50:50 page split is already
something of a guess about where future insertions will end up.

[1] 
https://postgr.es/m/CACjxUsN5fV0kV=yirxwa0s7lqoojuy7soptipdhucemhgwo...@mail.gmail.com
[2] https://www.commandprompt.com/blog/postgres_autovacuum_bloat_tpc-c/
-- 
Peter Geoghegan



Re: Allowing printf("%m") only where it actually works

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 12, 2018 at 01:40:01PM -0400, Tom Lane wrote:
>> Rebase attached --- no substantive changes.

> -   if (handleDLL == NULL)
> -   ereport(FATAL,
> -   (errmsg_internal("could not load netmsg.dll: error
> -code %lu", GetLastError(;

> In 0001, this is replaced by a non-FATAL error for the backend, which
> does not seem like a good idea to me because the user loses visibility
> with this DLL which cannot be loaded.  I still have to see this error...

Well, we have to change the code somehow to make it usable in frontend
as well as backend.  And we can *not* have it do exit(1) in libpq.
So the solution I chose was to make it act the same as if FormatMessage
were to fail.  I don't find this behavior unreasonable: what is really
important is the original error code, not whether we were able to
pretty-print it.  I think the ereport(FATAL) coding is a pretty darn
bad idea even in the backend.

> Could you drop the configure checks for snprintf and vsnprintf in a
> separate patch?  The cleanup of %m and the removal of those switches
> should be treated separatly in my opinion.

Seems a bit make-worky, but here you go.  0001 is the same as before
(but rebased up to today, so some line numbers change).  0002
changes things so that we always use our snprintf, removing all the
configure logic associated with that.  0003 implements %m in snprintf.c
and adjusts our various printf-wrapper functions to ensure that they
pass errno through reliably.  0004 changes elog.c to rely on %m being
implemented below it.

regards, tom lane

diff --git a/configure b/configure
index 9b30402..afbc142 100755
*** a/configure
--- b/configure
*** esac
*** 15635,15653 
  
  fi
  
- ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror"
- if test "x$ac_cv_func_strerror" = xyes; then :
-   $as_echo "#define HAVE_STRERROR 1" >>confdefs.h
- 
- else
-   case " $LIBOBJS " in
-   *" strerror.$ac_objext "* ) ;;
-   *) LIBOBJS="$LIBOBJS strerror.$ac_objext"
-  ;;
- esac
- 
- fi
- 
  ac_fn_c_check_func "$LINENO" "strlcat" "ac_cv_func_strlcat"
  if test "x$ac_cv_func_strlcat" = xyes; then :
$as_echo "#define HAVE_STRLCAT 1" >>confdefs.h
--- 15635,15640 
diff --git a/configure.in b/configure.in
index 2e60a89..6973b77 100644
*** a/configure.in
--- b/configure.in
*** else
*** 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strerror strlcat strlcpy strnlen])
  
  case $host_os in
  
--- 1678,1684 
AC_CHECK_FUNCS([fpclass fp_class fp_class_d class], [break])
  fi
  
! AC_REPLACE_FUNCS([crypt dlopen fls getopt getrusage inet_aton mkdtemp random rint srandom strlcat strlcpy strnlen])
  
  case $host_os in
  
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index f4356fe..af35cfb 100644
*** a/src/backend/port/win32/socket.c
--- b/src/backend/port/win32/socket.c
*** pgwin32_select(int nfds, fd_set *readfds
*** 690,728 
  		memcpy(writefds, &outwritefds, sizeof(fd_set));
  	return nummatches;
  }
- 
- 
- /*
-  * Return win32 error string, since strerror can't
-  * handle winsock codes
-  */
- static char wserrbuf[256];
- const char *
- pgwin32_socket_strerror(int err)
- {
- 	static HANDLE handleDLL = INVALID_HANDLE_VALUE;
- 
- 	if (handleDLL == INVALID_HANDLE_VALUE)
- 	{
- 		handleDLL = LoadLibraryEx("netmsg.dll", NULL, DONT_RESOLVE_DLL_REFERENCES | LOAD_LIBRARY_AS_DATAFILE);
- 		if (handleDLL == NULL)
- 			ereport(FATAL,
- 	(errmsg_internal("could not load netmsg.dll: error code %lu", GetLastError())));
- 	}
- 
- 	ZeroMemory(&wserrbuf, sizeof(wserrbuf));
- 	if (FormatMessage(FORMAT_MESSAGE_IGNORE_INSERTS |
- 	  FORMAT_MESSAGE_FROM_SYSTEM |
- 	  FORMAT_MESSAGE_FROM_HMODULE,
- 	  handleDLL,
- 	  err,
- 	  MAKELANGID(LANG_ENGLISH, SUBLANG_DEFAULT),
- 	  wserrbuf,
- 	  sizeof(wserrbuf) - 1,
- 	  NULL) == 0)
- 	{
- 		/* Failed to get id */
- 		sprintf(wserrbuf, "unrecognized winsock error %d", err);
- 	}
- 	return wserrbuf;
- }
--- 690,692 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 16531f7..22e5d87 100644
*** a/src/backend/utils/error/elog.c
--- b/src/backend/utils/error/elog.c
*** static void send_message_to_server_log(E
*** 178,185 
  static void write_pipe_chunks(char *data, int len, int dest);
  static void send_message_to_frontend(ErrorData *edata);
  static char *expand_fmt_string(const char *fmt, ErrorData *edata);
- static const char *useful_strerror(int errnum);
- static const char *get_errno_symbol(int errnum);
  static const char *error_severity(int elevel);
  static void append_with_tabs(StringInfo buf, const char *str);
  static bool is_log_level_output(int elevel, int log_min_level);
--- 178,183 
*** expand_fmt_string(const char 

Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Andres Freund
On 2018-09-24 11:45:10 +0200, Chris Travers wrote:
> I did some more reading.
> 
> On Mon, Sep 24, 2018 at 10:15 AM Chris Travers 
> wrote:
> 
> > First, thanks for taking the time to write this.  Its very helpful.
> > Additional thoughts inline.
> >
> > On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> > wrote:
> >
> >>
> >> There could be value in refactoring things so as all the *Pending flags
> >> of miscadmin.h get stored into one single volatile sig_atomic_t which
> >> uses bit-wise markers, as that's at least 4 bytes because that's stored
> >> as an int for most platforms and can be performed as an atomic operation
> >> safely across signals (If my memory is right;) ).  And this leaves a lot
> >> of room for future flags.
> >>
> >
> > Yeah I will look into this.
> >
> 
> 
> Ok so having looked into this a bit more
> 
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11
> 
> What I would suggest instead at least for an initial approach is:
> 
> 1.  A struct of volatile bools statically stored
> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
> 
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply with
> C89 and pose no concurrency issues.

It's certainly *NOT* ok to use atomics in signal handlers
indiscriminately; on some hardware / configure option combinations
they're backed by spinlocks (or even semaphores) and thus *NOT*
interrupt safe.

This doesn't seem to solve an actual problem, why are we discussing
changing this? What'd be measurably improved, worth the cost of making
backpatching more painful?

Greetings,

Andres Freund



Re: Collation versioning

2018-09-24 Thread Douglas Doole
It's been a bunch of years since I worked with ICU, so anything I say below
may have changed in their code or be subject to mental bit rot.

On Sun, Sep 23, 2018 at 2:48 PM Thomas Munro 
wrote:

> Considering that to handle this we'd need to figure out
> how link libicu.so.55, libicu.so.56, ... etc into the same backend,
> and yet they presumably have the same collation names,


There's an option when compiling ICU to version-extend the API names. So,
instead of calling ucol_open(), you'd call ucol_open_55() or ucol_open_56(),
and then you can link to both libicu.so.55 and libicu.so.56 without getting
name collisions.

The side effect of this is that it's the application's responsibility to
figure out which version of ICU "en_US" should be routed to. In DB2, we
started the collation names with UCAxxx (later changed to CLDRxxx) to let
us distinguish which version of the API to call.

Admittedly that creates a whole can
> of worms for initdb-time catalog creation, package maintainers' jobs,
> how long old versions have to be supported and how you upgraded
> database objects to new ICU versions.


Yep. We never came up with a good answer for that before I left IBM. At the
time, DB2 only supported 2 or 3 versions of ICU, so they were all shipped as
part of the install bundle.

Long term, I think the only viable approach to supporting multiple versions
of ICU is runtime loading of the libraries. Then it's up to the system
administrator to make sure the necessary versions are installed on the
system.
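
To sketch what that could look like (purely hypothetical code, not something
DB2 or PostgreSQL does today): with renamed symbols, each ICU major version's
entry points could be resolved at runtime, along these lines:

#include <dlfcn.h>
#include <stdio.h>

/* ucol_open() really takes UErrorCode * and returns UCollator *; simplified */
typedef void *(*ucol_open_fn) (const char *loc, int *status);

int
main(void)
{
	/* assumed soname; the collation entry points live in ICU's i18n library */
	void	   *icu56 = dlopen("libicui18n.so.56", RTLD_NOW | RTLD_LOCAL);
	ucol_open_fn open56;
	int			status = 0;

	if (icu56 == NULL)
	{
		fprintf(stderr, "could not load ICU 56: %s\n", dlerror());
		return 1;
	}

	/* with renaming enabled, the exported symbol carries the major version */
	open56 = (ucol_open_fn) dlsym(icu56, "ucol_open_56");
	if (open56 == NULL)
	{
		fprintf(stderr, "could not find ucol_open_56: %s\n", dlerror());
		return 1;
	}

	/* open an en_US collator from the version-56 library specifically */
	(void) open56("en_US", &status);
	return 0;
}

The application (or database) still has to decide which version a given
collation name should be routed to, as described above.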

Yeah, it seems like ICU is *also* subject to minor changes that happen
> under your feet, much like libc.  For example maintenance release 60.2
> (you can't install that at the same time as 60.1, but you can install
> it at the same time as 59.2).  You'd be linked against libicu.so.60
> (and thence libicudata.so.60), and it gets upgraded in place when you
> run the local equivalent of apt-get upgrade.
>

This always worried me because an unexpected collation change is so painful
for a database. And I was never able to think of a way of reliably testing
compatibility either because of ICU's ability to reorder and group
characters when collating.


Re: Missing const in DSA.

2018-09-24 Thread Tom Lane
Thomas Munro  writes:
> On Mon, Sep 24, 2018 at 9:32 AM Tom Lane  wrote:
>> Mark G  writes:
>>> While looking at some of the recent churn in DSA I noticed that
>>> dsa_size_class_map should probably be declared const.

>> +1 ... also, given the contents of the array, "char" seems like
>> rather a misnomer.  I'd be happier if it were declared as uint8, say.

> +1

Are you planning to take care of this?

regards, tom lane



Re: Revoke execution permission of pg_stat_statements_reset() from pg_read_all_stats role

2018-09-24 Thread Tom Lane
Michael Paquier  writes:
> This should be back-patched.  Any opinions about bumping up this
> extension version in back-branches like what has been done in 53b79ab4?

Yes, you need to bump the extension version to change anything in the
extension's script file.

For v10 and up, the method used in 53b79ab4 is overcomplicated: you only
need to add a delta script not a new base script.  (If you had to
back-patch before v10, it might be best to add a new base script in all
the branches just to keep the patches consistent; but IIUC this issue only
arises in v10 and up.)  I'd consider following, eg, 7f563c09f as a
prototype instead.

regards, tom lane



Re: Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-24 Thread Alvaro Herrera
On 2018-Sep-20, Marco Slot wrote:

> We're seeing a segmentation fault when creating a partition of a
> partitioned table with a primary key when there is a sql_drop trigger on
> Postgres 11beta4.
> 
> We discovered it because the Citus extension creates a sql_drop trigger,
> but it's otherwise unrelated to the Citus extension:
> https://github.com/citusdata/citus/issues/2390

Thanks for the reproducer.  Will research.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Multiple primary key on partition table?

2018-09-24 Thread Alvaro Herrera
[Back from a week AFK ...]

On 2018-Sep-18, amul sul wrote:

> On Mon, Sep 17, 2018 at 9:06 PM amul sul  wrote:
> >
> > Nice catch Rajkumar.

> Here is the complete patch proposes the aforesaid fix with regression test.

Looks closely related to the one that was fixed in 1f8a3327a9db, but of
course it's a different code path.  Will review this shortly, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2018-09-24 Thread Michael Paquier
On Mon, Sep 24, 2018 at 12:19:44PM +1000, Haribabu Kommi wrote:
> Attached new rebased version of the patch that enhances the
> pg_stat_statements_reset()
> function. This needs to be applied on top of the patch that is posted in
> [1].

+CREATE ROLE stats_regress_user1;
+CREATE ROLE stats_regress_user2;
Just a short note: regression tests creating roles should use regress_
as prefix.
--
Michael


signature.asc
Description: PGP signature


Re: [patch]overallocate memory for curly braces in array_out

2018-09-24 Thread Tom Lane
Keiichi Hirobe  writes:
> Attached is a patch that fixes a bug
> that miscounts the total number of curly braces in the output string in array_out.

Wow, good catch!

Testing this, I found there's a second way in which the space calculation
is off: it always allocated one more byte than required, as a result of
counting one more comma than is really required.  That's not nearly as
significant as the curly-brace miscount, but it still got in the way of
doing this:

*** 1234,1239 
--- 1243,1251 
  #undef APPENDSTR
  #undef APPENDCHAR
  
+   /* Assert that we calculated the string length accurately */
+   Assert(overall_length == (p - retval + 1));
+ 
pfree(values);
pfree(needquotes);
  

which seemed to me like a good idea now that we know this code isn't
so perfect as all that.

Will push shortly.

regards, tom lane



Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:09 AM, Joe Conway wrote:
> On 09/24/2018 10:01 AM, Tom Lane wrote:
>> Joe Conway  writes:
>>> Having seen none, committed/pushed. This did not seem worth
>>> back-patching, so I only pushed to master.
>> 
>> I don't see anything on gitmaster?
> 
> Hmm, yes, interesting -- I must have messed up my local git repo somehow.
> Will try again.

This time it seems to have worked. Sorry for the noise earlier :-/

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/24/2018 10:01 AM, Tom Lane wrote:
> Joe Conway  writes:
>> Having seen none, committed/pushed. This did not seem worth
>> back-patching, so I only pushed to master.
> 
> I don't see anything on gitmaster?

Hmm, yes, interesting -- I must have messed up my local git repo somehow.
Will try again.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Implementing SQL ASSERTION

2018-09-24 Thread Peter Eisentraut
On 29/04/2018 20:18, Joe Wildish wrote:
> On 28 Mar 2018, at 16:13, David Fetter  wrote:
>>
>> Sorry to bother you again, but this now doesn't compile atop master.
> 
> Attached is a rebased patch for the prototype.

I took a look at this.

This has been lying around for a few months, so it will need to be
rebased again.  I applied this patch on top of
68e7e973d22274a089ce95200b3782f514f6d2f8, which was the HEAD around the
time this patch was created, and it applies cleanly there.

Please check your patch for whitespace errors:

warning: squelched 13 whitespace errors
warning: 18 lines add whitespace errors.

Also, reduce the amount of useless whitespace changes in the patch.

There are some compiler warnings:

constraint.c: In function 'CreateAssertion':
constraint.c:1211:2: error: ISO C90 forbids mixed declarations and code
[-Werror=declaration-after-statement]

constraint.c: In function 'oppositeDmlOp':
constraint.c:458:1: error: control reaches end of non-void function
[-Werror=return-type]

The version check in psql's describeAssertions() needs to be updated.
Also, you should use formatPGVersionNumber() to cope with two-part and
one-part version numbers.

All this new code in constraint.c that checks the assertion expression
needs more comments and documentation.

Stuff like this isn't going to work:

static int
funcMaskForFuncOid(Oid funcOid)
{
char *name = get_func_name(funcOid);

if (name == NULL)
return OTHER_FUNC;
else if (strncmp(name, "min", strlen("min")) == 0)
return MIN_AGG_FUNC;
else if (strncmp(name, "max", strlen("max")) == 0)
return MAX_AGG_FUNC;

You can't assume from the name of a function what it's going to do.
Solving this properly might be hard.

The regression test crashes for me around

frame #4: 0x00010d3a4cdc postgres`castNodeImpl(type=T_SubLink,
ptr=0x7ff27006d230) at nodes.h:582
frame #5: 0x00010d3a61c6
postgres`visitSubLink(node=0x7ff270034040, info=0x7ffee2a23930)
at constraint.c:843

This ought to be reproducible for you if you build with assertions.


My feeling is that if we want to move forward on this topic, we need to
solve the concurrency question first.  All these optimizations for when
we don't need to check the assertion are cool, but they are just
optimizations that we can apply later on, once we have solved the
critical problems.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal for disk quota feature

2018-09-24 Thread Hubert Zhang
>
> The quotas or object limits, resource limits are pretty useful and
> necessary, but I don't see these like new type of objects, it is much more
> some property of current objects. Because we have one syntax for this
> purpose I prefer it. Because is not good to have two syntaxes for similar
> purpose.

SCHEMA and TABLE are OK for me, but as I mentioned before, ROLE is a
special case when using ALTER SET at this moment.
TABLE and SCHEMA are both database-level; e.g. pg_class and pg_namespace
both reside in one database. But ROLE is cluster-level: roles don't
belong to a database. At first glance, ALTER ROLE XXX SET disk_quota = xxx
means setting the quota for the user on all databases. But in our
first-stage design, a ROLE's quota is bound to a specific database. E.g. role
Jack could have a 10GB quota on database A and a 2GB quota on database B.

The SQL syntax is not hard to change, and I don't think it should block the
main design of the disk quota feature. Are there any comments on the design
and architecture? If not, we'll submit our patch first and continue the
discussion from there.

On Sat, Sep 22, 2018 at 3:03 PM Pavel Stehule 
wrote:

>
>
> so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang  napsal:
>
>> But it looks like redundant to current GUC configuration and limits
>>
>> what do you mean by current GUC configuration? Is that the general block
>> number limit in your patch? If yes, the difference between GUC and
>> pg_diskquota catalog is that pg_diskquota will store different quota limit
>> for the different role, schema or table instead of a single GUC value.
>>
>
> storage is not relevant in this moment.
>
> I don't see to consistent to sets some limits via SET command, or ALTER X
> SET, and some other with CREATE QUOTA ON.
>
> The quotas or object limits, resource limits are pretty useful and
> necessary, but I don't see these like new type of objects, it is much more
> some property of current objects. Because we have one syntax for this
> purpose I prefer it. Because is not good to have two syntaxes for similar
> purpose.
>
> So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer
>
> ALTER SCHEMA xxx SET disc_quota = xxx;
>
> The functionality is +/- same. But ALTER XX SET was introduce first, and I
> don't feel comfortable to have any new syntax for similar purpose
>
> Regards
>
> Pavel
>
>
>
>
>
>>
>> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule 
>> wrote:
>>
>>>
>>>
>>> pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang 
>>> napsal:
>>>
 just fast reaction - why QUOTA object?
> Isn't ALTER SET enough?
> Some like
> ALTER TABLE a1 SET quote = 1MB;
> ALTER USER ...
> ALTER SCHEMA ..
> New DDL commans looks like too hard hammer .


 It's an option. Prefer to consider quota setting store together:
 CREATE DISK QUOTA way is more nature to store quota setting in a
 separate pg_diskquota catalog
 While ALTER SET way is more close to store quota setting in pg_class,
 pg_role, pg_namespace. etc in an integrated way.
 (Note that here I mean nature/close is not must, ALTER SET could also
 store in pg_diskquota and vice versa.)

>>>
>>> I have not a problem with new special table for storing this
>>> information. But it looks like redundant to current GUC configuration and
>>> limits. Can be messy do some work with ALTER ROLE, and some work via CREATE
>>> QUOTE.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
 Here are some differences I can think of:
 1. pg_role is a global catalog, not per-database. It's harder to
 track the user's disk usage across the whole cluster (considering 1000+
 databases).  So the semantics of CREATE DISK QUOTA ON USER are limited: it
 only tracks the user's disk usage inside the current database.
 2. Using a separate pg_diskquota could add more fields beyond the quota
 limit without adding too many fields to pg_class, e.g. a red zone to give the
 user a warning, or the current disk usage of the db objects.

 On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule 
 wrote:

>
>
> pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang 
> napsal:
>
>>
>>
>>
>>
>> *Hi all,We redesign disk quota feature based on the comments from
>> Pavel Stehule and Chapman Flack. Here are the new 
>> design.OverviewBasically,
>>  disk quota feature is used to support multi-tenancy environment, 
>> different
>> level of database objects could be set a quota limit to avoid over use of
>> disk space. A common case could be as follows: DBA could enable disk 
>> quota
>> on a specified database list. DBA could set disk quota limit for
>> tables/schemas/roles in these databases. Separate disk quota worker 
>> process
>> will monitor the disk usage for these objects and detect the objects 
>> which
>> exceed their quota limit. Queries loading data into these “out of disk
>> quota” tables/schemas/roles will be cancelled.We are 

Re: doc - add missing documentation for "acldefault"

2018-09-24 Thread Joe Conway
On 09/21/2018 01:51 PM, Joe Conway wrote:
> On 09/19/2018 11:18 AM, Joe Conway wrote:
>> On 09/19/2018 10:54 AM, Tom Lane wrote:
>>> So maybe what we really need is a table of operators not functions.
>> 
>> Good idea -- I will take a look at that.
>> 
>>> However, I don't object to documenting any function that has its
>>> own pg_description string.
> 
> Ok, so the attached version refactors/splits the group into two tables
> -- operators and functions.



> I also included John Naylor's patch with some minor editorialization.
> 
> Any further comments or complaints?

Having seen none, committed/pushed. This did not seem worth
back-patching, so I only pushed to master.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
Very minor revision
On Mon, Sep 24, 2018 at 11:45 AM Chris Travers 
wrote:

>
> Ok so having looked into this a bit more
>
> It looks like to be strictly conforming, you can't just use a series of
> flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
> round-trip safe in signal handlers.  In other words, if you read, are
> pre-empted by another signal, and then write, you may clobber what the
> other signal handler did to the variable.  So you need atomics, which are
> C11
>
> What I would suggest instead at least for an initial approach is:
>
> 1.  A struct of volatile bools statically stored
>

These would be implemented as sig_atomic_t which is defined in C89 but has
no atomic operators other than writing the full value.


> 2.  macros for accessing/setting/clearing flags
> 3.  Consistent use of these macros throughout the codebase.
>
> To make your solution work it looks like we'd need C11 atomics which would
> be nice and maybe at some point we decide to allow newer feature, or we
> could wrap this itself in checks for C11 features and provide atomic flags
> in the future.  It seems that the above solution would strictly comply with
> C89 and pose no concurrency issues.
>
> --
>> Best Regards,
>> Chris Travers
>> Head of Database
>>
>> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
>> Saarbrücker Straße 37a, 10405 Berlin
>>
>>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Make deparsing of column defaults faster

2018-09-24 Thread Peter Eisentraut
On 30/07/2018 13:51, Jeff Janes wrote:
> Any thoughts on how to proceed here?  It seems there is more work to do
> to cover all the issues with dumping and restoring tables with many
> columns.  Since the original report was in the context of pg_upgrade, we
> should surely address at least the pg_restore slowness.
> 
> I'll work on solving the problem using a hash table at the lowest
> level (making column names unique), for a future commit fest.  That
> should drop it from N^3 to N^2, which since N can't go above 1600 should
> be good enough.
> 
> So we can set this to rejected, as that will be an entirely different
> approach.  
> 
> Your caching patch might be worthwhile on its own, though.

I'm going to set this thread as returned with feedback until we have a
more complete solution.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Hint to set owner for tablespace directory

2018-09-24 Thread Peter Eisentraut
On 11/09/2018 17:10, Peter Eisentraut wrote:
> On 07/09/2018 17:59, Maksim Milyutin wrote:
>> those directories was that user). The error message "could not set 
>> permissions on directory ..." disoriented that user. The need to change 
>> the owner of those directories came after careful reading of 
>> documentation. I think it would be helpful to show the proposed hint to 
>> more operatively resolve the problem.
> 
> I think it might be worth clarifying the documentation instead.  I'm
> looking at the CREATE TABLESPACE reference page and it's not super clear
> on first reading.

How about the attached patch?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From eb0796921b6ebfe2375037d903a081a7b8f45c0b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 24 Sep 2018 14:47:09 +0200
Subject: [PATCH] doc: Clarify CREATE TABLESPACE documentation

Be more specific about when and how to create the directory and what
permissions it should have.
---
 doc/src/sgml/ref/create_tablespace.sgml | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/create_tablespace.sgml 
b/doc/src/sgml/ref/create_tablespace.sgml
index 18fa5f0ebf..c621ec2c6b 100644
--- a/doc/src/sgml/ref/create_tablespace.sgml
+++ b/doc/src/sgml/ref/create_tablespace.sgml
@@ -92,7 +92,8 @@ Parameters
   

 The directory that will be used for the tablespace. The directory
-should be empty and must be owned by the
+must exist (CREATE TABLESPACE will not create it),
+should be empty, and must be owned by the
 PostgreSQL system user.  The directory must 
be
 specified by an absolute path name.

@@ -137,15 +138,23 @@ Notes
   Examples
 
   
-   Create a tablespace dbspace at 
/data/dbs:
+   To create a tablespace dbspace at file system location
+   /data/dbs, first create the directory using operating
+   system facilities and set the correct ownership:
+
+mkdir /data/dbs
+chown postgres:postgres /data/dbs
+
+   Then issue the tablespace creation command inside
+   PostgreSQL:
 
 CREATE TABLESPACE dbspace LOCATION '/data/dbs';
 
   
 
   
-   Create a tablespace indexspace at 
/data/indexes
-   owned by user genevieve:
+   To create a tablespace owned by a different database user, use a command
+   like this:
 
 CREATE TABLESPACE indexspace OWNER genevieve LOCATION '/data/indexes';
 
-- 
2.19.0



Re: Segfault when creating partition with a primary key and sql_drop trigger exists

2018-09-24 Thread Justin Pryzby
On Thu, Sep 20, 2018 at 12:00:18PM +0200, Marco Slot wrote:
> We're seeing a segmentation fault when creating a partition of a
> partitioned table with a primary key when there is a sql_drop trigger on
> Postgres 11beta4.

Thanks for reporting; I reproduced this easily, so I added it to the open items
list, since indices on partitioned tables are a feature new in PG11.

Core was generated by `postgres: pryzbyj ts [local] CREATE TABLE '.
Program terminated with signal 11, Segmentation fault.
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
1745event_trigger.c: No such file or directory.
in event_trigger.c

(gdb) bt
#0  0x0059d186 in EventTriggerAlterTableRelid (objectId=40108800) at 
event_trigger.c:1745
#1  0x005dfbd3 in AlterTableInternal (relid=40108800, cmds=0x21c39a8, 
recurse=true) at tablecmds.c:3328
#2  0x005b5b7b in DefineIndex (relationId=40108800, stmt=0x21a7350, 
indexRelationId=0, parentIndexId=40084714, parentConstraintId=40084715,
is_alter_table=false, check_rights=false, check_not_in_use=false, 
skip_build=false, quiet=false) at indexcmds.c:669
#3  0x005dcfee in DefineRelation (stmt=0x2116690, relkind=114 'r', 
ownerId=17609, typaddress=0x0,
queryString=0x20f1e60 "CREATE TABLE collections_list_1\nPARTITION OF 
collections_list (key, ts, collection_id, value)\nFOR VALUES IN (1);") at 
tablecmds.c:946
[...]

Justin



Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
I did some more reading.

On Mon, Sep 24, 2018 at 10:15 AM Chris Travers 
wrote:

> First, thanks for taking the time to write this.  Its very helpful.
> Additional thoughts inline.
>
> On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier 
> wrote:
>
>>
>> There could be value in refactoring things so as all the *Pending flags
>> of miscadmin.h get stored into one single volatile sig_atomic_t which
>> uses bit-wise markers, as that's at least 4 bytes because that's stored
>> as an int for most platforms and can be performed as an atomic operation
>> safely across signals (If my memory is right;) ).  And this leaves a lot
>> of room for future flags.
>>
>
> Yeah I will look into this.
>


Ok so having looked into this a bit more

It looks like to be strictly conforming, you can't just use a series of
flags because neither C89 nor C99 guarantees that sig_atomic_t is read/write
round-trip safe in signal handlers.  In other words, if you read, are
pre-empted by another signal, and then write, you may clobber what the
other signal handler did to the variable.  So you need atomics, which are
C11

What I would suggest instead at least for an initial approach is:

1.  A struct of volatile bools statically stored
2.  macros for accessing/setting/clearing flags
3.  Consistent use of these macros throughout the codebase.

To make your solution work it looks like we'd need C11 atomics which would
be nice and maybe at some point we decide to allow newer feature, or we
could wrap this itself in checks for C11 features and provide atomic flags
in the future.  It seems that the above solution would strictly comply with
C89 and pose no concurrency issues.
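
Roughly, I'm picturing something along these lines (a sketch only, with
made-up names rather than the actual miscadmin.h flags):

#include <signal.h>

/*
 * One sig_atomic_t per flag: a signal handler only ever stores a whole
 * value, so no read-modify-write is needed from signal context.
 */
typedef struct PendingSignalFlags
{
	volatile sig_atomic_t interrupt_pending;
	volatile sig_atomic_t query_cancel_pending;
	volatile sig_atomic_t proc_die_pending;
} PendingSignalFlags;

static PendingSignalFlags pending_signals;

#define PENDING_SIGNAL_SET(flag)	(pending_signals.flag = 1)
#define PENDING_SIGNAL_CLEAR(flag)	(pending_signals.flag = 0)
#define PENDING_SIGNAL_IS_SET(flag)	(pending_signals.flag != 0)

A handler would then do PENDING_SIGNAL_SET(query_cancel_pending), and the
main loop would test PENDING_SIGNAL_IS_SET(query_cancel_pending) in its
CHECK_FOR_INTERRUPTS()-style checks.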

-- 
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Pluggable Storage - Andres's take

2018-09-24 Thread Alexander Korotkov
On Mon, Sep 24, 2018 at 8:04 AM Haribabu Kommi  wrote:
> On Mon, Sep 24, 2018 at 5:02 AM Alexander Korotkov 
>  wrote:
>>
>> On Fri, Aug 24, 2018 at 5:50 AM Andres Freund  wrote:
>> > I've pushed a current version of that to my git tree to the
>> > pluggable-storage branch. It's not really a version that I think makese
>> > sense to review or such, but it's probably more useful if you work based
>> > on that.  There's also the pluggable-zheap branch, which I found
>> > extremely useful to develop against.
>>
>> BTW, I'm going to take a look at current shape of this patch and share
>> my thoughts.  But where are the branches you're referring?  On your
>> postgres.org git repository pluggable-storage brach was updates last
>> time at June 7.  And on the github branches are updated at August 5
>> and 14, and that is still much older than your email (August 24)...
>
>
> The code is the latest, but the commit time is older; I feel that is because
> of a commit squash.
>
> pluggable-storage is the branch where the pluggable storage code is present
> and pluggable-zheap branch where zheap is rebased on top of pluggable
> storage.

Got it, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Global snapshots

2018-09-24 Thread Andrey Borodin
Hi!

I want to review this patch set, though I understand that it will probably be
quite a long process.

I like the idea that with this patch set all Postgres instances are universally
bound into a single distributed DB, even if they have never heard about each
other before :) This is just amazing. Or am I getting something wrong?

I've got a few questions:
1. If we coordinate HA clusters with replicas, can replicas participate if
their part of the transaction is read-only?
2. How do InDoubt transactions behave when we add or subtract leap seconds?

Also, I could not understand some notes from Arseny:

> 25 июля 2018 г., в 16:35, Arseny Sher  написал(а):
> 
> * One drawback of these patches is that only REPEATABLE READ is
>   supported. For READ COMMITTED, we must export every new snapshot
>   generated on coordinator to all nodes, which is fairly easy to
>   do. SERIALIZABLE will definitely require chattering between nodes,
>   but that's much less demanded isolevel (e.g. we still don't support
>   it on replicas).

If all shards are executing transactions in SERIALIZABLE, what anomalies does it
permit?

If you have transactions on server A and server B, there are transactions 1 and 
2, transaction A1 is serialized before A2, but B1 is after B2, right?

Maybe we can somehow abort 1 or 2? 

> 
> * Another somewhat serious issue is that there is a risk of recency
>   guarantee violation. If client starts transaction at node with
>   lagging clocks, its snapshot might not include some recently
>   committed transactions; if client works with different nodes, she
>   might not even see her own changes. CockroachDB describes at [1] how
>   they and Google Spanner overcome this problem. In short, both set
>   hard limit on maximum allowed clock skew.  Spanner uses atomic
>   clocks, so this skew is small and they just wait it at the end of
>   each transaction before acknowledging the client. In CockroachDB, if
>   tuple is not visible but we are unsure whether it is truly invisible
>   or it's just the skew (the difference between snapshot and tuple's
>   csn is less than the skew), transaction is restarted with advanced
>   snapshot. This process is not infinite because the upper border
>   (initial snapshot + max skew) stays the same; this is correct as we
>   just want to ensure that our xact sees all the committed ones before
>   it started. We can implement the same thing.
I think that this situation is also covered in Clock-SI since transactions will 
not exit the InDoubt state before we can see them. But I'm not sure; chances are
I'm getting something wrong, so I'll think more about it. I'd be happy to hear 
comments from Stas about this.
> 
> 
> * 003_bank_shared.pl test is removed. In current shape (loading one
>   node) it is useless, and if we bombard both nodes, deadlock surely
>   appears. In general, global snapshots are not needed for such
>   multimaster-like setup -- either there are no conflicts and we are
>   fine, or there is a conflict, in which case we get a deadlock.
Can we do something about this deadlock? Would placing an upper limit on the 
time spent in the InDoubt state fix the issue? I understand that aborting 
automatically is kind of dangerous...

Also, a hanging 2PC transaction can currently cause a lot of headaches for 
the DBA. Can we have some kind of protection for the case where one node is 
gone permanently during a transaction?

Thanks!

Best regards, Andrey Borodin.


Re: Proposal for Signal Detection Refactoring

2018-09-24 Thread Chris Travers
First, thanks for taking the time to write this.  It's very helpful.
Additional thoughts inline.

On Mon, Sep 24, 2018 at 2:12 AM Michael Paquier  wrote:

> On Fri, Sep 21, 2018 at 12:35:46PM +0200, Chris Travers wrote:
> > I understand how lock levels don't fit a simple hierarchy, but at least
> > when it comes to what is going to be aborted on a signal, I am having
> > trouble understanding the problem here.
>
> It may be possible to come up with a clear hierarchy with the current
> interruption types in place.  Still, I am not sure that the definition
> you put behind it is completely correct, and I think we also need to
> question the value of putting such restrictions on future interruption
> types, because they would need to fit into it.


The future-safety issue is a really good one, and it's one reason I kept the
infinite-loop patch as semantically consistent with the API as I could, at
the cost of some complexity.

I have another area in mind where I think a refactoring patch would be more
valuable anyway.


>   That's quite
> a heavy constraint to live with.  There is such logic with wal_level for
> example, which is something I am not completely happy with either...
> But this one is a story for another time, and another thread.
>

From a cleanup perspective, a concentric-circles approach (which would
correspond to a hierarchy of interrupts) seems correct to me, but I can see
that assuming all pending interrupts are checked solely for cleanup reasons
might be a bad assumption on my part.

>
> Regarding your patch, it seems to me that it does not improve
> readability, as I mentioned up-thread, because you lose sight of what
> can be interrupted in a given code path, which is something the current
> code actually shows nicely.
>

So I guess there are two fundamental questions here.

1.  Do we want to move away from directly checking global flags like this?
I think we do, because the lack of encapsulation can make future changes
harder and more complex.  But I don't see a point in putting effort into
that without consensus.

>
> There could be value in refactoring things so that all the *Pending flags
> of miscadmin.h get stored in one single volatile sig_atomic_t using
> bit-wise markers.  That's at least 4 bytes, as it's stored as an int on
> most platforms, and updates can be performed atomically and safely across
> signals (if my memory is right ;) ).  And this leaves a lot of room for
> future flags.
>

Yeah, I will look into this.
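
As a quick sketch of how I read that suggestion (the flag names below are 
made up for illustration, not the real miscadmin.h symbols):

#include <signal.h>

#define PENDING_PROC_DIE		(1 << 0)
#define PENDING_QUERY_CANCEL	(1 << 1)
#define PENDING_RELOAD			(1 << 2)
/* ... room for many more future flags in the same word ... */

static volatile sig_atomic_t pendingSignalFlags = 0;

/* signal handler side: set the bit corresponding to the signal */
static void
statement_cancel_handler(int postgres_signal_arg)
{
	pendingSignalFlags |= PENDING_QUERY_CANCEL;
}

/* CHECK_FOR_INTERRUPTS() side: test and clear individual bits */
static void
process_pending_signals(void)
{
	if (pendingSignalFlags & PENDING_PROC_DIE)
	{
		pendingSignalFlags &= ~PENDING_PROC_DIE;
		/* ... die processing ... */
	}
	if (pendingSignalFlags & PENDING_QUERY_CANCEL)
	{
		pendingSignalFlags &= ~PENDING_QUERY_CANCEL;
		/* ... cancel the current statement ... */
	}
}

One thing I would want to verify first is the atomicity assumption: the |= 
and &= above are read-modify-write operations, so whether they are really 
safe against a handler firing in between the read and the write would need 
checking (or the word would need explicit atomics) before relying on it.
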

Thanks again for taking the time to go over the concerns in detail.  It
really helps.

Best Wishes,
Chris Travers

> --
> Michael
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Something fishy happening on frogmouth

2018-09-24 Thread Noah Misch
On Wed, Sep 19, 2018 at 01:03:44PM +0900, Kyotaro HORIGUCHI wrote:
> Thank you for finding and fixing this.
> 
> At Sat, 15 Sep 2018 18:21:52 -0400, Tom Lane  wrote in 
> <1.1537050...@sss.pgh.pa.us>
> > Noah Misch  writes:
> > > Usually, the first srandom() call happens early in PostmasterMain().  I 
> > > plan
> > > to add one to InitStandaloneProcess(), which substitutes for several tasks
> > > otherwise done in PostmasterMain().  That seems like a good thing even if 
> > > DSM
> > > weren't in the picture.  Also, initdb needs an srandom() somewhere;
> > > choose_dsm_implementation() itself seems fine.  Attached.
> > 
> > +1, but some comments would be good.

> +1, too.

Thanks for reviewing.  I pushed with some comments.
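
As context for the fix being discussed: the idea is simply to add an 
srandom() call in the paths that previously used random() without seeding it 
(initdb's choose_dsm_implementation() and InitStandaloneProcess()).  Below is 
a minimal sketch of the kind of call involved, purely for illustration; the 
exact placement and seed expression in the committed patch may differ:

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical illustration only: seed random() once, early, so that later
 * callers such as choose_dsm_implementation() do not all draw from the same
 * unseeded sequence.
 */
static void
seed_process_random(void)
{
	srandom((unsigned int) (getpid() ^ time(NULL)));
}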