Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-10-27 Thread Claudio Freire
On Wed, Oct 26, 2016 at 9:33 PM, Peter Geoghegan  wrote:
> Besides, parallel CREATE INDEX alone will probably
> be quite effective at speeding up CLUSTER in practice, simply because
> that's often where most wall clock time is spent during a CLUSTER
> operation.

Creating all indexes in parallel could also help considerably, for the
case when there are multiple indexes, and seems far simpler.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-10-26 Thread Peter Geoghegan
On Wed, Aug 17, 2016 at 4:12 PM, Peter Geoghegan  wrote:
> During preliminary analysis of what it would take to produce a
> parallel CLUSTER patch that is analogous of what I came up with for
> CREATE INDEX, which in general seems quite possible, I identified
> reform_and_rewrite_tuple() as a major bottleneck for the current
> CLUSTER implementation.

I now think my original analysis here was more or less wrong.

With today's git tip, which has numerous optimizations to merging in
particular [1], CLUSTER seems like an unappealing target for
parallelism. As noted recently [1], it seems like there has been a
significant decrease in the time spent merging by serial external
CREATE INDEX operations; as a proportion of total time, it's now often
as low as 15%. Things are generally moderately CPU bound. The
situation with CLUSTER is quite different; CLUSTER can spend up to 50%
of the time or more merging, because it's distinctly I/O bound,
presumably due to typically processing relatively wide tuples, with a
tuple header, etc. Furthermore, even if it is possible to push some of
the cycles into worker processes, that seems hard.

I think that Heikki's preload tuplesort work will help serial CLUSTER
cases quite a bit, by allowing tuplesort to buffer more tuples in
memory, but the overall picture is the same, which is that merging for
CLUSTER is very I/O bound (this is when we write out the new version
of the heap relation, of course). I'm not excited about the prospect
of using parallelism to speed up scanning and sorting to generate
input runs in the style of my parallel CREATE INDEX patch, because
Amdahl's law is going to bite hard when only 50% of the work can be
parallelized [2]. Besides, parallel CREATE INDEX alone will probably
be quite effective at speeding up CLUSTER in practice, simply because
that's often where most wall clock time is spent during a CLUSTER
operation.

The only thing that generalizing my parallel CREATE INDEX approach to
CLUSTER really has to recommend it is that it isn't that hard to get
working, and will still help somewhat. But, once you factor in the
need for us to alter the CLUSTER cost model (we'll need to make
significant updates to plan_cluster_use_sort()), then even that isn't
true. I don't think I'm going to work on parallel CLUSTER after all.

[1] 
https://www.postgresql.org/message-id/cam3swzssvig8eztxcg1l9bzzw+-uwdj48yxsm41qmnltjeq...@mail.gmail.com
[2] https://en.wikipedia.org/wiki/File:AmdahlsLaw.svg
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-19 Thread Amit Kapila
On Thu, Aug 18, 2016 at 4:42 AM, Peter Geoghegan  wrote:
>
> Does anyone have any ideas on how to:
>
> 1). Directly address the reform_and_rewrite_tuple() bottleneck.
>
> and/or:
>
> 2). Push down some or all of the reform_and_rewrite_tuple() work till
> before tuples are passed to the tuplesort.
>
> "2" would probably make it straightforward to have
> reform_and_rewrite_tuple() work occur in parallel workers instead,
> which buys us a lot.
>

I am not aware what exactly you have in mind to parallelize Cluster
command, but anything that is CPU intensive is probably a good bet to
push down to workers (provided it is safe).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Andres Freund
On 2016-08-17 17:35:32 -0700, Peter Geoghegan wrote:
> On Wed, Aug 17, 2016 at 5:20 PM, Andres Freund  wrote:
> > libc isn't compiled with -fno-omit-frame-pointer (and even if, it uses
> > assembly without setup of the frame pointer), so frame pointer based
> > call graphs are wrong through libc.  The attributions are based on
> > random stuff in the frame pointer at that point.  You either need to use
> > dwarf or lbr to get accurate ones.
> 
> Is it worth doing that here, and redoing the test, so that the glibc
> attributions are correct?

I'd say yes.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Peter Geoghegan
On Wed, Aug 17, 2016 at 5:20 PM, Andres Freund  wrote:
> libc isn't compiled with -fno-omit-frame-pointer (and even if, it uses
> assembly without setup of the frame pointer), so frame pointer based
> call graphs are wrong through libc.  The attributions are based on
> random stuff in the frame pointer at that point.  You either need to use
> dwarf or lbr to get accurate ones.

Is it worth doing that here, and redoing the test, so that the glibc
attributions are correct?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Andres Freund
On 2016-08-17 21:17:56 -0300, Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> 
> > This doesn't seem that interesting, but not sure what you're looking for.
> > 
> > I also attach cycles flamegraph.
> 
> I may be blind, but what are those write() calls attributed to
> heap_form_tuple?

libc isn't compiled with -fno-omit-frame-pointer (and even if, it uses
assembly without setup of the frame pointer), so frame pointer based
call graphs are wrong through libc.  The attributions are based on
random stuff in the frame pointer at that point.  You either need to use
dwarf or lbr to get accurate ones.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Alvaro Herrera
Peter Geoghegan wrote:

> This doesn't seem that interesting, but not sure what you're looking for.
> 
> I also attach cycles flamegraph.

I may be blind, but what are those write() calls attributed to
heap_form_tuple?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Andres Freund
On 2016-08-17 16:58:49 -0700, Peter Geoghegan wrote:
> On Wed, Aug 17, 2016 at 4:28 PM, Andres Freund  wrote:
> > Could you also provide a strace -ttt -T -c and a cpu cycles flamegraph?
> 
> Here is the output from that strace invocation, plus a -p (to attach
> to the relevant backend):
> 
> strace: -t has no effect with -c
> strace: -T has no effect with -c

Ugh, swapped -c and -C, sorry.


> This doesn't seem that interesting, but not sure what you're looking for.

I'd like to know the amount of time spent in various syscalls, and how
much that varies.


> I also attach cycles flamegraph.

Thanks.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Peter Geoghegan
On Wed, Aug 17, 2016 at 4:28 PM, Andres Freund  wrote:
> Could you also provide a strace -ttt -T -c and a cpu cycles flamegraph?

Here is the output from that strace invocation, plus a -p (to attach
to the relevant backend):

strace: -t has no effect with -c
strace: -T has no effect with -c
strace: Process 27986 attached
strace: Process 27986 detached
% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 55.750.629331   179813516 unlink
 17.490.197422   0   2505449   write
 11.690.132000   1100012   fsync
  8.130.091799   0   2078837   read
  5.320.06   12000 5   ftruncate
  0.980.011011  24   460   brk
  0.640.0072181805 4   munmap
  0.000.50   0  6382   lseek
  0.000.00   058 5 open
  0.000.00   058   close
  0.000.00   014   stat
  0.000.00   0 4   mmap
  0.000.00   0 2   rt_sigprocmask
  0.000.00   012 6 rt_sigreturn
  0.000.00   0 1   select
  0.000.00   016   sendto
  0.000.00   0 2 1 recvfrom
  0.000.00   016   kill
  0.000.00   019   semop
  0.000.00   063   getrusage
  0.000.00   0 5   epoll_create
  0.000.00   0 9 4 epoll_wait
  0.000.00   010   epoll_ctl
-- --- --- - - 
100.001.128831   459147332 total

This doesn't seem that interesting, but not sure what you're looking for.

I also attach cycles flamegraph.

trace_sort indicated that the tuplesort CLUSTER takes just under 3
minutes (this includes writing out the new heap, of course).

-- 
Peter Geoghegan


cycles-cluster-presorted-flamegraph.svg.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Andres Freund
On 2016-08-17 16:23:29 -0700, Peter Geoghegan wrote:
> On Wed, Aug 17, 2016 at 4:16 PM, Andres Freund  wrote:
> >> Does anyone have any ideas on how to:
> >>
> >> 1). Directly address the reform_and_rewrite_tuple() bottleneck.
> >
> > What part of is actually the expensive bit? It does a whole lot of
> > things. Forming/Deforming tuples, the hash lookups in
> > rewrite_heap_tuple(), ...?
> 
> Perhaps the attached svg "flamegraph" will give you some idea. This is
> based on the "perf cache-misses" event type.

Could you also provide a strace -ttt -T -c and a cpu cycles flamegraph?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Peter Geoghegan
On Wed, Aug 17, 2016 at 4:16 PM, Andres Freund  wrote:
>> Does anyone have any ideas on how to:
>>
>> 1). Directly address the reform_and_rewrite_tuple() bottleneck.
>
> What part of is actually the expensive bit? It does a whole lot of
> things. Forming/Deforming tuples, the hash lookups in
> rewrite_heap_tuple(), ...?

Perhaps the attached svg "flamegraph" will give you some idea. This is
based on the "perf cache-misses" event type.

-- 
Peter Geoghegan


cache-misses-cluster-presorted-flamegraph.svg.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Andres Freund
> Does anyone have any ideas on how to:
> 
> 1). Directly address the reform_and_rewrite_tuple() bottleneck.

What part of is actually the expensive bit? It does a whole lot of
things. Forming/Deforming tuples, the hash lookups in
rewrite_heap_tuple(), ...?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] CLUSTER, reform_and_rewrite_tuple(), and parallelism

2016-08-17 Thread Peter Geoghegan
During preliminary analysis of what it would take to produce a
parallel CLUSTER patch that is analogous of what I came up with for
CREATE INDEX, which in general seems quite possible, I identified
reform_and_rewrite_tuple() as a major bottleneck for the current
CLUSTER implementation.

Excluding the cost of the subsequent REINDEX of the clustered-on
index, reform_and_rewrite_tuple() appears to account for roughly 25% -
35% of both the cache misses, and instructions executed, for my test
case (this used a tuplesort, not an indexscan on the old heap
relation, of course). Merging itself was far less expensive (with my
optimization of how the heap is maintained during merging + 16
tapes/runs), so it would be reasonable to not parallelize that part,
just as it was for parallel CREATE INDEX. I don't think that it's
reasonable to not do anything about this reform_and_rewrite_tuple()
bottleneck, though.

Does anyone have any ideas on how to:

1). Directly address the reform_and_rewrite_tuple() bottleneck.

and/or:

2). Push down some or all of the reform_and_rewrite_tuple() work till
before tuples are passed to the tuplesort.

"2" would probably make it straightforward to have
reform_and_rewrite_tuple() work occur in parallel workers instead,
which buys us a lot.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers