(I've closed the CF entry: https://commitfest.postgresql.org/22/1778/)
On 2019/04/01 2:04, Tom Lane wrote:
> Amit Langote <amitlangot...@gmail.com> writes:
>> On Sun, Mar 31, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i...@live.jp> wrote:
>>> Certainly, using bitmapset contributes to the performance when scanning
>>> one partition (few partitions) from large partitions.
>
>> Thanks Imai-san for testing.
>
> I tried to replicate these numbers with the code as-committed, and
> could not.

Thanks for that.

> What I get, using the same table-creation code as you
> posted and a pgbench script file like
>
> \set param random(1, :N)
> select * from rt where a = :param;
>
> is scaling like this:
>
> N       tps, range      tps, hash
>
> 2       10520.519932    10415.230400
> 8       10443.361457    10480.987665
> 32      10341.196768    10462.551167
> 128     10370.953849    10383.885128
> 512     10207.578413    10214.049394
> 1024    10042.794340    10121.683993
> 4096    8937.561825     9214.993778
> 8192    8247.614040     8486.728918
>
> If I use "-M prepared" the numbers go up a bit for lower N, but
> drop at high N:
>
> N       tps, range      tps, hash
>
> 2       11449.920527    11462.253871
> 8       11530.513146    11470.812476
> 32      11372.412999    11450.213753
> 128     11289.351596    11322.698856
> 512     11095.428451    11200.683771
> 1024    10757.646108    10805.052480
> 4096    8689.165875     8930.690887
> 8192    7301.609147     7502.806455
>
> Digging into that, it seems like the degradation with -M prepared is
> mostly in LockReleaseAll's hash_seq_search over the locallock hash table.
> What I think must be happening is that with -M prepared, at some point the
> plancache decides to try a generic plan, which causes opening/locking all
> the partitions, resulting in permanent bloat in the locallock hash table.
> We immediately go back to using custom plans, but hash_seq_search has
> more buckets to look through for the remainder of the process' lifetime.
Ah, we did find this to be a problem upthread [1], and Tsunakawa-san then
posted a patch for it, which is being discussed at:
https://commitfest.postgresql.org/22/1993/

> I do see some cycles getting spent in apply_scanjoin_target_to_paths
> that look to be due to scanning over the long part_rels array,
> which your proposal would ameliorate.  But (a) that's pretty small
> compared to other effects, and (b) IMO, apply_scanjoin_target_to_paths
> is a remarkable display of brute force inefficiency to begin with.
> I think we should see if we can't nuke that function altogether in
> favor of generating the paths with the right target the first time.

That's an option if we can make it work.  Shouldn't we look at *all* of
the places that now have code that looks like this:

    for (i = 0; i < rel->nparts; i++)
    {
        RelOptInfo *partrel = rel->part_rels[i];

        if (partrel == NULL)
            continue;

        ...
    }

Besides apply_scanjoin_target_to_paths(), there are:

    create_partitionwise_grouping_paths()
    make_partitionedrel_pruneinfo()

> BTW, the real elephant in the room is the O(N^2) cost of creating
> these tables in the first place.  The runtime for the table-creation
> scripts looks like
>
> N       range           hash
>
> 2       0m0.011s        0m0.011s
> 8       0m0.015s        0m0.014s
> 32      0m0.032s        0m0.030s
> 128     0m0.132s        0m0.099s
> 512     0m0.969s        0m0.524s
> 1024    0m3.306s        0m1.442s
> 4096    0m46.058s       0m15.522s
> 8192    3m11.995s       0m58.720s
>
> This seems to be down to the expense of doing RelationBuildPartitionDesc
> to rebuild the parent's relcache entry for each child CREATE TABLE.
> Not sure we can avoid that, but maybe we should consider adopting a
> cheaper-to-read representation of partition descriptors.  The fact that
> range-style entries seem to be 3X more expensive to load than hash-style
> entries is strange.

I've noticed this many times too, but never prioritized doing something
about it.  I'll try sometime.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/CAKJS1f-dn1hDZqObwdMrYdV7-cELJwWCPRWet6EQX_WaV8JLgw%40mail.gmail.com