Re: Asymmetric partition-wise JOIN

Andrei Lepikhov Sat, 14 Oct 2023 22:41:08 -0700

On 15/10/2023 07:18, Alexander Korotkov wrote:

Hi Alexander,
Hi Andrey,


Thank you for your work on this subject.

On Mon, Jan 17, 2022 at 1:42 PM Alexander Pyhalov
<a.pyha...@postgrespro.ru> wrote:

The patch does not longer apply cleanly, so I rebased it. Attaching
rebased version.


Not surprising that the patch doesn't apply after 1.5 years since the
last message.  Could you please rebase it?

I read the thread and the patch.  The patch improves the joining of
partitioned tables with non-partitioned relations.  Let's denote
non-partitioned relation as A, partitions as P1 ... PN.  The patch
allows to Append(Join(A, P1), ... Join(A, PN) instead of Join(A,
Append(P1, ... PN).  That could be cheaper because it's generally
cheaper to join small pieces rather than do one big join.  The
drawback is the need to scan A multiple times.  But is this really
necessary and acceptable?  Let's consider multiple options.

1) A is non-table.  For instance, A is a function scan.  In this case,
doing multiple scans of A is not just expensive, but could lead to
unexpected side effects.  When the user includes a function once in
the FROM clause, she expects this function to be evaluated once.  I
propose that we should materialize a scan of non-table relations.  So,
materialized representation will be scanned multiple times, but the
source only scanned once.  That would be similar to CTE.
2) A is the table to be scanned with the parametrized path in the
inner part of the nested loop join.  In this case, there is no big
scan of A and nothing to materialize.
3) A is the table to be used in merge join or outer part of nested
loop join.  In this case, it would be nice to consider materialize.
It's not always good to materialize, because materialization has its
additional costs.  I think that could be a cost-based decision.
4) A is used in the hash join.  Could we re-use the hashed
representation of A between multiple joins?  I read upthread it was
proposed to share a hashed table between multiple background workers
via shared memory.  But the first step would be to just share it
between multiple join nodes within the same process.

As we consider joining with each partition individually, there could
be chosen different join methods.  As I get, the current patch
considers joining with each of the partitions as a separate isolated
optimization task.  However, if we share resources between the
multiple joins, then rises a need for some global optimization.  For
instance, a join type could be expensive when applied to an individual
partition, but cheap when applied to all the partitions thanks to
saving the common work.

My idea is to consider generated common resources (such as
materialized scans) as a property of the path.  For instance, if the
nested loop join is cheaper than the hash join, but the hash join
generates a common hash map of table A, we don't drop hash join
immediately from the consideration and leave it to see how it could
help join other partitions.  What do you think?


Thanks for such detailed feedback!

The rationale for this patch was to give the optimizer additional waysto push down more joins into foreign servers. And, because ofasynchronous append, the benefit of that optimization was obvious.Unfortunately, we hadn't found other applications for this feature,which was why this patch was postponed in the core.You have brought new ideas about applying this idea locally. Moreover,the main issue of the patch was massive memory consumption in the caseof many joins and partitions - because of reparameterization. But now,postponing the reparameterization proposed in the thread [1] resolvesthat problem and gives some insights into the reparameterizationtechnique of some fields, like lateral references.

Hence, I think we can restart this work.

The first thing here (after rebase, of course) is to figure out andimplement in the cost model cases of effectiveness when asymmetric joinwould give significant performance.


[1] Oversight in reparameterize_path_by_child leading to executor crash
https://www.postgresql.org/message-id/flat/CAMbWs496%2BN%3DUAjOc%3DrcD3P7B6oJe4rZw08e_TZRUsWbPxZW3Tw%40mail.gmail.com

--
regards,
Andrey Lepikhov
Postgres Professional

Re: Asymmetric partition-wise JOIN

Reply via email to