Hi,

On 2/8/23 11:42, Tomas Vondra wrote:
> On 2/1/23 14:41, David Geier wrote:
>
> Yeah, this is a pretty annoying regression. We already can hit poor
> behavior when matching rows are not distributed uniformly in the tables
> (which is what LIMIT costing assumes), and this makes it more likely to
> hit similar issues. A bit like when doing many HTTP requests makes it
> more likely to hit at least one 99% outlier.

Are you talking about the use of ordering vs. filtering indexes in queries where both an ORDER BY and a filter are present (e.g. an ordering index is used, but all rows passing the filter are at the end of the table)? If not, could you elaborate a bit more and maybe give an example?
> No opinion on these options, but worth a try. Alternatively, we could
> try the usual doubling approach - start with a low threshold (and set
> the latch frequently), and then gradually increase it up to the 1/4.
>
> That should work both for queries expecting only few rows and those
> producing a lot of data.

I was thinking about this variant as well. One more alternative would be to set the leader's latch once a worker has produced 1/Nth of the LIMIT, where N is the number of workers. Both variants still have corner cases where the latch is set too late, but either would surely be much better than what we have today. A rough sketch of the doubling variant follows.
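
To make the idea concrete, here is a minimal standalone sketch of the doubling approach. None of these names exist in shm_mq.c - they're made up for illustration - and the initial threshold is an arbitrary guess that would need tuning:

    #include <stdbool.h>
    #include <stddef.h>

    #define INITIAL_FLUSH_THRESHOLD 64  /* bytes; arbitrary, would need tuning */

    typedef struct SketchQueue
    {
        size_t ring_size;        /* total size of the shared ring buffer */
        size_t send_pending;     /* bytes written but not yet flushed */
        size_t flush_threshold;  /* adaptive; starts at INITIAL_FLUSH_THRESHOLD */
    } SketchQueue;

    /*
     * Decide whether to set the leader's latch now.  The threshold starts
     * small, so queries returning only a few rows wake the leader almost
     * immediately, and doubles after every flush until it reaches the
     * fixed limit of 1/4 of the ring size that we have today.
     */
    static bool
    sketch_should_flush(SketchQueue *q, bool force_flush)
    {
        size_t cap = q->ring_size / 4;

        if (!force_flush && q->send_pending <= q->flush_threshold)
            return false;

        /* Double the threshold, capped at 1/4 of the ring size. */
        q->flush_threshold *= 2;
        if (q->flush_threshold > cap)
            q->flush_threshold = cap;
        return true;
    }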

I also did some profiling, and - at least on my development laptop with 8 physical cores - the original example motivating the batching change is actually slower with batching than with it disabled. I disabled batching by commenting out the following check in shm_mq.c, which defers setting the leader's latch until more than a quarter of the ring buffer is pending:

    if (force_flush || mqh->mqh_send_pending > (mq->mq_ring_size >> 2))

SET parallel_tuple_cost TO 0;
CREATE TABLE b (a int);
INSERT INTO b SELECT generate_series(1, 200000000);
ANALYZE b;
EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM b;

 Gather  (cost=1000.00..1200284.61 rows=200375424 width=4) (actual rows=200000000 loops=1)
   Workers Planned: 7
   Workers Launched: 7
   ->  Parallel Seq Scan on b  (cost=0.00..1199284.61 rows=28625061 width=4) (actual rows=25000000 loops=8)

Always latch: 19055 ms
Batching:     19575 ms

If I find some time, I'll play around a bit more and maybe propose a patch.

...

>> We would need something similar to CHECK_FOR_INTERRUPTS() which returns
>> a NULL slot if a parallel worker is supposed to stop execution (we could
>> e.g. check if the queue got detached). Or could we amend
>> CHECK_FOR_INTERRUPTS() to just stop the worker gracefully if the queue
>> got detached?

> That sounds reasonable, but I'm not very familiar with the leader-worker
> communication, so no opinion on how it should be done.

I think an extra macro that needs to be called from dozens of places to check whether parallel execution is supposed to end is the least preferable approach. I'll read up more on how CHECK_FOR_INTERRUPTS() works and on whether we could instead actively signal the workers that they should stop. For reference, a sketch of the check-based variant follows.
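
To make the discussion concrete, here is a minimal sketch of what such a check could look like. None of these identifiers exist in PostgreSQL - they're made up for illustration - and where exactly the flag would get set (e.g. when shm_mq reports the queue as detached) is left open:

    #include <stdbool.h>

    /*
     * Hypothetical per-worker state; in reality this would live in the
     * parallel executor context, and the flag would be set when shm_mq
     * reports the tuple queue as detached.
     */
    typedef struct SketchWorkerState
    {
        bool queue_detached;  /* leader went away, nobody reads our tuples */
    } SketchWorkerState;

    /*
     * Similar in spirit to CHECK_FOR_INTERRUPTS(): returns true if the
     * worker should stop execution because its output can never be
     * consumed.  The caller would then return a NULL slot, as if the
     * scan had reached its end.
     */
    static inline bool
    sketch_should_stop(const SketchWorkerState *state)
    {
        return state->queue_detached;
    }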

--
David Geier
(ServiceNow)


