Hello,

Michael, Tom: thanks for all the insights and information in your previous
mails.

A quick update on the explain outputs (this time using EXPLAIN (ANALYZE,
BUFFERS, VERBOSE)):
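
For reference, the statement behind both plans looks roughly like this
(reconstructed from the plan output; the exact tsquery construction in the
application may differ):

explain (analyze, buffers, verbose)
select version, content
  from orderstore."order"
 where jsonb_to_tsvector('english', content, '["all"]') @@ '1.20709841'::tsquery
 limit 10;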

*The good: *

*LOG Time: | 2022-02-28 09:30:01.400777+01 | order rows: |  9668*



 Limit  (cost=616.37..653.30 rows=10 width=22) (actual time=1.062..1.063 rows=1 loops=1)
   Output: version, content
   Buffers: shared hit=154
   ->  Bitmap Heap Scan on orderstore."order"  (cost=616.37..793.63 rows=48 width=22) (actual time=1.061..1.062 rows=1 loops=1)
         Output: version, content
         Recheck Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Heap Blocks: exact=1
         Buffers: shared hit=154
         ->  Bitmap Index Scan on idx_fulltext_content  (cost=0.00..616.36 rows=48 width=0) (actual time=1.053..1.053 rows=1 loops=1)
               Index Cond: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
               Buffers: shared hit=153
 Planning:
   Buffers: shared hit=50
 Planning Time: 0.408 ms
*Execution Time: 1.079 ms*

*pg_stat_all_tables: *


n_tup_ins           | 102
*n_tup_upd           | 1554*
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 9668
*n_dead_tup          | 69*
n_mod_since_analyze | 61
n_ins_since_vacuum  | 8
last_vacuum         | 2022-02-25 07:54:46.196508+01
last_autovacuum     |
last_analyze        | 2022-02-28 03:20:38.761482+01
last_autoanalyze    |


*The bad: *


*LOG Time: | 2022-02-28 09:45:01.662702+01 | order rows: |  9668*



LIMIT 10:



 Limit  (cost=0.00..805.63 rows=10 width=22) (actual time=24175.964..25829.767 rows=1 loops=1)
   Output: version, content
   Buffers: shared hit=26284 read=12550 dirtied=4
   ->  Seq Scan on orderstore."order"  (cost=0.00..3867.01 rows=48 width=22) (actual time=24175.962..25829.763 rows=1 loops=1)
         Output: version, content
         Filter: (jsonb_to_tsvector('english'::regconfig, "order".content, '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
         Rows Removed by Filter: 9667
         Buffers: shared hit=26284 read=12550 dirtied=4
 Planning:
   Buffers: shared hit=50
 Planning Time: 0.377 ms
*Execution Time: 25829.778 ms*

*pg_stat_all_tables:*

n_tup_ins           | 102
*n_tup_upd           | 1585*
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 9668
*n_dead_tup          | 100*
n_mod_since_analyze | 92
n_ins_since_vacuum  | 8
last_vacuum         | 2022-02-25 07:54:46.196508+01
last_autovacuum     |
last_analyze        | 2022-02-28 03:20:38.761482+01
last_autoanalyze    |


*The ugly:*


It should be mentioned that the table in question mostly lives in TOAST
land (though I have no idea whether this also influences the query planner):


   oid  | table_schema | table_name | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total  | index  | toast | table
 -------+--------------+------------+--------------+-------------+-------------+-------------+-------------+--------+--------+-------+-------
 155544 | orderstore   | order      |         9570 |   229826560 |   120184832 |    98557952 |    11083776 | 219 MB | 115 MB | 94 MB | 11 MB
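
The size figures above come from a pg_class based query along these lines
(not the literal statement we ran, just to show where the numbers originate):

select c.oid,
       n.nspname                               as table_schema,
       c.relname                               as table_name,
       c.reltuples::bigint                     as row_estimate,
       pg_total_relation_size(c.oid)           as total_bytes,
       pg_indexes_size(c.oid)                  as index_bytes,
       pg_total_relation_size(c.reltoastrelid) as toast_bytes
  from pg_class c
  join pg_namespace n on n.oid = c.relnamespace
 where n.nspname = 'orderstore' and c.relname = 'order';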


Since tinkering with the text search functions is out of the question, we
came up with three possibilities for dealing with this issue (a quick sketch
of the second and third options follows below):

- significantly increase the LIMIT clause or omit it altogether (meh ...)
- use 'set random_page_cost = 0.5' inside the transaction to convince the
  query planner to prefer the index (tested and works)
- schedule an hourly VACUUM job for the table (the most likely solution we
  will settle on, since it comes with the least implementation effort)
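
For illustration, the second option boils down to something like this
(SET LOCAL shown here so the setting reverts at commit; a plain SET inside
the transaction would stick for the rest of the session), plus the manual
VACUUM of the third option:

begin;
set local random_page_cost = 0.5;   -- applies to this transaction only
-- ... run the fulltext query from above ...
commit;

-- option three, run from cron or a similar scheduler:
vacuum (analyze) orderstore."order";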

None of these seems very elegant or viable in the long run ... we'll see.

Ah, yes: our global settings for random_page_cost and autovacuum/analyze
are set to the defaults.
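
A quick way to double-check that, for completeness (source should read
'default' for untouched settings):

select name, setting, source
  from pg_settings
 where name in ('random_page_cost',
                'autovacuum',
                'autovacuum_vacuum_scale_factor',
                'autovacuum_analyze_scale_factor');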

Will JSON processing see improvements in PG 14/15? We are about to upgrade
to 14 in the near future; our devs say this topic is the main trigger for
doing so.

Any further thoughts on the case are very much appreciated.

kr p.


On Thu, Feb 24, 2022 at 17:10, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Peter Adlersburg <peter.adlersb...@gmail.com> writes:
> >  Limit  (cost=0.00..804.97 rows=10 width=22) (actual
> > time=23970.845..25588.432 rows=1 loops=1)
> >    ->  Seq Scan on "order"  (cost=0.00..3863.86 rows=48 width=22) (actual
> > time=23970.843..25588.429 rows=1 loops=1)
> >          Filter: (jsonb_to_tsvector('english'::regconfig, content,
> > '["all"]'::jsonb) @@ '''1.20709841'''::tsquery)
> >          Rows Removed by Filter: 9652
> >  Planning Time: 0.430 ms
> >  Execution Time: 25588.448 ms
>
> I think the expense here comes from re-executing jsonb_to_tsvector
> a lot of times.  By default that's estimated as 100 times more expensive
> than a simple function (such as addition), but these results make it
> seem like that's an understatement.  You might try something like
>
> alter function jsonb_to_tsvector(regconfig, jsonb, jsonb) cost 1000;
>
> to further discourage the planner from picking this plan shape.
>
> Possibly the cost estimate for ts_match_tq (the function underlying
> this variant of @@) needs to be bumped up as well.
>
> (Bear in mind that pg_dump will not propagate such hacks on
> system-defined objects, so you'll need a note to reapply
> any such changes after dump/reload or pg_upgrade.)
>
>                         regards, tom lane
>
