Re: BRIN index worse than sequential scan for large search set

2023-02-24 Thread Mickael van der Beek
Hello Justin,

Thanks for the quick response!

The table may be dense, but the tuples aren't.  You're asking to return
> 1/1000th of the tuples, across the entire table.  Suppose there are ~100
> tuples per page, and you need to read about every 10th page.  It makes
> sense that it's slow to read a large amount of data nonsequentially.


Ah, of course, you're right!
I forgot that BRIN indexes store ranges that are not fully covered by the
row values and that PostgreSQL therefore has to double-check them (bitmap heap scan)
...
Would you thus advise using BRIN indexes only for columns whose values are
(1) monotonically increasing and (2) close to each other?
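(A quick way to sanity-check that, assuming the test_brin table from the
original message below and fresh statistics: pg_stats.correlation is close to
1.0 or -1.0 when a column's values follow the physical row order, which is
when BRIN ranges stay narrow.)

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'test_brin' AND attname = 'idx';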

I guess that in my use case something like a roaring bitmap would have been
perfect, but I do not believe those exist in PostgreSQL by default.
B-tree indexes perform well but are simply too large for my use case (> 400 MB
for 20M rows, still +/- 380 MB with an index fill factor of 100%): I need to
index around 6B rows partitioned into roughly 3K buckets, which would result in
~120 GB of B-tree indexes, which seems a bit much for simple existence checks.
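(For reference, a sketch of how I compare the on-disk sizes of the candidate
indexes, assuming the test_brin table and indexes from the original message
below:)

SELECT
  c.relname,
  pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class AS c
INNER JOIN pg_index AS i ON i.indexrelid = c.oid
WHERE i.indrelid = 'test_brin'::regclass;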

Because it's necessary to check if the tuple is visible to the current
> transaction.  It might be from an uncommitted/aborted transaction.


Interesting, that's something I did not consider indeed.
I'm guessing this is a cost imposed by MVCC that you can't get around, no
matter the isolation level?
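(Related side note, as I understand it: a B-tree can at least skip the heap
visit through an index-only scan once VACUUM has marked pages all-visible in
the visibility map, which BRIN can never do since it only stores per-range
summaries. A sketch, assuming the test_brin setup from the original message
below; the index name and probe value are made up:)

CREATE INDEX test_brin_btree_idx ON test_brin (idx);
VACUUM test_brin;  -- populates the visibility map
EXPLAIN (ANALYZE, BUFFERS)
SELECT idx FROM test_brin WHERE idx = 42;
-- Expect "Index Only Scan ... Heap Fetches: 0" once the pages are all-visible.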

Mickael


On Fri, Feb 24, 2023 at 6:19 PM Justin Pryzby  wrote:

> On Fri, Feb 24, 2023 at 05:40:55PM +0100, Mickael van der Beek wrote:
> > Hello everyone,
> >
> > I'm playing around with BRIN indexes so as to get a feel for the feature.
> > During my tests, I was unable to make BRIN indexes perform better than a
> > sequential scan for queries searching for large value sets (20K values in
> > the example down below).
>
> > And now let's query 20K random rows from our 20M total rows:
>
> I didn't try your test, but I think *random* is the problem/explanation.
>
> > By default, this query will not use the BRIN index and simply run a 1.5s
> > long sequential scan (hitting 700 MB) and a 2.47s hash join for a total
> > 8.7s query time:
> > https://explain.dalibo.com/plan/46c3191g8a6c1bc7
>
> > If we force the use of the BRIN index using (`SET LOCAL enable_seqscan =
> > OFF;`) the same query will now take 50s with 2.5s spent on the bitmap
> index
> > scan (hitting 470 MB of data) and a whopping 42s on the bitmap heap scan
> > (hitting 20 GB of data!):
> > https://explain.dalibo.com/plan/7f73bg9172a8b226
>
> That means the planner's cost model correctly preferred a seq scan.
>
> > So I had the following two questions:
> >
> >1. Why is the BRIN index systematically worse than a sequential scan,
> >even when the table is x1000 larger than the search set, physically
> >pre-sorted, dense (fillfactor at 100%) and the search rows are
> themselves
> >sorted?
>
> The table may be dense, but the tuples aren't.  You're asking to return
> 1/1000th of the tuples, across the entire table.  Suppose there are ~100
> tuples per page, and you need to read about every 10th page.  It makes
> sense that it's slow to read a large amount of data nonsequentially.
> That's why random_page_cost is several times higher than seq_page_cost.
>
> I would expect brin to win if the pages to be accessed were dense rather
> than distributed across the whole table.
>
> >2. Since we only select the "idx" column, why does the BRIN index not
> >simply return the searched value if included in one of its ranges?
> >Hitting the actual row data stored in the table seems to be unnecessary
> no?
>
> Because it's necessary to check if the tuple is visible to the current
> transaction.  It might be from an uncommitted/aborted transaction.
>
> --
> Justin
>


-- 
Mickael van der Beek
Web developer & Security analyst

mickael.van.der.b...@gmail.com


BRIN index worse than sequential scan for large search set

2023-02-24 Thread Mickael van der Beek
Hello everyone,

I'm playing around with BRIN indexes so as to get a feel for the feature.
During my tests, I was unable to make BRIN indexes perform better than a
sequential scan for queries searching for large value sets (20K values in
the example down below).

Creating the table with one single BIGINT required column:

> CREATE TABLE
>   test_brin
>   (
> idx BIGINT NOT NULL
>   )
> WITH
>   (
> fillfactor = 100
>   )
> ;


Filling the table with 20 million sorted random BIGINT values:

> INSERT INTO
>   test_brin
>   (
> idx
>   )
> SELECT
>   CAST(FLOOR(RANDOM() * 9223372036854775806) AS BIGINT)
> AS idx
> FROM
>   GENERATE_SERIES(1, 20 * 1000 * 1000, 1)
> AS g
> ORDER BY
>   idx ASC
> ;


Now we cluster the table (even though this shouldn't be needed):

> CREATE UNIQUE INDEX test_brin_idx_uniq ON test_brin (idx);
> CLUSTER test_brin USING test_brin_idx_uniq;
> DROP INDEX test_brin_idx_uniq;



Now we create the BRIN index on what should be a perfectly ordered and
dense table:

> CREATE INDEX
>   test_brin_idx
> ON
>   test_brin
> USING
>   BRIN
>   (
> idx
>   )
> ;


Let's VACUUM the table just to be safe:

> VACUUM test_brin;
> VACUUM ANALYSE test_brin;


And now let's query 20K random rows from our 20M total rows:

> EXPLAIN (
>   ANALYZE,
>   VERBOSE,
>   COSTS,
>   BUFFERS,
>   TIMING
> )
> SELECT
>   tb.idx
> FROM
>   test_brin
> AS tb
> WHERE
>   EXISTS (
> SELECT
> FROM
>   (
> SELECT
>   idx
> FROM
>   (
> SELECT
>   -- Just trying to break the optimisation that would
>   -- recognize "idx" as being an indexed column
>   (idx + (CEIL(RANDOM()) - 1))::BIGINT
> AS idx
> FROM
>   test_brin
> ORDER BY
>   RANDOM()
> LIMIT
>   2
>   )
> AS t
> ORDER BY
>   idx ASC
>   )
> AS q
> WHERE
>   tb.idx = q.idx
>   )
> ;


By default, this query will not use the BRIN index and simply run a 1.5s
long sequential scan (hitting 700 MB) and a 2.47s hash join for a total
8.7s query time:

https://explain.dalibo.com/plan/46c3191g8a6c1bc7

If we force the use of the BRIN index using (`SET LOCAL enable_seqscan =
OFF;`) the same query will now take 50s with 2.5s spent on the bitmap index
scan (hitting 470 MB of data) and a whopping 42s on the bitmap heap scan
(hitting 20 GB of data!):

https://explain.dalibo.com/plan/7f73bg9172a8b226
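(For anyone reproducing this: SET LOCAL only has an effect inside a
transaction block, so the forced plan was captured along these lines — a
sketch, with a simplified probe query standing in for the full query above:)

BEGIN;
SET LOCAL enable_seqscan = OFF;
EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
SELECT tb.idx FROM test_brin AS tb WHERE tb.idx = 42;  -- stand-in query
ROLLBACK;  -- the setting is discarded with the transaction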

Since the bitmap heap scan is taking a long time, let's drop the index and
recreate it with the `pages_per_range` parameter reduced from its default
value of 128 to 32:

> DROP INDEX test_brin_idx;
> CREATE INDEX
>   test_brin_idx
> ON
>   test_brin
> USING
>   BRIN
>   (
> idx
>   )
> WITH
>   (
> pages_per_range = 32
>   )
> ;


The query now takes 25s, half the time we had before, with 9.7s (up from
2.5s) spent on the bitmap index scan (hitting 2.6GB of data, up from 470
MB) and 10s (down from 42s) on the bitmap heap scan (hitting 4.9GB of data,
down from 20 GB):

https://explain.dalibo.com/plan/64fh5h1daaheeab3

We go a step further and lower the `pages_per_range` parameter to 8 (the
other extreme).

The query now takes 45s, close-ish to the initial time, with 38.5s (up from
2.5s) spent on the bitmap index scan (hitting 9.8GB of data, up from 470
MB) and 2.6s (down from 42s) on the bitmap heap scan (hitting 1.2GB of
data, down from 20 GB):

https://explain.dalibo.com/plan/431fbde7gb19g6g6
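(To see why nearly every range matches a batch of random probes, the
per-range summaries can be inspected with the pageinspect extension — a
sketch; block 2 is an assumption that holds for small BRIN indexes, block 0
being the metapage and block 1 the revmap:)

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT itemoffset, blknum, value
FROM brin_page_items(get_raw_page('test_brin_idx', 2), 'test_brin_idx');
-- "value" shows each range's {min .. max}; with uniformly random idx values
-- every range spans almost the whole BIGINT domain, so every probe matches.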

So I had the following two questions:

   1. Why is the BRIN index systematically worse than a sequential scan,
   even when the table is 1000x larger than the search set, physically
   pre-sorted, dense (fillfactor at 100%) and the search rows are themselves
   sorted?
   This setup would seem to provide the ideal conditions for a BRIN index to
   perform well.

   2. Since we only select the "idx" column, why does the BRIN index not
   simply return the searched value if included in one of its ranges?
   Hitting the actual row data stored in the table seems to be unnecessary, no?

Here's my test environment:

   -   PostgreSQL version: v14
   -   Memory: 32GB
   -   CPUs: 8

Thanks a lot in advance,

Mickael


Re: Array of integer indexed nested-loop semi join

2022-05-23 Thread Mickael van der Beek
Hello Jeff,

Thank you again for your advice.

I did indeed think of the ARRAY_AGG() version of the query.
Although this method is very fast (and does use indexes) for smallish array
sizes, it is sadly not practical in my case because the arrays of matching
rows can reach multiple hundreds of thousands of entries.
I thought of maybe "batching" the ARRAY_AGG() into batches of at most n rows
in a subquery and then computing the intersection on those, but it doesn't
seem practical or faster in the end.

> You could just add a DISTINCT to get rid of the duplicates.  Of course
that will also take some time on a large returned data set, but probably
less time than scanning a giant table.  I think this is probably cleaner
than the alternatives.

Yes, and a GROUP BY will do the trick as well.
The fact that the current solution is a "nested loop" instead of a "nested
loop semi join" means that the query is much slower, due to needing to GROUP
BY the rows.
This is why I tried various versions using EXISTS, ANY, ARRAY_AGG(), etc.,
to no avail.
Would you have an idea why PostgreSQL doesn't use the existing indexes for
this type of subquery?
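(For reference, the DISTINCT variant mentioned above would look something
like this — a sketch, with the table and column names assumed from the rest
of this thread:)

SELECT DISTINCT
  fu.w2_page_idxs
FROM
  fact_users AS fu
INNER JOIN
  fact_pages AS fp
  ON fu.w2_page_idxs && ARRAY[fp.idx]
WHERE
  fp.attribute_idxs && ARRAY[30160];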

> I don't know if you misunderstood.  I meant specifically the intarray
extension.  You can use integer arrays with built-in GIN indexes without
help from the intarray extension.  Maybe you know that already and are just
saying that the extension is even faster than the built-in indexed
operators are and you need that extra speed.

Are there specific advantages to not using the intarray extension and its
indexes in this case?
I was under the impression that it supported more operation types and was
generally faster for this niche use case.
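(My understanding of the difference, as a sketch — the index names here are
made up:)

-- Built-in GIN on integer[] uses the default array_ops opclass, no extension needed:
CREATE INDEX fact_users_w2_page_idxs_gin
  ON fact_users USING gin (w2_page_idxs);

-- intarray's opclass requires the extension and supplies its own &&, @> and <@ operators:
CREATE EXTENSION IF NOT EXISTS intarray;
CREATE INDEX fact_users_w2_page_idxs_intarray_gin
  ON fact_users USING gin (w2_page_idxs gin__int_ops);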

Thank you again for your help,

Mickael



On Mon, May 23, 2022 at 4:11 PM Jeff Janes  wrote:

> On Mon, May 23, 2022 at 3:57 AM Mickael van der Beek <
> mickael.van.der.b...@gmail.com> wrote:
>
>> Hello Jeff,
>>
>> Sadly, the query you suggested won't work because you are only returning
>> the first row of the matching inner query rows.
>>
>
> Sure, but the query I replaced did the same thing.  (I thought that was
> what you wanted, but I guess that was just to get it to run fast enough to
> ever finish--in that case it is probably better to use EXPLAIN without the
> ANALYZE so that we can see the plan of the correct query).  To get around
> that one-row limit you have to write it somewhat differently, getting rid
> of the ARRAY and adding an array_agg():
>
> SELECT fu.*
> FROM
>   fact_users AS fu
> WHERE
>   fu.w2_page_idxs && (select array_agg(idx) from fact_pages where
> attribute_idxs && ARRAY[201]);
>
> This way of writing it is better, as it still works with the LIMIT 1 but
> also works without it.  This still uses the indexes for me, at least when
> enable_seqscan is off.
>
>
>> The INNER JOIN version of the query will return all matching rows but
>> also include duplicates:
>>
>
> You could just add a DISTINCT to get rid of the duplicates.  Of course
> that will also take some time on a large returned data set, but probably
> less time than scanning a giant table.  I think this is probably cleaner
> than the alternatives.
>
>
>>
>> The reason I'm using integer arrays is because it is the only way I have
>> found in PostgreSQL to get fast inclusion / exclusion checks on large
>> datasets (hundreds of millions of values).
>> Did I misunderstand your response?
>>
>
> I don't know if you misunderstood.  I meant specifically the intarray
> extension.  You can use integer arrays with built-in GIN indexes without
> help from the intarray extension.  Maybe you know that already and are just
> saying that the extension is even faster than the built-in indexed
> operators are and you need that extra speed.
>
> Cheers,
>
> Jeff
>
>>

-- 
Mickael van der Beek
Web developer & Security analyst

mickael.van.der.b...@gmail.com


Re: Array of integer indexed nested-loop semi join

2022-05-23 Thread Mickael van der Beek
Hello Jeff,

Sadly, the query you suggested won't work because you are only returning
the first row of the matching inner query rows.
Example:

SELECT
>   u.idx,
>   u.page_idxs
> FROM
>   (
> VALUES
>   (1, ARRAY[11, 21, 31]),
>   (2, ARRAY[12, 21, 32]),
>   (3, ARRAY[13, 23, 31])
>   )
> AS u(idx, page_idxs)
> WHERE
>   u.page_idxs && ARRAY[(
> SELECT
>   p.idx
> FROM
>   (
> VALUES
>   (11, ARRAY[101, 201, 301]),
>   (21, ARRAY[102, 201, 302]),
>   (13, ARRAY[103, 203, 301])
>   )
> AS p(idx, attribute_idxs)
> WHERE
>   p.attribute_idxs && ARRAY[201]
> FETCH FIRST 1 ROWS ONLY
>   )]
> ;


This query only returns one row while it should actually return two:

1 {11,21,31}


The INNER JOIN version of the query will return all matching rows but also
include duplicates:

SELECT
>   u.idx,
>   u.page_idxs
> FROM
>   (
> VALUES
>   (1, ARRAY[11, 21, 31]),
>   (2, ARRAY[12, 21, 32]),
>   (3, ARRAY[13, 23, 31])
>   )
> AS u(idx, page_idxs)
> INNER JOIN
>   (
> SELECT
>   p.idx
> FROM
>   (
> VALUES
>   (11, ARRAY[101, 201, 301]),
>   (21, ARRAY[102, 201, 302]),
>   (13, ARRAY[103, 203, 301])
>   )
> AS p(idx, attribute_idxs)
> WHERE
>   p.attribute_idxs && ARRAY[201]
>   )
>   AS p2
>   ON u.page_idxs && ARRAY[p2.idx]
> ;


Results:

1 {11,21,31}
> 1 {11,21,31}
> 2 {12,21,32}


As far as I know, the IN + sub-expression query can't work since the left
side of the operation is an array of integers and the right side a set of
rows with a single integer column.
The reason I'm using integer arrays is because it is the only way I have
found in PostgreSQL to get fast inclusion / exclusion checks on large
datasets (hundreds of millions of values).
Did I misunderstand your response?
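(For reference, the && against an ARRAY_AGG() scalar subquery — as discussed
elsewhere in this thread — does return both expected rows on this toy data. A
sketch reusing the VALUES sets above:)

SELECT
  u.idx,
  u.page_idxs
FROM
  (
    VALUES
      (1, ARRAY[11, 21, 31]),
      (2, ARRAY[12, 21, 32]),
      (3, ARRAY[13, 23, 31])
  )
AS u(idx, page_idxs)
WHERE
  u.page_idxs && (
    SELECT
      ARRAY_AGG(p.idx)
    FROM
      (
        VALUES
          (11, ARRAY[101, 201, 301]),
          (21, ARRAY[102, 201, 302]),
          (13, ARRAY[103, 203, 301])
      )
    AS p(idx, attribute_idxs)
    WHERE
      p.attribute_idxs && ARRAY[201]
  )
;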
Thank you for the ongoing help,

Mickael

On Mon, May 23, 2022 at 4:45 AM Jeff Janes  wrote:

>
>
> On Fri, May 20, 2022 at 6:42 AM Mickael van der Beek <
> mickael.van.der.b...@gmail.com> wrote:
>
>>
>> Query:
>>
>> EXPLAIN (
>>>   ANALYZE,
>>>   VERBOSE,
>>>   COSTS,
>>>   BUFFERS,
>>>   TIMING
>>> )
>>> SELECT
>>>   fu.w2_page_idxs
>>> FROM
>>>   fact_users
>>> AS fu
>>> WHERE
>>>   EXISTS (
>>> SELECT
>>> FROM
>>>   (
>>> SELECT
>>>   ARRAY[idx] AS page_idx
>>> FROM
>>>   fact_pages
>>> WHERE
>>>   attribute_idxs && ARRAY[30160]
>>> FETCH FIRST 1 ROWS ONLY
>>>   )
>>> AS fp
>>> WHERE
>>>   fu.w2_page_idxs && fp.page_idx
>>>   )
>>> ;
>>
>>
>> Without any surprises, the planner is using a sequential scan on the
>> "fact_users" table which is very large instead of using the GIN index set
>> on the "w2_page_idxs" column.
>>
>
> For me, using the subquery in and expression, instead of the EXISTS, does
> get it to use the gin index.  And I think it must give the same results.
>
> SELECT
>   fu.w2_page_idxs
> FROM  fact_users AS fu
> WHERE
>   fu.w2_page_idxs && ARRAY[(select idx from fact_pages where
> attribute_idxs && ARRAY[3003] FETCH FIRST 1 ROWS ONLY)];
>
> But why are you using intarray?  That is unnecessary here, and by creating
> ambiguity about the array operators it might be harmful.
>
> Cheers,
>
> Jeff
>
>>

-- 
Mickael van der Beek
Web developer & Security analyst

mickael.van.der.b...@gmail.com


Re: Array of integer indexed nested-loop semi join

2022-05-20 Thread Mickael van der Beek
Hello Jeff,

Sorry for the delay; here are the EXPLAIN ANALYZE results for a single row
in the inner query:

Nested Loop Semi Join  (cost=1000993.81..10004731160.70 rows=536206
> width=28) (actual time=93765.182..93765.183 rows=0 loops=1)
>   Output: fu.w2_page_idxs
>   Join Filter: (fu.w2_page_idxs && (ARRAY[fact_pages.idx]))
>   Rows Removed by Join Filter: 53762825
>   Buffers: shared hit=569194 read=2821768
>   I/O Timings: read=56586.955
>   ->  Seq Scan on public.fact_users fu
>  (cost=100.00..10003925857.68 rows=53620568 width=28) (actual
> time=79.139..67423.779 rows=53762825 loops=1)
> Output: fu.w2_page_idxs
> Buffers: shared hit=567884 read=2821768
> I/O Timings: read=56586.955
>   ->  Materialize  (cost=993.81..994.50 rows=1 width=32) (actual
> time=0.000..0.000 rows=1 loops=53762825)
> Output: (ARRAY[fact_pages.idx])
> Buffers: shared hit=148
> ->  Limit  (cost=993.81..994.48 rows=1 width=32) (actual
> time=26.382..26.383 rows=1 loops=1)
>   Output: (ARRAY[fact_pages.idx])
>   Buffers: shared hit=148
>   ->  Bitmap Heap Scan on public.fact_pages
>  (cost=993.81..70645.00 rows=103556 width=32) (actual time=26.378..26.379
> rows=1 loops=1)
> Output: ARRAY[fact_pages.idx]
> Recheck Cond: (fact_pages.attribute_idxs &&
> '{30160}'::integer[])
> Heap Blocks: exact=1
> Buffers: shared hit=148
> ->  Bitmap Index Scan on fact_pages_attribute_idxs_int
>  (cost=0.00..967.92 rows=103556 width=0) (actual time=14.865..14.865
> rows=101462 loops=1)
>   Index Cond: (fact_pages.attribute_idxs &&
> '{30160}'::integer[])
>   Buffers: shared hit=147
> Query Identifier: 6779965332684941204
> Planning:
>   Buffers: shared hit=2
> Planning Time: 0.162 ms
> JIT:
>   Functions: 10
>   Options: Inlining true, Optimization true, Expressions true, Deforming
> true
>   Timing: Generation 1.507 ms, Inlining 9.797 ms, Optimization 54.902 ms,
> Emission 14.314 ms, Total 80.521 ms
> Execution Time: 93766.772 ms


Query:

EXPLAIN (
>   ANALYZE,
>   VERBOSE,
>   COSTS,
>   BUFFERS,
>   TIMING
> )
> SELECT
>   fu.w2_page_idxs
> FROM
>   fact_users
> AS fu
> WHERE
>   EXISTS (
> SELECT
> FROM
>   (
> SELECT
>   ARRAY[idx] AS page_idx
> FROM
>   fact_pages
> WHERE
>   attribute_idxs && ARRAY[30160]
> FETCH FIRST 1 ROWS ONLY
>   )
> AS fp
> WHERE
>   fu.w2_page_idxs && fp.page_idx
>   )
> ;


Without any surprise, the planner is using a sequential scan on the very
large "fact_users" table instead of using the GIN index set on the
"w2_page_idxs" column.

Link to the query plan visualiser: https://explain.dalibo.com/plan/1vC

Thank you very much in advance,

Mickael

On Wed, Apr 27, 2022 at 4:54 PM Mickael van der Beek <
mickael.van.der.b...@gmail.com> wrote:

> Hello Jeff,
>
> I have waited a few hours without the query ever finishing which is the
> reason I said "never finishes".
> Especially because the INNER JOIN version finishes within a few minutes
> while being combinatorial and less efficient.
> The query probably only does sequential scans.
>
> You will find the query plan using EXPLAIN here:
> - Visual query plan: https://explain.dalibo.com/plan#plan
> - Raw query plan: https://explain.dalibo.com/plan#raw
>
> Thanks for your help,
>
> Mickael
>
> On Wed, Apr 27, 2022 at 4:28 PM Jeff Janes  wrote:
>
>> On Wed, Apr 27, 2022 at 8:19 AM Mickael van der Beek <
>> mickael.van.der.b...@gmail.com> wrote:
>>
>>>
>>> The last query does not finish after waiting for more than 15 minutes.
>>> (The temporary view creation is very fast and required due to the same
>>> query in a CTE greatly reducing performance (by more than 5 min.) due to
>>> the optimisation barrier I'm guessing.)
>>>
>>
>> How much over 15 minutes?  20 minutes doesn't seem that long to wait to
>> get a likely definitive answer.  But at the least show us the EXPLAIN
>> without ANALYZE of it, that should take no milliseconds.
>>
>> And what does it mean for something to take 5 minutes longer than "never
>> finishes"?
>>
>> (Also, putting every or every other token on a separate line does not
>> make it easier to read)
>>
>> Cheers,
>>
>> Jeff
>>
>>>
>
> --
> Mickael van der Beek
> Web developer & Security analyst
>
> mickael.van.der.b...@gmail.com
>


-- 
Mickael van der Beek
Web developer & Security analyst

mickael.van.der.b...@gmail.com


Re: Array of integer indexed nested-loop semi join

2022-04-27 Thread Mickael van der Beek
Hello Jeff,

I have waited a few hours without the query ever finishing, which is the
reason I said "never finishes".
This is especially surprising because the INNER JOIN version finishes within
a few minutes while being combinatorial and less efficient.
The query probably only does sequential scans.

You will find the query plan using EXPLAIN here:
- Visual query plan: https://explain.dalibo.com/plan#plan
- Raw query plan: https://explain.dalibo.com/plan#raw

Thanks for your help,

Mickael

On Wed, Apr 27, 2022 at 4:28 PM Jeff Janes  wrote:

> On Wed, Apr 27, 2022 at 8:19 AM Mickael van der Beek <
> mickael.van.der.b...@gmail.com> wrote:
>
>>
>> The last query does not finish after waiting for more than 15 minutes.
>> (The temporary view creation is very fast and required due to the same
>> query in a CTE greatly reducing performance (by more than 5 min.) due to
>> the optimisation barrier I'm guessing.)
>>
>
> How much over 15 minutes?  20 minutes doesn't seem that long to wait to
> get a likely definitive answer.  But at the least show us the EXPLAIN
> without ANALYZE of it, that should take no milliseconds.
>
> And what does it mean for something to take 5 minutes longer than "never
> finishes"?
>
> (Also, putting every or every other token on a separate line does not make
> it easier to read)
>
> Cheers,
>
> Jeff
>
>>

-- 
Mickael van der Beek
Web developer & Security analyst

mickael.van.der.b...@gmail.com


Fwd: Array of integer indexed nested-loop semi join

2022-04-27 Thread Mickael van der Beek
Hello everyone,

*1) Context*

I'm working with large tables containing arrays of integers, indexed with "
*gin__int_ops*" GIN indexes offered by the "*intarray*" extension.
The goal I'm trying to achieve is to do a "nested loop semi join" using the
array inclusion operator (@>) as the join condition, but in an indexed way.
(Basically an INNER JOIN without the duplicate rows and without needing to
use columns from the joined table.)

*2) Configuration*

The queries are run on a PostgreSQL v14 server with 32GB RAM and 8 vCPUs on
a 64 bit ARM Neoverse architecture (m6g.2xlarge AWS RDS instance).
PostgreSQL's configuration uses the following key values:


   - work_mem = 8GB (only set for this query)
   - shared_buffers = 8GB
   - effective_cache_size = 22GB
   - max_worker_processes = 8
   - max_parallel_workers_per_gather = 4
   - jit = on

*3) Tables*

The "light_pages_attributes" contains about 2 million rows, each with an
"attributes" column containing on average 20 integers.

CREATE TABLE
>   light_pages_attributes
>   (
> id            INTEGER   NOT NULL,
> "attributes"  INTEGER[] NOT NULL
>   )
> ;
> CREATE INDEX
>   light_pages_attributes_attributes
> ON
>   light_pages_attributes
> USING
>   gin
>   (
> attributes gin__int_ops
>   )
> ;


The "light_pages_views" contains about 25 million rows, each with a
"page_ids" column containing on average 20 integers as well.

CREATE TABLE
>   light_pages_views
>   (
> vector_id BIGINT    NOT NULL,
> page_ids  INTEGER[] NOT NULL
>   )
> ;
> CREATE INDEX
>   light_pages_views_page_ids
> ON
>   light_pages_views
> USING
>   gin
>   (
> page_ids gin__int_ops
>   )
> ;


*4) Query*

The query I'm trying to optimise is the following:

BEGIN;



SET LOCAL work_mem = '8GB';



CREATE TEMPORARY VIEW
>   urls
>   AS
>   (
> SELECT ARRAY[lpa.id]
> AS page_id
>   FROM
> light_pages_attributes
>   AS lpa
>   WHERE
> lpa."attributes" @> ARRAY[189376]
>   );
> EXPLAIN (
>   ANALYZE,
>   VERBOSE,
>   COSTS,
>   BUFFERS,
>   TIMING
> )
> SELECT
>   COUNT(*)
> FROM
>   light_pages_views
> AS lpv
> WHERE
>   EXISTS (
> SELECT
>   1
> FROM
>   urls
> AS u
> WHERE
>   lpv.page_ids @> u.page_id
>   )
> ;



COMMIT;


The last query does not finish after waiting for more than 15 minutes.
(The temporary view creation is very fast; it is required because putting the
same query in a CTE greatly reduces performance (by more than 5 min.), due to
the optimisation barrier I'm guessing.)
This alternative query, which should be far slower since it generates
duplicate rows through the INNER JOIN, is in fact much faster, 1 min. 39 s.:

EXPLAIN (
>   ANALYZE,
>   VERBOSE,
>   COSTS,
>   BUFFERS,
>   TIMING
> )
> SELECT
>   COUNT(*)
> FROM
>   (
> SELECT
>   1
> FROM
>   light_pages_views
> AS lpv
> INNER JOIN
>   urls
> AS u
> ON lpv.page_ids @> u.page_id
> GROUP BY
>   lpv.vector_id
>   )
> AS t
> ;


Visual query plan: https://explain.dalibo.com/plan/bc3#plan
Raw query plan: https://explain.dalibo.com/plan/bc3#raw
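(Side note on the optimisation barrier: since PostgreSQL 12 a
single-reference CTE is inlined by default, and inlining can be forced with
NOT MATERIALIZED, so on v14 the CTE form could also be written like this — a
sketch, untested here:)

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
WITH urls AS NOT MATERIALIZED (
  SELECT ARRAY[lpa.id] AS page_id
  FROM light_pages_attributes AS lpa
  WHERE lpa."attributes" @> ARRAY[189376]
)
SELECT COUNT(*)
FROM light_pages_views AS lpv
WHERE EXISTS (
  SELECT 1 FROM urls AS u WHERE lpv.page_ids @> u.page_id
);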

Other strategies I've tried as well:

   - lpv.page_ids @> ANY(SELECT u.page_id FROM urls AS u)
   - FULL OUTER JOIN, not possible due to the condition not being
   merge-joinable

The end goal would be to update all matching "light_pages_views" rows by
appending an integer to their array of integers.
So possibly millions of rows to be updated.
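(A minimal sketch of that final update, with 999999 standing in for the
integer to append and the same attribute filter as above:)

UPDATE light_pages_views AS lpv
SET page_ids = array_append(lpv.page_ids, 999999)
WHERE lpv.page_ids && (
  SELECT ARRAY_AGG(lpa.id)
  FROM light_pages_attributes AS lpa
  WHERE lpa."attributes" @> ARRAY[189376]
);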

Thank you a lot in advance for your help!

Mickael


Very long query planning times for database with lots of partitions

2019-01-22 Thread Mickael van der Beek
Hey everyone,

I have a PostgreSQL 10 database that contains two tables which both have
two levels of partitioning (by list and using a single value). Meaning that
a partitioned table gets repartitioned again.

The data in my use case is stored in 5K to 10K partitions (children and
grand-children of the two tables mentioned above), depending on usage levels.

Three indexes are set on each grand-child partition. The partitioning
columns are not covered by them.
(I don't believe it is necessary to index partition columns, no?)

With this setup, I experience queries that have very slow planning times but
fast execution times, even for simple queries where only a couple of
partitions are searched and the partition values are hard-coded.

Researching the issue, I thought that the linear search in use by
PostgreSQL 10 to find the partition table metadata was the cause.

cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/

So I decided to try out PostgreSQL 11, which includes the two aforementioned
fixes:

-
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5
-
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81

Alas, it seems that the version upgrade didn't change anything.
I ran an `ANALYZE` before doing my tests, so I believe that the statistics
are computed and fresh.

Now I know that PostgreSQL doesn't like having lots of partitions but I
still would like to understand why the query planner is so slow in
PostgreSQL 10 and PostgreSQL 11.
(I was also wondering what "a lot" of partitions is in PostgreSQL? When I
look at use cases of extensions like TimescaleDB, I would expect that 5K to
10K partitions wouldn't be a whole lot.)

An example of a simple query that I run on both PostgreSQL versions would be:

EXPLAIN ANALYZE
> SELECT
> table_a.a,
> table_b.a
> FROM
> (
> SELECT
> a,
> b
> FROM
> table_a
> WHERE
> partition_level_1_column = 'foo'
> AND
> partition_level_2_column = 'bar'
> )
> AS table_a
> INNER JOIN
> (
> SELECT
> a,
> b
> FROM
> table_b
> WHERE
> partition_level_1_column = 'baz'
> AND
> partition_level_2_column = 'bat'
> )
> AS table_b
> ON table_b.b = table_a.b
> LIMIT
> 10;


Running this query on my database with 5K partitions (split roughly 2/3rds
of the partitions for table_b and 1/3rd of the partitions for table_a) will
return:

- Planning Time: 7155.647 ms
- Execution Time: 2.827 ms
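(Planning cost can also be measured without executing the query; the SUMMARY
option has existed since PostgreSQL 10 — a sketch on a simplified
single-table lookup, reusing the column names from the query above:)

EXPLAIN (SUMMARY ON)
SELECT a, b
FROM table_a
WHERE partition_level_1_column = 'foo'
  AND partition_level_2_column = 'bar';
-- The trailing "Planning time:" line shows the planner cost without running the query.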

Thank you in advance for your help!

Mickael