jrgemignani commented on issue #2194:
URL: https://github.com/apache/age/issues/2194#issuecomment-3166241750
@serdarmicrosoft PR #2199 adds additional variables that can have a huge
impact on the performance of this query. Below I have added the output of
explain analyze on my server for the above query -
**SQL baseline**
```
psql-17.5-5432-pgsql=# explain analyze select count(*), u.id from
goodreads_graph_x."User" u, goodreads_graph_x."HAS_INTERACTION" h,
goodreads_graph_x."Book" b where u.id = h.start_id and b.id = h.end_id GROUP BY
u.id ORDER BY count(*) DESC LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------
Limit (cost=828731.73..828731.75 rows=10 width=16) (actual
time=4988.478..5016.965 rows=10 loops=1)
-> Sort (cost=828731.73..830343.44 rows=644686 width=16) (actual
time=4988.476..5016.962 rows=10 loops=1)
Sort Key: (count(*)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> Finalize GroupAggregate (cost=651469.34..814800.29 rows=644686
width=16) (actual time=4860.272..4986.960 rows=132661 loops=1)
Group Key: u.id
-> Gather Merge (cost=651469.34..801906.57 rows=1289372
width=16) (actual time=4860.263..4925.715 rows=135627 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=650469.31..652081.03 rows=644686
width=16) (actual time=4852.214..4858.651 rows=45209 loops=3)
Sort Key: u.id
Sort Method: quicksort Memory: 2966kB
Worker 0: Sort Method: quicksort Memory: 2954kB
Worker 1: Sort Method: quicksort Memory: 2927kB
-> Partial HashAggregate
(cost=538245.23..577244.80 rows=644686 width=16) (actual
time=4816.899..4831.099 rows=45209 loops=3)
Group Key: u.id
Planned Partitions: 8 Batches: 1 Memory
Usage: 6673kB
Worker 0: Batches: 1 Memory Usage: 6673kB
Worker 1: Batches: 1 Memory Usage: 6673kB
-> Parallel Hash Join
(cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2707.748..3833.753
rows=3333333 loops=3)
Hash Cond: (h.start_id = u.id)
-> Parallel Hash Join
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.647..1876.387
rows=3333333 loops=3
)
Hash Cond: (h.end_id = b.id)
-> Parallel Seq Scan on
"HAS_INTERACTION" h (cost=0.00..184515.47 rows=4166747 width=16) (actual
time=0.039..507.
552 rows=3333333 loops=3)
-> Parallel Hash
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.853..16.854
rows=31133 loops=3)
Buckets: 131072 Batches:
1 Memory Usage: 4768kB
-> Parallel Index Only
Scan using idx_book_id on "Book" b (cost=0.29..1888.44 rows=38916 width=8)
(actual t
ime=0.049..7.166 rows=31133 loops=3)
Heap Fetches: 0
-> Parallel Hash
(cost=12990.05..12990.05 rows=268619 width=8) (actual time=113.746..113.747
rows=214895 loops=3)
Buckets: 262144 (originally
262144) Batches: 16 (originally 8) Memory Usage: 12352kB
-> Parallel Index Only Scan
using "User_id_idx" on "User" u (cost=0.42..12990.05 rows=268619 width=8)
(actual tim
e=0.046..50.956 rows=214895 loops=3)
Heap Fetches: 0
Planning Time: 0.929 ms
Execution Time: 5019.400 ms
(34 rows)
psql-17.5-5432-pgsql=#
```
**Original Cypher query without Book endpoint**
```
psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain
analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), id(u) ORDER BY
count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------
Limit (cost=34041184.37..34041184.40 rows=10 width=64) (actual
time=82382.918..82407.803 rows=10 loops=1)
-> Sort (cost=34041184.37..34042796.09 rows=644686 width=64) (actual
time=82382.916..82407.800 rows=10 loops=1)
Sort Key: ((count(*))::agtype) DESC
Sort Method: top-N heapsort Memory: 26kB
-> GroupAggregate (cost=1256126.22..34027252.94 rows=644686
width=64) (actual time=25704.595..82308.264 rows=132661 loops=1)
Group Key: age_id(_agtype_build_vertex(u.id,
_label_name('237668'::oid, u.id), u.properties))
-> Nested Loop (cost=1256126.22..21684786.36
rows=2465914571 width=32) (actual time=25702.797..79040.874 rows=10000000
loops=1)
-> Gather Merge (cost=1256126.21..2420813.48
rows=10000193 width=118) (actual time=25702.610..40552.783 rows=10000000
loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=1255126.18..1265543.05
rows=4166747 width=118) (actual time=25658.964..33652.909 rows=3333333 loops=3)
Sort Key:
(age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id),
u.properties)))
Sort Method: external merge Disk: 483984kB
Worker 0: Sort Method: external merge
Disk: 491056kB
Worker 1: Sort Method: external merge
Disk: 502712kB
-> Parallel Hash Join
(cost=21619.93..284274.14 rows=4166747 width=118) (actual
time=1935.223..13026.837 rows=3333333 loops=3
)
Hash Cond:
(_age_default_alias_0.start_id = u.id)
-> Parallel Seq Scan on
"HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747
width=16) (actual tim
e=0.050..845.514 rows=3333333 loops=3)
-> Parallel Hash
(cost=13802.19..13802.19 rows=268619 width=110) (actual time=318.674..318.675
rows=214895 loops=3)
Buckets: 65536 (originally
65536) Batches: 32 (originally 16) Memory Usage: 9824kB
-> Parallel Seq Scan on "User"
u (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.160..199.981
rows=214
895 loops=3)
-> Memoize (cost=0.01..1.54 rows=5 width=8) (actual
time=0.001..0.001 rows=1 loops=10000000)
Cache Key: _age_default_alias_0.end_id
Cache Mode: logical
Hits: 9907873 Misses: 92127 Evictions: 17229
Overflows: 0 Memory Usage: 8193kB
-> Append (cost=0.00..1.53 rows=5 width=8)
(actual time=0.011..0.020 rows=1 loops=92127)
-> Seq Scan on _ag_label_vertex
_age_default_alias_1_1 (cost=0.00..0.00 rows=1 width=8) (actual
time=0.000..0.000 rows=0 loop
s=92127)
Filter: (id =
_age_default_alias_0.end_id)
-> Index Only Scan using "User_id_idx" on
"User" _age_default_alias_1_2 (cost=0.42..0.44 rows=1 width=8) (actual
time=0.005..
0.005 rows=0 loops=92127)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using idx_book_id on
"Book" _age_default_alias_1_3 (cost=0.29..0.31 rows=1 width=8) (actual
time=0.004..0.
005 rows=1 loops=92127)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using idx_author_id on
"Author" _age_default_alias_1_4 (cost=0.29..0.31 rows=1 width=8) (actual
time=0.003
..0.003 rows=0 loops=92127)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using idx_review_id on
"Review" _age_default_alias_1_5 (cost=0.43..0.45 rows=1 width=8) (actual
time=0.005
..0.005 rows=0 loops=92127)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
Planning Time: 1.476 ms
Execution Time: 82440.110 ms
(42 rows)
psql-17.5-5432-pgsql=#
```
**Original Cypher query with Book endpoint**
```
psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain
analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), id(u) ORDER
BY count(*) DESC LIMIT 10 $$) AS (count agtype, id agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------
Limit (cost=1007580.65..1007580.67 rows=10 width=64) (actual
time=18222.030..18248.143 rows=10 loops=1)
-> Sort (cost=1007580.65..1009192.36 rows=644686 width=64) (actual
time=18222.029..18248.140 rows=10 loops=1)
Sort Key: ((count(*))::agtype) DESC
Sort Method: top-N heapsort Memory: 26kB
-> Finalize GroupAggregate (cost=823871.40..993649.21 rows=644686
width=64) (actual time=17789.320..18166.135 rows=132661 loops=1)
Group Key: (age_id(_agtype_build_vertex(u.id,
_label_name('237668'::oid, u.id), u.properties)))
-> Gather Merge (cost=823871.40..974308.63 rows=1289372
width=40) (actual time=17789.302..17949.998 rows=132923 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=822871.37..824483.09 rows=644686
width=40) (actual time=17762.365..17772.000 rows=44308 loops=3)
Sort Key: (age_id(_agtype_build_vertex(u.id,
_label_name('237668'::oid, u.id), u.properties)))
Sort Method: quicksort Memory: 3503kB
Worker 0: Sort Method: quicksort Memory: 3432kB
Worker 1: Sort Method: quicksort Memory: 3393kB
-> Partial HashAggregate
(cost=674786.11..743035.36 rows=644686 width=40) (actual
time=17462.389..17476.932 rows=44308 loops=3)
Group Key:
age_id(_agtype_build_vertex(u.id, _label_name('237668'::oid, u.id),
u.properties))
Planned Partitions: 16 Batches: 1 Memory
Usage: 6673kB
Worker 0: Batches: 1 Memory Usage: 6417kB
Worker 1: Batches: 1 Memory Usage: 6417kB
-> Parallel Hash Join
(cost=23994.82..342748.46 rows=4166747 width=32) (actual
time=2755.273..14367.058 rows=3333333 loops=3)
Hash Cond:
(_age_default_alias_0.start_id = u.id)
-> Parallel Hash Join
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.622..1915.252
rows=3333333 loops=3
)
Hash Cond:
(_age_default_alias_0.end_id = _age_default_alias_1.id)
-> Parallel Seq Scan on
"HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747
width=16) (actu
al time=0.024..519.098 rows=3333333 loops=3)
-> Parallel Hash
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=17.200..17.201
rows=31133 loops=3)
Buckets: 131072 Batches:
1 Memory Usage: 4736kB
-> Parallel Index Only
Scan using idx_book_id on "Book" _age_default_alias_1 (cost=0.29..1888.44
rows=38916
width=8) (actual time=0.070..7.131 rows=31133 loops=3)
Heap Fetches: 0
-> Parallel Hash
(cost=13802.19..13802.19 rows=268619 width=110) (actual time=118.590..118.590
rows=214895 loops=3)
Buckets: 65536 (originally
65536) Batches: 32 (originally 16) Memory Usage: 9792kB
-> Parallel Seq Scan on "User"
u (cost=0.00..13802.19 rows=268619 width=110) (actual time=0.046..44.205
rows=2148
95 loops=3)
Planning Time: 2.422 ms
Execution Time: 18251.645 ms
(33 rows)
psql-17.5-5432-pgsql=#
```
**Cypher query without Book endpoint using the new vars**
```
psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain
analyze MATCH (u:User)-[:HAS_INTERACTION]->() RETURN count(*), u_idc ORDER BY
count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------
Subquery Scan on _ (cost=6539203.76..6539203.81 rows=10 width=64) (actual
time=10232.518..10232.605 rows=10 loops=1)
-> Limit (cost=6539203.76..6539203.79 rows=10 width=40) (actual
time=10232.514..10232.595 rows=10 loops=1)
-> Sort (cost=6539203.76..6540815.48 rows=644686 width=40)
(actual time=10232.512..10232.592 rows=10 loops=1)
Sort Key: ((count(*))::agtype) DESC
Sort Method: top-N heapsort Memory: 26kB
-> Finalize GroupAggregate (cost=786218.01..6525272.33
rows=644686 width=40) (actual time=2704.329..10140.579 rows=132661 loops=1)
Group Key: u.id
-> Gather Merge (cost=786218.01..6510766.89
rows=1289372 width=16) (actual time=2704.314..10002.115 rows=133999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate
(cost=785217.99..6360941.35 rows=644686 width=16) (actual
time=2400.315..9630.184 rows=44666 loops=3)
Group Key: u.id
-> Nested Loop
(cost=785217.99..1217172.46 rows=1027464405 width=8) (actual
time=2399.584..9121.389 rows=3333333 loops=3)
-> Merge Join
(cost=785217.98..885067.31 rows=4166747 width=16) (actual
time=2399.385..5151.825 rows=3333333 loops=3)
Merge Cond:
(_age_default_alias_0.start_id = u.id)
-> Sort
(cost=785081.51..795498.38 rows=4166747 width=16) (actual
time=2399.313..3120.343 rows=3333333 loops=3)
Sort Key:
_age_default_alias_0.start_id
Sort Method: external
merge Disk: 87984kB
Worker 0: Sort Method:
external merge Disk: 83192kB
Worker 1: Sort Method:
external merge Disk: 83280kB
-> Parallel Seq Scan on
"HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747
width=16)
(actual time=0.045..842.639 rows=3333333 loops=3)
-> Materialize
(cost=0.42..18362.43 rows=644686 width=8) (actual time=0.059..560.710
rows=3933332 loops=3)
-> Index Only Scan using
"User_id_idx" on "User" u (cost=0.42..16750.72 rows=644686 width=8) (actual
time=0
.056..142.077 rows=644665 loops=3)
Heap Fetches: 0
-> Memoize (cost=0.01..1.54 rows=5
width=8) (actual time=0.001..0.001 rows=1 loops=10000000)
Cache Key:
_age_default_alias_0.end_id
Cache Mode: logical
Hits: 3386794 Misses: 70936
Evictions: 0 Overflows: 0 Memory Usage: 7759kB
Worker 0: Hits: 3200070
Misses: 69560 Evictions: 0 Overflows: 0 Memory Usage: 7609kB
Worker 1: Hits: 3201422
Misses: 71218 Evictions: 0 Overflows: 0 Memory Usage: 7790kB
-> Append (cost=0.00..1.53
rows=5 width=8) (actual time=0.010..0.019 rows=1 loops=211714)
-> Seq Scan on
_ag_label_vertex _age_default_alias_1_1 (cost=0.00..0.00 rows=1 width=8)
(actual time=0.000.
.0.000 rows=0 loops=211714)
Filter: (id =
_age_default_alias_0.end_id)
-> Index Only Scan using
"User_id_idx" on "User" _age_default_alias_1_2 (cost=0.42..0.44 rows=1
width=8) (a
ctual time=0.004..0.004 rows=0 loops=211714)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using
idx_book_id on "Book" _age_default_alias_1_3 (cost=0.29..0.31 rows=1 width=8)
(act
ual time=0.004..0.004 rows=1 loops=211714)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using
idx_author_id on "Author" _age_default_alias_1_4 (cost=0.29..0.31 rows=1
width=8)
(actual time=0.003..0.003 rows=0 loops=211714)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
-> Index Only Scan using
idx_review_id on "Review" _age_default_alias_1_5 (cost=0.43..0.45 rows=1
width=8)
(actual time=0.005..0.005 rows=0 loops=211714)
Index Cond: (id =
_age_default_alias_0.end_id)
Heap Fetches: 0
Planning Time: 1.860 ms
Execution Time: 10250.143 ms
(47 rows)
psql-17.5-5432-pgsql=#
```
**Cypher query with Book endpoint using the new vars**
```
psql-17.5-5432-pgsql=# SELECT * FROM cypher('goodreads_graph_x', $$explain
analyze MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN count(*), u_idc ORDER
BY count(*) DESC LIMIT 10 $$) AS (count agtype, u_idc agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------
Subquery Scan on _ (cost=830343.44..830343.49 rows=10 width=64) (actual
time=5091.056..5118.571 rows=10 loops=1)
-> Limit (cost=830343.44..830343.47 rows=10 width=40) (actual
time=5091.052..5118.561 rows=10 loops=1)
-> Sort (cost=830343.44..831955.16 rows=644686 width=40) (actual
time=5091.050..5118.558 rows=10 loops=1)
Sort Key: ((count(*))::agtype) DESC
Sort Method: top-N heapsort Memory: 26kB
-> Finalize GroupAggregate (cost=651469.34..816412.01
rows=644686 width=40) (actual time=4851.058..5033.469 rows=132661 loops=1)
Group Key: u.id
-> Gather Merge (cost=651469.34..801906.57
rows=1289372 width=16) (actual time=4851.043..4917.321 rows=135554 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=650469.31..652081.03 rows=644686
width=16) (actual time=4840.942..4849.306 rows=45185 loops=3)
Sort Key: u.id
Sort Method: quicksort Memory: 2962kB
Worker 0: Sort Method: quicksort Memory:
2914kB
Worker 1: Sort Method: quicksort Memory:
2969kB
-> Partial HashAggregate
(cost=538245.23..577244.80 rows=644686 width=16) (actual
time=4803.703..4818.569 rows=45185 loops=3)
Group Key: u.id
Planned Partitions: 8 Batches: 1
Memory Usage: 6673kB
Worker 0: Batches: 1 Memory Usage:
6673kB
Worker 1: Batches: 1 Memory Usage:
6673kB
-> Parallel Hash Join
(cost=19772.67..303865.71 rows=4166747 width=8) (actual time=2694.604..3822.223
rows=3333333 loop
s=3)
Hash Cond:
(_age_default_alias_0.start_id = u.id)
-> Parallel Hash Join
(cost=2374.89..219877.18 rows=4166747 width=8) (actual time=17.696..1863.307
rows=3333333 l
oops=3)
Hash Cond:
(_age_default_alias_0.end_id = _age_default_alias_1.id)
-> Parallel Seq Scan on
"HAS_INTERACTION" _age_default_alias_0 (cost=0.00..184515.47 rows=4166747
width=16)
(actual time=0.043..505.487 rows=3333333 loops=3)
-> Parallel Hash
(cost=1888.44..1888.44 rows=38916 width=8) (actual time=16.889..16.890
rows=31133 loops=3)
Buckets: 131072
Batches: 1 Memory Usage: 4736kB
-> Parallel Index
Only Scan using idx_book_id on "Book" _age_default_alias_1 (cost=0.29..1888.44
rows
=38916 width=8) (actual time=0.068..7.123 rows=31133 loops=3)
Heap Fetches: 0
-> Parallel Hash
(cost=12990.05..12990.05 rows=268619 width=8) (actual time=115.445..115.446
rows=214895 loops=3)
Buckets: 262144
(originally 262144) Batches: 16 (originally 8) Memory Usage: 12352kB
-> Parallel Index Only
Scan using "User_id_idx" on "User" u (cost=0.42..12990.05 rows=268619 width=8)
(actu
al time=0.056..51.793 rows=214895 loops=3)
Heap Fetches: 0
Planning Time: 1.453 ms
Execution Time: 5123.253 ms
(35 rows)
psql-17.5-5432-pgsql=#
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]