Zainab-Saad commented on issue #982:
URL: https://github.com/apache/age/issues/982#issuecomment-1752059252
Tested with around 1048567 edges, 12637040 and 9960576 two differently
labelled vertices.
Label redesign branch chooses normal seq scan over parallel seq scan for
simple edge finding `MATCH(:Person)-[e:EDGE]->(:Title)`
In the following queries, I have first tested without changing any
configuration parameters, but after that changed the `parallel_tuple_cost`
parameter to 0 in order to force a parallel seq scan for testing. This resulted
in less than half the execution time as before.
```
agedb=# SHOW parallel_tuple_cost;
parallel_tuple_cost
---------------------
0.1
(1 row)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()
RETURN e
$$) AS (result agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on "EDGE" e (cost=0.00..33392.76 rows=1048567 width=32) (actual
time=0.047..4619.293 rows=1048567 loops=1)
Filter: (start_label_id = 3)
Planning Time: 0.220 ms
Execution Time: 4670.069 ms
(4 rows)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)
RETURN e
$$) AS (result agtype);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on "EDGE" e (cost=0.00..36014.17 rows=1048567 width=32) (actual
time=0.045..4615.328 rows=1048567 loops=1)
Filter: ((start_label_id = 3) AND (end_label_id = 5))
Planning Time: 0.150 ms
Execution Time: 4665.713 ms
(4 rows)
agedb=# SET parallel_tuple_cost TO 0;
SET
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()
RETURN e
$$) AS (result agtype);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..20630.32 rows=1048567 width=32) (actual
time=0.561..1606.535 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..19630.32 rows=436903
width=32) (actual time=0.246..1543.629 rows=349522 loops=3)
Filter: (start_label_id = 3)
Planning Time: 0.147 ms
Execution Time: 1660.394 ms
(7 rows)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)
RETURN e
$$) AS (result agtype);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..21722.57 rows=1048567 width=32) (actual
time=0.401..1622.878 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..20722.57 rows=436903
width=32) (actual time=0.219..1554.710 rows=349522 loops=3)
Filter: ((start_label_id = 3) AND (end_label_id = 5))
Planning Time: 0.110 ms
Execution Time: 1676.656 ms
(7 rows)
agedb=# SHOW parallel_setup_cost;
parallel_setup_cost
---------------------
1000
(1 row)
agedb=# SET parallel_setup_cost TO 10;
SET
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()
RETURN e
$$) AS (result agtype);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=10.00..19640.32 rows=1048567 width=32) (actual
time=0.471..1604.505 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..19630.32 rows=436903
width=32) (actual time=0.347..1540.499 rows=349522 loops=3)
Filter: (start_label_id = 3)
Planning Time: 0.147 ms
Execution Time: 1658.601 ms
(7 rows)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)
RETURN e
$$) AS (result agtype);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=10.00..20732.57 rows=1048567 width=32) (actual
time=0.341..1617.122 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..20722.57 rows=436903
width=32) (actual time=0.240..1553.976 rows=349522 loops=3)
Filter: ((start_label_id = 3) AND (end_label_id = 5))
Planning Time: 0.110 ms
Execution Time: 1672.127 ms
(7 rows)
```
Below is the QPT for these queries run on AGE (without label redesign and
the HEAD of the branch is at commit `b4574f5df8de721eed0ed67a2788075b387af418`
which is the base of the
[label_redesign](https://github.com/rafsun42/age/tree/label_redesign) branch)
I have first tested without changing any configuration parameters, but after
that changed the `max_parallel_workers_per_gather ` to 0 in order to force the
normal seq scan instead of parallel scan (did this for comparison of execution
times with the above QPTs).
```
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()
RETURN e
$$) AS (result agtype);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..15799.77 rows=5243 width=32) (actual
time=0.343..875.155 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..14275.47 rows=2185
width=32) (actual time=0.414..819.148 rows=349522 loops=3)
Filter: ((_extract_label_id(start_id))::integer = 3)
Planning Time: 0.095 ms
Execution Time: 929.516 ms
(7 rows)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)
RETURN e
$$) AS (result agtype);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..17451.71 rows=26 width=32) (actual
time=0.491..879.482 rows=1048567 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "EDGE" e (cost=0.00..16449.11 rows=11 width=32)
(actual time=3.546..829.526 rows=349522 loops=3)
Filter: (((_extract_label_id(start_id))::integer = 3) AND
((_extract_label_id(end_id))::integer = 4))
Planning Time: 0.137 ms
Execution Time: 933.877 ms
(7 rows)
agedb=# SHOW max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
2
(1 row)
agedb=# SET max_parallel_workers_per_gather TO 0;
SET
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->()
RETURN e
$$) AS (result agtype);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on "EDGE" e (cost=0.00..23465.72 rows=5243 width=32) (actual
time=0.088..2397.991 rows=1048567 loops=1)
Filter: ((_extract_label_id(start_id))::integer = 3)
Planning Time: 0.115 ms
Execution Time: 2448.949 ms
(4 rows)
agedb=# SELECT * FROM cypher('imdb', $$
EXPLAIN ANALYZE MATCH (:Person)-[e: EDGE]->(:Title)
RETURN e
$$) AS (result agtype);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Seq Scan on "EDGE" e (cost=0.00..28682.47 rows=26 width=32) (actual
time=0.081..2464.521 rows=1048567 loops=1)
Filter: (((_extract_label_id(start_id))::integer = 3) AND
((_extract_label_id(end_id))::integer = 4))
Planning Time: 0.127 ms
Execution Time: 2515.350 ms
(4 rows)
```
@rafsun42 @CapnSpek
Do you think is it something for which we should further look into code to
improve this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]