Re: [I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

via GitHub Mon, 04 Aug 2025 19:14:26 -0700


ysvoon commented on issue #2198:
URL: https://github.com/apache/age/issues/2198#issuecomment-3153036994


   > [@ysvoon](https://github.com/ysvoon) Could you provide the query plan? 
Also, are you increasing the size of Person? Meaning, adding in more people?
   > 
   > For the query plan, use explain analyze -
   > 
   > ```
   > SELECT * FROM cypher('schema_name', $$ EXPLAIN ANALYZE 
   >                     WITH {cypher_array} AS rows
   >                     UNWIND rows AS row
   >                     MATCH (a:Person {PersonId: row.StartId}),
   >                           (b:Person {PersonId: row.EndId})
   >                     CREATE (a)-[e:{edge_name} {{
   >                         StartId: row.StartId,
   >                         EndId: row.EndId,
   >                         otherprops: row.other_props
   >                     }}]->(b)
   >                 $$) AS (e agtype);
   > ```
   
   The size of Person table will likely stay within this range (around 8M 
nodes), and it is only for the first time full load that we need to create 
these edges from the Person table, which will create around 8M edges, and we 
have two other edge tables of such sizes to be created.
   
   
   For such a query,
   
   ```
   EXPLAIN ANALYZE SELECT * FROM cypher('schema_name', $$
        MATCH (a: Person {PersonId: '<redacted>'}), (b: Person {PersonId: 
'<redacted>'})
        CREATE (a)-[e:edge_name{StartId: '<redacted>', EndId: 
'<redacted>'}]->(b)
        RETURN e $$)
        as (e agtype);
   ```
   
   The query plan is as follows:
   
   ```
   "Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32) (actual 
time=1161.431..3478.889 rows=2 loops=1)"
   "  ->  Subquery Scan on _age_default_alias_previous_cypher_clause  
(cost=261.17..3383466.14 rows=73822464 width=32) (actual time=0.243..0.270 
rows=2 loops=1)"
   "        ->  Nested Loop  (cost=261.17..2645241.50 rows=73822464 width=192) 
(actual time=0.243..0.267 rows=2 loops=1)"
   "              ->  Bitmap Heap Scan on ""Person"" a  (cost=130.58..30716.89 
rows=8592 width=376) (actual time=0.139..0.140 rows=1 loops=1)"
   "                    Recheck Cond: (properties @> '{""PersonId"": 
""<redacted>""}'::agtype)"
   "                    Heap Blocks: exact=1"
   "                    ->  Bitmap Index Scan on gin_idx_person_id  
(cost=0.00..128.44 rows=8592 width=0) (actual time=0.129..0.129 rows=1 loops=1)"
   "                          Index Cond: (properties @> '{""PersonId"": 
""<redacted>""}'::agtype)"
   "              ->  Materialize  (cost=130.58..30759.85 rows=8592 width=376) 
(actual time=0.079..0.092 rows=2 loops=1)"
   "                    ->  Bitmap Heap Scan on ""Person"" b  
(cost=130.58..30716.89 rows=8592 width=376) (actual time=0.076..0.086 rows=2 
loops=1)"
   "                          Recheck Cond: (properties @> '{""PersonId"": 
""<redacted>""}'::agtype)"
   "                          Heap Blocks: exact=2"
   "                          ->  Bitmap Index Scan on gin_idx_person_id  
(cost=0.00..128.44 rows=8592 width=0) (actual time=0.073..0.073 rows=2 loops=1)"
   "                                Index Cond: (properties @> '{""PersonId"": 
""<redacted>""}'::agtype)"
   "Planning Time: 0.373 ms"
   "Execution Time: 3478.965 ms"
   
   ```
   I understood we have two bitmap heap scans on the Person table because we 
are doing 2 MATCH clause to match the start_id and end_id for the edge 
creation, and it seems like when I'm using GIN index on just `properties`, the 
query plan shows that it is using the index, but how efficiently it is used I 
am not sure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

Reply via email to