Re: [I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

via GitHub Wed, 06 Aug 2025 07:23:58 -0700


ysvoon commented on issue #2198:
URL: https://github.com/apache/age/issues/2198#issuecomment-3160397531


   > This is due to the 2 MATCH commands used in creating the edges, as you are 
building your edges from the Person table; this is a O(n^2) operation.
   > 
   > You might want to try the MERGE command.
   
   I see, I've tried the MERGE command today and there are a few issues I have 
observed:
   1. The performance is alright, but again after >200k entries it starts to 
slow down. In the beginning it was taking just a few seconds to create 100 
edges, but after the >200k mark, it takes >1 minute to create 100 edges. Was 
thinking could it be that after a certain point, indexing won't help and for 
MERGE, where it still does a `Seq Scan` on the edges, maybe this is why the 
performance worsens with time?
   2. If I use the MERGE command like this, it creates duplicated vertices, so 
I don't think this is the correct way.
       ```
      SELECT * FROM cypher('schema_name', $$
      MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person 
{PersonId: '<redacted>'})
        SET e.StartId = '<redacted>', e.EndId = '<redacted>'
        RETURN e $$)
        as (e agtype);
       ```
   
   3. So I tried the MERGE command as recommended here 
https://github.com/apache/age/issues/1517, but it doesn't create the properties 
(returned empty), and I had to run this same query for the second time for the 
properties to show up, same as this issue here 
https://github.com/apache/age/issues/1907, although it says here that it has 
been resolved, I am seeing this issue still.
      ```
      SELECT * FROM cypher('schema_name', $$
      MERGE (a: Person {PersonId: '<redacted>'})
      MERGE (b: Person {PersonId: '<redacted>'})
      MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person 
{PersonId: '<redacted>'})
        SET e.StartId = '<redacted>', e.EndId = '<redacted>'
        RETURN e $$)
        as (e agtype);
      ```
     Query Plan:
     ```
   "Custom Scan (Cypher Set)  (cost=0.00..0.00 rows=0 width=32) (actual 
time=822.972..822.984 rows=1 loops=1)"
   "  ->  Subquery Scan on _age_default_alias_previous_cypher_clause  
(cost=0.00..0.00 rows=1 width=32) (actual time=822.903..822.915 rows=1 loops=1)"
   "        ->  Custom Scan (Cypher Merge)  (cost=0.00..0.00 rows=0 width=448) 
(actual time=822.901..822.911 rows=1 loops=1)"
   "              ->  Subquery Scan on 
_age_default_alias_previous_cypher_clause_1  (cost=0.00..1475455.11 
rows=73770921 width=32) (actual time=0.180..0.190 rows=1 loops=1)"
   "                    ->  Hash Right Join  (cost=0.00..737745.90 
rows=73770921 width=192) (actual time=0.180..0.188 rows=1 loops=1)"
   "                          Hash Cond: ((e.start_id = 
(age_id(_age_default_alias_previous_cypher_clause_2.a))::graphid) AND (e.end_id 
= (age_id(_age_default_alias_previous_cypher_clause_2.b))::graphid))"
   "                          Join Filter: 
((age_properties(_age_default_alias_previous_cypher_clause_2.a) @> 
'{""PersonId"": ""<redacted>""}'::agtype) AND 
(age_properties(_age_default_alias_previous_cypher_clause_2.b) @> 
'{""PersonId"": ""<redacted""}'::agtype))"
   "                          ->  Seq Scan on ""edge_name"" e  
(cost=0.00..24.55 rows=970 width=48) (actual time=0.002..0.002 rows=0 loops=1)"
   "                          ->  Hash  (cost=0.00..0.00 rows=0 width=64) 
(actual time=0.168..0.172 rows=1 loops=1)"
   "                                Buckets: 1024  Batches: 1  Memory Usage: 
9kB"
   "                                ->  Custom Scan (Cypher Merge)  
(cost=0.00..0.00 rows=0 width=64) (actual time=0.158..0.162 rows=1 loops=1)"
   "                                      ->  Subquery Scan on 
_age_default_alias_previous_cypher_clause_2  (cost=130.57..1321589.78 
rows=73770921 width=64) (actual time=0.158..0.162 rows=1 loops=1)"
   "                                            ->  Nested Loop Left Join  
(cost=130.57..583880.57 rows=73770921 width=128) (actual time=0.157..0.161 
rows=1 loops=1)"
   "                                                  ->  Custom Scan (Cypher 
Merge)  (cost=0.00..0.00 rows=0 width=32) (actual time=0.071..0.072 rows=1 
loops=1)"
   "                                                        ->  Subquery Scan 
on _age_default_alias_previous_cypher_clause_3  (cost=130.57..30641.61 
rows=8589 width=32) (actual time=0.070..0.072 rows=1 loops=1)"
   "                                                              ->  Bitmap 
Heap Scan on ""Person"" a  (cost=130.57..30555.72 rows=8589 width=64) (actual 
time=0.070..0.071 rows=1 loops=1)"
   "                                                                    Recheck 
Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
   "                                                                    Heap 
Blocks: exact=1"
   "                                                                    ->  
Bitmap Index Scan on index_name  (cost=0.00..128.42 rows=8589 width=0) (actual 
time=0.061..0.061 rows=1 loops=1)"
   "                                                                          
Index Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
   "                                                  ->  Bitmap Heap Scan on 
""Person"" b  (cost=130.57..30512.77 rows=8589 width=32) (actual 
time=0.083..0.084 rows=1 loops=1)"
   "                                                        Recheck Cond: 
(properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
   "                                                        Heap Blocks: 
exact=1"
   "                                                        ->  Bitmap Index 
Scan on index_name (cost=0.00..128.42 rows=8589 width=0) (actual 
time=0.079..0.080 rows=1 loops=1)"
   "                                                              Index Cond: 
(properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
   "Planning Time: 0.390 ms"
   "Execution Time: 823.126 ms"
     ```
   
   
   > Without understanding your use case or dataset, it is hard to suggest 
alternatives.
   
   We have 3 vertex graph tables, and they are each in the volume of around 8M 
nodes as of now, and we need to create edges to represent the relationships 
between them. Currently, there are 3 types of edges that needs to be created, 
and each edge graph table will create around 8M edges. There are 3 such edge 
graph tables, so in total there are approximately >24M edges to be created.
   
   Currently, it still seems too slow with the MERGE clause and I am unsure if 
this is normal, but would there be any other best practices that we can try to 
apply to speed up this process?
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] Performance Issue in Edge Creation for Large Tables and Index Creation [age]

Reply via email to