ysvoon commented on issue #2198:
URL: https://github.com/apache/age/issues/2198#issuecomment-3160397531
> This is due to the 2 MATCH commands used in creating the edges, as you are
building your edges from the Person table; this is a O(n^2) operation.
>
> You might want to try the MERGE command.
I see, I've tried the MERGE command today and there are a few issues I have
observed:
1. The performance is alright, but again after >200k entries it starts to
slow down. In the beginning it was taking just a few seconds to create 100
edges, but after the >200k mark, it takes >1 minute to create 100 edges. Was
thinking could it be that after a certain point, indexing won't help and for
MERGE, where it still does a `Seq Scan` on the edges, maybe this is why the
performance worsens with time?
2. If I use the MERGE command like this, it creates duplicated vertices, so
I don't think this is the correct way.
```
SELECT * FROM cypher('schema_name', $$
MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person
{PersonId: '<redacted>'})
SET e.StartId = '<redacted>', e.EndId = '<redacted>'
RETURN e $$)
as (e agtype);
```
3. So I tried the MERGE command as recommended here
https://github.com/apache/age/issues/1517, but it doesn't create the properties
(returned empty), and I had to run this same query for the second time for the
properties to show up, same as this issue here
https://github.com/apache/age/issues/1907, although it says here that it has
been resolved, I am seeing this issue still.
```
SELECT * FROM cypher('schema_name', $$
MERGE (a: Person {PersonId: '<redacted>'})
MERGE (b: Person {PersonId: '<redacted>'})
MERGE (a: Person {PersonId: '<redacted>'})-[e:edge_name]->(b: Person
{PersonId: '<redacted>'})
SET e.StartId = '<redacted>', e.EndId = '<redacted>'
RETURN e $$)
as (e agtype);
```
Query Plan:
```
"Custom Scan (Cypher Set) (cost=0.00..0.00 rows=0 width=32) (actual
time=822.972..822.984 rows=1 loops=1)"
" -> Subquery Scan on _age_default_alias_previous_cypher_clause
(cost=0.00..0.00 rows=1 width=32) (actual time=822.903..822.915 rows=1 loops=1)"
" -> Custom Scan (Cypher Merge) (cost=0.00..0.00 rows=0 width=448)
(actual time=822.901..822.911 rows=1 loops=1)"
" -> Subquery Scan on
_age_default_alias_previous_cypher_clause_1 (cost=0.00..1475455.11
rows=73770921 width=32) (actual time=0.180..0.190 rows=1 loops=1)"
" -> Hash Right Join (cost=0.00..737745.90
rows=73770921 width=192) (actual time=0.180..0.188 rows=1 loops=1)"
" Hash Cond: ((e.start_id =
(age_id(_age_default_alias_previous_cypher_clause_2.a))::graphid) AND (e.end_id
= (age_id(_age_default_alias_previous_cypher_clause_2.b))::graphid))"
" Join Filter:
((age_properties(_age_default_alias_previous_cypher_clause_2.a) @>
'{""PersonId"": ""<redacted>""}'::agtype) AND
(age_properties(_age_default_alias_previous_cypher_clause_2.b) @>
'{""PersonId"": ""<redacted""}'::agtype))"
" -> Seq Scan on ""edge_name"" e
(cost=0.00..24.55 rows=970 width=48) (actual time=0.002..0.002 rows=0 loops=1)"
" -> Hash (cost=0.00..0.00 rows=0 width=64)
(actual time=0.168..0.172 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage:
9kB"
" -> Custom Scan (Cypher Merge)
(cost=0.00..0.00 rows=0 width=64) (actual time=0.158..0.162 rows=1 loops=1)"
" -> Subquery Scan on
_age_default_alias_previous_cypher_clause_2 (cost=130.57..1321589.78
rows=73770921 width=64) (actual time=0.158..0.162 rows=1 loops=1)"
" -> Nested Loop Left Join
(cost=130.57..583880.57 rows=73770921 width=128) (actual time=0.157..0.161
rows=1 loops=1)"
" -> Custom Scan (Cypher
Merge) (cost=0.00..0.00 rows=0 width=32) (actual time=0.071..0.072 rows=1
loops=1)"
" -> Subquery Scan
on _age_default_alias_previous_cypher_clause_3 (cost=130.57..30641.61
rows=8589 width=32) (actual time=0.070..0.072 rows=1 loops=1)"
" -> Bitmap
Heap Scan on ""Person"" a (cost=130.57..30555.72 rows=8589 width=64) (actual
time=0.070..0.071 rows=1 loops=1)"
" Recheck
Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
" Heap
Blocks: exact=1"
" ->
Bitmap Index Scan on index_name (cost=0.00..128.42 rows=8589 width=0) (actual
time=0.061..0.061 rows=1 loops=1)"
"
Index Cond: (properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
" -> Bitmap Heap Scan on
""Person"" b (cost=130.57..30512.77 rows=8589 width=32) (actual
time=0.083..0.084 rows=1 loops=1)"
" Recheck Cond:
(properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
" Heap Blocks:
exact=1"
" -> Bitmap Index
Scan on index_name (cost=0.00..128.42 rows=8589 width=0) (actual
time=0.079..0.080 rows=1 loops=1)"
" Index Cond:
(properties @> '{""PersonId"": ""<redacted>""}'::agtype)"
"Planning Time: 0.390 ms"
"Execution Time: 823.126 ms"
```
> Without understanding your use case or dataset, it is hard to suggest
alternatives.
We have 3 vertex graph tables, and they are each in the volume of around 8M
nodes as of now, and we need to create edges to represent the relationships
between them. Currently, there are 3 types of edges that needs to be created,
and each edge graph table will create around 8M edges. There are 3 such edge
graph tables, so in total there are approximately >24M edges to be created.
Currently, it still seems too slow with the MERGE clause and I am unsure if
this is normal, but would there be any other best practices that we can try to
apply to speed up this process?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]