vladiksun opened a new issue, #2020:
URL: https://github.com/apache/age/issues/2020
**Describe the bug**
Potential memory leak
**How are you accessing AGE (Command line, driver, etc.)?**
- JDBC
**What data setup do we need to do?**
```pgsql
select ag_catalog.create_graph('test_graph');
--- create test labels ---
select ag_catalog.create_vlabel('test_graph','domain');
select ag_catalog.create_vlabel('test_graph','user');
--- create test relation ---
select ag_catalog.create_elabel('test_graph','edges');
--- generate test data, should take about 2 minutes ---
DO
$do$
BEGIN
    FOR i IN 1..8000 LOOP
        EXECUTE format('
            select * from ag_catalog.cypher(''test_graph'',
            $$
                CREATE (Org_level_1_%s: domain {id: "Org_level_1_%s", type: "Organization",
                        name: "Org_level_1_%s some real name that could looooooooooooooong", hidden: false})
                CREATE (Org_level_2_%s: domain {id: "Org_level_2_%s", type: "Organization",
                        name: "Org_level_2_%s some real name that could looooooooooooooong", hidden: false})
                CREATE (Org_level_3_%s: domain {id: "Org_level_3_%s", type: "Organization",
                        name: "Org_level_3_%s some real name that could looooooooooooooong", hidden: false})
                CREATE (Org_level_4_%s: domain {id: "Org_level_4_%s", type: "Organization",
                        name: "Org_level_4_%s some real name that could looooooooooooooong", hidden: false})
                CREATE (Dep_level_5_%s: domain {id: "Dep_level_5_%s", type: "Department",
                        name: "Dep_level_5_%s some real name that could looooooooooooooong", hidden: false})
                CREATE (User_%s: user {id: "User_%s", type: "User", name: "user_%s", hidden: false})
                CREATE (User_%s)-[_rel_user:edges {from: ''User_%s'', to: ''Dep_level_5_%s'', hidden: false,
                        toCollection: ''domain'', fromCollection: ''user'', relation: ''user-domain''}]->(Dep_level_5_%s)
                CREATE (Dep_level_5_%s)-[_rel1:edges {from: ''Dep_level_5_%s'', to: ''Org_level_4_%s'', hidden: false,
                        toCollection: ''domain'', fromCollection: ''domain'', relation: ''parent''}]->(Org_level_4_%s)
                CREATE (Org_level_4_%s)-[_rel2:edges {from: ''Org_level_4_%s'', to: ''Org_level_3_%s'', hidden: false,
                        toCollection: ''domain'', fromCollection: ''domain'', relation: ''parent''}]->(Org_level_3_%s)
                CREATE (Org_level_3_%s)-[_rel3:edges {from: ''Org_level_3_%s'', to: ''Org_level_2_%s'', hidden: false,
                        toCollection: ''domain'', fromCollection: ''domain'', relation: ''parent''}]->(Org_level_2_%s)
                CREATE (Org_level_2_%s)-[_rel4:edges {from: ''Org_level_2_%s'', to: ''Org_level_1_%s'', hidden: false,
                        toCollection: ''domain'', fromCollection: ''domain'', relation: ''parent''}]->(Org_level_1_%s)
            $$
            ) as (any_vertex ag_catalog.agtype)
        ', VARIADIC ARRAY(SELECT array_fill('iter_' || i::text, '{100}')));
    END LOOP;
END
$do$;
```
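As a quick sanity check (a sketch, not part of the original scripts), the generated volume can be confirmed with simple counts; each loop iteration above creates 6 vertices and 5 edges:

```pgsql
-- Sketch: verify the generated data volume (8000 iterations x 6 vertices / 5 edges).
select * from ag_catalog.cypher('test_graph', $$
    MATCH (n) RETURN count(n)
$$) as (vertex_count ag_catalog.agtype);

select * from ag_catalog.cypher('test_graph', $$
    MATCH ()-[e:edges]->() RETURN count(e)
$$) as (edge_count ag_catalog.agtype);
```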
**What is the command that caused the error?**
```pgsql
-- run all three scripts below
-- create some data
select * from ag_catalog.cypher('test_graph', $$
    CREATE (d: domain {id: "test1", type: "Organization", name: "test1", hidden: false})
    RETURN d
$$) as (vertex agtype);

-- delete some data
select * from ag_catalog.cypher('test_graph', $$
    MATCH (any_vertex: domain {id: "test1"})
    DELETE any_vertex
    RETURN any_vertex
$$) as (catalog ag_catalog.agtype);

-- run query
select domain
from ag_catalog.cypher('test_graph', $$
    MATCH (user_vertex: user {hidden: false, id: 'User_iter_100'})
          -[e1:edges {hidden: false, fromCollection: 'user', toCollection: 'domain', relation: 'user-domain'}]->(to1: domain {hidden: false})
          -[e2:edges*0..10 {hidden: false, fromCollection: 'domain', toCollection: 'domain', relation: 'parent'}]->(to2: domain {hidden: false})
    RETURN to2
$$) as domain(domain ag_catalog.agtype);
```
```
The database went into recovery mode; on the client side this shows up as an I/O error.
```
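To watch the affected backend from outside (a sketch, not part of the original report), its PID can be looked up from a second session before running the scripts, and that process's resident memory watched at the OS level until the crash:

```pgsql
-- Sketch: from a second session, identify the backend running the scripts,
-- then watch that PID's memory at the OS level (e.g. top/ps) until it crashes.
select pid, state, query
from pg_stat_activity
where datname = current_database()
  and pid <> pg_backend_pid();
```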
**Expected behavior**
The server does not crash.
**Environment (please complete the following information):**
- Postgres 14 + Apache AGE 1.5.0
**Additional context**
It looks like the result of any Cypher script is cached after the first run: the second and all subsequent runs are faster until data is created, updated, or deleted, at which point the script executes slowly again. This leads to DB memory consumption that keeps growing and is never released.
For our real data (about 20 vertex labels with different connectedness over 3 million edges), a script similar to the one below takes around 5 seconds on the first run and around 300 ms on subsequent runs, until a create/update/delete happens.
```pgsql
select domain
from ag_catalog.cypher('test_graph', $$
    MATCH (user_vertex: user {hidden: false, id: 'User_iter_100'})
          -[e1:edges {hidden: false, fromCollection: 'user', toCollection: 'domain', relation: 'user-domain'}]->(to1: domain {hidden: false})
          -[e2:edges*0..10 {hidden: false, fromCollection: 'domain', toCollection: 'domain', relation: 'parent'}]->(to2: domain {hidden: false})
    RETURN to2
$$) as domain(domain ag_catalog.agtype);
```
Please take a look at the attached screenshot.
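To quantify the growth independently of the screenshot (a sketch, assuming PostgreSQL 14's pg_backend_memory_contexts view, which reports only the current session's backend), the same session can snapshot its memory contexts before and after the Cypher query and compare the totals:

```pgsql
-- Sketch: snapshot this backend's memory contexts; run before and after the
-- Cypher query above and compare total_bytes/used_bytes of the largest contexts.
select name, parent, total_bytes, used_bytes
from pg_backend_memory_contexts
order by total_bytes desc
limit 20;
```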
