vladiksun opened a new issue, #2020:
URL: https://github.com/apache/age/issues/2020

   **Describe the bug**
   
   Potential memory leak
   
   **How are you accessing AGE (Command line, driver, etc.)?**
   - JDBC
   
   **What data setup do we need to do?**
   ```pgsql
   
   select ag_catalog.create_graph('test_graph');
   
   --- create test labels ---
   select ag_catalog.create_vlabel('test_graph','domain');
   select ag_catalog.create_vlabel('test_graph','user');
   
   --- create test relation ---
   select ag_catalog.create_elabel('test_graph','edges');
   
   --- generate test data, should take about 2 minutes ---
   DO
   $do$
       BEGIN
           FOR i IN 1..8000 LOOP
                   EXECUTE format('
                   select * from ag_catalog.cypher(''test_graph'',
                   $$
                       CREATE (Org_level_1_%s: domain {id:"Org_level_1_%s", 
type:"Organization", name: "Org_level_1_%s some real name that could 
looooooooooooooong" ,hidden:false })
                       CREATE (Org_level_2_%s: domain {id:"Org_level_2_%s", 
type:"Organization", name: "Org_level_2_%s some real name that could 
looooooooooooooong" ,hidden:false })
                       CREATE (Org_level_3_%s: domain {id:"Org_level_3_%s", 
type:"Organization", name: "Org_level_3_%s some real name that could 
looooooooooooooong" ,hidden:false })
                       CREATE (Org_level_4_%s: domain {id:"Org_level_4_%s", 
type:"Organization", name: "Org_level_4_%s some real name that could 
looooooooooooooong" ,hidden:false })
                       CREATE (Dep_level_5_%s: domain {id:"Dep_level_5_%s", 
type:"Department", name: "Dep_level_5_%s some real name that could 
looooooooooooooong" ,hidden:false })
   
                       CREATE (User_%s: user { id:"User_%s", type:"User", name: 
"user_%s" ,hidden:false })
   
                       CREATE (User_%s)-[_rel_user:edges { from: ''User_%s'', 
to: ''Dep_level_5_%s'', hidden: false, toCollection: ''domain'', 
fromCollection: ''user'', relation: ''user-domain'' }]->(Dep_level_5_%s)
   
                       CREATE (Dep_level_5_%s)-[_rel1:edges { from: 
''Dep_level_5_%s'', to: ''Org_level_4_%s'', hidden: false, toCollection: 
''domain'', fromCollection: ''domain'', relation: ''parent'' 
}]->(Org_level_4_%s)
                       CREATE (Org_level_4_%s)-[_rel2:edges { from: 
''Org_level_4_%s'', to: ''Org_level_3_%s'', hidden: false, toCollection: 
''domain'', fromCollection: ''domain'', relation: ''parent'' 
}]->(Org_level_3_%s)
                       CREATE (Org_level_3_%s)-[_rel3:edges { from: 
''Org_level_3_%s'', to: ''Org_level_2_%s'', hidden: false, toCollection: 
''domain'', fromCollection: ''domain'', relation: ''parent'' 
}]->(Org_level_2_%s)
                       CREATE (Org_level_2_%s)-[_rel4:edges { from: 
''Org_level_2_%s'', to: ''Org_level_1_%s'', hidden: false, toCollection: 
''domain'', fromCollection: ''domain'', relation: ''parent'' 
}]->(Org_level_1_%s)
                   $$
               ) as (any_vertex ag_catalog.agtype)
               ', VARIADIC ARRAY(SELECT array_fill('iter_' || i::text, 
'{100}')) );
               END LOOP;
       END
   $do$;
   
   
   ```
   
   **What is the command that caused the error?**
   ```pgsql
   
   -- run all three scripts below
   
   -- create some data
   select * from ag_catalog.cypher('test_graph', $$
       CREATE (d: domain { id: "test1", type: "Organization", name: "test1" , 
hidden: false } )
       RETURN d $$
   ) as (vertex agtype);
   
   -- delete some data
   select * from ag_catalog.cypher('test_graph',
                                   $$
       MATCH (any_vertex: domain { id: "test1" } )
       DELETE any_vertex
       RETURN any_vertex
   $$) as (catalog ag_catalog.agtype);
   
   -- run query
   select
       domain
   from
       ag_catalog.cypher('test_graph',$$
   
           MATCH (user_vertex: user { hidden: false, id: 'User_iter_100'})
               -[e1:edges  {hidden: false, fromCollection: 'user', 
toCollection: 'domain', relation: 'user-domain'}]->(to1: domain {hidden: false})
               -[e2:edges*0..10 {hidden: false, fromCollection: 'domain', 
toCollection: 'domain', relation: 'parent'}]->(to2: domain {hidden: false})
   
           RETURN to2
   $$) as domain(domain ag_catalog.agtype);
   ```
   ```
   Database went into recovery mode, could be seen as an IO error on the client 
side.
   ```
   
   **Expected behavior**
   
   No server crash happens
   
   **Environment (please complete the following information):**
   - Postgres 14 + Apache AGE 1.5.0
   
   **Additional context**
   
   It looks like any Cypher script's result is being cached after the first 
run. 
   The second run and all subsequent runs are faster unless any data has been 
created/updated/deleted.
   If create/update/delete happens the scripts executes slower again.
   This leads to increased DB memory consumption that never released.
   
   For our real data (about 20 vertex labels with different connectedness over 
3 million edges) similar script such as below executes
   around 5 seconds first time, around 300ms second time until 
create/update/delete happens.
   
   ```pgsql
   select
       domain
   from
       ag_catalog.cypher('test_graph',$$
   
           MATCH (user_vertex: user { hidden: false, id: 'User_iter_100'})
               -[e1:edges  {hidden: false, fromCollection: 'user', 
toCollection: 'domain', relation: 'user-domain'}]->(to1: domain {hidden: false})
               -[e2:edges*0..10 {hidden: false, fromCollection: 'domain', 
toCollection: 'domain', relation: 'parent'}]->(to2: domain {hidden: false})
   
           RETURN to2
   $$) as domain(domain ag_catalog.agtype);
   ```
   
   Please take a look at the attached screenshot.
   
![memory_leak](https://github.com/user-attachments/assets/8a55d2ed-988b-4bfd-83ee-5f9d18466cb3)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@age.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to