JoshInnis opened a new issue #89:
URL: https://github.com/apache/incubator-age/issues/89


   ### Background
   
   The ExecMaterial node in an Execution Tree may not rescan it's children 
nodes. The material node acts as an intermediary between it's parent and it's 
children. During the initial run, MaterialNode will execute its subplans, cache 
the results in a data structure called a Tuplestorestate (postgres has only 
exposed the name of, the structure is hidden in Tuplestorestate.c), and pass 
the child tuple to its parent. 
   
   When the node is notified it will be rescanned or it will change scan 
direction. The material node will try to get the needed tuple in its local 
Tuplestorestate before scanning the child Append node. Assuming the above query 
will return at least one result, the tuplestorestate will be used.
   
   The following query will use a MaterialNode in it's execution:
   
   `SELECT * FROM cypher('G', $$ match (a)-[e]->(b) delete e return e$$) AS (u 
agtype);`
   
   The Execution Tree:
   ```
   SELECT * FROM cypher('G', $$EXPLAIN match ()-[e]->(b) delete e return e$$) 
AS (u agtype);
                                              QUERY PLAN                        
                   
   
------------------------------------------------------------------------------------------------
    Custom Scan (Cypher Delete)  (cost=0.00..0.00 rows=0 width=32)
      ->  Subquery Scan on _  (cost=0.00..3.06 rows=3 width=32)
            ->  Nested Loop  (cost=0.00..3.03 rows=3 width=96)
                  Join Filter: (graphid_to_agtype(e.end_id) = 
graphid_to_agtype(b.id))
                  ->  Append  (cost=0.00..1.03 rows=3 width=38)
                        ->  Seq Scan on _ag_label_edge e  (cost=0.00..0.00 
rows=1 width=56)
                        ->  Seq Scan on edge e_1  (cost=0.00..1.02 rows=2 
width=29)
                  ->  Materialize  (cost=0.00..1.05 rows=3 width=8)
                        ->  Append  (cost=0.00..1.03 rows=3 width=8)
                              ->  Seq Scan on _ag_label_vertex b  
(cost=0.00..0.00 rows=1 width=8)
                              ->  Seq Scan on vertex b_1  (cost=0.00..1.02 
rows=2 width=8)
   (11 rows)
   ```
   
   Cypher's Delete clause (The Custom Scan node at the top of the execution 
tree) attempts to dive into its children to get the physical tuple that it has 
been asked to delete. Which is stored in the SeqScan, for the above query 
(DELETE, CREATE, SET, or REMOVE may store it also). It needs this information 
because we need the delete clause to mark the tuple as deleted*. 
   
   *Deleting a tuple in Postgres whether using Cypher's or Postgres' delete 
functionality does't actually delete data, they just mark it as no longer a 
valid tuple for future queries.
   
   ### Problem
   The problem is that when the Materialize Node passes a cached tuple: the 
SeqScan is not going to be holding the physical tuple anymore, NULL will be 
returned to Cypher Delete node. The delete clause then cannot find the tuple 
and doesn't delete the tuple.
   
   ### Solution
   Instead of the DELETE (or SET/REMOVE) using its children to get the physical 
location of the tuple on disc, the clause get it themselves. We start a new 
scan on the table until we find the physical tuple. Use the id, which is unique 
per edge/vertex in any given graph.
   
   ### Considerations
   This will slow down performance, because we have to scan the disc for the 
tuple's location ourselves. However, using the execution tree when possible and 
scanning the tables when not will be time consuming and potentially 
inconsequential relative to other performance solutions that could render this 
dual retrieval solution useless.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to