[GitHub] [age] WendelLana commented on issue #995: Research functions that extracts label ID

via GitHub Mon, 10 Jul 2023 21:56:51 -0700


WendelLana commented on issue #995:
URL: https://github.com/apache/age/issues/995#issuecomment-1630116157


   I've analyzed the query plan trees for the queries that I provided earlier; 
I'll write them here.
   
   In the `entity_exists` function query that returns false, we have the 
following query plan:
   ````
                                           QUERY PLAN
   -------------------------------------------------------------------------------------------
    Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
      ->  Subquery Scan on _  (cost=0.00..0.00 rows=1 width=32)
            ->  Result  (cost=0.00..0.00 rows=0 width=256)
                  ->  Custom Scan (Cypher Delete)  (cost=0.00..0.00 rows=0 width=32)
                        ->  Subquery Scan on __1  (cost=0.00..343.00 rows=1200 width=32)
                              ->  Seq Scan on "Dev" x  (cost=0.00..331.00 rows=1200 width=64)
   (6 rows)
   ````
   
   In the `entity_exists` function query that returns true, we have the 
following query plan:
   ````
                                      QUERY PLAN
   ---------------------------------------------------------------------------------
    Custom Scan (Cypher Create)  (cost=0.00..0.00 rows=0 width=32)
      ->  Subquery Scan on _  (cost=0.00..1549.00 rows=1200 width=32)
            ->  Seq Scan on "Developer" x  (cost=0.00..1537.00 rows=1200 width=224)
   (3 rows)
   ````
   
   In the `get_label_name` function query, we have the following query plan:
   ```` 
                                                       QUERY PLAN
   -------------------------------------------------------------------------------------------------------------------
    Gather  (cost=1000.00..37915106.89 rows=139750 width=64)
      Workers Planned: 2
      ->  Nested Loop  (cost=0.00..37900131.89 rows=58229 width=64)
            Join Filter: (((r.start_id = d.id) AND (r.end_id = p.id)) OR ((r.end_id = d.id) AND (r.start_id = p.id)))
            ->  Parallel Append  (cost=0.00..35.46 rows=809 width=56)
                  ->  Parallel Seq Scan on "PARTICIPATES" r_2  (cost=0.00..15.71 rows=571 width=56)
                  ->  Parallel Seq Scan on "FRIENDS_WITH" r_3  (cost=0.00..15.71 rows=571 width=56)
                  ->  Parallel Seq Scan on _ag_label_edge r_1  (cost=0.00..0.00 rows=1 width=56)
            ->  Nested Loop  (cost=0.00..18047.00 rows=1440000 width=16)
                  ->  Seq Scan on "Developer" d  (cost=0.00..22.00 rows=1200 width=8)
                  ->  Materialize  (cost=0.00..28.00 rows=1200 width=8)
                        ->  Seq Scan on "Project" p  (cost=0.00..22.00 rows=1200 width=8)
   (12 rows)
   ````
   
   As the query plan trees show, none of these queries use the 
`_extract_label_id` function; there is only a simple join filter applied to the 
edge table scans. I believe your solution would work just fine with all these 
functions.
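
For context, here is a minimal sketch of what extracting a label ID from a graphid amounts to. It assumes the layout I understand AGE to use (label ID in the upper 16 bits, entry ID in the lower 48 bits of a 64-bit value); the Python helper names are hypothetical and are not AGE's actual C functions.

```python
# Assumed graphid layout: [16-bit label id][48-bit entry id].
# This mirrors my reading of AGE's graphid encoding; treat it as an assumption.
ENTRY_ID_BITS = 48
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label id and an entry id into one 64-bit graphid."""
    return (label_id << ENTRY_ID_BITS) | (entry_id & ENTRY_ID_MASK)


def extract_label_id(graphid: int) -> int:
    """Recover the label id from the upper bits (hypothetical helper)."""
    return graphid >> ENTRY_ID_BITS


def extract_entry_id(graphid: int) -> int:
    """Recover the per-label entry id from the lower 48 bits."""
    return graphid & ENTRY_ID_MASK


gid = make_graphid(3, 1)
print(gid, extract_label_id(gid), extract_entry_id(gid))
```

Under this assumption the label is recoverable from the ID alone, which is consistent with the plans above not needing a separate label lookup.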
   
   Additionally, I believe there is no need to duplicate each label table 
(`Person`, for example), because the `properties` column is already present in 
the `_ag_label_vertex` table. However, I'm still analyzing how much code we 
would need to change to ensure everything continues to work.
   
   My idea is that the query plan would first scan the `Person` table with 
only the IDs, and then in some cases scan `_ag_label_vertex`, and in other 
cases simply filter by the properties in `_ag_label_vertex`.
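
A toy in-memory sketch of the two access paths described above. This is not AGE code: the dictionaries stand in for the `_ag_label_vertex` and `Person` tables, and the function names and sample rows are made up for illustration.

```python
# Stand-in for _ag_label_vertex: id -> properties (hypothetical sample data).
ag_label_vertex = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "AGE"},  # belongs to some other label
}

# Stand-in for the "Person" label table, holding only IDs.
person_ids = {1, 2}


def scan_label(label_ids, parent):
    """Path 1: scan the ID-only label table, then fetch properties from the parent."""
    return {vid: parent[vid] for vid in sorted(label_ids)}


def filter_by_property(label_ids, parent, key, value):
    """Path 2: filter the parent by a property, keeping only IDs in the label."""
    return [
        vid
        for vid, props in parent.items()
        if props.get(key) == value and vid in label_ids
    ]


print(scan_label(person_ids, ag_label_vertex))
print(filter_by_property(person_ids, ag_label_vertex, "name", "Bob"))
```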


