MuhammadTahaNaveed commented on issue #2194: URL: https://github.com/apache/age/issues/2194#issuecomment-3098181976
@serdarmicrosoft In your SQL query, the tables are specified, but in Cypher query, the target label is not specified which causes age to scan all the vertex labels. **SQL query plan:** ``` db=# EXPLAIN select count(*), u.id from issue_2194."User" u, issue_2194."HAS_INTERACTION" h, issue_2194."Book" b where u.id = h.start_id and b.id = h.end_id GROUP BY u.id ORDER BY 1 DESC LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- .......................... -> Seq Scan on "User" u (cost=0.00..22.00 rows=1200 width=8) ........... -> Seq Scan on "HAS_INTERACTION" h (cost=0.00..19.70 rows=970 width=16) ......... -> Seq Scan on "Book" b (cost=0.00..22.00 rows=1200 width=8) ``` **Cypher query plan:** ``` db=# SELECT * FROM cypher('issue_2194', $$EXPLAIN MATCH (u:User)-[:HAS_INTERACTION]->() RETURN u.user_id, count(*) ORDER BY count(*) DESC LIMIT 10 $$) as (user_id agtype, count agtype); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ............................................ -> Parallel Append (cost=0.00..78.24 rows=2001 width=8) -> Parallel Seq Scan on "Book" _age_default_alias_1_2 (cost=0.00..17.06 rows=706 width=8) -> Parallel Seq Scan on "Author" _age_default_alias_1_3 (cost=0.00..17.06 rows=706 width=8) -> Parallel Seq Scan on "Review" _age_default_alias_1_4 (cost=0.00..17.06 rows=706 width=8) -> Parallel Seq Scan on "User" _age_default_alias_1_5 (cost=0.00..17.06 rows=706 width=8) -> Parallel Seq Scan on _ag_label_vertex _age_default_alias_1_1 (cost=0.00..0.00 rows=1 width=8) ..... -> Seq Scan on "HAS_INTERACTION" _age_default_alias_0 (cost=0.00..19.70 rows=970 .......... -> Seq Scan on "User" u (cost=0.00..22.00 rows=1200 width=40) ``` Potential optimization is to **explicitly include the label on the target node**: ``` SELECT * FROM cypher('graph', $$ EXPLAIN MATCH (u:User)-[:HAS_INTERACTION]->(:Book) RETURN u.user_id, count(*) ORDER BY count(*) DESC LIMIT 10 $$) AS (user_id agtype, count agtype); ``` That said, I believe AGE should ideally be smart enough to infer and apply such label constraints automatically to reduce unnecessary scans when label filters are present elsewhere in the pattern. Let us know if that improves some performance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org