MironAtHome commented on issue #1987: URL: https://github.com/apache/age/issues/1987#issuecomment-2256760557
Could you please use Explain Analyze and upload plans here: https://explain.dalibo.com/ or just paste here so that we could inspect in detail, where Postgres spends time, when performing queries. Also, please see if this hint ``` BEGIN; SET LOCAL parallel_leader_participation = off; explain analyze SELECT * from cypher( ... ``` learned earlier on some parallel query ( I would imagine postgres tries to invoke this query using parallel threads ) makes positive impact on query duration. Ideally it would also be useful to do something to reduce I/O, like passing an id column to count function, something like `count(v.id)` assuming id is indexed field in the vertex, however, it's not a given, that scanning index would outrun scanning table, normally database query optimizer should be able to make good decision regarding organization of physical activities, however, in case above it seems that postgres for some reason performs steps less than optimal way, so, it would be really nice to experiment a bit. In the end counting even 1 million rows normally should take minutes. In addition, please do make sure to check, in order of appearance, closer to the top, higher priority 1 parallel activity on the same server 2 check database for anything like physical file corruption 3 run vacuum 4 please check logs for any exception or indicators of unusual activities --- specific note --- It is unlikely that the cause of this specific issue is in some deeply ornate workings of optimizer, but since we are here, it makes sense to see if performing count in a loop against individual vertices can help. Here is a script that counts rows for just vertices: ``` load 'age'; set search_path = ag_catalog, "$user", public; do $RUN$ declare count_tally_var_temp bigint; count_tally_var bigint; sql_statement_var varchar(1000000); vertice_name_var varchar(256); bool_var boolean; begin load 'age'; set search_path = ag_catalog, "$user", public; for vertice_name_var in SELECT trim(cast(ag_catalog.agtype_out(vertice_name) as varchar(256)), '"') FROM cypher('test_graph', $$ MATCH(v) WITH label(v) as vertice_name RETURN DISTINCT vertice_name $$) AS t(vertice_name agtype) loop sql_statement_var := CONCAT('MATCH (v:', vertice_name_var, ') return ag_catalog.agtype_to_int8(count(v))'); PERFORM * FROM ag_catalog.age_prepare_cypher('test_graph'::cstring, sql_statement_var::cstring); SELECT INTO count_tally_var_temp count_rows FROM cypher(NULL,NULL) AS (count_rows BIGINT); RAISE NOTICE 'vertice name: %, row count: %, time: %', vertice_name_var, count_tally_var_temp, current_time; count_tally_var := coalesce(count_tally_var, 0) + coalesce(count_tally_var_temp, 0); end loop; RAISE NOTICE 'count_tally_var = %', count_tally_var; PERFORM * FROM cypher(NULL,NULL) AS (count_rows BIGINT); end; $RUN$; ``` At least this should help to note which vertice performs slower to narrow down performance tuning effort. Going back to troubleshooting steps: Top 1 is something to be expected and checked first. Since resources on RDBMS are always shared and if a lot of things happening at the same time, it is expected that overall operations run slower. The further to 4, the more it's a system level check for troubleshooting, catching something out of order. Although it's not something I would suggest, and I haven't tried it myself, yet, if any of these queries carry significant importance, please see what is the frequency and expected freshness of data. If similar query is critical to your production system please consider materializing data to table every so often in procedure. Querying table itself should be very quick, and in fact in this case it could be useful to have the looping construct, to tally per vertex / edge and than some totals in the query. Please share plans if above hints do not net good improvement for the next step in tuning. Hope it helps. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org