Bala Bellam created SPARK-46747: ----------------------------------- Summary: Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1 Key: SPARK-46747 URL: https://issues.apache.org/jira/browse/SPARK-46747 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.4, 3.5.0, 3.4.1, 3.4.0, 3.3.2, 3.4.2, 3.3.3, 3.2.4, 3.2.3, 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.1.3, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 3.0.2, 3.0.1, 3.0.0, 2.4.8, 2.4.7, 2.4.6, 2.4.5, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.4, 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.2.3, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0 Reporter: Bala Bellam
+*Background:*+ PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table existence in the database by overriding the default JdbcDialect.getTableExistsQuery which has WHERE 1 = 0. +*Issue:*+ Due to LIMIT 1 query pattern, we are seeing high number of shared locks in the PostgreSQL installations where there are many partitions under a table that's being written to. Hence resorting to the default JdbcDialect which does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the partitions and effectively checks for table existence. The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain scenarios, especially with partitioned tables or tables with a lot of data, as it may take shared locks on all partitions or involve more planner and execution time to determine the quickest way to get a single row. On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read any data due to the always false WHERE condition. This makes it a lighter operation, as it typically only involves checking the table's metadata to validate the table's existence without taking locks on the table's data or partitions. So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 1=0 would be a better choice if we're strictly looking to check for a table's existence and want to avoid potentially heavier operations like taking shared locks on partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org