Bala Bellam created SPARK-46747:
-----------------------------------

             Summary: Too Many Shared Locks due to 
PostgresDialect.getTableExistsQuery - LIMIT 1
                 Key: SPARK-46747
                 URL: https://issues.apache.org/jira/browse/SPARK-46747
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.4, 3.5.0, 3.4.1, 3.4.0, 3.3.2, 3.4.2, 3.3.3, 3.2.4, 
3.2.3, 3.3.1, 3.2.2, 3.3.0, 3.2.1, 3.1.3, 3.2.0, 3.1.2, 3.1.1, 3.1.0, 3.0.3, 
3.0.2, 3.0.1, 3.0.0, 2.4.8, 2.4.7, 2.4.6, 2.4.5, 2.4.4, 2.4.3, 2.4.2, 2.4.1, 
2.4.0, 2.3.4, 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.2.3, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 
2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
            Reporter: Bala Bellam


+*Background:*+
PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table 
existence in the database by overriding the default 
JdbcDialect.getTableExistsQuery which has WHERE 1 = 0.

+*Issue:*+
Due to LIMIT 1 query pattern, we are seeing high number of shared locks in the 
PostgreSQL installations where there are many partitions under a table that's 
being written to. Hence resorting to the default JdbcDialect which does WHERE 1 
= 0 is proven to be more optimal as it doesn't scan any of the partitions and 
effectively checks for table existence.

The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain 
scenarios, especially with partitioned tables or tables with a lot of data, as 
it may take shared locks on all partitions or involve more planner and 
execution time to determine the quickest way to get a single row.

On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read 
any data due to the always false WHERE condition. This makes it a lighter 
operation, as it typically only involves checking the table's metadata to 
validate the table's existence without taking locks on the table's data or 
partitions.

So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 1=0 
would be a better choice if we're strictly looking to check for a table's 
existence and want to avoid potentially heavier operations like taking shared 
locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to