[
https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-46747.
---------------------------------
> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --------------------------------------------------------------------------
>
> Key: SPARK-46747
> URL: https://issues.apache.org/jira/browse/SPARK-46747
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0,
> 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2,
> 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0,
> 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3,
> 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
> Reporter: Bala Bellam
> Assignee: Kent Yao
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.1, 3.4.3, 4.0.0
>
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table
> existence in the database by overriding the default
> JdbcDialect.getTableExistsQuery which has WHERE 1 = 0.
> +*Issue:*+
> Due to LIMIT 1 query pattern, we are seeing high number of shared locks in
> the PostgreSQL installations where there are many partitions under a table
> that's being written to. Hence resorting to the default JdbcDialect which
> does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the
> partitions and effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain
> scenarios, especially with partitioned tables or tables with a lot of data,
> as it may take shared locks on all partitions or involve more planner and
> execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read
> any data due to the always false WHERE condition. This makes it a lighter
> operation, as it typically only involves checking the table's metadata to
> validate the table's existence without taking locks on the table's data or
> partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE
> 1=0 would be a better choice if we're strictly looking to check for a table's
> existence and want to avoid potentially heavier operations like taking shared
> locks on partitions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]