[
https://issues.apache.org/jira/browse/SPARK-55830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-55830:
-----------------------------------
Labels: pull-request-available (was: )
> JDBC predicate pushdown drops driver properties causing connection failure
> --------------------------------------------------------------------------
>
> Key: SPARK-55830
> URL: https://issues.apache.org/jira/browse/SPARK-55830
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.2
> Reporter: Andreas Franz
> Priority: Major
> Labels: pull-request-available
>
> When reading data via JDBC using *predicate pushdown (predicates)*, Spark
> appears to lose custom *JDBC driver properties* during connection creation.
>
> This becomes problematic when using a *custom SocketFactory*, such as the
> *Google Cloud SQL JDBC SocketFactory*, which relies on additional driver
> properties for establishing the connection.
>
> As a result, Spark fails to connect to the database when predicates are used.
>
> Without predicates, the connection works correctly.
>
> *Examples to reproduce:*
> {code:java}
> def get_jdbc_parameters():
> return {
> "url": f"jdbc:postgresql:///<DB_NAME>",
> "socketFactory": "com.google.cloud.sql.postgres.SocketFactory",
> "user": "<SERVICE ACCOUNT AUTHORIZED TO ACCESS CLOUDSQL DATABASE>",
> "cloudSqlInstance": "<INSTANCE_CONNECTION_NAME>",
> "cloudSqlTargetPrincipal": "<SERVICE ACCOUNT AUTHORIZED TO ACCESS
> CLOUDSQL DATABASE>",
> "enableIamAuth": "true",
> "password": "disabled",
> "sslmode": "disable",
> }
> def get_table_name():
> return "<TABLE>"
> # CASE 1: No predicates -> returns up to 10 rows as expected
> jdbc_parameters = get_jdbc_parameters()
> spark.read.jdbc(
> url=jdbc_parameters.pop("url"),
> table=get_table_name(),
> properties=jdbc_parameters,
> ).limit(10).display()
> # CASE 2: Empty predicated -> fails with exception below
> jdbc_parameters = get_jdbc_parameters()
> predicates = []
> spark.read.jdbc(
> url=jdbc_parameters.pop("url"),
> table=get_table_name(),
> properties=jdbc_parameters,
> predicates=predicates,
> ).limit(10).display()
> '''
> Py4JJavaError: An error occurred while calling o488.jdbc.
> : org.postgresql.util.PSQLException: Connection to :5432 refused. Check that
> the hostname and port are correct and that the postmaster is accepting TCP/IP
> connections.
> at
> org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:342)
> at
> org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
> at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
> at org.postgresql.Driver.makeConnection(Driver.java:443)
> at org.postgresql.Driver.connect(Driver.java:297)
> '''
> # CASE 3: Non-empty predicates -> fails with exception below
> jdbc_parameters = get_jdbc_parameters()
> predicates = ["id BETWEEN 1000 AND 1010 "]
> spark.read.jdbc(
> url=jdbc_parameters.pop("url"),
> table=get_table_name(),
> properties=jdbc_parameters,
> predicates=predicates,
> ).limit(10).display()
> '''
> Py4JJavaError: An error occurred while calling o488.jdbc.
> : org.postgresql.util.PSQLException: Connection to :5432 refused. Check that
> the hostname and port are correct and that the postmaster is accepting TCP/IP
> connections.
> at
> org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:342)
> at
> org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54)
> at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:263)
> at org.postgresql.Driver.makeConnection(Driver.java:443)
> at org.postgresql.Driver.connect(Driver.java:297)
> '''
> # CASE 4: Predicates with JDBC parameters encoded in URL as workaround ->
> only returns entries of the given predicate range
> jdbc_parameters = get_jdbc_parameters()
> predicates = ["id BETWEEN 1000 AND 1010 "]
> url = jdbc_parameters.pop("url")
> params = "&".join(f"{k}={v}" for k, v in jdbc_parameters.items())
> spark.read.jdbc(
> url=f"{url}?{params}",
> table=get_table_name(),
> predicates=predicates,
> ).limit(10).display()
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]