[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()

2022-10-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626742#comment-17626742
 ] 

Apache Spark commented on SPARK-40802:
--

User 'Mingli-Rui' has created a pull request for this issue:
https://github.com/apache/spark/pull/38452

> Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve 
> schema instead of PreparedStatement.executeQuery()
> ---
>
>                 Key: SPARK-40802
>                 URL: https://issues.apache.org/jira/browse/SPARK-40802
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Mingli Rui
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the Spark JDBC connector uses *PreparedStatement.executeQuery()* 
> to resolve the JDBCRelation's schema. The schema query looks like 
> *s"SELECT * FROM $table_or_query WHERE 1=0"*.
> However, it is not necessary to execute the query; it is enough to 
> *prepare* it. Preparing the statement parses and compiles the query without 
> executing it, which should be more efficient.
> So it is better to use PreparedStatement.getMetaData() to resolve the schema.
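
To make the difference concrete, here is a minimal Java sketch of the prepare-only path. It is an illustration, not the code in the pull request: the class PrepareOnlySchema and its methods are hypothetical names, and it assumes the driver can answer getMetaData() at prepare time. The point is only that executeQuery() is never called.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark.
final class PrepareOnlySchema {
    /** Builds the zero-row probe query quoted in the issue description. */
    static String schemaQuery(String tableOrQuery) {
        return "SELECT * FROM " + tableOrQuery + " WHERE 1=0";
    }

    /** Returns "label TYPE" per column, or an empty list if the driver reports no metadata. */
    static List<String> describe(Connection conn, String tableOrQuery) throws SQLException {
        List<String> columns = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(schemaQuery(tableOrQuery))) {
            // Ask the prepared statement for its result shape; no executeQuery() call.
            ResultSetMetaData md = stmt.getMetaData();
            if (md == null) {
                return columns; // the Javadoc allows drivers to return null
            }
            // JDBC column indexes are 1-based.
            for (int i = 1; i <= md.getColumnCount(); i++) {
                columns.add(md.getColumnLabel(i) + " " + md.getColumnTypeName(i));
            }
        }
        return columns;
    }
}
```

With a real driver, conn would come from something like DriverManager.getConnection(url). How much work happens at prepare time depends on the driver.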



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()

2022-10-17 Thread Mingli Rui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619124#comment-17619124
 ] 

Mingli Rui commented on SPARK-40802:


Hi [~hyukjin.kwon], could you please explain more about +We could probably 
introduce a dialect to optimize this further.+ Do you mean moving 
{{JDBCRDD.getQueryOutputSchema}} into a method on JdbcDialect, so that every 
concrete JDBC dialect class has a chance to resolve the schema in its own 
way? For example,
{code:scala}
// Proposed hook: each dialect decides how to resolve a query's output schema.
abstract class JdbcDialect extends Serializable with Logging {
  def getQueryOutputSchema(query: String, options: JDBCOptions): StructType
}

private object MsSqlServerDialect extends JdbcDialect {
  override def getQueryOutputSchema(query: String, options: JDBCOptions): StructType = {
    // The provider-specific solution goes here.
    ???
  }
}{code}
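
For readers more at home in Java, the same idea can be sketched as follows. Everything here is hypothetical: SketchDialect, SketchDialects, and the jdbc:example: URL prefix are invented for illustration and are not Spark's JdbcDialect API. The default method keeps the current execute-based behaviour, and one dialect overrides it to rely on prepare-time metadata.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical dialect hook; not Spark's actual JdbcDialect. */
interface SketchDialect {
    boolean canHandle(String url);

    /** Default strategy: execute the probe query and read the ResultSet's metadata. */
    default List<String> getQueryOutputSchema(Connection conn, String query) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(query);
             ResultSet rs = stmt.executeQuery()) {
            return SketchDialects.labels(rs.getMetaData());
        }
    }
}

final class SketchDialects {
    /** Catch-all dialect: accepts any URL and inherits the default strategy. */
    static final SketchDialect DEFAULT = url -> true;

    /** Made-up dialect for "jdbc:example:" URLs that relies on prepare-time metadata. */
    static final SketchDialect EXAMPLE = new SketchDialect() {
        @Override
        public boolean canHandle(String url) {
            return url.toLowerCase(Locale.ROOT).startsWith("jdbc:example:");
        }

        @Override
        public List<String> getQueryOutputSchema(Connection conn, String query) throws SQLException {
            try (PreparedStatement stmt = conn.prepareStatement(query)) {
                return labels(stmt.getMetaData()); // prepared, never executed
            }
        }
    };

    private static final SketchDialect[] REGISTERED = { EXAMPLE };

    /** Returns the first registered dialect that accepts the URL, else DEFAULT. */
    static SketchDialect forUrl(String url) {
        for (SketchDialect dialect : REGISTERED) {
            if (dialect.canHandle(url)) {
                return dialect;
            }
        }
        return DEFAULT;
    }

    /** Copies column labels out of the metadata; JDBC column indexes are 1-based. */
    static List<String> labels(ResultSetMetaData md) throws SQLException {
        if (md == null) {
            throw new SQLException("driver returned no metadata");
        }
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            out.add(md.getColumnLabel(i));
        }
        return out;
    }
}
```

A caller would pick the dialect with SketchDialects.forUrl(url) and then call getQueryOutputSchema on it, so drivers without a dedicated dialect keep the existing behaviour.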




[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()

2022-10-17 Thread Mingli Rui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619043#comment-17619043
 ] 

Mingli Rui commented on SPARK-40802:


[~hyukjin.kwon] Thanks for the comment!

Do we know of any JDBC drivers that don't support 
{{PreparedStatement.getMetaData()}}?

This method has been part of the {{PreparedStatement}} API for a long time, 
so I believe most JDBC drivers support it.

Its definition is below. It does indicate that 
{{SQLFeatureNotSupportedException}} may be thrown. An alternative is to try 
{{getMetaData()}} first and, if {{SQLFeatureNotSupportedException}} is 
thrown, fall back to the current implementation.

 
{code:java}
/**
* Retrieves a ResultSetMetaData object that contains
* information about the columns of the ResultSet object
* that will be returned when this PreparedStatement object
* is executed.
* 
* Because a PreparedStatement object is precompiled, it is
* possible to know about the ResultSet object that it will
* return without having to execute it. Consequently, it is possible
* to invoke the method getMetaData on a
* PreparedStatement object rather than waiting to execute
* it and then invoking the ResultSet.getMetaData method
* on the ResultSet object that is returned.
* 
* NOTE: Using this method may be expensive for some drivers due
* to the lack of underlying DBMS support.
*
* @return the description of a ResultSet object's columns or
* null if the driver cannot return a
* ResultSetMetaData object
* @exception SQLException if a database access error occurs or
* this method is called on a closed PreparedStatement
* @exception SQLFeatureNotSupportedException if the JDBC driver does not support
* this method
* @since 1.2
*/
ResultSetMetaData getMetaData() throws SQLException;{code}
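
The fallback described above could look roughly like this in plain JDBC. This is a sketch under assumptions, not Spark's implementation: SchemaWithFallback and its methods are made-up names, and a null return is treated the same way as SQLFeatureNotSupportedException because the Javadoc allows both outcomes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark.
final class SchemaWithFallback {
    /** Copies column labels out of the metadata; JDBC column indexes are 1-based. */
    static List<String> columnLabels(ResultSetMetaData md) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            labels.add(md.getColumnLabel(i));
        }
        return labels;
    }

    /** Tries prepare-time metadata first, then falls back to executing the query. */
    static List<String> resolveColumns(Connection conn, String query) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(query)) {
            ResultSetMetaData md = null;
            try {
                md = stmt.getMetaData();
            } catch (SQLFeatureNotSupportedException e) {
                // The driver does not implement the method; use the fallback below.
            }
            if (md != null) {
                return columnLabels(md);
            }
            // Fallback: run the query (expected to be a zero-row probe) and read its metadata.
            try (ResultSet rs = stmt.executeQuery()) {
                return columnLabels(rs.getMetaData());
            }
        }
    }
}
```

Other SQLException subclasses are deliberately not caught, so genuine database errors still surface to the caller.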
 

 




[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()

2022-10-16 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618389#comment-17618389
 ] 

Hyukjin Kwon commented on SPARK-40802:
--

I guess the problem is that {{getMetaData}} isn't guaranteed to work in all 
cases or on all DBMSes. We could probably introduce a dialect to optimize this 
further.
