[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626742#comment-17626742 ]

Apache Spark commented on SPARK-40802:
--------------------------------------

User 'Mingli-Rui' has created a pull request for this issue:
https://github.com/apache/spark/pull/38452

> Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-40802
>                 URL: https://issues.apache.org/jira/browse/SPARK-40802
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Mingli Rui
>            Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to
> resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM
> $table_or_query WHERE 1=0".*
> But it is not necessary to execute the query. It's enough to *prepare* the
> query. With preparing the statement, the query is parsed and compiled, but is
> not executed. It will be more efficient.
> So, it's better to use PreparedStatement.getMetaData() to resolve schema.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
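The approach in the issue description (prepare the schema probe, then read its metadata without executing it) can be sketched in plain JDBC roughly as follows. This is only an illustration under assumptions: the class and method names (PreparedSchemaProbe, probeSql, describeColumns) are hypothetical and are not taken from Spark or from the pull request, and treating a null metadata result as an error is a choice made for this sketch.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Spark class.
final class PreparedSchemaProbe {

    // Builds the probe shape quoted in the issue: a query that can never return rows.
    static String probeSql(String tableOrQuery) {
        return "SELECT * FROM " + tableOrQuery + " WHERE 1=0";
    }

    // Returns one "label TYPE" entry per column, read from the prepared statement's metadata.
    static List<String> describeColumns(Connection conn, String tableOrQuery) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(probeSql(tableOrQuery))) {
            // executeQuery() is never called: the statement is prepared but not run.
            ResultSetMetaData meta = stmt.getMetaData();
            if (meta == null) {
                // The JDBC contract allows null when the driver cannot describe the result.
                throw new SQLException("Driver returned no metadata for: " + tableOrQuery);
            }
            List<String> columns = new ArrayList<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC column indexes are 1-based
                columns.add(meta.getColumnLabel(i) + " " + meta.getColumnTypeName(i));
            }
            return columns;
        }
    }
}
```

Given an open java.sql.Connection named conn, PreparedSchemaProbe.describeColumns(conn, "people") would list the columns of a table called people. Whether this is actually cheaper than executing the probe depends on the driver, as the getMetaData() Javadoc notes.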
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619124#comment-17619124 ]

Mingli Rui commented on SPARK-40802:
------------------------------------

Hi [~hyukjin.kwon],

Could you please explain more about +We could probably introduce a dialect to optimize this further.+

Do you mean that we should move {{JDBCRDD.getQueryOutputSchema}} into a function on JdbcDialect, so that every concrete JDBC dialect class has a chance to resolve the schema in its own way? For example:
{code:java}
abstract class JdbcDialect extends Serializable with Logging {
  def getQueryOutputSchema(query: String, options: JDBCOptions): StructType
}

private object MsSqlServerDialect extends JdbcDialect {
  override def getQueryOutputSchema(query: String, options: JDBCOptions): StructType = {
    // The provider-specific solution goes here
    ???
  }
}
{code}
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619043#comment-17619043 ]

Mingli Rui commented on SPARK-40802:
------------------------------------

[~hyukjin.kwon] Thanks for the comment!

Do we know of any JDBC drivers that don't support {{PreparedStatement.getMetaData()}}? This method has been part of the {{PreparedStatement}} interface for a long time, so I believe most JDBC drivers support it.

Its definition is below. It does indicate that {{SQLFeatureNotSupportedException}} may be thrown. An alternative solution is to try {{getMetaData()}} first and, if {{SQLFeatureNotSupportedException}} is thrown, fall back to the current implementation.
{code:java}
/**
 * Retrieves a ResultSetMetaData object that contains
 * information about the columns of the ResultSet object
 * that will be returned when this PreparedStatement object
 * is executed.
 *
 * Because a PreparedStatement object is precompiled, it is
 * possible to know about the ResultSet object that it will
 * return without having to execute it. Consequently, it is possible
 * to invoke the method getMetaData on a
 * PreparedStatement object rather than waiting to execute
 * it and then invoking the ResultSet.getMetaData method
 * on the ResultSet object that is returned.
 *
 * NOTE: Using this method may be expensive for some drivers due
 * to the lack of underlying DBMS support.
 *
 * @return the description of a ResultSet object's columns or
 *         null if the driver cannot return a
 *         ResultSetMetaData object
 * @exception SQLException if a database access error occurs or
 * this method is called on a closed PreparedStatement
 * @exception SQLFeatureNotSupportedException if the JDBC driver does not support
 * this method
 * @since 1.2
 */
ResultSetMetaData getMetaData() throws SQLException;
{code}
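The fallback proposed in this comment (try getMetaData() first, and use the current execute-based path only if the driver rejects it) might look roughly like the sketch below. The names FallbackSchemaProbe, columnLabels and labels are hypothetical and not part of Spark. Falling back when the driver returns null is an extra assumption of this sketch, motivated by the "null if the driver cannot return a ResultSetMetaData object" wording in the Javadoc; the comment itself only mentions the exception case.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Spark class.
final class FallbackSchemaProbe {

    // Resolves column labels, preferring metadata from the prepared (unexecuted) statement.
    static List<String> columnLabels(Connection conn, String sql) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            ResultSetMetaData meta = null;
            try {
                meta = stmt.getMetaData();
            } catch (SQLFeatureNotSupportedException e) {
                // Driver does not implement getMetaData(); use the fallback below.
            }
            if (meta != null) {
                return labels(meta);
            }
            // Fallback (also taken on null, an assumption of this sketch):
            // execute the query and read the metadata from the ResultSet.
            try (ResultSet rs = stmt.executeQuery()) {
                return labels(rs.getMetaData());
            }
        }
    }

    // Copies the labels out while the statement (and result set, if any) is still open.
    private static List<String> labels(ResultSetMetaData meta) throws SQLException {
        List<String> out = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC column indexes are 1-based
            out.add(meta.getColumnLabel(i));
        }
        return out;
    }
}
```

Only SQLFeatureNotSupportedException triggers the fallback here; any other SQLException propagates to the caller, so a genuinely invalid query still fails rather than being silently retried through executeQuery().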
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618389#comment-17618389 ]

Hyukjin Kwon commented on SPARK-40802:
--------------------------------------

I guess the problem is that {{getMetaData}} isn't guaranteed to work in all cases or for all DBMSes. We could probably introduce a dialect to optimize this further.