[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8676 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140608496 It has been merged to master.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140607740 @rxin I reverted the patch that caused those.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140607734 I've merged this.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140607639

@vanzin do you know what's going on with the tests?

[error] Execution of test test.org.apache.spark.sql.JavaApplySchemaSuite failed: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.ExtendedYarnTest
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140578273 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140578275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42504/ Test PASSed.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140578152

[Test build #42504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42504/console) for PR 8676 at commit [`ee7b842`](https://github.com/apache/spark/commit/ee7b8426c0cdadeecb2f0f07d4f62024daefed19).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140545384 [Test build #42504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42504/consoleFull) for PR 8676 at commit [`ee7b842`](https://github.com/apache/spark/commit/ee7b8426c0cdadeecb2f0f07d4f62024daefed19).
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140541508 Merged build started.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140541446 Merged build triggered.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140539473 retest this please
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140526528

[Test build #1760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1760/console) for PR 8676 at commit [`ee7b842`](https://github.com/apache/spark/commit/ee7b8426c0cdadeecb2f0f07d4f62024daefed19).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140520458 [Test build #1760 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1760/consoleFull) for PR 8676 at commit [`ee7b842`](https://github.com/apache/spark/commit/ee7b8426c0cdadeecb2f0f07d4f62024daefed19).
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140520174 (Oops spoke too soon - I will merge after tests pass)
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140520067 Thanks. Merging this in master.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user sureshthalamati commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-140519265 @rxin Thank you for reviewing the patch. Just to make sure, I tested without the next() call on MySQL, Postgres, and DB2, and it worked fine. Updated the pull request.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user sureshthalamati commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-139611781 Typo in my previous comment, I meant when query is where 1=0.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user sureshthalamati commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-139611398 next() will return false because the result set will be empty when the query is where 1!=0. executeQuery() will throw an exception if the table is not found. The next() call is not really required to find out whether the table exists.
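The point in this comment can be illustrated with a small sketch. It is not the code from the patch: `TableExistsSketch`, `probeQuery`, and `existsVia` are hypothetical names, and `existsVia` takes a plain function in place of a live connection so the logic can be tried without a database. The assumption, taken from the comment above, is that running the probe query throws when the table is missing, so success of the call alone answers the question and next() is never consulted.

```scala
import java.sql.Connection
import scala.util.Try

object TableExistsSketch {
  // Probe that returns no rows but still fails if the table is missing.
  def probeQuery(table: String): String = s"SELECT * FROM $table WHERE 1=0"

  // Hypothetical hook: `execute` stands in for running a statement and is
  // expected to throw when the table does not exist.
  def existsVia(execute: String => Unit, table: String): Boolean =
    Try(execute(probeQuery(table))).isSuccess

  // Wiring to a real JDBC connection; note that next() is never called.
  def tableExists(conn: Connection, table: String): Boolean =
    existsVia((sql: String) => {
      val stmt = conn.prepareStatement(sql)
      try stmt.executeQuery() finally stmt.close()
      ()
    }, table)
}
```

With an empty result set, next() would return false even for an existing table, which is why the check keys off the exception rather than the row cursor.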
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8676#discussion_r39230872

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -42,10 +42,13 @@ object JdbcUtils extends Logging {
   /**
    * Returns true if the table already exists in the JDBC database.
    */
-  def tableExists(conn: Connection, table: String): Boolean = {
+  def tableExists(conn: Connection, url: String, table: String): Boolean = {
+    val dialect = JdbcDialects.get(url)
+
     // Somewhat hacky, but there isn't a good way to identify whether a table exists for all
-    // SQL database systems, considering "table" could also include the database name.
-    Try(conn.prepareStatement(s"SELECT 1 FROM $table LIMIT 1").executeQuery().next()).isSuccess
+    // SQL database systems using JDBC meta data calls, considering "table" could also include
+    // the database name. Query used to find table exists can be overriden by the dialects.
+    Try(conn.prepareStatement(dialect.getTableExistsQuery(table)).executeQuery().next()).isSuccess
--- End diff --

will next still return success if the query is where 1 = 0? there is no result isn't there?
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8676#discussion_r39230841

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -88,6 +88,17 @@ abstract class JdbcDialect {
   def quoteIdentifier(colName: String): String = {
     s$colName
   }
+
+  /**
+   * Get the SQL query that should be used to find if the given table exists. Dialects can
+   * override this method to return a query that works best in a particular database.
+   * @param table The name of the table.
+   * @return The SQL query to use for checking the table.
+   */
+  def getTableExistsQuery(table: String): String = {
+    s"SELECT * FROM $table WHERE 1=0"
--- End diff --

actually never mind we cannot quote it.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8676#discussion_r39227358

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala ---
@@ -88,6 +88,17 @@ abstract class JdbcDialect {
   def quoteIdentifier(colName: String): String = {
     s$colName
   }
+
+  /**
+   * Get the SQL query that should be used to find if the given table exists. Dialects can
+   * override this method to return a query that works best in a particular database.
+   * @param table The name of the table.
+   * @return The SQL query to use for checking the table.
+   */
+  def getTableExistsQuery(table: String): String = {
+    s"SELECT * FROM $table WHERE 1=0"
--- End diff --

maybe we should quote the table here actually
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user sureshthalamati commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-139410536

@rxin Even if Spark is running on JDK 1.7, customers using older versions of drivers will run into an AbstractMethodError. I think requiring customers to use new drivers that implement the getSchema() function is unnecessary.

After implementing the current approach, I got curious about how the JDBC read functionality finds the metadata, and learned that org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.resolveTable also uses s"SELECT * FROM $table WHERE 1=0" to get column information.

An alternative approach is to add getMetadataQuery(table:string) to the JdbcDialect interface, instead of getTableExistsQuery() as implemented in the current pull request. It would help determine whether the table exists in the write case, and supply column type information in the read case. It might be a millisecond slower in the case of a write call for dialects that specify "select 1 from $table limit 1", instead of "select * from $table limit 1". The advantage is that one method on the interface will address both cases. Any comments?
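To make the AbstractMethodError concern concrete, here is a minimal sketch of what a caller would need if it did rely on Connection.getSchema(). The names `SchemaLookupSketch`, `schemaOrNone`, and `currentSchema` are hypothetical and not part of Spark. One detail worth noting is that AbstractMethodError is a LinkageError, which `scala.util.Try` treats as fatal and does not catch, so the sketch uses an explicit catch clause.

```scala
import java.sql.Connection

object SchemaLookupSketch {
  // Hypothetical hook: `lookup` stands in for conn.getSchema() so the
  // fallback logic can be exercised without a JDBC driver.
  def schemaOrNone(lookup: () => String): Option[String] =
    try Option(lookup())
    catch {
      // Raised when the driver was compiled against an older JDBC API
      // and has no implementation of getSchema().
      case _: AbstractMethodError => None
    }

  // Wiring to a real connection (Connection.getSchema() exists since Java 7).
  def currentSchema(conn: Connection): Option[String] =
    schemaOrNone(() => conn.getSchema())
}
```

Even with such a guard, an older driver yields no schema at all, which is consistent with the comment's preference for a plain probe query over metadata calls.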
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-139097036 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8676#issuecomment-139097004 FWIW, we dropped JVM 1.6 support in Spark 1.5.
[GitHub] spark pull request: [SPARK-9078] [SQL] Allow jdbc dialects to over...
GitHub user sureshthalamati opened a pull request: https://github.com/apache/spark/pull/8676

[SPARK-9078] [SQL] Allow jdbc dialects to override the query used to check the table.

The current implementation uses a query with a LIMIT clause to find out whether a table already exists. This syntax works only in some database systems. This patch changes the default query to one that is likely to work on most databases, and adds a new method to the JdbcDialect abstract class to allow dialects to override the default query.

I looked at using the JDBC metadata calls; it turns out there is no common way to find the current schema, catalog, etc. There is a new method, Connection.getSchema(), but it is available only starting with JDK 1.7, and existing JDBC drivers may not have implemented it. Another option was to use the JDBC escape syntax for LIMIT, but I am not sure how well it is supported across databases. After looking at all the JDBC metadata options, my conclusion was that the most common way is to use a simple select query with 'where 1=0' and allow dialects to customize it as needed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sureshthalamati/spark table_exists_spark-9078

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8676.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8676

commit d4787548cc0ec9408c36deaa64443554d19e7f5f
Author: sureshthalamati
Date: 2015-09-10T01:35:24Z

    Modifying query to check table exists to be more generic, and allow dialect implementations to specify the query.
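The shape of the change described in this pull request can be sketched roughly as follows. This is a simplified stand-in, not Spark's actual classes: `DialectSketch`, `DefaultDialectSketch`, `MySqlDialectSketch`, and `DialectLookupSketch` are hypothetical names, and the URL-prefix matching and the choice of a MySQL-style override are assumptions made only for illustration.

```scala
// Hypothetical stand-in for the dialect abstraction discussed in this thread.
abstract class DialectSketch {
  def canHandle(url: String): Boolean

  // Default probe: returns zero rows and avoids the non-portable LIMIT clause.
  def getTableExistsQuery(table: String): String =
    s"SELECT * FROM $table WHERE 1=0"
}

// Fallback used when no registered dialect claims the URL.
object DefaultDialectSketch extends DialectSketch {
  def canHandle(url: String): Boolean = true
}

// A dialect for a database that supports LIMIT may keep the original query.
object MySqlDialectSketch extends DialectSketch {
  def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")
  override def getTableExistsQuery(table: String): String =
    s"SELECT 1 FROM $table LIMIT 1"
}

object DialectLookupSketch {
  private val registered: List[DialectSketch] = List(MySqlDialectSketch)

  // Pick the first dialect that claims the URL, else the default.
  def get(url: String): DialectSketch =
    registered.find(_.canHandle(url)).getOrElse(DefaultDialectSketch)
}
```

A caller would then ask `DialectLookupSketch.get(url).getTableExistsQuery(table)` for the statement to run, so databases without LIMIT fall back to the `WHERE 1=0` form without any per-database branching in the caller.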