[ https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516990#comment-16516990 ]
Paul Staab edited comment on SPARK-21063 at 6/19/18 12:01 PM:
--------------------------------------------------------------

I was able to find a workaround for this problem on Spark 2.1.0:

1. Create a Hive dialect that uses backticks (the correct quote character for Hive) to escape column names:
{code:java}
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
{code}
This is taken from [https://github.com/apache/spark/pull/19238].

2. Register it before calling spark.read.jdbc:
{code:java}
JdbcDialects.registerDialect(HiveDialect)
{code}

3. Execute spark.read.jdbc with the fetchsize option:
{code:java}
spark.read.jdbc("jdbc:hive2://localhost:10000/default", "test1",
  properties={"driver": "org.apache.hive.jdbc.HiveDriver", "fetchsize": "10"}).show()
{code}

The workaround only works when the dialect is registered and fetchsize is set; neither step alone is sufficient. There was a pull request for adding this dialect to Spark ([https://github.com/apache/spark/pull/19238]), but unfortunately it was not merged.

> Spark return an empty result from remote hadoop cluster
> -------------------------------------------------------
>
>                 Key: SPARK-21063
>                 URL: https://issues.apache.org/jira/browse/SPARK-21063
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Peter Bykov
>            Priority: Major
>
> Spark returns an empty result when querying a remote Hadoop cluster.
> All firewall settings have been removed.
> Querying via JDBC with the hive-jdbc driver version 1.1.1 works properly.
> Code snippet is:
> {code:java}
> val spark = SparkSession.builder
>   .appName("RemoteSparkTest")
>   .master("local")
>   .getOrCreate()
>
> val df = spark.read
>   .option("url", "jdbc:hive2://remote.hive.local:10000/default")
>   .option("user", "user")
>   .option("password", "pass")
>   .option("dbtable", "test_table")
>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>   .format("jdbc")
>   .load()
>
> df.show()
> {code}
> Result:
> {noformat}
> +-------------------+
> |test_table.test_col|
> +-------------------+
> +-------------------+
> {noformat}
> All operations like:
> {code:java}
> df.select("*").show()
> {code}
> return an empty result as well.
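For convenience, the three steps of the workaround can be combined into one self-contained Scala snippet. This is a minimal sketch rather than a tested patch: it assumes Spark 2.x with the hive-jdbc driver on the classpath, and the connection URL, table name, and the HiveJdbcWorkaround object name are placeholders:
{code:java}
import java.util.Properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Dialect that quotes Hive column names with backticks instead of double quotes
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

object HiveJdbcWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("HiveJdbcWorkaround")
      .master("local")
      .getOrCreate()

    // The dialect must be registered before the read is issued
    JdbcDialects.registerDialect(HiveDialect)

    val props = new Properties()
    props.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")
    props.setProperty("fetchsize", "10") // required; the result stays empty without it

    // Placeholder URL and table name, replace with your own
    val df = spark.read.jdbc("jdbc:hive2://localhost:10000/default", "test1", props)
    df.show()
  }
}
{code}
Registering the dialect only changes how Spark quotes identifiers in the generated SELECT statement; setting fetchsize explicitly appears to be necessary as well because some hive-jdbc versions do not accept Spark's default fetch size of 0.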