[ https://issues.apache.org/jira/browse/SPARK-21216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063662#comment-16063662 ]
Apache Spark commented on SPARK-21216:
--------------------------------------

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/18426

> Streaming DataFrames fail to join with Hive tables
> --------------------------------------------------
>
>                 Key: SPARK-21216
>                 URL: https://issues.apache.org/jira/browse/SPARK-21216
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.1.1
>            Reporter: Burak Yavuz
>            Assignee: Burak Yavuz
>
> The following code will throw a cryptic exception:
> {code}
> import org.apache.spark.sql.execution.streaming.MemoryStream
> import testImplicits._
> implicit val _sqlContext = spark.sqlContext
> Seq((1, "one"), (2, "two"), (4, "four")).toDF("number", "word").createOrReplaceTempView("t1")
> // Make a table and ensure it will be broadcast.
> sql("""CREATE TABLE smallTable(word string, number int)
>   |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>   |STORED AS TEXTFILE
>   """.stripMargin)
> sql(
>   """INSERT INTO smallTable
>     |SELECT word, number from t1
>   """.stripMargin)
> val inputData = MemoryStream[Int]
> val joined = inputData.toDS().toDF()
>   .join(spark.table("smallTable"), $"value" === $"number")
> val sq = joined.writeStream
>   .format("memory")
>   .queryName("t2")
>   .start()
> try {
>   inputData.addData(1, 2)
>   sq.processAllAvailable()
> } finally {
>   sq.stop()
> }
> {code}
> If someone creates a HiveSession, the planner in `IncrementalExecution`
> doesn't take into account the Hive scan strategies.
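
To illustrate the failure mode described in the issue, here is a toy model in plain Scala with no Spark dependency. It is only a sketch: `LogicalNode`, `Strategy`, `ToyPlanner`, and the error text are hypothetical names made up for this example and are not Spark's classes or messages. It assumes a planner that tries a list of strategies in order, and shows that a planner built without the Hive scan strategy cannot plan a Hive table node even though the session's own planner can.

```scala
// Toy model only: every name here is hypothetical, not Spark's real API.
object PlannerSketch {
  sealed trait LogicalNode
  case object StreamingSource extends LogicalNode
  case object HiveTable extends LogicalNode

  // A strategy returns Some(physical operator name) if it can plan the node.
  type Strategy = LogicalNode => Option[String]

  val streamingScan: Strategy = {
    case StreamingSource => Some("StreamingScan")
    case _               => None
  }

  val hiveScan: Strategy = {
    case HiveTable => Some("HiveTableScan")
    case _         => None
  }

  final class ToyPlanner(strategies: Seq[Strategy]) {
    // Use the first strategy that produces a plan; otherwise report failure.
    def plan(node: LogicalNode): Either[String, String] =
      strategies.flatMap(s => s(node).toList).headOption
        .toRight(s"No plan for $node")
  }

  // The session-level planner knows about Hive scans ...
  val sessionPlanner = new ToyPlanner(Seq(hiveScan, streamingScan))
  // ... but a separately constructed planner that omits them does not.
  val incrementalPlanner = new ToyPlanner(Seq(streamingScan))

  def main(args: Array[String]): Unit = {
    println(sessionPlanner.plan(HiveTable))     // Right(HiveTableScan)
    println(incrementalPlanner.plan(HiveTable)) // Left(No plan for HiveTable)
  }
}
```

In this model the fix amounts to constructing the second planner from the same strategy list as the first; whether the linked pull request takes that exact approach is not stated in this message.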