Re: Inner join with the table itself

2018-01-16 Thread Michael Shtelma
Hi Jacek,

Thank you for the workaround. It does work this way:

pos.as("p1").join(pos.as("p2")).filter($"p1.POSITION_ID0" === $"p2.POSITION_ID")

I have checked that this gives the same execution plan as the join with renamed columns.

Best,
Michael
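The alias-based self-join pattern above can be sketched without a Spark runtime. A minimal plain-Python simulation (the table name and the columns POSITION_ID / POSITION_ID0 come from the thread; the row values are hypothetical): a cross product of the table with itself, filtered on p1.POSITION_ID0 == p2.POSITION_ID, gives the inner self-join result the workaround produces.

```python
from itertools import product

# Hypothetical stand-in for the POSITION table from the thread:
# POSITION_ID0 is treated here as a reference to another row's POSITION_ID.
pos = [
    {"POSITION_ID": 1, "POSITION_ID0": 0},
    {"POSITION_ID": 2, "POSITION_ID0": 1},
    {"POSITION_ID": 3, "POSITION_ID0": 1},
]

# Self-join: take the same rows as both sides (p1 and p2), then filter the
# cross product on p1.POSITION_ID0 == p2.POSITION_ID -- the same shape as
# pos.as("p1").join(pos.as("p2")).filter(...) in the thread.
joined = [
    (p1, p2)
    for p1, p2 in product(pos, pos)
    if p1["POSITION_ID0"] == p2["POSITION_ID"]
]

for p1, p2 in joined:
    print(p1["POSITION_ID"], "->", p2["POSITION_ID"])
```

Only rows 2 and 3 reference an existing POSITION_ID, so exactly those two pairs survive the filter; row 1 (POSITION_ID0 = 0) matches nothing.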

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael,

scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> val r1 = spark.range(1)
r1: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> r1.as("left").join(r1.as("right")).filter($"left.id" === $"right.id").show
+---+---+
| id| id|
+---+---+
|  0|  0|
+---+---+

Am I missing

Re: Inner join with the table itself

2018-01-15 Thread Michael Shtelma
Hi Jacek & Gengliang,

let's take a look at the following query:

val pos = spark.read.parquet(prefix + "POSITION.parquet")
pos.createOrReplaceTempView("POSITION")
spark.sql("SELECT POSITION.POSITION_ID FROM POSITION POSITION JOIN POSITION POSITION1 ON POSITION.POSITION_ID0 =

Re: Inner join with the table itself

2018-01-15 Thread Gengliang Wang
Hi Michael,

You can use `Explain` to see how your query is optimized:
https://docs.databricks.com/spark/latest/spark-sql/language-manual/explain.html

I believe your query is an actual cross join, which is usually
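One way the self-join could degenerate into the cross join Gengliang describes (a hedged guess at the mechanism, simulated with plain Python lists rather than Spark internals; the row values are hypothetical): if both column references in the join condition resolve to the same side of the self-join, the predicate no longer relates the two sides. It then acts as a filter on one side, followed by an unconstrained cross product with the other.

```python
from itertools import product

# Hypothetical POSITION rows; POSITION_ID0 references another POSITION_ID.
pos = [
    {"POSITION_ID": 1, "POSITION_ID0": 1},  # self-referencing
    {"POSITION_ID": 2, "POSITION_ID0": 2},  # self-referencing
    {"POSITION_ID": 3, "POSITION_ID0": 1},
]

# Intended join: the condition relates the two sides of the self-join.
intended = [
    (a, b) for a, b in product(pos, pos)
    if a["POSITION_ID0"] == b["POSITION_ID"]
]

# Mis-resolved join: both column references resolve to the SAME side, so the
# condition depends only on the left row. Every left row that satisfies it
# pairs with every right row -- i.e. a filter followed by a cross join.
mis_resolved = [
    (a, b) for a, b in product(pos, pos)
    if a["POSITION_ID0"] == a["POSITION_ID"]
]

print(len(intended), len(mis_resolved))
```

With these three rows the intended join yields 3 pairs, while the mis-resolved predicate keeps 2 of the 3 left rows and crosses each with all 3 right rows, yielding 6 pairs; an `Explain` on the real query would show whether the plan contains such a cross join.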

Re: Inner join with the table itself

2018-01-15 Thread Jacek Laskowski
Hi Michael,

-dev +user

What's the query? How do you "fool spark"?

Pozdrawiam (Regards),
Jacek Laskowski
https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams