Hi,

I am not sure if you were writing pseudo-code or real one but there were few issues in the sql.

I have reproduced you example in the Spark REPL and all worked as expected and result is the one you need

Please see below full code:

## *Spark 3.0.0*

>>> a = spark.read.csv("tab1", sep="|", header=True, schema="col1 TIMESTAMP, col2 INT, col3 STRING")
>>> a.printSchema
<bound method DataFrame.printSchema of DataFrame[col1: timestamp, col2: int, col3: string]>

>>> a.show()
+--------------------+----+----+
|                col1|col2|col3|
+--------------------+----+----+
|2020-11-17 20:50:...|   1|null|
+--------------------+----+----+

>>> b = spark.read.csv("tab2", sep="|", header=True, schema="col1 TIMESTAMP, col2 INT, col3 STRING")
>>> b.printSchema
<bound method DataFrame.printSchema of DataFrame[col1: timestamp, col2: int, col3: string]>

>>> b.show()
+--------------------+----+----+
|                col1|col2|col3|
+--------------------+----+----+
|2020-11-17 21:19:...|   1| win|
|2020-11-17 20:49:...|   1| win|
|2020-11-17 20:19:...|   1| Win|
+--------------------+----+----+

>>> a.createOrReplaceTempView("table1")
>>> b.createOrReplaceTempView("table2")

>>> res = spark.sql("""
... select a.col1,a.col2, b.col1, b.col2, coalesce(a.col3,b.col3) as col3
... from table1 a *join* table2 b
... on (a.col2=b.col2) and (*a.col1 < b.col1*)
... """)
>>> res.show()
+--------------------+----+--------------------+----+----+
|                col1|col2|                col1|col2|col3|
+--------------------+----+--------------------+----+----+
|2020-11-17T20:50:...|   1|2020-11-17T21:19:...|   1| win|
+--------------------+----+--------------------+----+----+

Regards,

Khalid

On 19/11/2020 05:13, anbutech wrote:
select a.col1,a.col2.coalesce(a.col3,b.col3) as col3
from table1 a left table2 b
on (a.col2=b.col2) and (b.col1 < b.col1)

Reply via email to