Hi,
I am not sure if you were writing pseudo-code or real one but there were
few issues in the sql.
I have reproduced you example in the Spark REPL and all worked as
expected and result is the one you need
Please see below full code:
## *Spark 3.0.0*
>>> a = spark.read.csv("tab1", sep="|", header=True, schema="col1
TIMESTAMP, col2 INT, col3 STRING")
>>> a.printSchema
int, col3: string]>
>>> a.show()
++++
| col1|col2|col3|
++++
|2020-11-17 20:50:...| 1|null|
++++
>>> b = spark.read.csv("tab2", sep="|", header=True, schema="col1
TIMESTAMP, col2 INT, col3 STRING")
>>> b.printSchema
int, col3: string]>
>>> b.show()
++++
| col1|col2|col3|
++++
|2020-11-17 21:19:...| 1| win|
|2020-11-17 20:49:...| 1| win|
|2020-11-17 20:19:...| 1| Win|
++++
>>> a.createOrReplaceTempView("table1")
>>> b.createOrReplaceTempView("table2")
>>> res = spark.sql("""
... select a.col1,a.col2, b.col1, b.col2, coalesce(a.col3,b.col3) as col3
... from table1 a *join* table2 b
... on (a.col2=b.col2) and (*a.col1 < b.col1*)
... """)
>>> res.show()
++++++
| col1|col2| col1|col2|col3|
++++++
|2020-11-17T20:50:...| 1|2020-11-17T21:19:...| 1| win|
++++++
Regards,
Khalid
On 19/11/2020 05:13, anbutech wrote:
select a.col1,a.col2.coalesce(a.col3,b.col3) as col3
from table1 a left table2 b
on (a.col2=b.col2) and (b.col1 < b.col1)