Re: Spark 2.3 Stream-Stream Join with left outer join lost left stream value

2021-02-27 Thread Jungtaek Lim
We found an edge case in stream-stream left/right outer joins in Spark 2.x and fixed it in Spark 3.0.0. Please refer to SPARK-26154 for more details. The fix brought another regression, which was fixed in 3.0.1, so you may want to move to Spark

Spark structured streaming Stuck on Batch = 0 on spark 3.1.1, Dataproc cluster

2021-02-27 Thread Mich Talebzadeh
Hi, I have a PySpark program that uses *Spark 3.0.1* to read a Kafka topic and write it to Google BigQuery. This works fine on premises and loops over micro-batches of data. def fetch_data(self): self.sc.setLogLevel("ERROR")

Re: DropNa in Spark for Columns

2021-02-27 Thread Peyman Mohajerian
I don't have personal experience with Koalas but it does seem to have the same API: https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.dropna.html On Fri, Feb 26, 2021 at 11:46 PM Vitali Lupusor wrote: > Hello Chetan, > > I don’t know about Scala, but in PySpark

Spark 2.3 Stream-Stream Join with left outer join lost left stream value

2021-02-27 Thread Xu Yan
I'm trying to implement a toy stream-stream join with Spark 2.3.0. The stream join works fine when the condition matches, but loses the left stream value when the condition does not match, even with leftOuterJoin. Thanks in advance. Here are my source code and data; basically, I'm creating two