Need Unit test complete reference for Pyspark

2020-11-18 Thread Sachit Murarka
Hi Users, I have to write unit test cases for PySpark. I think pytest-spark and spark-testing-base are good test libraries. Can anyone please provide a full reference for writing test cases in Python using these? Kind Regards, Sachit Murarka

Spark SQL check timestamp with other table and update a column.

2020-11-18 Thread anbutech
Hi Team, I want to update col3 in table 1 for each record where col1 from table 2 is less than col1 in table 1. I am not getting the correct output.
Table 1:
col1|col2|col3
2020-11-17T20:50:57.777+|1|null
Table 2:
col1|col2|col3
2020-11-17T21:19:06.508+|1|win

Re: Cannot perform operation after producer has been closed

2020-11-18 Thread Eric Beabes
I must say... *Spark has let me down in this case*. I am surprised an important issue like this hasn't been fixed in Spark 2.4. I am fighting a battle of 'Spark Structured Streaming' vs. 'Flink' at work, and now, because Spark 2.4 can't handle this, *I've been asked to rewrite the code in Flink*.

Spark Exception

2020-11-18 Thread Amit Sharma
Hi, we are running a Spark streaming job and sometimes it throws the two exceptions below. I do not understand the difference between these two exceptions: for one the timeout is 120 seconds and for the other it is 600 seconds. What could be the reason for these? Error running job streaming job