You don't need to use foreachBatch to write to Cassandra. You just need Spark Cassandra Connector version 2.5.0 or higher: it supports writing streaming data to Cassandra natively, so you can use it directly as a structured streaming sink instead of opening a connection per batch yourself.
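A minimal PySpark sketch of what this looks like, assuming Spark Cassandra Connector 2.5.x on Scala 2.12; the Kafka topic, keyspace, table, and host names below are placeholders, not values from your gist:

```python
# Sketch: write a Kafka-sourced stream straight to Cassandra using the
# native structured-streaming sink added in Spark Cassandra Connector 2.5.0.
# No foreachBatch and no per-record connection handling is needed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("twitch-chat-to-cassandra")
    # Adjust the connector version/coordinates to match your Spark build.
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:2.5.2")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()
)

messages = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "twitch-chat")  # placeholder topic name
    .load()
    # Columns must match the target Cassandra table's schema.
    .selectExpr("CAST(key AS STRING) AS user",
                "CAST(value AS STRING) AS message")
)

query = (
    messages.writeStream
    .format("org.apache.spark.sql.cassandra")  # native sink in SCC 2.5+
    .option("keyspace", "chat")      # placeholder keyspace
    .option("table", "messages")     # placeholder table
    .option("checkpointLocation", "/tmp/checkpoints/twitch-chat")
    .start()
)
query.awaitTermination()
```

The connector manages a single shared Cassandra session per executor, which avoids both the connect-per-call overhead you hit on line 72 of your gist and the `_thread.RLock` pickling error from trying to serialize a connection object.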
Here is an announcement: https://www.datastax.com/blog/advanced-apache-cassandra-analytics-now-open-all

guillaume farcy at "Mon, 21 Mar 2022 16:33:51 +0100" wrote:

gf> Hello,
gf>
gf> I am a student and I am currently doing a big data project.
gf> Here is my code:
gf> https://gist.github.com/Balykoo/262d94a7073d5a7e16dfb0d0a576b9c3
gf>
gf> My project is to retrieve messages from a twitch chat and send them
gf> into kafka, then spark reads the kafka topic to perform the processing
gf> in the provided gist. I will want to send these messages into cassandra.
gf>
gf> I tested a first solution on line 72 which works, but when there are
gf> too many messages spark crashes. Probably due to the fact that my
gf> function connects to cassandra each time it is called.
gf>
gf> I tried the object approach to mutualize the connection object but
gf> without success:
gf> _pickle.PicklingError: Could not serialize object: TypeError: cannot
gf> pickle '_thread.RLock' object
gf>
gf> Can you please tell me how to do this?
gf> Or at least give me some advice?
gf>
gf> Sincerely,
gf> FARCY Guillaume.

--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org