You don't need to use foreachBatch to write to Cassandra. You just need Spark Cassandra Connector version 2.5.0 or higher: it supports writing streaming data to Cassandra natively, so you can use it directly as a structured streaming sink instead of opening a connection per batch yourself.
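A minimal PySpark sketch of what this looks like, assuming Spark Cassandra Connector 2.5.x on Scala 2.12; the Kafka topic, keyspace, table, and host names below are placeholders, not values from your gist:

```python
# Sketch: write a Kafka-sourced stream straight to Cassandra using the
# native structured-streaming sink added in Spark Cassandra Connector 2.5.0.
# No foreachBatch and no per-record connection handling is needed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("twitch-chat-to-cassandra")
    # Adjust the connector version/coordinates to match your Spark build.
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:2.5.2")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()
)

messages = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "twitch-chat")  # placeholder topic name
    .load()
    # Columns must match the target Cassandra table's schema.
    .selectExpr("CAST(key AS STRING) AS user",
                "CAST(value AS STRING) AS message")
)

query = (
    messages.writeStream
    .format("org.apache.spark.sql.cassandra")  # native sink in SCC 2.5+
    .option("keyspace", "chat")      # placeholder keyspace
    .option("table", "messages")     # placeholder table
    .option("checkpointLocation", "/tmp/checkpoints/twitch-chat")
    .start()
)
query.awaitTermination()
```

The connector manages a single shared Cassandra session per executor, which avoids both the connect-per-call overhead you hit on line 72 of your gist and the `_thread.RLock` pickling error from trying to serialize a connection object.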
Here is an announcement: https://www.datastax.com/blog/advanced-apache-cassandra-analytics-now-open-all

guillaume farcy at "Mon, 21 Mar 2022 16:33:51 +0100" wrote:

gf> Hello,
gf>
gf> I am a student and I am currently doing a big data project.
gf> Here is my code:
gf> https://gist.github.com/Balykoo/262d94a7073d5a7e16dfb0d0a576b9c3
gf>
gf> My project is to retrieve messages from a twitch chat and send them
gf> into kafka, then spark reads the kafka topic to perform the processing
gf> in the provided gist. I will want to send these messages into cassandra.
gf>
gf> I tested a first solution on line 72 which works, but when there are
gf> too many messages spark crashes. Probably due to the fact that my
gf> function connects to cassandra each time it is called.
gf>
gf> I tried the object approach to mutualize the connection object but
gf> without success:
gf> _pickle.PicklingError: Could not serialize object: TypeError: cannot
gf> pickle '_thread.RLock' object
gf>
gf> Can you please tell me how to do this?
gf> Or at least give me some advice?
gf>
gf> Sincerely,
gf> FARCY Guillaume.

--
With best wishes,
Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org