Dear student,

Check out this article of mine on LinkedIn:


Processing Change Data Capture with Spark Structured Streaming
<https://www.linkedin.com/pulse/processing-change-data-capture-spark-structured-talebzadeh-ph-d-/>


There is a link to GitHub
<https://github.com/michTalebzadeh/SparkStructuredStreaming>  as well.


This example writes to a Google BigQuery table. You can write to Cassandra
in much the same way; the usual route is the Spark Cassandra Connector,
though a JDBC connection is also possible if I am not mistaken.
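
As a concrete sketch of the Cassandra route: with Structured Streaming the
common pattern is foreachBatch, which hands each micro-batch to a plain batch
writer. If the Spark Cassandra Connector is on the classpath, the connector
manages its own sessions on the executors, so your function closes over no
connection object and there is nothing unpicklable to serialize (which is
what causes the `_thread.RLock` PicklingError). The keyspace and table names
below are placeholders, not from the gist:

```python
# Sketch only: assumes the Spark Cassandra Connector package
# (com.datastax.spark:spark-cassandra-connector) is on the classpath and
# that a keyspace "twitch" with a table "messages" already exists --
# both names are placeholders for your own schema.

def write_batch_to_cassandra(batch_df, batch_id):
    """Invoked once per micro-batch by foreachBatch. The function closes
    over no connection object, so Spark can pickle it without error; the
    connector pools its own Cassandra sessions on the executors."""
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="twitch", table="messages")
        .mode("append")
        .save())

# Usage in the streaming query (not run here):
# query = (parsed_df.writeStream
#          .foreachBatch(write_batch_to_cassandra)
#          .outputMode("append")
#          .start())
```

This also avoids opening a connection per message, which is likely what
overwhelmed the first solution in the gist.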



HTH


   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Mon, 21 Mar 2022 at 16:51, guillaume farcy <
guillaume.fa...@imt-atlantique.net> wrote:

> Hello,
>
> I am a student and I am currently doing a big data project.
> Here is my code:
> https://gist.github.com/Balykoo/262d94a7073d5a7e16dfb0d0a576b9c3
>
> My project is to retrieve messages from a Twitch chat and send them into
> Kafka; Spark then reads the Kafka topic to perform the processing in the
> provided gist.
>
> I then want to send these messages into Cassandra.
>
> I tested a first solution on line 72, which works, but when there are too
> many messages Spark crashes, probably because my function connects to
> Cassandra each time it is called.
>
> I tried an object-based approach to share the connection object, but
> without success:
> _pickle.PicklingError: Could not serialize object: TypeError: cannot
> pickle '_thread.RLock' object
>
> Can you please tell me how to do this?
> Or at least give me some advice?
>
> Sincerely,
> FARCY Guillaume.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
