Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-05 Thread Mich Talebzadeh
OK, I found a workaround. Basically, state is not shared between streams, and I have two of them: one for the business topic and another created solely to shut down Spark Structured Streaming gracefully. I wanted to print the most recent batchId for the business topic, called "md" here
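A minimal sketch of what printing the latest batch id from a foreachBatch callback can look like. The names sendToSink, md, and batchId follow the thread; the function body is illustrative, not the original code:

```python
# Sketch: foreachBatch handler for the business stream ("md" in the thread).
# foreachBatch passes the micro-batch output DataFrame and its batch id.
def sendToSink(df, batchId):
    # batchId identifies the current micro-batch; printing it on the driver
    # makes the most recent value visible in the driver log.
    print(f"From sendToSink, md, batchId is {batchId}")
    if df.count() > 0:
        df.show(truncate=False)

# Wiring (shape only, requires an active streaming DataFrame):
# streamingDF.writeStream.foreachBatch(sendToSink).start()
```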

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help: https://docs.databricks.com/structured-streaming/foreach.html. streamingDF.writeStream.foreachBatch(...) allows you to specify a function that is executed on the output data of every micro-batch of the streaming query. It takes two parameters: a DataFrame or Dataset that has the
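The two-parameter contract described above can be sketched as follows. The callback name process_batch is illustrative; the engine supplies the micro-batch DataFrame and a batch id:

```python
# Sketch of the foreachBatch contract: the callback receives the micro-batch
# output as a DataFrame plus a unique, monotonically increasing batch id.
def process_batch(batch_df, batch_id):
    # batch_df: the output rows of this micro-batch
    # batch_id: assigned by the streaming engine per micro-batch
    print(f"processing batch {batch_id} with {batch_df.count()} rows")

# Wiring (shape only, requires an active streaming DataFrame):
# streamingDF.writeStream.foreachBatch(process_batch).start()
```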

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed environment. With regard to your other point, these are two different topics, each with its own stream. The point of the second stream is to set the status to false, so it can gracefully shut down the main stream (the one called "md") here
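One way to sketch the control-stream idea described above. The names sendToControl and the "status" field are assumptions based on the thread, not the exact code; the key point is that a foreachBatch callback runs on the driver, so a driver-local object can carry the flag (unlike a global read on executors):

```python
# Sketch: the second topic carries a status flag. When a row arrives with
# status == "false", the handler flips a driver-side flag that the main
# loop can check before stopping the "md" query.
keep_running = {"status": True}

def sendToControl(dfnewtopic, batchId):
    print(f"From sendToControl, newtopic batchId is {batchId}")
    # The control topic is tiny, so collecting to the driver is safe here.
    for row in dfnewtopic.collect():
        if str(row["status"]).lower() == "false":
            keep_running["status"] = False

# Driver loop (shape only):
# while keep_running["status"]:
#     queryMd.awaitTermination(5)
# queryMd.stop()
```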

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it. Aren't you applying these to the same stream, and the same batches? Worst case, why not apply them as one function? Otherwise, how do you mean to associate one call with another? Globals don't help here: they aren't global beyond the driver, and which one would belong to which batch? On Sat, Mar

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks, they are different batchIds. From sendToControl, newtopic batchId is 76. From sendToSink, md, batchId is 563. As a matter of interest, why does a global variable not work?

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no? Or why not simply put the logic of both in one function? Or write one function that calls both? On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh wrote: > This is probably pretty straightforward but somehow it does not look that way > On Spark
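Sean's second suggestion can be sketched like this: if both pieces of logic really do apply to the same stream, wrapping them in a single foreachBatch callback guarantees they always see the same batchId. All names here are illustrative stand-ins, not the original functions:

```python
# Stand-in handlers for the two pieces of logic discussed in the thread.
def sendToSink(df, batchId):
    print(f"sendToSink batchId {batchId}")      # business logic would go here

def sendToControl(df, batchId):
    print(f"sendToControl batchId {batchId}")   # shutdown check would go here

# One function that calls both, so both always see the same batchId.
def processBatch(df, batchId):
    sendToSink(df, batchId)
    sendToControl(df, batchId)

# Wiring (shape only):
# streamingDF.writeStream.foreachBatch(processBatch).start()
```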