Re: Unable to handle bignumeric datatype in spark/pyspark

2023-03-04 Thread Atheeth SH
Hi Rajnil, Sorry for the multiple emails. Since you are getting a ModuleNotFoundError, I was curious whether you have tried the solution mentioned in the README file. Below is the link:- https://github.com/GoogleCloudDataproc/spark-bigquery-connector#bignumeric-support
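The reason BigNumeric needs special handling in the connector is a width mismatch: BigQuery's BIGNUMERIC type carries roughly 76 significant digits, while Spark's DecimalType is capped at precision 38. A minimal pure-Python sketch of that mismatch (the helper `fits_spark_decimal` is an illustrative name, not part of the connector's API):

```python
from decimal import Decimal, getcontext

# BigQuery BIGNUMERIC allows roughly 76 significant digits (scale 38),
# while Spark's DecimalType is capped at precision 38 -- hence the
# connector's special BigNumeric handling described in the README.
SPARK_DECIMAL_MAX_PRECISION = 38
BIGNUMERIC_APPROX_PRECISION = 76

getcontext().prec = BIGNUMERIC_APPROX_PRECISION

def fits_spark_decimal(value: Decimal,
                       precision: int = SPARK_DECIMAL_MAX_PRECISION) -> bool:
    """Check whether a decimal's digit count fits Spark's DecimalType cap."""
    return len(value.as_tuple().digits) <= precision

# 40 digits: representable as BIGNUMERIC, too wide for DecimalType(38, 0).
big_value = Decimal("9" * 40)
print(fits_spark_decimal(big_value))  # → False
```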

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This might help: https://docs.databricks.com/structured-streaming/foreach.html streamingDF.writeStream.foreachBatch(...) allows you to specify a function that is executed on the output data of every micro-batch of the streaming query. It takes two parameters: a DataFrame or Dataset that has the

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
I am aware of your point that globals don't work in a distributed environment. With regard to your other point, these are two different topics, each with its own stream. The point of the second stream is to set the status to false, so it can gracefully shut down the main stream (the one called "md") here
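The control-stream pattern described above can be sketched driver-side, since foreachBatch callbacks execute on the driver. The function names, the dict-based "rows", and the `StreamState` class are assumptions for illustration; in a real job each handler would receive a Spark DataFrame and the main query would be stopped with query.stop():

```python
# Driver-side sketch of the control-flag pattern: one foreachBatch
# handler (the control topic) flips a shared flag, the other (the main
# "md" stream) checks it before processing. This is a simulation only.

class StreamState:
    """Holds the shared status flag on the driver."""
    def __init__(self):
        self.keep_running = True

state = StreamState()
processed = []

def send_to_control(batch, batch_id):
    # Control topic: any row with status == "false" requests shutdown.
    if any(row.get("status") == "false" for row in batch):
        state.keep_running = False

def send_to_sink(batch, batch_id):
    # Main stream: only process while the flag is still set.
    if state.keep_running:
        processed.extend(batch)

# Simulate interleaved micro-batches from the two streams.
send_to_sink([{"ticker": "md", "price": 10}], batch_id=563)
send_to_control([{"status": "false"}], batch_id=76)
send_to_sink([{"ticker": "md", "price": 11}], batch_id=564)

print(len(processed))  # → 1 (second sink batch was skipped)
```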

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
I don't quite get it - aren't you applying these to the same stream and batches? Worst case, why not apply these as one function? Otherwise, how do you mean to associate one call with another? Globals don't help here: they aren't global beyond the driver, and which one would be which batch? On Sat, Mar

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
Thanks, they are different batchIds. From sendToControl (newtopic), batchId is 76; from sendToSink (md), batchId is 563. As a matter of interest, why does a global variable not work?

Re: How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Sean Owen
It's the same batch ID already, no? Or why not simply put the logic of both in one function, or write one function that calls both? On Sat, Mar 4, 2023 at 2:07 PM Mich Talebzadeh wrote:
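Sean's suggestion of one function calling both can be sketched as below. The function names are assumed for illustration; each takes the two arguments of the foreachBatch callback signature, the micro-batch and its batch id:

```python
# One foreachBatch callback that calls both pieces of logic: both see
# the same micro-batch and the same batch id, so there is no state to
# associate across separate callbacks afterwards. Simulation only.

calls = []

def send_to_sink(df, batch_id):
    calls.append(("sink", batch_id))

def send_to_control(df, batch_id):
    calls.append(("control", batch_id))

def combined(df, batch_id):
    # One callback, one batch id -- no cross-stream coordination needed.
    send_to_sink(df, batch_id)
    send_to_control(df, batch_id)

combined(df=None, batch_id=42)
print(calls)  # → [('sink', 42), ('control', 42)]
```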

How to pass variables across functions in spark structured streaming (PySpark)

2023-03-04 Thread Mich Talebzadeh
This is probably pretty straightforward but somehow it does not look that way. On Spark Structured Streaming, "foreachBatch" performs custom write logic on each micro-batch through a callback function. For example, foreachBatch(sendToSink) expects 2 parameters: first, the micro-batch as a DataFrame or
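The contract described above is that Spark calls the function once per micro-batch with two arguments: the micro-batch (a DataFrame in real Spark; a plain list in this driver-loop simulation) and a monotonically increasing batch id. The name `send_to_sink` mirrors the thread's example; everything else is illustrative:

```python
# Simulation of the foreachBatch callback contract: the engine invokes
# the user function with (micro_batch, batch_id) for every micro-batch.

seen = []

def send_to_sink(micro_batch, batch_id):
    # In a real job this would hold the custom write logic
    # (e.g. writing the DataFrame to an external sink).
    seen.append((batch_id, len(micro_batch)))

# Simulate the engine delivering three micro-batches.
for batch_id, micro_batch in enumerate([["a"], ["b", "c"], []]):
    send_to_sink(micro_batch, batch_id)

print(seen)  # → [(0, 1), (1, 2), (2, 0)]
```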

Re: SPIP architecture diagrams

2023-03-04 Thread Mich Talebzadeh
OK, I decided to bite the bullet and use a Visio diagram for my SPIP "Shutting down Spark Structured Streaming when the streaming process has completed the current process". Details here: https://issues.apache.org/jira/browse/SPARK-42485 This is not meant to be complete; it is an indication. I