Thanks Mich. I understand now how to deal with multiple streams in a
single job, but the responses I got before were very abstract and
confusing, so I had to go back to the Spark docs and figure out the
details. This is what I found out:
1. The standard and recommended way to do multi-stream processing in
   Structured Streaming (not DStream) is to use the join operations.
   There is no need to use collections and mapping functions (I guess
   those apply to DStream). Once you have created the combined/joined
   DF, you use that DF's stream writer to process each micro-batch or
   event before writing results to the output sink (the official Spark
   doc on this isn't very clear and the code examples are incomplete:
   https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations)
2. The semantics of multi-stream processing are really mind-boggling.
   You have to clearly define inner vs. outer joins along with
   watermark conditions. We are still in the process of detailing our
   use cases, since we need to process three streams simultaneously in
   near real time and we don't want any blocking situations (i.e. one
   stream source not producing data for a considerable period of
   time). If you or anyone else has any suggestions, I'd appreciate
   your comments.
Thanks! -- ND
On 8/26/21 2:38 AM, Mich Talebzadeh wrote:
Hi ND,
Within the same Spark job you can handle two topics
simultaneously with SSS. Is that what you are implying?
HTH
view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
Disclaimer: Use it at your own risk. Any and all responsibility for
any loss, damage or destruction of data or any other property which
may arise from relying on this email's technical content is explicitly
disclaimed. The author will in no case be liable for any monetary
damages arising from such loss, damage or destruction.
On Tue, 24 Aug 2021 at 23:37, Artemis User <arte...@dtechspace.com
<mailto:arte...@dtechspace.com>> wrote:
Is there a way to run multiple streams in a single Spark job using
Structured Streaming? If not, is there an easy way to perform
inter-job communications (e.g. referencing a dataframe among
concurrent jobs) in Spark? Thanks a lot in advance!
-- ND
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
<mailto:user-unsubscr...@spark.apache.org>