That is something else. Yes, you can create a single, complex streaming job
that joins different data sources, etc. That is no different from any
other Spark usage. What are you looking for w.r.t. docs?

We are also saying you can simply run N unrelated streaming queries in
parallel on the driver, which is a few lines of code, but not specific to
Spark.
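
To illustrate the "few lines of code" point, here is a minimal PySpark
sketch (topic, path, and checkpoint names are made up): each start()
returns immediately, so the queries run concurrently in one driver.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-stream").getOrCreate()

    # First query: Kafka in, parquet out. All names are placeholders.
    q1 = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "topicA")
          .load()
          .writeStream.format("parquet")
          .option("path", "/tmp/outA")
          .option("checkpointLocation", "/tmp/ckA")
          .start())

    # Second, unrelated query: built-in rate source to the console.
    q2 = (spark.readStream.format("rate").load()
          .writeStream.format("console")
          .start())

    # start() is non-blocking, so both queries are now running;
    # block the driver until any of them stops or fails.
    spark.streams.awaitAnyTermination()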

Which one you want depends on what you are trying to do, which I'm not
clear on.

On Fri, Aug 27, 2021 at 9:12 AM Artemis User <arte...@dtechspace.com> wrote:

> Thanks Mich.  I understand now how to deal with multiple streams in a
> single job, but the responses I got before were very abstract and confusing.
> So I had to go back to the Spark doc and figure out the details.  This is
> what I found out:
>
>    1. The standard and recommended way to do multi-stream processing in
>    Structured Streaming (not DStream) is to use the join operations.  There
>    is no need to use collections and mapping functions (I guess those are for
>    DStream).  Once you have created the combined/joined DF, you use that
>    DF's stream writer to process each microbatch or event before dumping
>    results to the output sink (the official Spark doc on this isn't very clear
>    and the coding examples are incomplete; see the sketch after this list:
>    
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
>    )
>    2. The semantics of multi-stream processing are really mind-boggling.
>    You have to clearly define inner and outer joins along with watermark
>    conditions.  We are still in the process of detailing our use cases, since
>    we need to process three streams simultaneously in near real time, and we
>    don't want any blocking situations (i.e. one stream source not producing
>    any data for a considerable time period).  If you or anyone else has
>    any suggestions, I'd appreciate your comments.
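>
> In case it helps others, here is the skeleton I pieced together from the
> doc's ad-monetization example. It is only a rough sketch: column, topic,
> and path names are placeholders, and the watermark/interval values are
> arbitrary.
>
>     from pyspark.sql import SparkSession
>     from pyspark.sql.functions import expr
>
>     spark = SparkSession.builder.appName("join-streams").getOrCreate()
>
>     def read_topic(topic, id_alias, time_alias):
>         # Hypothetical helper: read one Kafka topic as (id, event time).
>         return (spark.readStream.format("kafka")
>                 .option("kafka.bootstrap.servers", "localhost:9092")
>                 .option("subscribe", topic).load()
>                 .selectExpr(f"CAST(value AS STRING) AS {id_alias}",
>                             f"timestamp AS {time_alias}"))
>
>     impressions = (read_topic("impressions", "impressionAdId", "impressionTime")
>                    .withWatermark("impressionTime", "2 hours"))
>     clicks = (read_topic("clicks", "clickAdId", "clickTime")
>               .withWatermark("clickTime", "3 hours"))
>
>     # Left outer join with a time-range condition: unmatched impressions
>     # are emitted NULL-padded once the watermark passes. Note the watermark
>     # only advances as events arrive, so a totally silent source can still
>     # hold results back.
>     joined = impressions.join(
>         clicks,
>         expr("""
>             clickAdId = impressionAdId AND
>             clickTime >= impressionTime AND
>             clickTime <= impressionTime + interval 1 hour
>         """),
>         "leftOuter")
>
>     def process_batch(batch_df, batch_id):
>         # Arbitrary per-microbatch processing before the output sink.
>         batch_df.write.mode("append").parquet("/tmp/joined-out")
>
>     (joined.writeStream
>         .foreachBatch(process_batch)
>         .option("checkpointLocation", "/tmp/ck-join")
>         .start()
>         .awaitTermination())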
>
> Thanks!  -- ND
>
> On 8/26/21 2:38 AM, Mich Talebzadeh wrote:
>
> Hi ND,
>
> Within the same Spark job you can handle two topics simultaneously with SSS
> (Spark Structured Streaming).  Is that what you are implying?
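>
> For example, one source can subscribe to both topics and you split on the
> built-in topic column (a minimal sketch; topic names are hypothetical):
>
>     from pyspark.sql import SparkSession
>
>     spark = SparkSession.builder.appName("two-topics").getOrCreate()
>
>     df = (spark.readStream.format("kafka")
>           .option("kafka.bootstrap.servers", "localhost:9092")
>           .option("subscribe", "topic1,topic2")  # comma-separated list
>           .load())
>
>     # Each record carries a 'topic' column identifying its source topic.
>     t1 = df.filter("topic = 'topic1'")
>     t2 = df.filter("topic = 'topic2'")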
>
> HTH
>
> On Tue, 24 Aug 2021 at 23:37, Artemis User <arte...@dtechspace.com> wrote:
>
>> Is there a way to run multiple streams in a single Spark job using
>> Structured Streaming?  If not, is there an easy way to perform inter-job
>> communication (e.g. referencing a DataFrame among concurrent jobs) in
>> Spark?  Thanks a lot in advance!
>>
>> -- ND
>>
>
