This part isn't Spark-specific, just a matter of running code in parallel
on the driver (code that happens to start streaming jobs). In Scala it's things
like .par collections; in Python it's something like multiprocessing.
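
For illustration, here's a minimal Scala sketch of that idea: start() is
non-blocking, so the driver can launch several structured streaming queries
from one collection and then block on them. The source, query names, and
checkpoint paths below are made up for the example.

import org.apache.spark.sql.SparkSession

object MultiStreamDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("multi-stream-demo").getOrCreate()

    // Hypothetical query names; in practice each entry would describe its own source.
    val names = Seq("streamA", "streamB")

    // start() returns immediately, so a plain .map would also work; .par just
    // parallelizes the query setup itself (on Scala 2.13 this needs the
    // scala-parallel-collections module).
    val queries = names.par.map { name =>
      spark.readStream
        .format("rate")                      // built-in test source, no schema required
        .option("rowsPerSecond", "5")
        .load()
        .writeStream
        .format("console")
        .queryName(name)
        .option("checkpointLocation", s"/tmp/checkpoints/$name")
        .start()
    }.toList

    // `queries` could also be blocked on individually via q.awaitTermination();
    // here the driver just waits until any one of them stops or fails.
    spark.streams.awaitAnyTermination()
  }
}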

On Wed, Aug 25, 2021 at 8:48 AM Artemis User <arte...@dtechspace.com> wrote:

> Thanks Sean.  Excuse my ignorance, but I just can't figure out how to
> create a collection of multiple streams using multiple stream readers.
> Could you provide some examples or additional references?  Thanks!
>
> On 8/24/21 11:01 PM, Sean Owen wrote:
>
> No, that applies to the streaming DataFrame API too.
> No, jobs can't communicate with each other.
>
> On Tue, Aug 24, 2021 at 9:51 PM Artemis User <arte...@dtechspace.com>
> wrote:
>
>> Thanks Daniel.  I guess you were suggesting using DStream/RDD.  Would it
>> be possible to use structured streaming/DataFrames for multi-source
>> streaming?  In addition, we really need each stream's data ingestion to be
>> asynchronous or non-blocking...  Thanks!
>>
>> On 8/24/21 9:27 PM, daniel williams wrote:
>>
>> Yeah. Build up the streams as a collection, map each query to a start()
>> invocation, and map those results to awaitTermination() or whatever other
>> blocking mechanism you’d like to use.
>>
>> On Tue, Aug 24, 2021 at 4:37 PM Artemis User <arte...@dtechspace.com>
>> wrote:
>>
>>> Is there a way to run multiple streams in a single Spark job using
>>> Structured Streaming?  If not, is there an easy way to perform inter-job
>>> communication (e.g. referencing a DataFrame among concurrent jobs) in
>>> Spark?  Thanks a lot in advance!
>>>
>>> -- ND
>>>
>> --
>> -dan
>>
>>
>>
>
