Re: Processing Multiple Streams in a Single Job

2021-08-24 Thread Gourav Sengupta
Hi,

can you please give more details about this? What is the requirement? Which
SPARK version are you using? What do you mean by multiple sources? What are
these sources?



Regards,
Gourav Sengupta

On Wed, Aug 25, 2021 at 3:51 AM Artemis User wrote:

> Thanks Daniel.  I guess you were suggesting using DStream/RDD.  Would it
> be possible to use structured streaming/DataFrames for multi-source
> streaming?  In addition, we really need each stream's data ingestion to be
> asynchronous or non-blocking...  thanks!
>
> On 8/24/21 9:27 PM, daniel williams wrote:
>
> Yeah. Build up the streams as a collection, map each query to its start()
> invocation, and map those results to awaitTermination() or whatever other
> blocking mechanism you’d like to use.
>
> On Tue, Aug 24, 2021 at 4:37 PM Artemis User wrote:
>
>> Is there a way to run multiple streams in a single Spark job using
>> Structured Streaming?  If not, is there an easy way to perform inter-job
>> communications (e.g. referencing a dataframe among concurrent jobs) in
>> Spark?  Thanks a lot in advance!
>>
>> -- ND
>>
>>
> --
> -dan
>
>
>


Re: Processing Multiple Streams in a Single Job

2021-08-24 Thread Sean Owen
No, that applies to the streaming DataFrame API too.
And no, jobs can't communicate with each other.
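
For illustration, a minimal sketch of two independent structured streaming
queries running in a single job (PySpark; the built-in rate source and
console sink are used only to keep the sketch self-contained):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-stream").getOrCreate()

    # Each readStream produces its own streaming DataFrame.
    stream_a = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    stream_b = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

    # start() returns immediately, so both queries run concurrently
    # within the same SparkSession.
    query_a = stream_a.writeStream.format("console").start()
    query_b = stream_b.writeStream.format("console").start()

    # Block until any active query terminates.
    spark.streams.awaitAnyTermination()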

On Tue, Aug 24, 2021 at 9:51 PM Artemis User wrote:

> Thanks Daniel.  I guess you were suggesting using DStream/RDD.  Would it
> be possible to use structured streaming/DataFrames for multi-source
> streaming?  In addition, we really need each stream's data ingestion to be
> asynchronous or non-blocking...  thanks!
>
> On 8/24/21 9:27 PM, daniel williams wrote:
>
> Yeah. Build up the streams as a collection, map each query to its start()
> invocation, and map those results to awaitTermination() or whatever other
> blocking mechanism you’d like to use.
>
> On Tue, Aug 24, 2021 at 4:37 PM Artemis User wrote:
>
>> Is there a way to run multiple streams in a single Spark job using
>> Structured Streaming?  If not, is there an easy way to perform inter-job
>> communications (e.g. referencing a dataframe among concurrent jobs) in
>> Spark?  Thanks a lot in advance!
>>
>> -- ND
>>
>>
> --
> -dan
>
>
>


Re: Processing Multiple Streams in a Single Job

2021-08-24 Thread Artemis User
Thanks Daniel.  I guess you were suggesting using DStream/RDD.  Would it
be possible to use structured streaming/DataFrames for multi-source
streaming?  In addition, we really need each stream's data ingestion to be
asynchronous or non-blocking...  thanks!


On 8/24/21 9:27 PM, daniel williams wrote:
Yeah. Build up the streams as a collection, map each query to its start()
invocation, and map those results to awaitTermination() or whatever other
blocking mechanism you’d like to use.
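
For illustration, a minimal PySpark sketch of that pattern, assuming the
spark-sql-kafka connector is on the classpath; the broker address, topic
names, and paths are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-source").getOrCreate()

    # Hypothetical topics; substitute your real sources.
    topics = ["events_a", "events_b", "events_c"]

    def start_query(topic):
        # start() is non-blocking, so each stream ingests asynchronously.
        return (spark.readStream
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", topic)
                .load()
                .writeStream
                .format("parquet")
                .option("path", f"/data/{topic}")
                .option("checkpointLocation", f"/checkpoints/{topic}")
                .start())

    # Build up the streams as a collection ...
    queries = [start_query(t) for t in topics]
    # ... then map the results to a blocking mechanism.
    for q in queries:
        q.awaitTermination()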


On Tue, Aug 24, 2021 at 4:37 PM Artemis User wrote:


Is there a way to run multiple streams in a single Spark job using
Structured Streaming?  If not, is there an easy way to perform inter-job
communications (e.g. referencing a dataframe among concurrent jobs) in
Spark?  Thanks a lot in advance!

-- ND



--
-dan




Processing Multiple Streams in a Single Job

2021-08-24 Thread Artemis User
Is there a way to run multiple streams in a single Spark job using 
Structured Streaming?  If not, is there an easy way to perform inter-job 
communications (e.g. referencing a dataframe among concurrent jobs) in 
Spark?  Thanks a lot in advance!


-- ND




Re: AWS EMR SPARK 3.1.1 date issues

2021-08-24 Thread Gourav Sengupta
Hi,

I received a response from AWS: this is an issue with EMR, and I believe
they are working on resolving it.

Thanks and Regards,
Gourav Sengupta
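
For reference, a minimal sketch of the date-literal form Sean suggests in
the quoted thread below (PySpark; the table and column names are
placeholders):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("date-filter").getOrCreate()

    # A bare string literal may no longer be coerced to a date in Spark 3:
    #   SELECT * FROM table_name WHERE date_partition > '2021-03-01'
    # A typed DATE literal states the intent explicitly:
    df = spark.sql(
        "SELECT * FROM table_name WHERE date_partition > DATE '2021-03-01'"
    )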

On Mon, Aug 23, 2021 at 1:35 PM Gourav Sengupta <
gourav.sengupta.develo...@gmail.com> wrote:

> Hi,
>
> the query still gives the same error if we write "SELECT * FROM table_name
> WHERE data_partition > CURRENT_DATE() - INTERVAL 10 DAYS".
>
> Also, the queries work fine in SPARK 3.0.x and in EMR 6.2.0.
>
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Aug 23, 2021 at 1:16 PM Sean Owen wrote:
>
>> Date handling was tightened up in Spark 3. I think you need to compare to
>> a date literal, not a string literal.
>>
>> On Mon, Aug 23, 2021 at 5:12 AM Gourav Sengupta <
>> gourav.sengupta.develo...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> while running a simple query in EMR 6.3.0 (SPARK 3.1.1) such as "SELECT
>>> * FROM <table_name> WHERE <date_partition> > '2021-03-01'", the query
>>> fails with this error:
>>>
>>> ---
>>> pyspark.sql.utils.AnalysisException:
>>> org.apache.hadoop.hive.metastore.api.InvalidObjectException: Unsupported
>>> expression '2021 - 03 - 01' (Service: AWSGlue; Status Code: 400; Error
>>> Code: InvalidInputException; Request ID:
>>> dd3549c2-2eeb-4616-8dc5-5887ba43dd22; Proxy: null)
>>>
>>> ---
>>>
>>> The above query works fine in all previous versions of SPARK.
>>>
>>> Is this the expected behaviour in SPARK 3.1.1? If so, can someone please
>>> let me know how to write this query?
>>>
>>> Also, if this is the expected behaviour, a lot of users will have to
>>> change their existing code, making the transition to SPARK 3.1.1
>>> expensive.
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>