Hi,
Is there any way to add/delete actions/jobs dynamically in a running
Spark Streaming job?
I will call an API and execute only the actions configured in the system.
E.g., in the first batch suppose there are 5 actions in the Spark
application. Now suppose some configuration is changed and one of those
actions is removed, or a new one is added. Can the running job pick up
the new set of actions without a restart?
Hi,
I do not think that you are doing anything particularly concerning
here.
There is a setting in Spark which limits the number of records written
out at a time; you can try that. The other thing that you can try is
to ensure that the number of partitions is higher (just like you would
to make each task write less data).
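A minimal sketch of both ideas, assuming the setting in question is
spark.sql.files.maxRecordsPerFile (the table name, partition count, and
output path below are placeholders, not from this thread):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-write")
  .getOrCreate()

// Cap how many records go into any single output file, so no one
// task buffers an unbounded amount of Parquet data before flushing.
spark.conf.set("spark.sql.files.maxRecordsPerFile", 1000000L)

// Placeholder source table.
val df = spark.read.table("db.source_table")

// More partitions means smaller writes per task.
df.repartition(400)
  .write
  .mode("overwrite")
  .parquet("s3://bucket/output/")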
Hi Anil,
I was trying to work out things for a while yesterday, but may need your
kind help.
Can you please share the code for the following steps?
- Create DF from Hive (from step #c)
- Deduplicate the Spark DF by primary key
- Write the DF to S3 in Parquet format
- Write metadata to S3
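Something along these lines is what I have in mind, just as a rough
sketch (the table name, key column, and S3 paths are placeholders; your
actual code will of course differ):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dedup-write").getOrCreate()
import spark.implicits._

// Create DF from Hive (placeholder table name).
val df = spark.read.table("db.events")

// Deduplicate by primary key (placeholder column name).
val deduped = df.dropDuplicates("primary_key")

// Write the DF to S3 in Parquet format (placeholder path).
deduped.write.mode("overwrite").parquet("s3://bucket/data/")

// Write metadata to S3 (here just a timestamp and row count as JSON).
Seq((java.time.Instant.now.toString, deduped.count()))
  .toDF("written_at", "row_count")
  .coalesce(1)
  .write.mode("overwrite").json("s3://bucket/metadata/")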
Regards,
Gourav
Answers in the context. Thanks.
From: Gourav Sengupta
Date: Thursday, March 3, 2022 at 12:13 AM
To: Anil Dasari
Cc: Yang,Jie(INF), user@spark.apache.org
Subject: Re: {EXT} Re: Spark Parquet write OOM
Hi Anil,
I was trying to work out things for a while yesterday, but may need your kind help.
Hi Gourav,
Tried increasing the number of shuffle partitions and using higher
executor memory. Neither worked.
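For reference, this is the kind of thing that was tried (the exact
values here are made up):

import org.apache.spark.sql.SparkSession

// spark.executor.memory must be set before the application starts
// (e.g. via spark-submit --conf); spark.sql.shuffle.partitions can
// also be changed at runtime. Values below are illustrative only.
val spark = SparkSession.builder()
  .appName("parquet-write")
  .config("spark.sql.shuffle.partitions", "2000") // more shuffle partitions
  .config("spark.executor.memory", "16g")         // higher executor memory
  .getOrCreate()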
Regards
From: Gourav Sengupta
Date: Thursday, March 3, 2022 at 2:24 AM
To: Anil Dasari
Cc: Yang,Jie(INF), user@spark.apache.org
Subject: Re: {EXT} Re: Spark Parquet write OOM
Hi,
I do not think that you are doing anything particularly concerning here.
In short, I don't think there is such a possibility. However, there is the
option of shutting down Spark gracefully with the checkpoint directory
enabled. That way you can re-submit the modified code, and it will pick up
the BatchID from where it left off, assuming the topic is the same. See the
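A minimal sketch of that restart pattern with Structured Streaming and a
Kafka source (the broker, topic, checkpoint path, and the
runConfiguredActions helper are all placeholders I'm assuming, not from
this thread):

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("restartable-stream").getOrCreate()

// Placeholder Kafka source.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Whatever actions are currently configured run here, once per micro-batch.
def runConfiguredActions(batch: DataFrame, batchId: Long): Unit = {
  // ... execute the configured actions against `batch` ...
}

val query = stream.writeStream
  .foreachBatch(runConfiguredActions _)
  // The checkpoint is what lets a re-submitted job (with modified code)
  // resume from the last committed batch, as long as the topic is the same.
  .option("checkpointLocation", "s3://bucket/checkpoints/this-query/")
  .start()

// To change the set of actions: stop the query, deploy the modified code,
// and start it again with the same checkpointLocation; it resumes from
// the last committed offsets/batch.
query.awaitTermination()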