Re: Spark Context Shutdown

2022-11-09 Thread Shrikant Prasad
I have gone through the debug logs of the jobs. There are no failures or exceptions
in the logs.
The issue does not seem to be specific to particular jobs: several of our jobs have
been impacted by it, and the same jobs pass on retry.

I am trying to figure out why the driver pod gets deleted when this
issue occurs. Even if there were an error, the driver pod should remain
in the Error state.

What could be the potential reasons for the driver pod being deleted, so that we can
investigate in that direction?
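
For reference, this is roughly what we are able to check on the Kubernetes side today
(a sketch only; the namespace and driver pod name are placeholders):

    # Inspect the driver pod status and recent lifecycle events (if the pod still exists).
    kubectl -n <namespace> describe pod <driver-pod>

    # Events are retained only briefly, so capture them soon after a failure.
    kubectl -n <namespace> get events --field-selector involvedObject.name=<driver-pod>

    # If the pod is already gone, the API server audit log (where enabled) is the only
    # place that records which client issued the delete request.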

Regards,
Shrikant

On Sat, 29 Oct 2022 at 1:14 PM, Dongjoon Hyun 
wrote:

> Maybe enable DEBUG-level logging in your job and follow the processing logic
> until the failure?
>
> BTW, you need to look at what happens during job processing.
>
> `Spark Context was shutdown` is not the root cause, but the result of job
> failure in most cases.
>
> Dongjoon.
>
> On Fri, Oct 28, 2022 at 12:10 AM Shrikant Prasad 
> wrote:
>
>> Thanks, Dongjoon, for replying. I have tried with Spark 3.2 and am still
>> facing the same issue.
>>
>> I am looking for some pointers that can help in debugging to find the
>> root cause.
>>
>> Regards,
>> Shrikant
>>
>> On Thu, 27 Oct 2022 at 10:36 PM, Dongjoon Hyun 
>> wrote:
>>
>>> Hi, Shrikant.
>>>
>>> It seems that you are using non-GA features.
>>>
>>> FYI, since Apache Spark 3.1.1, Kubernetes Support became GA in the
>>> community.
>>>
>>> https://spark.apache.org/releases/spark-release-3-1-1.html
>>>
>>> In addition, Apache Spark 3.1 reached EOL last month.
>>>
>>> Could you try the latest distribution like Apache Spark 3.3.1 to see
>>> whether you are still experiencing the same issue?
>>>
>>> It will reduce the scope of your issue by excluding many known bugs
>>> already fixed in 3.0/3.1/3.2/3.3.0.
>>>
>>> Thanks,
>>> Dongjoon.
>>>
>>>
>>> On Wed, Oct 26, 2022 at 11:16 PM Shrikant Prasad 
>>> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> We are using Spark 3.0.1 with the Kubernetes resource manager. We are facing an
>>>> intermittent issue in which the driver pod gets deleted and the driver logs
>>>> contain a message that the Spark Context was shut down.
>>>>
>>>> The same job works fine with a given set of configurations most of the
>>>> time, but sometimes it fails. The failure mostly occurs while reading or writing
>>>> Parquet files to HDFS (but we are not sure this is the only use case affected).
>>>>
>>>> Any pointers to find the root cause?
>>>>
>>>> Most of the earlier reported issues mention executors hitting OOM as
>>>> the cause, but we have not seen an OOM error in any of the executors. Also, why
>>>> would the context be shut down in this case instead of retrying with new
>>>> executors?
>>>> Another question is why the driver pod gets deleted. Shouldn't it just
>>>> error out?
>>>>
>>>> Regards,
>>>> Shrikant
>>>>
>>>> --
>>>> Regards,
>>>> Shrikant Prasad
>>>>
>>> --
>> Regards,
>> Shrikant Prasad
>>
> --
Regards,
Shrikant Prasad


Re: Spark Context Shutdown

2022-10-28 Thread Shrikant Prasad
Thanks, Dongjoon, for replying. I have tried with Spark 3.2 and am still facing
the same issue.

I am looking for some pointers that can help in debugging to find the
root cause.

Regards,
Shrikant

On Thu, 27 Oct 2022 at 10:36 PM, Dongjoon Hyun 
wrote:

> Hi, Shrikant.
>
> It seems that you are using non-GA features.
>
> FYI, since Apache Spark 3.1.1, Kubernetes Support became GA in the
> community.
>
> https://spark.apache.org/releases/spark-release-3-1-1.html
>
> In addition, Apache Spark 3.1 reached EOL last month.
>
> Could you try the latest distribution like Apache Spark 3.3.1 to see whether
> you are still experiencing the same issue?
>
> It will reduce the scope of your issue by excluding many known bugs
> already fixed in 3.0/3.1/3.2/3.3.0.
>
> Thanks,
> Dongjoon.
>
>
> On Wed, Oct 26, 2022 at 11:16 PM Shrikant Prasad 
> wrote:
>
>> Hi Everyone,
>>
>> We are using Spark 3.0.1 with the Kubernetes resource manager. We are facing an
>> intermittent issue in which the driver pod gets deleted and the driver logs
>> contain a message that the Spark Context was shut down.
>>
>> The same job works fine with a given set of configurations most of the time,
>> but sometimes it fails. The failure mostly occurs while reading or writing Parquet
>> files to HDFS (but we are not sure this is the only use case affected).
>>
>> Any pointers to find the root cause?
>>
>> Most of the earlier reported issues mention executors hitting OOM as the
>> cause, but we have not seen an OOM error in any of the executors. Also, why would
>> the context be shut down in this case instead of retrying with new
>> executors?
>> Another question is why the driver pod gets deleted. Shouldn't it just error
>> out?
>>
>> Regards,
>> Shrikant
>>
>> --
>> Regards,
>> Shrikant Prasad
>>
> --
Regards,
Shrikant Prasad


Spark Context Shutdown

2022-10-26 Thread Shrikant Prasad
Hi Everyone,

We are using Spark 3.0.1 with the Kubernetes resource manager. We are facing an
intermittent issue in which the driver pod gets deleted and the driver logs
contain a message that the Spark Context was shut down.

The same job works fine with a given set of configurations most of the time,
but sometimes it fails. The failure mostly occurs while reading or writing
Parquet files to HDFS (but we are not sure this is the only use case affected).

Any pointers to find the root cause?

Most of the earlier reported issues mention executors hitting OOM as the
cause, but we have not seen an OOM error in any of the executors. Also, why would
the context be shut down in this case instead of retrying with new
executors?
Another question is why the driver pod gets deleted. Shouldn't it just error
out?
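
In case it helps with debugging, below is a small diagnostic listener we could attach
to the jobs (a sketch only, not something Spark provides; the class name and log
messages are ours). It logs executor removals and the application-end event so the
shutdown can be correlated with executor loss in the driver log:

    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerExecutorRemoved}

    // Minimal diagnostic listener: logs the events that usually precede a
    // "Spark Context was shutdown" failure, so the driver log shows what
    // happened right before the context stopped.
    class ShutdownDiagnosticsListener extends SparkListener {
      override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
        println(s"[diag] executor ${event.executorId} removed: ${event.reason}")
      }
      override def onApplicationEnd(event: SparkListenerApplicationEnd): Unit = {
        println(s"[diag] application end signalled at ${event.time}")
      }
    }

    // Register it early in the job, on the existing SparkContext:
    //   val sc: SparkContext = ...
    //   sc.addSparkListener(new ShutdownDiagnosticsListener)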

Regards,
Shrikant

-- 
Regards,
Shrikant Prasad


Re: How to set platform-level defaults for array-like configs?

2022-08-18 Thread Shrikant Prasad
Hi Mridul,

If you are using Spark on Kubernetes, you can make use of an admission
controller to validate or mutate the confs set in the Spark defaults
ConfigMap. However, this approach will work only for cluster deploy mode, not
for client mode.
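
For reference, the registration side of such a webhook would look roughly like the
following (a sketch only; all names, the namespace, and the path are placeholders, and
the webhook service that actually validates or mutates the submitted confs is not shown):

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: spark-conf-defaults                # placeholder
    webhooks:
      - name: spark-conf-defaults.example.com  # placeholder
        admissionReviewVersions: ["v1"]
        sideEffects: None
        failurePolicy: Ignore                  # do not block submissions if the webhook is down
        clientConfig:
          service:
            name: spark-conf-webhook           # placeholder: service implementing the mutation
            namespace: spark-platform
            path: /mutate
        rules:
          - apiGroups: [""]
            apiVersions: ["v1"]
            operations: ["CREATE"]
            resources: ["configmaps"]          # the conf ConfigMaps generated by spark-submit
        # In a real setup, scope this with a namespaceSelector/objectSelector so it only
        # matches the ConfigMaps created for Spark driver pods.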

Regards,
Shrikant

On Fri, 12 Aug 2022 at 12:26 AM, Tom Graves 
wrote:

> A few years ago, when I was doing more deployment management, I kicked
> around the idea of having different types of configs or different ways to
> specify the configs. One of the problems at the time was actually
> with users specifying a properties file and not picking up
> spark-defaults.conf, so I was thinking about creating something like a
> spark-admin.conf or something of that nature.
>
> I think there is benefit in it; it just comes down to how best to implement
> it. The other thing I don't think I saw addressed was the ability to
> prevent users from overriding configs. If you just do the defaults, I
> presume users could still override them. That gets a bit trickier, especially
> if they can override the entire spark-defaults.conf file.
>
>
> Tom
> On Thursday, August 11, 2022, 12:16:10 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
>
> Hi,
>
> Wenchen, it would be great if you could chime in with your thoughts, given
> the feedback you originally had on the PR.
> It would also be great to hear feedback from others on this, particularly folks
> managing Spark deployments: how is this mitigated/avoided in your
> case, and are there any other pain points with configs in this context?
>
>
> Regards,
> Mridul
>
> On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen  wrote:
>
> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today as mentioned in
> Shardul's email. The mismatch of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, plus requires one-off additions for each config.
>
> My proposal here would be:
>
> - Define a clear convention, e.g. a suffix of ".default" that enables
>   a default to be set and merged
> - Document this convention in configuration.md so that we can avoid
>   separately documenting each default-config, and instead just add a note in
>   the docs for the normal config.
> - Adjust the withPrepended method
>   <https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala#L219>
>   added in #24804 <https://github.com/apache/spark/pull/24804> to
>   leverage this convention instead of each usage instance re-defining the
>   additional config name
> - Do a comprehensive review of applicable configs and enable them all
>   to use the newly updated withPrepended method
>
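
A minimal, self-contained sketch (not Spark code; the class names and values are only
illustrative) of the merge semantics such a ".default" suffix would imply, i.e. an
admin-owned "<key>.default" prepended to the user-supplied value:

    object DefaultMergeSketch {
      // For a config "key", prepend an admin-owned "key.default" value to the
      // user-supplied value using the config's separator.
      def mergeWithDefault(
          conf: Map[String, String],
          key: String,
          separator: String = ","): Option[String] = {
        val adminDefault = conf.get(s"$key.default") // admin-set, e.g. in spark-defaults.conf
        val userValue = conf.get(key)                // user-set, e.g. via --conf
        (adminDefault, userValue) match {
          case (Some(d), Some(u)) => Some(s"$d$separator$u")
          case (Some(d), None)    => Some(d)
          case (None, u)          => u
        }
      }

      def main(args: Array[String]): Unit = {
        val conf = Map(
          "spark.sql.extensions.default" -> "com.platform.PlatformExtensions",
          "spark.sql.extensions"         -> "com.user.UserExtensions"
        )
        // Prints: Some(com.platform.PlatformExtensions,com.user.UserExtensions)
        println(mergeWithDefault(conf, "spark.sql.extensions"))
      }
    }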
> Wenchen, you expressed some concerns with adding more default configs in
> #34856 <https://github.com/apache/spark/pull/34856>; would this proposal
> address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmaha...@gmail.com> wrote:
>
> Hi Spark devs,
>
> Spark contains a bunch of array-like configs (comma separated lists). Some
> examples include `spark.sql.extensions`,
> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
> `spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
> a dozen or so more). As owners of the Spark platform in our organization,
> we would like to set platform-level defaults, e.g. custom SQL extensions and
> listeners, and we use some of the above-mentioned properties to do so. At
> the same time, we have power users writing their own listeners, setting the
> same Spark confs and thus unintentionally overriding our platform defaults.
> This leads to a loss of functionality within our platform.
>
> Previously, Spark has introduced "default" confs for a few of these
> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins`,
> `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
> These properties are meant to only be set by cluster admins thus allowing
> separation between platform default and user configs. However, as discussed
> in https://github.com/apache/spark/pull/34856, these configs are still
> client-side and can still be overridden, while also not being a scalable
> solution as we cannot introduce 1 new "default" config for every array-like
> config.
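
For concreteness, these existing admin-oriented variants are set in the platform's
spark-defaults.conf roughly like this (class names and options are only illustrative);
the user-facing configs are merged on top of them, but a user can still override the
"default" entries themselves, because the merge happens client-side:

    # spark-defaults.conf shipped by the platform
    spark.plugins.defaultList        com.platform.MetricsPlugin
    spark.driver.defaultJavaOptions  -XX:+UseG1GC

    # A job submitted with --conf spark.plugins=com.user.MyPlugin runs with both
    # plugins, because the default list is prepended to the user-supplied list.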
>
> I wanted to know if others have experienced this issue and what systems
> have been implemented to tackle it. Are there any existing solutions for
> this, either client-side or server-side (e.g. at a job submission server)?
> Even though we cannot easily enforce this at the client side, the
> simplicity of a client-side solution may make it more appealing.
>
> Thanks,
> Shardul
>
> --
Regards,
Shrikant Prasad