[ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32221:
------------------------------------
    Description: 
This would avoid failures when the files are too large or when a user places a
binary file inside SPARK_CONF_DIR.

Neither case is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to
1.5 MiB.

[https://etcd.io/docs/v3.4.0/dev-guide/limit/]

We can take the straightforward approach of skipping files that cannot be
accommodated within the 1.5 MiB limit (the limit is configurable, as per the
above link) and warning the user about it.

For most use cases this limit is more than sufficient; however, a user may
accidentally place a larger file and observe unpredictable results or
failures at run time.
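
The skip-and-warn approach described above could look roughly like the
following. This is only an illustrative sketch in Python, not the code proposed
for Spark: the function name select_conf_files, the logger name, and the
max_bytes parameter are all hypothetical. It also assumes that the limit
applies to the combined size of the selected files (since they would end up in
a single stored entry) and that a "binary" file is one that does not decode as
UTF-8.

```python
import logging
from pathlib import Path

logger = logging.getLogger("conf_dir_filter")  # hypothetical logger name

# Assumed default, taken from the 1.5 MiB figure quoted in the description.
MAX_ENTRY_BYTES = int(1.5 * 1024 * 1024)


def select_conf_files(conf_dir, max_bytes=MAX_ENTRY_BYTES):
    """Return {file name: text} for the files that fit within max_bytes.

    Files that would push the running total over the limit, and files
    that are not valid UTF-8 text, are skipped with a warning.
    """
    selected = {}
    total = 0
    for path in sorted(Path(conf_dir).iterdir()):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if total + size > max_bytes:
            logger.warning(
                "Skipping %s: %d bytes would exceed the %d byte limit",
                path.name, size, max_bytes)
            continue
        try:
            text = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            logger.warning("Skipping %s: not a UTF-8 text file", path.name)
            continue
        selected[path.name] = text
        total += size
    return selected
```

Calling select_conf_files on a directory returns only the entries that were
kept, so a caller can compare the result with the directory listing to see
what was dropped, in addition to the logged warnings.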


  was:
This would avoid failures when the files are too large or when a user places a
binary file inside SPARK_CONF_DIR.

Neither case is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to
1 MiB (recent versions of K8s have moved to etcd 3.4.x, which allows a
1.5 MiB limit). Once etcd is upgraded in all the popular K8s clusters, we can
hope to overcome this limitation; e.g. the
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows a
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation;
for example, we can split config files across multiple ConfigMaps. That needs
to be discussed and prioritised. This issue takes the straightforward approach
of skipping files that cannot be accommodated within the 1.5 MiB limit and
warning the user about it.


> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-32221
>                 URL: https://issues.apache.org/jira/browse/SPARK-32221
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> This would avoid failures when the files are too large or when a user places
> a binary file inside SPARK_CONF_DIR.
> Neither case is supported at the moment.
> The reason is that the underlying etcd store limits the size of each entry to
> 1.5 MiB.
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/]
> We can take the straightforward approach of skipping files that cannot be
> accommodated within the 1.5 MiB limit (the limit is configurable, as per the
> above link) and warning the user about it.
> For most use cases this limit is more than sufficient; however, a user may
> accidentally place a larger file and observe unpredictable results or
> failures at run time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

